Intangible Cultural Assets

Recording/Analysis and Information Processing
Closing in on the essence of human communication
Analyzing differences in "timing" and "gesture"

Takashi Matsuyama
Professor,
Graduate School of Informatics,
Kyoto University

How do recognized masters differ from lay people in the practice of buyo (classical Japanese dance), manzai (comic dialogue), rakugo (comic storytelling) and other arts? It is often said that the difference lies in subtleties of timing and gesture. Such differences are very difficult to explain in words and are thought to be perceived by a ‘sixth sense,’ so presenting them in data form was considered impossible. Professor Takashi Matsuyama of the Kyoto University Graduate School of Informatics says: “If information processing could be applied to a person’s sensibility (sense of time), the knowledge obtained could be useful for recording and analysis of dance, theatrical performances, speaking arts and other intangible cultural assets. In the future, it may lead also to better understanding of the essence of communication between people, and of cultural differences between countries and ethnic groups.”

Conversation as an interweave of talk and gesture

In manzai, the comic dialogue develops with the overlap of talk by the comic and the comic foil (straight man). Analysis of the timing of the overlap showed responses were faster when one person reacted affirmatively or was in agreement with the other, and delayed when there was negative reaction or disagreement. The degree of overlap varies among manzai duos. A duo might also exhibit variation between the start and the climax of the dialogue. In rakugo, the sole performer plays two roles by turning the face left or right. Here too the timing was faster for an affirmative response or agreement.


In both manzai and rakugo, when responding to the partner’s talk, affirmative replies and agreement tended to be delivered faster, whereas negative replies and disagreement were delayed.
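The kind of timing comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the analysis pipeline used in the research: the `Utterance` record, the `stance` annotation, and the function name `mean_response_gap` are hypothetical, and the timestamps in the usage note are invented for the example. The gap is measured from the end of one speaker's utterance to the start of the other's reply, so a negative value means the reply overlapped the preceding talk.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Utterance:
    speaker: str   # e.g. "A" or "B" (hypothetical labels)
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    stance: str    # hypothetical annotation: "affirmative" or "negative"


def mean_response_gap(utterances):
    """Average gap (reply start minus previous end) per stance of the reply.

    Only speaker changes are counted; a negative gap means overlap.
    """
    gaps = {}
    ordered = sorted(utterances, key=lambda u: u.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.speaker == prev.speaker:
            continue  # same speaker continuing, not a response
        gaps.setdefault(cur.stance, []).append(cur.start - prev.end)
    return {stance: mean(values) for stance, values in gaps.items()}


# Invented example data, for illustration only.
dialogue = [
    Utterance("A", 0.0, 2.0, "affirmative"),
    Utterance("B", 1.75, 3.0, "affirmative"),  # starts before A finishes
    Utterance("A", 3.5, 5.0, "negative"),      # delayed reply
    Utterance("B", 5.0, 6.0, "affirmative"),
]
result = mean_response_gap(dialogue)
```

With the invented data above, the affirmative replies average a slightly negative gap (overlap) and the negative reply shows a positive gap (delay), which is the pattern the article reports for manzai and rakugo.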

Changes in facial expression were also analyzed. Expressions were categorized into multiple movement patterns of mouth, nose, eyebrows and eyes. (For example, mouth movements: the mouth is closed, remains closed, is opening or remains open.) The timing of these movements for artificial laughter and natural laughter is slightly different, enabling recognition of artificial laughter.
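As a rough sketch of how such movement patterns might be derived from measurements, the code below labels each video frame of a mouth-opening signal and merges consecutive frames into intervals. Everything here is an assumption made for illustration: the article does not describe the actual procedure, the thresholds `open_level` and `eps` are hypothetical, and the label "closing" is one reading of the four mouth patterns listed above.

```python
from itertools import groupby


def label_states(aperture, open_level=0.5, eps=0.05):
    """Label each frame transition of a mouth-aperture signal.

    aperture: per-frame opening values, 0.0 (shut) to 1.0 (wide open).
    open_level and eps are hypothetical thresholds, not from the source.
    """
    states = []
    for prev, cur in zip(aperture, aperture[1:]):
        delta = cur - prev
        if delta > eps:
            states.append("opening")
        elif delta < -eps:
            states.append("closing")
        elif cur >= open_level:
            states.append("remains open")
        else:
            states.append("remains closed")
    return states


def to_intervals(states):
    """Merge consecutive identical labels into (state, first_frame, last_frame)."""
    intervals = []
    frame = 1  # states[0] describes frame 1 relative to frame 0
    for state, run in groupby(states):
        length = len(list(run))
        intervals.append((state, frame, frame + length - 1))
        frame += length
    return intervals


# Invented signal, for illustration only.
signal = [0.0, 0.0, 0.3, 0.6, 0.9, 0.9, 0.5, 0.1, 0.1]
intervals = to_intervals(label_states(signal))
```

Applying the same labeling to eyebrows, eyes and nose would give one interval sequence per facial part, and the start frames of those intervals could then be compared across parts to look for the timing differences between natural and artificial laughter.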

“People react similarly in daily conversation. For instance, when talking and facing the other person, replies are delayed when there is disagreement. With artificial laughter, there seems to be intuitive recognition of a mismatch of timing. In human conversation, talk and gestures are interwoven. At present, computers are not even capable of overlap in repetition of questions and answers. True communication with humans is thus not currently possible.”

The same piece of music may sound completely different when played by different musicians. This occurs because individual musicians interpret and play the music with an individual sense of timing (and tempo).


The timing of the movements of the eyes, nose and mouth differs slightly between spontaneous laughter (right) and intentional laughter (left). This appears to explain the human ability to intuitively distinguish between the two.

Using 3D video to see differences in gestures

Subtle differences in human gestures are being studied using 3D video. The subject is surrounded and filmed by multiple cameras. A three-dimensional image (voxel data) is created using groups of small cubes based on the image data. A ‘mesh’ of small triangles is laid over the surface and colored to give a sense of texture. Moving 3D images are created by repeating this process for successive still images.
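The article does not say which reconstruction algorithm is used to build the voxel data. One common way to obtain voxels from multiple views is silhouette carving, and the toy sketch below illustrates that idea only. It assumes three idealized orthographic views along the grid axes instead of calibrated perspective cameras, and the function name `carve` and the silhouette sets are hypothetical.

```python
from itertools import product


def carve(size, front, side, top):
    """Keep the voxels whose projections fall inside every silhouette.

    Toy orthographic setup (an assumption, not the system in the article):
      front: set of (x, y) pixels seen looking along the z axis
      side:  set of (z, y) pixels seen looking along the x axis
      top:   set of (x, z) pixels seen looking along the y axis
    """
    return {
        (x, y, z)
        for x, y, z in product(range(size), repeat=3)
        if (x, y) in front and (z, y) in side and (x, z) in top
    }


# Invented silhouettes of a two-voxel column in a 3 x 3 x 3 grid.
front = {(1, 0), (1, 1)}
side = {(1, 0), (1, 1)}
top = {(1, 1)}
voxels = carve(3, front, side, top)
```

Each surviving tuple is one small cube of the model. Running the carving once per video frame gives a sequence of voxel sets, over which a surface mesh can then be fitted and colored as the paragraph above describes.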

“The 3D video instantaneously processes images taken from all angles and directions, and can freely move in for close-ups or out as required. It is also capable of isolating just part of the face, and discerning the direction in which the performer is looking.”


Production process of 3-D video.
Numerous cameras positioned around the subject record its movements. Three-dimensional images are created from the video. A triangular mesh is drawn over the surface, a technique used to raise accuracy, and colored to give a sense of texture.

In the case of a dancing Maiko (a young apprentice geisha), the system reveals how she is moving her hands, feet, eyes, and other features at any point in the dance. It can analyze gestures in detail from all angles. The movements of Living National Treasures and Olympic athletes could be recorded. If the performer’s purpose and the meaning of the movements are preserved at the same time, the system could be used for preservation of Intangible Cultural Assets. Furthermore, such information could also be applied in training, through comparison with ‘models’ from diverse angles.


Japanese dance performed by a Maiko, edited into 3-D video.

From “tangible” to “intangible” — cultural research enters a new era

“Research of tangible facets of human culture such as structures and Buddhist images has progressed. Research of dance, speaking arts and other intangible events now lies before us. Such Intangible Cultural Assets are understood not through reasoning, but with the sixth sense. Why? Perhaps because time as perceived by humans is not a physical time capable of continuous numerical conversion. Rather, it may be a subjective perception of time in the order of non-continuous events. If these two perceptions of time could be integrated in theory, and a time sense shared with the computer, then true human-computer communication may become possible.”

(Akira Miki, December 15, 2009)