
Document Research Tool SMART-GS and the HCP Project

World Wide Web publication of old documents with SMART-GS

Susumu Hayashi
Professor,
Graduate School of Letters,
Kyoto University


Creating digital images of intellectual assets and old documents – the heritage of humankind
Linking all interests, from amateur historians to specialists

Reference to primary historical documents is vital in humanities research, especially in historical fields. Professor Susumu Hayashi of Kyoto University Graduate School of Letters is developing SMART-GS, a tool for working with primary historical materials. SMART-GS takes digital images of materials, adds search and link functions, and even stores research memos. The system will be capable of linking everyone from amateur historians to specialists, and is said to have the potential to develop into a worldwide network for old documents.


With SMART-GS, a word search can be conducted using an image search function. In this illustration, all pages of Hilbert’s “Mathematics Notes” have been searched for the word [Gruppe] (a German word meaning “group”). Numerous candidates with similar word forms have been retrieved.

SMART-GS

The tool at the heart of the HCP concept. SMART-GS is already being used in multiple historical research projects, and for teaching at the Kyoto University Graduate School of Letters / Faculty of Letters.

Digital imaging of historical materials has changed research methodologies

“Deciphering historical materials and circulating ideas is the groundwork of research. However, accessing the actual materials stored in various parts of the world has been physically and economically problematic. Over the last few years, though, the situation has suddenly changed, due to advances in digital archiving.” The world’s main university libraries, the National Diet Library and National Archives in Japan, and other institutions are opening many valuable historical materials to the public as digital images. Primary historical materials are now becoming easily accessible via personal computers.

However, many of the digital archives are in PDF or HTML format. Although search and memo functions are available, they are “slow and inconvenient,” and are thus regarded as impractical for research purposes. Professor Hayashi was originally an information engineer who was also involved in software development. He found that the SMART system, which he had developed jointly with graduate students, was useful when deciphering David Hilbert’s handwritten “Mathematics Notes.” (Hilbert is regarded by some as the father of 20th-century mathematics.) Professor Hayashi then modified the system for use as a document research tool. The outcome was SMART-GS, a new system resulting from joint research. SMART-GS uses an image search engine developed by the laboratory of Professor Yuzuru Tanaka at Hokkaido University. One of its principal features is using image search to “decipher unreadable materials.”


If a search result is deemed to match the query image, the user checks □Yes; if it does not match, □No; if the result is to be set aside for now, □Suspend. Matching words (images) are stored in the bucket. Subsequent searches can then broaden the query target to find images with shapes similar to any of the stored images. By repeating this process several times, a document of over 100 pages can be thoroughly searched for matching words in just five to 10 minutes. Because the human operator determines whether each search result is correct, accuracy is high, which makes this a highly practical research tool.
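The Yes / No / Suspend cycle described in this caption can be pictured as a small feedback loop. The sketch below is only an illustration under stated assumptions, not the actual SMART-GS code or the Hokkaido University search engine: each word image is assumed to be already reduced to a numeric feature vector, similarity is assumed to be plain Euclidean distance, and the names `rank_candidates`, `feedback_search`, and `judge` are hypothetical.

```python
from math import dist


def rank_candidates(bucket, pool, top_k):
    """Return up to top_k ids from pool, nearest to ANY vector in the bucket."""
    ranked = sorted(pool, key=lambda cid: min(dist(pool[cid], b) for b in bucket))
    return ranked[:top_k]


def feedback_search(query, images, judge, rounds=3, top_k=2):
    """Repeat search rounds; judge(cid) returns 'yes', 'no' or 'suspend'."""
    bucket = [query]     # the query plus every image confirmed so far
    confirmed = []       # ids the user marked Yes, in the order found
    rejected = set()     # ids the user marked No (never shown again)
    for _ in range(rounds):
        # Suspended images are neither confirmed nor rejected,
        # so they stay in the pool and can reappear in a later round.
        pool = {cid: vec for cid, vec in images.items()
                if cid not in rejected and cid not in confirmed}
        if not pool:
            break
        for cid in rank_candidates(bucket, pool, top_k):
            verdict = judge(cid)
            if verdict == "yes":
                bucket.append(images[cid])   # broadens the next round's query
                confirmed.append(cid)
            elif verdict == "no":
                rejected.add(cid)
    return confirmed


# Hypothetical feature vectors standing in for cut-out word images.
images = {"a": (1, 0), "b": (2, 0), "c": (3, 0), "x": (0, 5), "y": (10, 10)}
true_matches = {"a", "b", "c"}   # what a human reader would accept


def judge(cid):
    # Stand-in for the human operator ticking Yes or No.
    return "yes" if cid in true_matches else "no"


result = feedback_search((0, 0), images, judge)
print(result)
```

In this toy data, image "c" is closer to the confirmed image "b" than to the original query, so it only rises to the top of the ranking after the bucket has grown. That is the sense in which repeated rounds broaden the query target.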

Professor Hayashi, who specializes in information science and historical materials, is presently researching manuscripts by Hajime Tanabe, a philosopher of the Kyoto School. Kitaro Nishida invited Tanabe to teach at Kyoto Imperial University (now Kyoto University), where Tanabe influenced many students, a number of whom were later sent off to fight. Professor Hayashi is deciphering the handwritten lecture notes Tanabe compiled while forming his philosophical theory, the “Logic of Species,” and thereby tracing the development of Tanabe’s thinking. Tanabe’s handwriting was almost illegible, and is said to have baffled even his closest followers. However, many of the notes were photographed with a digital camera and fed as images into SMART-GS, with which they were then deciphered with surprising speed.

After the material has been converted to image form, an appropriate section of lines is ‘selected and cut out’ using the mouse. Words containing illegible characters are marked on the monitor. A search of the entire document is then conducted to find words with similar characters. In document research, deciphering a text by looking for and referring to characters of similar form is a classic method known as “reproduction.” Done manually, this method usually requires an enormous amount of time and labor. However, using SMART-GS, a search for target words in more than 100 pages of handwritten material can be completed in about five minutes. The system also has dictionary functions for registering word hits along with their readings. Furthermore, the system can easily search the entire manuscript for keywords forming part of Tanabe’s line of thought.

Reproduction

Text in old documents, prints, etc., with characters in abbreviated or other non-standard form is modified into a generally readable standard writing style. The term is often used when changing handwritten text into typeface text.

A new humanities research tool, able to store everything from the original material to research notes

SMART-GS allows the user to set links to marked sections in the imaged document, to record reproductions and annotations in new windows, and to write research memos in separate windows. In another simple function, keywords can be displayed as a time series in yet another window. Clicking on a keyword returns the user to the relevant section of the document. Professor Hayashi smiled as he said, “Until now, the original material being studied and the research memos had to be stored separately. Referencing was therefore time-consuming and troublesome. Now, everything can be done with one notebook computer. SMART-GS easily traces the path of one’s ideas, unifying everything to the point of writing a thesis.”


The SMART-GS link functions are outstanding. The upper-right window shows an image of a lecture note written by Hajime Tanabe in 1934. The lower-right window shows the reproduction. The symbol [ @ ] denotes unreadable characters; @@@ colored red indicates a link. This link leads to the bottom-center window with the text headed by an asterisk [*]. Here the user can write thoughts on the @@@ section of the reproduction. Furthermore, the dictionary function for handwritten characters can be used at any time. The left-center window shows that a dictionary search of the character framed in red in the upper-right window corresponds to the character [ 内 ]. The upper-left window displays the corresponding dictionary entry.

SMART-GS is presented on and distributed from the HCP (Humanities Cyber Platform) site on Professor Hayashi’s homepage. Professor Kazu Nagai (Kyoto University Graduate School of Letters), a specialist in contemporary history, believes this tool will bring dramatic changes to the methods of historical research. Researchers are using the system in the study of “The Diary of Yuzaburo Kuratomi,” to decipher previously unreadable characters in the Ch’i-tan (Khitan) script, and in research on Tibetan-language Buddhist sutras. Little by little, the use of SMART-GS is spreading as a new tool for humanities research.

HCP

Humanities Cyber Platform is a phrase coined by Professor Hayashi. HCP is a group of information technologies that aid research in the humanities, especially research on documents. Instead of word processor documents, e-mail and other ‘encoded’ writings, HCP is used for handwritten materials and early printed matter. HCP has two core features: it works on the “imaging principle,” with digital images of the material as the basic information; and it is designed for maximum practical benefit.

Yuzaburo Kuratomi

A government official who was active from the late 19th century to the middle of the 20th century. He also served as President of the Privy Council in the mid-1920s.

A tool enabling amateur historians and specialist researchers to debate on the same level

SMART-GS can be used for printed documents as well as handwritten materials. The documents need only be digitally imaged. Web archive documents have no editorial limitations. Therefore, with a browser able to mark, write on and link a document, anyone can add annotations and post the document on the Web as a personal study. Other viewers can then add comments. Old documents lying half-forgotten in homes, companies and public offices, rare early printed materials and other valuable but generally inaccessible materials can be digitally imaged and then made available to everyone.


The upper window shows an image of a passage handwritten by Hajime Tanabe in a printed book; the lower window shows the reproduction of the handwritten passage. As illustrated, SMART-GS can be used to mark out important sections of an image.

“Old documents are intellectual assets belonging to the public. Secreting them away in private collections is out of the question; public accessibility is the fundamental principle. I have a dream of building a World Wide Web of old documents. Old documents are posted on the Web, where anyone from amateur historians to experts has free access and is able to add comments.” Professor Hayashi speaks with pride as he looks ahead to realization of an open knowledge network.


The lower-right window shows material for a special lecture, typed by Professor Hayashi with SMART-GS. The historical material image is from Hajime Tanabe’s “Notebook (Diary).” As demonstrated here, Professor Hayashi uses SMART-GS to prepare daily lecture notes linked with the historical material being researched.


http://www.shayashi.jp/HCP/SMART-GS/FromHayashisResearch.avi (7.5MB)
SMART-GS includes a function that allows the user to record thoughts and inferences regarding the historical material and to create a graphical bird’s-eye view of the entire research process. Professor Hayashi used the figure shown here in researching Hilbert’s diary. The black vertical line on the left represents pages; the line on the right shows the timeline. Lines connect entries on certain pages with the year in which each was written. Clicking on any point brings up the corresponding original image of the historical material and any annotations.

(Kaori Nagano November 11, 2009)