Search, Extraction, and Organization Archives

Kanada, Y., 3rd ACM Conference on Digital Libraries, pp. 108-117, 1998, (C) Copyright 1998 by ACM.

[ Japanese page ]
[ Paper PDF file (ACM DL) ] [ Paper PDF file ] [ Paper PostScript file ]

Abstract: A text search method called the axis-specified search method is proposed. This method is suitable for full-text searches of a large-scale text collection. In addition to specifying search strings, the user selects an axis from a predefined set. The system outputs excerpts and hyperlinks that are ordered along the axis. The search strings express the specific subject of the search, and the axis specifies a general-purpose method of ordering results. Short sub-topics, which cannot be easily caught by statistical methods, are effectively gathered from the text collection. The user can get satisfactory results using a simple search string. Even if the number of results is very large, the user can easily survey them, because they are well structured. This method has been applied to an electronic encyclopedia and a newspaper database. In these applications, related descriptions that were scattered across the collection could be gathered, and the user could discover their relationships from the results. For example, by specifying "semiconductor" for a search string and "year" for an axis, a table listing seven decades of semiconductor-related topics sorted by year was generated from newspaper issues published over a single year. By specifying "basin" for a search string and "area" (m²) for an axis, descriptions of the world's largest rivers were extracted from the encyclopedia and sorted according to their basin areas.
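The year-axis case described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `year_axis_search`, the four-digit year pattern, and the sample sentences are all assumptions made for illustration, and a real system would use a prebuilt index rather than scanning sentences.

```python
import re
from typing import List, Tuple

# Assumed pattern: a four-digit year from 1000 to 2099 (hypothetical, for illustration).
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def year_axis_search(sentences: List[str], query: str) -> List[Tuple[int, str]]:
    """Return (year, sentence) pairs for sentences containing `query`, ordered by year."""
    needle = query.lower()
    results = []
    for sentence in sentences:
        if needle not in sentence.lower():
            continue  # the search string selects the subject
        for match in YEAR_RE.finditer(sentence):
            results.append((int(match.group(1)), sentence))
    # The axis determines the ordering of the excerpts.
    results.sort(key=lambda pair: pair[0])
    return results


# Made-up sample sentences, not taken from the newspaper database.
corpus = [
    "Semiconductor memory prices fell sharply in 1985.",
    "Rice harvests were poor in 1993.",
    "The transistor, a semiconductor device, was invented in 1947.",
]
table = year_axis_search(corpus, "semiconductor")
for year, excerpt in table:
    print(year, excerpt)
```

Each row pairs a position on the axis with the excerpt it came from, which is what lets a long result list be surveyed as a table.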

Introduction to this research theme: Axis-Specified Search (Thematic Search)

Keywords: Text search, Axis-specified search, Year-axis search, Time-axis search, Area-axis search, Quantity-axis search, Encyclopedia search, Information extraction, Information organization, Search result organization, Organizing search, Search result sorting, Search result structurization, Structurizing search

Kanada, Y., IPSJ SIGFI Technical Report 98-FI-50-4, pp. 25-32, 1998, Published by IPSJ (in Japanese).

[ Japanese page ]
[ Paper PDF file (in Japanese) ] [ Paper PostScript file (in Japanese) ]

Abstract: A full-text search method called the axis-specified search method is proposed. With this method, excerpts are extracted from documents and ordered. In addition to typing the strings to be searched for, the user selects an axis, such as year, area, or quantity, from a menu. Excerpts related to the axis and the strings, together with hyperlinks to the original sentences, are then ordered along the axis and displayed. Even if the number of results is very large, the user can easily survey them, because they are well structured. This method has been applied to an encyclopedia and to newspaper articles. In these applications, related descriptions that were scattered across the collection could be gathered, and the user could discover their relationships from the results. For example, by specifying "basin" for a search string and "area" (m²) for an axis, descriptions of the world's largest rivers were extracted from the encyclopedia and sorted according to their basin areas.
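A quantity axis such as area requires extracting a number together with its unit and converting it to a common unit before sorting. The sketch below shows one way this could work. The unit table, the regular expression, the function name `area_axis_search`, and the sample sentences (with placeholder river names and figures) are assumptions, not details from the report.

```python
import re
from typing import List, Tuple

# Assumed unit table: conversion factors to square metres.
AREA_UNITS = {"m2": 1.0, "ha": 1.0e4, "km2": 1.0e6}
# Assumed surface form: a number (commas allowed) followed by a unit.
AREA_RE = re.compile(r"([0-9][0-9,]*(?:\.[0-9]+)?)\s*(km2|m2|ha)\b")


def area_axis_search(sentences: List[str], query: str) -> List[Tuple[float, str]]:
    """Return (area in m2, sentence) pairs for matching sentences, largest area first."""
    needle = query.lower()
    results = []
    for sentence in sentences:
        if needle not in sentence.lower():
            continue
        for number, unit in AREA_RE.findall(sentence):
            value = float(number.replace(",", "")) * AREA_UNITS[unit]
            results.append((value, sentence))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


# Placeholder sentences with invented figures.
corpus = [
    "River B has a basin of 120,000 ha.",
    "River A drains a basin of 7,000,000 km2.",
    "River C is 900 km long.",
]
ranked = area_axis_search(corpus, "basin")
for area_m2, excerpt in ranked:
    print(f"{area_m2:.3e} m2  {excerpt}")
```

Normalizing to one unit before sorting is what allows excerpts that state areas in different units to share a single axis.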

Introduction to this research theme: Axis-Specified Search (Thematic Search)

Keywords: Text search, Axis-specified search, Year-axis search, Time-axis search, Area-axis search, Quantity-axis search, Encyclopedia search, Newspaper article search, Newspaper search, Quantity information extraction, Information organization, Search result organization, Organizing search, Search result sorting, Search result structurization, Structurizing search

Kanada, Y., Hirano, Y., Sawada, M., Yamazaki, M., and Fujii, Y., 58th National Conference of the Information Processing Society of Japan, 1J-3, 1999, Published by IPSJ (in Japanese).

[ Japanese page ]
[ Paper PDF file (in Japanese) ] [ Paper PostScript file (in Japanese) ]

No abstract available.

Introduction to this research theme: Axis-Specified Search (Thematic Search)

Keywords: Text search, Axis-specified search, Thematic geographical search, Area-axis search, Encyclopedia search, Newspaper article search, Newspaper search, Geographical information extraction, Geographical name extraction, Information organization, Search result organization, Organizing search, Search result structurization, Structurizing search

Kanada, Y., IPSJ SIGFI Technical Report, 99-FI-?, 1999, Published by IPSJ (in Japanese).

[ Japanese page ]
[ Paper PDF file (in Japanese) ] [ Paper PostScript file (in Japanese) ]

Abstract: A method of textual information retrieval called the thematic chronological-table search method has been developed. In this method, an index is generated by extracting and collecting year references from a text collection. When the user inputs search words, this index and a statement-by-statement full-text index are used to find statements that contain both year references and the search words, and the result items are sorted by year and displayed. Each result item contains a year reference, a sentence that contains the year, and a hyperlink to the original text. This paper explains the information extraction method used in thematic chronological-table search. The method has been applied to a Japanese encyclopedia. An evaluation shows that the precision of extraction is higher than 99% in most cases. An efficient and less error-prone data representation for year expressions, which may contain several units such as century, year, month, and day, is also explained.
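The abstract does not describe the data representation itself, so the following is only a hypothetical sketch of what a single sortable record for mixed-granularity time references might look like. The class `TimeRef`, its fields, and the helper `from_century` are invented for illustration and should not be read as the report's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class TimeRef:
    """Hypothetical record for a time reference of any granularity."""
    year: int            # first year covered by the reference
    month: int = 0       # 0 means "not specified"
    day: int = 0         # 0 means "not specified"
    span_years: int = 1  # 100 for a century reference

    def __post_init__(self) -> None:
        # Reject impossible values early instead of storing them silently.
        if not 0 <= self.month <= 12:
            raise ValueError(f"invalid month: {self.month}")
        if not 0 <= self.day <= 31:
            raise ValueError(f"invalid day: {self.day}")
        if self.day and not self.month:
            raise ValueError("a day requires a month")


def from_century(n: int) -> TimeRef:
    """Represent the n-th century (n >= 1) by its first year and a 100-year span."""
    return TimeRef(year=(n - 1) * 100 + 1, span_years=100)


refs = [TimeRef(1850, 3, 5), from_century(19), TimeRef(1850), TimeRef(1850, 3)]
for ref in sorted(refs):
    print(ref)
```

Because the fields are compared in declaration order, century, year, month, and day references sort on one axis without separate code paths for each unit.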

Introduction to this research theme: Axis-Specified Search (Thematic Search)

Keywords: Text search, Axis-specified search, Thematic chronological search, Year-axis search, Time-axis search, Information extraction, Information organization, Encyclopedia search, Search result organization, Organizing search, Search result structurization, Structurizing search

Kanada, Y., Yamazaki, M., Sawada, M., Hirano, Y., and Fujii, Y., 59th National Conference of the Information Processing Society of Japan, 3P-9, 1999, Published by IPSJ (in Japanese).

[ Japanese page ]
[ Paper PDF file (in Japanese) ]

Abstract: The members-only network "Net-de-hyakka" offers a service called the thematic mapping search, in which the results of an encyclopedia text search are ordered along a geographical axis. In this search, statements are retrieved and sorted by the geographical names that occur in the text. A map showing any one of the geographical names can also be opened. The function and implementation method of this search are summarized here.

Introduction to this research theme: Axis-Specified Search (Thematic Search)

Keywords: Text search, Axis-specified search, Thematic geographical search, Geographical-axis search, Encyclopedia search, Net-de-hyakka, Thematic Mapping Search

Kanada, Y., IPSJ SIGNL Technical Report, 99-NL-132-2, 1999, Published by IPSJ (in Japanese).

[ Japanese page ]
[ Paper PDF file (in Japanese) ] [ Paper PostScript file (in Japanese) ]

Abstract: A text retrieval method called the thematic mapping search method has been developed for Japanese texts. In this method, the user specifies a search theme using free words, then obtains a sorted list of excerpts and hyperlinks to sentences that contain geographical names. Using this list, the user can open maps that indicate the locations of the names. To generate an index of names for this search, a method of geographical name extraction has been developed. In this method, geographical names are extracted, matched to names in a geographical name database, and identified. Geographical names, however, often have several types of ambiguities. These ambiguities are resolved using context analysis and several other techniques. As a result, the precision of extracted names is more than 96% on average when applied to the World Encyclopedia. The rules for information extraction depend on features of the Japanese language, but the strategy and most of the techniques can be applied to texts in English or other languages.
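The extract, match, and identify steps can be sketched with a toy gazetteer and a simple context-overlap rule. This is a simplified stand-in written for English text, not the report's rules for Japanese. The `GAZETTEER` contents, the cue words, and the functions `resolve` and `extract_places` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Set, Tuple


class Place(NamedTuple):
    place_id: str    # hypothetical database identifier
    cues: frozenset  # words assumed to appear near this place


# Toy stand-in for a geographical name database.
GAZETTEER = {
    "Paris": [Place("paris-fr", frozenset({"france", "seine"})),
              Place("paris-tx", frozenset({"texas"}))],
    "Lyon": [Place("lyon-fr", frozenset({"france", "rhone"}))],
}


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def resolve(name: str, context: Set[str]) -> Optional[str]:
    """Pick the candidate whose cue words overlap the context most; None on a tie."""
    candidates = GAZETTEER.get(name, [])
    if len(candidates) == 1:
        return candidates[0].place_id
    scored = sorted(((len(c.cues & context), c.place_id) for c in candidates),
                    reverse=True)
    if len(scored) >= 2 and scored[0][0] > scored[1][0]:
        return scored[0][1]
    return None  # still ambiguous: skip the name rather than guess


def extract_places(sentence: str) -> List[Tuple[str, str]]:
    context = words(sentence)
    found = []
    for name in GAZETTEER:
        if re.search(r"\b" + re.escape(name) + r"\b", sentence):
            place_id = resolve(name, context)
            if place_id is not None:
                found.append((name, place_id))
    return found


print(extract_places("Paris lies on the Seine in France."))
print(extract_places("He moved to Paris."))
```

Dropping names that remain ambiguous trades recall for precision, which is one plausible way to keep an index of extracted names reliable.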

Introduction to this research theme: Axis-Specified Search (Thematic Search)

Keywords: Text search, Axis-specified search, Area-axis search, Thematic mapping search, Thematic geographical name search, Geographical information extraction, Geographical name extraction, Encyclopedia search

Kanada, Y., International Symposium on Digital Library 1999, pp. 135-142, 1999

[ Japanese page ]
[ Paper PDF file ] [ Paper PostScript file ]

Abstract: This paper explains a method of extracting year references for a textual information retrieval method called the thematic chronological-table search method. This search method generates an index by extracting and collecting year references from a text collection. The resulting index and a full-text index are used to search for statements that contain both year references and search words. The results are displayed in the form of a chronological table with hyperlinks to the original text. Seven forms of year or century references are extracted and normalized using string matching patterns. The extraction error rate is reduced by using both local and non-local contexts. If only the lower two digits of a Gregorian year occur in a matching form, the reference is normalized by supplementing the upper digits from the non-local context. This method has been applied to a Japanese encyclopedia. An evaluation shows the precision of extraction to be higher than 99% in most cases.
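The two-digit supplementation step can be illustrated as follows. The sketch assumes English-style forms (a full four-digit year, or an apostrophe followed by two digits) rather than the seven Japanese forms handled in the paper, and it takes the "non-local context" to be simply the most recent full year seen earlier in the text. Both choices, and the name `normalize_years`, are assumptions.

```python
import re
from typing import List, Optional

# Assumed forms: a four-digit year, or an apostrophe plus two digits ('52).
YEAR_FORM = re.compile(r"\b(\d{4})\b|'(\d{2})\b")


def normalize_years(text: str) -> List[int]:
    """Return the years in `text`, completing two-digit forms from earlier context."""
    years: List[int] = []
    century_base: Optional[int] = None  # e.g. 1900, taken from the latest full year
    for match in YEAR_FORM.finditer(text):
        full, short = match.group(1), match.group(2)
        if full is not None:
            year = int(full)
            century_base = year - year % 100
            years.append(year)
        elif century_base is not None:
            years.append(century_base + int(short))
        # A short form with no earlier full year is skipped rather than guessed.
    return years


print(normalize_years("The device appeared in 1947 and was improved in '52 and '58."))
print(normalize_years("Sales rose in '85."))
```

Skipping a two-digit form when no context is available keeps the extractor from inventing a century, at the cost of missing that reference.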

Introduction to this research theme: Axis-Specified Search (Thematic Search)

Keywords: Text search, Axis-specified search, Thematic chronological search, Year-axis search, Time-axis search, Encyclopedia search, Chronological information extraction, Information organization, Search result organization, Organizing search, Search result structurization, Structurizing search

Kanada, Y., 8th International Conference on Information and Knowledge Management (CIKM'99), pp. 46-54, November 1999

[ Japanese page ]
[ Paper PDF file (ACM DL) ] [ Paper PDF file ]

Abstract: A text retrieval method called the thematic geographical search method has been developed and applied to a Japanese encyclopedia called the World Encyclopedia. In this method, the user specifies a search theme using free words, then obtains a sorted list of excerpts and references to encyclopedia sentences that contain geographical names. Using this list, the user can open maps that indicate the locations of the names. To generate an index of names for this search, a method of geographical name extraction has been developed. In this method, geographical names are extracted, matched to names in a geographical name database, and identified. Geographical names, however, often have several types of ambiguities. These ambiguities are resolved using context analysis and several other techniques. As a result, the precision of extracted names is more than 96% on average. This method depends on features of the Japanese language, but the strategy and most of the techniques can be applied to texts in English or other languages.

Introduction to this research theme: Axis-Specified Search (Thematic Search)

Keywords: Text search, Axis-specified search, Encyclopedia search, Thematic mapping search, Thematic geographical name search, Area-axis search, Geographical information extraction, Geographical name extraction, Information organization, Search result organization, Organizing search, Search result structurization, Structurizing search

Yasusi Kanada, Not yet published, 1998.

[ Paper PDF file ]
[ Paper PostScript file ]

Abstract: Most conventional text retrieval methods are designed to search for documents. However, users often do not need the documents themselves; they want to quickly find specific information that may come from a large collection of texts. To satisfy this need, we have developed a model and two methods for fine-grained searching. The unit of search in this model is called an atom, and it can be a sentence or a smaller syntactic unit. A score, i.e., a relevance value, is defined for each atom and each query, and the score is propagated between atoms. Using the two methods, excerpts from the text surrounding the search-result items, hyperlinks to the document parts that include the items, or both are displayed. Multiple topics in a document can be listed separately in a search result. Evaluation of two prototypes, which use a conventional full-text search engine either as is or with only a small modification, has demonstrated that these methods are feasible and can reduce the time and effort that searching costs users.
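The atom model can be illustrated with a small sketch in which atoms are sentences, the base score counts query terms, and part of each atom's score is passed to its immediate neighbours. The abstract does not specify the propagation rule, so the neighbour-only rule, the decay factor of 0.5, and all function names here are assumptions.

```python
import re
from typing import List


def split_atoms(text: str) -> List[str]:
    """Treat each sentence as one atom (the model also allows smaller units)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def base_scores(atoms: List[str], query_terms: List[str]) -> List[float]:
    """Assumed relevance value: number of query terms that occur in the atom."""
    return [float(sum(term.lower() in atom.lower() for term in query_terms))
            for atom in atoms]


def propagate(scores: List[float], decay: float = 0.5) -> List[float]:
    """Add a decayed share of each atom's score to its immediate neighbours."""
    out = list(scores)
    for i, score in enumerate(scores):
        if i > 0:
            out[i - 1] += decay * score
        if i + 1 < len(scores):
            out[i + 1] += decay * score
    return out


text = ("The basin of this river is large. Its area was surveyed twice. "
        "Birds migrate south in autumn. Markets opened late.")
atoms = split_atoms(text)
scores = propagate(base_scores(atoms, ["basin", "river"]))
for score, atom in sorted(zip(scores, atoms), reverse=True):
    print(f"{score:.1f}  {atom}")
```

In this example the second sentence contains no query term but still receives a score from its neighbour, so an excerpt can include context that a plain term match would miss.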

Introduction to the research theme: Axis-Specified Search (Thematic Search)

Keywords: Text search, Fine-grained search, Passage retrieval