Document Clustering, Classification and Data Mining


Oard and Dorr (1996) give several motivations for research into cross-language information retrieval:

In another article, Oard (1997) points out that cross-language information retrieval would also help people who read and write only one language but need information that may not be available in that language.

       For all of these reasons, cross-language information retrieval is an important and rapidly growing area of IR that merits exploration.  To that end, I provide a general overview of cross-language information retrieval, including a definition, the problems involved in creating cross-language systems, the basic IR approaches used, the major work and projects undertaken, and possible directions for future research.  This overview is not meant to be comprehensive, since the field has expanded rapidly over the last decade.  Rather, it is an attempt to introduce the major components of cross-language information retrieval and to summarize the work done so far in this area.

 Definitions of Cross-Language Information Retrieval

       Before turning to the problems that multiple languages pose for IR and the basic techniques used to make multiple-language systems work, it is appropriate to give a clear definition and to say a word about the terminology used in the research literature.  Much of the literature uses the more common term multilingual information retrieval (MLIR) to cover everything in IR that involves more than one language or a language other than English.  Hull and Grefenstette (1996, p. 484) give five definitions of MLIR:
 

 Another definition comes from Oard (1997, p. 1), who defines MLIR as the selection of useful documents from collections that may contain several languages. This broad definition seems to encompass the last three definitions given by Hull and Grefenstette, and it is the definition on which this paper is based, since the focus here is on IR across languages and not on IR in monolingual settings, whatever the language.  In adopting Oard's definition, I also adopt his term cross-language information retrieval (CLIR), because it refers specifically to IR work across languages, whereas multilingual information retrieval covers more concepts, including the first two of Hull and Grefenstette's definitions.  Therefore, although many works and projects in this area use the terms multilingual information retrieval or even translingual retrieval, this overview uses only cross-language information retrieval and its abbreviation, in order to keep its focus clear.



©2005 Jatit