Document Clustering, Classification and Data Mining


In the past 20 years, the field of information retrieval (IR) has grown well beyond its primary goals of indexing text and searching for useful documents in a collection. With the advent of the WWW, the scope of IR has expanded greatly, and IR has become an active topic in IS research. This brief literature review focuses on the following questions:

 What is information retrieval?

Simply put, IR helps users find useful information in digital resources, including digital libraries, the WWW, and documents.

 What are the subdisciplines of IR research?

In the traditional, non-WWW world, IR was primarily concerned with indexing text (terms) and searching for useful documents in a collection. Today, IR research includes modeling, document classification and categorization, search engines, user interfaces, data visualization, information filtering, natural language processing or query languages, systems architecture, and more. From the perspective of digital resources, IR research includes text mining, multimedia (audio, image, and video) retrieval, and digital libraries (a hybrid of all digital resources).

 What are the schools of IR research?

Computer Science School, which focuses on algorithms, data structures, and techniques.
Information Science School, which adopts a human-centered interpretation of the IR problem and focuses on how people interpret and use information.
Others: the Economics School, which evaluates the economic value of IR products such as search engines and caches; and the Psychology School, which investigates human factors in the user interface and is closely related to the Information Science School.

Cross Language Information Retrieval

Overview:

The rapid spread of communications technologies, such as the World Wide Web, and improvements in general information retrieval (IR) techniques have allowed people worldwide to access previously unavailable information. With these advances, however, it has become increasingly clear that there is a growing need for access to information in many languages. Until recently, monolingual IR was the main research focus of scientists all over the world, and much of what is available has been in English, although English is the native language of only 6% of the world's population (Haddouti, 1999). This leaves non-English speakers in need of access to information in their own language, and it has created a global desire for access to information in multiple languages.

©2005 Jatit