Document Clustering, Classification and Data Mining

It is important to note that while databases such as Lexis/Nexis and Dialog have begun to incorporate CLIR into their systems, the results are still quite limited (Oard, 1997). I have found that with search engines like Google, Yahoo, and AltaVista, a user can perform an advanced search in a specific language or in several languages, but the results are often returned in English. When the results do come back in another language, such as Arabic, a non-Arabic speaker can read the pages only if they also contain text in a language that the user knows. The movement toward multilingual search capabilities in these popular products is a sign of progress, but it is clear that much improvement is needed before CLIR tools reach the same level of performance as monolingual IR systems.

Conclusion

The need for IR systems capable of handling cross-language issues is increasing as the world becomes more connected by technology. I have attempted here to give a general overview of the rapidly expanding work in the field of cross-language information retrieval by exploring its purpose, difficulties, basic tools, major works and future research goals. In reviewing this information, it becomes possible to gain a larger picture of the CLIR field and where it will go from here.

Introduction to Data Mining

Data mining is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.

Definition of Data Mining

Data mining is the automated analysis of large or complex data sets in order to discover significant patterns or trends that would otherwise go unrecognised.

The key elements that make data mining tools a distinct form of software are:

Automated analysis

Data mining automates the process of sifting through historical data in order to discover new information. This is one of the main differences between data mining and statistics, where a model is usually devised by a statistician to deal with a specific analysis problem. It also distinguishes data mining from expert systems, where the model is built by a knowledge engineer from rules extracted from the experience of an expert.

The emphasis on automated discovery also separates data mining from OLAP and simpler query and reporting tools, which are used to verify hypotheses formulated by the user. Data mining does not rely on the user to define a specific query, only to formulate a goal, such as the identification of fraudulent claims.
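The contrast between verifying a hypothesis and discovering a pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the claim records, the field names, and both helper functions (verify_hypothesis and discover_patterns) are invented for this example, and ranking field values by how far their fraud rate departs from the overall rate is a deliberately simple stand-in for what a real data mining tool would do.

```python
from collections import defaultdict

# Hypothetical claim records; field names and values are invented for illustration.
claims = [
    {"region": "north", "channel": "phone", "weekend": True,  "fraud": True},
    {"region": "north", "channel": "web",   "weekend": True,  "fraud": True},
    {"region": "south", "channel": "phone", "weekend": True,  "fraud": True},
    {"region": "south", "channel": "web",   "weekend": False, "fraud": False},
    {"region": "north", "channel": "phone", "weekend": False, "fraud": False},
    {"region": "south", "channel": "web",   "weekend": False, "fraud": False},
    {"region": "north", "channel": "web",   "weekend": False, "fraud": False},
    {"region": "south", "channel": "phone", "weekend": False, "fraud": False},
]


def verify_hypothesis(records, field, value, target="fraud"):
    """Query-style check: the user chooses which field and value to test."""
    matching = [r for r in records if r[field] == value]
    return sum(r[target] for r in matching) / len(matching)


def discover_patterns(records, target="fraud"):
    """Goal-driven search: scan every field/value pair and rank each one by
    how far its target rate departs from the overall target rate."""
    overall = sum(r[target] for r in records) / len(records)
    counts = defaultdict(lambda: [0, 0])  # (field, value) -> [hits, total]
    for record in records:
        for field, value in record.items():
            if field == target:
                continue
            counts[(field, value)][0] += record[target]
            counts[(field, value)][1] += 1
    ranked = [
        (hits / total - overall, field, value)
        for (field, value), (hits, total) in counts.items()
    ]
    # Largest departure from the overall rate first, in either direction.
    ranked.sort(key=lambda item: abs(item[0]), reverse=True)
    return ranked


# The user supplies the hypothesis: "claims from the north are more often fraudulent".
north_rate = verify_hypothesis(claims, "region", "north")

# The user supplies only the goal (fraud); the search proposes the pattern.
deviation, field, value = discover_patterns(claims)[0]
print(f"north fraud rate: {north_rate:.2f}")
print(f"strongest pattern: {field}={value} (deviation {deviation:+.3f})")
```

In the first call the analyst has to know which field to ask about; in the second, the only input is the target field, and the scan surfaces the weekend flag without being told to look at it. On data sets with many fields, this exhaustive scan is the part that a person could not reasonably do by hand.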

Large or complex data sets

One of the attractions of data mining is that it makes it possible to analyse very large data sets within a reasonable time. Data mining is also suitable for complex problems that involve relatively small amounts of data but many fields or variables to analyse.
