Data mining and knowledge discovery (DMKD) is a rapidly expanding field in computer science. It has become very important because of an increased demand for methodologies and tools that can help analyze and understand the huge amounts of data generated daily by institutions such as hospitals, research laboratories, banks, insurance companies, and retail stores, and by Internet users. This explosion of data is a result of the growing use of electronic media. But what is data mining (DM)? A Web search using the Google search engine retrieves a great many definitions of data mining. We include here a few interesting ones.
One of the simpler definitions is: “As the term suggests, data mining is the analysis of data to establish relationships and identify patterns”. This definition focuses on identifying relations in data. Our next example is more elaborate: “An information extraction activity whose goal is to discover hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.”
This definition suggests that data mining tries to extract useful “information” from data that can help predict the future. Neither of the definitions above explicitly emphasizes a large volume of data, which is the focus of the next definition: “The process of analyzing large amounts of data in order to extract new kinds of useful information (such as implicit relationships between different pieces of information)”.