Now updated—the systematic introductory guide to modern analysis of large data sets
As data sets continue to grow in size and complexity, there has been an inevitable move towards indirect, automatic, and intelligent data analysis in which the analyst works via more complex and sophisticated software tools. This book reviews state-of-the-art methodologies and techniques for analyzing enormous quantities of raw data in high-dimensional data spaces to extract new information for decision-making.
This Second Edition of Data Mining: Concepts, Models, Methods, and Algorithms discusses data mining principles and then describes representative state-of-the-art methods and algorithms originating from different disciplines such as statistics, machine learning, neural networks, fuzzy logic, and evolutionary computation. Detailed algorithms are provided with the necessary explanations and illustrative examples, and each chapter ends with questions and exercises for practice. This new edition features the following new techniques/methodologies:

Support Vector Machines (SVM): developed from statistical learning theory, with great potential for applications in predictive data mining

Kohonen Maps (Self-Organizing Maps, SOM): a highly applicable neural-network-based methodology for descriptive data mining and multidimensional data visualization

DBSCAN, BIRCH, and distributed DBSCAN clustering algorithms: representatives of an important class of density-based clustering methodologies

Bayesian Networks (BN), a methodology often used for causality modeling

Algorithms for measuring Betweenness and Centrality parameters in graphs, important for applications in mining large social networks

The CART algorithm and the Gini index for building decision trees

Bagging and Boosting approaches to ensemble-learning methodologies, with details of the AdaBoost algorithm

Relief algorithm, one of the core feature selection algorithms inspired by instance-based learning

PageRank algorithm for mining and authority ranking of web pages

Latent Semantic Analysis (LSA) for text mining and measuring semantic similarities between text-based documents

New sections on temporal, spatial, web, text, parallel, and distributed data mining

More emphasis on business, privacy, security, and legal aspects of data mining technology
This text offers guidance on how and when to use a particular software tool (with the companion data sets) from among the hundreds offered when faced with a data set to mine. This allows analysts to create and perform their own data mining experiments using their knowledge of the methodologies and techniques provided. The book emphasizes the selection of appropriate methodologies and data analysis software, as well as parameter tuning. These critically important, qualitative decisions can be made only with the deeper understanding, offered here, of what each parameter means and what role it plays in the technique.
This volume is primarily intended as a data-mining textbook for computer science, computer engineering, and computer information systems majors at the graduate level. Senior undergraduates with the appropriate background can also follow all topics presented here.