| Data Mining Using SAS Enterprise Miner introduces readers to data mining with SAS Enterprise Miner v4. The book demonstrates the power and ease of use of this new SAS module, walking through the configuration settings of the various Enterprise Miner nodes designed for data mining analysis and the results they generate. It consists of step-by-step instructions, along with an assortment of illustrations, to acquaint the reader with the nodes and their working environment in SAS Enterprise Miner. The book provides an in-depth guide to data mining that is clear enough for both the novice statistician and the die-hard expert.
The process of extracting information from large data sets is known as data mining. The objective of data mining is to make discoveries from the data: to uncover unknown patterns and relationships by summarizing or compressing the data in a concise and efficient way that is both understandable and useful to the subsequent analysis. This means extracting information from the original data that yields an accurate representation of the population of interest, summarizing the data in order to make statistical inferences about the population from which the data was drawn, and observing the patterns that seem most interesting, which might lead you to discover abnormal departures from the general distribution or trend in the data. Examples include discovering two separate variables with an unusually strong linear relationship, finding a combination of variables that is extremely highly correlated with a certain variable, or grouping the data to identify characteristics of the variables that differ between groups. In predictive modeling, it is important to identify the variables that determine certain distributional relationships in the data set in order to generate future observations, to discover unusual patterns, and to identify unusual observations that lie well beyond the general trend of the rest of the data points.
The basic difference between data mining and more traditional statistical applications is the size of the data set. In traditional statistical designs, a hundred observations might constitute an extremely large data set. By contrast, a data mining data set might consist of several million or even billions of records. The basic strategy usually applied to reduce the size of the file in data mining is sampling the data set into a smaller, more manageable subset that is an accurate representation of the population of interest. The other strategy is summarizing the variables in the data by their corresponding mean, median, or sum-of-squares. Reducing the number of variables in the data set is also extremely important in the various modeling designs. This is especially true in nonlinear modeling, where an iterative grid search procedure must be performed to find the smallest error on the multidimensional error surface. |
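The two data-reduction strategies described above, sampling and summarizing, can be illustrated with a minimal Python sketch. It is not SAS Enterprise Miner code and does not reproduce any node's behavior. The function names `sample_rows` and `summarize` are hypothetical, the data are synthetic, and "sum-of-squares" is assumed here to mean the sum of squared deviations from the mean.

```python
import random
import statistics


def sample_rows(rows, n, seed=0):
    """Draw a simple random sample of n rows without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(rows, n)


def summarize(values):
    """Summarize a numeric variable by mean, median, and sum of squares.

    Assumption: sum of squares is taken about the mean (corrected).
    """
    mean = statistics.fmean(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sum_of_squares": sum((v - mean) ** 2 for v in values),
    }


# Synthetic "large" data set: 100,000 values from a normal distribution.
data_rng = random.Random(42)
population = [data_rng.gauss(50.0, 10.0) for _ in range(100_000)]

# Strategy 1: work with a smaller, more manageable random subset.
subset = sample_rows(population, 1_000, seed=1)

# Strategy 2: compress a variable into a few summary statistics.
full_summary = summarize(population)
sample_summary = summarize(subset)

print("full mean:  ", round(full_summary["mean"], 2))
print("sample mean:", round(sample_summary["mean"], 2))
```

If the sample is representative, its mean and median should land close to those of the full data set, which is what makes the smaller subset usable for the subsequent analysis.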