Data Mining Using SAS Enterprise Miner (Wiley Series in Computational Statistics)

Data Mining Using SAS Enterprise Miner (Wiley Series in Computational Statistics), 9780470149010 (0470149019), John Wiley & Sons, 2007

Data Mining Using SAS Enterprise Miner introduces readers to data mining with SAS Enterprise Miner v4. The book demonstrates the power and ease of use of this new SAS module by walking through the configuration settings of the various Enterprise Miner nodes designed to perform data mining analysis, along with the results they generate. It consists of step-by-step instructions and an assortment of illustrations that help the reader get acquainted with the nodes and the corresponding working environment in SAS Enterprise Miner. The book provides an in-depth guide to the field of data mining that is clear enough for both the novice statistician and the die-hard expert.

The process of extracting information from large data sets is known as data mining. The objective in data mining is making discoveries from the data: discovering unknown patterns and relationships by summarizing or compressing the data in a concise and efficient way that is both understandable and useful to the subsequent analysis. This means extracting information from the original data that will result in an accurate representation of the population of interest, summarizing the data in order to make statistical inferences or statements about the population from which the data was drawn, and observing the patterns that seem most interesting, which might lead you to discover abnormal departures from the general distribution or trend in the data. Examples include discovering two separate variables with an unusually strong linear relationship, finding a combination of variables that is extremely highly correlated with a certain variable, or grouping the data to identify certain characteristics of the variables within each group. In predictive modeling, it is important to identify variables that determine certain distributional relationships in the data set in order to generate future observations, or even to discover unusual patterns and identify unusual observations that lie well beyond the general trend of the rest of the data points.
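As a rough illustration of spotting "an unusually strong linear relationship" between two variables, the sketch below computes the Pearson correlation coefficient in plain Python. It is not taken from the book and does not use SAS Enterprise Miner; the function name `pearson_r` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: ys is roughly 2 * xs, so the relationship is nearly linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship; what counts as "unusually strong" depends on the data and is left to the analyst.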

The basic difference between data mining and the more traditional statistical applications is the size of the data set. In traditional statistical designs, a hundred observations might constitute an extremely large data set. Conversely, a data mining data set might consist of several million or even billions of records. The basic strategy usually applied to reduce the size of the file in data mining is sampling the data set into a smaller, more manageable subset that is an accurate representation of the population of interest. The other strategy is summarizing the variables in the data by their corresponding mean, median, or sum-of-squares. Reducing the number of variables in the data set is also extremely important in the various modeling designs. This is especially true in nonlinear modeling, where an iterative grid search procedure must be performed to find the smallest error on the multidimensional error surface.
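The two reduction strategies described above, sampling and summarizing, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's method and not Enterprise Miner's Sampling node: the function name `sample_and_summarize`, the synthetic population, and the choice of the corrected sum of squares (squared deviations about the sample mean) are all assumptions made for the example.

```python
import random
import statistics


def sample_and_summarize(values, sample_size, seed=0):
    """Draw a simple random sample without replacement and summarize it."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(values, sample_size)
    mean = statistics.fmean(sample)
    return {
        "n": len(sample),
        "mean": mean,
        "median": statistics.median(sample),
        # Assumption: "sum-of-squares" is read as the corrected sum of squares.
        "sum_of_squares": sum((x - mean) ** 2 for x in sample),
    }


# Hypothetical "large" data set: 100,000 records with values cycling 0..999.
population = [float(i % 1000) for i in range(100_000)]
summary = sample_and_summarize(population, sample_size=10_000)
print(summary)
```

With a simple random sample, the sample mean and median should land close to the population values (499.5 here), which is the sense in which the subset is "an accurate representation of the population of interest".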

Amazing Books

Combating Piracy: Intellectual Property Theft and Fraud

Transaction Publishers, 2006

Manifestations of fraud in the early twenty-first century are showing signs of innovations and adaptation in response to shifting opportunities. This book reports on new analyses of intellectual property theft as its most recent expression. Fraud and piracy of products and ideas have become common as the opportunities to commit them expand, and...

Adobe ColdFusion 9 Web Application Construction Kit, Volume 1: Getting Started

Adobe Press, 2010

Written by the best known and most trusted name in the ColdFusion community, Ben Forta, The ColdFusion Web Application Construction Kit is the best-selling ColdFusion series of all time - the books that most ColdFusion developers used to learn the product. This Getting Started volume starts with Web and Internet fundamentals and database...

Alfresco 3 Business Solutions

Packt Publishing, 2011

Alfresco is a renowned and multiple award-winning open source Enterprise Content Management System that allows you to build, design, and implement your very own ECM solutions. It offers much more advanced and cutting-edge features than its commercial counterparts with its modularity and scalability. If you are looking for quick and effective...

Image Processing in C: Analyzing and Enhancing Digital Images with 3.5 Disk

CMP Books, 1997

This book is a tutorial on image processing. Each chapter explains basic concepts with words and figures, shows image processing results with photographs, and implements the operations in C. Information herein comes from articles published in The C/C++ Users Journal from 1990 through 1998 and from the first edition of this book published in 1994....

Programming Python

CreateSpace Independent Publishing Platform, 1996

As Python's creator, I'd like to say a few words about its origins, adding a bit of personal philosophy.

Over six years ago, in December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas. My office (a government-run research lab in Amsterdam)...

Volume 7A: XView Programming Manual (Definitive Guides to the X Window System)

O'Reilly, 1994

The XView Programming Manual has been revised and expanded for XView Version 3.2. XView was developed by Sun Microsystems and is derived from Sun's proprietary programming toolkit, SunView. It is an easy-to-use object-oriented toolkit that provides an OPEN LOOK user interface for X applications. The major additions for...