The aim of this book is to provide an up-to-date review of different approaches to classification,
compare their performance on a wide range of challenging data-sets, and draw
conclusions on their applicability to realistic industrial problems.
Before describing the contents, we first need to define what we mean by classification,
give some background to the different perspectives on the task, and introduce the European
Community StatLog project whose results form the basis for this book.

The task of classification occurs in a wide range of human activity. At its broadest, the
term could cover any context in which some decision or forecast is made on the basis of
currently available information, and a classification procedure is then some formal method
for repeatedly making such judgments in new situations. In this book we shall consider a
more restricted interpretation. We shall assume that the problem concerns the construction
of a procedure that will be applied to a continuing sequence of cases, in which each new case
must be assigned to one of a set of pre-defined classes on the basis of observed attributes
or features. The construction of a classification procedure from a set of data for which the
true classes are known has also been variously termed pattern recognition, discrimination,
or supervised learning (in order to distinguish it from unsupervised learning or clustering
in which the classes are inferred from the data).
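
To make the restricted interpretation concrete, the following is a minimal sketch of a classification procedure that is constructed from cases whose true classes are known and then applied to new cases. The nearest-centroid rule is chosen here only for its simplicity and is not a method prescribed by the text; the function names, attribute values, and class labels "A" and "B" are hypothetical and serve purely as illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available from Python 3.8


def fit_centroids(cases, labels):
    """Construct the procedure: average the attribute vectors of each known class."""
    grouped = defaultdict(list)
    for attributes, label in zip(cases, labels):
        grouped[label].append(attributes)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, new_case):
    """Apply the procedure: assign a new case to the class with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], new_case))


# Hypothetical data: two observed attributes per case, true classes known.
training_cases = [(1.0, 1.2), (0.8, 1.0), (4.0, 4.2), (4.2, 3.8)]
training_labels = ["A", "A", "B", "B"]

centroids = fit_centroids(training_cases, training_labels)

# A continuing sequence of new cases, each assigned to a pre-defined class.
new_cases = [(0.9, 1.1), (3.9, 4.0)]
predictions = [classify(centroids, case) for case in new_cases]
```

The two functions mirror the two stages described above: construction from data with known classes, and repeated application to new cases on the basis of their observed attributes. An unsupervised method would differ in the first stage, since it would have to infer the classes from the data rather than receive them as labels.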