In this textbook, I take an unconventional approach to data analysis. Its contents
are heavily influenced by the idea that data analysis should help enhance and
augment knowledge of the domain, as represented by concepts and statements
of relation between them. According to this view, the two main pathways for
data analysis are summarization, for developing and augmenting concepts, and correlation,
for enhancing and establishing relations. Visualization, in this context, is a
means of presenting results in a cognitively comfortable way. The term summarization
is understood quite broadly here to embrace not only simple summaries like totals
and means, but also more complex summaries such as the principal components of
a set of features or cluster structures in a set of entities.
Presented from this perspective, the material forms a unique mix of subjects from
statistical data analysis, data mining, and computational intelligence,
fields that follow different systems of presentation.
Another feature of the text is that its main thrust is to give an in-depth understanding
of a few basic techniques rather than to cover a broad spectrum of approaches
developed so far. Most of the described methods fall under the same least-squares
paradigm for mapping an “idealized” structure to the data. This allows me to bring
forward a number of relations between methods that are usually overlooked. One
example is the relation between the choice of a scoring function for classification
trees and the normalization options for the dummy variables representing the target categories.
Although the in-depth study approach involves a great deal of technical detail,
these are encapsulated in specific fragments of the text termed “formulation” parts.
The main, “presentation”, part is written in a very different style. It
involves no mathematical formulas and explains a method by actually applying
it to a small real-world dataset; this part can be read and studied with no concern
for the formulation at all. There is one more part, “computation”, targeted at a
computer-oriented reader. This part describes the computational implementation of
the methods, illustrated using the MATLAB computing environment. I have arrived at
this three-way narrative style as a result of my experience in teaching data analysis
and computational intelligence to Computer Science students. Some students
might be mainly interested in just one of the parts, whereas others might try to get
to grips with two or even all three of them.