THIS BOOK GREW OUT OFMY EXPERIENCE OF WORKING WITH DATA FOR VARIOUS COMPANIES IN THE TECH
industry. It is a collection of those concepts and techniques that I have found to be the
most useful, including many topics that I wish I had known earlier—but didn’t.
My degree is in physics, but I also worked as a software engineer for several years. The
book reflects this dual heritage. On the one hand, it is written for programmers and others
in the software field: I assume that you, like me, have the ability to write your own
programs to manipulate data in any way you want.
On the other hand, the way I think about data has been shaped by my background and
education. As a physicist, I am not content merely to describe data or to make black-box
predictions: the purpose of an analysis is always to develop an understanding for the
processes or mechanisms that give rise to the data that we observe.
The instrument to express such understanding is the model: a description of the system
under study (in other words, not just a description of the data!), simplified as necessary
but nevertheless capturing the relevant information. A model may be crude (“Assume a
spherical cow . . . ”), but if it helps us develop better insight on how the system works, it is
a successful model nevertheless. (Additional precision can often be obtained at a later
time, if it is really necessary.)
This emphasis on models and simplified descriptions is not universal: other authors and
practitioners will make different choices. But it is essential to my approach and point of
view.
This is a rather personal book. Although I have tried to be reasonably comprehensive, I
have selected the topics that I consider relevant and useful in practice—whether they are
part of the “canon” or not. Also included are several topics that you won’t find in any
other book on data analysis. Although neither new nor original, they are usually not used
or discussed in this particular context—but I find them indispensable.
Throughout the book, I freely offer specific, explicit advice, opinions, and assessments.
These remarks are reflections of my personal interest, experience, and understanding. I do
not claim that my point of view is necessarily correct: evaluate what I say for yourself and
feel free to adapt it to your needs. In my view, a specific, well-argued position is of greater
use than a sterile laundry list of possible algorithms—even if you later decide to disagree
with me. The value is not in the opinion but rather in the arguments leading up to it. If
your arguments are better than mine, or even just more agreeable to you, then I will have
achieved my purpose!