A variety of problems in machine learning and digital communication deal with complex but structured natural or artificial systems. Natural patterns that we wish to automatically classify are a consequence of a hierarchical causal physical process. Learning about the world in which we live requires that we extract useful sensory features and abstract concepts and then form a model for how these interact. Universal data compression involves estimating the probability distribution of a data source, which is often produced by some natural hierarchical process. Error-correcting codes used in telephone modems and deep-space communication consist of electrical signals that are linked together in a complex fashion determined by the designed code and the physical nature of the communication channel. Not only are these tasks characterized by complex structure, but they also contain random elements. Graphical models such as Bayesian belief networks and Markov random fields provide a way to describe the relationships between random variables in a complex stochastic system.
In this book, I use graphical models as an overarching framework to describe and solve problems in the areas of pattern classification, unsupervised learning, data compression, and channel coding. This book covers research I did while in my doctoral program at the
University of Toronto. Rather than being a textbook, this book is a treatise that covers several leading-edge areas of research in machine learning and digital communication.
The book begins with a review of graphical models and algorithms for inferring probabilities in these models, including the probability propagation algorithm, Markov chain Monte Carlo (Gibbs sampling), variational inference, and Helmholtz machines. I then turn to the
practical problem of learning models for pattern classification. Results on the classification of handwritten digits show that Bayesian network pattern classifiers outperform other standard methods, such as the k-nearest neighbor method.
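For readers unfamiliar with the k-nearest neighbor baseline mentioned above, the following is a minimal sketch of the method: a query is assigned the majority label among its k closest training points. The toy 2-D data and the `knn_classify` name are illustrative assumptions, not the experimental setup used in the book.

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is squared
    Euclidean. A generic illustration of the k-nearest neighbor baseline,
    not the exact digit-classification setup reported in the book.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Sort training pairs by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda pair: sq_dist(pair[0], query))[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D example: two clusters standing in for digit feature vectors.
train = [((0.0, 0.0), "0"), ((0.1, 0.2), "0"), ((0.2, 0.1), "0"),
         ((1.0, 1.0), "1"), ((0.9, 1.1), "1"), ((1.1, 0.9), "1")]
print(knn_classify(train, (0.15, 0.1), k=3))  # near the "0" cluster, prints 0
```

In a real digit task the feature vectors would be pixel intensities and k would be chosen by validation; the point here is only that k-NN requires no learned model, which is what makes it a standard baseline for comparison.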