This book will show you how to use OpenCV's Python bindings to capture video, manipulate images, and track objects with either a normal webcam or a specialized depth sensor, such as the Microsoft Kinect. OpenCV is an open source, cross-platform library that provides building blocks for computer vision experiments and applications. It provides high-level interfaces for capturing, processing, and presenting image data. For example, it abstracts details about camera hardware and array allocation. OpenCV is widely used in both academia and industry.
Today, computer vision can reach consumers in many contexts via webcams, camera phones, and gaming sensors such as the Kinect. For better or worse, people love to be on camera, and as developers, we face a demand for applications that capture images, change their appearance, and extract information from them. OpenCV's Python bindings can help us explore solutions to these requirements in a high-level language and in a standardized data format that is interoperable with scientific libraries such as NumPy and SciPy.
Although OpenCV is high-level and interoperable, it is not necessarily easy for new users. Depending on your needs, OpenCV's versatility may come at the cost of a complicated setup process and some uncertainty about how to translate the available functionality into organized and optimized application code. To help you with these problems, I have endeavored to deliver a concise book with an emphasis on clean setup, clean application design, and a simple understanding of each function's purpose. I hope you will learn from this book's project, outgrow it, and still be able to reuse the development environment and parts of the modular code that we have created together.
Specifically, by the end of this book's first chapter, you can have a development environment that links Python, OpenCV, depth camera libraries (OpenNI, SensorKinect), and general-purpose scientific libraries (NumPy, SciPy). After five chapters, you can have several variations of an entertaining application that manipulates users' faces in a live camera feed. Behind this application, you will have a small library of reusable functions and classes that you can apply in your future computer vision projects. Let's look at the book's progression in more detail.