Traditional data mining methods are designed to deal with “static” databases, i.e., databases where the ordering of records (or other database objects) has nothing to do with the patterns of interest. Though the assumption of order irrelevance may be sufficiently accurate in some applications, there are certainly many other cases where sequential information, such as a time-stamp associated with every record, can significantly enhance our knowledge about the mined data. One example is a series of stock values: a specific closing price recorded yesterday has a completely different meaning than the same value recorded a year ago. Since most of today’s databases already include temporal data in the form of “date created”, “date modified”, and other time-related fields, the only problem is how to exploit this valuable information to our benefit. In other words, the question we are currently facing is: how can we mine time series data?
The purpose of this volume is to present some recent advances in the preprocessing, mining, and interpretation of temporal data stored by modern information systems. Adding the time dimension to a database produces a Time Series Database (TSDB) and introduces new aspects and challenges to the tasks of data mining and knowledge discovery. These new challenges include finding the most efficient representation of time series data, measuring the similarity of time series, detecting change points in time series, and classifying and clustering time series. Some of these problems have been treated in the past by experts in time series analysis. However, statistical methods of time series analysis focus on sequences of values representing a single numeric variable (e.g., the price of a specific stock). In a real-world database, a time-stamped record may include several numerical and nominal attributes, which may depend not only on the time dimension but also on each other. To make the data mining task even more complicated, the objects in a time series may be complex graph structures rather than vectors of feature values.
This book covers the state-of-the-art methodology for mining time series databases. The novel data mining methods presented in the book include techniques for efficient segmentation, indexing, and classification of noisy and dynamic time series. A graph-based method for anomaly detection in time series is described, and the book also studies the implications of a novel and potentially useful representation of time series as strings. Finally, the book discusses the problem of detecting changes in data mining models induced from temporal databases.