Sequences are an important type of data that occurs frequently in many scientific,
medical, security, business, and other applications. For example, DNA
sequences encode the genetic makeup of humans and all other species, and protein
sequences describe the amino acid composition of proteins and encode
their structure and function. Moreover, sequences can be used to
capture how individual humans behave through various temporal activity histories
such as weblogs and customer purchase histories. Sequences can also be
used to describe how organizations behave through sales histories, such as the
total sales of various items over time for a supermarket.

Huge amounts of sequence data have been and continue to be collected in
genomic and medical studies, in security applications, in business applications,
and elsewhere. In these applications, the analysis of the data needs to be carried out in
different ways to satisfy different application requirements, and it needs to
be carried out in an efficient manner. Sequence data mining provides the
necessary tools and approaches for unlocking useful knowledge hidden in the
mountains of sequence data. The purpose of this book is to present some of
the main concepts, techniques, algorithms, and references on sequence data
mining.

This introductory chapter has four goals. First, it will provide some example
applications of sequence data. Second, it will define several basic/generic
concepts for sequences and sequence data mining. Third, it will discuss the major
issues of interest in data mining research. Fourth, it will give an overview
of the entire book.