Data mining and data modeling are active research topics under rapid development. Because of their wide applications and rich research content, many practitioners and academics are attracted to work in these areas. With a view to promoting communication and collaboration among practitioners and researchers in Hong Kong, a two-day workshop on data mining and modeling was held on 27-28 June 2002. Prof. Ngaiming Mok, Director of the Institute of Mathematical Research, The University of Hong Kong, and Prof. Tze Leung Lai, Stanford University and C.V. Starr Professor of the University of Hong Kong, initiated the workshop. The workshop was organized by Dr. Michael Kwok-Po Ng, Department of Mathematics, The University of Hong Kong, and supported by the Institute of Mathematical Research and the Hong Kong Mathematical Society. It was the first workshop on data mining and modeling held in Hong Kong. It aimed to promote research interest in mathematical, statistical and computational methods and models in data mining among computer scientists, mathematicians, engineers and statisticians, and to foster contacts and exchange among them.
This book contains selected papers presented at the workshop. The papers fall into two main categories: data mining and data modeling. The data mining papers cover pattern discovery, clustering algorithms, classification and practical applications in the stock market. The data modeling papers deal with neural network models, time series models, statistical models and practical applications. In the following, we give a brief summary of each paper.
The problem of mining frequent sequences is to extract frequently occurring subsequences in a sequence database. Many algorithms have been proposed to solve the problem efficiently. Kao and Zhang survey several notable algorithms for mining frequent sequences, and analyze their characteristics.
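To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms surveyed by Kao and Zhang; it only illustrates what "frequent subsequence" means. The function name `frequent_subsequences`, the parameters `min_support` and `max_length`, and the toy database are all assumptions introduced for illustration, and each sequence is simplified to a list of single items.

```python
from collections import Counter
from itertools import combinations


def frequent_subsequences(database, min_support, max_length=3):
    """Return {subsequence: support} for every subsequence (up to
    max_length items) that occurs in at least min_support sequences.

    Brute-force illustration only: it enumerates all candidates and
    does not scale to realistic databases.
    """
    support = Counter()
    for sequence in database:
        seen = set()
        for length in range(1, max_length + 1):
            # combinations() keeps items in their original order, so each
            # tuple is a (not necessarily contiguous) subsequence.
            for candidate in combinations(sequence, length):
                seen.add(candidate)
        # Count each subsequence at most once per database sequence.
        support.update(seen)
    return {s: c for s, c in support.items() if c >= min_support}


# Hypothetical toy database of three sequences.
database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
]
result = frequent_subsequences(database, min_support=3)
```

In this toy database, only `("a",)`, `("c",)` and `("a", "c")` appear in all three sequences. Efficient algorithms avoid enumerating every candidate, which is what makes the problem a research topic.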
Feng et al. discuss the gene selection problem, which is an important issue in microarray data analysis and has critical implications for the discovery of genes related to serious diseases. They propose a Fisher optimization model for gene selection and use the Fisher linear discriminant for classification. They also demonstrate the validity of this method using public data.