High Performance Data Mining - Scaling Algorithms, Applications and Systems

High Performance Data Mining - Scaling Algorithms, Applications and Systems, 9780792377450 (0792377451), Springer, 2000

This special issue of Data Mining and Knowledge Discovery addresses the problem of scaling data mining algorithms, applications, and systems to massive data sets by applying high performance computing technology. With the commoditization of high performance computing through clusters of workstations and related technologies, the infrastructure needed for high performance data mining is increasingly common. However, many of the commonly used data mining algorithms do not scale to large data sets. Two fundamental challenges follow: developing scalable versions of the commonly used data mining algorithms, and developing new algorithms for mining very large data sets. In other words, today it is easy to spin a terabyte of disk, but difficult to analyze and mine a terabyte of data.

Developing algorithms that scale takes time. As an example, consider the successful scale-up and parallelization of linear algebra algorithms over the past two decades. This success was due to several factors, including: a) developing versions of standard algorithms that exploit the specialized structure of certain linear systems, such as block-structured, symmetric, or Toeplitz systems; b) developing new algorithms, such as the Wiedemann and Lanczos algorithms, for solving sparse systems; and c) developing software tools that provide high performance implementations of linear algebra primitives, such as LINPACK and LAPACK, together with parallel programming environments such as PVM.

In some sense, the state of the art in scalable, high performance data mining algorithms is roughly where linear algebra was two decades ago. We suspect that strategies a)–c) will work for data mining as well.

High performance data mining is still a very new subject with many open challenges. Roughly speaking, some data mining algorithms can be characterized as a heuristic search process involving many scans of the data. Irregular computation, large numbers of data accesses, and non-deterministic search strategies therefore make efficient parallelization of data mining algorithms a difficult task. Research in this area will not only contribute to large scale data mining applications but also enrich high performance computing technology itself. This was part of the motivation for this special issue.
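To make the "many scans of the data" point concrete, here is a minimal Python sketch, not taken from any paper in this issue, of a simplified level-wise frequent-itemset search in which each level is one full pass over the data. Each pass is parallelized in a data-parallel "count locally, then merge" style: the transactions are split into partitions, each worker counts itemsets in its own partition, and the partial counts are summed. The function names, the round-robin partitioning, and the use of a thread pool are illustrative assumptions. Python threads do not give real CPU parallelism for this kind of work, so a practical system would run the workers as separate processes or cluster nodes. For brevity, the sketch counts every k-itemset in each transaction rather than generating and pruning candidates as Apriori does.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def local_counts(partition, k):
    """Count every k-itemset that occurs in one partition of the transactions."""
    counts = Counter()
    for transaction in partition:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def parallel_support(transactions, k, n_workers=4):
    """One scan of the data: split it, count locally, then merge partial counts."""
    # Hypothetical round-robin partitioning; real systems split by block or node.
    partitions = [transactions[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_counts, partitions, [k] * n_workers))
    total = Counter()
    for part in partials:
        total.update(part)  # reduction step: sum the counts from all workers
    return total


def frequent_itemsets(transactions, min_support, max_k=3, n_workers=4):
    """Level-wise search: every level k costs another full pass over the data."""
    result = {}
    for k in range(1, max_k + 1):
        counts = parallel_support(transactions, k, n_workers)
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        if not frequent:
            # No frequent k-itemsets means no frequent supersets either.
            break
        result.update(frequent)
    return result


if __name__ == "__main__":
    data = [
        ["bread", "milk"],
        ["bread", "diaper", "beer"],
        ["milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper"],
    ]
    for itemset, count in sorted(frequent_itemsets(data, min_support=3).items()):
        print(itemset, count)
```

Stopping at the first level with no frequent itemsets is safe because an itemset can never occur more often than any of its subsets. The sketch also hints at why parallelization is hard in general: the number of passes depends on the data, and the amount of counting work per partition can differ widely.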


Amazing Books

I Win, You Win: The Essential Guide to Principled Negotiation

A & C Black Publishers, 2007

Negotiation is an essential skill in all areas of life. It is a series of maneuvers that we move through in order to get the best possible deal for ourselves, our company, or our organization. How far we will go to achieve our goals is where the rub lies. Full of useful exercises, case studies, and accessible advice, this book will...

Automated Database Applications Testing: Specification Representation for Automated Reasoning

World Scientific Publishing, 2010

This book introduces SpecDB, an intelligent database created to represent and host software specifications in a machine-readable format, based on the principles of artificial intelligence and unit testing database operations. SpecDB is demonstrated via two automated intelligent tools. The first automatically generates database constraints...

Object-Oriented User Interfaces for Personalized Mobile Learning (Intelligent Systems Reference Library)

Springer, 2014

This book presents recent research in mobile learning and advanced user interfaces. It is shown how the combination of these fields can result in personalized educational software that meets the requirements of state-of-the-art mobile learning software. This book provides a framework that is capable of incorporating the software technologies,...

Biology: The Dynamic Science

Cengage Learning, 2011

Learn how to think and engage like a scientist! BIOLOGY: THE DYNAMIC SCIENCE, Second Edition, provides you with a deep understanding of the core concepts in Biology, building a strong foundation for additional study. In a fresh presentation, the authors explain complex ideas clearly and describe how biologists collect and interpret evidence...

PayPal APIs: Up and Running

O'Reilly, 2012

There has never been a better time to have a keen interest in commerce. The Web has truly accelerated globalization and connected us all through a common network. Information can now be shared at mind-boggling rates, and entrepreneurs everywhere can truly reach a global audience if they’re clever (and...

Node for Front-End Developers

O'Reilly, 2012

Node.js has brought the JavaScript revolution of the past few years to the server. JavaScript, it turns out, has uses beyond the client, and many techniques for effective client-side development are applicable on the server side as well. Front-end developers can use their existing skills to work with Node today....