"Kernel Based Algorithms for Mining Huge Data Sets" is the first book treating the fields of supervised, semi-supervised and unsupervised machine learning collectively. The book presents both the theory and the algorithms for mining huge data sets by using support vector machines (SVMs) in an iterative way. It demonstrates how kernel based SVMs can be used for dimensionality reduction (feature elimination) and shows the similarities and differences between the two most popular unsupervised techniques, the principal component analysis (PCA) and the independent component analysis (ICA). The book presents various examples, software, algorithmic solutions enabling the reader to develop their own codes for solving the problems. The book is accompanied by a website for downloading both data and software for huge data sets modeling in a supervised and semisupervised manner, as well as MATLAB based PCA and ICA routines for unsupervised learning. The book focuses on a broad range of machine learning algorithms and it is particularly aimed at students, scientists, and practicing researchers in bioinformatics (gene microarrays), text-categorization, numerals recognition, as well as in the images and audio signals de-mixing (blind source separation) areas.
This is a book about (machine) learning from (experimental) data. Many books devoted to this broad field have been published recently; one is even tempted to qualify "many" in the previous sentence with "extremely". Thus, there is an urgent need to introduce both the motives for and the content of the present volume in order to highlight its distinguishing features.
Before doing that, a few words about the very broad meaning of data are in order. Today, we are surrounded by an ocean of all kinds of experimental data (i.e., examples, samples, measurements, records, patterns, pictures, tunes, observations, etc.) produced by various sensors, cameras, microphones, pieces of software, and other human-made devices. The amount of data produced is enormous and ever increasing. The first obvious consequence of this fact is that humans cannot handle such massive quantities of data, which usually appear in numeric form as huge (rectangular or square) matrices. Typically, the number of rows (n) gives the number of data pairs collected, and the number of columns (m) gives the dimensionality of the data. Thus, faced with giga- and terabyte-sized data files, one has to develop new approaches, algorithms, and procedures. A few techniques for coping with huge data sets are presented here, which may explain the appearance of the phrase 'huge data sets' in the title of the book.
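To make the row and column convention concrete, the following is a minimal MATLAB sketch; the matrix X and the chosen sizes are illustrative only and are not taken from the book's accompanying software.

    % Illustrative n-by-m data matrix: each of the n rows is one collected
    % data pair (sample), and each of the m columns is one input dimension.
    X = randn(1000, 20);            % e.g., n = 1000 samples in m = 20 dimensions
    [n, m] = size(X);               % recover the number of samples and dimensions
    fprintf('n = %d data pairs, m = %d dimensions\n', n, m);

At the scales discussed in the book, such a matrix would of course have many more rows and columns than in this toy example, which is precisely what motivates the iterative algorithms presented later.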