Advanced Analytics with Spark: Patterns for Learning from Data at Scale

Advanced Analytics with Spark: Patterns for Learning from Data at Scale, 9781491912768 (1491912766), O'Reilly, 2015

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example.

You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.

Patterns include:

Recommending music and the Audioscrobbler data set
Predicting forest cover with decision trees
Anomaly detection in network traffic with K-means clustering
Understanding Wikipedia with Latent Semantic Analysis
Analyzing co-occurrence networks with GraphX
Geospatial and temporal data analysis on the New York City Taxi Trips data
Estimating financial risk through Monte Carlo simulation
Analyzing genomics data and the BDG project
Analyzing neuroimaging data with PySpark and Thunder

Amazing Books

Excel Pivot Tables Recipe Book: A Problem-Solution Approach

Apress, 2006

Excel’s pivot tables are a powerful tool for analyzing data. With only a few minutes of work, a new user can create an attractively formatted table that summarizes thousands of rows of data. This book assumes that you know the basics of Excel and pivot tables, and provides troubleshooting tips and techniques, as well as...

A Comparative Study of Very Large Data Bases (Lecture Notes in Computer Science)

Springer, 1978

This monograph compares methods for organizing very large amounts of stored data, called a very large data base, to facilitate fast retrieval of desired information on direct access storage devices. In a very large data base involving retrieval and updating, the major factor of immediate concern is the average number of accesses...

CINEMA 4D 11 Workshop

Focal Press, 2009

Create stunning 3D graphics with the tutorials and techniques in this book.

Model, texture and animate with Cinema 4D 11 using the techniques and tips provided in Cinema 4D 11 Workshop. Starting with all of the basic concepts, functions, and tools - follow along to the workshop tutorials that deliver a hands-on knowledge of...

CYA Securing IIS 6.0

Syngress Publishing, 2004

Network System Administrators operate in a high-stress environment, where the competitive demands of the business often run counter to textbook “best practices”. Design and planning lead times can be nonexistent and deployed systems are subject to constant end-runs; but at the end of the day, you, as the Administrator, are held...

The Ancient Egyptians For Dummies (History, Biography & Politics)

For Dummies, 2007

Unravel the history behind one of the most fascinating ancient civilisations with this engaging, entertaining and educational guide to the ancient Egyptians. With a complete rundown of ancient Egyptian history and culture alongside insights into the everyday lives of the Egyptians, you’ll discover how they kept themselves entertained, the...

The Cybersecurity Dilemma: Hacking, Trust and Fear Between Nations

Oxford University Press, 2017

Why do nations break into one another's most important computer networks? There is an obvious answer: to steal valuable information or to attack. But this isn't the full story. This book draws on often-overlooked documents leaked by Edward Snowden, real-world case studies of cyber operations, and policymaker perspectives to show that...