Hadoop Beginner's Guide

Hadoop Beginner's Guide, 9781849517300 (1849517304), Packt Publishing, 2013

Get your mountain of data under control with Hadoop. This guide requires no prior knowledge of the software or cloud services - just a willingness to learn the basics from this practical step-by-step tutorial.

Overview

Learn tools and techniques that let you approach big data with relish and not fear.
Shows how to build a complete infrastructure to handle your needs as your data grows.
Hands-on examples in each chapter give the big picture while also giving direct experience.

In Detail

Data is arriving faster than you can process it, and the overall volumes keep growing at a rate that keeps you awake at night. Hadoop can help you tame the data beast. Effective use of Hadoop, however, requires a mixture of programming, design, and system administration skills.

"Hadoop Beginner's Guide" removes the mystery from Hadoop, presenting Hadoop and related technologies with a focus on building working systems and getting the job done, using cloud services to do so when it makes sense. From basic concepts and initial setup through developing applications and keeping the system running as the data grows, the book gives the understanding needed to use Hadoop effectively to solve real-world problems.

Starting with the basics of installing and configuring Hadoop, the book explains how to develop applications, maintain the system, and use additional products to integrate with other systems.

Alongside different ways to develop applications to run on Hadoop, the book also covers tools such as Hive, Sqoop, and Flume that show how Hadoop can be integrated with relational databases and log collection.

In addition to examples of Hadoop clusters on Ubuntu, the book covers the use of cloud services such as Amazon EC2 and Elastic MapReduce.

What you will learn from this book

The trends that led to Hadoop and cloud services, giving the background to know when to use the technology
Best practices for setup and configuration of Hadoop clusters, tailoring the system to the problem at hand
Developing applications to run on Hadoop with examples in Java and Ruby
How Amazon Web Services can be used to deliver a hosted Hadoop solution and how this differs from directly-managed environments
Integration with relational databases, using Hive for SQL queries and Sqoop for data transfer
How Flume can collect data from multiple sources and deliver it to Hadoop for processing
What other projects and tools make up the broader Hadoop ecosystem and where to go next

Approach

As a Packt Beginner's Guide, the book is packed with clear step-by-step instructions for performing the most useful tasks, getting you up and running quickly, and learning by doing.

Who this book is written for

This book assumes no existing experience with Hadoop or cloud services. It assumes you have familiarity with a programming language such as Java or Ruby but gives you the needed background on the other topics.
