Apache Sqoop Cookbook

Apache Sqoop Cookbook, 9781449364625 (1449364624), O'Reilly, 2013
It’s been four years since, via a post to the Apache JIRA, the first version of Sqoop was released to the world as an addition to Hadoop. Since then, the project has taken several turns, most recently landing as a top-level Apache project. I’ve been amazed at how many people use this small tool for a variety of large tasks. Sqoop users have imported everything from humble test data sets to mammoth enterprise data warehouses into the Hadoop Distributed Filesystem, HDFS. Sqoop is a core member of the Hadoop ecosystem, and plug-ins are provided and supported by several major SQL and ETL vendors. And Sqoop is now part of integral ETL and processing pipelines run by some of the largest users of Hadoop.

The software industry moves in cycles. At the time of Sqoop’s origin, a major concern was “unlocking” data stored in an organization’s RDBMS and transferring it to Hadoop. Sqoop enabled users with vast troves of information stored in existing SQL tables to use new analytic tools like MapReduce and Apache Pig. As Sqoop matures, a renewed focus on SQL-oriented analytics continues to make it relevant: systems like Cloudera Impala and Dremel-style analytic engines offer powerful distributed analytics with SQL-based languages, using the common data substrate offered by HDFS.

The variety of data sources and analytic targets presents a challenge in setting up effective data transfer pipelines. Data sources can have a variety of subtle inconsistencies: different DBMS providers may use different dialects of SQL, treat data types differently, or use distinct techniques to offer optimal transfer speeds. Depending on whether you’re importing to Hive, Pig, Impala, or your own MapReduce pipeline, you may want to use a different file format or compression algorithm when writing data to HDFS. Sqoop helps the data engineer tasked with scripting such transfers by providing a compact but powerful tool that flexibly negotiates the boundaries between these systems and their data layouts.

Integral Mechanical Attachment: A Resurgence of the Oldest Method of Joining
First reference of its kind to address one of the most fundamental of all methods for joining separate manufactured components: built-in (integral) mechanical fasteners.

Integral Mechanical Attachment highlights one of the world's oldest technologies and makes it new again. Think of buttons and toggles updated to innovative
...
Digital Convergence - Libraries of the Future
Clay tablets have been used to keep records from the earliest times. However, they were used for archives rather than libraries and consisted mainly of administrative records. Private and personal libraries containing books first appeared in Greece in the 5th century BC. The Royal Library of Alexandria was founded in the 3rd century BC and was...
Analyzing Social Media Networks with NodeXL: Insights from a Connected World

Businesses, entrepreneurs, individuals, and government agencies alike are looking to social network analysis (SNA) tools for insight into trends, connections, and fluctuations in social media. Microsoft's NodeXL is a free, open-source SNA plug-in for use with Excel. It provides instant graphical representation of relationships of complex...


Data-Warehouse-Systeme kompakt: Aufbau, Architektur, Grundfunktionen (Xpert.press) (German Edition)

The book examines data warehouse systems as a unified, central, complete, historized, and analytical IT platform and describes their role in data analysis and decision-making processes. In doing so, the author covers the individual components which, for the design, the architecture, and the...

JavaScript: The Good Parts

Most programming languages contain good and bad parts, but JavaScript has more than its share of the bad, having been developed and released in a hurry before it could be refined. This authoritative book scrapes away these bad features to reveal a subset of JavaScript that's more reliable, readable, and maintainable than the language as a...

Clever Algorithms: Nature-Inspired Programming Recipes

The need for this project was born of frustration while working towards my PhD. I was investigating optimization algorithms and was implementing a large number of them for a software platform called the Optimization Algorithm Toolkit (OAT). Each algorithm required considerable effort to locate the relevant source material (from...
