Programming Hive

Programming Hive, 9781449319335 (1449319335), O'Reilly, 2012

Programming Hive introduces Hive, an essential tool in the Hadoop ecosystem that provides an SQL (Structured Query Language) dialect for querying data stored in the Hadoop Distributed Filesystem (HDFS), in other filesystems that integrate with Hadoop, such as MapR-FS and Amazon’s S3, and in databases like HBase (the Hadoop database) and Cassandra.

Most data warehouse applications are implemented using relational databases that use SQL as the query language. Hive lowers the barrier for moving these applications to Hadoop. People who know SQL can learn Hive easily; without it, they must learn new languages and tools to become productive again. Similarly, Hive makes it easier for developers to port SQL-based applications to Hadoop than other tools do; without it, porting those applications would be a daunting challenge.

Still, there are aspects of Hive that are different from other SQL-based environments. Documentation for Hive users and Hadoop developers has been sparse. We decided to write this book to fill that gap. We provide a pragmatic, comprehensive introduction to Hive that is suitable for SQL experts, such as database designers and business analysts. We also cover the in-depth technical details that Hadoop developers require for tuning and customizing Hive.

Amazing Books

Expert Network Time Protocol: An Experience in Time with NTP

Apress, 2005

Have you ever tried to figure out why your computer clock is off, or why your emails somehow have the wrong timestamp? Most likely, it’s due to an incorrect network time synchronization, which can be reset using the Network Time Protocol. Until now, most network administrators have been too paranoid to work with this, afraid that they...

Autodesk Drainage Design for InfraWorks 360 Essentials

Sybex, 2015

Master the advanced functionality of the drainage-specific InfraWorks add-on

Autodesk Drainage Design for InfraWorks 360 Essentials, 2nd Edition provides hands-on guidance to the tools and capabilities of this drainage-specific InfraWorks module. Straightforward explanations coupled with real-world exercises help you...

Final Cut Pro Workflows: The Independent Studio Handbook

Focal Press, 2007

Film/Video/Production/Final Cut Pro

. . . with an easy style and great depth, Final Cut Pro Workflows: The Independent Studio Handbook is an enjoyable and important read. Osder and Carman offer a diverse background and extensive experience with Final Cut Pro.
Richard Harrington, president of RHED Pixel and author of Photoshop for...

Curves and Surfaces for CAGD: A Practical Guide (The Morgan Kaufmann Series in Computer Graphics)

Morgan Kaufmann, 2001

This fifth edition has been fully updated to cover the many advances made in CAGD and curve and surface theory since 1997, when the fourth edition appeared. Material has been restructured into theory and applications chapters. The theory material has been streamlined using the blossoming approach; the applications material includes least squares...

Asymptotics for Associated Random Variables

Springer, 2012

The book concerns the notion of association in probability and statistics. Association and some other positive dependence notions were introduced in 1966 and 1967 but received little attention from the probability and statistics community. The interest in these dependence notions increased in the last 15 to 20 years, and many asymptotic...

New Trends in Software Methodologies, Tools and Techniques: Proceedings of the fifth SoMeT_06, Volume 147 Frontiers in Artificial Intelligence and Applications

IOS Press, 2006

Software is the essential enabler for the new economy and science. It creates new markets and new directions for a more reliable, flexible, and robust society. It empowers the exploration of our world in ever more depth. However, software often falls short of our expectations. Current software methodologies, tools, and techniques remain...