Accessing Hadoop Data Using Hive BDU

Enrollment in this course is by invitation only

About This Course

Writing MapReduce programs to analyze your Big Data can get complex. Hive can make querying your data much easier. Apache Hive, first created at Facebook, is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL. This course will get you started so that you can use Hive for data warehousing tasks on your Big Data projects.

Course Syllabus

  • Lesson 1 - Introduction to Hive
    • Describe what Hive is, what it’s used for, and how it compares to similar technologies
    • Describe the Hive architecture
    • Describe the main components of Hive
    • List interesting ways others are using Hive
  • Lesson 2 - Hive DDL
    • Create databases and tables in Hive using a variety of data types
    • Run a variety of DDL commands
    • Use partitioning to improve the performance of Hive queries
    • Create Managed and External tables in Hive
  • Lesson 3 - Hive DML
    • Load data into Hive
    • Export data out of Hive
    • Run a variety of HiveQL DML queries
  • Lesson 4 - Hive Operators and Functions
    • Use a variety of Hive Operators in your queries
    • Utilize Hive’s Built-in Functions
    • Explain ways to extend Hive functionality
  • Lesson 5 - Hive Storage Formats
    • Use a variety of File Formats in Hive
    • Use different SerDes with Hive
    • Convert data between file formats
    • Understand compression in Hive

Recommended skills prior to taking this course

  • Basic understanding of Apache Hadoop and Big Data
  • Working knowledge of SQL
  • Basic knowledge of the Linux operating system

Requirements

  • None.

Course Staff

Aaron Ritchie

Aaron Ritchie has worked in the Information Management division of IBM for over 11 years and has held a variety of roles within the Center of Excellence and Education groups. Aaron has worked as an IT Specialist, Learning Developer, and Project Manager. He is certified in multiple IBM products and enjoys working with an assortment of open-source technologies. Aaron holds a Bachelor of Science in Computer Science degree from Clarkson University and a Master of Science in Information Technology degree from WPI.

  1. Course Number: BD0141EN
  2. Classes Start: Any Time, Self-Paced
  3. Estimated Effort: 4:00