HDPCD Certification – Post 1

In this series of posts, I am going to talk about the Hortonworks Data Platform Certified Developer certification, also known as HDPCD.

We’ll kick off the series with an introduction to this certification exam and the topics you need to cover to earn a verified digital badge proving your certification. A few facts about this certification are as follows.

Cost: $250 (enough motivation to take this exam seriously)

Duration: 2 hours

Passing Criteria: Solve either 5 out of 7 or 7 out of 10 questions successfully.
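
Both criteria set roughly the same bar: about 70% of the questions. Here is a quick check in Python, assuming each question is simply marked as solved or not solved:

```python
# Passing criteria quoted above: (questions you must solve, questions on the exam)
criteria = [(5, 7), (7, 10)]

def pass_ratio(needed: int, total: int) -> float:
    """Fraction of the exam's questions that must be solved to pass."""
    return needed / total

for needed, total in criteria:
    # Prints "5 of 7 = 71.4%" and then "7 of 10 = 70.0%"
    print(f"{needed} of {total} = {pass_ratio(needed, total):.1%}")
```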

Hadoop Sections to study: HDFS, Sqoop, Flume, Hive and Pig

The HDPCD exam groups these topics into three major categories: one covering Sqoop and Flume, one dedicated to Apache Pig, and one dedicated to Apache Hive.

Each of these categories contains specific tasks that you should be familiar with to clear the certification with flying colors.

The tasks in each category are as follows.

The first category covers Sqoop and Flume and contains the following six tasks.

  1. SQOOP Import.
  2. Free form SQOOP import.
  3. Importing data into Hive using SQOOP.
  4. SQOOP Export.
  5. FLUME Agent.
  6. FLUME Memory Channel.
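
The sketch below uses Python to assemble the command lines and the agent configuration behind these six tasks. It only builds them and does not run anything. The Sqoop options (`--connect`, `-P`, `--table`, `--target-dir`, `--num-mappers`, `--query`, `--split-by`, `--hive-import`, `--hive-table`, `--export-dir`, `--input-fields-terminated-by`) and the Flume property keys are real. The function names, JDBC URL, table names and HDFS paths are hypothetical placeholders. Check the Sqoop and Flume user guides for your HDP version before relying on any of it.

```python
import shlex

# Hypothetical connection details; replace with your own database and user.
JDBC_URL = "jdbc:mysql://localhost/retail_db"
DB_USER = "root"

def _base(tool):
    # "-P" makes Sqoop prompt for the password instead of taking it on the command line.
    return ["sqoop", tool, "--connect", JDBC_URL, "--username", DB_USER, "-P"]

def sqoop_import(table, target_dir, mappers=1):
    """Task 1: copy a whole table into an HDFS directory."""
    return _base("import") + ["--table", table, "--target-dir", target_dir,
                              "--num-mappers", str(mappers)]

def sqoop_free_form_import(query, split_by, target_dir, mappers=2):
    """Task 2: import the result of an arbitrary SELECT."""
    # Sqoop requires the literal $CONDITIONS token in the WHERE clause; each
    # mapper replaces it with its own range predicate on the --split-by column.
    if "$CONDITIONS" not in query:
        raise ValueError("free-form query must contain $CONDITIONS")
    return _base("import") + ["--query", query, "--split-by", split_by,
                              "--target-dir", target_dir, "--num-mappers", str(mappers)]

def sqoop_hive_import(table, hive_table):
    """Task 3: import a table and load it into a Hive table."""
    return _base("import") + ["--table", table, "--hive-import", "--hive-table", hive_table]

def sqoop_export(table, export_dir, delimiter=","):
    """Task 4: push delimited files from HDFS back into an existing RDBMS table."""
    return _base("export") + ["--table", table, "--export-dir", export_dir,
                              "--input-fields-terminated-by", delimiter]

def flume_memory_channel_agent(agent="a1", port=44444):
    """Tasks 5 and 6: a netcat source -> memory channel -> logger sink agent."""
    return "\n".join([
        f"{agent}.sources = r1",
        f"{agent}.channels = c1",
        f"{agent}.sinks = k1",
        f"{agent}.sources.r1.type = netcat",
        f"{agent}.sources.r1.bind = localhost",
        f"{agent}.sources.r1.port = {port}",
        f"{agent}.channels.c1.type = memory",
        f"{agent}.channels.c1.capacity = 1000",
        f"{agent}.channels.c1.transactionCapacity = 100",
        f"{agent}.sinks.k1.type = logger",
        f"{agent}.sources.r1.channels = c1",  # a source may feed several channels
        f"{agent}.sinks.k1.channel = c1",     # a sink drains exactly one channel
    ])

if __name__ == "__main__":
    print(shlex.join(sqoop_import("orders", "/user/hdpcd/orders")))
    print(shlex.join(sqoop_free_form_import(
        "SELECT * FROM orders WHERE $CONDITIONS", "order_id", "/user/hdpcd/orders_q")))
    print(flume_memory_channel_agent())
```

`shlex.join` wraps the free-form query in single quotes. This matters in a real shell: inside double quotes, the shell would expand `$CONDITIONS` before Sqoop ever saw it. The Flume text is meant to be saved to a file and started with `flume-ng agent --conf-file <file> --name a1`.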


The second category is focused entirely on Apache Pig, so all of the tasks below relate to Apache Pig.

  1. Write and Execute a PIG Script.
  2. Load data into a PIG relation without a schema.
  3. Load data into a PIG relation with a schema.
  4. Load data from a hive table into a PIG relation.
  5. Use PIG to transform data into specified format.
  6. Transform PIG data to match a given hive schema.
  7. Group the data of one or more PIG relations.
  8. Use PIG to remove records with NULL values from a relation.
  9. Store the data from a PIG relation into a folder in HDFS.
  10. Store the data from a PIG relation into a hive table.
  11. Sort the output of a PIG relation.
  12. Remove the duplicate tuples of a PIG relation.
  13. Specify the number of reduce tasks for a PIG MapReduce Job.
  14. Join two datasets using PIG.
  15. Perform a replicated join using PIG.
  16. Run a PIG job using Tez.
  17. Within a PIG script, register a JAR file of UDFs.
  18. Within a PIG script, define an alias for a UDF.
  19. Within a PIG script, invoke a UDF.
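
To make a few of these Pig operators concrete, here is a plain-Python sketch of what they do to a relation. It assumes a relation is just a list of tuples and that `None` stands in for Pig's NULL. Each function mirrors one Pig Latin statement (`FILTER ... BY ... IS NOT NULL`, `DISTINCT`, `ORDER ... BY`, `GROUP ... BY`, `JOIN ... USING 'replicated'`) in spirit only. This is not how Pig executes them on MapReduce or Tez, and the function names and sample data are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: orders(order_id, customer_id, amount); None plays Pig's NULL.
orders = [(1, 10, 25.0), (2, 11, None), (3, 10, 40.0), (3, 10, 40.0), (4, 12, 15.0)]
# A small lookup relation: customers(customer_id, name)
customers = [(10, "alice"), (11, "bob"), (12, "carol")]

def filter_not_null(rel, field):
    """Like: FILTER rel BY field IS NOT NULL;"""
    return [t for t in rel if t[field] is not None]

def distinct(rel):
    """Like: DISTINCT rel; (this sketch keeps first-seen order, Pig promises no order)"""
    return list(dict.fromkeys(rel))

def order_by(rel, field, desc=False):
    """Like: ORDER rel BY field [DESC];"""
    return sorted(rel, key=lambda t: t[field], reverse=desc)

def group_by(rel, field):
    """Like: GROUP rel BY field; returns a list of (group key, bag of matching tuples)."""
    groups = defaultdict(list)
    for t in rel:
        groups[t[field]].append(t)
    return list(groups.items())

def replicated_join(big, big_field, small, small_field):
    """Like: JOIN big BY f, small BY g USING 'replicated';
    The small relation is held in an in-memory hash table and the big relation
    is streamed past it, which is why no reduce phase is needed."""
    lookup = defaultdict(list)
    for s in small:
        lookup[s[small_field]].append(s)
    return [b + s for b in big for s in lookup.get(b[big_field], [])]

clean = distinct(filter_not_null(orders, 2))
print(clean)                          # [(1, 10, 25.0), (3, 10, 40.0), (4, 12, 15.0)]
print(order_by(clean, 2, desc=True))  # [(3, 10, 40.0), (1, 10, 25.0), (4, 12, 15.0)]
print(replicated_join(clean, 1, customers, 0))
# [(1, 10, 25.0, 10, 'alice'), (3, 10, 40.0, 10, 'alice'), (4, 12, 15.0, 12, 'carol')]
```

The replicated join is worth understanding for the exam. Pig loads every relation except the first into memory, so list the large relation first and make sure the others are small enough to fit.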


Just as the second category is dedicated to Apache Pig, the third category is entirely dedicated to Apache Hive. It contains the following tasks.

  1. Write and execute a HIVE query.
  2. Define a HIVE-managed table.
  3. Define a HIVE external table.
  4. Define a partitioned HIVE table.
  5. Define a bucketed HIVE table.
  6. Define a HIVE table from a select query.
  7. Define a HIVE table that uses ORCFile format.
  8. Create a new ORCFile table from the existing data in a non-ORCFile table in HIVE.
  9. Specify the storage format of a HIVE table.
  10. Specify the delimiter of a HIVE table.
  11. Load data into a HIVE table from a local directory.
  12. Load data into a HIVE table from an HDFS directory.
  13. Load data into a HIVE table as the result of a query.
  14. Load compressed data into a HIVE table.
  15. Update a row in a HIVE table.
  16. Delete a row in a HIVE table.
  17. Insert a row in a HIVE table.
  18. Join two HIVE tables.
  19. Run a HIVE query using Tez.
  20. Run a HIVE query using vectorization.
  21. Output the execution plan for a HIVE query.
  22. Use a subquery within a HIVE query.
  23. Output data from a HIVE query that is totally ordered across multiple reducers.
  24. Set a Hadoop or HIVE configuration property from within a Hive query.
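
Many of the table-definition tasks above (2 to 7, 9 and 10) come down to knowing which clauses go into `CREATE TABLE` and in what order. The Python sketch below only assembles HiveQL text. The helper names, table names, columns and HDFS path are hypothetical. The clause order (PARTITIONED BY, CLUSTERED BY ... INTO n BUCKETS, ROW FORMAT, STORED AS, LOCATION) and the two session properties follow Hive's documented syntax. To actually create the tables, run the generated statements in Hive, for example through Beeline.

```python
def _cols(pairs):
    return ", ".join(f"{name} {hive_type}" for name, hive_type in pairs)

def hive_create_table(name, columns, *, external=False, partitioned_by=(),
                      bucketed_by=(), num_buckets=0, delimiter=None,
                      stored_as="TEXTFILE", location=None):
    """Assemble a CREATE [EXTERNAL] TABLE statement in Hive's clause order."""
    # Hive rejects a partition column that is also declared as a regular column.
    overlap = {c for c, _ in columns} & {c for c, _ in partitioned_by}
    if overlap:
        raise ValueError(f"partition columns repeated as regular columns: {overlap}")
    stmt = [f"CREATE {'EXTERNAL ' if external else ''}TABLE {name} ({_cols(columns)})"]
    if partitioned_by:
        stmt.append(f"PARTITIONED BY ({_cols(partitioned_by)})")
    if bucketed_by:
        if num_buckets < 1:
            raise ValueError("a bucketed table needs num_buckets >= 1")
        stmt.append(f"CLUSTERED BY ({', '.join(bucketed_by)}) INTO {num_buckets} BUCKETS")
    if delimiter is not None:
        stmt.append(f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'")
    stmt.append(f"STORED AS {stored_as}")
    if location:
        stmt.append(f"LOCATION '{location}'")
    return "\n".join(stmt) + ";"

def hive_ctas(name, select_sql, stored_as="ORC"):
    """Tasks 6 and 8: define a table from a query, e.g. copy a text table into ORC."""
    return f"CREATE TABLE {name} STORED AS {stored_as} AS {select_sql};"

# Session settings for running on Tez (task 19) and with vectorization (task 20);
# task 24 uses the same SET mechanism for any Hadoop or Hive property.
SESSION_SETTINGS = [
    "SET hive.execution.engine=tez;",
    "SET hive.vectorized.execution.enabled=true;",
]

print(hive_create_table(
    "orders_ext", [("order_id", "INT"), ("amount", "DOUBLE")],
    external=True, partitioned_by=[("order_date", "STRING")],
    bucketed_by=["order_id"], num_buckets=4, delimiter=",",
    location="/user/hdpcd/orders"))
print(hive_ctas("orders_orc", "SELECT * FROM orders_ext"))
```

Passing `stored_as="ORC"` to `hive_create_table` covers task 7 directly. `hive_ctas` is one way to handle task 8; an `INSERT OVERWRITE TABLE ... SELECT` into a pre-created ORC table is another.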

These tasks may seem like a lot, but if you practice them regularly, they will hardly take up any of your time.

Each task will be covered in its own post on this blog, so by the end of the series there will be 51 posts in total: this introductory post, 49 task posts, and one concluding post.
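
The post count follows from the three task lists above. As a quick check:

```python
# Number of tasks listed in each of the three categories above
tasks_per_category = {"Sqoop and Flume": 6, "Pig": 19, "Hive": 24}

task_posts = sum(tasks_per_category.values())  # one post per task
total_posts = 1 + task_posts + 1               # introduction + task posts + conclusion
print(task_posts, total_posts)                 # 49 51
```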

I hope this series helps HDPCD certification aspirants.

The link for the certification is HDPCD Certification

The certification objectives are taken from HDPCD Objectives

Suggestions are welcome.

Thank you!


Published by milindjagre

I founded my blog www.milindjagre.co four years ago and currently work as a Data Scientist Analyst at the Ford Motor Company. I graduated from the University of Connecticut with a Master of Science in Business Analytics and Project Management. I am working hard and learning a lot of new things in the field of Data Science. I am a strong believer in constant, directional effort, with teamwork as the highest priority. Please reach out to me at milindjagre@gmail.com for further information. Cheers!
