Post 12 | HDPCD | Load data from Hive to Pig

Hello, everyone. Thanks for coming back! I hope the tutorials are inspiring you to take each task seriously and to perform each step with an understanding of why we are doing it.

In the last tutorial, we saw how to create a Pig relation with a defined schema. This tutorial is also about creating a Pig relation, but instead of loading data from a flat file, we will load it from an existing Hive table. Let us take a quick look at the steps involved in this operation.

The picture below gives you a clear idea of the steps we are going to follow to load data from Apache Hive into Apache Pig.

Load data from hive to pig

The first thing we are going to do is check the Hive table and its schema. Based on the schema, we will know what structure to expect in the imported Pig relation. We are going to log into the hive terminal and look at the schema of the products table, since that is the table whose data we want to import into the Pig relation.

We first log into the hive terminal to look for the products table (the Hive CLI is started with the hive command).


Once we are in the hive terminal, we can run the following command to get the list of tables.

show tables;

For your reference, the following screenshot shows the output of the commands above.

hive tables list

Once we confirm that the products table exists in the Hive database, let us look at its structure. For this, we can use the describe and select commands shown below.

describe products;

select * from products limit 10;

The following screenshot shows the output of these two commands.

table structure and data sample

Once we know the data structure and sample data, it is time to write a Pig script which will import this hive data into the pig relation.

The script file to load data from Apache Hive to Apache Pig is uploaded to my GitHub profile and it looks as follows.

--execute this file by using the -useHCatalog flag
--the flag "-useHCatalog" enables Pig to pick up the jars for HCatalog
--HCatalog is used for loading data from Apache Hive into Pig
--loading the data in the "products" hive table into a Pig Relation
hive_data = LOAD 'products' USING org.apache.hive.hcatalog.pig.HCatLoader();
--looking at the structure of the data
DESCRIBE hive_data;
--dumping the data on the terminal window
DUMP hive_data;


Now, let us go through each line to understand what is going on here.

hive_data = LOAD 'products' USING org.apache.hive.hcatalog.pig.HCatLoader();

EXPLANATION: This line is the core of the tutorial. It loads the data from the products table in Hive into a Pig relation called hive_data. The loading is done by a class whose fully qualified name is "org.apache.hive.hcatalog.pig.HCatLoader". Because only the table name is given, HCatLoader looks for products in Hive's default database; a table in another database would be referenced as 'dbname.tablename'.

This class lives in one of the jar files in the HCatalog directory, and Pig needs that jar on its classpath to execute the line above. That is the reason we run the post12.pig file with the -useHCatalog flag.

DESCRIBE hive_data;

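EXPLANATION: As you may know by now, the DESCRIBE command shows the column names and datatypes of the Pig relation hive_data. Notice that, unlike in the last tutorial, we did not declare a schema with an AS clause; HCatLoader takes the column names and types from the Hive metastore.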
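Because the schema comes from Hive, you can predict what DESCRIBE will print before running the script. Below is a minimal Python sketch of that prediction, assuming the Hive-to-Pig type mapping documented for HCatLoader in Hive 0.13+ / Pig 0.12+ (older releases map or support some types differently). The function names pig_type and predict_describe and the sample columns are hypothetical, for illustration only.

```python
# Minimal sketch: predict the schema Pig's DESCRIBE should report for a relation
# loaded with HCatLoader, given (column, Hive type) pairs from Hive's describe.
# ASSUMPTION: the type table follows the HCatLoader mapping documented for
# Hive 0.13+ / Pig 0.12+; older releases map some types (e.g. tinyint, date,
# decimal) differently or do not support them. Function names are hypothetical.
from __future__ import annotations

# Hive primitive type -> Pig type produced by HCatLoader.
SCALAR_TYPES = {
    "boolean": "boolean",
    "tinyint": "int",
    "smallint": "int",
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
    "decimal": "bigdecimal",
    "string": "chararray",
    "varchar": "chararray",
    "char": "chararray",
    "binary": "bytearray",
    "timestamp": "datetime",
    "date": "datetime",
}

# Hive complex types -> Pig complex types (element schemas are not expanded here).
COMPLEX_TYPES = {"array": "bag", "map": "map", "struct": "tuple"}


def pig_type(hive_type: str) -> str:
    """Return the Pig type HCatLoader is expected to produce for a Hive type."""
    # Drop type parameters such as decimal(10,2), varchar(20) or array<string>.
    base = hive_type.strip().lower().split("(")[0].split("<")[0].strip()
    if base in SCALAR_TYPES:
        return SCALAR_TYPES[base]
    if base in COMPLEX_TYPES:
        return COMPLEX_TYPES[base]
    raise ValueError(f"no known Pig mapping for Hive type {hive_type!r}")


def predict_describe(alias: str, columns: list[tuple[str, str]]) -> str:
    """Format the expected schema in the 'alias: {name: type,...}' layout of Pig's DESCRIBE."""
    fields = ",".join(f"{name}: {pig_type(hive_type)}" for name, hive_type in columns)
    return f"{alias}: {{{fields}}}"


if __name__ == "__main__":
    # Hypothetical columns, for illustration only; use the output of
    # `describe products;` for the real table.
    sample = [("product_id", "int"), ("product_name", "string"), ("product_price", "float")]
    print(predict_describe("hive_data", sample))
```

For the hypothetical sample columns this prints hive_data: {product_id: int,product_name: chararray,product_price: float}, which you can compare against what DESCRIBE hive_data; reports for the real products table.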

DUMP hive_data;

EXPLANATION: The DUMP command prints the contents of the hive_data Pig relation to the terminal. It is not required by the objective, but we run it to confirm that the Hive table data was loaded into the Pig relation successfully.

We use the vi editor to create this file in the terminal window. Once the contents of the Pig script are in place, we run the cat command to verify that the file was created correctly. The following screenshot shows this.

creating pig script file

Once the Pig script is ready, we can run it. Let us first see what happens if we run this script with the plain pig command, without any extra flags.

Error in Pig script

As you can see from the above screenshot, if you don't use the -useHCatalog flag with the pig command, the command fails with the error "Could not resolve org.apache.hive.hcatalog.pig.HCatLoader using imports". This error indicates that Pig could not find the jar files required for the HCatalog functionality.
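If you run Pig scripts from a wrapper, this failure can be recognized from Pig's error output. The following is a minimal Python sketch, assuming the message wording matches the error shown in this post and that it appears in the captured stderr; other Pig versions may word it differently. The function name missing_hcatalog_jars is hypothetical.

```python
# Minimal sketch: detect the "missing HCatalog jars" failure in Pig's error
# output, e.g. so a wrapper script can suggest re-running with -useHCatalog.
# ASSUMPTION: the wording matches the error shown in this post and it appears
# in the captured stderr. The function name is hypothetical.
import re

# Matches "Could not resolve org.apache.hive.hcatalog.pig.HCatLoader using imports",
# the same message without a package prefix, and the HCatStorer variant.
_UNRESOLVED_HCAT = re.compile(
    r"Could not resolve\s+(?:\S*\.)?HCat(?:Loader|Storer)\s+using imports"
)


def missing_hcatalog_jars(pig_stderr: str) -> bool:
    """Return True if Pig reported that an HCatalog class could not be resolved."""
    return _UNRESOLVED_HCAT.search(pig_stderr) is not None
```

The optional package group lets the same check cover scripts that reference the class by its short name instead of the fully qualified one.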

To resolve this issue, we run the same pig command with the -useHCatalog flag. With this flag, Pig picks up the jar files required by HCatalog to import the Hive data into the Pig relation. For your reference, the following is the correct command used for this tutorial.

pig -useHCatalog -f post12.pig
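The same command can also be launched from a script. Below is a minimal Python sketch, assuming the pig launcher is on PATH and post12.pig sits in the current directory; the helper names build_pig_command and run_pig_script are hypothetical.

```python
# Minimal sketch: run post12.pig from Python with HCatalog support enabled.
# ASSUMPTIONS: the `pig` launcher is on PATH and the script path is relative to
# the current directory; the helper names below are hypothetical.
from __future__ import annotations

import subprocess


def build_pig_command(script_path: str, use_hcatalog: bool = True) -> list[str]:
    """Build the argument list for `pig [-useHCatalog] -f <script>`."""
    cmd = ["pig"]
    if use_hcatalog:
        # Tells the pig launcher to add the HCatalog and Hive jars (which contain
        # HCatLoader) to the classpath.
        cmd.append("-useHCatalog")
    cmd += ["-f", script_path]
    return cmd


def run_pig_script(script_path: str, use_hcatalog: bool = True) -> subprocess.CompletedProcess:
    """Run the script, capturing stdout/stderr as text; a non-zero returncode means it failed."""
    return subprocess.run(
        build_pig_command(script_path, use_hcatalog),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect stderr instead of raising
    )
```

For example, calling result = run_pig_script("post12.pig") and printing result.stdout should show the DUMP output when result.returncode is 0, while passing use_hcatalog=False reproduces the failing run from earlier. Keeping the command builder separate from the subprocess call makes the flag handling easy to check without a Pig installation.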

The following screenshot shows that the file ran successfully and we got to see the output as well.

pig run script

The following is the output of the DUMP command.

dump pig relation

And the structure of the pig relation looks as follows.

describe pig relation

The above screenshots show that we got the expected output.

I hope these tutorials are helping you understand the requirements for clearing the certification. In the next tutorial, we are going to see how to format data into a specified format using a Pig relation.

Please stay tuned for further updates.

You can click here to subscribe to my YouTube channel. Please like my Facebook page here and follow me on Twitter here.

Thanks for having a read.


Published by milindjagre

I founded my blog four years ago and am currently working as a Data Scientist Analyst at the Ford Motor Company. I graduated from the University of Connecticut with a Master of Science in Business Analytics and Project Management. I am working hard and learning a lot of new things in the field of Data Science. I am a strong believer in constant and directional effort, keeping teamwork at the highest priority. Please reach out to me at for further information. Cheers!
