Hello, everyone. Thanks for returning for the next tutorial in the HDPCD certification series. In the last tutorial, we saw how to load data into a Hive table from a SELECT query. In this tutorial, we are going to see how to load compressed data into a Hive table.
Let us begin then.

The above infographic shows the step-by-step process to perform this activity.
We will perform these tasks as shown below.
- CREATING INPUT CSV FILE IN THE LOCAL FILE SYSTEM
We use the vi editor for creating the input CSV file in the local file system.
I have uploaded this input CSV file to my GitHub profile under the HDPCD repository with the name "54_input_file_for_compressed_data.csv". You can download this file by clicking here and it looks as follows.
1,tina,F
2,jerry,M
3,tom,M
4,wonder woman,F
5,scoobydoo,M
6,donald duck,M
7,pink panther,F
8,oggy,M
9,shinchan,M
The following commands are used for creating this input CSV file.
vi post41.csv
################################
PASTE THE COPIED CONTENTS HERE
################################
cat post41.csv
The output of the above commands looks as follows.

The above screenshot shows that the file post41.csv was created successfully and it is present in the local file system.
Let us compress this file.
- COMPRESSING THE INPUT CSV FILE
We use the “gzip” command for compressing the input CSV file. Once compressed, this file will be used for loading into the Hive table.
We use the following commands to check, create, and confirm the existence of the compressed input file.
ls
gzip post41.csv
ls
The output of the above commands looks as follows.

The above screenshot confirms that the compressed file was successfully created.
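The compression step can be sketched end to end in the shell. This is a minimal, self-contained sketch: the three sample rows are taken from the tutorial's input file, and the `zcat` check at the end is my addition to show that gzip compression is transparent and line-oriented, which is exactly why Hive can read the file later.

```shell
# Create a small sample of the input CSV (first three rows from the tutorial).
printf '1,tina,F\n2,jerry,M\n3,tom,M\n' > post41.csv

# Compress it; gzip replaces post41.csv with post41.csv.gz.
gzip post41.csv

# Confirm the compressed file exists.
ls -l post41.csv.gz

# gzip is transparent, line-oriented compression: zcat restores the rows.
zcat post41.csv.gz
```

Note that plain `gzip` deletes the original file after compressing it, which is why the second `ls` in the tutorial shows only post41.csv.gz.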
Let us check whether table post41 exists or not.
- CHECKING THE PRE-EXISTENCE OF THE HIVE TABLE POST41
We use the "show tables;" command to get the list of tables present in the "default" database of Hive.
show tables;
The output of the above command is as follows.

As you can see from the above screenshot, the list of tables does not contain a table named post41. This means we can go ahead and create the Hive table with the name post41.
- CREATING HIVE TABLE WITH NAME POST41
We use the "CREATE TABLE" statement to create the Hive table into which we will load the data from the local file system.
I have uploaded this SQL file to my GitHub profile under the HDPCD repository with the name "55_hive_table_for_compressed_data.sql". You can download this file by clicking here and it looks as follows.
create table post41 (
id int,
name string,
gender string
)
row format delimited
fields terminated by ','
stored as textfile;
The execution of the above command looks as follows.

The above screenshot shows that we get an "OK" message after executing this CREATE statement. This indicates that the table post41 was created successfully.
Let us confirm the existence of the Hive table post41.
- CONFIRMING THE EXISTENCE AND SCHEMA OF THE HIVE TABLE POST41
We use the same "show tables;" command to check the existence of the Hive table.
show tables;
The output of the above command is as follows.

As you can see, the Hive table post41 was created successfully. This allows us to continue with the further check of schema confirmation.
We use the "desc formatted post41;" command to check the schema of the Hive table post41.
desc formatted post41;
The output of the above command is as follows.

As you can see from the above screenshot, the table post41 contains three columns with the expected names and data types. The input format of the Hive table post41 is also as expected: TextInputFormat.
This confirms that the Hive table post41 got created successfully with correct schema.
It is time to check how many records exist in the table post41 before actually loading the data.
- RECORDS BEFORE DATA LOADING FROM COMPRESSED FILE
We use the “SELECT” command to get the records stored in the Hive table.
select * from post41;
The output of the above command is as follows.

As you can see, we got zero records after executing the above command. This confirms that the Hive table post41 is empty before the data load.
From this, we can conclude that after loading the data from the post41.csv.gz file, the Hive table post41 should have 9 records with 3 columns.
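One way to verify that expectation before touching Hive is to inspect the compressed file directly. This is a hedged sketch of my own, not a step in the tutorial; `zcat` streams the decompressed contents without modifying post41.csv.gz.

```shell
# Count records inside the gzip file without extracting it to disk.
zcat post41.csv.gz | wc -l

# Count fields in the first record; the delimiter is a comma, matching the
# table's "fields terminated by ','" clause.
zcat post41.csv.gz | head -1 | awk -F',' '{print NF}'
```

For the tutorial's input file, the first command should report 9 and the second should report 3, matching what we expect to see in the table after loading.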
Let us load the data into the Hive table post41 from the compressed file.
- LOADING DATA INTO HIVE TABLE FROM COMPRESSED FILE
We use the “LOAD” command to load the data from the compressed file into the Hive table. The Hive command is as follows.
load data local inpath 'post41.csv.gz' into table post41;
The output of the above command is as follows.

The above screenshot shows that the input compressed file post41.csv.gz was successfully loaded into the Hive table post41. The screenshot also shows statistics such as the number of files and the size of the input file(s).
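It is worth noting why this works at all: LOAD DATA copies the .gz file as-is into the table's warehouse directory, and because the .gz extension maps to Hadoop's gzip codec, TextInputFormat decompresses the file transparently at query time. A quick sanity check (my own addition, not one of the tutorial's steps) would be:

```sql
-- The row count should match the 9 lines of the original post41.csv;
-- Hive decompresses the .gz file on the fly while scanning it.
select count(*) from post41;
```

One caveat worth remembering: gzip files are not splittable, so each .gz file is processed by a single mapper, which is fine for small files like this one.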
This enables us to look for the records stored in the Hive table post41.
- CHECKING THE RECORDS IN HIVE TABLE POST41
We use the same "SELECT" command to check the records that got stored in the Hive table.
select * from post41;
The output of the above command is as follows.

The above screenshot shows that there are a total of 9 records with 3 columns in the Hive table post41. This matches our expectation and corresponds exactly to the contents of the input compressed file post41.csv.gz.
We can conclude this tutorial here.
I hope you like the content. Keep sharing it with others.
In the next tutorial, we are going to see how to update a row in a Hive table.
You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.