Post 44 | HDPCD | Insert a row in the Hive table

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series.

In the last tutorial, we saw how to delete a row in a Hive table.

In this tutorial, we are going to see how to insert a row in the Hive table.

It is quite interesting to see that Hive supports ACID operations now, though the data is stored in HDFS. This is what excites me.

Let us begin, then.

Apache Hive: Insert a new row in Hive table
Apache Hive: Insert a new row in Hive table

The above info-graphics show the step by step process of implementing the objective of this tutorial.

We will perform these steps in the following way.

  • CHECK EXISTENCE OF THE HIVE TABLE

We use the “show tables;” command to check the existence of the Hive table.

show tables;

The output of the above command looks as follows.

Step 1: Checking the pre-existence of hive table post44
Step 1: Checking the pre-existence of hive table post44

As you can see, the Hive table post44 does not exist in the “default” database. This confirms that we can go ahead and create the Hive table with name “post44”.

  • CREATE HIVE TABLE WITH TRANSACTIONAL PROPERTY = TRUE

We use the “CREATE” command to create the Hive table. Here, in this tutorial, we are looking to delete the records stored in the Hive table. Therefore, we have to take an extra measure of setting a table property to make this Hive table as a transactional table. This Hive table is created with the help of the following command.

create table post44 (
id int,
name string,
gender string
)
clustered by (id) into 4 buckets
row format delimited
fields terminated by ‘,’
stored as orc
tblproperties(‘transactional’ = ‘true’);

As you can see from the above code snippet, we have taken the following extra measure while creating this transactional Hive table.

  • The table is bucketed into 4 buckets with “id” as the column
  • The table is made to store the data in ORC file format
  • The table property transactional is set to TRUE to make it a transactional table

If you miss any of the above properties, you won’t be able to insert a new row in the Hive table.

The output of the above command looks as follows.

Step 2: Creating hive table post44
Step 2: Creating hive table post44

As you can see, the “OK” message indicates that the Hive table was created successfully.

Now, we can confirm the existence of the Hive table post44.

  • CONFIRM EXISTENCE AND SCHEMA OF THE HIVE TABLE

We use the same “show tables;” command to confirm the existence of the Hive table.

show tables;

The output of the above command looks as follows.

Step 3: Confirming the existence of hive table post44
Step 3: Confirming the existence of hive table post44

As you can see from the above screenshot, the Hive table post44 was created successfully. We will go one step further and check the schema and other properties of the Hive table post44.

We use the following command for doing this.

desc formatted post44;

The output of the above command looks as follows.

Step 4: Confirming the schema of hive table post44
Step 4: Confirming the schema of hive table post44

As you can see from the above screenshot, the table contains three columns as defined. It has the ORCInputFormat as the file input format and the transactional property is set to TRUE.

As all these properties are satisfied, it is time to populate the Hive table post44 with some sample records.

  • POPULATE HIVE TABLE WITH SAMPLE RECORDS

We use the “INSERT” command to insert the records into the Hive transactional table.

insert into post44 select * from post41;

As you can see from the above command, we are going to load the records from the Hive table post41 into the Hive table post44.

The output of the above command looks like this.

Step 5: Inserting records into table post44
Step 5: Inserting records into table post44

As you can see from the above screenshot, the Hive command triggers a MapReduce job to load the data into the Hive table post44.

In the end, the “OK” message is shown which indicates that the records got inserted successfully into the Hive table post44. Apart from this, a basic stats is shown of the data size that got loaded into the Hive table post44.

This enables us to confirm the existence of the inserted records.

  • CONFIRM RECORDS GOT INSERTED SUCCESSFULLY

We use the “SELECT” command to check the records in the Hive table post44.

select * from post44;

The output of the above command looks as follows.

Step 6: Confirming records got inserted into table post44 successfully
Step 6: Confirming records got inserted into table post44 successfully

As can be seen from the above screenshot, a total of 9 records were successfully inserted into the Hive table post44.

Now, let us add one new row in the Hive table post44.

  • INSERT A NEW ROW IN THE HIVE TABLE

We use the “INSERT” command to update the records stored in the Hive table.

insert into POST44 values (10,’Rahul’,’M’);

In the above command, we are trying to add a new row in the Hive table with id=10 and name = “Rahul” and gender = “M”.

Therefore, to set the expectations right, once we execute the above command, the record with id=10 should be an additional record in the Hive table post44 and we should have a total of 10 records in the Hive table post44.

Let us see how the output of the above command looks like.

Step 7: Performing the INSERT operation on hive table post44
Step 7: Performing the INSERT operation on hive table post44

From the screenshot above, you can see that the “INSERT” command triggers a MapReduce operation. The “OK” message shows that this operation was successful.

The last thing that we would like to do is to check the records again to see the changes reflected or not.

  • CONFIRM THAT RECORDS GOT INSERTED SUCCESSFULLY

We use the “SELECT” command to check the updated records in the Hive table post44.

select * from post44;

The output of the above command is as follows.

Step 8: Confirming the INSERT command execution output
Step 8: Confirming the INSERT command execution output

A total of 10 records can be seen from the above screenshot. Please pay special attention to the record with id=10. As you can see, the record with id=10 is inserted into the table after executing the “INSERT” command.

This meets our expectations and therefore, we can conclude this tutorial here.

Hope you guys like it.

Please stay tuned for more updates.

In the next tutorial, we are going to see how to join two Hive tables.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

 

1 thought on “Post 44 | HDPCD | Insert a row in the Hive table

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: