Post 42 | HDPCD | Update a row in a Hive table

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series.

In the last tutorial, we saw how to load the compressed data into a Hive table.

In this tutorial, we are going to see how to update a row in the Hive table.

It is quite interesting to see that Hive supports ACID operations now, though the data is stored in HDFS. This is what excites me.

Let us begin, then.

Apache Hive: Update a row in Hive table
Apache Hive: Update a row in Hive table

The above info-graphics show the step by step process of implementing the objective of this tutorial.

We will perform these steps in the following way.

  • CHECK EXISTENCE OF THE HIVE TABLE

We use the “show tables;” command to check the existence of the Hive table.

show tables;

The output of the above command looks as follows.

Step 1: Checking the pre-existence of hive table post42
Step 1: Checking the pre-existence of hive table post42

As you can see, the Hive table post42 does not exist in the “default” database. This confirms that we can go ahead and create the Hive table with name “post42”.

  • CREATE HIVE TABLE WITH TRANSACTIONAL PROPERTY = TRUE

We use the “CREATE” command to create the Hive table. Here, in this tutorial, we are looking to update the records stored in the Hive table. Therefore, we have to take an extra measure of setting a table property to make this Hive table as a transactional table. This Hive table is created with the help of the following command.

create table post42 (
id int,
name string,
gender string
)
clustered by (id) into 4 buckets
row format delimited
fields terminated by ‘,’
stored as orc
tblproperties(‘transactional’ = ‘true’);

As you can see from the above code snippet, we have taken the following extra measure while creating this transactional Hive table.

  • The table is bucketed into 4 buckets with “id” as the column
  • The table is made to store the data in ORC file format
  • The table property transactional is set to TRUE to make it a transactional table

If you miss any of the above properties, you won’t be able to update a row in the Hive table.

The output of the above command looks as follows.

Step 2: Creating hive table post42
Step 2: Creating hive table post42

As you can see, the “OK” message indicates that the Hive table was created successfully.

Now, we can confirm the existence of the Hive table post42.

  • CONFIRM EXISTENCE AND SCHEMA OF THE HIVE TABLE

We use the same “show tables;” command to confirm the existence of the Hive table.

show tables;

The output of the above command looks as follows.

Step 3: Confirming the existence of hive table post42
Step 3: Confirming the existence of hive table post42

As you can see from the above screenshot, the Hive table post42 was created successfully. We will go one step further and check the schema and other properties of the Hive table post42.

We use the following command for doing this.

desc formatted post42;

The output of the above command looks as follows.

Step 4: Confirming the schema of hive table post42
Step 4: Confirming the schema of hive table post42

As you can see from the above screenshot, the table contains three columns as defined. It has the ORCInputFormat as the file input format and the transactional property is set to TRUE.

As all these properties are satisfied, it is time to populate the Hive table post42 with some sample records.

  • POPULATE HIVE TABLE WITH SAMPLE RECORDS

We use the “INSERT” command to insert the records into the Hive transactional table.

insert into post42 select * from post41;

As you can see from the above command, we are going to load the records from the Hive table post41 into the Hive table post42.

The output of the above command looks like this.

Step 5: Inserting records into table post42
Step 5: Inserting records into table post42

As you can see from the above screenshot, the Hive command triggers a MapReduce job to load the data into the Hive table post42.

In the end, the “OK” message is shown which indicates that the records got inserted successfully into the Hive table post42. Apart from this, a basic stats is shown of the data size that got loaded into the Hive table post42.

This enables us to confirm the existence of the inserted records.

  • CONFIRM RECORDS GOT INSERTED SUCCESSFULLY

We use the “SELECT” command to check the records in the Hive table post42.

select * from post42;

The output of the above command looks as follows.

Step 6: Confirming records got inserted into table post42 successfully
Step 6: Confirming records got inserted into table post42 successfully

As can be seen from the above screenshot, a total of 9 records were successfully inserted into the Hive table post42. Please pay special attention to the record with id=2, because we are going to update that record.

Now, let us update the already inserted records in the Hive table.

  • UPDATE RECORDS IN THE HIVE TABLE USING WHERE CLAUSE

We use the “UPDATE” command to update the records stored in the Hive table.

update post42 set name=’Milind’ where id=2;

In the above command, we are trying to update the name of the student with id=2. The old name is “Jerry” and the new name is “Milind”.

Therefore, to set the expectations right, once we execute the above command, the record with id=2 should have name=”Milind”.

Let us see how the output of the above command looks like.

Step 7: Performing the UPDATE operation on hive table post42
Step 7: Performing the UPDATE operation on hive table post42

From the screenshot above, you can see that the “UPDATE” command triggers a MapReduce operation. The “OK” message shows that this operation was successful.

The last thing that we would like to do is to check the records again to see the changes reflected or not.

  • CONFIRM THAT RECORDS GOT UPDATED SUCCESSFULLY

We use the “SELECT” command to check the updated records in the Hive table post42.

select * from post42;

The output of the above command is as follows.

Step 8: Confirming the UPDATE command execution output
Step 8: Confirming the UPDATE command execution output

A total of 9 records can be seen from the above screenshot. Please pay special attention to the record with id=2. As you can see, the old name “Jerry” gets overwritten by the new name “Milind”.

This meets our expectations and therefore, we can conclude this tutorial here.

Hope you guys like it.

Please stay tuned for more updates.

In the next tutorial, we are going to see how to see how to delete a row in the Hive table.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

 

1 thought on “Post 42 | HDPCD | Update a row in a Hive table

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: