Post 43 | HDPCD | Delete a row in a Hive table

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series.

In the last tutorial, we saw how to update a row in a Hive table.

In this tutorial, we are going to see how to delete a row in the Hive table.

It is quite interesting to see that Hive supports ACID operations now, though the data is stored in HDFS. This is what excites me.

Let us begin, then.

Apache Hive: Delete a row in Hive table
Apache Hive: Delete a row in Hive table

The above info-graphics show the step by step process of implementing the objective of this tutorial.

We will perform these steps in the following way.

  • CHECK EXISTENCE OF THE HIVE TABLE

We use the “show tables;” command to check the existence of the Hive table.

show tables;

The output of the above command looks as follows.

Step 1: Checking the pre-existence of hive table post43
Step 1: Checking the pre-existence of hive table post43

As you can see, the Hive table post43 does not exist in the “default” database. This confirms that we can go ahead and create the Hive table with name “post43”.

  • CREATE HIVE TABLE WITH TRANSACTIONAL PROPERTY = TRUE

We use the “CREATE” command to create the Hive table. Here, in this tutorial, we are looking to delete the records stored in the Hive table. Therefore, we have to take an extra measure of setting a table property to make this Hive table as a transactional table. This Hive table is created with the help of the following command.

create table post43 (
id int,
name string,
gender string
)
clustered by (id) into 4 buckets
row format delimited
fields terminated by ‘,’
stored as orc
tblproperties(‘transactional’ = ‘true’);

As you can see from the above code snippet, we have taken the following extra measure while creating this transactional Hive table.

  • The table is bucketed into 4 buckets with “id” as the column
  • The table is made to store the data in ORC file format
  • The table property transactional is set to TRUE to make it a transactional table

If you miss any of the above properties, you won’t be able to update a row in the Hive table.

The output of the above command looks as follows.

Step 2: Creating hive table post43
Step 2: Creating hive table post43

As you can see, the “OK” message indicates that the Hive table was created successfully.

Now, we can confirm the existence of the Hive table post43.

  • CONFIRM EXISTENCE AND SCHEMA OF THE HIVE TABLE

We use the same “show tables;” command to confirm the existence of the Hive table.

show tables;

The output of the above command looks as follows.

Step 3: Confirming the existence of hive table post43
Step 3: Confirming the existence of hive table post43

As you can see from the above screenshot, the Hive table post43 was created successfully. We will go one step further and check the schema and other properties of the Hive table post43.

We use the following command for doing this.

desc formatted post43;

The output of the above command looks as follows.

Step 4: Confirming the schema of hive table post43
Step 4: Confirming the schema of hive table post43

As you can see from the above screenshot, the table contains three columns as defined. It has the ORCInputFormat as the file input format and the transactional property is set to TRUE.

As all these properties are satisfied, it is time to populate the Hive table post43 with some sample records.

  • POPULATE HIVE TABLE WITH SAMPLE RECORDS

We use the “INSERT” command to insert the records into the Hive transactional table.

insert into post43 select * from post41;

As you can see from the above command, we are going to load the records from the Hive table post41 into the Hive table post43.

The output of the above command looks like this.

Step 5: Inserting records into table post43

As you can see from the above screenshot, the Hive command triggers a MapReduce job to load the data into the Hive table post43.

In the end, the “OK” message is shown which indicates that the records got inserted successfully into the Hive table post43. Apart from this, a basic stats is shown of the data size that got loaded into the Hive table post43.

This enables us to confirm the existence of the inserted records.

  • CONFIRM RECORDS GOT INSERTED SUCCESSFULLY

We use the “SELECT” command to check the records in the Hive table post43.

select * from post43;

The output of the above command looks as follows.

Step 6: Confirming records got inserted into table post43 successfully
Step 6: Confirming records got inserted into table post43 successfully

As can be seen from the above screenshot, a total of 9 records were successfully inserted into the Hive table post43. Please pay special attention to the record with id=3, because we are going to delete that record.

Now, let us update the already inserted records in the Hive table.

  • DELETE RECORDS IN THE HIVE TABLE USING WHERE CLAUSE

We use the “DELETE” command to update the records stored in the Hive table.

delete from post43 where id=3;

In the above command, we are trying to delete the record of the student with id=3.

Therefore, to set the expectations right, once we execute the above command, the record with id=3 should be gone from the existing Hive table post43 and we should have only 8 records in the Hive table post43.

Let us see how the output of the above command looks like.

Step 7: Performing the DELETE operation on hive table post43

From the screenshot above, you can see that the “DELETE” command triggers a MapReduce operation. The “OK” message shows that this operation was successful.

The last thing that we would like to do is to check the records again to see the changes reflected or not.

  • CONFIRM THAT RECORDS GOT DELETED SUCCESSFULLY

We use the “SELECT” command to check the updated records in the Hive table post43.

select * from post43;

The output of the above command is as follows.

Step 8: Confirming the DELETE command execution output
Step 8: Confirming the DELETE command execution output

A total of 8 records can be seen from the above screenshot. Please pay special attention to the record with id=3. As you can see, the record with id=3 is deleted from the table after executing the “DELETE” command.

This meets our expectations and therefore, we can conclude this tutorial here.

Hope you guys like it.

Please stay tuned for more updates.

In the next tutorial, we are going to see how to insert a new row in the Hive table.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

3 thoughts on “Post 43 | HDPCD | Delete a row in a Hive table

  1. Hi,
    Very useful example.
    Can you please update this example with post41 table structure as i am getting error while inserting data into orc table.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: