Hello, everyone. Welcome to the second post in the Data Analysis section of the HDPCD certification series.
In the last tutorial, we saw the three ways in which we run the hive commands. In this tutorial, we are going to create the hive-managed table i.e. hive internal table.
For creating a hive-managed or internal table, we are going to follow the process mentioned in the following infographics.
As you can see from the above picture, we are going to follow the four-step process for creating a hive-managed table. These four processes are as follows.
- Check whether the table already exists
- Create the table with defined schema
- Check whether table got created or not
- Check the schema of the table for verification
Let us start with each step one-by-one.
- CHECKING WHETHER THE TABLE ALREADY EXISTS
We should first check whether the table that we are going to create already exists or not. If it exists, then we will get an error saying so. If it doesn’t, then its good to move forward.
After logging into hive session with “hive” command, you can run “show tables;” command to check the list of the hive tables that already exists. Following is the command.
The execution of the above command looks something like this for my Hortonworks Sandbox.
As you can see from the above picture, there is a total of 30 tables already existing in hive’s “default” database. Since we are trying to create a table called post29, you can see that the table with that name does not exist, therefore we can move ahead to the next step of creating the hive-managed table post29.
- CREATING HIVE-MANAGED/INTERNAL TABLE
You can use the create table command in Hive to create the hive-managed table. The funny thing about this create command is that it resembles with SQL’s create table statement.
We are going to use the following schema for creating this hive internal table.
|create table post29(|
|row format delimited|
|fields terminated by ','|
|stored as textfile;|
Now, let me explain this command briefly.
create table post29(
The above statement defines the name of the table and the column names along with their respective data types. Here, the statement “create table” indicates that it is a hive-managed or internal table. There is a list of data types supported by hive which you can check at the hive documentation page.
The next statement is
row format delimited
This statement defined the formatting of each record. Here, we are telling Hive that each record corresponds to a ROW and therefore, we have mentioned it to be ROW format delimited.
fields terminated by ‘,’
Through this command, we are mentioning that the records that we are going to store in this table have columns separated by a comma (,).
stored as textfile;
This command tells Hive that the loaded data should be stored in the warehouse directory in the TEXTFILE format. I am going to explain this bit a little more in the future posts when I will talk about the process of loading the data in the hive table.
If the explanation looks fine to you, it is time now to execute this “create table” command. The screenshot depicting the command execution is as follows.
As you can see from the above screenshot, the hive internal table was created successfully and we got an “OK” message after the command execution.
Let us move on to the next step.
- CHECKING WHETHER HIVE TABLE SUCCESSFULLY CREATED OR NOT
You can run the same “show tables;” command to check whether the hive internal table was successfully created or not. The execution screenshot of this command looks as follows.
As can be seen from the screenshot, the post29 table was successfully created. Now, it is time to check the schema of the table post29.
- CHECKING SCHEMA OF THE NEWLY CREATED HIVE INTERNAL TABLE
The schema of the hive internal table can be checked with the help of either “desc” or “describe” command. The command for this case looks as follows.
You can use any of these commands to check the schema of the hive table post29.
The execution of the above commands looks as follows.
You can see from the above screenshot that the created schema matches with the defined schema while creating the hive internal table.
This completes the hive-managed/internal table creation.
Hope the screenshots and the commands help you in understanding the concepts.
Please visit my website www.milindjagre.com for further information.