Hello everyone, in this tutorial, we are going to see the 3rd objective in data ingestion category.
The objective is listed on Hortonworks website under data ingestion and looks like this.
In the previous post, we imported data into HDFS, here, we are going to import the data directly into Hive table.
So, let us begin.
|sqoop import \|
|–connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db \|
|–username retail_dba \|
|–password hadoop \|
The above command to import data into Hive is similar to the one we saw in the previous post, except the last two lines. We will see what last two lines mean.
Line 5: – -hive-import \
Explanation: This option indicates that the Sqoop Import operation is to import the data into hive table, not in HDFS directory. This is the line which enables data population in hive table from RDBMS data source.
Line 6: – -table customers
Explanation: – -table flag is followed by the hive table name. The data that is being imported should be put in this table name in hive. If the hive table does not exist, then it will be created automatically and if there is some already existing data in the hive table, then the newly imported data will be appended and not overwritten. You must make sure that the data which is imported follow the hive table column format correctly.
This hive import is executed by importing data in hive warehouse HDFS directory, so that hive table automatically gets loaded with imported data.
Now, let us see the MySQL table which is going to be imported.
Following screenshot might be helpful for this.
As can be seen from above screenshot, the table name is customers whereas the database name is retail_db. The customers table has 9 columns in it with datatypes like int and varchar. I am pointing this out now itself so that once we import the data into the hive table, it will be easier for us to compare the two tables.
After confirming with MySQL, it is time to check hive database. We will see whether the table already exists or not. For that, we will log into hive database server and run show tables command as follows.
hive> show tables;
The screenshot below shows the output of above command.
As you can see from above screenshot, currently, customers table does not exist in hive database.
Now, it is time to execute sqoop import command. The sqoop command is already shown above. Following screenshot shows the sqoop command execution.
Above sqoop command gives us the following output on the terminal window.
As you can see, the screenshot indicates that there is a total of 12435 records which were imported into Hive Table.
Now, let us check the same by logging into hive database.
As you can see from above screenshot, the newly created/imported customers table got listed in show tables; command output. It indicates that the table was created successfully by the sqoop import command.
Now, let us check the structure of the table, whether it contains 9 columns or not, and their respective datatypes.
As you can see, customers table contains 9 columns and their datatypes also match with the corresponding columns’ datatypes in MySQL table.
Now, the next step is to check for 10 sample records and see the import process executed smoothly without any glitches.
It looks like the records were imported correctly.
Now, it is time to check the total number of records in hive table and see whether it is equal to 12435 or not.
Wow, the total number of records in hive table exactly matches with that of MySQL table.
This indicates that our sqoop import command as a whole executed successfully without any issues/problems.
The last thing that I would like to show over here is that if the table already exists and it contains some records, then how sqoop import behaves.
Let us run the sqoop import command one more time and then check hive table. Following is the hive table status.
As you can see, the total number of records in a hive table doubled after running the same import command one more time. It is evident from this that sqoop will append the newly imported records to hive table and in the process, the total number of records in hive table will increase.
I believe, this finally completes our sqoop import into hive table tutorial.
I hope that all the screenshots and proper description will help you out in figuring out how it works.
Suggestions and edits are welcome.
Stay tuned for the further updates.
Kindly subscribe to my YouTube channel for vlogs and follow my blog for further updates.