We can load CSV data into hive table with the help of CSV SERDE JAR FILE which is freely available. You can download it manually by clicking below text.
Here, we are trying to load two types of CSV data in hive table.
First type of data contains header i.e. first line in the file is header information and Second type of CSV file contains only data and no header information is given.
Following script will load CSV data containing header as first line in hive table called csvtohive
# Assuming we are using hduser for doing Hadoop Operations | |
cd /home/hduser | |
# Loading .bashrc file in order to access HIVE Installation Directory Path | |
source .bashrc | |
# Removing already existing CSV SERDE JAR FILE | |
rm csv-serde-1.1.2-0.11.0-all.jar | |
# Downloading CSV SERDE JAR FILE during run time | |
wget https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar | |
# Copying Downloaded CSV SERDE FILE to HIVE lib Directory | |
cp csv-serde-1.1.2-0.11.0-all.jar $HIVE_HOME/lib | |
# Creating Input Data File | |
echo "column1,column2" > input.txt | |
echo "1,this is first line" >> input.txt | |
echo "2,this is second line" >> input.txt | |
# Creating Hive Table by using CSV SERDE and skipping the first line since it is an header. At last printing records in hive table. | |
hive -e "drop table csvtohive;create table csvtohive(column1 string,column2 string) row format serde 'com.bizo.hive.serde.csv.CSVSerde' stored as textfile tblproperties ('skip.header.line.count'='1');LOAD DATA LOCAL INPATH '/home/hduser/input.txt' INTO TABLE csvtohive;select * from csvtohive;" |
After this,
Below script will load CSV data (without containing header) in hive table called csvtohive.
# Assuming we are using hduser for doing Hadoop Operations | |
cd /home/hduser | |
# Loading .bashrc file in order to access HIVE Installation Directory Path | |
source .bashrc | |
# Removing already existing CSV SERDE JAR FILE | |
rm csv-serde-1.1.2-0.11.0-all.jar | |
# Downloading CSV SERDE JAR FILE during run time | |
wget https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar | |
# Copying Downloaded CSV SERDE FILE to HIVE lib Directory | |
cp csv-serde-1.1.2-0.11.0-all.jar $HIVE_HOME/lib | |
# Creating Input Data File | |
echo "1,this is first line" > input.txt | |
echo "2,this is second line" >> input.txt | |
# Creating Hive Table by using CSV SERDE. At last printing records in hive table. | |
hive -e "drop table csvtohive;create table csvtohive(column1 string,column2 string) row format serde 'com.bizo.hive.serde.csv.CSVSerde' stored as textfile;LOAD DATA LOCAL INPATH '/home/hduser/input.txt' INTO TABLE csvtohive;select * from csvtohive;" |
Thanks for having a read.
I hope the explanation makes the scripts easier to understand.
One thought on “Load CSV File in Hive Table”