Load XML File in Hive

We can load XML data into a Hive table just as easily as a simple delimited file.
The only difference between loading a delimited file and an XML file is that we have to use Hive's built-in xpath UDFs to extract the data residing within the tags.

All the steps I used are committed in the following GitHub Gist (xml_to_hive.txt).
The screenshots below show these commands being executed.

STEP 1 : CREATING THE INPUT XML FILE TO LOAD INTO THE HIVE TABLE
nano student.xml
<student> <id>1</id> <name>Milind</name> <age>25</age> </student>
<student> <id>2</id> <name>Ramesh</name> <age>Testing</age> </student>
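Hive's default text format turns each line of a file into one row, so every <student> record has to stay on a single line, exactly as typed above. If you prefer to generate the file from a script instead of typing it in nano, a minimal Python sketch like the following can write the same two records and check that every line parses as a standalone XML document. It assumes Python 3.9+ and writes to the current directory rather than /home/hduser; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Same two records as Step 1. Each one sits on ONE line because Hive's
# default text format treats every newline-terminated line as one row.
RECORDS = [
    "<student> <id>1</id> <name>Milind</name> <age>25</age> </student>",
    "<student> <id>2</id> <name>Ramesh</name> <age>Testing</age> </student>",
]


def write_one_record_per_line(path: Path, records: list[str]) -> None:
    """Write the records, refusing any record that would span several lines."""
    for rec in records:
        if "\n" in rec or "\r" in rec:
            raise ValueError(f"record spans multiple lines: {rec!r}")
    path.write_text("\n".join(records) + "\n", encoding="utf-8")


def check_file(path: Path) -> int:
    """Parse every non-empty line as its own XML document; return the count."""
    count = 0
    for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
        if not line.strip():
            continue
        try:
            root = ET.fromstring(line)
        except ET.ParseError as exc:
            raise ValueError(f"line {lineno} is not well-formed XML: {exc}") from None
        if root.tag != "student":
            raise ValueError(f"line {lineno}: expected <student>, got <{root.tag}>")
        count += 1
    return count


if __name__ == "__main__":
    target = Path("student.xml")  # assumption: current directory, not /home/hduser
    write_one_record_per_line(target, RECORDS)
    print(check_file(target), "records OK")
```

Failing early on a multi-line record is the point of the check: Hive itself would load such a file without complaint and only produce broken rows later.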
STEP 2 : LOG IN TO HIVE
hive
STEP 3 : CREATING THE HIVE TABLE
create table student_xml (studinfo string);
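Because the CREATE TABLE statement has no ROW FORMAT clause, Hive falls back to its default text layout: rows end at a newline and fields are separated by the Ctrl-A character (\001). With a single STRING column, the whole XML line (which contains no Ctrl-A) ends up in studinfo. The toy model below is a simplification, not Hive's actual SerDe code (it ignores \r line endings, escaping and NULL handling), but it shows why a pretty-printed, multi-line record would break into several useless rows.

```python
# Default layout for a CREATE TABLE without a ROW FORMAT clause:
# one row per '\n'-terminated line, fields separated by Ctrl-A ('\001').
FIELD_DELIM = "\x01"


def to_rows(raw: str) -> list[str]:
    """Toy model: the value studinfo would get for each line of a text file."""
    lines = raw.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final newline does not create an extra row
    # With a single column, only the first Ctrl-A separated field is kept.
    return [line.split(FIELD_DELIM)[0] for line in lines]


one_line = "<student> <id>1</id> <name>Milind</name> <age>25</age> </student>\n"
pretty = "<student>\n  <id>1</id>\n  <name>Milind</name>\n  <age>25</age>\n</student>\n"

print(len(to_rows(one_line)))  # 1: the whole record lands in studinfo
print(len(to_rows(pretty)))    # 5: the record is split into fragments
```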
STEP 4 : LOADING DATA INTO HIVE TABLE
load data local inpath '/home/hduser/student.xml' into table student_xml;
STEP 5 : QUERYING THE LOADED DATA
select * from student_xml;
STEP 6 : CREATING A VIEW ON TOP OF THE NEW HIVE TABLE SO THAT NEWLY ADDED RECORDS ARE ALSO PARSED
-- aliases give the view readable column names instead of the generated _c0, _c1, _c2
create view student_xml_view as
SELECT xpath_int(studinfo, 'student/id') AS id,
       xpath_string(studinfo, 'student/name') AS name,
       xpath_string(studinfo, 'student/age') AS age
FROM student_xml;
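The view does no parsing at load time; each query evaluates the xpath UDFs against every stored string. The sketch below imitates xpath_int and xpath_string on the two sample rows using Python's xml.etree.ElementTree. It is an approximation, not Hive's implementation: it supports only simple child paths like 'student/id', and it assumes the documented Hive behaviour that xpath_int returns 0 when the matched text is not numeric and xpath_string returns an empty string when nothing matches. That behaviour is also why age is read with xpath_string: the second record's "Testing" would come back as 0 from xpath_int.

```python
import xml.etree.ElementTree as ET


def _first_match(xml: str, path: str):
    """Return the first element matching a simple path such as 'student/id'.

    The first step must name the root element (as in Hive's 'student/id');
    the remaining steps are resolved with ElementTree's find().
    """
    root = ET.fromstring(xml)
    first, _, rest = path.strip("/").partition("/")
    if root.tag != first:
        return None
    return root.find(rest) if rest else root


def xpath_string(xml: str, path: str) -> str:
    # Approximation: text of the first matching node, "" when nothing matches.
    node = _first_match(xml, path)
    return "" if node is None else "".join(node.itertext())


def xpath_int(xml: str, path: str) -> int:
    # Approximation: 0 when nothing matches or the text is not a whole number.
    try:
        return int(xpath_string(xml, path))
    except ValueError:
        return 0


rows = [
    "<student> <id>1</id> <name>Milind</name> <age>25</age> </student>",
    "<student> <id>2</id> <name>Ramesh</name> <age>Testing</age> </student>",
]

view = [
    (xpath_int(r, "student/id"), xpath_string(r, "student/name"), xpath_string(r, "student/age"))
    for r in rows
]
for row in view:
    print(*row, sep="\t")
print(xpath_int(rows[1], "student/age"))  # 0, because "Testing" is not numeric
```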
STEP 7 : QUERYING THE CREATED VIEW
select * from student_xml_view;
STEP 8 : LOADING THE SAME FILE AGAIN TO CHECK THE VIEW'S BEHAVIOUR
load data local inpath '/home/hduser/student.xml' into table student_xml;
STEP 9 : QUERYING VIEW FOR INCREMENTAL RECORDS
select * from student_xml_view;
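Step 8 uses LOAD DATA ... INTO rather than OVERWRITE INTO, so Hive keeps the first copy of student.xml and adds the second one to the table. Because a view is only a stored query, Step 9 re-extracts the fields from all four stored lines. The toy model below captures just those two ideas; the ToyTable class and its methods are hypothetical illustrations, not a Hive API.

```python
import xml.etree.ElementTree as ET

STUDENT_XML = [
    "<student> <id>1</id> <name>Milind</name> <age>25</age> </student>",
    "<student> <id>2</id> <name>Ramesh</name> <age>Testing</age> </student>",
]


class ToyTable:
    """Hypothetical stand-in for a Hive table: only the files loaded into it."""

    def __init__(self) -> None:
        self.files: list[list[str]] = []

    def load_into(self, lines: list[str]) -> None:
        # Like LOAD DATA ... INTO TABLE: existing files stay, the new one is added.
        self.files.append(list(lines))

    def load_overwrite(self, lines: list[str]) -> None:
        # Like LOAD DATA ... OVERWRITE INTO TABLE: existing files are replaced.
        self.files = [list(lines)]

    def rows(self) -> list[str]:
        return [line for f in self.files for line in f]


def student_xml_view(table: ToyTable) -> list[tuple[int, str, str]]:
    """Like a non-materialized view: fields are re-extracted on every call."""
    result = []
    for line in table.rows():
        root = ET.fromstring(line)
        result.append((int(root.findtext("id")), root.findtext("name"), root.findtext("age")))
    return result


table = ToyTable()
table.load_into(STUDENT_XML)          # Step 4
print(len(student_xml_view(table)))   # 2
table.load_into(STUDENT_XML)          # Step 8: the same file again
print(len(student_xml_view(table)))   # 4: each record now appears twice
```

Running it prints 2 and then 4; switching the second call to load_overwrite would bring the count back to 2, which is what you would want if the goal were to refresh the data rather than append to it.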


[Screenshots: Loading XML into Hive; Process Screenshot; VIEW records; Incremental Records via VIEW]

Thank you for reading.
Please get back to me if you have any doubts or need further clarification.

Published by milindjagre

I founded my blog www.milindjagre.co four years ago and am currently working as a Data Scientist Analyst at the Ford Motor Company. I graduated from the University of Connecticut with a Master of Science in Business Analytics and Project Management. I am working hard and learning a lot of new things in the field of Data Science. I am a strong believer in constant, directional effort, with teamwork as the highest priority. Please reach out to me at milindjagre@gmail.com for further information. Cheers!
