Post 28 | HDPCD | Write and Execute a Hive Query

Hello everyone, and welcome to the first tutorial of the DATA ANALYSIS section of the HDPCD certification. This section will contain a total of 24 posts, after which we will finally be done with the HDPCD certification tutorials. In the last tutorial of the DATA TRANSFORMATION section, we saw the process of invoking a… Continue reading “Post 28 | HDPCD | Write and Execute a Hive Query”

Post 14 | HDPCD | Data Transformation to match Hive Schema using Apache Pig

The last tutorial talked about transforming data by reducing the number of columns from input to output records. This tutorial is similar, but it takes the data transformation process one step further: it focuses on matching your input records with the Hive table schema. This includes splitting the… Continue reading “Post 14 | HDPCD | Data Transformation to match Hive Schema using Apache Pig”

Post 12 | HDPCD | Load data from Hive to Pig

Hello, everyone. Thanks for coming back! I hope the tutorials are inspiring you to take each task seriously and to understand why we perform each step. In the last tutorial, we saw how to create a Pig relation with a defined schema. This tutorial is about creating a Pig relation, but instead… Continue reading “Post 12 | HDPCD | Load data from Hive to Pig”

Post 5 | HDPCD | RDBMS to Hive Import

Hello everyone. In this tutorial, we are going to see the 3rd objective in the data ingestion category. The objective is listed on the Hortonworks website under data ingestion and looks like this. In the previous post, we imported data into HDFS; here, we are going to import the data directly into a Hive table. So, let us begin. Continue reading “Post 5 | HDPCD | RDBMS to Hive Import”

Load XML File in Hive

We can load XML data into a Hive table very easily, just like a simple delimited file. The only difference between loading a delimited file and an XML file is that we have to use the xpath UDFs provided by Hive to extract the data residing within the tags. All the steps that I have used are committed in following… Continue reading “Load XML File in Hive”
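As a rough illustration of the xpath idea, here is a minimal sketch. It assumes one XML record per line; the file name `employees.xml`, the table `xml_raw`, and the tag names are hypothetical and only serve the example. `xpath_string` is one of Hive's built-in xpath UDFs.

```shell
# Sample data: one XML record per line (hypothetical file and tag names).
cat > employees.xml <<'EOF'
<employee><name>Alice</name><dept>Sales</dept></employee>
<employee><name>Bob</name><dept>IT</dept></employee>
EOF

# HiveQL script: keep each raw XML line in a single STRING column,
# then pull individual fields out of the tags with xpath_string.
cat > load_xml.hql <<'EOF'
CREATE TABLE IF NOT EXISTS xml_raw (xmldata STRING);
LOAD DATA LOCAL INPATH 'employees.xml' INTO TABLE xml_raw;
SELECT xpath_string(xmldata, 'employee/name') AS name,
       xpath_string(xmldata, 'employee/dept') AS dept
FROM xml_raw;
EOF

# Running the script requires a working Hive installation.
echo "Run with: hive -f load_xml.hql"
```

The raw table needs no delimiter clause here because the sample lines do not contain Hive's default field separator, so each whole line lands in `xmldata`.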

Load CSV File in Hive Table

We can load CSV data into a Hive table with the help of the CSV SerDe JAR file, which is freely available. You can download it manually by clicking the text below. Download CSV SERDE Jar File. Here, we are trying to load two types of CSV data into a Hive table. The first type of data contains a header, i.e.… Continue reading “Load CSV File in Hive Table”
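For the header case, a minimal sketch might look like the following. Note that it swaps in `org.apache.hadoop.hive.serde2.OpenCSVSerde`, which ships with newer Hive releases (0.14 and later), instead of the downloadable JAR mentioned above, so it may not match the post's exact steps. The file name `people.csv` and the table `people_csv` are hypothetical.

```shell
# Sample CSV with a header row and one quoted field (hypothetical data).
cat > people.csv <<'EOF'
id,name,city
1,"Doe, Jane",Pune
2,John,Mumbai
EOF

# HiveQL script: parse quoted CSV with OpenCSVSerde and skip the header line.
# OpenCSVSerde reads every column as STRING, so the columns are declared that way.
cat > load_csv.hql <<'EOF'
CREATE TABLE IF NOT EXISTS people_csv (id STRING, name STRING, city STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count" = "1");
LOAD DATA LOCAL INPATH 'people.csv' INTO TABLE people_csv;
SELECT * FROM people_csv;
EOF

# Running the script requires a working Hive installation.
echo "Run with: hive -f load_csv.hql"
```

For data without a header, the same sketch should work with the `TBLPROPERTIES` clause removed.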

Hive 1.1.1 Installation on single node hadoop 2.6.0 on ubuntu 14.04

sudo su hduser
cd
sudo wget
sudo cp apache-hive-1.1.1-bin.tar.gz /usr/local/
cd /usr/local/
sudo tar -xvf apache-hive-1.1.1-bin.tar.gz
sudo mv apache-hive-1.1.1-bin hive
sudo chown hduser:hadoop -R hive
cd
sudo nano ~/.bashrc
#HIVE VARIABLES START
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
#HIVE VARIABLES END
ctrl+x Y <enter>
source ~/.bashrc
hive
if you get some error related to found class… Continue reading “Hive 1.1.1 Installation on single node hadoop 2.6.0 on ubuntu 14.04”

Set Hive in Local / Auto Mode

Setting Hive in “AUTO/LOCAL MODE” comes in very handy when you are dealing with a small amount of data. How does this work? Well, suppose you are running some complex Hive query; you know that it will trigger a MapReduce job in the background and give you the output. This approach works well if the data size is… Continue reading “Set Hive in Local / Auto Mode”
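As a small sketch of what switching this on can look like, the settings below use the Hive properties `hive.exec.mode.local.auto` and `hive.exec.mode.local.auto.inputbytes.max`. The file name `local_mode.hql` and the 128 MB threshold are assumptions for illustration, not values taken from the post.

```shell
# Write the session settings to a small HiveQL file (hypothetical name).
# First setting: let Hive decide per query whether it can run locally
# instead of submitting a MapReduce job to the cluster.
# Second setting: total input size in bytes below which local mode is
# considered; 134217728 bytes = 128 MB.
cat > local_mode.hql <<'EOF'
SET hive.exec.mode.local.auto=true;
SET hive.exec.mode.local.auto.inputbytes.max=134217728;
EOF

# -i runs the file as an initialization script before the interactive
# session starts (requires a working Hive installation).
echo "Start a session with: hive -i local_mode.hql"
```

With these in place, small queries should finish noticeably faster, while larger inputs still go to the cluster as usual.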

STORE ARRAY DATA IN HIVE INTERNAL TABLE

Array<Datatype> is a complex data type in Hive. It can be used in internal/external tables that are populated from data present in local files. It is customary to specify the collection's behavior while creating the table itself. We need to pass the argument which specifies which character is used… Continue reading “STORE ARRAY DATA IN HIVE INTERNAL TABLE”
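To make the collection settings concrete, here is a minimal sketch. It assumes a comma between fields and a pipe between array items; the file `skills.txt`, the table `employee_skills`, and both delimiters are hypothetical choices for the example.

```shell
# Sample data: name, then a pipe-separated list of skills (hypothetical).
cat > skills.txt <<'EOF'
Alice,hive|pig|sqoop
Bob,hadoop|hive
EOF

# HiveQL script: COLLECTION ITEMS TERMINATED BY tells Hive which character
# separates the elements of the ARRAY column inside a single field.
cat > array_table.hql <<'EOF'
CREATE TABLE IF NOT EXISTS employee_skills (
  name   STRING,
  skills ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH 'skills.txt' INTO TABLE employee_skills;
SELECT name, skills[0] AS first_skill, size(skills) AS skill_count
FROM employee_skills;
EOF

# Running the script requires a working Hive installation.
echo "Run with: hive -f array_table.hql"
```

The field and collection delimiters must differ, otherwise Hive cannot tell where one column ends and the next array element begins.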