Post 52 | HDPCD | The conclusion

Hi everyone. Finally, we have reached the end of this tutorial series. It's been so long. We started this journey together on January 15th, 2017, and 276 days later, this beautiful journey is coming to an end. But we do not need to worry, because I am working on something new and would love to …

Post 51 | HDPCD | Set Hadoop or Hive Configuration property

Hello, everyone. Welcome to the last technical tutorial in the HDPCD certification series. It's funny! This beautiful journey is coming to an end. In the last tutorial, we saw how to sort the output of a Hive query across multiple reducers. In this tutorial, we are going to see how to set a Hadoop or Hive configuration …

Post 50 | HDPCD | Order Hive query output across multiple reducers

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to enable vectorization in Hive. In this tutorial, we are going to see how to order the output of a Hive query across multiple reducers. Let us begin, then. The following infographics show the step-by-step process of performing this operation. From …

Post 48 | HDPCD | Printing the execution plan of a Hive query

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to enable vectorization in Hive. In this tutorial, we are going to see how to print the execution plan of a Hive query. Let us begin, then. This is one of the simplest tutorials in this certification series. In …

Post 47 | HDPCD | Run a Hive query using Vectorization

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to run a Hive query using the Tez execution engine. In this tutorial, we are going to see how to run a Hive query using vectorization. Let us begin, then. Before starting off with the objective of this tutorial, let …

Post 30 | HDPCD | Define a Hive External Table

Hello, everyone! Welcome to the third tutorial in the Data Analysis section of the HDPCD certification. In the last tutorial, we saw how to create a Hive-managed (internal) table. In this tutorial, we are going to create a Hive external table. So, let us start with the process. The following infographics show the process …

Post 14 | HDPCD | Data Transformation to match Hive Schema using Apache Pig

The last tutorial talked about transforming data by reducing the number of columns from input to output records. This tutorial is similar, but takes the data transformation process one step further. It focuses on matching your input records with the Hive table schema. This includes splitting the …

Spark 1.6.1 Installation on Ubuntu 14.04 and Hadoop 2.6.0

Hi friends, I have started learning Apache Spark, and it is time to share a few things with you. In this series on Spark and Scala, we are going to see the essential things we should know about both. So, as I always begin, we are going to start with Spark …

Hive 1.1.1 Installation on single node hadoop 2.6.0 on ubuntu 14.04

sudo su hduser
cd
sudo wget http://mirror.tcpdiag.net/apache/hive/hive-1.1.1/apache-hive-1.1.1-bin.tar.gz
sudo cp apache-hive-1.1.1-bin.tar.gz /usr/local/
cd /usr/local/
sudo tar -xvf apache-hive-1.1.1-bin.tar.gz
sudo mv apache-hive-1.1.1-bin hive
sudo chown hduser:hadoop -R hive
cd
sudo nano ~/.bashrc
#HIVE VARIABLES START
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
#HIVE VARIABLES END
ctrl+x Y <enter>
source ~/.bashrc
hive
if you get some error related to found class …

Single Node Hadoop Installation on Ubuntu 14.04

Follow these steps to install Hadoop 2.6.0 on Ubuntu 14.04.
Open a terminal window.
Step 1: Install Java
Run the following commands:
command 1 : sudo apt-get update
command 2 : sudo apt-get install default-jdk
This will install Java on your system.
command 1 : javac -version
command …