Post 13 | HDPCD | Data Transformation using Apache Pig

In the previous tutorial, we saw how to load the data from Apache Hive to Apache Pig. If you remember, we used HCatalog for performing that operation. In this tutorial, we are going to see the process of doing the data transformation using Apache Pig. The process of data transformation itself is quite involved and…
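A typical Pig transformation chains row-level filtering, grouping, and per-group aggregation. The sketch below is illustrative only; the relation and field names (`drivers`, `risk_factor`) are hypothetical, not taken from the post.

```
-- Hypothetical transformation pipeline in Pig Latin
drivers = LOAD '/user/hdpcd/drivers.csv' USING PigStorage(',')
          AS (driver_id:int, name:chararray, risk_factor:int);
risky   = FILTER drivers BY risk_factor > 50;           -- row-level filter
by_risk = GROUP risky BY risk_factor;                   -- grouping
counts  = FOREACH by_risk GENERATE group, COUNT(risky); -- per-group count
DUMP counts;
```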

Post 12 | HDPCD | Load data from Hive to Pig

Hello, everyone. Thanks for coming back! I hope the tutorials are inspiring you to take each task seriously and to perform each operation while understanding why each step is needed. In the last tutorial, we saw how to create a Pig Relation with a defined schema. This tutorial is also about creating a Pig Relation, but instead…
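Loading a Hive table into a Pig relation via HCatalog generally takes two steps: start Grunt with HCatalog support, then load through `HCatLoader`. A minimal sketch, assuming a hypothetical Hive table name:

```
-- Start the Grunt shell with HCatalog jars on the classpath:
--   pig -useHCatalog
-- 'default.sample_table' is a hypothetical Hive table
hive_data = LOAD 'default.sample_table'
            USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE hive_data;   -- schema is pulled from the Hive metastore
```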

Post 11 | HDPCD | Load Pig Relation WITH schema

In the previous tutorial, we saw how to load the Pig Relation without a defined schema. In this tutorial, we are going to load a Pig Relation with a properly defined schema. It is very similar to the last tutorial, except for one step, which I will discuss in a moment. Please have a look at the…
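The extra step is the `AS` clause, which attaches column names and types at load time. A sketch with a hypothetical input file and schema:

```
-- The AS clause defines the schema; file path and fields are illustrative
with_schema = LOAD '/user/hdpcd/input.csv' USING PigStorage(',')
              AS (id:int, name:chararray, salary:float);
DESCRIBE with_schema;   -- prints the declared schema
```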

Post 10 | HDPCD | Load Pig Relation WITHOUT schema

Hello everyone, hope you are finding the tutorials useful. In the previous tutorial, we started off with the Data Transformation category of the HDPCD certification. This tutorial, being the second objective in this category, focuses on creating a sample Pig relation without a schema. Before starting with the actual process, let us define what is…
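Without an `AS` clause, Pig loads the data with no schema and fields are addressed positionally. A minimal sketch, with a hypothetical input path:

```
-- No AS clause: the relation has no named fields
no_schema = LOAD '/user/hdpcd/input.csv' USING PigStorage(',');
-- fields are referenced positionally as $0, $1, ...
first_two = FOREACH no_schema GENERATE $0, $1;
DUMP first_two;
```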

Post 9 | HDPCD | Pig Script Execution

This is the first post in the Data Transformation category, which is essential to clear the HDPCD certification given by Hortonworks Inc. In the last eight tutorials, we focused on Data Ingestion tasks. The next twenty-one, yeah, that's right, I said the next twenty-one tutorials, including this one, will focus on the Data Transformation category of the…
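Executing a Pig script usually means saving the Pig Latin statements in a `.pig` file and passing it to the `pig` launcher. A sketch of the two common execution modes (the script name is illustrative):

```
# Run the script on the Hadoop cluster (MapReduce mode, the default)
pig -x mapreduce script.pig

# Or run it locally against the local filesystem, handy for quick testing
pig -x local script.pig
```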

Post 8 | HDPCD | Configure Flume Memory Channel

In the last tutorial, we saw the process to start the Flume agent. This tutorial is an extension of the previous one, so please refer to it before getting started. The last tutorial enabled us to start the Flume agent, after which we can send the messages that we want over Flume…
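A memory channel buffers events in RAM between a source and a sink, and is configured in the agent's properties file. A sketch of the relevant fragment; the agent and channel names (`agent1`, `memChannel`) are illustrative:

```
# Memory channel configuration fragment (names are hypothetical)
agent1.channels = memChannel
agent1.channels.memChannel.type = memory
agent1.channels.memChannel.capacity = 1000            # max events held in the channel
agent1.channels.memChannel.transactionCapacity = 100  # max events per transaction
```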

Post 7 | HDPCD | Starting Flume Agent

Hello everyone, hope you are finding the tutorials quite useful. In the previous post, we performed the Sqoop Export operation. In this tutorial, we are going to start the Flume agent. Apache Flume is one of the projects in the Apache ecosystem: a reliable, distributed service for moving a large amount of log…
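A Flume agent is started with the `flume-ng` launcher, pointing it at a configuration file and an agent name defined inside that file. A sketch, assuming a hypothetical config file and agent name:

```
# example.conf and agent1 are hypothetical; match them to your own config
flume-ng agent \
  --conf conf \
  --conf-file example.conf \
  --name agent1 \
  -Dflume.root.logger=INFO,console
```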

Post 6 | HDPCD | Sqoop Export

Hi everyone, hope you are finding these tutorials quite helpful. Today, we are going to target the 4th objective in the data ingestion category of the HDPCD certification: the Sqoop Export operation. In the previous tutorials, we saw the Sqoop import operation, which is the reverse of the Sqoop export operation. So, let…
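Sqoop export pushes rows from an HDFS directory back into an RDBMS table. A command sketch; the JDBC URL, table, and HDFS path are hypothetical placeholders:

```
# Hypothetical connection details; the target table must already exist
sqoop export \
  --connect jdbc:mysql://localhost/testdb \
  --username root -P \
  --table exported_table \
  --export-dir /user/hdpcd/output
```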

Post 5 | HDPCD | RDBMS to Hive Import

Hello everyone, in this tutorial, we are going to see the 3rd objective in the data ingestion category. The objective is listed on the Hortonworks website under data ingestion and looks like this. In the previous post, we imported data into HDFS; here, we are going to import the data directly into a Hive table. So, let us begin.
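Importing straight into Hive adds the `--hive-import` flag, so Sqoop creates/loads the Hive table instead of leaving files in a plain HDFS directory. A sketch with hypothetical connection details and table names:

```
# Hypothetical database and table names
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root -P \
  --table employees \
  --hive-import \
  --hive-table default.employees
```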

Post 4 | HDPCD | Free-form Query Import

Hello everyone, in this tutorial, we are going to see the 2nd objective in the data ingestion category. The objective is listed on the Hortonworks website under data ingestion and looks like this. In the previous objective, we imported entire records from a MySQL table, whereas in this post, we are going to import data based on…
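A free-form query import replaces `--table` with `--query`; Sqoop requires the literal `$CONDITIONS` token in the WHERE clause (it substitutes split predicates there) and a `--split-by` column for parallelism. A sketch with hypothetical connection details and query:

```
# Hypothetical database and query; $CONDITIONS is mandatory with --query
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root -P \
  --query 'SELECT id, name FROM employees WHERE salary > 50000 AND $CONDITIONS' \
  --split-by id \
  --target-dir /user/hdpcd/freeform
```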