Post 23 | HDPCD | Perform a REPLICATED JOIN using Apache Pig

Hey everyone, thank you once again for coming back to work through these tutorials. In the last tutorial, we saw how to perform the simple JOIN operation, and in this tutorial, we are going to perform the REPLICATED JOIN operation. The process is similar and differs in only one place, so […]

Post 22 | HDPCD | Join two datasets using Apache Pig

Hey, everyone. Thanks for the overwhelming response to the blog posts that I have been receiving since last week. I really appreciate it. I will keep on posting interesting and innovative content for you. In the last tutorial, we saw how to use the parallel features of Apache Pig in two ways. In this tutorial, […]

Post 21 | HDPCD | Specify number of reduce tasks for Pig MapReduce job

Hello everyone. Thanks for coming back to one more tutorial in this HDPCD certification series. In the last tutorial, we saw how to remove the duplicate tuples from a Pig relation. In this tutorial, we are going to see how to specify the number of reduce tasks for a Pig MapReduce job. Let us get started […]

Post 20 | HDPCD | Removing Duplicate tuples from a PIG Relation

Hi everyone, welcome to one more tutorial in this HDPCD certification series. As you might notice, I have changed the blog layout a little bit; I hope you like it. Kindly let me know your feedback on this in the COMMENT SECTION. In the last tutorial, we saw how to perform the SORT OPERATION in Apache […]

Post 19 | HDPCD | Sort the output of a Pig Relation

Hi everyone, thanks for coming back again to continue with this tutorial series. We are almost there with this section, and once we are done with this, we will jump into Hive, which will not take much time. In the last tutorial, we saw the process to store the data from PIG to HIVE using […]

Post 18 | HDPCD | Storing Pig Relation in Hive Table

Wassup everyone. Thanks for coming back once again. This section is coming to an end with less than 10 tutorials remaining. Once we are done with this section (Apache Pig), we will start with the next section which focuses on Apache Hive. In the last tutorial, we saw the process of storing the data stored […]

Post 17 | HDPCD | Storing Pig Relation in HDFS Directory

Thanks for coming back for the next tutorial in the HDPCD certification series. In the last tutorial, we saw how to remove the records with the NULL values, whereas in this tutorial, we are going to see the process of storing the output of a Pig Relation in the HDFS directory. This is one of […]

Post 16 | HDPCD | Removing records with NULL values from a Pig Relation

Removing records with NULL values from a Pig relation

Post 15 | HDPCD | Group Data in one or more PIG Relations

Grouping in Apache Pig

Post 14 | HDPCD | Data Transformation to match Hive Schema using Apache Pig

The last tutorial talked about transforming data by reducing the number of columns from input to output records. This tutorial is similar, but takes the data transformation process one step further. It focuses on matching your input records with the Hive table schema. This includes splitting the […]