Post 31 | HDPCD | Defining a Partitioned Hive table

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to create a Hive external table. In this tutorial, we are going to see how to create a partitioned Hive table. To do this, we are going to follow the process below. As you can see …

Post 30 | HDPCD | Define a Hive External Table

Hello, everyone! Welcome to the third tutorial in the Data Analysis section of the HDPCD certification. In the last tutorial, we saw how to create a Hive-managed (internal) table. In this tutorial, we are going to create a Hive external table. So, let us start with the process. The following infographics show the process …

Post 29 | HDPCD | Define a Hive-managed Table

Hello, everyone. Welcome to the second post in the Data Analysis section of the HDPCD certification series. In the last tutorial, we saw the three ways in which we can run Hive commands. In this tutorial, we are going to create a Hive-managed table, i.e., a Hive internal table. For creating a Hive-managed or internal table, …

Post 28 | HDPCD | Write and Execute a Hive Query

Hello, everyone. Welcome to the first tutorial of the DATA ANALYSIS section of the HDPCD certification. This section is going to contain a total of 24 posts, after which we will finally be done with the HDPCD certification tutorials. In the last tutorial of the DATA TRANSFORMATION section, we saw the process of invoking a …

Post 25 | HDPCD | Register a Jar file of UDF in Apache Pig

Hello, everyone. Thanks for coming back again to continue with this certification series. In the last tutorial, we saw how to run a Pig script with TEZ as the execution mode. In this tutorial, we are going to see how to register a JAR file in order to use the User Defined Function written and packaged inside it. …

Post 24 | HDPCD | Run a Pig job using TEZ

Hey, everyone. Thank you for keeping me company on this beautiful journey of HDPCD certification. We are almost done with the Data Transformation section of the certification and are left with only the Data Analysis section using Apache Hive. The Data Analysis section, in my opinion, is easier than this section so you can say …

Post 23 | HDPCD | Perform a REPLICATED JOIN using Apache Pig

Hey, everyone. Thank you once again for coming back to perform these tutorials. In the last tutorial, we saw how to perform the simple JOIN operation, and in this tutorial, we are going to perform the REPLICATED JOIN operation. The process is similar and differs in only one place, so …

Post 22 | HDPCD | Join two datasets using Apache Pig

Hey, everyone. Thanks for the overwhelming response to the blog posts that I have been receiving since last week. I really appreciate it. I will keep on posting interesting and innovative content for you. In the last tutorial, we saw how to use the parallel features of Apache Pig in two ways. In this tutorial, …

Post 21 | HDPCD | Specify number of reduce tasks for Pig MapReduce job

Hello, everyone. Thanks for coming back to one more tutorial in this HDPCD certification series. In the last tutorial, we saw how to remove duplicate tuples from a Pig relation. In this tutorial, we are going to see how to specify the number of reduce tasks for a Pig MapReduce job. Let us get started […]

Post 20 | HDPCD | Removing Duplicate tuples from a PIG Relation

Hi, everyone. Welcome to one more tutorial in this HDPCD certification series. As you might have noticed, I have changed the blog layout a little bit; I hope you like it. Kindly let me know your feedback on this in the COMMENT SECTION. In the last tutorial, we saw how to perform the SORT OPERATION in Apache …