Post 52 | HDPCD | The conclusion

Hi everyone. Finally, we have reached the end of this tutorial series. It's been so long. We started this journey together on January 15th, 2017, and 276 days later, this beautiful journey is coming to an end. But we do not need to worry, because I am working on something new and would love to …

Post 51 | HDPCD | Set Hadoop or Hive Configuration property

Hello, everyone. Welcome to the last technical tutorial in the HDPCD certification series. It's funny! This beautiful journey is coming to an end. In the last tutorial, we saw how to sort the output of a Hive query across multiple reducers. In this tutorial, we are going to see how to set a Hadoop or Hive configuration …

Post 50 | HDPCD | Order Hive query output across multiple reducers

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to run a subquery within a Hive query. In this tutorial, we are going to see how to order the output of a Hive query across multiple reducers. Let us begin, then. The following infographics show the step-by-step process of performing this operation. From …

Post 48 | HDPCD | Printing the execution plan of a Hive query

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to enable vectorization in Hive. In this tutorial, we are going to see how to print the execution plan of a Hive query. Let us begin, then. This is one of the simplest tutorials in this certification series. In …

Post 47 | HDPCD | Run a Hive query using Vectorization

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to run a Hive query using the Tez execution engine. In this tutorial, we are going to see how to run a Hive query using vectorization. Let us begin, then. Before starting off with the objective of this tutorial, let …

Post 27 | HDPCD | Invoke a User Defined Function in Apache Pig

Hello everyone, thanks for coming back to the last tutorial in the DATA TRANSFORMATION category of the HDPCD certification. We are going to pick up where we left off in the last tutorial, in which we saw how to define an ALIAS for a function present in the JAR file. In this tutorial, we are going to see how to …

Post 26 | HDPCD | Define an ALIAS for a User Defined Function

Hi, everyone. Thank you for returning to this certification series. In the last tutorial, we saw the process of registering the JAR file in the Apache Pig session. This tutorial is an extension of the previous one, and in it we are going to see how to define an alias for the UDF present …

Post 17 | HDPCD | Storing Pig Relation in HDFS Directory

Thanks for coming back for the next tutorial in the HDPCD certification series. In the last tutorial, we saw how to remove the records with the NULL values, whereas in this tutorial, we are going to see the process of storing the output of a Pig Relation in the HDFS directory. This is one of …

Post 16 | HDPCD | Removing records with NULL values from a Pig Relation

Removing records with NULL values from a Pig relation

Post 15 | HDPCD | Group Data in one or more PIG Relations

Grouping in Apache Pig