Hi, everyone. We have finally reached the end of this tutorial series. It’s been a long time: we started this journey together on January 15th, 2017, and, 276 days later, this beautiful journey is coming to an end. But there is no need to worry, because I am working on something new and would love to… Continue reading “Post 52 | HDPCD | The conclusion”
Tag Archives: beginners in hadoop
Post 51 | HDPCD | Set Hadoop or Hive Configuration property
Hello, everyone. Welcome to the last technical tutorial in the HDPCD certification series. It feels strange: this beautiful journey is coming to an end. In the last tutorial, we saw how to sort the output of a Hive query across multiple reducers. In this tutorial, we are going to see how to set a Hadoop or Hive configuration… Continue reading “Post 51 | HDPCD | Set Hadoop or Hive Configuration property”
Post 50 | HDPCD | Order Hive query output across multiple reducers
Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to enable vectorization in Hive. In this tutorial, we are going to see how to order the output of a Hive query across multiple reducers. Let us begin, then. The following infographics show the step-by-step process of performing this operation. From… Continue reading “Post 50 | HDPCD | Order Hive query output across multiple reducers”
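As background for ordering output across multiple reducers: the Hive documentation notes that SORT BY only guarantees ordering of rows within each reducer, so with more than one reducer the combined output may be only partially ordered, whereas ORDER BY guarantees a total order. The plain-Python sketch below models that difference; it is not Hive code. `sort_by_model` and `range_partition_model` are hypothetical names, the modulo partitioner is an assumed stand-in for Hive's hash partitioning, and range partitioning is shown only as one general way to obtain a total order from several reducers, not necessarily the approach this post uses.

```python
from bisect import bisect_left
from typing import List


def sort_by_model(keys: List[int], num_reducers: int) -> List[List[int]]:
    """Model of DISTRIBUTE BY + SORT BY: each reducer's rows are sorted,
    but nothing orders rows across reducers."""
    partitions: List[List[int]] = [[] for _ in range(num_reducers)]
    for k in keys:
        # Hypothetical stand-in for Hive's hash partitioner.
        partitions[k % num_reducers].append(k)
    return [sorted(p) for p in partitions]


def range_partition_model(keys: List[int], boundaries: List[int]) -> List[List[int]]:
    """Model of range partitioning with sorted `boundaries`: reducer 0 gets
    keys <= boundaries[0], reducer i gets keys in (boundaries[i-1], boundaries[i]],
    and the last reducer gets keys > boundaries[-1]. Concatenating the
    reducer outputs in order therefore yields a globally sorted result."""
    partitions: List[List[int]] = [[] for _ in range(len(boundaries) + 1)]
    for k in keys:
        partitions[bisect_left(boundaries, k)].append(k)
    return [sorted(p) for p in partitions]


keys = [7, 2, 9, 4, 1, 8]
print(sort_by_model(keys, 2))            # [[2, 4, 8], [1, 7, 9]]
print(range_partition_model(keys, [4]))  # [[1, 2, 4], [7, 8, 9]]
```

Reading the first result reducer by reducer gives 2, 4, 8, 1, 7, 9: each file is sorted, the whole is not. The second result concatenates to 1 through 9 in order.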
Post 48 | HDPCD | Printing the execution plan of a Hive query
Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to enable vectorization in Hive. In this tutorial, we are going to see how to print the execution plan of a Hive query. Let us begin, then. This is one of the simplest tutorials in this certification series. In… Continue reading “Post 48 | HDPCD | Printing the execution plan of a Hive query”
Post 47 | HDPCD | Run a Hive query using Vectorization
Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to run a Hive query using the Tez execution engine. In this tutorial, we are going to see how to run a Hive query using vectorization. Let us begin, then. Before starting off with the objective of this tutorial, let… Continue reading “Post 47 | HDPCD | Run a Hive query using Vectorization”
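Vectorized execution, which Hive turns on through the `hive.vectorized.execution.enabled` property, processes a batch of rows at a time (1024 by default, per the Hive documentation) instead of one row per operator call. The plain-Python sketch below only models that idea for a simple filter; `filter_row_mode`, `filter_vectorized_mode` and the `BATCH_SIZE` constant are illustrative assumptions, and Python will not reproduce Hive's actual speed-up. Both functions must return the same rows; only the unit of work differs.

```python
from typing import List

BATCH_SIZE = 1024  # assumed to mirror Hive's default vectorized batch size


def filter_row_mode(values: List[int], threshold: int) -> List[int]:
    """Row-at-a-time: the predicate is evaluated in one call per row."""
    out: List[int] = []
    for v in values:
        if v > threshold:
            out.append(v)
    return out


def filter_vectorized_mode(values: List[int], threshold: int,
                           batch_size: int = BATCH_SIZE) -> List[int]:
    """Batch-at-a-time: the predicate runs over a whole column batch and
    records the surviving positions in a selection list."""
    out: List[int] = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        selected = [i for i, v in enumerate(batch) if v > threshold]
        out.extend(batch[i] for i in selected)
    return out


values = list(range(3000))
assert filter_row_mode(values, 1500) == filter_vectorized_mode(values, 1500)
```

The selection list mirrors the general idea of tracking which rows in a batch survive a filter, so later operators can skip the rest without copying the batch.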
Post 27 | HDPCD | Invoke a User Defined Function in Apache Pig
Hello, everyone. Thanks for coming back for the last tutorial in the DATA TRANSFORMATION category of the HDPCD certification. We are going to pick up where we left off in the last tutorial, in which we saw how to define an ALIAS for a function present in the JAR file. In this tutorial, we are going to see how to… Continue reading “Post 27 | HDPCD | Invoke a User Defined Function in Apache Pig”
Post 26 | HDPCD | Define an ALIAS for a User Defined Function
Hi, everyone. Thank you for returning to this certification series. In the last tutorial, we saw the process of registering the JAR file in the Apache Pig session. This tutorial extends the previous one; in it, we are going to see how to define an alias for the UDF present… Continue reading “Post 26 | HDPCD | Define an ALIAS for a User Defined Function”
Post 17 | HDPCD | Storing Pig Relation in HDFS Directory
Thanks for coming back for the next tutorial in the HDPCD certification series. In the last tutorial, we saw how to remove records with NULL values, whereas in this tutorial, we are going to see the process of storing the output of a Pig relation in an HDFS directory. This is one of… Continue reading “Post 17 | HDPCD | Storing Pig Relation in HDFS Directory”
Post 16 | HDPCD | Removing records with NULL values from a Pig Relation
Removing records with NULL values from a Pig relation
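In Pig Latin, records with a NULL field are typically dropped with a statement such as `filtered = FILTER relation BY field IS NOT NULL;`. The plain-Python sketch below models that behavior on a relation represented as a list of tuples, with Python's `None` standing in for Pig's NULL; `filter_not_null` and `filter_no_nulls` are hypothetical helper names, not Pig functions.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def filter_not_null(relation: Sequence[Row], field_index: int) -> List[Row]:
    """Keep tuples whose field at `field_index` is not None
    (models FILTER ... BY field IS NOT NULL)."""
    return [t for t in relation if t[field_index] is not None]


def filter_no_nulls(relation: Sequence[Row]) -> List[Row]:
    """Keep only tuples with no None in any field
    (models chaining IS NOT NULL checks on every field with AND)."""
    return [t for t in relation if all(f is not None for f in t)]


rel = [(1, "a"), (2, None), (None, "c"), (4, "d")]
print(filter_not_null(rel, 1))  # [(1, 'a'), (None, 'c'), (4, 'd')]
print(filter_no_nulls(rel))     # [(1, 'a'), (4, 'd')]
```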
Post 15 | HDPCD | Group Data in one or more PIG Relations
Grouping in Apache Pig
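Pig's GROUP produces one tuple per key: a field named `group` holding the key, followed by one bag per input relation containing that relation's matching tuples. When two or more relations are grouped (an operation also written COGROUP), a relation with no tuples for a key contributes an empty bag. The plain-Python sketch below models that output shape with lists standing in for bags; `group_model` is a hypothetical name, the sketch keeps first-seen key order even though Pig does not guarantee any order, and it does not model Pig's handling of null keys.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]


def group_model(*inputs: Tuple[Sequence[Row], int]) -> List[Tuple[Any, ...]]:
    """Model of GROUP/COGROUP. Each input is (relation, key_index).
    Returns one (group_key, bag_for_input_0, bag_for_input_1, ...) per key."""
    # Every new key starts with one empty bag per input relation.
    bags: Dict[Any, List[List[Row]]] = defaultdict(lambda: [[] for _ in inputs])
    for position, (relation, key_index) in enumerate(inputs):
        for tup in relation:
            bags[tup[key_index]][position].append(tup)
    return [(key, *per_input) for key, per_input in bags.items()]


a = [("x", 1), ("y", 2), ("x", 3)]
b = [("x", 10), ("z", 20)]
print(group_model((a, 0)))
# [('x', [('x', 1), ('x', 3)]), ('y', [('y', 2)])]
print(group_model((a, 0), (b, 0)))
# [('x', [('x', 1), ('x', 3)], [('x', 10)]), ('y', [('y', 2)], []), ('z', [], [('z', 20)])]
```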