Post 52 | HDPCD | The conclusion

Hi everyone. Finally, we have reached the end of this tutorial series. It's been so long. We started this journey together on January 15th, 2017, and 276 days later, this beautiful journey is coming to an end. But we do not need to worry, because I am working on something new and would love to …

Post 48 | HDPCD | Printing the execution plan of a Hive query

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to enable vectorization in Hive. In this tutorial, we are going to see how to print the execution plan of a Hive query. Let us begin, then. This is one of the simplest tutorials in this certification series. In …

Post 9 | HDPCD | Pig Script Execution

This is the first post in the Data Transformation category, which is essential to clear the HDPCD certification offered by Hortonworks Inc. In the last eight tutorials, we focused on Data Ingestion tasks. The next twenty-one, yeah, that's right, I said the next twenty-one tutorials, including this one, will focus on the Data Transformation category of the …

WordCount in Spark

Hello friends, today we are going to implement the very famous WordCount code in Spark, using spark-shell. For folks who are not familiar with WordCount: in this implementation, we count the occurrences of each word and present the result as pairs of each word and its count. For example, if my input is as …