Post 2 | Installations – R and Python

Hello, everyone, we are going to start off learning the concepts of Machine Learning. If you are following my blog posts on Hadoop and Big Data Analytics, then you will come to know I do give more importance on performing the hands-on exercises. Same is going to be the case for these tutorials. Here, weContinue reading “Post 2 | Installations – R and Python”

Post 52 | HDPCD | The conclusion

Hi everyone. Finally, we have reached the end of this tutorial series. It’s been so long. We started this journey together on January 15th, 2017, and, 276 days later this beautiful journey is coming to an end. But, we do not need to worry, because, I am working on something new and would love toContinue reading “Post 52 | HDPCD | The conclusion”

Post 48 | HDPCD | Printing the execution plan of a Hive query

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the┬álast┬átutorial, we saw how to enable vectorization in Hive. In this tutorial, we are going to see how to print the execution plan of a Hive query. Let us begin, then. This is one of the simplest┬átutorials in this certification series. InContinue reading “Post 48 | HDPCD | Printing the execution plan of a Hive query”

Spark + Python – Filter Operation

This is the first program in Spark + Python series. In this tutorial, we are going to see the Filter operation. The objective of this tutorial is to print only those lines containing specified keyword. For doing this, we are going to follow below steps. But before diving into actual operations, please look for theContinue reading “Spark + Python – Filter Operation”