Post 52 | HDPCD | The conclusion

Hi everyone. Finally, we have reached the end of this tutorial series. It's been so long. We started this journey together on January 15th, 2017, and 276 days later, this beautiful journey is coming to an end. But we do not need to worry, because I am working on something new and would love to …

Post 48 | HDPCD | Printing the execution plan of a Hive query

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to enable vectorization in Hive. In this tutorial, we are going to see how to print the execution plan of a Hive query. Let us begin, then. This is one of the simplest tutorials in this certification series. In …

Spark + Python : Passing Function

In this tutorial, we are going to see various ways in which we can pass functions in Spark using the Python API. I have shown two ways in which functions can be created and called (for a user-defined function). We are going to compare them based on the filtering capabilities of Spark. For this, I have created a user-defined function called containsMilind() which …

Spark 1.6.1 Installation on Ubuntu 14.04 and Hadoop 2.6.0

Hi friends, I have started learning Apache Spark, and it is now time to share a few things with you. In this series on Spark and Scala, we are going to see the essential things that we should know about both Spark and Scala. So, as I always begin, we are going to start with Spark …