Post 7 | ML | Data Preprocessing – Part 6

Hello, everyone. Welcome to the final tutorial in the Data Preprocessing portion of the Machine Learning tutorials. In the previous tutorial, we saw how to create the TRAINING and TEST data sets for model building purposes. In this tutorial, we are going to see why and how to perform Feature Scaling. Let us begin, then. To refresh … Continue reading Post 7 | ML | Data Preprocessing – Part 6

Post 7 | ML | Data Preprocessing – Part 5

Hello, everyone. Thanks for joining me for this 5th tutorial of the Data Preprocessing part of the Machine Learning tutorials. In the last tutorial, we saw how to convert the CATEGORICAL VARIABLES from the STRING format to an INTEGER format. In this tutorial, we are going a step further and are going to split the original data … Continue reading Post 7 | ML | Data Preprocessing – Part 5

Post 2 | Installations – R and Python

Hello, everyone. We are going to start learning the concepts of Machine Learning. If you have been following my blog posts on Hadoop and Big Data Analytics, then you will know that I place a lot of importance on hands-on exercises. The same is going to be the case for these tutorials. Here, we … Continue reading Post 2 | Installations – R and Python

Post 52 | HDPCD | The conclusion

Hi everyone. Finally, we have reached the end of this tutorial series. It's been so long. We started this journey together on January 15th, 2017, and, 276 days later, this beautiful journey is coming to an end. But we do not need to worry, because I am working on something new and would love to … Continue reading Post 52 | HDPCD | The conclusion

Post 48 | HDPCD | Printing the execution plan of a Hive query

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to enable vectorization in Hive. In this tutorial, we are going to see how to print the execution plan of a Hive query. Let us begin, then. This is one of the simplest tutorials in this certification series. In … Continue reading Post 48 | HDPCD | Printing the execution plan of a Hive query

Post 43 | HDPCD | Delete a row in a Hive table

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to update a row in a Hive table. In this tutorial, we are going to see how to delete a row in a Hive table. It is quite interesting to see that Hive supports ACID operations … Continue reading Post 43 | HDPCD | Delete a row in a Hive table

Post 2 | Machine Learning | Installations – R and Python

Hello, everyone. We are going to start learning the concepts of Machine Learning. If you have been following my blog posts on Hadoop and Big Data Analytics, then you will know that I place a lot of importance on hands-on exercises. The same is going to be the case for these tutorials. Here, we … Continue reading Post 2 | Machine Learning | Installations – R and Python

Spark + Python : reduce action

This tutorial is an introduction to actions in Spark. So far, we have seen transformations like map() and flatMap(). reduce is one of the actions provided by Spark. In this tutorial, we are going to perform an addition operation with the help of the reduce action. We are going to follow the steps below for achieving … Continue reading Spark + Python : reduce action

Spark : map() and flatMap()

Hi guys, I hope you are finding the tutorials helpful. In this tutorial, we are going to see two transformations that we are going to use a lot while learning Spark. Both map() and flatMap() are transformations in Spark. We will discuss these two transformations one by one. Then we will see the similarities between … Continue reading Spark : map() and flatMap()

Spark + Python : Passing Function

In this tutorial, we are going to see various ways in which we can pass functions in Spark using the Python API. I have shown two ways in which functions can be called/created (for user-defined functions). We are going to do the comparison based on the filtering capabilities of Spark. For this, I have created a user-defined function called containsMilind() which … Continue reading Spark + Python : Passing Function