Post 1 | ML | Introduction

Hello, people. In this new tutorial series, we are going to talk about different aspects of Machine Learning. As an aspiring Data Scientist, I had always wanted to get my hands dirty with Machine Learning concepts, and the summer break gave me exactly what I wanted: "TIME TO LEARN MACHINE LEARNING …

Spark + Python : reduce action

This tutorial is an introduction to actions in Spark. So far we have seen transformations such as map() and flatMap(); reduce() is one of the actions Spark provides. Unlike a transformation, which builds a new RDD, an action returns a result to the driver program. Here, we are going to perform an addition operation with the help of the reduce() action. We are going to follow the steps below to achieve …
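To make the idea concrete without a cluster, here is a minimal plain-Python sketch of what reduce() does with an addition function. It uses `functools.reduce` on an ordinary list as a stand-in for an RDD; the sample numbers and the `add` helper are illustrative assumptions. In PySpark the corresponding call would be `rdd.reduce(add)`, for example on an RDD built with `sc.parallelize(numbers)`.

```python
from functools import reduce

# Stand-in for an RDD of numbers (illustrative sample data)
numbers = [1, 2, 3, 4, 5]

def add(a, b):
    """Binary function passed to reduce; it should be commutative and
    associative so partial results can be combined in any order."""
    return a + b

# Plain-Python analogue of rdd.reduce(add): folds the list pairwise into one value
total = reduce(add, numbers)
print(total)  # 15
```

Spark applies the function within each partition and then combines the partial results, which is why the function should be commutative and associative; addition is both.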

Spark : map() and flatMap()

Hi guys, I hope you are finding the tutorials helpful. In this tutorial, we are going to look at two transformations that we will use a lot while learning Spark: map() and flatMap(). We will discuss them one by one and then look at the similarities between …
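The difference between the two transformations is easiest to see in their outputs. The sketch below mimics them in plain Python on a list of sample lines; `rdd_map` and `rdd_flat_map` are hypothetical helpers, not Spark APIs. In PySpark the equivalents would be `rdd.map(lambda line: line.split(" "))` and `rdd.flatMap(lambda line: line.split(" "))`.

```python
# Sample input lines (illustrative)
lines = ["hello spark", "map and flatMap"]

def rdd_map(func, data):
    # map: exactly one output element per input element
    return [func(x) for x in data]

def rdd_flat_map(func, data):
    # flatMap: each input may yield zero or more outputs, flattened into one list
    return [y for x in data for y in func(x)]

mapped = rdd_map(lambda line: line.split(" "), lines)
flat = rdd_flat_map(lambda line: line.split(" "), lines)
print(mapped)  # [['hello', 'spark'], ['map', 'and', 'flatMap']]
print(flat)    # ['hello', 'spark', 'map', 'and', 'flatMap']
```

Note that map() keeps the number of elements unchanged (two lines in, two lists out), while flatMap() turns two lines into five words.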

Spark + Python : Passing Function

In this tutorial, we are going to look at the various ways of passing functions to Spark using the Python API. I show two ways in which a user-defined function can be created and passed. We compare them using Spark's filtering capability. For this, I have created a user-defined function called containsMilind() which …
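A minimal sketch of the two styles, assuming (from its name only) that containsMilind() returns True for lines containing the word "Milind". The sample lines are made up, and plain Python's `filter` stands in for `rdd.filter`; in PySpark you would pass either `containsMilind` or the lambda to `rdd.filter(...)` in the same way.

```python
# Illustrative sample lines
lines = [
    "Milind is learning Spark",
    "This line does not mention the name",
    "Spark with Python, by Milind",
]

# Way 1: a named, user-defined function
# (body is an assumption based on the function's name)
def containsMilind(line):
    return "Milind" in line

# Plain-Python analogues of rdd.filter(containsMilind)
# and rdd.filter(lambda line: "Milind" in line)
by_named = list(filter(containsMilind, lines))
# Way 2: an inline lambda expressing the same predicate
by_lambda = list(filter(lambda line: "Milind" in line, lines))

print(by_named == by_lambda)  # True
```

A named function is easier to reuse and test; a lambda is handy for a one-off predicate.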

Spark + Python : Union Operation

In this tutorial, we are going to see how the Union operation works. In English, union means combining two things, and that is what we are going to do here, except that we will combine two RDDs using the Union operation. We are using the same input.txt file we used in the last tutorial. …
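As a hedged illustration of union semantics, the sketch below combines two made-up lists of lines in plain Python. The helper `rdd_union` is hypothetical; in PySpark the call would be `rdd1.union(rdd2)`. One detail worth knowing: union keeps every element from both inputs, including duplicates, so deduplication is a separate step (`distinct()` in Spark).

```python
# Two stand-ins for RDDs (illustrative sample lines)
errors = ["error: disk full", "error: timeout"]
warnings = ["warning: low memory", "error: timeout"]

def rdd_union(a, b):
    # Like RDD.union, keep every element of both inputs; duplicates are not removed
    return a + b

combined = rdd_union(errors, warnings)
print(len(combined))  # 4

# Deduplication is a separate step (rdd.distinct() in Spark);
# here dict.fromkeys drops repeats while keeping first-seen order
unique = list(dict.fromkeys(combined))
print(len(unique))  # 3
```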

Spark + Python – Filter Operation

This is the first program in the Spark + Python series. In this tutorial, we are going to look at the Filter operation. The objective is to print only those lines that contain a specified keyword. To do this, we are going to follow the steps below. But before diving into the actual operations, please look for the …
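Here is a minimal sketch of the filtering logic in plain Python, assuming a simple case-sensitive substring match; the `lines_containing` helper, the sample lines, and the keyword "Spark" are illustrative assumptions. In PySpark the same predicate would be passed to `rdd.filter`, e.g. `sc.textFile(path).filter(lambda line: keyword in line)`.

```python
def lines_containing(lines, keyword):
    """Return only the lines that contain `keyword` (case-sensitive substring match)."""
    return [line for line in lines if keyword in line]

# Illustrative sample lines and keyword
sample = ["Spark is fast", "Python is friendly", "Spark + Python"]
matches = lines_containing(sample, "Spark")

# Print only the matching lines
for line in matches:
    print(line)
```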

Spark + Python – Tools Setup

In this series, we are going to talk about simple concepts and basic Spark programming with the Python API. To make our development work faster and easier, we are going to use some basic tools and software. The tools we are talking about are Notepad++ and PuTTY. We use PuTTY to connect to the …

WordCount in Spark

Hello friends, today we are going to implement the famous WordCount program in Spark using spark-shell. For those who are not familiar with WordCount: we count the occurrences of each word and output each word paired with its count. For example, if my input is as …
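The post itself uses spark-shell; to stay with this series' Python, here is a plain-Python sketch of the same word-count logic, structured like the common flatMap, map, reduceByKey pipeline. The `word_count` helper and the sample input are illustrative assumptions. In PySpark the pipeline would look roughly like `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`.

```python
from collections import defaultdict

def word_count(lines):
    # flatMap step: split every line into individual words
    words = [word for line in lines for word in line.split()]
    # map step: pair each word with the count 1
    pairs = [(word, 1) for word in words]
    # reduceByKey step: sum the counts for each distinct word
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

result = word_count(["a b a", "b c"])
print(result)  # {'a': 2, 'b': 2, 'c': 1}
```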