This tutorial is an introduction to actions in Spark.
So far we have seen transformations such as map() and flatMap(). reduce() is one of the actions provided by Spark.
Here we use the reduce action to add up a list of numbers.
The steps are: take a list of numbers as input, parallelize it into an RDD, and then call the reduce action with an add function.
The following code, also uploaded on GitHub, shows this approach.
from pyspark import SparkConf, SparkContext
# Run Spark locally under the application name "Sum"
conf = SparkConf().setMaster("local").setAppName("Sum")
sc = SparkContext(conf=conf)
# Distribute the list as an RDD, then add its elements two at a time
numbers = sc.parallelize([1, 2, 3, 4, 5])
total = numbers.reduce(lambda x, y: x + y)
print("SUM IS :", total)
The screenshot below shows the code written in Notepad++.
As in the previous tutorials, we run the code with the spark-submit command.
The screenshot below shows the command used to run it.
Once you run the command, you get the output shown in the screenshot below.
The screenshot above shows the output of the code we ran: the sum of the numbers 1, 2, 3, 4 and 5, which comes to 15.
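To see why the answer is 15, it can help to trace the same fold without Spark. The sketch below is only a local analogy in plain Python using functools.reduce. The split into two partitions is a made-up illustration, not what Spark necessarily does with this list. Spark reduces within each partition and then combines the partial results, which is why the function passed to reduce should be commutative and associative.

```python
from functools import reduce


def add(x, y):
    """The same operation as the lambda passed to the Spark reduce action."""
    return x + y


numbers = [1, 2, 3, 4, 5]

# Fold the whole list in one pass: ((((1 + 2) + 3) + 4) + 5)
total = reduce(add, numbers)

# Hypothetical partitioning, chosen only for illustration
partitions = [[1, 2], [3, 4, 5]]

# Reduce each partition on its own, then combine the partial sums
partial_sums = [reduce(add, part) for part in partitions]
combined = reduce(add, partial_sums)

print("Single fold:", total)
print("Partial sums:", partial_sums)
print("Combined:", combined)
```

Both routes give the same total because addition does not depend on grouping or order. An operation such as subtraction would not have that property and could return different results depending on how the data is partitioned.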
This is how we use the reduce action.
Hope this helps.