Spark + Python : reduce action

This tutorial is an introduction to actions in Spark. So far we have seen transformations such as map() and flatMap(). reduce() is one of the actions Spark provides. Here, we are going to perform an addition with the help of the reduce action. We are going to follow the steps below for achieving …
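As a rough illustration of the addition described above, here is a minimal sketch in plain Python, so it runs without a Spark cluster. The sample list `numbers` and the helper `add` are assumptions made for illustration and are not taken from the full post. On a real RDD the same function would be passed to the reduce action, as noted in the comment.

```python
from functools import reduce

# Hypothetical sample data; the post's actual input is not shown in this excerpt.
numbers = [1, 2, 3, 4, 5]


def add(a, b):
    """Combine two elements into one; reduce applies this pairwise."""
    return a + b


# Plain-Python stand-in for the Spark action.
# With a SparkContext named `sc`, the equivalent call would be:
#     sc.parallelize(numbers).reduce(add)
total = reduce(add, numbers)
print(total)
```

The function handed to reduce takes two elements and returns one. Spark combines results from different partitions in parallel, so that function should be commutative and associative, as addition is.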

Spark + Python : Passing Function

In this tutorial, we are going to see the various ways in which we can pass functions in Spark using the Python API. I have shown two ways in which user-defined functions can be created and called. We are going to compare them using the filtering capabilities of Spark. For doing this I have created a user-defined function called containsMilind() which …
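The two ways of passing a function can be sketched roughly as follows. This is plain Python using the built-in filter(), so it runs without Spark; the sample `lines` and the body of containsMilind() are assumptions, since the excerpt does not show them. With an RDD, the same lambda or named function would be handed to the RDD's filter() transformation instead.

```python
# Hypothetical input lines; the post's real input file is not reproduced here.
lines = [
    "Milind writes Spark tutorials",
    "Hello world",
    "Thanks, Milind",
]

# Way 1: pass an inline lambda.
with_lambda = list(filter(lambda line: "Milind" in line, lines))


# Way 2: pass a named, user-defined function.
# The body is an assumption: it keeps lines that contain the word "Milind".
def containsMilind(line):
    return "Milind" in line


with_def = list(filter(containsMilind, lines))

# With an RDD named `rdd`, the equivalent calls would be:
#     rdd.filter(lambda line: "Milind" in line)
#     rdd.filter(containsMilind)
print(with_lambda)
print(with_def)
```

Both forms select the same lines; a named function is usually easier to reuse and test once the condition grows beyond a single expression.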

Spark + Python : Union Operation

In this tutorial, we are going to see how the union operation works. In English, union means combining two things, and here we are going to do the same thing. The difference is that we are going to combine two RDDs using the union operation. We are using the same input.txt file we used in the last tutorial. …
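To make the idea of combining two RDDs concrete, here is a small sketch in plain Python, where lists stand in for RDDs. The sample lines and the two keywords are assumptions, because the contents of input.txt are not shown in this excerpt. One point worth noticing is that union on RDDs keeps duplicates, which list concatenation mimics.

```python
# Hypothetical stand-in for the lines of input.txt.
lines = [
    "spark makes big data simple",
    "python is easy to read",
    "spark has a python api",
]

# Two filtered collections, playing the role of two RDDs.
spark_lines = [line for line in lines if "spark" in line]
python_lines = [line for line in lines if "python" in line]

# List concatenation mirrors RDD union: nothing is de-duplicated.
# With RDDs named `spark_rdd` and `python_rdd`, the equivalent call would be:
#     spark_rdd.union(python_rdd)
combined = spark_lines + python_lines

for line in combined:
    print(line)
```

The line containing both keywords appears twice in the result. On an RDD, calling distinct() after union() would remove such repeats if they are not wanted.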

Spark + Python – Filter Operation

This is the first program in the Spark + Python series. In this tutorial, we are going to see the filter operation. The objective of this tutorial is to print only those lines that contain a specified keyword. For doing this, we are going to follow the steps below. But before diving into the actual operations, please look for the …

Spark + Python – Tools Setup

In this series, we are going to talk about simple concepts and basic Spark programming with the Python API. To make our development work faster and easier, we are going to use some basic tools and software. The tools that we are talking about are Notepad++ and PuTTY. We use PuTTY to connect to the …

Top 10 Twitter Trending Topics using JAVA twitter4j API

Hi friends, today we are going to find the top 10 worldwide trending topics on Twitter with the help of the twitter4j Java API. The step-by-step process is depicted in the picture below, which shows the algorithm you can use to display the top 10 worldwide trending Twitter topics. …

WordCount in Spark

Hello friends, today we are going to implement the very famous WordCount program in Spark, using spark-shell. For folks who are not familiar with WordCount: in this implementation, we count the occurrences of each word and present the result as pairs of a word and its count. For example, if my input is as …
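The counting logic described above can be sketched as follows. Note that the post itself works in spark-shell; this is only a plain-Python illustration of the usual three steps (split into words, pair each word with 1, sum per word), and the sample input line is an assumption because the post's example input is cut off in this excerpt.

```python
from collections import defaultdict

# Hypothetical input; the post's own example input is truncated in this excerpt.
lines = ["to be or not to be"]

# Step 1 (flatMap in Spark): split every line into words.
words = [word for line in lines for word in line.split()]

# Step 2 (map in Spark): pair each word with a count of 1.
pairs = [(word, 1) for word in words]

# Step 3 (reduceByKey in Spark): add up the counts for each word.
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

# With a SparkContext named `sc`, the PySpark equivalent would be roughly:
#     sc.textFile(path).flatMap(lambda l: l.split()) \
#       .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
# where `path` is a placeholder for the input file location.
for word, n in counts.items():
    print(word, n)
```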

Twitter Sentiment Analysis using OpenNLP JAVA API

Hi, everyone! Hope you are all having a great time. In this post, we are going to see Twitter sentiment analysis using Java as the programming language. We are using the OpenNLP Maven dependencies for this sentiment analysis. The Maven project file is as follows. <?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>com.mycompany</groupId> …

Load CSV File in Hive Table

We can load CSV data into a Hive table with the help of the CSV SerDe JAR file, which is freely available. You can download it manually by clicking the text below: Download CSV SERDE Jar File. Here, we are going to load two types of CSV data into a Hive table. The first type of data contains a header, i.e. …

Read Word Document using JAVA

Hi guys, the following code will enable us to read a Microsoft Word document file using a Java API. /* * To change this license header, choose License Headers in Project Properties. * To change this template file, choose Tools | Templates * and open the template in the editor. */ package; /** * * @author milind …