Post 21 | HDPCD | Specify number of reduce tasks for Pig MapReduce job

Hello everyone. Thanks for coming back for one more tutorial in this HDPCD certification series. In the last tutorial, we saw how to remove duplicate tuples from a Pig relation. In this tutorial, we are going to see how to specify the number of reduce tasks for a Pig MapReduce job. Let us get started […]

Read Excel File using MapReduce

The code below reads Excel files using the MapReduce API. The entire source code has been taken from this link.

ExcelDriver.java

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package com.milind.mr.excel;
[…]

MapReduce code for average of all numbers in a file

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package com.github.milind;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
[…]

MapReduce code for addition of numbers per line

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package com.github.milind;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
[…]

MapReduce code for addition of all numbers in a file

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package com.github.milind;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
[…]