Prerequisites for Hadoop – Part 3

Hello people,

Welcome to the third and final installment of Prerequisites for Hadoop.
This post sheds light on the following topics:

  • Apache Maven
  • Core Java
  • Java Project
  • Java Packages
  • Java Classes
  • NetBeans IDE Installation

Apache Maven

Maven is an Apache project used for project management and comprehension.
It is based on the Project Object Model (POM).
In the simplest terms, we as Java developers use the pom.xml file to declare the dependencies our project needs, and Maven downloads those jar files and puts them on the classpath for us.
Now you might be wondering why we declare dependencies at all: can't we just copy the jar files onto the classpath and be done with it?
The answer is yes, you can. But what if there are hundreds of jar files? Will you copy all of them every time you create a new project? Definitely not.

When I started working on MapReduce, I kept falling into the trap of missing one or more jar files. I found this approach tedious, and adopting Apache Maven for project configuration helped me a lot. That is why I encourage people to use Maven instead of manually copying jar files onto the classpath.
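The "missing jar" trap is easy to reproduce: if the jar containing a class is not on the classpath, the JVM simply cannot load that class. The minimal sketch below (ClasspathCheck is an illustrative name, not part of any library) asks the class loader for a few class names and reports which ones are missing. org.apache.hadoop.conf.Configuration is a real Hadoop class from the hadoop-common jar; it is reported as found only when that jar is on the classpath, for example because a Maven dependency put it there.

```java
public class ClasspathCheck {
    // Returns true if the JVM can load the named class from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.lang.String is always available; the Hadoop class is only found
        // if the hadoop-common jar is on the classpath.
        String[] required = {"java.lang.String", "org.apache.hadoop.conf.Configuration"};
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run without any Hadoop jar on the classpath, this reports java.lang.String as found and the Hadoop class as MISSING, which is exactly the kind of error that copying jars by hand makes easy to hit.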

Core Java

We mainly use Java for writing MapReduce code and UDFs in Hive or Pig, so we need to know core Java rather than its advanced parts.
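As an illustration of the kind of core Java involved, the sketch below counts the words in a line of text using only String.split and a HashMap. It deliberately uses no Hadoop classes; WordCountSketch and countWords are illustrative names. Logic like this is roughly what ends up inside a word-count mapper or a simple Hive/Pig UDF, with the Hadoop framework supplying the input lines.

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // Counts how often each whitespace-separated word occurs in one line of text.
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty or all-whitespace line splits into a single empty token
            }
            // Add 1 to the existing count, or start at 1 if the word is new.
            counts.merge(word.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("to be or not to be"));
    }
}
```

Because HashMap does not guarantee any ordering, the printed map may list the words in any order; for the sample line, "to" and "be" are each counted twice.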
The Core Java part of the Hadoop prerequisites covers the following topics:

  1. NetBeans IDE Installation
  2. Java Project
  3. Java Packages
  4. Java Classes

NetBeans IDE Installation

We are going to use the NetBeans IDE for Java and MapReduce development.
You can use the Eclipse IDE instead; the choice is entirely yours.

We are going to follow these steps to install NetBeans IDE 8.0.2 on Ubuntu 14.04.

Java Project

As already discussed, we are going to use an Apache Maven Java project.
You can create one as follows:

    1. Open the NetBeans IDE.
    2. Click File -> New Project.
       (Screenshot: New Project)
    3. Select the Maven category -> Java Application project -> click Next.
       (Screenshot: Java Maven)
    4. Give the project a suitable name -> click Finish.
       (Screenshot: Project Name)

Once you have followed these steps, you will see the newly created project in the left panel of the NetBeans IDE window, as shown in the figure below.

(Screenshot: Apache Maven Java Project)

You can see the default package that was created, the dependency jar files (if any), and pom.xml, which is used to specify the dependencies in XML format.

Java Packages

We use Java packages to separate the components of a single project.
For example, say we are writing the code for an online shopping website. Instead of keeping all the code in one package, we can create a package per department and put each department's code in it: code for the Electronics department goes in an electronics package, code for the Fashion department in a fashion package, and so on.
Packages make the project structure clear and debugging simpler. Because stack traces show each class's fully qualified name, package included, we can narrow down errors and exceptions to a specific part of the project quite easily.
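Packages also let two classes share a simple name without clashing. The JDK itself does this: java.util.Date and java.sql.Date are both called Date but live in different packages, so one program can use both. The minimal sketch below (PackageDemo is an illustrative name) prints the package of each. In your own project, the departments would instead be hypothetical packages such as com.mycompany.test_project.electronics, declared with a package statement at the top of each source file.

```java
public class PackageDemo {
    // Returns the package part of a class's fully qualified name, e.g. "java.util".
    static String packageOf(Class<?> type) {
        return type.getPackage().getName();
    }

    public static void main(String[] args) {
        // Both classes have the simple name "Date"; only their packages differ.
        Class<?>[] dates = {java.util.Date.class, java.sql.Date.class};
        for (Class<?> type : dates) {
            System.out.println(type.getSimpleName() + " lives in package " + packageOf(type));
        }
    }
}
```

Running it prints "Date lives in package java.util" followed by "Date lives in package java.sql".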

In the screenshot below, com.mycompany.test_project is the package name.

(Screenshot: Apache Maven Java Project)

Java Classes

A Java class is where we actually write our code.
If you want a class to be executable, it must include a main() method.
The signature of the main() method is public static void main(String[] args).
If your class does not have a main() method, you cannot execute that class.
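Here is a minimal sketch of a runnable class with exactly that signature. MainSignatureDemo and its describe() helper are illustrative names, not something NetBeans generates. The args array holds whatever command-line arguments are passed when the class is run.

```java
public class MainSignatureDemo {
    // Builds a short message describing the command-line arguments received.
    static String describe(String[] args) {
        return "Received " + args.length + " argument(s)";
    }

    // The JVM starts execution here: the method must be public, static, return void,
    // be named main, and take a single String[] parameter.
    public static void main(String[] args) {
        System.out.println(describe(args));
        for (String arg : args) {
            System.out.println("  " + arg);
        }
    }
}
```

Run without arguments, it prints "Received 0 argument(s)". Writing String args instead of String[] args still compiles, but the JVM no longer recognizes the method as an entry point, so the class cannot be launched.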

You can create a class with the following steps:

    1. Right-click on the project -> New -> Java Class.
       (Screenshot: Create Class)
    2. Give the class a name -> click Finish.
       (Screenshot: Class Name)
    3. NetBeans generates the class definition.
       (Screenshot: Class Definition)
    4. A class without a main() method cannot be run.
       You can see in the screenshot that the Run File option is disabled, because the class does not contain a main() method.
       (Screenshot: Class without main method)
    5. A class with a main() method can be run.
       As the screenshot shows, as soon as you add a main() method, the Run File option is enabled.
       (Screenshot: Class with main method)
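The same difference can be seen in plain code. In the minimal sketch below, the two classes are hypothetical examples: Greeter has no main() method, so on its own it cannot be run (the situation where Run File is disabled), but other code can still create it and call its methods. GreeterApp has main(), so it is the class you would run. Both are placed in one file only to keep the example self-contained; in NetBeans you would normally create each class in its own file, named after the class.

```java
// A class with the standard main() signature, so it can be executed.
public class GreeterApp {
    public static void main(String[] args) {
        // Use the first command-line argument if one was given, otherwise a default name.
        String name = args.length > 0 ? args[0] : "Hadoop";
        Greeter greeter = new Greeter(name);
        System.out.println(greeter.greet());
    }
}

// A helper class with no main() method: it cannot be run directly,
// but other classes can create it and call its methods.
class Greeter {
    private final String name;

    Greeter(String name) {
        this.name = name;
    }

    String greet() {
        return "Hello, " + name + "!";
    }
}
```

Running GreeterApp without arguments prints "Hello, Hadoop!".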

I think this much information is sufficient for the introductory part of Java for Hadoop.
I hope you enjoyed reading it.

Please do give me some feedback so that I can improve the content of this blog.

Published by milindjagre

I founded my blog four years ago and am currently working as a Data Scientist Analyst at the Ford Motor Company. I graduated from the University of Connecticut with a Master of Science in Business Analytics and Project Management. I am working hard and learning a lot of new things in the field of Data Science. I am a strong believer in constant and directional effort, with teamwork as the highest priority. Please reach out to me at for further information. Cheers!
