Read Word Document using JAVA

Hi guys,

The following code reads Microsoft Word documents (.doc and .docx) in Java using the Apache POI API.

import java.io.File;
import java.io.FileInputStream;
import java.util.List;
import org.apache.commons.io.FilenameUtils;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

/**
 * Reads .doc and .docx files with Apache POI and prints each paragraph.
 * @author milind
 */
public class MicrosoftWordDocReader {

    // Reads a legacy binary .doc file through the HWPF API.
    public static void readDocFile(String fileName) {
        File file = new File(fileName);
        // try-with-resources closes the stream even if parsing fails.
        try (FileInputStream fis = new FileInputStream(file.getAbsolutePath())) {
            HWPFDocument doc = new HWPFDocument(fis);
            WordExtractor we = new WordExtractor(doc);
            String[] paragraphs = we.getParagraphText();
            System.out.println("Total no of paragraph " + paragraphs.length);
            for (String para : paragraphs) {
                System.out.println(para);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Reads an XML-based .docx file through the XWPF API.
    public static void readDocxFile(String fileName) {
        File file = new File(fileName);
        try (FileInputStream fis = new FileInputStream(file.getAbsolutePath())) {
            XWPFDocument document = new XWPFDocument(fis);
            List<XWPFParagraph> paragraphs = document.getParagraphs();
            System.out.println("Total no of paragraph " + paragraphs.size());
            for (XWPFParagraph para : paragraphs) {
                System.out.println(para.getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Picks the reader that matches the file extension.
    public static void main(String[] args) {
        String fileName = "D:\\test.docx";
        String ext = FilenameUtils.getExtension(fileName);
        System.out.println("extension : " + ext);
        if ("docx".equalsIgnoreCase(ext)) {
            readDocxFile(fileName);
        } else if ("doc".equalsIgnoreCase(ext)) {
            readDocFile(fileName);
        } else {
            System.out.println("INVALID FILE TYPE. ONLY .doc and .docx are permitted.");
        }
    }
}
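The extension check in main decides which reader runs, so it is worth seeing in isolation. Below is a minimal sketch of that dispatch using only the JDK, in case you prefer not to depend on Commons IO for a single call. The class WordFileType and its methods extensionOf and readerFor are hypothetical names made up for this illustration; they are not part of the code above or of any library.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Apache POI or Commons IO.
class WordFileType {

    // Returns the lower-cased text after the last dot of the file name,
    // or "" when the last path segment contains no dot.
    static String extensionOf(String fileName) {
        // Dots that appear in directory names must not count as an extension.
        int lastSeparator = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        int lastDot = fileName.lastIndexOf('.');
        if (lastDot <= lastSeparator) {
            return "";
        }
        return fileName.substring(lastDot + 1).toLowerCase(Locale.ROOT);
    }

    // Maps a file name to the name of the reader method that should handle it.
    static String readerFor(String fileName) {
        switch (extensionOf(fileName)) {
            case "docx":
                return "readDocxFile";
            case "doc":
                return "readDocFile";
            default:
                return "unsupported";
        }
    }

    public static void main(String[] args) {
        System.out.println(readerFor("D:\\test.docx"));
        System.out.println(readerFor("report.DOC"));
        System.out.println(readerFor("notes.txt"));
    }
}
```

Lower-casing the extension once keeps the comparison case-insensitive, which matches the equalsIgnoreCase checks in the original main method.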

The following is the pom.xml content.

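<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    ...
</project>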
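For reference, the imports in the Java code come from three Maven artifacts: the HWPF classes (.doc) ship in poi-scratchpad, the XWPF classes (.docx) in poi-ooxml, and FilenameUtils in commons-io. A dependencies section along these lines should be enough to compile it. Treat it as a sketch: the version numbers below are my assumptions, not the ones from the original pom.xml, so use whichever releases suit your project.

```xml
<dependencies>
    <!-- HWPFDocument and WordExtractor for .doc files -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>3.11</version> <!-- assumed version -->
    </dependency>
    <!-- XWPFDocument and XWPFParagraph for .docx files -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>3.11</version> <!-- assumed version -->
    </dependency>
    <!-- FilenameUtils.getExtension -->
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version> <!-- assumed version -->
    </dependency>
</dependencies>
```

Both POI artifacts pull in the core poi artifact transitively, so it does not need its own entry.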


The following is the content of the input Word file.

Input File
It is input word file


The following is the output of the code.

Output Screenshot
Code Output


Thanks for reading.

Please leave your questions in the comments below.

Published by milindjagre

I founded my blog four years ago and am currently working as a Data Scientist Analyst at the Ford Motor Company. I graduated from the University of Connecticut with a Master of Science in Business Analytics and Project Management. I am working hard and learning a lot of new things in the field of Data Science. I am a strong believer in constant, directed effort, with teamwork as the highest priority. Please reach out to me for further information. Cheers!

2 thoughts on “Read Word Document using JAVA”

    1. Did you mean a specific page in the Word document?
      If yes, I have not explored page-specific data extraction.

      Please let me know if you find a solution for this.
      Appreciate it.
