Read Word Document using JAVA

The following code reads a Microsoft Word document (.doc or .docx) in Java using the Apache POI API.

import java.io.File;
import java.io.FileInputStream;
import java.util.List;
import org.apache.commons.io.FilenameUtils;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

/**
 * Prints the paragraphs of a .doc or .docx file using Apache POI.
 *
 * @author milind
 */
public class MicrosoftWordDocReader {

    // Reads a legacy binary .doc file through the HWPF classes.
    public static void readDocFile(String fileName) {
        File file = new File(fileName);
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream fis = new FileInputStream(file.getAbsolutePath())) {
            HWPFDocument doc = new HWPFDocument(fis);
            WordExtractor we = new WordExtractor(doc);
            String[] paragraphs = we.getParagraphText();
            System.out.println("Total no of paragraph " + paragraphs.length);
            for (String para : paragraphs) {
                System.out.println(para);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Reads an Office Open XML .docx file through the XWPF classes.
    public static void readDocxFile(String fileName) {
        File file = new File(fileName);
        try (FileInputStream fis = new FileInputStream(file.getAbsolutePath())) {
            XWPFDocument document = new XWPFDocument(fis);
            List<XWPFParagraph> paragraphs = document.getParagraphs();
            System.out.println("Total no of paragraph " + paragraphs.size());
            for (XWPFParagraph para : paragraphs) {
                System.out.println(para.getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String fileName = "D:\\test.docx";
        // Choose the reader from the file extension.
        String ext = FilenameUtils.getExtension(fileName);
        System.out.println("extension : " + ext);
        if ("docx".equalsIgnoreCase(ext)) {
            readDocxFile(fileName);
        } else if ("doc".equalsIgnoreCase(ext)) {
            readDocFile(fileName);
        } else {
            System.out.println("INVALID FILE TYPE. ONLY .doc and .docx are permitted.");
        }
    }
}
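
The dispatch in main depends only on the file extension. As a minimal sketch of that step without the commons-io dependency, the class below extracts the extension with plain JDK string methods and maps it to a reader type. The names WordFileType, Kind, extensionOf and kindOf are hypothetical; they are not part of the original code or of Apache POI. The helper is meant to resemble FilenameUtils.getExtension for simple paths, not to replicate it exactly.

```java
import java.util.Locale;

public class WordFileType {

    /** Hypothetical labels for the reader that main would call. */
    public enum Kind { DOC, DOCX, UNSUPPORTED }

    /** Returns the text after the last dot of the file name, or "" if there is none. */
    public static String extensionOf(String path) {
        int lastSeparator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int lastDot = path.lastIndexOf('.');
        // A dot that appears before the last separator belongs to a directory name.
        return lastDot > lastSeparator ? path.substring(lastDot + 1) : "";
    }

    /** Maps the extension, ignoring case, to the kind of Word file it suggests. */
    public static Kind kindOf(String path) {
        String ext = extensionOf(path).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "docx": return Kind.DOCX;
            case "doc":  return Kind.DOC;
            default:     return Kind.UNSUPPORTED;
        }
    }
}
```

With this helper, WordFileType.kindOf("D:\\test.docx") yields Kind.DOCX, and main could call readDocxFile or readDocFile accordingly. The extension is only a hint: a renamed file would still be sent to the wrong reader.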

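The pom.xml contents are shown below. The code relies on the Apache POI artifacts poi-scratchpad (HWPF, for .doc) and poi-ooxml (XWPF, for .docx), plus commons-io for FilenameUtils.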

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="" xmlns:xsi="" xsi:schemaLocation="">

The contents of the input Word file are shown below.

[Image: input Word file]


The output of the code is shown below.

[Image: screenshot of the code output]


I founded my blog four years ago and currently work as a Data Scientist Analyst at the Ford Motor Company. I graduated from the University of Connecticut with a Master of Science in Business Analytics and Project Management.

2 thoughts on “Read Word Document using JAVA”

    1. Did you mean a specific page in the Word document?
      If yes, I have not explored page-specific data extraction.

      Please let me know if you found the solution for this.
      Appreciate it.

