DOM and SAX Parser in Java

How to write a Java parser?

Overview:

A parser is an important component in many software systems. Multiple open source parsers are available, so the developer has to select the right parser for the requirement. In many situations, however, a suitable parser is not freely available, and developers have to write their own custom parsers in languages like Java or C++. Reasons for developing a custom parser include performance issues, complexity, flaws in the available parsers, or a mismatch with the requirement.

In this article we will try to explore how parsing is performed in Java and we will also have a look at different popular Java parsers.

What is parsing and what is a parser?

Before going into details, we must know the meaning of the terms ‘parsing’ and ‘parser’. Let’s have a look.

In simple words, parsing is a mechanism for breaking a block of data into smaller pieces based on a pre-defined set of rules, and then interpreting, modifying or managing those pieces as required.

A parser is a software program that breaks the data into smaller chunks. A parser can be written in any language, depending on the requirement.
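
As a small illustration of this definition, the sketch below breaks a line of text into key/value pieces using a simple rule: fields are separated by ';' and each key is separated from its value by '='. The class name KeyValueParserSketch, the method parse and the input format are assumptions made for this example only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueParserSketch {

    // Breaks input such as "name=Kaushik; salary=85000" into key/value pairs.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : input.split(";")) {
            String trimmed = field.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty pieces such as a trailing ';'
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + trimmed);
            }
            String key = trimmed.substring(0, eq).trim();
            String value = trimmed.substring(eq + 1).trim();
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("name=Kaushik; salary=85000"));
    }
}
```

The rule set here is deliberately tiny. A real text parser would also have to decide how to handle quoting, escaping and duplicate keys.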

What are the different types of parsers in Java?

Parsers can be categorized in different ways. The simplest split is between sequential and random-access parsers. A sequential parser exposes only the data it is currently parsing and moves in one direction, so it cannot go back to earlier data or jump ahead. A random-access parser keeps the parsed data available, so moving back and forth is possible. SAX and StAX parsers are examples of sequential parsers, and the XML DOM parser is an example of a random-access parser.
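
To make the sequential model concrete, here is a minimal sketch using the StAX cursor API (javax.xml.stream) that ships with the JDK. It collects element names in document order. The class name StaxCursorSketch, the method elementNames and the sample XML string are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxCursorSketch {

    // Collects element names in document order using a forward-only cursor.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // next() moves the cursor forward; the API has no way to move it back.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        names.add(reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<company><employee><firstname>Kaushik</firstname></employee></company>";
        System.out.println(elementNames(xml));
    }
}
```

Anything the program wants to keep has to be copied out while the cursor is on it, which is why the names are stored in a list here.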

Parsers can also be classified by the kind of input they handle, for example text parsers and XML parsers. A text parser parses plain textual data, whereas an XML parser parses XML data (other structured formats such as JSON have their own dedicated parsers). In our discussion we will focus on the popular Java DOM and SAX parsers and their examples.

DOM parser and SAX parser

DOM (Document Object Model) defines an interface which can be used to manipulate XML documents. XML parsers are written by implementing this interface. DOM parsers are random-access parsers, which are suitable when:

Information about the structure of the document is important
You need to move back and forth within the structure

A DOM parser provides several Java interfaces and methods to work with the XML data. It returns a tree structure of all the elements in an XML document, and the tree can be traversed to work with the data.

SAX (Simple API for XML) is a sequential, event-based parser. It parses the XML data in a sequential manner, starting from the root and continuing to the end. It does not build a tree structure; instead it sends an event notification as it parses each element. SAX is suitable when:
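
The tree traversal mentioned above can be sketched with a short recursive walk over the DOM tree. The class name DomTreeWalkSketch and the methods outline and walk are hypothetical names for this example, and the XML is passed in as a string only to keep the sketch self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomTreeWalkSketch {

    // Parses the XML string and returns an indented outline of its element tree.
    static String outline(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Appends the node name, then visits each child element one level deeper.
    private static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName()).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk(child, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
                "<company><employee><firstname>Kaushik</firstname></employee></company>"));
    }
}
```

Because the whole tree is in memory, the walk could just as easily visit a node's parent or siblings, which is the back-and-forth access a sequential parser cannot offer.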

Linear and sequential processing is required
The XML document is very large
The XML does not contain complex nesting
Only part of the XML document needs to be processed

A SAX parser provides interfaces with call-back methods through which the application receives event notifications during parsing.
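
The call-back model can be sketched as follows, using the JDK's SAXParser and DefaultHandler. The handler collects the text of every 'firstname' element. The class names SaxFirstNamesSketch and FirstNameHandler, and the choice of the 'firstname' element, are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxFirstNamesSketch {

    // The parser invokes these call-back methods as it reads through the document.
    static class FirstNameHandler extends DefaultHandler {
        final List<String> firstNames = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inFirstName;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("firstname".equals(qName)) {
                inFirstName = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inFirstName) {
                text.append(ch, start, length); // text may arrive in several chunks
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("firstname".equals(qName)) {
                firstNames.add(text.toString().trim());
                inFirstName = false;
            }
        }
    }

    // Runs the SAX parser over the XML string and returns the collected first names.
    static List<String> firstNames(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            FirstNameHandler handler = new FirstNameHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.firstNames;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<company>"
                + "<employee><firstname>Kaushik</firstname></employee>"
                + "<employee><firstname>Thomas</firstname></employee>"
                + "</company>";
        System.out.println(firstNames(xml));
    }
}
```

The handler has to track its own state (the inFirstName flag) because SAX does not remember where it is in the document. That is the price paid for not building a tree.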

How to implement a DOM parser in Java?

In this section we will work with an XML document and a DOM parser. The following sample XML file contains employee data of a company and is the input file to the parser. The root element is ‘company’, at the top of the document. Below it, each ‘employee’ element is a branch that carries an ‘empid’ attribute and child elements holding employee data such as name and salary. Parsing starts from the root element.

Listing 1: Sample XML document for processing

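[code]

<?xml version="1.0"?>
<company>
    <employee empid="3931">
        <firstname>Kaushik</firstname>
        <lastname>Pal</lastname>
        <nickname>Kaushik</nickname>
        <salary>85000</salary>
    </employee>
    <employee empid="4932">
        <firstname>Thomas</firstname>
        <lastname>saparoff</lastname>
        <nickname>Thomas</nickname>
        <salary>95000</salary>
    </employee>
    <employee empid="5935">
        <firstname>Nick</firstname>
        <lastname>Doe</lastname>
        <nickname>Nick</nickname>
        <salary>90000</salary>
    </employee>
</company>

[/code]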

Now let us create a Java parser using the DOM parsing model. The program follows these steps to extract the data.

In the import section get all the XML related packages
Access input data file and create document builder
Extract the root element
Create a node list containing the ‘employee’ nodes
Iterate through the node list and extract values

Listing 2: Implementing DOM parser

[code]

//Create a package
package com.eduonix.xml;

//Import all the packages
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

//Write a public parser class
public class TestDomParser {

    public static void main(String[] args) {
        try {
            //Access input data file and create document builder
            File inputDataFile = new File("InputData.txt");
            DocumentBuilderFactory dbldrFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = dbldrFactory.newDocumentBuilder();
            Document docmt = docBuilder.parse(inputDataFile);

            //Merge adjacent text nodes and drop empty ones
            docmt.getDocumentElement().normalize();

            //Extract the root element
            System.out.println("Name of the Root element:"
                    + docmt.getDocumentElement().getNodeName());

            //Create node list
            NodeList ndList = docmt.getElementsByTagName("employee");
            System.out.println("*****************************");

            //Iterate through the node list and extract values
            for (int tempval = 0; tempval < ndList.getLength(); tempval++) {
                Node nd = ndList.item(tempval);
                System.out.println("\n Name of the current element :"
                        + nd.getNodeName());

                if (nd.getNodeType() == Node.ELEMENT_NODE) {
                    Element elemnt = (Element) nd;
                    System.out.println("Employee ID : "
                            + elemnt.getAttribute("empid"));
                    System.out.println("Employee First Name: "
                            + elemnt.getElementsByTagName("firstname").item(0).getTextContent());
                    System.out.println("Employee Last Name: "
                            + elemnt.getElementsByTagName("lastname").item(0).getTextContent());
                    System.out.println("Employee Nick Name: "
                            + elemnt.getElementsByTagName("nickname").item(0).getTextContent());
                    System.out.println("Employee Salary: "
                            + elemnt.getElementsByTagName("salary").item(0).getTextContent());
                }
            }
        } catch (Exception e) {
            //Catch and print exception - if any
            e.printStackTrace();
        }
    }
}

[/code]

Now compile and run the Java program, with the XML document saved as InputData.txt in the directory from which the program is run. The output of the application is shown below. It lists all the employee data found in the XML file.

Compiling the source code….

$javac com/eduonix/xml/TestDomParser.java 2>&1

Executing the program….

$java -Xmx128M -Xms16M com/eduonix/xml/TestDomParser

Name of the Root element:company
*****************************************************
Name of the current element: employee
Employee ID: 3931
Employee First Name: Kaushik
Employee Last Name: Pal
Employee Nick Name: Kaushik
Employee Salary: 85000

Name of the current element: employee
Employee ID: 4932
Employee First Name: Thomas
Employee Last Name: saparoff
Employee Nick Name: Thomas
Employee Salary: 95000

Name of the current element: employee
Employee ID: 5935
Employee First Name: Nick
Employee Last Name: Doe
Employee Nick Name: Nick
Employee Salary: 90000

Parser best practices

For parsers, best practices depend on the situation and the requirements. A text parser is suitable when you are parsing text input, tokenizing or splitting it, and then making use of the data. XML parsers are suitable when you receive XML data as input. The following are some best practice rules for XML parsing.

A DOM parser is the best fit when the document is small (as a rule of thumb, under about 1000 elements) and you need to add or delete elements. Because DOM builds the complete tree structure in memory before processing starts, memory use and performance are important considerations. So, for partial manipulation of an XML document, DOM is not recommended.
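
The add/delete case can be sketched as below: every 'nickname' element is removed and a new child element is appended to each 'employee'. The class name DomEditSketch, the method edit and the 'department' element are hypothetical and are not part of the sample document used earlier.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomEditSketch {

    // Removes every <nickname> element and appends a <department> child to each <employee>.
    static Document edit(String xml, String department) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // The node list is live: it shrinks as nodes are removed,
            // so keep removing the first item until none are left.
            NodeList nicknames = doc.getElementsByTagName("nickname");
            while (nicknames.getLength() > 0) {
                Node nickname = nicknames.item(0);
                nickname.getParentNode().removeChild(nickname);
            }

            NodeList employees = doc.getElementsByTagName("employee");
            for (int i = 0; i < employees.getLength(); i++) {
                Element dept = doc.createElement("department"); // hypothetical element
                dept.setTextContent(department);
                employees.item(i).appendChild(dept);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not edit XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = edit(
                "<company><employee><firstname>Nick</firstname>"
                        + "<nickname>Nick</nickname></employee></company>",
                "Sales");
        System.out.println("nickname elements left: "
                + doc.getElementsByTagName("nickname").getLength());
        System.out.println("department elements added: "
                + doc.getElementsByTagName("department").getLength());
    }
}
```

Edits like these rely on the whole tree being held in memory, which is the reason DOM is recommended here for smaller documents.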

SAX is the best fit for large XML files with a linear structure and unique elements. It is lightweight and suitable for parsing shallow XML documents. As it does not build a tree structure, it uses less memory and generally performs better than a DOM parser.

Conclusion:

Parsing is an integral part of many software systems, and Java provides its own APIs for parsing text and XML data. In this article we covered parsing as a generic concept and then looked at specific parsers, namely DOM and SAX. In the example section we walked through a DOM parser and its implementation details, and we closed with some commonly followed best practices.

Tagged on: DOM, Java, Java Parser, Parser, Parsing, SAX

TechAlpine – All About Technology

www.techalpine.com
