
How to Query XML using XPath

Author : Raja
Topic : xml
Date : Thu Mar 12th, 2009

1) Introduction

XML documents are just text until a component called a parser reads them and extracts the meaningful data. Two of the most popular parsing APIs are the Simple API for XML (SAX) and the Document Object Model (DOM), and each has its own advantages and disadvantages. XPath is a simple query language for selecting data from an XML document, and it is a standard specification from the W3C.

2) XPath - A Query Language for XML

Let us see how XPath can be used to query the various pieces of data in an XML document. Consider the following simple XML file,

employees.xml:


	<employees>
	    <employee id = "001">
	        <name>Johny</name>
	    </employee>
	    <employee id = "002">
	        <name>Williams</name>
	    </employee>
	</employees>

The above XML file represents a collection of employees, each represented by an <employee> tag. The set of <employee> tags shares a common root tag, <employees>. In this article the terms tag and element are used interchangeably; node is the more general term, and an element is one kind of node. An XML document is nothing but a collection of properly nested, well-formed tags, and it can contain a mixture of the commonly used kinds of nodes, such as Element, Attribute and Text.

For example, in the above employees.xml, <employees>, <employee> and <name> are examples of 'Elements'. An 'Attribute' represents a property of an element, and in our example XML document it is the 'id' attribute of the <employee> element. A 'Text' node in an XML document represents textual content. For example, 'Johny' and 'Williams' are 'Text' nodes.
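The three kinds of nodes described above can be inspected with the DOM API. The following is a minimal sketch, assuming a one-employee copy of the document held in a string (with no whitespace between tags, so that the first child of each element is the node we expect). The class name NodeKinds and its helper methods are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeKinds {

    // Inline one-employee document (assumption: no whitespace between tags).
    static final String XML =
        "<employees><employee id=\"001\"><name>Johny</name></employee></employees>";

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    // Maps the DOM node-type constant to a readable label.
    static String kindOf(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:   return "Element";
            case Node.ATTRIBUTE_NODE: return "Attribute";
            case Node.TEXT_NODE:      return "Text";
            default:                  return "Other";
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element employee = (Element) doc.getDocumentElement().getFirstChild();
        Node id = employee.getAttributeNode("id");
        Node name = employee.getFirstChild();
        Node text = name.getFirstChild();

        System.out.println("<" + employee.getNodeName() + "> : " + kindOf(employee));
        System.out.println("@" + id.getNodeName() + " : " + kindOf(id));
        System.out.println("<" + name.getNodeName() + "> : " + kindOf(name));
        System.out.println("'" + text.getNodeValue() + "' : " + kindOf(text));
    }
}
```

Note that in a pretty-printed file the whitespace between tags is itself a Text node, which is why the sketch uses a compact string.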

XPath uses simple expressions to query or select a portion of information from an XML document. For instance, if we want to get the name of the first employee, then we can frame an expression like this,


	/employees/employee[1]/name

The above expression can be interpreted like this: starting from the root of the XML document (which is represented by '/'), descend to the <employees> element, then to the first <employee> element, represented by employee[1], and then retrieve the value of the <name> element. As seen, the XML document is traversed hierarchically to retrieve the information. '/' represents the root of the document, and multiple elements having the same name can be accessed using an array-like notation in square brackets. Unlike Java arrays, the index starts with 1, so employee[1] is the first employee, employee[2] is the second, and so on. If we want to select an attribute, then the '@' sign has to be prefixed to the attribute name. For example, if we wish to query for the 'id' value of the second employee, then the following XPath expression will do just that,


	/employees/employee[2]/@id
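To see both expressions at work, here is a minimal sketch that evaluates them with the JDK's javax.xml.xpath API against an inline stand-in for employees.xml. The class name EmployeeQueries, the query() helper and the second employee's id value "002" are assumptions made for this illustration. The last line shows that position 0 selects nothing, since XPath positions start at 1.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class EmployeeQueries {

    // Inline stand-in for employees.xml; the second id ("002") is an assumed value.
    static final String XML =
          "<employees>"
        + "<employee id=\"001\"><name>Johny</name></employee>"
        + "<employee id=\"002\"><name>Williams</name></employee>"
        + "</employees>";

    // Evaluates one XPath expression against the inline document
    // and returns the string value of the result.
    static String query(String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            return xPath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException("Bad XPath: " + expression, ex);
        }
    }

    public static void main(String[] args) {
        // Name of the first employee.
        System.out.println(query("/employees/employee[1]/name"));
        // 'id' attribute of the second employee.
        System.out.println(query("/employees/employee[2]/@id"));
        // There is no position 0, so the result is an empty string.
        System.out.println(query("/employees/employee[0]/name").isEmpty());
    }
}
```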

3) Java and XPath

An easy-to-use Java XPath API is available for accessing XML data. The API is part of the standard JDK distribution, in the javax.xml.xpath package. All we have to do is use the XPathFactory, XPath and XPathExpression classes and interfaces.

The XPathFactory class follows the standard Factory pattern to create XPath objects. An XPath object provides an environment to compile expressions, and a compiled expression is encapsulated by an XPathExpression. The compiled XPathExpression can then be evaluated to get the desired results. Following is the code snippet,


// Get an instance of the XPathFactory itself.
XPathFactory xPathFactory = XPathFactory.newInstance();
// Create an instance of XPath from the factory.
XPath xPath = xPathFactory.newXPath();
// Compile the expression to get an XPathExpression object.
String expression = "SomeXPathExpression";
XPathExpression xPathExpression = xPath.compile(expression);
// Evaluate the expression against the XML document to get the result.
// xmlDocument is an already parsed org.w3c.dom.Document.
Object result = xPathExpression.evaluate(xmlDocument);
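The snippet above leaves out how the document is obtained. The following self-contained sketch fills that gap under some assumptions: the XML comes from strings rather than files, and the class name CompiledExpressionDemo and its helpers are hypothetical. It also illustrates one reason to compile an expression, namely that the same XPathExpression can be evaluated against several documents.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CompiledExpressionDemo {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    // Compiles the expression once, then evaluates it against each document in turn.
    static String[] firstNames(String... xmlDocuments) {
        try {
            XPathExpression firstName = XPathFactory.newInstance()
                    .newXPath()
                    .compile("/employees/employee[1]/name");
            String[] names = new String[xmlDocuments.length];
            for (int i = 0; i < xmlDocuments.length; i++) {
                // evaluate(Object) returns the string value of the result.
                names[i] = firstName.evaluate(parse(xmlDocuments[i]));
            }
            return names;
        } catch (XPathExpressionException ex) {
            throw new IllegalStateException("Evaluation failed", ex);
        }
    }

    public static void main(String[] args) {
        String[] names = firstNames(
            "<employees><employee><name>Johny</name></employee></employees>",
            "<employees><employee><name>Williams</name></employee></employees>");
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```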

4) Sample Application

The following section provides a sample application to demonstrate the usage of XPath in Java applications. The sample application selects the value of an attribute, the value of an element, and a node-set (here, an element that contains several child elements) by compiling and evaluating different expressions.

4.1) projects.xml

Here is an XML file called 'projects.xml' which contains structured information for various projects. The <project> element has an attribute called 'id' and nested elements <name>, <start-date> and <end-date>. The structure of the XML file is given below.


	<?xml version="1.0" encoding="UTF-8"?>
	<projects>
	    <project id = "BP001">
	        <name>Banking Project</name>
	        <start-date>Jan 10 1999</start-date>
	        <end-date>Jan 10 2003</end-date>
	    </project>
	    <project id = "TP001">
	        <name>Telecommunication Project</name>
	        <start-date>March 20 1999</start-date>
	        <end-date>July 30 2004</end-date>
	    </project>
	    <project id = "PP001">
	        <name>Portal Project</name>
	        <start-date>Dec 10 1998</start-date>
	        <end-date>March 10 2006</end-date>
	    </project>
	</projects>

4.2) XPathReader.java

Now, let us write a simple Java class which acts as a reader for the various pieces of information in the XML document. Following is the Java source that parses the XML document and evaluates XPath expressions against it.


	package com.javabeat.tips.xpath;

	import java.io.IOException;
	import javax.xml.namespace.QName;
	import javax.xml.parsers.*;
	import javax.xml.xpath.*;
	import org.w3c.dom.Document;
	import org.xml.sax.SAXException;

	public class XPathReader {

	    private final String xmlFile;
	    private Document xmlDocument;
	    private XPath xPath;

	    public XPathReader(String xmlFile) {
	        this.xmlFile = xmlFile;
	        initObjects();
	    }

	    // Parses the XML file into a DOM Document and creates the XPath object.
	    private void initObjects() {
	        try {
	            xmlDocument = DocumentBuilderFactory.newInstance()
	                    .newDocumentBuilder()
	                    .parse(xmlFile);
	            xPath = XPathFactory.newInstance().newXPath();
	        } catch (IOException ex) {
	            ex.printStackTrace();
	        } catch (SAXException ex) {
	            ex.printStackTrace();
	        } catch (ParserConfigurationException ex) {
	            ex.printStackTrace();
	        }
	    }

	    // Compiles the expression and evaluates it against the document.
	    // Returns the result as the requested type, or null if the
	    // expression cannot be compiled or evaluated.
	    public Object read(String expression, QName returnType) {
	        try {
	            XPathExpression xPathExpression = xPath.compile(expression);
	            return xPathExpression.evaluate(xmlDocument, returnType);
	        } catch (XPathExpressionException ex) {
	            ex.printStackTrace();
	            return null;
	        }
	    }
	}

The constructor of this class is passed the XML file from which the information has to be read. The method initObjects() is called immediately, and it initializes the XML Document and the XPath objects. A Document representation of the XML file is created by calling the DocumentBuilder.parse() method. Then, a new XPath object is created by calling the XPathFactory.newXPath() method.

Client applications can then call the XPathReader.read() method, passing the expression to be evaluated and the return type of the expression. The return type is given as a QName, which in XML terms stands for Qualified Name. The return types supported by the API are String, Number, Boolean, Node and NodeSet, which are represented as constants in XPathConstants, namely XPathConstants.STRING, XPathConstants.NUMBER, XPathConstants.BOOLEAN, XPathConstants.NODE and XPathConstants.NODESET. Hence, the return type requested when evaluating an expression should be one of these. Within the read() method, the expression is compiled using the XPath.compile() method, which returns an XPathExpression, and the compiled expression is evaluated using the XPathExpression.evaluate() method.
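Each XPathConstants value determines the Java type of the object that comes back. The sketch below requests all five types and casts the results accordingly. It assumes a compact inline stand-in for projects.xml (ids and names only); the class name ReturnTypesDemo is made up for this illustration, and the count() function and the '=' comparison are standard XPath 1.0 features that the article itself does not cover.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReturnTypesDemo {

    // Compact inline stand-in for projects.xml (ids and names only).
    static final String XML =
          "<projects>"
        + "<project id=\"BP001\"><name>Banking Project</name></project>"
        + "<project id=\"TP001\"><name>Telecommunication Project</name></project>"
        + "<project id=\"PP001\"><name>Portal Project</name></project>"
        + "</projects>";

    // Evaluates the expression and returns the result as the requested type.
    static Object read(String expression, QName returnType) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(XML)), returnType);
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException("Bad XPath: " + expression, ex);
        }
    }

    public static void main(String[] args) {
        // STRING -> java.lang.String
        String id = (String) read("/projects/project[1]/@id", XPathConstants.STRING);
        // NUMBER -> java.lang.Double
        Double count = (Double) read("count(/projects/project)", XPathConstants.NUMBER);
        // BOOLEAN -> java.lang.Boolean
        Boolean hasPortal = (Boolean) read(
                "/projects/project/name = 'Portal Project'", XPathConstants.BOOLEAN);
        // NODE -> org.w3c.dom.Node
        Node second = (Node) read("/projects/project[2]", XPathConstants.NODE);
        // NODESET -> org.w3c.dom.NodeList
        NodeList names = (NodeList) read("/projects/project/name", XPathConstants.NODESET);

        System.out.println(id);
        System.out.println(count);
        System.out.println(hasPortal);
        System.out.println(second.getTextContent());
        System.out.println(names.getLength());
    }
}
```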

4.3) XPathReaderTest.java


	package com.javabeat.tips.xpath;

	import javax.xml.xpath.XPathConstants;
	import org.w3c.dom.*;

	public class XPathReaderTest {

	    public static void main(String[] args) {

	        XPathReader reader = new XPathReader(
	                "src\\com\\javabeat\\tips\\xpath\\projects.xml");

	        // To get an XML attribute.
	        String expression = "/projects/project[1]/@id";
	        System.out.println(reader.read(expression,
	                XPathConstants.STRING) + "\n");

	        // To get a child element's value.
	        expression = "/projects/project[2]/name";
	        System.out.println(reader.read(expression,
	                XPathConstants.STRING) + "\n");

	        // To get an entire node as a node-set.
	        expression = "/projects/project[3]";
	        NodeList thirdProject = (NodeList) reader.read(expression,
	                XPathConstants.NODESET);
	        traverse(thirdProject);
	    }

	    // Recursively prints the name and text content of every element
	    // in the list that has at least one child node.
	    private static void traverse(NodeList rootNode) {
	        for (int index = 0; index < rootNode.getLength(); index++) {
	            Node aNode = rootNode.item(index);
	            if (aNode.getNodeType() == Node.ELEMENT_NODE) {
	                NodeList childNodes = aNode.getChildNodes();
	                if (childNodes.getLength() > 0) {
	                    System.out.println("Node Name-->" +
	                            aNode.getNodeName() +
	                            " , Node Value-->" +
	                            aNode.getTextContent());
	                }
	                traverse(childNodes);
	            }
	        }
	    }
	}

This test application creates an instance of the XPathReader class and then calls the XPathReader.read() method with different expressions and return types. As we see, the third expression retrieves a node-set by passing the return type as XPathConstants.NODESET. Since a node-set contains a collection of nodes, which in turn can contain other nodes, the node-set is traversed recursively, and the name and the value of each element are obtained by calling the Node.getNodeName() and Node.getTextContent() methods. The following would be the expected output for the above sample client application.

Output for the above program


	BP001
	Telecommunication Project
	Node Name-->project , Node Value-->        
	        Portal Project
	        Dec 10 1998
	        March 10 2006
	    
	Node Name-->name , Node Value-->Portal Project
	Node Name-->start-date , Node Value-->Dec 10 1998
	Node Name-->end-date , Node Value-->March 10 2006
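As the output shows, getTextContent() on the <project> element returns the text of all its descendants, whitespace included. If only the child elements are of interest, one alternative (a sketch, not the approach used in the sample above) is to select them directly with the XPath wildcard '*' and loop over the result without recursion. The class name ChildElementsDemo, the describeChildren() helper and the inline copy of projects.xml are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildElementsDemo {

    // Compact inline copy of projects.xml.
    static final String XML =
          "<projects>"
        + "<project id=\"BP001\"><name>Banking Project</name>"
        + "<start-date>Jan 10 1999</start-date><end-date>Jan 10 2003</end-date></project>"
        + "<project id=\"TP001\"><name>Telecommunication Project</name>"
        + "<start-date>March 20 1999</start-date><end-date>July 30 2004</end-date></project>"
        + "<project id=\"PP001\"><name>Portal Project</name>"
        + "<start-date>Dec 10 1998</start-date><end-date>March 10 2006</end-date></project>"
        + "</projects>";

    // Evaluates the expression as a node-set and returns one
    // "name = value" line per selected node.
    static List<String> describeChildren(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(XML)),
                            XPathConstants.NODESET);
            List<String> lines = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                lines.add(node.getNodeName() + " = " + node.getTextContent());
            }
            return lines;
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException("Bad XPath: " + expression, ex);
        }
    }

    public static void main(String[] args) {
        // '*' selects every child element of the third project.
        for (String line : describeChildren("/projects/project[3]/*")) {
            System.out.println(line);
        }
    }
}
```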
