1) Introduction
XML documents are of little use until a component called a parser reads them and extracts the meaningful data. Two of the most popular XML parsing APIs are the Simple API for XML (SAX) and the Document Object Model (DOM), and each has its own advantages and disadvantages for parsing XML documents.
XPath is a simple query language for selecting data from an XML document, and it is a standard specification (a Recommendation) from the W3C.
2) XPath - A Query Language for XML
Let us see how XPath can be used to query the various pieces of data in an XML document. Consider the following simple XML file, employees.xml:
<employees>
<employee id = "001">
<name>Johny</name>
</employee>
<employee>
<name>Williams</name>
</employee>
</employees>
The above XML file represents a collection of employees, each represented by an <employee> element. All the <employee> elements share a common root element, <employees>. The terms tag, element and node are often used loosely as if they meant the same thing, but strictly speaking a tag is the markup (such as <name> or </name>) that delimits an element, and an element is one of several kinds of node. An XML document is a well-formed, hierarchical collection of such nodes; the most commonly used kinds are elements, attributes and text.
For example, in the above employees.xml, <employees>, <employee> and <name> are elements. An attribute represents a property of an element; in our example document it is the 'id' attribute of the first <employee> element. A text node represents any textual content; for example, 'Johny' and 'Williams' are text nodes.
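To make the three node kinds concrete, here is a minimal sketch that uses only the standard DOM API (javax.xml.parsers and org.w3c.dom) to pick one element, one attribute and one text node out of employees.xml and print each node's type. The class name NodeKindsDemo and the inlined copy of the XML (written without whitespace between elements) are assumptions made so the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeKindsDemo {

    // Hypothetical inlined copy of employees.xml, so no file is needed.
    static final String EMPLOYEES_XML =
            "<employees>"
            + "<employee id=\"001\"><name>Johny</name></employee>"
            + "<employee><name>Williams</name></employee>"
            + "</employees>";

    // Parses an XML string into a DOM Document; checked exceptions are
    // wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(EMPLOYEES_XML);
        Element root = doc.getDocumentElement();                 // <employees>
        Element first = (Element) root.getElementsByTagName("employee").item(0);
        Attr id = first.getAttributeNode("id");                  // id="001"
        Element name = (Element) first.getElementsByTagName("name").item(0);
        Node text = name.getFirstChild();                        // the text "Johny"

        // Node.ELEMENT_NODE is 1, Node.ATTRIBUTE_NODE is 2, Node.TEXT_NODE is 3.
        System.out.println(root.getNodeName() + " -> type " + root.getNodeType());
        System.out.println("@" + id.getName() + " -> type " + id.getNodeType()
                + ", value " + id.getValue());
        System.out.println("text -> type " + text.getNodeType()
                + ", value " + text.getNodeValue());
    }
}
```

Note that the text 'Johny' is not the <name> element itself but a separate text node that is the element's child; XPath exposes the same distinction through its text() node test.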
XPath uses simple expressions to query or select a portion of information from an XML document. For instance, if we want to get the name of the first employee, we can write an expression like this:
/employees/employee[1]/name
The above expression can be interpreted as follows. Starting from the root of the XML document (represented by the leading '/'), select the <employees> element, then its first <employee> child, represented by employee[1], and finally the <name> element of that employee, whose value is 'Johny'. As seen, the XML document is traversed hierarchically to retrieve the information. '/' represents the root of the document, and multiple elements having the same name can be told apart using an array-like index in square brackets. Unlike Java arrays, the index starts at 1, not 0: employee[1] is the first employee and employee[2] the second. To select an attribute, prefix the attribute name with the '@' sign.
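For example, if we wish to query the 'id' value of the first employee, the following XPath expression will do just that (it evaluates to "001"; the second employee in employees.xml has no 'id' attribute, so employee[2]/@id would select nothing):
/employees/employee[1]/@id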
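The indexing rules above can be checked with a short sketch. It assumes a hypothetical class XPathIndexDemo with an inlined copy of employees.xml and uses the standard XPath.evaluate(String, InputSource) method, which returns the string value of the result, or an empty string when the expression selects nothing.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathIndexDemo {

    // Hypothetical inlined copy of employees.xml; the second employee has no id.
    static final String EMPLOYEES_XML =
            "<employees>"
            + "<employee id=\"001\"><name>Johny</name></employee>"
            + "<employee><name>Williams</name></employee>"
            + "</employees>";

    // Evaluates an expression and returns the string value of the result.
    // An expression that selects nothing evaluates to the empty string.
    static String eval(String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            return xPath.evaluate(expression,
                    new InputSource(new StringReader(EMPLOYEES_XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("/employees/employee[1]/name"));   // Johny
        System.out.println(eval("/employees/employee[2]/name"));   // Williams
        System.out.println(eval("/employees/employee[1]/@id"));    // 001
        // Positions start at 1, so [0] matches nothing and yields "".
        System.out.println("[" + eval("/employees/employee[0]/name") + "]"); // []
        // The second employee has no id attribute, so this is "" as well.
        System.out.println("[" + eval("/employees/employee[2]/@id") + "]");  // []
    }
}
```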
3) Java and XPath
Java provides an easy-to-use XPath API for accessing XML data. The API has been part of the standard JDK distribution since Java 5, in the javax.xml.xpath package. All we have to do is use the XPathFactory class and the XPath and XPathExpression interfaces. The XPathFactory class follows the standard Factory pattern to create XPath objects. An XPath object provides the environment in which expressions are compiled, and a compiled expression is encapsulated by an XPathExpression. The compiled XPathExpression can then be evaluated to get the desired results. Following is the code snippet:
// xmlDocument is assumed to be an org.w3c.dom.Document that has already been parsed.
// Get an instance of the XPathFactory for the default (DOM) object model.
XPathFactory xPathFactory = XPathFactory.newInstance();
// Create an XPath object from the factory.
XPath xPath = xPathFactory.newXPath();
String expression = "SomeXPathExpression";
// Compile the expression into an XPathExpression object
// (throws XPathExpressionException if the expression is invalid).
XPathExpression xPathExpression = xPath.compile(expression);
// Evaluate the compiled expression against the XML document;
// this one-argument overload returns the result as a String.
String result = xPathExpression.evaluate(xmlDocument);
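One benefit of compiling an expression is that the resulting XPathExpression can be evaluated many times against different context nodes. The sketch below, assuming a hypothetical class CompiledXPathDemo and an inlined copy of employees.xml, selects all <employee> nodes with an absolute expression and then evaluates the relative expression "name" once per employee.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CompiledXPathDemo {

    // Hypothetical inlined copy of employees.xml.
    static final String EMPLOYEES_XML =
            "<employees>"
            + "<employee id=\"001\"><name>Johny</name></employee>"
            + "<employee><name>Williams</name></employee>"
            + "</employees>";

    // Collects every employee's name: "name" is compiled once and then
    // evaluated with each <employee> node as the context node.
    static List<String> employeeNames() {
        try {
            Document xmlDocument = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(EMPLOYEES_XML)));
            XPath xPath = XPathFactory.newInstance().newXPath();

            XPathExpression allEmployees = xPath.compile("/employees/employee");
            NodeList employees = (NodeList) allEmployees.evaluate(
                    xmlDocument, XPathConstants.NODESET);

            XPathExpression nameOf = xPath.compile("name"); // relative path
            List<String> names = new ArrayList<>();
            for (int i = 0; i < employees.getLength(); i++) {
                // evaluate(Object) returns the result as a String.
                names.add(nameOf.evaluate(employees.item(i)));
            }
            return names;
        } catch (Exception e) {
            // Parser, I/O or XPath exceptions, wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(employeeNames()); // [Johny, Williams]
    }
}
```

Because "name" has no leading '/', it is resolved relative to whichever node is passed to evaluate(), which is what makes the single compiled expression reusable.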
4) Sample Application
The following section provides a sample application that demonstrates the usage of XPath in Java applications. The sample application selects the value of an element, the value of an attribute, and an entire element together with its child elements, by compiling and evaluating different expressions.
4.1) projects.xml
Here is an XML file called 'projects.xml' which contains structured information about various projects. Each <project> element has an attribute called 'id' and the nested elements <name>, <start-date> and <end-date>. The structure of the XML file is given below.
<?xml version="1.0" encoding="UTF-8"?>
<projects>
<project id = "BP001">
<name>Banking Project</name>
<start-date>Jan 10 1999</start-date>
<end-date>Jan 10 2003</end-date>
</project>
<project id = "TP001">
<name>Telecommunication Project</name>
<start-date>March 20 1999</start-date>
<end-date>July 30 2004</end-date>
</project>
<project id = "PP001">
<name>Portal Project</name>
<start-date>Dec 10 1998</start-date>
<end-date>March 10 2006</end-date>
</project>
</projects>
4.2) XPathReader.java
Now, let us write a simple Java application which acts as a reader for the various pieces of information in the XML document. Following is the Java source that parses the XML document and evaluates XPath expressions against it.
package com.javabeat.tips.xpath;

import java.io.File;
import java.io.IOException;
import javax.xml.namespace.QName;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XPathReader {

    private String xmlFile;
    private Document xmlDocument;
    private XPath xPath;

    public XPathReader(String xmlFile) {
        this.xmlFile = xmlFile;
        initObjects();
    }

    private void initObjects() {
        // Create the XPath object first, so that read() never sees a null
        // xPath even if parsing the file fails below.
        xPath = XPathFactory.newInstance().newXPath();
        try {
            // parse(File) takes a file path; parse(String) would expect a URI.
            xmlDocument = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new File(xmlFile));
        } catch (IOException | SAXException | ParserConfigurationException ex) {
            ex.printStackTrace();
        }
    }

    // Compiles and evaluates the expression; returnType is one of the
    // XPathConstants QNames (STRING, NUMBER, BOOLEAN, NODE, NODESET).
    public Object read(String expression, QName returnType) {
        try {
            XPathExpression xPathExpression = xPath.compile(expression);
            return xPathExpression.evaluate(xmlDocument, returnType);
        } catch (XPathExpressionException ex) {
            ex.printStackTrace();
            return null;
        }
    }
}
The constructor of this class is passed the name of the XML file from which the information has to be read. It immediately calls the initObjects() method, which initializes the XML Document and the XPath objects. A Document representation of the XML file is created by calling the DocumentBuilder.parse() method, and a new XPath object is created by calling the XPathFactory.newXPath() method.
Client applications can then call the XPathReader.read() method, passing the expression to be evaluated and the expected return type of the expression. The return type is specified as a QName, which in XML terms stands for Qualified Name. XPath 1.0 defines four data types (string, number, boolean and node-set), and the XPathConstants class provides matching QName constants: XPathConstants.STRING, XPathConstants.NUMBER, XPathConstants.BOOLEAN and XPathConstants.NODESET, plus XPathConstants.NODE for a single node. The return type passed to evaluate() must be one of these constants (otherwise an IllegalArgumentException is thrown), and the returned Object is then a String, Double, Boolean, org.w3c.dom.NodeList or org.w3c.dom.Node respectively. Within the read() method, the expression is compiled using the XPath.compile() method, which returns an XPathExpression, and the compiled expression is evaluated using the XPathExpression.evaluate() method.
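The mapping from return-type constants to Java classes can be seen in a small sketch. It assumes a hypothetical class ReturnTypesDemo whose read() helper mirrors XPathReader.read() but evaluates against a trimmed, inlined copy of projects.xml (ids and names only) via XPath.evaluate(String, InputSource, QName).

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReturnTypesDemo {

    // Hypothetical trimmed copy of projects.xml (ids and names only).
    static final String PROJECTS_XML =
            "<projects>"
            + "<project id=\"BP001\"><name>Banking Project</name></project>"
            + "<project id=\"TP001\"><name>Telecommunication Project</name></project>"
            + "<project id=\"PP001\"><name>Portal Project</name></project>"
            + "</projects>";

    // Evaluates the expression, asking for the given XPathConstants return type.
    static Object read(String expression, QName returnType) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            return xPath.evaluate(expression,
                    new InputSource(new StringReader(PROJECTS_XML)), returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String id = (String) read("/projects/project[1]/@id", XPathConstants.STRING);
        Double count = (Double) read("count(/projects/project)", XPathConstants.NUMBER);
        // A non-empty node-set converts to true.
        Boolean hasPortal = (Boolean) read(
                "/projects/project[name='Portal Project']", XPathConstants.BOOLEAN);
        Node firstName = (Node) read("/projects/project[1]/name", XPathConstants.NODE);
        NodeList all = (NodeList) read("/projects/project", XPathConstants.NODESET);

        System.out.println(id);                          // BP001
        System.out.println(count);                       // 3.0
        System.out.println(hasPortal);                   // true
        System.out.println(firstName.getTextContent());  // Banking Project
        System.out.println(all.getLength());             // 3
    }
}
```

Note that XPath numbers are always returned as a Double, even for counts, which is why the second line prints 3.0 rather than 3.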
4.3) XPathReaderTest.java
package com.javabeat.tips.xpath;

import javax.xml.xpath.XPathConstants;
import org.w3c.dom.*;

public class XPathReaderTest {

    public static void main(String[] args) {
        // The path is relative to the directory the program is started from.
        XPathReader reader = new XPathReader(
                "src/com/javabeat/tips/xpath/projects.xml");

        // To get an XML attribute.
        String expression = "/projects/project[1]/@id";
        System.out.println(reader.read(expression, XPathConstants.STRING) + "\n");

        // To get a child element's value.
        expression = "/projects/project[2]/name";
        System.out.println(reader.read(expression, XPathConstants.STRING) + "\n");

        // To get an entire node.
        expression = "/projects/project[3]";
        NodeList thirdProject = (NodeList) reader.read(expression,
                XPathConstants.NODESET);
        traverse(thirdProject);
    }

    // Recursively prints the name and text content of every element
    // that has at least one child node.
    private static void traverse(NodeList rootNode) {
        for (int index = 0; index < rootNode.getLength(); index++) {
            Node aNode = rootNode.item(index);
            if (aNode.getNodeType() == Node.ELEMENT_NODE) {
                NodeList childNodes = aNode.getChildNodes();
                if (childNodes.getLength() > 0) {
                    System.out.println("Node Name-->" + aNode.getNodeName()
                            + " , Node Value-->" + aNode.getTextContent());
                }
                traverse(childNodes);
            }
        }
    }
}
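This test application creates an instance of the XPathReader class and then calls the XPathReader.read() method with different expressions and return types. The third expression retrieves the entire third <project> element as a node-set by passing XPathConstants.NODESET as the return type.
Since a node-set contains a collection of nodes which in turn can contain other nodes, a recursive traversal is made over the node-set to print the name and value of each element that has children, using the Node.getNodeName() and Node.getTextContent() methods. Because getTextContent() returns the concatenated text of all descendants, the <project> line lists all three values. The following is the expected output of the sample client application; the exact blank lines and indentation depend on the whitespace in projects.xml, since getTextContent() also returns the whitespace between elements and the first two println() calls append an extra "\n".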
Output for the above program
BP001
Telecommunication Project
Node Name-->project , Node Value-->
Portal Project
Dec 10 1998
March 10 2006
Node Name-->name , Node Value-->Portal Project
Node Name-->start-date , Node Value-->Dec 10 1998
Node Name-->end-date , Node Value-->March 10 2006
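When only the direct child elements are of interest, the recursion can be avoided by letting XPath do the filtering: the node test * matches element children only, so whitespace text nodes never appear in the result. The sketch below assumes a hypothetical class ChildElementsDemo with an inlined copy of projects.xml (whitespace kept in the third project) and uses the expression /projects/project[3]/*; for arbitrarily deep trees the recursive traverse() approach above is still the general solution.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildElementsDemo {

    // Hypothetical inlined copy of projects.xml; the third project keeps its
    // indentation so that whitespace text nodes exist, as in a real file.
    static final String PROJECTS_XML =
            "<projects>\n"
            + "  <project id=\"BP001\"><name>Banking Project</name>"
            + "<start-date>Jan 10 1999</start-date><end-date>Jan 10 2003</end-date></project>\n"
            + "  <project id=\"TP001\"><name>Telecommunication Project</name>"
            + "<start-date>March 20 1999</start-date><end-date>July 30 2004</end-date></project>\n"
            + "  <project id=\"PP001\">\n"
            + "    <name>Portal Project</name>\n"
            + "    <start-date>Dec 10 1998</start-date>\n"
            + "    <end-date>March 10 2006</end-date>\n"
            + "  </project>\n"
            + "</projects>";

    // Returns "name=value" for every node selected by the expression.
    static List<String> describe(String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xPath.evaluate(expression,
                    new InputSource(new StringReader(PROJECTS_XML)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                result.add(n.getNodeName() + "=" + n.getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // "*" selects element children only, so whitespace text nodes are skipped.
        for (String line : describe("/projects/project[3]/*")) {
            System.out.println(line);
        }
    }
}
```

Replacing * with node() would also return the whitespace text nodes between the child elements, which is exactly what the ELEMENT_NODE check in traverse() filters out.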