JavaBeat

XML Comments

Comments provide a way to insert into an XML document text that isn't really part of the document, but rather is intended for people who are reading the XML source itself.

You can add comments to an XML document just as you can add comments to an HTML document. A comment starts with the string <!-- and ends with the string -->, as shown here:

<tree>
<type>ever green</type>
<!--planting information-->
<season>spring</season>
<sunshine>medium</sunshine>
</tree>

The XML specification states that an XML parser doesn't need to pass these comments on to the application, meaning that you should never count on being able to use the information inside a comment from your application.
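How comments are handled depends on the parser. As a minimal sketch, assuming Python's standard-library parsers (not something the original tutorial uses), the same tree document yields no comments through ElementTree's default builder, while minidom keeps them as nodes:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

TREE_XML = (
    "<tree>"
    "<type>ever green</type>"
    "<!--planting information-->"
    "<season>spring</season>"
    "<sunshine>medium</sunshine>"
    "</tree>"
)

# ElementTree's default tree builder silently drops comments,
# so only the element children are visible to the application.
et_children = [child.tag for child in ET.fromstring(TREE_XML)]

# minidom keeps comments as nodes of type COMMENT_NODE.
dom = minidom.parseString(TREE_XML)
dom_comments = [
    node.data
    for node in dom.documentElement.childNodes
    if node.nodeType == node.COMMENT_NODE
]

print(et_children)   # element names only
print(dom_comments)  # comment text, when the parser kept it
```

Because the two parsers disagree, an application that stored real data in a comment would work with one and break with the other.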

Comments in XML documents must follow these rules:

  • Can't have a comment inside a tag. The following statement is illegal:

<season></season <!--early summer is also fine--> >

  • Can't use "--" inside a comment (a single "-" is allowed). The following statement is also illegal:

<!--early summer is also fine -- but not recommended-->

  • Can't place a comment inside of an entity declaration.
  • Can't put a comment inside another comment.
  • Can't place a comment before the XML declaration. When present, the declaration must be the very first thing in the XML document.
  • Comments can be used to comment out tag sets. When commenting out blocks of tags, make sure that the remaining XML is well-formed. In the following example, the season and sunshine elements are ignored:

<tree>
<type>ever green</type>
<!--planting information-->
<!--commented out for testing
<season>spring</season>
<sunshine>medium</sunshine>
-->
</tree>
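The comment rules above can be checked against a real parser. The following is a rough sketch using Python's standard-library ElementTree; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the tutorial:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Each of these breaks one of the comment rules.
inside_tag = "<season></season <!--early summer is also fine--> >"
double_hyphen = "<season><!--early summer is also fine -- but not recommended--></season>"
nested = "<season><!--outer <!--inner--> still outer--></season>"
before_declaration = "<!--too early--><?xml version='1.0'?><season/>"

# A single hyphen inside a comment is allowed.
single_hyphen = "<season><!--early-summer is fine--></season>"

# Commenting out a block of tags leaves a well-formed document
# in which only the <type> element remains.
commented_out = (
    "<tree><type>ever green</type>"
    "<!--commented out for testing"
    "<season>spring</season><sunshine>medium</sunshine>"
    "--></tree>"
)
remaining = [child.tag for child in ET.fromstring(commented_out)]

for sample in (inside_tag, double_hyphen, nested, before_declaration, single_hyphen):
    print(is_well_formed(sample), sample)
print(remaining)
```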


 

XML CDATA

Some characters have special meanings in XML. If element content contains these characters unescaped, the parser will try to interpret them as markup. CDATA sections are used to escape blocks of text containing these characters, such as "<", ">", "&" and so on. A CDATA section marks the text as literal, so that it is not parsed but is instead treated as a plain string of characters. CDATA sections can appear inside element content to allow the special characters to appear.

A CDATA section begins with the character sequence "<![CDATA[" and ends with the character sequence "]]>". The text between the "<![CDATA[" and the "]]>" is escaped.

The following is an example of using CDATA to include HTML content inside an XML element without changing the HTML "<" and ">" to &lt; and &gt;. Since the HTML is enclosed in a CDATA section, it is not parsed by the XML parser.

<?xml version="1.0" encoding="UTF-8"?>
<book>
<title>Little Sister, Big Sister</title>
<author>Diana Cain Blutherthal</author>
<category>chapter book</category>
<TOC>
  <![CDATA[
    <html>
      <body>
        <table>
          <tr><td>Chapter One</td><td>Queen</td><td>....3</td></tr>
          <tr><td>Chapter Two</td><td>Mermaids</td><td>....20</td></tr>
          <tr><td>Chapter Three</td><td>The Chocolate Bar</td><td>....31</td></tr>
          <tr><td>Chapter Four</td><td>Thunder Cookies</td><td>....40</td></tr>
        </table>
      </body>
    </html>
  ]]>
</TOC>
</book>


All tags and entity references inside these CDATA markers are ignored by the XML parser, which treats them just like any other character data. In other words, an XML parser ignores all markup characters such as <, >, and & between these CDATA markers. The only markup an XML parser recognizes inside a CDATA section is the closing character sequence "]]>". The character string "]]>" therefore must not appear inside a CDATA block, as it would signal the end of the CDATA section. Because entity references are not expanded inside a CDATA section, the sequence cannot be escaped there with &gt;; instead, end the CDATA section after "]]" and continue with the ">" in a new CDATA section, or as &gt; in ordinary element content. For the same reason, CDATA sections cannot be nested.

Keep in mind, though, that nothing inside a CDATA Section is parsed. Therefore, if you were to include entities, they would not be parsed. So, &lt;I&gt; would remain &lt;I&gt; if it were contained inside a CDATA section.
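To see this behaviour in practice, here is a small sketch using Python's standard-library ElementTree. The helper to_cdata is a hypothetical name introduced for illustration; it wraps text in a CDATA section and breaks up any "]]>" by closing the section after "]]" and reopening it before ">":

```python
import xml.etree.ElementTree as ET

def to_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Contains markup characters, entity references, and the "]]>" sequence.
markup = "<b>bold</b> &lt;I&gt; a]]>b"

doc = "<note>" + to_cdata(markup) + "</note>"

# The parser hands back the text exactly as written: tags are not parsed
# and &lt;I&gt; is not expanded to <I>.
parsed_text = ET.fromstring(doc).text

print(doc)
print(parsed_text)
```

The two adjacent CDATA sections are delivered to the application as one continuous run of character data, so the split is invisible after parsing.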

A CDATA section can be used anywhere PCDATA occurs, that is, in element content. Attribute values, however, are always parsed; declaring an attribute as type CDATA in a DTD only means that its value is character data and has nothing to do with CDATA sections. So, you cannot include a CDATA section in an attribute value.

Comments are not recognized in a CDATA section. The XML parser will treat any "<!-- comment -->" in the CDATA block as literal text without parsing it. CDATA sections do not work in HTML.
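Both points can be sketched with Python's standard-library ElementTree (an assumption for illustration; any conforming parser should behave the same way). A comment inside CDATA comes back as plain text, a CDATA section inside an attribute value is rejected, and an entity reference is the way to get a special character into an attribute:

```python
import xml.etree.ElementTree as ET

# A comment inside a CDATA section is just literal text.
in_content = ET.fromstring("<note><![CDATA[<!-- not a comment -->]]></note>")
content_text = in_content.text

# A CDATA section inside an attribute value is not well-formed,
# because "<" may not appear in an attribute value.
try:
    ET.fromstring('<note text="<![CDATA[x]]>"/>')
    attribute_accepted = True
except ET.ParseError:
    attribute_accepted = False

# Attribute values are parsed, so an entity reference works instead.
escaped = ET.fromstring('<note text="a &lt; b"/>')
attribute_value = escaped.get("text")

print(content_text)
print(attribute_accepted)
print(attribute_value)
```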


 

XML Validation

For many applications, an XML document needs to be valid, not merely well formed, to be of practical use. A valid XML document obeys the following rules:

  1. The XML document must be well formed. A "Well Formed" XML document is a document that conforms to the XML syntax rules that were described in the previous XML Syntax section.

  2. The XML document must conform to the rules defined in an XML Schema or Document Type Definition (DTD).

  3. A "Valid" XML document is a "Well Formed" XML document (there is no possibility of typos in the tags), which also conforms to the rules defined in its XML schema (there is no possibility of using a tag that is not defined in the XML schema).

Validity is another important XML concept. A document instance is valid when it parses successfully against its accompanying DTD or XML schema. As mentioned, XML does not require the use of a DTD or XML schema, but when a document instance invokes a DTD or schema, a validating parser will validate the document instance against it. If it parses successfully to build the XML document tree, the document is both well-formed and valid. A document instance can be well-formed but not valid if it has a DTD or XML schema and violates the rules of its DTD or schema. For example, its schema may require an element to be numeric only. If the numeric-only element contains an alphabetic character, the document is invalid. Parsed without invoking its DTD or schema, it could very likely still be well-formed. If a document instance is valid, then it is always well-formed.
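The difference between the two levels can be illustrated with a rough sketch. The standard-library Python parser used here is non-validating, so the "schema" below is only a hand-written stand-in: a hypothetical rule that every quantity element must contain digits only. The function name classify and the element names are assumptions made for the example; a real application would use a validating parser with a DTD or XML Schema.

```python
import xml.etree.ElementTree as ET

def classify(xml_text):
    """Classify a document as not well-formed, invalid, or valid."""
    # Step 1: well-formedness is checked by the parser itself.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"

    # Step 2: hypothetical stand-in for a schema rule,
    # every <quantity> element must be numeric only.
    for quantity in root.iter("quantity"):
        if not (quantity.text or "").strip().isdigit():
            return "well-formed but invalid"
    return "well-formed and valid"

print(classify("<order><quantity>12</quantity></order>"))
print(classify("<order><quantity>12a</quantity></order>"))
print(classify("<order><quantity>12</order>"))
```

The second document parses without trouble and only fails the extra rule, which mirrors the numeric-only example above; the third never reaches the rule check because it is not well-formed.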

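The W3C XML specification states that a program must not continue normal processing of an XML document once it finds a well-formedness error. A validity error is different: a validating parser must report it, but may continue processing. The reason for the strict rule is that XML software should be easy to write, and that all XML documents should be compatible.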

Note:

DTDs can be used to describe XML markup languages and to validate XML documents, but they are also limited. DTDs describe how elements and attributes are organized in a markup language, but they fail to address data typing. XML Schema was created to address the limitations of DTDs. This is also the reason that XML Schema is preferred to DTDs in Web Services.


 

XML Processing Instructions

Processing instructions (PIs) are an escape hatch to provide information to an application. Like comments, they are not textually part of the XML document, but the XML processor is required to pass them to an application.

In the following example, a PI with the target nameprocessor sits between the middle and last elements:

<?xml version='1.0' encoding='UTF-16' standalone='yes'?>
<name nickname='Shiny John'>
<first>John</first>
<!--John lost his middle name in a fire-->
<middle/>
<?nameprocessor SELECT * FROM blah?>
<last>Doe</last>
</name>

There aren't really a lot of rules on PIs. Processing instructions have the form: <?name pidata?>. The name, called the PI target, identifies the PI to the application. Applications should process only the targets they recognize and ignore all other PIs. Any data that follows the PI target is optional; it is intended for the application that recognizes the target. The names used in PIs may be declared as notations in order to formally identify them. The PI target is bound by the same naming rules as elements and attributes. So, in this example, the PI target is nameprocessor, and the actual text of the PI is SELECT * FROM blah.
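As a minimal sketch of how an application might receive a PI, the snippet below uses Python's standard-library minidom (an assumption for illustration), which exposes each PI as a node with target and data attributes. The handlers table is hypothetical; it only shows the "process the targets you recognize, ignore the rest" rule.

```python
from xml.dom import minidom

NAME_XML = (
    "<name nickname='Shiny John'>"
    "<first>John</first>"
    "<!--John lost his middle name in a fire-->"
    "<middle/>"
    "<?nameprocessor SELECT * FROM blah?>"
    "<last>Doe</last>"
    "</name>"
)

dom = minidom.parseString(NAME_XML)

# Collect (target, data) for every PI that is a child of the root element.
pis = [
    (node.target, node.data)
    for node in dom.documentElement.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

# Hypothetical dispatch table: only recognized targets are handled.
handlers = {"nameprocessor": lambda data: "would run: " + data}
results = [handlers[target](data) for target, data in pis if target in handlers]

print(pis)
print(results)
```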


Note:

  • PI names beginning with "xml" are reserved for XML standardization.

  • PIs are used to embed information for proprietary applications (not the XML browser or parser).

  • PIs often provide information about how to view unparsed external entities.

  • PIs are the string of text between <? and ?>, e.g.

             <?xml-stylesheet type="text/css" href="greeting.css"?>

             <?acrobat document="passport.pdf"?>


PIs are pretty rare, and are often frowned upon in the XML community, especially when used frivolously. But if you have a valid reason to use them, go for it. For example, PIs can be an excellent place for putting the kind of information (such as scripting code) that gets put in comments in HTML. While you can't assume that comments will be passed on to the application, PIs always are.

Is the XML Declaration a Processing Instruction?

At first glance, you might think that the XML declaration is a PI that starts with xml. It uses the same "<? ?>" notation, and provides instructions to the parser (but not the application). So is it a PI?

Actually, no: the XML declaration isn't a PI. But in most cases it really doesn't make any difference whether it is or not, so feel free to look at it as one if you wish. The only places where you'll get into trouble are the following:

  • Trying to get the text of the XML declaration from an XML parser. Some parsers erroneously treat the XML declaration as a PI, and will pass it on as if it were, but many will not. The truth is, in most cases your application will never need the information in the XML declaration; that information is only for the parser. One notable exception might be an application that wants to display an XML document to a user, in the way that we're using IE5 to display the documents in this book.

  • Including an XML declaration somewhere other than at the beginning of an XML document. Although you can put a PI anywhere you want, an XML declaration must come at the beginning of a file.
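Both trouble spots can be observed with a parser. This is a sketch assuming Python's standard-library minidom; other parsers may behave differently on the first point, as noted above. The target name app and the element name root are made up for the example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# An XML declaration followed by an ordinary PI and a root element.
dom = minidom.parseString("<?xml version='1.0'?><?app some data?><root/>")

# Only the real PI shows up as a PI node; the XML declaration does not.
pi_targets = [
    node.target
    for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

# An XML declaration anywhere but the start of the document is an error.
try:
    minidom.parseString("<root><?xml version='1.0'?></root>")
    misplaced_accepted = True
except ExpatError:
    misplaced_accepted = False

print(pi_targets)
print(misplaced_accepted)
```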



JavaBeat 2006, India