XML Comments
Comments provide a way to insert into an XML document text that isn't really
part of the document, but rather is intended for people who are reading the
XML source itself.
You can add comments to an XML document just as you can add comments to an
HTML document. A comment starts with the string <!-- and ends with the string
-->, as shown here:
<tree>
<type>ever green</type>
<!--planting information-->
<season>spring</season>
<sunshine>medium</sunshine>
</tree>
The XML specification states that an XML parser doesn't need to pass these
comments on to the application, meaning that you should never count on being
able to use the information inside a comment from your application.
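As a minimal sketch of why an application should not rely on comment text, the snippet below parses the tree example with Python's standard xml.etree.ElementTree module. The choice of parser and the helper name child_tags are assumptions made for illustration. With its default settings, this parser drops comments while building the tree, so the application never sees them:

```python
import xml.etree.ElementTree as ET

# The tree example from above, comment included.
TREE_XML = """<tree>
<type>ever green</type>
<!--planting information-->
<season>spring</season>
<sunshine>medium</sunshine>
</tree>"""


def child_tags(xml_text):
    """Return the tag names of the root's children, as the application sees them."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


if __name__ == "__main__":
    # Only the three elements are listed; the comment is not passed on.
    print(child_tags(TREE_XML))
```

Other parsers or parser settings may keep comments, which is exactly why the information in them should be treated as optional.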
Comments in XML documents must follow these rules:
-
Can't have a comment inside a tag. The following statement is illegal:
<season></season <!--early summer is also fine--> >
-
Can't use "--" inside a comment. A single "-" is allowed, but the comment
text must not end with one. The following statement is also illegal:
<!--early summer is also fine -- but not recommended-->
-
Can't place a comment inside of an entity declaration.
-
Can't put a comment inside another comment.
-
Can't place a comment before the XML declaration, which, when present, must
be the very first thing in the XML document.
-
Comments can be used to comment out tag sets. When commenting out blocks of
tags, make sure that the remaining XML is well-formed. In the following
example, the season and sunshine elements are ignored:
<tree>
<type>ever green</type>
<!--planting information-->
<!--commented out for testing
<season>spring</season>
<sunshine>medium</sunshine>
-->
</tree>
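The comment rules above can be checked with a small sketch. It assumes Python's standard xml.etree.ElementTree parser, and the helper names is_well_formed and child_tags are made up for this illustration. Each illegal sample should be rejected, while the commented-out block parses cleanly and leaves only the type element:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def child_tags(xml_text):
    """Return the tag names of the root element's children."""
    return [child.tag for child in ET.fromstring(xml_text)]


# A comment inside a tag.
INSIDE_TAG = "<season></season <!--early summer is also fine--> >"
# "--" inside a comment.
DOUBLE_HYPHEN = ("<season><!--early summer is also fine -- "
                 "but not recommended--></season>")
# A comment inside another comment.
NESTED = "<season><!-- outer <!-- inner --> still outer --></season>"
# A comment before the XML declaration.
BEFORE_DECLARATION = "<!--too early--><?xml version='1.0'?><season/>"

# Commenting out a block of tags; the remaining XML is still well-formed.
COMMENTED_OUT = """<tree>
<type>ever green</type>
<!--planting information-->
<!--commented out for testing
<season>spring</season>
<sunshine>medium</sunshine>
-->
</tree>"""

if __name__ == "__main__":
    for sample in (INSIDE_TAG, DOUBLE_HYPHEN, NESTED, BEFORE_DECLARATION):
        print(is_well_formed(sample))
    print(child_tags(COMMENTED_OUT))
```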
XML CDATA
Some characters have special meanings in XML. If element content contains
these characters unescaped, the parser interprets them as markup. CDATA
sections are used to escape blocks of text containing these characters, such
as "<", ">", "&" and so on. A CDATA section marks the text as literal so that
it is not parsed as markup but treated as a plain string of characters. CDATA
sections can appear inside element content to allow the special characters to
appear.
A CDATA section begins with the character sequence "<![CDATA[" and ends with
the character sequence "]]>". The text between "<![CDATA[" and "]]>" is
treated as literal character data.
The following is an example of using CDATA to include HTML content inside an
XML element without changing the HTML "<" and ">" to &lt; and &gt;.
Since the HTML is included inside a CDATA section, it is not parsed by the
XML parser.
<?xml version="1.0" encoding="UTF-8"?>
<book>
<title>Little Sister, Big Sister</title>
<author>Diana Cain Blutherthal</author>
<category>chapter book</category>
<TOC>
<![CDATA[
<html>
<body>
<table>
<tr><td>Chapter One</td><td>Queen</td><td>....3</td></tr>
<tr><td>Chapter Two</td><td>Mermaids</td><td>....20</td></tr>
<tr><td>Chapter Three</td><td>The Chocolate Bar</td><td>....31</td></tr>
<tr><td>Chapter Four</td><td>Thunder Cookies</td><td>....40</td></tr>
</table>
</body>
</html>
]]>
</TOC>
</book>
All tags and entity references inside these CDATA markers are ignored by the
XML parser, which treats them just like any other character data. In other
words, an XML parser ignores all markup characters such as <, >, and &
between these CDATA markers. The only markup an XML parser recognizes inside a
CDATA section is the closing character sequence "]]>". The character string
"]]>" therefore must not appear inside a CDATA block, as it would signal the
end of the CDATA section. Because entity references are not expanded inside a
CDATA section, the greater-than character cannot be escaped there; text that
contains "]]>" has to be placed outside the CDATA section, where the closing
greater-than character can be escaped using the entity &gt;. For the same
reason, CDATA sections cannot be nested.
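One common workaround for text that contains "]]>" is to split it across two adjacent CDATA sections, so that the sequence never appears whole inside either one. The sketch below illustrates that idea. It is not a technique taken from the text above, and the helper name to_cdata and the sample string are hypothetical:

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting every "]]>" across two sections.

    Each "]]>" becomes "]]" + "]]><![CDATA[" + ">": the first section ends
    right after "]]", and a new section starts with ">".
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def round_trip(text):
    """Embed text in a <code> element via to_cdata and read it back."""
    return ET.fromstring("<code>" + to_cdata(text) + "</code>").text


if __name__ == "__main__":
    sample = "if (a[b[0]]> 1) {}"   # contains "]]>"
    print(to_cdata(sample))
    print(round_trip(sample) == sample)
```

The parser joins the adjacent sections back into one run of character data, so the application reads the original string unchanged.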
Keep in mind, though, that nothing inside a CDATA section is parsed.
Therefore, if you were to include entity references, they would not be
expanded. So, &lt;I&gt; would remain &lt;I&gt; if it were contained inside a
CDATA section.
A CDATA section can be used anywhere PCDATA occurs, that is, as element
content. However, attribute values are always parsed; the CDATA attribute type
that a DTD can declare is unrelated to CDATA sections. So, you cannot include
a CDATA section in an attribute value.
Comments are not recognized in a CDATA section. The XML parser will treat any
"<!-- comment -->" in the CDATA block as literal text without parsing it.
CDATA sections do not work in HTML.
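The literal treatment of CDATA content can be seen in a short sketch, again assuming Python's standard xml.etree.ElementTree parser. The note element and the helper name element_text are made up for the illustration. Inside the CDATA section the tags, the entity references and the comment all come back as plain text, while the same entity references outside CDATA are expanded:

```python
import xml.etree.ElementTree as ET

# Tags, entity references and a comment, all inside one CDATA section.
INSIDE_CDATA = "<note><![CDATA[<b>&lt;I&gt;</b> <!-- not a comment -->]]></note>"
# The same entity references in ordinary element content.
OUTSIDE_CDATA = "<note>&lt;I&gt;</note>"


def element_text(xml_text):
    """Return the text content of the root element."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(element_text(INSIDE_CDATA))   # left exactly as written
    print(element_text(OUTSIDE_CDATA))  # entity references expanded
```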
XML Validation
An XML document needs to be valid to be of practical use. A valid XML
document obeys the following rules:
-
The XML document must be well formed. A "Well Formed" XML document is a
document that conforms to the XML syntax rules that were described in the
previous XML Syntax section.
-
The XML document must conform to the rules defined in an XML Schema or
Document Type Definition (DTD).
-
A "Valid" XML document is a "Well Formed" XML document (there is no
possibility for typos in the tags), which also conforms to the rules
defined in its XML schema (there's no possibility to use a tag that is not
defined in the XML schema).
Validity is another important XML concept. A document instance is valid when
it parses successfully against its accompanying DTD or XML schema. As
mentioned, XML does not require the use of a DTD or XML schema, but when a
document instance invokes a DTD or schema, a validating parser will validate
the document instance against it. If it parses successfully to build an XML
document tree, the document is both well-formed and valid. A document instance
can be
well-formed but not valid if it has a DTD or XML schema and violates the rules
of its DTD or schema. For example, its schema may require an element to be
numeric only. If the numeric-only element contains an alpha character, it
would be invalid. Parsed without invoking its DTD, it could very likely be
well-formed. If a document instance is valid, then it is always well-formed.
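The difference between well-formed and valid can be sketched as follows. Python's standard xml.etree.ElementTree parser checks well-formedness only and does not validate against a DTD or schema, so the numeric-only rule is checked by hand here as a stand-in for real schema validation. The height element, the rule and the function name classify are all hypothetical:

```python
import xml.etree.ElementTree as ET


def classify(xml_text, numeric_tag="height"):
    """Classify xml_text as not well-formed, well-formed but invalid, or valid.

    "valid" here only means the document passes one hand-written rule
    (every <height> element holds digits only). A real DTD or XML Schema
    would be checked by a validating parser instead.
    """
    try:
        root = ET.fromstring(xml_text)      # well-formedness check
    except ET.ParseError:
        return "not well-formed"
    for elem in root.iter(numeric_tag):     # stand-in validity check
        if not (elem.text or "").strip().isdigit():
            return "well-formed but invalid"
    return "valid"


if __name__ == "__main__":
    print(classify("<tree><height>12</height></tree>"))
    print(classify("<tree><height>12a</height></tree>"))
    print(classify("<tree><height>12</tree>"))
```

The second document parses without error when the rule is not applied, which matches the point above: a document can be well-formed without being valid, but never the reverse.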
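The W3C XML specification states that a program should not continue normal
processing of an XML document if it finds a well-formedness error. A validity
error, by contrast, is reported by a validating parser, which may then
continue processing. The reason for the strict rule is that XML software
should be easy to write, and that all XML documents should be compatible.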
Note:
DTDs can be used to describe XML markup languages and to validate XML
documents, but they are limited. DTDs describe how elements and attributes
are organized in a markup language, but they fail to address data typing. XML
Schema was created to address the limitations of DTDs. This is also the reason
that XML Schema is preferred to DTDs in Web Services.
XML Processing Instructions
Processing instructions (PIs) are an escape hatch to provide information to an
application. Like comments, they are not textually part of the XML document,
but the XML processor is required to pass them to an application.
PIs allow you to enter instructions into your XML which are not part of the
actual document, but which are passed up to the application.
<?xml version='1.0' encoding='UTF-16' standalone='yes'?>
<name nickname='Shiny John'>
<first>John</first>
<!--John lost his middle name in a fire-->
<middle/>
<?nameprocessor SELECT * FROM blah?>
<last>Doe</last>
</name>
There aren't many rules for PIs. Processing instructions have the form
<?name pidata?>. The name, called the PI target, identifies the PI to the
application. Applications should process only the targets they recognize and
ignore all other PIs. Any data that follows the PI target is optional; it is
meant for the application that recognizes the target. The names used in PIs
may be declared as notations in order to formally identify them. The PI target
is bound by the same naming rules as elements and attributes. So, in this
example, the PI target is nameprocessor, and the actual text of the PI is
SELECT * FROM blah.
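How an application might receive PIs and act only on the targets it recognizes can be sketched with Python's standard xml.sax module, whose ContentHandler.processingInstruction callback is called with the target and the data. The handler class, the collect_pis helper and the second PI (othertool) are hypothetical additions for the illustration:

```python
import xml.sax


class NameProcessorHandler(xml.sax.ContentHandler):
    """Collects data for the one target it recognizes; notes the rest."""

    def __init__(self):
        super().__init__()
        self.handled = []   # data of PIs addressed to "nameprocessor"
        self.ignored = []   # targets of PIs meant for other applications

    def processingInstruction(self, target, data):
        if target == "nameprocessor":
            self.handled.append(data)
        else:
            self.ignored.append(target)


# The name example, plus one PI for a different (made-up) application.
NAME_XML = b"""<name nickname='Shiny John'>
<first>John</first>
<!--John lost his middle name in a fire-->
<middle/>
<?nameprocessor SELECT * FROM blah?>
<?othertool do something?>
<last>Doe</last>
</name>"""


def collect_pis(xml_bytes):
    """Parse xml_bytes and return (handled data, ignored targets)."""
    handler = NameProcessorHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.handled, handler.ignored


if __name__ == "__main__":
    print(collect_pis(NAME_XML))
```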
Note:
-
PI names beginning with "xml" are reserved for XML standardization.
-
PIs are used to embed information for proprietary applications (rather than
for the XML browser or parser).
-
PIs often provide information about how to view unparsed external entities.
-
PIs are the string of text between <? and ?>, for example:
<?xml-stylesheet type="text/css" href="greeting.css"?>
<?acrobat document="passport.pdf"?>
PIs are pretty rare, and are often frowned upon in the XML community,
especially when used frivolously. But if you have a valid reason to use them,
go for it. For example, PIs can be an excellent place for putting the kind of
information (such as scripting code) that gets put in comments in HTML. While
you can't assume that comments will be passed on to the application, PIs
always are.
Is the XML Declaration a Processing Instruction?
At first glance, you might think that the XML declaration is a PI that starts
with xml. It uses the same "<? ?>" notation, and provides instructions
to the parser (but not the application). So is it a PI?
Actually, no: the XML declaration isn't a PI. But in most cases it really
doesn't make any difference whether it is or not, so feel free to look at it
as one if you wish. The only places where you'll get into trouble are the
following:
-
Trying to get the text of the XML declaration from an XML parser. Some
parsers erroneously treat the XML declaration as a PI, and will pass it on
as if it were, but many will not. The truth is, in most cases your
application will never need the information in the XML declaration; that
information is only for the parser. One notable exception might be an
application that wants to display an XML document to a user, in the way
that we're using IE5 to display the documents in this book.
-
Including an XML declaration somewhere other than at the beginning of an
XML document. Although you can put a PI anywhere you want, an XML
declaration must come at the beginning of a file.
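Both trouble spots can be observed in a short sketch that assumes Python's standard xml.dom.minidom parser; other parsers may behave differently, as noted above. The helper names and the greeting element are made up for the illustration. This parser does not report the XML declaration as a PI, and it rejects a declaration that appears anywhere but at the start, while an ordinary PI in the same position is accepted:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def prolog_pi_targets(xml_text):
    """Return the targets of the PIs found at document level."""
    doc = minidom.parseString(xml_text)
    return [node.target for node in doc.childNodes
            if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]


def parses(xml_text):
    """Return True if minidom accepts xml_text, False on a syntax error."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


# An XML declaration followed by a real PI.
AT_START = ("<?xml version='1.0'?>"
            "<?xml-stylesheet type='text/css' href='greeting.css'?>"
            "<greeting/>")
# An XML declaration in the middle of the document.
DECLARATION_IN_MIDDLE = "<greeting><?xml version='1.0'?></greeting>"
# An ordinary PI in the same position.
PI_IN_MIDDLE = "<greeting><?acrobat document='passport.pdf'?></greeting>"

if __name__ == "__main__":
    print(prolog_pi_targets(AT_START))    # the declaration is not listed
    print(parses(DECLARATION_IN_MIDDLE))
    print(parses(PI_IN_MIDDLE))
```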