XML Elements
An element is the segment of an XML document made up of an opening tag (e.g.,
<tagname>), its corresponding closing tag (e.g.,
</tagname>), and everything between them. XML is designed
to hold any kind of information in elements. An element may
contain a mixture of sub-elements and PCDATA (text) between the opening and
closing tags.
We actually refer to an element by its element type, which is the name used
in its start/end tag pair. For example, in <state>Virginia</state> we have a
state element whose content is the text Virginia.
Elements have relationships with other elements in a document: some are
parents and some are children. A child element must be nested completely
inside its parent element, between the parent's start tag and end tag. Because
XML elements can contain other elements, this nesting can give rise to an
arbitrarily deep hierarchy of elements within elements.
The following terms are used to describe the hierarchical relationships.
-
Nesting -- Refers to the process of elements containing other
elements.
-
Child -- A child element is an element that is contained within
another element.
-
Parent -- A parent element is an element that contains another
element.
-
Sibling -- Sibling elements are elements with the same parent.
As mentioned in the XML Syntax section, an XML document must have exactly one
root element. The root element is the ultimate parent element and must contain
all the other elements and data, except the XML declaration, the document type
declaration, and any comments or processing instructions placed before or
after it.
The following XML document describes a movie:
<movie>
<title>Harry Potter and the Goblet of Fire</title>
<released_by>Warner Bros.</released_by>
<directed_by>Mike Newell</directed_by>
<run_time>154 min.</run_time>
<rating>PG-13</rating>
<actors>Leading Actors
<actor>Daniel Radcliffe</actor>
<actor>Emma Watson</actor>
<actor>Rupert Grint</actor>
<actor>Robbie Coltrane</actor>
</actors>
<reviews total="0"></reviews>
</movie>
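The relationships above can be checked programmatically. The following is a minimal sketch that assumes Python's standard `xml.etree.ElementTree` module; the text does not prescribe any particular parser. ElementTree elements do not keep a pointer to their parent, so the sketch builds a child-to-parent map once and then uses it to look up parents and siblings. The helper name `sibling_tags` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The movie document from the text, stored as a string for this sketch.
MOVIE_XML = """<movie>
<title>Harry Potter and the Goblet of Fire</title>
<released_by>Warner Bros.</released_by>
<directed_by>Mike Newell</directed_by>
<run_time>154 min.</run_time>
<rating>PG-13</rating>
<actors>Leading Actors
<actor>Daniel Radcliffe</actor>
<actor>Emma Watson</actor>
<actor>Rupert Grint</actor>
<actor>Robbie Coltrane</actor>
</actors>
<reviews total="0"></reviews>
</movie>"""

root = ET.fromstring(MOVIE_XML)

# The root element is the ultimate parent of every other element.
root_tag = root.tag

# Direct children of the root, in document order.
child_tags = [child.tag for child in root]

# ElementTree has no parent pointers, so map each child to its parent
# by walking the whole tree once.
parent_of = {child: parent for parent in root.iter() for child in parent}

actor = root.find("actors/actor")
actor_parent_tag = parent_of[actor].tag

def sibling_tags(elem):
    """Hypothetical helper: tags of the other elements sharing elem's parent."""
    return [e.tag for e in parent_of[elem] if e is not elem]

title_siblings = sibling_tags(root.find("title"))

print(root_tag)          # movie
print(child_tags)
print(actor_parent_tag)  # actors
print(title_siblings)
```

The child-to-parent dictionary is a common ElementTree idiom. It trades a little memory for constant-time parent lookups.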
The movie element is the root element. The title, released_by, directed_by,
run_time, rating, actors, and reviews elements are child elements of movie,
and movie is their parent element. These elements are siblings (or sister
elements) because they have the same parent. Likewise, each actor element is
a child of actors, and the actor elements are siblings of one another.
Tags are a very important part of XML. They are what you use to mark the
beginning and ending of elements in your XML documents. An XML element
is everything from the element's start tag up to and including the
element's end tag.
An element can have element content, mixed content, simple content, or
empty content, and it can also have attributes.
In the example above, movie has element content because it contains only
other elements. The actors element has mixed content because it
contains both text and other elements. The released_by element has simple
content (or text content) because it contains only text. The
reviews element has empty content because nothing appears between its start
and end tags.
In the example above, only the reviews element has attributes: the
attribute named total has the value "0".
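The four content categories can be told apart mechanically. The following is a minimal sketch that assumes Python's `xml.etree.ElementTree` and uses a trimmed, hypothetical subset of the movie document. It also assumes that whitespace-only text (such as the line breaks between tags) is not meaningful content, which is a simplification. In ElementTree, text can sit in an element's `text` (before its first child) or in a child's `tail` (after that child), so both places are checked. The function names `has_text` and `content_kind` are illustrative only.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the movie document (hypothetical subset for this sketch).
DOC = """<movie>
<released_by>Warner Bros.</released_by>
<actors>Leading Actors
<actor>Daniel Radcliffe</actor>
</actors>
<reviews total="0"></reviews>
</movie>"""

def has_text(s):
    """True if s contains non-whitespace characters (assumption: whitespace is ignorable)."""
    return s is not None and s.strip() != ""

def content_kind(elem):
    """Classify an element's content using the categories from the text."""
    children = list(elem)
    # Text may appear before the first child (elem.text) or after a child (child.tail).
    text = has_text(elem.text) or any(has_text(c.tail) for c in children)
    if children and text:
        return "mixed"
    if children:
        return "element"
    if text:
        return "simple"
    return "empty"

root = ET.fromstring(DOC)
kinds = {e.tag: content_kind(e) for e in root.iter()}
# Attributes are reported separately from content.
reviews_attrib = root.find("reviews").attrib

print(kinds)
print(reviews_attrib)  # {'total': '0'}
```

Attributes are kept in a separate `attrib` dictionary. That is why reviews counts as empty even though it carries the total attribute.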
Parsed Character Data
XML documents are read and processed by a specific piece of software called an
XML parser. When a document is processed by the XML parser, each character in
the document is read, or parsed, in order to create a representation of the
data.
Text inside elements that gets read by the parser is Parsed Character Data, or
PCDATA. This is important because you will see the term PCDATA pop up all
over. Element content is considered either other elements or PCDATA. Attribute
values are parsed too (entity references in them are expanded), although a DTD
declares their type as CDATA rather than PCDATA.
By definition, PCDATA is parsed, which means that the parser looks at each of
the characters and tries to determine their meaning. For example, if the
parser encounters a < then it knows that the characters that follow
represent markup, such as a start tag. When the parser encounters </, it knows
that it has encountered an end tag.
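The parser's recognition of start tags and end tags can be observed directly. The following is a minimal sketch that assumes Python's `xml.etree.ElementTree.XMLPullParser`. It reports a "start" event each time it reads a start tag and an "end" event each time it reads the matching end tag. The `<order><item>tea</item></order>` document is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Ask the parser to report an event for every start tag and end tag it reads.
parser = ET.XMLPullParser(events=("start", "end"))
parser.feed("<order><item>tea</item></order>")
parser.close()  # signal end of input; queued events remain readable

events = [(event, elem.tag) for event, elem in parser.read_events()]

for event, tag in events:
    print(event, tag)
```

The events arrive in document order: start order, start item, end item, end order. This mirrors the nesting of the elements.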
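Because PCDATA is parsed, it cannot contain a literal < or & character, since
these characters begin markup (tags and entity references). They must be
written as the entity references &lt; and &amp;. The > character is allowed in
most places, but it is commonly escaped as &gt; as well. For example,
<!--This is not well-formed XML!-->
<order>0 is < 1 & 1 > 0</order>
The well-formed version escapes the special characters:
<order>0 is &lt; 1 &amp; 1 &gt; 0</order>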
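The difference between the raw and escaped forms shows up as soon as the text is parsed. The following is a minimal sketch that assumes Python's `xml.etree.ElementTree` for parsing and `xml.sax.saxutils.escape` for escaping. `escape()` replaces `&`, `<`, and `>` with their entity references. The parser rejects the raw text and decodes the escaped text back to the original characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "0 is < 1 & 1 > 0"

# Dropping the raw text between tags produces markup that is not well-formed.
bad = "<order>" + raw_text + "</order>"
try:
    ET.fromstring(bad)
    bad_parsed = True
except ET.ParseError as err:
    bad_parsed = False
    print("ParseError:", err)

# escape() turns &, < and > into &amp;, &lt; and &gt;.
escaped = escape(raw_text)
order = ET.fromstring("<order>" + escaped + "</order>")

print(escaped)     # 0 is &lt; 1 &amp; 1 &gt; 0
print(order.text)  # 0 is < 1 & 1 > 0
```

Escaping when writing and unescaping when parsing is symmetric, so the application always works with the original characters.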
Element Naming
If we're going to be creating elements we're going to have to give them names,
and XML is very generous in the names we're allowed to use. For example, there
aren't any reserved words to avoid in XML, as there are in most programming
languages, so we have a lot of flexibility in this
regard.
However, there are some rules that we must follow:
-
Names can contain letters, digits, and a few punctuation characters: "-", "_", and ".".
-
Names can start with letters (including non-Latin characters) or the "_"
character, but not numbers or other punctuation characters.
-
After the first character, numbers are allowed, as are the characters "-"
and ".".
-
Names can't contain spaces.
-
Names can't contain the ":" character. Strictly speaking, this character
is allowed, but the XML specification says that it's "reserved". You
should avoid using it in your documents, unless you are working with
namespaces.
-
Names can't start with the letters "xml", in uppercase, lowercase, or
mixed - you can't start a name with "xml", "XML", XmL", or any other
combination.
-
There can't be a space after the opening "<" character; the name of the
element must come immediately after it. However, there can be space before
"character, if desired.">
the closing ">" character, if desired.
-
Names are case sensitive: <Title> and <title> are different elements.
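A parser enforces most of the naming rules above. The following is a minimal sketch that assumes Python's `xml.etree.ElementTree`. It feeds a handful of hypothetical sample documents to the parser and records which ones are accepted. The reserved "xml" prefix and the ":" character are left out, because those rules are about reservation rather than well-formedness. The helper name `is_well_formed` is illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if ElementTree's parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, each paired with the rule it exercises.
samples = {
    "<first_name>Ann</first_name>": "underscore is allowed",
    "<_id>7</_id>": "may start with an underscore",
    "<line2>x</line2>": "digits allowed after the first character",
    "<2nd_line>x</2nd_line>": "cannot start with a digit",
    "<first name>Ann</first name>": "spaces are not allowed",
    "< title>x</title>": "no space after '<'",
    "<title >x</title >": "space before '>' is fine",
    "<Title>x</title>": "case sensitive, so the tags do not match",
}

results = {xml: is_well_formed(xml) for xml in samples}
for xml, note in samples.items():
    print(f"{results[xml]!s:5}  {xml:32}  {note}")
```

In the `<first name>` case, the parser reads `name` as an attribute with no value. That is why a space can never appear inside an element name.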
Good practice for naming elements follows these simple rules:
-
Any name can be used and no words are reserved, but the idea is to make names
descriptive. Names with an underscore separator work well. Examples:
<first_name>, <last_name>.
-
Avoid "-" and "." in names. It could be a mess if your software tried to
subtract name from first (first-name) or think that "name" is a property of
the object "first" (first.name).
-
Element names can be as long as you like, but don't overdo it. Names should
be short and simple, like <book_title>, not like
<the_title_of_the_book>.
-
XML documents often have a corresponding database, in which fields exist
corresponding to elements in the XML document. A good practice is to use the
naming rules of your database for the elements in the XML documents.
-
Non-English letters like é, ò, and á are perfectly legal in XML element names,
but watch out for problems if your software vendor doesn't support them.
-
The ":" should not be used in element names because it is reserved to be
used for something called namespaces (more later).
Empty Element
If an element contains no subelements or character data, that element is said
to be "empty." Often, an empty element consists of a single tag that holds
attribute-value pairs and is "terminated" by a forward slash before its
closing bracket. The slash before the ending bracket serves the same function
as an end tag's forward slash. The special empty element syntax is
<tagname/>.
An element containing nothing more than attributes is still considered
"empty" and "without content" because attribute values count as markup, not
character data. For example:
<item id="1234"/>
Technically, an empty element can also be expressed using element start and
end tags. For the above example, <item id="1234"></item> is also
correct syntax.
Recall from our discussion of element names that the only place we can have a
space within the tag is before the closing ">". This rule is slightly
different when it comes to empty elements. The "/" and ">" characters
always have to be together, so you can create an empty element like this:
<item />
but not like these:
<item/ >
<item / >
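The spacing rule can be confirmed with a parser. The following is a minimal sketch that assumes Python's `xml.etree.ElementTree`. It tries each of the four spellings shown above and records whether the parser accepts it.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if ElementTree's parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

checks = {
    "<item/>": parses("<item/>"),      # compact form
    "<item />": parses("<item />"),    # space before "/>" is allowed
    "<item/ >": parses("<item/ >"),    # "/" and ">" separated: rejected
    "<item / >": parses("<item / >"),  # rejected for the same reason
}
print(checks)
```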
Empty elements really don't buy you anything - except that they take less
typing - so you can use them, or not, at your discretion. Keep in mind,
however, that as far as XML is concerned <item></item> is exactly
the same as <item/>; for this reason, XML tools will sometimes change
your XML from one form to the other when they write it back out. You should
never count on your empty elements being in one form or the other, but since
the two forms are equivalent, it doesn't matter.
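Both the equivalence of the two forms and the way tools rewrite them can be seen in a short round trip. The following is a minimal sketch that assumes Python's `xml.etree.ElementTree`. Both spellings parse to the same element. When serializing, ElementTree chooses the form itself, and its `short_empty_elements` parameter decides which form it writes.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<item id="1234"></item>')
short_form = ET.fromstring('<item id="1234"/>')

# Both spellings parse to the same thing: same tag, same attributes,
# no text, no children.
same = (
    long_form.tag == short_form.tag
    and long_form.attrib == short_form.attrib
    and long_form.text is None
    and short_form.text is None
    and len(long_form) == len(short_form) == 0
)

# On output, ElementTree picks the form, regardless of how the input looked.
default_out = ET.tostring(long_form, encoding="unicode")
expanded_out = ET.tostring(short_form, encoding="unicode", short_empty_elements=False)

print(same)          # True
print(default_out)   # <item id="1234" />
print(expanded_out)  # <item id="1234"></item>
```

The element written with start and end tags comes back out as `<item id="1234" />`. This is exactly the kind of rewrite described above.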
Interestingly, nobody in the XML community seems to mind the empty element
syntax, even though it doesn't add anything to the language. This is
especially interesting considering the passionate debates that have taken
place on whether attributes are really necessary.
One place where empty elements are very often used is for elements that have
no (or optional) PCDATA, but instead have all of their information stored in
attributes. So if we rewrote our <item> example without child elements,
instead of a start-tag and end-tag we would probably use an empty element,
like this:
<item id="1234" category="food"/>
Another common example is the case where just the element name is enough; for
example, the HTML <BR> tag might be converted to an XML empty element,
such as the XHTML <br/> tag. (XHTML is an XML-compliant reformulation
of HTML.)
Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<book>
<title>XML Element</title>
<author>
<HR/>
<published_year>2005</published_year><BR/>
<contact>800-123-4567</contact><BR/>
</author>
</book>
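The empty elements in this example can be picked out automatically. The following is a minimal sketch that assumes Python's `xml.etree.ElementTree`. It treats an element as empty when it has no non-whitespace text and no children, which is a simplifying assumption. The document is encoded to bytes so the parser reads the encoding from the XML declaration. The helper name `is_empty` is illustrative.

```python
import xml.etree.ElementTree as ET

# The book example from the text, encoded to bytes so the parser
# honors the encoding named in the XML declaration.
BOOK_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<book>
<title>XML Element</title>
<author>
<HR/>
<published_year>2005</published_year><BR/>
<contact>800-123-4567</contact><BR/>
</author>
</book>""".encode("utf-8")

root = ET.fromstring(BOOK_XML)

def is_empty(elem):
    """No non-whitespace text and no child elements (simplifying assumption)."""
    return (elem.text is None or not elem.text.strip()) and len(elem) == 0

empty_tags = [e.tag for e in root.iter() if is_empty(e)]
print(empty_tags)  # ['HR', 'BR', 'BR']
```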
When do we use empty elements?
-
Element has no data other than attributes
-
Used as placeholders for attributes
-
Mark point phenomena, that is, a single position in the content, such as a line break (<br/>)