* XML is not a markup language. XML is a ‘metalanguage’, that is, it's a
language that lets you define your own markup languages (see
* XML is a markup language [two (seemingly) contradictory statements
one after another is an attention-getting device that I'm fond of], not
a programming language. XML is data: it does not ‘do’ anything, it has
things done to it.
* XML is non-proprietary: your data cannot be held hostage by someone else.
* XML allows multi-purposing of your data.
* Well-designed XML applications most often separate ‘content’ from
‘presentation’. You should describe what something is rather than what
something looks like (the exception being data content which never gets
presented to humans).
Saying ‘the data is in XML’ is a relatively useless statement, similar
to saying ‘the book is in a natural language’. To be useful, the former
needs to specify ‘we have used XML to define our own markup language’
(and say what it is), similar to specifying ‘the book is in French’.
A classic example of multipurposing and separation that I often use is
a pharmaceutical company. They have a large base of data on a
particular drug that they need to publish as:
* reports to the FDA;
* drug information for publishers of drug directories/catalogs;
* ‘prescribe me!’ brochures to send to doctors;
* little pieces of paper to tuck into the boxes;
* labels on the bottles;
* two pages of fine print to follow their ad in Reader's Digest;
* instructions to the patient that the local pharmacist prints out.
Without separation of content and presentation, they need to maintain
essentially identical information in 20 places. If they miss a place,
people die, lawyers get rich, and the drug company gets poor. With XML
(or SGML), they maintain one set of carefully validated information,
and write 20 programs to extract and format it for each application.
The same 20 programs can now be applied to all the hundreds of drugs
that they sell.
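A minimal sketch of that idea in Python, using the standard library's
xml.etree.ElementTree. The drug record, its element names, and the two
formatter functions are all invented for illustration; a real
pharmaceutical markup language would be far richer. The point is only that
one validated source feeds several small programs, one per output.

```python
import xml.etree.ElementTree as ET

# Hypothetical drug record; every element name here is an assumption.
DRUG_XML = """
<drug>
  <name>Examplitol</name>
  <dose unit="mg">50</dose>
  <warning>Do not take with alcohol.</warning>
  <warning>May cause drowsiness.</warning>
</drug>
"""

def bottle_label(drug):
    """Short one-line label for the bottle."""
    dose = drug.find("dose")
    return f"{drug.findtext('name')} {dose.text} {dose.get('unit')}"

def patient_leaflet(drug):
    """Longer plain-text instructions for the patient."""
    lines = [f"About {drug.findtext('name')}", "Warnings:"]
    lines += [f"  - {w.text}" for w in drug.findall("warning")]
    return "\n".join(lines)

# One source document, two presentations.
drug = ET.fromstring(DRUG_XML)
print(bottle_label(drug))
print(patient_leaflet(drug))
```

The same two functions would work unchanged on any other record that
follows the same (hypothetical) markup.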
In the Web development area, the biggest thing that XML offers is fixing what is wrong with HTML:
* browsers allow non-compliant HTML to be presented;
* HTML is restricted to a single set of markup (‘tagset’).
If you let broken HTML work (be presented), then there is no motivation
to fix it. Web pages are therefore tag soup that is useless for
further processing. XML specifies that processing must not continue if
the XML is non-compliant, so you keep working at it until it complies.
This is more work up front, but the result is not a dead-end.
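As a rough illustration of that strictness, here is a small Python sketch
using the standard library parser. The helper name is_well_formed and the
sample strings are assumptions made for this example. The parser raises an
error on the fragment with the unclosed tag instead of guessing what was
meant, which is exactly what a browser would not do with HTML.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text, False if it stops with an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<p>Charles Goldfarb worked at <b>IBM</b></p>"
bad = "<p>Charles Goldfarb worked at <b>IBM</p>"  # <b> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```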
If you wanted to mark up the names of things (people, places,
companies, etc.) in HTML, you don't have many choices that allow you to
distinguish among them. XML allows you to name things as what they are:
<person>Charles Goldfarb</person> worked at <company>IBM</company>
gives you a flexibility that you don't have with HTML:
<B>Charles Goldfarb</B> worked at <B>IBM</B>
With XML you don't have to shoe-horn your data into markup that restricts your options.
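To make the difference concrete, here is a hedged Python sketch. The
company element name and the sentence and p wrapper elements are
assumptions added so that each fragment parses on its own. With descriptive
markup a program can ask for people and companies separately; with bold
tags it can only ask for ‘things that were bold’.

```python
import xml.etree.ElementTree as ET

# The <sentence> and <p> wrappers are assumptions, added only so that each
# fragment has a single root element and can be parsed on its own.
xml_text = ("<sentence><person>Charles Goldfarb</person> worked at "
            "<company>IBM</company></sentence>")
html_like = "<p><B>Charles Goldfarb</B> worked at <B>IBM</B></p>"

# Descriptive markup: each kind of name can be selected by what it is.
xml_root = ET.fromstring(xml_text)
people = [e.text for e in xml_root.iter("person")]
companies = [e.text for e in xml_root.iter("company")]

# Presentational markup: the person and the company are indistinguishable.
html_root = ET.fromstring(html_like)
bold = [e.text for e in html_root.iter("B")]

print(people)
print(companies)
print(bold)
```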
What is the purpose of XML namespaces?
XML namespaces are designed to provide universally unique names for
elements and attributes. This allows people to do a number of things, such as:
* Combine fragments from different documents without any naming conflicts. (See example below.)
* Write reusable code modules that can be invoked for specific elements
and attributes. Universally unique names guarantee that such modules
are invoked only for the correct elements and attributes.
* Define elements and attributes that can be reused in other schemas or
instance documents without fear of name collisions. For example, you
might use XHTML elements in a parts catalog to provide part
descriptions. Or you might use the nil attribute defined in XML Schemas
to indicate a missing value.
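The last point can be sketched in Python as follows. The catalog namespace
URI and the Part, Description, and Price element names are invented for
this example; the XHTML and XML Schema instance namespace URIs are the
standard ones. ElementTree exposes each universal name as
‘{namespace-URI}local-name’, so code can pick out the XHTML elements and
the nil attribute regardless of which prefixes the document happens to use.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical parts catalog; the catalog namespace and its element
# names are assumptions made for illustration.
CATALOG = """
<cat:Part xmlns:cat="http://example.com/catalog"
          xmlns:xhtml="http://www.w3.org/1999/xhtml"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cat:Description><xhtml:p>A <xhtml:b>sturdy</xhtml:b> bolt.</xhtml:p></cat:Description>
  <cat:Price xsi:nil="true"/>
</cat:Part>
"""

root = ET.fromstring(CATALOG)

# Select XHTML elements by universal name, whatever prefix was used.
xhtml_tags = [e.tag for e in root.iter() if e.tag.startswith("{" + XHTML + "}")]

# Check the nil attribute by its universal name as well.
price = root.find("{http://example.com/catalog}Price")
price_missing = price.get("{" + XSI + "}nil") == "true"

print(xhtml_tags)
print(price_missing)
```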
As an example of how XML namespaces are used to resolve naming
conflicts in XML documents that contain element types and attributes
from multiple XML languages, consider the following two XML documents:
<?xml version="1.0" ?>
<?xml version="1.0" ?>
Each document uses a different XML language and each language defines
an Address element type. Each of these Address element types is
different -- that is, each has a different content model, a different
meaning, and is interpreted by an application in a different way. This
is not a problem as long as these element types exist only in separate
documents. But what if they are combined in the same document, such as
a list of departments, their addresses, and their Web servers?
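One way such a combined document could look, sketched in Python. The
namespace URIs, the prefixes, and all element content below are
hypothetical and are not taken from the two documents above. Each Address
element type is placed in its own namespace, so a program can ask for the
postal address or the server address without ambiguity.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for illustration.
NS = {
    "addr": "http://example.com/ito/addresses",
    "serv": "http://example.com/ito/servers",
}

# Hypothetical combined document holding both Address element types.
COMBINED = """
<Department xmlns:addr="http://example.com/ito/addresses"
            xmlns:serv="http://example.com/ito/servers">
  <Name>Documentation</Name>
  <addr:Address>
    <addr:Street>1 Example Street</addr:Street>
    <addr:City>Exampleville</addr:City>
  </addr:Address>
  <serv:Server>
    <serv:Name>docs-web</serv:Name>
    <serv:Address>192.0.2.10</serv:Address>
  </serv:Server>
</Department>
"""

root = ET.fromstring(COMBINED)

# Same local name, different universal names, so there is no conflict.
postal = root.find("addr:Address", NS)
network = root.find("serv:Server/serv:Address", NS)
city = postal.findtext("addr:City", namespaces=NS)

print(postal.tag)
print(network.tag)
print(city, network.text)
```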