XML stands for "eXtensible Markup Language". If you know HTML, XML will look very similar at first. It uses a similar syntax for its tags in angle brackets, elements can have attributes, and so forth.
If you don't know HTML, let's just say that XML is a way to structure documents. XML documents are plain text files, but they contain special "commands", which are called markup.
Text in an XML document is either "character data" or markup. Most markup is in angle brackets (<...>). Some markup also starts with an ampersand (&) and stops with a semicolon (;). As a result, these characters are considered special: in particular, < and & can no longer be used literally in the rest of an XML document. (There are ways to get around this, of course.)
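One common way around the restriction is to write the special characters as entity references such as &lt; and &amp;. The following is a minimal sketch in Python, assuming the standard library modules xml.sax.saxutils and xml.etree.ElementTree; the <note> element and the sample text are made up for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Text containing characters that are special in XML.
raw = "if a < b && b > c; then"

# escape() replaces &, < and > with the entity references
# &amp;, &lt; and &gt;. The semicolon is left alone.
escaped = escape(raw)
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c; then

# Wrap the escaped text in a hypothetical <note> element and parse it.
# The parser turns the entity references back into plain characters.
document = "<note>" + escaped + "</note>"
parsed_text = ET.fromstring(document).text
print(parsed_text == raw)  # True
```

The parser hands the original text back to the application, so the escaping is only visible in the file itself.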
The most common XML markup makes up elements, which are used to group data. Elements are normally inserted with the use of opening and closing tags in angle brackets. The closing tag looks like the opening tag, except that it starts with a slash. For example, provided that the element for "bold text" is "B", you can enclose a piece of bold text like this:
<B>Bold text.</B>
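Compared to HTML, though, XML is both stricter and more flexible.
It is stricter because it enforces tighter syntax rules to get rid of the present chaos in HTML syntax. Among other things, XML imposes the following requirements:
Every opening tag must have a matching closing tag, as in <P>...</P>.
Empty elements must be marked with a trailing slash, as in <HR WIDTH="100%" />.
Elements must be nested properly. HTML tolerated overlapping tags such as <B><I>blah</B></I>. This is no longer accepted.
Attributes must always be written as ATTRIBUTE="VALUE". HTML allowed "empty" attributes and values without quotes, which is no longer accepted either.
Element names are case-sensitive: the <P> element is not the same as the <p> element.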
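The strictness rules can be tried out with any XML parser. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample snippets are chosen here for illustration only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Each snippet obeys or violates one of the syntax rules.
samples = {
    "closed paragraph": "<P>text</P>",
    "unclosed paragraph": "<P>text",
    "empty element with slash": '<HR WIDTH="100%" />',
    "empty element without slash": '<HR WIDTH="100%">',
    "properly nested tags": "<B><I>blah</I></B>",
    "overlapping tags": "<B><I>blah</B></I>",
    "unquoted attribute value": "<HR WIDTH=100 />",
    "mismatched case": "<P>text</p>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Only the closed paragraph, the empty element with a trailing slash, and the properly nested tags are accepted; the parser rejects everything else outright instead of guessing what was meant, as HTML browsers tend to do.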
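It is more flexible because XML itself does not predefine any elements. Which elements and attributes a document type allows can be declared in a document type definition (DTD), which is often kept in a separate file with the .dtd extension. (The syntax for DTDs is different from that of the actual XML document, and we're not going into that here.)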
As a result, while HTML can only display text in a predefined way, XML documents can contain anything. You can use them for texts, for databases, for spreadsheets, or whatever. To be precise, that's the exact purpose of XML: define an infrastructure for any type of document.
Here's a sample XML document for a bibliography. With XML, there can be only one "root" element, which, in this case, we define to be <bibliography>. (That's similar to the <HTML> tag in HTML.) Each book in the bibliography will be in a <book> element, which in turn contains the <author> and <title> elements with the actual data.
<?xml version="1.0"?>
<bibliography>
    <book>
        <author>Ulrich Möller</author>
        <title>WarpIN Programmer's Guide and Reference</title>
    </book>
    <book>
        <author>Ulrich Möller</author>
        <title>WarpIN User's Guide and Reference</title>
    </book>
</bibliography>
The first line is the XML declaration, which should start every XML document. It defines the XML version (1.0 is the only one at this point) and some other stuff which we are not describing here.
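To see how a program might read such a document, here is a small sketch that parses the bibliography sample (written with properly closed author elements) and lists its books. It assumes Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<bibliography>
    <book>
        <author>Ulrich Möller</author>
        <title>WarpIN Programmer's Guide and Reference</title>
    </book>
    <book>
        <author>Ulrich Möller</author>
        <title>WarpIN User's Guide and Reference</title>
    </book>
</bibliography>"""

# fromstring() returns the single root element of the document.
root = ET.fromstring(document)
print(root.tag)  # bibliography

# Collect (author, title) pairs from every <book> below the root.
books = [
    (book.findtext("author"), book.findtext("title"))
    for book in root.findall("book")
]
for author, title in books:
    print(f"{author}: {title}")
```

The parser only delivers the structure (a root with two book elements); what to do with authors and titles is left to the program.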
As you might imagine, an "XML document" by itself is not very helpful. An XML document can contain any data in any structure; as long as the XML syntax rules are obeyed, it is XML-compliant. To be precise, XML isn't even really a language; it's more of a "meta-language" which sets rules for how to define a certain document type. How the data is understood, however, depends on the application that reads the data.
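The difference between obeying the XML syntax and being understood by an application can be sketched as follows. This is only an illustration, assuming Python's standard xml.etree.ElementTree module and a made-up application that expects the bibliography layout; the function check_bibliography and its rules are hypothetical and not part of any real program.

```python
import xml.etree.ElementTree as ET

def check_bibliography(text):
    """Return a list of convention violations (empty if there are none).

    Raises ET.ParseError if `text` breaks the XML syntax rules.
    """
    root = ET.fromstring(text)
    problems = []
    if root.tag != "bibliography":
        problems.append(
            f"root element is <{root.tag}>, expected <bibliography>"
        )
        return problems
    # Hypothetical convention: every book needs an author and a title.
    for index, book in enumerate(root.findall("book"), start=1):
        for required in ("author", "title"):
            if book.find(required) is None:
                problems.append(f"book {index} has no <{required}>")
    return problems

good = (
    "<bibliography><book><author>A. Writer</author>"
    "<title>A Title</title></book></bibliography>"
)
incomplete = "<bibliography><book><title>A Title</title></book></bibliography>"
other = "<recipe><step>Boil water.</step></recipe>"

print(check_bibliography(good))        # []
print(check_bibliography(incomplete))  # ['book 1 has no <author>']
print(check_bibliography(other))
```

All three inputs are XML-compliant, but only the first one means anything to this imaginary bibliography application.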
Enter WarpIN. The script language described in this book is XML-compliant, but
only WarpIN will react to WarpIN scripts the way you expect it to (that is, display
pages, extract archives, and so on). On the other hand, WarpIN will refuse to
accept XML files which do not obey the WarpIN conventions.