XML: The greatest thing since sliced bread?
XML (Extensible
Markup Language) is one of the most important technologies to
have emerged in recent times. It is supported by just about any
company that can write software. XML is a cousin of HTML and is
widely used by Web applications. But XML is far more than just a
web technology. Where HTML is about the presentation of the
data, XML describes the data itself, and so it is an important
data interchange medium.
Let’s look at an example.
We’ve set a PC up to do some data logging and now is the time to
analyse the data. We dump the data over to our PC and it
comes in like this:
164558067211652622714116586897230
Although all the information is there, it’s
meaningless. We’re going to have to write a custom app to
analyse it or at least some code to translate it.
If we sent the same data as XML, it would look
like this:
<?xml version="1.0"?>
<Datalog>
  <Log>
    <LogNumber>1</LogNumber>
    <Time>1645</Time>
    <Speed>580</Speed>
    <Temperature>67.2</Temperature>
    <Valve>OPEN</Valve>
  </Log>
  <Log>
    <LogNumber>2</LogNumber>
    <Time>1652</Time>
    <Speed>622</Speed>
    <Temperature>71.4</Temperature>
    <Valve>OPEN</Valve>
  </Log>
  <Log>
    <LogNumber>3</LogNumber>
    <Time>1658</Time>
    <Speed>689</Speed>
    <Temperature>72.3</Temperature>
    <Valve>CLOSED</Valve>
  </Log>
</Datalog>
Wow! Even if you have not seen or heard of XML
before, you can follow this. Can you say at what time the
highest speed was? It’s so readable it’s self-explanatory.
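The same question can be put to a parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the function name `time_of_highest_speed` is made up for this illustration, and the data is the log shown above.

```python
import xml.etree.ElementTree as ET

# The data log from the article, held as a string for this illustration.
DATALOG_XML = """<?xml version="1.0"?>
<Datalog>
  <Log>
    <LogNumber>1</LogNumber>
    <Time>1645</Time>
    <Speed>580</Speed>
    <Temperature>67.2</Temperature>
    <Valve>OPEN</Valve>
  </Log>
  <Log>
    <LogNumber>2</LogNumber>
    <Time>1652</Time>
    <Speed>622</Speed>
    <Temperature>71.4</Temperature>
    <Valve>OPEN</Valve>
  </Log>
  <Log>
    <LogNumber>3</LogNumber>
    <Time>1658</Time>
    <Speed>689</Speed>
    <Temperature>72.3</Temperature>
    <Valve>CLOSED</Valve>
  </Log>
</Datalog>"""


def time_of_highest_speed(xml_text):
    """Return (time, speed) for the <Log> entry with the highest <Speed>."""
    root = ET.fromstring(xml_text)
    # Compare the <Log> elements by the numeric value of their <Speed> child.
    fastest = max(root.findall("Log"), key=lambda log: int(log.findtext("Speed")))
    return fastest.findtext("Time"), int(fastest.findtext("Speed"))


print(time_of_highest_speed(DATALOG_XML))  # ('1658', 689)
```

Because every value is labelled, the code never has to know at which character position the speed starts or ends.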
Tags
XML uses ‘Tags’, which are the names in the
angle brackets. We have a Start Tag and an End Tag, and in
between the two we have data. The correct terminology is that
Tags are markup and the data between them is content.
The end tag must have the same name as the Start
Tag but is preceded by a ‘/’. For example, to send a surname you
may use
<surname>Kendall</surname>
Note that the tags are case sensitive.
You can choose any tag name you want, as long as it
starts with a letter or underscore and contains no spaces, and the
more descriptive the better. If you think that sounds like
making things up as we go along, well, that’s the power and
flexibility of XML; all you need to do is to keep to the syntax.
Markup may be hierarchical. Between a Start tag
and End tag, we can add other tags. The rule here is that you
must nest start and end tags in the correct order. We can use
<tag1><tag2></tag2></tag1> but
<tag1><tag2></tag1></tag2> is illegal. As long as you
obey that simple rule, you can nest as deep as you want.
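A parser enforces these rules for you. As a rough sketch, assuming Python and its standard `xml.etree.ElementTree` module (the helper name `is_well_formed` is made up for this illustration), the nesting and case rules can be tried out like this:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<tag1><tag2></tag2></tag1>"))  # True: correctly nested
print(is_well_formed("<tag1><tag2></tag1></tag2>"))  # False: tags closed in the wrong order
print(is_well_formed("<surname>Kendall</Surname>"))  # False: tag names are case sensitive
```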
An empty tag is one that holds no data. You can
write empty tags as
<email></email> or more commonly as
<email/>.
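To a parser the two spellings are the same thing. A small sketch, assuming Python's standard `xml.etree.ElementTree` module, shows that both forms produce an element with no text and no children:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<email></email>")
short_form = ET.fromstring("<email/>")

for element in (long_form, short_form):
    # .text is None when the element has no content; len() counts child elements.
    print(element.tag, element.text, len(element))

# Both elements serialise to the same bytes when written back out.
print(ET.tostring(long_form) == ET.tostring(short_form))  # True
```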
Say you wanted to describe a business contact
list in XML. The list holds a number of contacts; each
contact has a name, an email address and a company, and each
company has a name, an address and a phone number. This layout
is informally called the schema of the document. An example
with just two records may look like this:
<?xml version="1.0"?>
<contactlist>
  <contact>
    <name>Les Kendall</name>
    <email>LesK@cyberforth.com</email>
    <company>
      <name>Les Kendall Software</name>
      <address>Birmingham</address>
      <phone>08700 11 70 20</phone>
    </company>
  </contact>
  <contact>
    <name>Sam Salesman</name>
    <email>sales@cyberforth.com</email>
    <company>
      <name>Cyberforth Ltd</name>
      <address>England</address>
      <phone>08700 11 70 20</phone>
    </company>
  </contact>
</contactlist>
Notice how contact name and company name both
have the tag <name>. Because each <name> is nested inside a
different parent (<contact> or <company>), this is both legal and
obvious to the reader.
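A program can tell the two <name> tags apart in the same way a reader does: by where they sit in the hierarchy. The sketch below assumes Python's standard `xml.etree.ElementTree` module and a made-up helper called `list_contacts`; the data is the contact list above.

```python
import xml.etree.ElementTree as ET

# The contact list from the article, held as a string for this illustration.
CONTACTS_XML = """<?xml version="1.0"?>
<contactlist>
  <contact>
    <name>Les Kendall</name>
    <email>LesK@cyberforth.com</email>
    <company>
      <name>Les Kendall Software</name>
      <address>Birmingham</address>
      <phone>08700 11 70 20</phone>
    </company>
  </contact>
  <contact>
    <name>Sam Salesman</name>
    <email>sales@cyberforth.com</email>
    <company>
      <name>Cyberforth Ltd</name>
      <address>England</address>
      <phone>08700 11 70 20</phone>
    </company>
  </contact>
</contactlist>"""


def list_contacts(xml_text):
    """Return a list of (person name, company name) pairs."""
    root = ET.fromstring(xml_text)
    pairs = []
    for contact in root.findall("contact"):
        person = contact.findtext("name")           # <name> directly under <contact>
        company = contact.findtext("company/name")  # <name> nested inside <company>
        pairs.append((person, company))
    return pairs


for person, company in list_contacts(CONTACTS_XML):
    print(person, "works for", company)
```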
‘Well Formed’ XML
There are only a few more things you need to
know before you can use XML. You should include this as the first line:
<?xml version="1.0"?>
That tells the parser that this is intended as an
XML document and what version of XML we are using.
The above examples are Well Formed because they follow XML
syntax and will pass through an XML Parser (which is an XML
reader) without the parser objecting.
Valid XML
If you want the parser to validate the data then
the structure of the file can be defined at the start of the
XML. This is called the DTD (Document Type Definition) and it
describes what markup is to be expected in the XML file. This is
optional and I won’t go into any further detail here.
Entities
Some of you may have noticed that we could
really mess up the syntax if the content (data) contained a ‘<’
or a ‘>’. The angle brackets are reserved characters in XML,
together with ‘&’, ‘"’ and ‘'’. In content, these reserved characters
have to be written as Entities:
Description | Character | Entity
less than | < | &lt;
greater than | > | &gt;
apostrophe | ' | &apos;
quotation mark | " | &quot;
ampersand | & | &amp;
So if we have a name such as “FRED’S DISK & DRIVES” then it would be output as
<name>FRED&apos;S DISK &amp; DRIVES</name>
The parser will decode these entities back to
the original character when the file is read.
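The round trip can be sketched in a few lines of Python, assuming the standard library's `xml.sax.saxutils.escape` for encoding and `xml.etree.ElementTree` for reading the result back. The variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_name = "FRED'S DISK & DRIVES"

# escape() always converts &, < and >; the extra mapping adds the apostrophe.
encoded = escape(raw_name, {"'": "&apos;"})
document = "<name>" + encoded + "</name>"
print(document)  # <name>FRED&apos;S DISK &amp; DRIVES</name>

# The parser turns the entities back into the original characters.
decoded = ET.fromstring(document).text
print(decoded == raw_name)  # True
```

In practice a library that writes XML for you will normally do this encoding itself, so hand-escaping is only needed when you build the text yourself.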
XML Parsers
To read the XML back you need an XML Parser.
There are many free parsers around, and all the large software
companies have at least one. Many software applications are
starting to have their own built-in parsers, and as XML gains
pace these will become standard; you won’t even realise that
the data you are handling is in XML format.
As a quick check to test if your XML is well
formed, just try to open it in a web browser such as Internet Explorer; if it displays
without error then you’re in business.
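If you would rather check from a script, any parser will do the same job and can also say where the problem is. This is only a sketch, assuming Python's standard `xml.etree.ElementTree` module; `check_xml` is a made-up helper name, and reading the text from a file is left to the caller.

```python
import xml.etree.ElementTree as ET


def check_xml(xml_text):
    """Return None if xml_text is well formed, otherwise a short error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        # ParseError.position is a (line, column) tuple for the failure point.
        line, column = error.position
        return f"not well formed at line {line}, column {column}: {error}"
    return None


print(check_xml("<Log><Speed>580</Speed></Log>"))  # None
print(check_xml("<Log><Speed>580</Log>"))
```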
Advantages of XML
- XML is flexible and extensible.
- XML is fairly simple and very readable.
- XML is a good format for storing data.
- XML is easy to search and sort.
- XML is non-proprietary.
- XML is platform independent and you can use it to pass data back and forth between Windows, Linux, Sun, IBM, embedded systems and virtually anything else.
- XML is already supported by almost every large software company.
- XML can be placed directly on Web Pages.
- XML is plain text and will support almost any transmission method.
- XML is already the basis of many other exciting new developments.
Almost all spreadsheets, databases and word
processors are able to read and write XML.
Conclusion
Many have waited for the day when database,
desktop, mobile and any other applications can communicate easily with
one another. XML makes it all possible, and there are many
offshoot formats based on XML. XML has enormous
potential for everyone in the IT business, no matter what sector
or allegiance.
It is the greatest thing since sliced bread!
Les Kendall is a director of Cyberforth
Limited (http://www.cyberforth.com),
a software company based in England that offers Booking Systems, Bespoke Software Development Services and
Business Solutions. Contact Les at Les3@cyberforth.com.
©Les Kendall 2010.
You can copy and use this article as long as you leave this box intact.