XML is a free and open standard that allows for the sharing of structured data between information systems. XML stands for Extensible Markup Language; it is extensible in that the data elements that are used can be defined by the user.
XHTML, used to mark up web pages, is one application of XML. In XHTML you have a fixed set of tags such as <p> for paragraphs, <table> for tables and <a> for anchors (web links). Because XML is extensible, you can create your own tags to suit whatever data you wish to encode. For instance, you might have a tag like <phone> to encode phone numbers or <temp> to encode temperatures.
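As a minimal sketch of user-defined tags, the following Python snippet reads a small document with the standard library's `xml.etree.ElementTree` module. Only <phone> and <temp> come from the text above; the <contact> and <name> elements, the `unit` attribute and the sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and values are made up for illustration.
CONTACT_XML = """
<contact>
  <name>Ada Example</name>
  <phone>555-0100</phone>
  <temp unit="celsius">21.5</temp>
</contact>
"""

def read_contact(xml_text):
    """Return the phone number, temperature and unit stored in the custom tags."""
    root = ET.fromstring(xml_text)
    phone = root.findtext("phone")
    temp_element = root.find("temp")
    temperature = float(temp_element.text)
    unit = temp_element.get("unit")
    return phone, temperature, unit

if __name__ == "__main__":
    print(read_contact(CONTACT_XML))
```

The parser knows nothing about phone numbers or temperatures in advance. The tag names carry whatever meaning the author of the document gives them.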
The World Wide Web Consortium (W3C) develops and maintains the XML specification. If you encode your data using XML syntax, rather than inventing your own syntax, you can take advantage of a number of technologies that have been developed for working with XML. These will be discussed as we run through the article.
<h2>XML Programming History</h2>
XML is the latest in a series of markup languages. Before it came SGML (Standard Generalized Markup Language) and before that GML (Generalized Markup Language). The purpose of these languages has been to encode data in such a way as to be readable by both humans and machines.
XML was developed in the mid 1990s as a more refined version of SGML. The internet was already becoming widely available at this time and much of the design discussion behind XML was conducted via email and teleconferencing.
In November 1996, the first working draft of the XML specification was released. This later became known as XML 1.0, which was in its fourth edition as of August 2006. XML 1.1, first released in February 2004, is not as widely used as XML 1.0. It offers support for some extra Unicode scripts such as Mongolian, Cambodian and Burmese. Unless you need the features of XML 1.1, it is recommended that you stick with 1.0.
<h2>XML Programming Benefits</h2>
It is important to understand the reasons why XML programming is useful. Firstly, XML is designed to be human readable. Its support of Unicode means that almost any written language can be incorporated. At the same time, it is also well suited to being read by a computer program; its strict syntax means that a computer can parse the data easily.
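The effect of the strict syntax can be sketched in a few lines of Python. The helper name `is_well_formed` and both sample strings are assumptions made for this example; the parser is the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Non-ASCII text is accepted because XML is defined in terms of Unicode.
GOOD = "<greeting lang='el'>Γειά σου</greeting>"
# The closing tag does not match the opening tag, so the parser rejects it.
BAD = "<greeting>hello</greting>"

if __name__ == "__main__":
    print(is_well_formed(GOOD), is_well_formed(BAD))
```

Because a malformed document is rejected outright rather than guessed at, a program reading XML does not need its own error-recovery rules.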
XML is well suited to describing data structures used by computers, such as trees, lists and records. But, since it is extensible, it can be used to describe just about any form of data. One of the most common applications for XML is to encode documents; the OpenOffice word processor uses an XML-based file format.
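To illustrate how a list of records maps onto XML, here is a small sketch using Python's `xml.etree.ElementTree`. The record fields and the <contacts> and <contact> element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; the field names are invented for this example.
RECORDS = [
    {"name": "Ada", "phone": "555-0100"},
    {"name": "Grace", "phone": "555-0199"},
]

def records_to_xml(records):
    """Build a <contacts> tree with one <contact> child per record."""
    root = ET.Element("contacts")
    for record in records:
        contact = ET.SubElement(root, "contact")
        # Each field of the record becomes a child element holding its value.
        for field, value in record.items():
            ET.SubElement(contact, field).text = value
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(records_to_xml(RECORDS))
```

The list becomes a sequence of sibling elements and each record becomes a small subtree, so the result is itself a tree.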
XML is platform-independent which, combined with its strict syntax, makes it a language that is resilient to changes in technology. Having a consistent syntax across all your data stores means that XML manipulation and display programs can be reused throughout the organization. As a result, processes such as testing, document storage and software construction can become more efficient and streamlined.
Although XML syntax is simple, the storage space required is usually much greater than it would be if the data were stored in a binary format. Thus for some applications XML may not be the ideal storage format.
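The size difference can be seen with a rough comparison in Python. The three sample readings, the element names and the choice of 4-byte floats for the binary form are all assumptions made for this sketch, and the actual ratio depends heavily on the data and tag names.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical sample data: three temperature readings.
READINGS = [21.5, 22.0, 19.75]

def as_binary(readings):
    """Pack each reading as a 4-byte little-endian float."""
    return struct.pack("<%df" % len(readings), *readings)

def as_xml(readings):
    """Encode each reading as a <temp> element inside a <readings> root."""
    root = ET.Element("readings")
    for value in readings:
        ET.SubElement(root, "temp").text = repr(value)
    return ET.tostring(root, encoding="utf-8")

if __name__ == "__main__":
    print(len(as_binary(READINGS)), "bytes as binary")
    print(len(as_xml(READINGS)), "bytes as XML")
```

The binary form stores only the values, while the XML form also stores an opening and closing tag around every value.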
It should also be noted that, since XML tends to produce documents with a hierarchical structure, non-hierarchical data may be difficult or time-consuming to represent properly in XML.
<h2>XML Programming – Defining your data structure</h2>
In practice, there are two common ways to define a set of XML elements to fit the data you are working with: an XML Schema or a DTD (Document Type Definition).
DTD is the older of the two and harks back to the days of SGML. It is widely used but it lacks some of the extra features of schemas. Some people consider DTDs to be easier to read and write.
XML Schemas were developed by the W3C to replace DTDs. Schemas allow for better control over data typing in the XML document. Schemas are themselves written in XML, so standard XML tools can be used to create and edit them.
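The sketch below places a DTD declaration and a roughly equivalent XML Schema side by side for a hypothetical <phone> element, then reads the schema with an ordinary XML parser. The helper name `declared_elements` is an assumption for this example. Note that Python's `xml.etree.ElementTree` does not validate documents against either format; the point here is only that a schema is itself an XML document.

```python
import xml.etree.ElementTree as ET

# A DTD declaration for a hypothetical <phone> element that holds text.
# DTDs use their own syntax, which is not XML.
PHONE_DTD = "<!ELEMENT phone (#PCDATA)>"

# A roughly equivalent XML Schema, which also states that the content is a string.
PHONE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="phone" type="xs:string"/>
</xs:schema>
"""

# ElementTree writes namespaced tags as "{namespace-uri}localname".
XS = "{http://www.w3.org/2001/XMLSchema}"

def declared_elements(xsd_text):
    """List the names of the top-level elements that a schema declares."""
    schema = ET.fromstring(xsd_text)
    return [element.get("name") for element in schema.findall(XS + "element")]

if __name__ == "__main__":
    print(declared_elements(PHONE_XSD))
```

The `type="xs:string"` attribute is an example of the data typing that a DTD declaration such as `(#PCDATA)` cannot express in any finer detail.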
<h2>XML Programming – Processing XML</h2>
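Since XML files are plain text in a Unicode encoding (typically UTF-8 or UTF-16), any programming language or script that can read text can, in principle, parse the data directly. In practice, a number of APIs for XML programming have been developed so that you do not have to write your own parser.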
The SAX API (Simple API for XML) is an event-driven interface. An XML document is read sequentially by SAX and its elements are processed by an event handler written by the user. SAX is efficient and easy to use, but it is not well suited to circumstances where data must be extracted from arbitrary parts of a document in no particular order.
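A minimal sketch of the event-driven style, using Python's standard `xml.sax` module, might look like this. The `PhoneCollector` class, the `collect_phones` helper and the sample document are invented for illustration.

```python
import xml.sax

class PhoneCollector(xml.sax.ContentHandler):
    """Collect the text of every <phone> element as the document streams past."""

    def __init__(self):
        super().__init__()
        self.phones = []
        self._inside_phone = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called once for each opening tag, in document order.
        if name == "phone":
            self._inside_phone = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._inside_phone:
            self._buffer.append(content)

    def endElement(self, name):
        # Called once for each closing tag.
        if name == "phone":
            self.phones.append("".join(self._buffer))
            self._inside_phone = False

# Hypothetical sample document.
DOC = b"<contacts><phone>555-0100</phone><phone>555-0199</phone></contacts>"

def collect_phones(xml_bytes):
    handler = PhoneCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.phones

if __name__ == "__main__":
    print(collect_phones(DOC))
```

The handler only ever sees one event at a time, so it has to keep its own state (here the `_inside_phone` flag) to remember where it is in the document. That is why SAX is awkward when data is needed in an arbitrary order.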
The DOM API (Document Object Model) is interface-oriented and permits navigation through a document on a node-by-node basis. If you have experience using JavaScript in web pages, you will have used the DOM to process HTML/XHTML elements. DOM is also supported in Java, and it is common to see XML processors written in Java that use the DOM API. Although DOM is powerful, it can be memory intensive, as the entire XML document must be loaded into memory.
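For comparison, node-by-node navigation can be sketched with Python's standard `xml.dom.minidom` module, which implements a subset of the DOM. The sample document and the `first_phone` and `all_names` helpers are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample document, written without whitespace between tags so
# that every child node is an element rather than a whitespace text node.
DOC = (
    "<contacts>"
    "<contact><name>Ada</name><phone>555-0100</phone></contact>"
    "<contact><name>Grace</name><phone>555-0199</phone></contact>"
    "</contacts>"
)

def first_phone(xml_text):
    """Walk from the root to the first contact's phone number, node by node."""
    document = minidom.parseString(xml_text)  # the whole tree is built in memory
    root = document.documentElement           # <contacts>
    contact = root.firstChild                 # first <contact>
    phone = contact.lastChild                 # its <phone> child
    return root.tagName, phone.firstChild.data

def all_names(xml_text):
    """Jump straight to every <name> element, wherever it sits in the tree."""
    document = minidom.parseString(xml_text)
    return [node.firstChild.data for node in document.getElementsByTagName("name")]

if __name__ == "__main__":
    print(first_phone(DOC))
    print(all_names(DOC))
```

Because the whole tree is available at once, any node can be reached in any order, at the cost of holding the entire document in memory.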