by
Ellen E. Fischer
April 21, 1998
Will XML affect my life? Whether it makes sense or not, XML will most certainly affect my life. I might be conscious of it, or I might not. I might have a choice in how much I'm aware of it, or I might not. But, XML will affect my life.
The Extensible Markup Language 1.0 Recommendation was released by the World Wide Web Consortium on February 10, 1998. Then, on March 4, 1998, eighteen technology companies announced that they were forming X-ACT, the XML Active Content Technologies Council (X-ACT, p. 1). Microsoft, although not a member of X-ACT, is also leading the charge in creating XML-based applications. These companies hope to use XML as a simple solution to many computer problems, both simple and complex. With the flurry of new applications being released and being proposed, and with the hype surrounding the release of the XML Recommendation, XML is being hailed as the answer to almost everyone's problems.
XML is a subset of the Standard Generalized Markup Language (SGML), a standard of the International Organization for Standardization. The purpose of SGML is to allow one to describe a document in terms of its structure, rather than its page layout. With SGML, a designer writes a Document Type Definition (DTD) which specifies the structural elements and attributes of a particular class of documents. Then, the document is written with tags identifying these parts, which an electronic viewer for the DTD can interpret in order to lay out the document. For example, HTML, the HyperText Markup Language used on the Internet, is an SGML DTD which allows one to specify elements such as headings, paragraphs and lists (using tags <H1>, <P>, <LI>, etc.).
XML was designed to allow one to create structured documents more easily than is possible with SGML. Also, XML was designed to have the advantages of HTML, so that it would "be straightforwardly usable over the Internet" (XML 1.0 Recommendation). This ability to define standardized, readable structure is taking XML beyond the publishing realm into other areas which are looking for simple cures for complex problems.
According to the X-ACT Q&A sheet, "The purpose and mission of X-ACT is to grow the XML market" (X-ACT FAQ's). In order to do this, it seems that these organizations are planning to use XML for any application it could possibly relate to. The new, proposed uses for XML documents include data sharing, database access and distributed computing. The X-ACT web page (www.x-act.org) describes Active Content as a technology that will "allow data or objects to be reused and repurposed by any application." Usually, one tries to use a tool for the purpose it was intended. But, by using XML for these proposed uses, it seems X-ACT and other organizations are "repurposing" XML.
Publishing
Because many XML developers came from the Standard Generalized Markup Language world ... they see it primarily as a publishing prize....
J.P. Morgenthal, NC.Focus
The initial efforts at simplifying SGML into XML began in the summer of 1996. The XML Working Group of the World Wide Web Consortium officially formed in January 1997 and has worked on stabilizing the XML specification in the past year. XML was designed to combine the flexibility of SGML with the web-abilities of HTML. Current browsers have been designed primarily to recognize HTML and have a built-in mechanism for displaying HTML documents. This is necessary because, by design, documents defined by SGML and its pure derivatives do not specify formatting instructions. So, obviously, an application displaying XML documents will need to be made XML-aware. By combining XML with Cascading Style Sheets, or the future Extensible Stylesheet Language, such applications will be able to read XML DTDs and appropriately display XML documents.
Traditional SGML software vendors will be releasing products for XML. For example, Interleaf is planning a product, code-named Blade Runner, which will "combine a visual Document Type Definition (DTD) modeling tool; a WYSIWYG authoring and assembly environment; and a publishing engine designed for experienced XML and Standard Generalized Markup Language (SGML) users" to be run on Windows NT. In addition, Microstar Software has made available "a free XML parser optimized for use in Java applets", named Alfred, and is working on a new version of its Near & Far Designer, which is for DTD design (Walsh 4).
The XML publishing revolution will occur due to its Internet features, though. Therefore, the products of concern to most consumers will be those allowing XML documents to be viewed on the web. In anticipation of the XML Recommendation, then, Microsoft built an XML parser into its Internet Explorer 4.0. Also, Netscape's recently released Communicator 5.0 supports XML 1.0, "us[ing] Cascading Style Sheets to convert XML data into HTML display," and will integrate XSL when its development is complete (Walsh 2). Beyond this, Netscape is leaving further XML support to developers. Ramanathan Guha, Netscape engineer, said, "the most important use for XML today is for documents" (Walsh 2).
Data Sharing
[V]iewing XML only as a publishing format is missing the ball.
J.P. Morgenthal, NC.Focus
Adam Bosworth of Microsoft seems to believe that using XML for publishing, its original intent, is perhaps the hardest thing for XML to do. He told InfoWorld,
"It's true that if you open up the arena to say any tags can show up, then you have some hard problems to solve and it will take some time to get some of those problems worked through. And that's one of the reasons we focused very quickly on XML as a way to let applications interact in terms of sharing information and not as an alternate presentation model…. There is a huge opportunity to share information in an open way …" (Walsh 1).
In order to share information using XML, as Bosworth implies, common tag sets must be defined and standardized. While any group wishing to use XML for information-sharing can define their own tag sets, a likely division of tag sets would be by industry. The professional organizations within each industry could define the sets, i.e., the DTDs (Walsh 5). For example, Chrysler and Ford have proposed a standard automotive industry set; this tag set should be finalized this year (Walsh 5).
In addition to ensuring XML tag sets are defined, companies must ensure their software vendors will be XML-compliant. According to Dave Pool, CEO of DataChannel, "That's an education process" (Walsh 5). Then, applications on company web and other servers will have to be engineered to read and create XML files so that they can interact. Due to the clear structure and contextual information in XML documents, these applications can easily parse the file to access the information, actually enabling the data sharing within the organization and with other sites.
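To make the parsing claim concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The order document and its tag names (order, part, description, quantity) are invented for illustration and are not drawn from any industry tag set mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are illustrative, not an industry standard.
ORDER_XML = """
<order id="A-100">
  <part number="4471">
    <description>Brake rotor</description>
    <quantity>12</quantity>
  </part>
  <part number="9020">
    <description>Wiper blade</description>
    <quantity>40</quantity>
  </part>
</order>
"""

def read_order(xml_text):
    """Return the order id and a list of (part number, description, quantity)."""
    root = ET.fromstring(xml_text)
    parts = [
        (part.get("number"), part.findtext("description"), int(part.findtext("quantity")))
        for part in root.findall("part")
    ]
    return root.get("id"), parts

order_id, parts = read_order(ORDER_XML)
```

Because every value sits inside a named tag, the receiving application needs no knowledge of column positions or record layouts; it only needs to agree with the sender on the tag set.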
Publishing and Data Sharing Combined: A Health Care Example
"It's like taking a document and putting it in a lemon press: If a system squeezes this XML document, a bunch of structured information will come out, namely information that has been tagged."
Mark Tucker, Health Level 7
The health-care industry started early in planning for the use of XML, making news in August 1997 for its Kona Proposal. The Kona Proposal "aims to enable the exchange of medical information in a vendor-neutral structure built on the Standard Generalized Markup Language (SGML) and XML," Lynda Radosevich wrote in Infoworld (Radosevich 1). Radosevich's article explains the concept of XML and, in part, uses health care as an example to show that various industry groups could find XML useful for their needs. All potential health-care patients should find this proposal interesting, however.
"Would you as a patient want your medical information stored in a proprietary structure that is controlled by an application?" That question was posed by John C. Spinosa, a founding member of the Health Level 7 standards body (Radosevich 1). The obvious answer is, "No." A common XML tag set for the health-care industry provides greater assurance that patients receive more comprehensive health care. While not providing a guarantee of good care, the use of a standard XML DTD would allow easier access to each person's medical records. For example, if a patient's record was translated into an XML document, a hospital and a patient's primary care physician each would be able to pull up and read the record on the Web for collaboration. If patients' records were directly written with this standard health care DTD, their records would be able to follow them, whether they moved across country or just changed local insurance companies. Thus, this ensures the record format "won't tie patients' medical records to prorietary systems in the control of specific health-care institutions and insurance companies," said Spinosa (Radosevich 1).
The scenario just described shows a relatively normal publishing use of an XML DTD. In a sense, one could say it describes data sharing in that several different people and institutions have access to the same (complete) information about a patient. The real use of data sharing, though, is to allow different groups to use the same information without rewriting, retyping or copying the information from the Web into their own files. As described by Rita Knox, of the Gartner Group, the medical information entered in the physician's office could be transferred to a pharmacy for its use. In addition, "a company that tracks statistics on the flu could use the same data" (Walsh 5). This is one of the features that sets XML apart from other publishing mechanisms: "There is no mechanism for that kind of data sharing in place today," Knox said (Walsh 5).
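The pharmacy and flu-statistics scenario can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the visit record, its element and attribute names, and the placeholder drug name are invented, and none of it reflects the Kona Proposal's actual tag set.

```python
import xml.etree.ElementTree as ET

# Hypothetical visit record; element and attribute names are invented.
VISIT_XML = """
<visit date="1998-02-03">
  <patient>Jane Doe</patient>
  <diagnosis code="flu">Influenza</diagnosis>
  <prescription drug="ExampleDrug" dose="100 mg"/>
</visit>
"""

def pharmacy_view(xml_text):
    """What a pharmacy might pull from the record: drug and dose only."""
    root = ET.fromstring(xml_text)
    return [(rx.get("drug"), rx.get("dose")) for rx in root.iter("prescription")]

def flu_tracker_view(xml_text):
    """What a statistics firm might pull: a count of flu diagnoses."""
    root = ET.fromstring(xml_text)
    return sum(1 for dx in root.iter("diagnosis") if dx.get("code") == "flu")

prescriptions = pharmacy_view(VISIT_XML)
flu_cases = flu_tracker_view(VISIT_XML)
```

Neither consumer retypes or copies anything; each reads the same document and ignores the tags it does not care about.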
Database Access
"[A] spreading ooze of data" will expand as people exchange data more easily with XML.
Rita Knox, Gartner Group
XML documents are described as "context-rich" since the tags can be defined to specify both element names and attributes. In essence, they describe a flat-file database. In 1995, Martin Libicki wrote, "One reason for SGML's growing popularity is that it is the only accepted grammar for marking up text to convert it into data" (Libicki, p. 271). This is the main reason XML (as noted, a subset of SGML) is generating so much excitement: by use of context-rich tags, XML converts text into data. Like databases, XML documents allow "contextual search and retrieval" (Marion Elledge, quoted in Radosevich 1).
But rather than XML documents being used as databases, XML documents are expected to be used as containers for getting information from and sending information to databases. The tags in the XML document can specify the database field and its properties. Scripts can be used to move the data from databases to XML documents and vice versa. In the first case, a script can read from a database to generate an XML document. In the second case, a script can parse an XML document to insert data into databases. These scripts can be small, stand-alone scripts, or part of a larger application.
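Both directions can be sketched with Python's standard sqlite3 and xml.etree.ElementTree modules. The "parts" table, its columns, and the function names are hypothetical, chosen only to show the round trip; a production script would also need error handling and a way to map column types.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_db():
    """Create an in-memory database with a hypothetical 'parts' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (number INTEGER, name TEXT, price REAL)")
    return conn

def table_to_xml(conn):
    """First case: read rows from the database and generate an XML document."""
    root = ET.Element("parts")
    query = "SELECT number, name, price FROM parts ORDER BY number"
    for number, name, price in conn.execute(query):
        part = ET.SubElement(root, "part", number=str(number))
        ET.SubElement(part, "name").text = name
        ET.SubElement(part, "price").text = str(price)
    return ET.tostring(root, encoding="unicode")

def xml_to_table(xml_text, conn):
    """Second case: parse an XML document and insert each <part> as a row."""
    for part in ET.fromstring(xml_text).findall("part"):
        conn.execute(
            "INSERT INTO parts (number, name, price) VALUES (?, ?, ?)",
            (int(part.get("number")), part.findtext("name"), float(part.findtext("price"))),
        )

source = make_db()
source.execute("INSERT INTO parts VALUES (1, 'Rotor', 42.5)")
document = table_to_xml(source)   # database -> XML
target = make_db()
xml_to_table(document, target)    # XML -> a second database
```

The XML text in `document` is the only thing the two databases share, which is the sense in which the document acts as a container.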
Even information from many legacy, non-XML-enabled databases can be integrated in XML documents. As Norbert Mikula, of DataChannel, notes, "Most application software packages support [delimited text format] as an export and/or import format" (Mikula). XML documents are just a verbose form of comma-delimited files; again, scripts are called on to generate an XML document. DataChannel has created a template-driven generator which, combined with the DataChannel XML Parser, automatically creates XML documents from comma-delimited files. Generating a comma-delimited file from an XML document would be even easier. Thus, virtually all applications are no more than one step from being "XML-enabled".
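As a rough illustration of how small such a conversion script can be, the following Python sketch converts a comma-delimited export to XML and back. It is not DataChannel's template-driven generator; the function names and default tag names are invented, and it assumes the first row holds column names that are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Turn a comma-delimited export (with a header row) into an XML string."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for field, value in row.items():
            # Assumes each header is a legal XML element name.
            ET.SubElement(record, field).text = value
    return ET.tostring(root, encoding="unicode")

def xml_to_csv(xml_text):
    """Reverse direction: one CSV row per child of the root element."""
    records = list(ET.fromstring(xml_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if records:
        writer.writerow([child.tag for child in records[0]])
        for record in records:
            writer.writerow([child.text or "" for child in record])
    return out.getvalue()

export = "sku,price\nA1,9.99\nB2,4.50\n"
as_xml = csv_to_xml(export)
round_trip = xml_to_csv(as_xml)
```

The reverse function is indeed the shorter of the two, since the tags already carry the column names.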
Database Access Applications: Web Pages and EDI
An idea almost as good as peanutbutter and chocolate!
XML/EDI Group
With XML designed to be "easily usable over the Internet", this access to database information allows the creation of dynamic web pages and allows user-input to the web page to be inserted into databases in ways better than the current methods. One could imagine many types of databases and corresponding web pages that would make this a very exciting idea for business and personal interests, e.g., on the page-generation side, creating pages of stock quotes or virtual cookbooks (Stamper). The ability to communicate between web pages and databases with XML will be particularly useful for transforming Electronic Data Interchange into a web-based system.
EDI began in 1969 with transportation paperwork standards defined by the EDI Association (Libicki, p. 167). According to Libicki, as of 1995, "EDI has gone through four phases" (p. 167). With the current rise of electronic commerce on the web, traditional EDI seems to be dying and EDI may be entering its fifth phase. In July 1997, the XML/EDI Group was formed, in part to advance the technology of, and promote the support for, an XML/EDI amalgam (XML/EDI Group's Charter).
Evidence that traditional EDI is making way for the web is provided by David Webber's note that in the past year, "‘EDI World' magazine was renamed ‘Electronic Commerce International'" (Webber). Webber proposes a 4-tier model to integrate traditional EDI and XML-based EDI. The layers of the model are Traditional EDI, Rule Based EDI/EC, Process Based EDI, and Object Based EDI. A business could use one or more of the layers for its transactions. The third and fourth layers are built using the tools of layer two, which uses XML and Java. The three non-traditional layers are backward-compatible, so that EDI messages can be translated between layers. This compatibility makes sense, since by now it's clear that XML documents are easy to generate and easy to parse.
Webber's model is not the only XML/EDI model being advanced. The key point of the XML/EDI group is that XML will be useful for the evolution of EDI and they will work to define a standard model. New EDI forms will have to be built on top of standard industry forms. But, once again, XML's web capabilities and amenability to processing make XML a good match for EDI. The ubiquity of the web will allow more companies to use EDI for their business transactions; EDI will no longer be limited to large enterprises which can afford the "very costly implementation and support" (CommerceNet). Companies will be able to view information about other companies' products, such as price and amount in stock, on web browsers. In addition, EDI requests can still be sent automatically from one company's databases to another's, as described earlier.
Distributed Computing
"With HTTP, you have this fantastic distribution mechanism. Now XML gives you the ability to provide structure over the data. That's cool."
J.P. Morgenthal, NC.Focus
EDI provides one example of a way in which XML can be used not only for Web publishing, but for more general data interchange as well. These two features make XML confusing, according to J. P. Morgenthal of NC.Focus (Walsh 5). As far as the suggestions for general data interchange go, EDI seems to be one of the most straightforward applications. The use of XML as an independent distributed computing protocol is the data interchange application which, perhaps, is providing the most hype for XML. This application, if realized, is the coup de grâce which makes XML truly a "groundbreaking new technology" (X-ACT, p. 1).
According to some people in the software industry, the use of XML documents for data interchange will essentially beat Java, COM and CORBA at their own distributed-computing games. In a hyperbolic moment, Jeff Walsh wrote that XML would be used as "agnostic middleware in the religious distributed computing protocol wars" (Walsh 3). DataChannel and webMethods have both released beta versions of products which will use XML as related middleware. DataChannel's product uses XML to link COM, CORBA and SQL databases. WebMethods' product uses XML to integrate data directly between Excel spreadsheets and applications written in Java, JavaScript, C, C++, Visual Basic, and ActiveX (Hannon). In addition, Microsoft is said to also be working on an XML solution for bridging COM and CORBA (Walsh 6).
If the many strong assertions made by XML middleware supporters could be taken as fact, then XML surely is the miracle cure for the problems of distributed computing. According to DataChannel CEO David Pool, "HTTP and XML provide a proven, scalable architecture for Web-based distributed computing" and "the combination of XML and HTTP is a more robust way to achieve cross-platform computing than the method Java offers" (Walsh 6).
The basic idea is understandable. COM and CORBA clients and stubs would send and receive XML documents. Then, they would parse the XML document to find the corresponding responses and requests. In order to use pure COM and CORBA, one must work with platform-specific code. But, XML documents are pure text files; they are readable on any machine without translation. Thus, the XML proponents win one point due to the platform-agnostic nature of an XML protocol, as "XML made it possible for [many people] to open up their applications and have their applications interact with other components and applications without them having to write lots of complicated platform-specific code" (Bosworth, quoted in Walsh 1).
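The request-and-response idea can be sketched without any COM or CORBA machinery at all. In the Python sketch below, the message format (request, param, response elements), the handler table, and the function names are all assumptions made for illustration; they do not correspond to any vendor's product or to a real wire protocol.

```python
import xml.etree.ElementTree as ET

def build_request(method, **params):
    """Client side: encode a call as a plain-text XML document."""
    request = ET.Element("request", method=method)
    for name, value in params.items():
        ET.SubElement(request, "param", name=name).text = str(value)
    return ET.tostring(request, encoding="unicode")

# Server side: a hypothetical table mapping method names to local code.
# Arguments arrive as text, so each handler converts them itself.
HANDLERS = {"add": lambda a, b: int(a) + int(b)}

def handle_request(xml_text):
    """Server side: parse the document, run the method, reply in XML."""
    request = ET.fromstring(xml_text)
    args = {p.get("name"): p.text for p in request.findall("param")}
    result = HANDLERS[request.get("method")](**args)
    response = ET.Element("response")
    response.text = str(result)
    return ET.tostring(response, encoding="unicode")

message = build_request("add", a=2, b=3)
reply = handle_request(message)
```

Only the text in `message` and `reply` would cross the network, so the two sides could be written in different languages on different platforms. Note that the handler has to convert its arguments from text, which hints at the data-typing gap that plain XML leaves open.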
In the new paradigm, COM and CORBA would remain in their corporate settings, with requests sent in XML wrappers over HTTP-NG, an object-aware protocol standard in development by the World Wide Web Consortium (Walsh 3). Jim Gettys, a Digital engineer, echoes Pool's comments by saying, "The current technologies of Distributed COM and CORBA haven't thought out how to do this well in a high-latency environment like the Internet, so we're trying to help out in that regard with HTTP-NG" (Walsh 3).
A simple text-based solution seems too easy and sounds inefficient. COM and CORBA are powerful models which offer non-trivial features such as directory services to locate objects. Vendors will need to prove to many users that adding an XML document protocol layer retains full functionality of the distributed object model and is at least as efficient as previous solutions. Many users will probably find that the solution works, even if seemingly impure. The fact that "it works", though, might be the most important issue of all.
Distributed Computing at Work: Generic User Interfaces
"To get work done, we don't have to all set up our desks alike."
Rita Knox, Gartner Group
DataChannel's Mikula and Randy Gordon tantalize white paper readers with the opening, "Will there ever be user interfaces that are independent of their platform, operating system, and all programming languages? What seemed to be impossible for many years appears to have become reality" (Mikula and Gordon). Of course, XML is the answer. Of course, the use of tags with attributes is the key.
Using XML documents to specify components of a user interface allows the interface description to be sent from computer to computer. Then, each user can implement the interface in his or her preferred language, e.g., Java, Visual Basic, C/C++. The XML document contains the names and properties of objects, properly tagged. With a corresponding translator, this interface description can be transcribed into a program to display the interface. Mikula and Gordon provide a sample interface with the XML specification which makes the idea clear (see Appendix).
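A translator of this kind starts by walking the tagged description. The Python sketch below uses a short fragment written in the style of the Mikula and Gordon sample and simply lists the widgets it finds; the describe function is hypothetical, and a real translator would create toolkit widgets at each step instead of text lines.

```python
import xml.etree.ElementTree as ET

# A short fragment in the style of the Mikula and Gordon sample.
UI_XML = """
<container>
  <textfield label="Title:" id="title" w="100%"></textfield>
  <container flow="horizontal">
    <button label="Push Me"/>
    <button label="And Me"/>
  </container>
</container>
"""

def describe(element, depth=0):
    """Return one indented line per widget, nested as in the XML."""
    label = element.get("label")
    line = "  " * depth + element.tag + (f" [{label}]" if label else "")
    lines = [line]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

layout = describe(ET.fromstring(UI_XML))
```

Replacing the line-building step with calls into Java, Visual Basic or C++ widget libraries is what would make the same description portable across platforms.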
The authors do not discuss the method of integrating the code which underlies the interface, running functions at the click of a button on the interface. Presumably, this would also involve an application of XML as a distributed computing protocol. Using XML to specify these functions seemingly would be a difficult task; it would probably be quite difficult to generically describe functions so that they could be easily translated to any language.
XML Extensions: WIDL and XML Data
"Most of what we're doing is tuning – making it a little better, making it a little faster...."
Adam Bosworth, Microsoft
After all the hype, in order for all these applications to work, using plain XML isn't quite enough. Text files don't automate processes by themselves and they can't understand that numbers are different from letters. So, proposals for extensions of XML have been put before the World Wide Web Consortium. The Web Interface Definition Language and XML Data have been acknowledged by W3C as Submissions, for comment and possible further review (WIDL Submission, XML Data Submission).
The Web Interface Definition Language, created by webMethods, specifies Web Automation, in which "everything a browser can do" can be done by business applications, "without human intervention and without using a browser" (Allen). According to webMethods, the benefits of Web Automation include, "competitive intelligence ... application integration ... robust e-commerce solutions ... Web-based alternative to EDI ... web site functionality in the heart of customers' and suppliers' IT infrastructures" (Allen). WIDL defines services, which are essentially function calls, and provides the structure to generate code in languages such as C/C++, Java, COBOL, and Visual Basic (WIDL Submission). The simplicity of this XML extension is impressive; there are only six WIDL elements, each with approximately 5 or 6 attributes: WIDL, SERVICE, BINDING, VARIABLE, CONDITION, and REGION.
XML Data is a submission of Microsoft which aims to provide awareness of the types of data within an XML document. XML Data will enable the recognition of a number as a number, a date as a date, a boolean as a boolean, etc. Thus, applications will be able to enforce the appropriate data-typing. Also, calculations using the data can be done; for example, programs will be able to compute sums or averages with the numbers (Bosworth, cited in Walsh 1).
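The effect of type awareness can be illustrated with a small Python sketch. The "type" attribute and its values used here are invented for the example and are not the syntax of the XML Data submission; the point is only that once a type is declared, a program can convert the text and compute with it.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical type names mapped to converters; not the XML Data syntax.
CONVERTERS = {
    "number": float,
    "date": date.fromisoformat,
    "boolean": lambda text: text == "true",
}

SALES_XML = """
<sales>
  <amount type="number">10.5</amount>
  <amount type="number">4.5</amount>
  <closed type="date">1998-04-21</closed>
  <final type="boolean">true</final>
</sales>
"""

def typed_values(xml_text):
    """Convert each element's text according to its declared type."""
    root = ET.fromstring(xml_text)
    return [(el.tag, CONVERTERS[el.get("type")](el.text)) for el in root]

values = typed_values(SALES_XML)
total = sum(value for tag, value in values if tag == "amount")
```

Without the declared types, "10.5" and "4.5" would be just characters, and the sum could not be computed safely.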
Conclusions
"XML, as a context-rich, data-neutral file format, is probably the most important new technology development of the last two years."
Michael Vizard, InfoWorld
The XML 1.0 Recommendation is lauded for many diverse uses. As long as at least one of them catches hold, XML will surely be important to software users and developers. Since XML is designed to work well with the Internet, its publishing uses will not be confined to the limited publishing world in which SGML operates. In addition, the data exchange uses will find a market.
Many persons take pains to assure others that XML will not replace HTML, and that, in fact, they'll work together. While HTML, as an SGML product, is supposed to define a certain type of structured document, those comparing it to XML describe HTML as a presentation format (e.g., Jonathan Marsh, cited in Stamper). In the next few years, as newer versions of browsers, with XML support, become the standard on desktops, I imagine XML documents with accompanying stylesheets will become the norm.
In the meantime, XML documents will begin sitting next to the HTML documents to shuttle information back and forth between the web page and the database. In addition, anyone who has access to the XML DTD can use a script to draw information from another web page into his or her own. Microsoft's Charles Heinemann describes a method in which one can put weather updates, stock quotes, baseball game scores and movie listings directly on her page by using XML and ActiveX. This method also allows her to write a function to calculate the current value of her stock portfolio (Heinemann). In the future, the growth of XML will make such information gathering more common.
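The portfolio calculation is easy to picture once the quotes arrive as tagged data. The sketch below is in Python rather than the XML and ActiveX approach Heinemann describes, and the quote document, its attribute names, and the holdings are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical quote feed; symbols and prices are made up.
QUOTES_XML = """
<quotes>
  <quote symbol="AAA" price="10.00"/>
  <quote symbol="BBB" price="2.50"/>
</quotes>
"""

def portfolio_value(xml_text, holdings):
    """Multiply each holding's share count by its quoted price and sum."""
    prices = {
        quote.get("symbol"): float(quote.get("price"))
        for quote in ET.fromstring(xml_text).iter("quote")
    }
    return sum(shares * prices[symbol] for symbol, shares in holdings.items())

value = portfolio_value(QUOTES_XML, {"AAA": 3, "BBB": 4})
```

With an HTML page, the same script would have to guess which table cell held the price; here the tag says so.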
While the increasing dynamic generation of web pages will make searches more difficult, the tags embedded in XML documents provide enough information to make online searches easier. The net gain or loss in searching may depend on the need. For example, if one wishes to search for document "authors" in general, an XML tag for an author would provide the needed information easily. If one was searching for a specific author, however, the fact that a document tags an element as "author" would not be helpful if the page is to be generated dynamically.
Beyond the generation and regeneration of web pages, XML DTDs such as that of the health-care and automotive examples should become more common. These may not be as widely noticeable as general web pages, however. Such XML documents may indirectly affect many people. But, I imagine these DTD developers would probably remain within the industries and general developers and users will not be directly aware of the use (or lack thereof) of industry-specific XML documents.
Application-to-application data exchange, particularly bridging COM and CORBA, will happen if developers make it happen. And it appears that's what developers are doing. So, future database and applications developers may have to learn the new syntax of XML communication. J. P. Morgenthal writes that there is a competition of purism vs. pragmatism (Morgenthal). If object purists win, XML will not gain the distributed computing prominence currently envisioned. If the pragmatists win, XML will be the "agnostic middleware" connecting object models. Morgenthal claims OO programming does not live up to its promises, but that successes outweigh failures in the ASCII-based Web world. Thus, text files used pragmatically may solve the problems where other paradigms have failed (Morgenthal). Wait and see.
Appendix: DataChannel User Interface Example (from Mikula and Gordon)
<container>
  <tabset defaultborder="10" defaultspacing="5">
    <tab label="General" border="10" spacing="5">
      <label label="Randy Gordon" fontsize="18"/>
      <textfield label="Title:" id="title" w="100%"></textfield>
      <textfield label="Desc:" w="50%"></textfield>
      <button label="Button" w="100%"></button>
      <textfield label="URL:"></textfield>
      <container flow="horizontal" w="100%" border="0">
        <button label="Push Me"/>
        <button label="And Me"></button>
        <button label="Me Too!" x="-80" w="80"></button>
      </container>
      <textfield label="Headline:" y="-20"></textfield>
    </tab>
    <tab label="Schedule">
      <textfield label="Stuff:"></textfield>
      <radiogroup label="Schedule" rows="3" cols="2" w="50%" border="20">
        <radiobutton label="Hourly"></radiobutton>
        <radiobutton label="Daily"></radiobutton>
        <radiobutton label="Weekly"></radiobutton>
        <radiobutton label="Monthly"></radiobutton>
      </radiogroup>
      <textfield label="More Stuff:"></textfield>
    </tab>
    <tab label="Export/Import">
      <textfield label="Stuff:"></textfield>
      <container w="100%" flow="horizontal" border="0">
        <radiogroup label="Left" rows="3" cols="2">
          <radiobutton label="Hourly"></radiobutton>
          <radiobutton label="Daily"></radiobutton>
          <radiobutton label="Weekly"></radiobutton>
          <radiobutton label="Monthly"></radiobutton>
        </radiogroup>
        <radiogroup label="Right" rows="3" cols="2" w="600">
          <radiobutton label="Hourly"></radiobutton>
          <radiobutton label="Daily"></radiobutton>
          <radiobutton label="Weekly"></radiobutton>
          <radiobutton label="Monthly"></radiobutton>
        </radiogroup>
      </container>
      <textfield label="More Stuff:"></textfield>
    </tab>
    <tab label="Test Corners">
      <component w="20" h="20" x="0" y="0"></component>
      <component w="20" h="20" x="-20" y="0"></component>
      <component w="20" h="20" x="0" y="-20"></component>
      <component w="20" h="20" x="-20" y="-20"></component>
    </tab>
    <tab label="Tree">
      <tree id="testtree" w="50%" h="100">
        <treenode label="First"/>
        <treenode label="Second">
          <treenode label="1"/>
          <treenode label="2"/>
          <treenode label="3"/>
        </treenode>
        <treenode label="Third"/>
      </tree>
      <button label="Set Tree"/>
    </tab>
  </tabset>
  <container h="20"></container>
  <button label="Doit" w="50"></button>
</container>
References
Allen, Charles. WIDL: Application Integration with XML. http://www.webmethods.com.
CommerceNet. EDI and Network Services. http://www.commerce.net/members/portfolios/technology/edi/index.html
Hannon, Brian. Common Ground for Data Exchange. PC Week, vol. 15, num. 15, p. 33.
Heinemann, Charles. Help, My Web Page Needs a Makeover. http://www.microsoft.com/sitebuilder/columnists/xml041398.asp
Libicki, Martin C. Information Technology Standards: Quest for the Common Byte. Digital Press: Toronto, pp. 266-271.
Mikula, Norbert H. Template-driven XML Generator: Integrating Legacy into the New World of XML. http://www.datachannel.com.
Mikula, Norbert H. and Gordon, Randy. XML-Driven User Interfaces: Toward True Platform-Independent User Interfaces. http://www.datachannel.com.
Morgenthal, J. P. Objects of Desire: Purism vs. Pragmatism. http://techweb.cmp.com/internetwk/columns/logic0406.htm.
(1) Radosevich, Lynda. Health care uses XML for records. InfoWorld, vol. 19, iss. 34, pp. 51-52.
(2) Radosevich, Lynda. XML Initiatives Take Shape. InfoWorld, vol. 19, iss. 37, pp. 1, 24.
Stamper, Chris. The Web's New Language. http://www.abcnews.com/sections/tech/DailyNews/xml_0327.html.
Vizard, Michael. A Java Truce Could Lead to an XML War. InfoWorld, vol. 20, iss. 13, p. 3.
(1) Walsh, Jeff. Microsoft Tunes Web Use of XML. InfoWorld, vol. 20, iss. 13, p. 58.
(2) Walsh, Jeff. Netscape, Microsoft Flex XML. InfoWorld, vol. 20, iss. 14, p. 58.
(3) Walsh, Jeff. New W3C-backed protocol ups scalability. InfoWorld, vol. 20, iss. 15, p. 55.
(4) Walsh, Jeff. Server, Software Vendors Exercise XML. InfoWorld, vol. 20, iss. 14, p. 58.
(5) Walsh, Jeff. XML Gets Ready for Prime Time. InfoWorld, vol. 20, iss. 14, pp. 57-58.
(6) Walsh, Jeff. XML to Ease Net Tensions. InfoWorld, vol. 20, iss. 13, pp. 1, 24.
Webber, David. BOO!!! Are We All History??? http://www.geocities.com/WallStreet/Floor/5815/dw01.htm (www.xmledi.net).
WIDL Submission, to the World Wide Web Consortium. http://w3c.org/Submission/1997/15/
X-ACT FAQ's. http://www.x-act.org/faqs.html.
X-ACT. Industry Leaders Drive Development of XML Applications with New Alliance. http://www.x-act.org/main.html.
XML 1.0 Recommendation, of the World Wide Web Consortium. http://w3c.org/TR/1998/REC-xml-19980210.
XML Data Submission, to the World Wide Web Consortium. http://w3c.org/TR/1998/NOTE-XML-data-0105/.
XML/EDI Group's Charter. http://www.geocities.com/WallStreet/Floor/5815/charter.htm