To understand the scope of the broader problem that
needs to be addressed, one need only understand two
issues. First, the basic tenet of data interchange
standards development is the now well-understood
conversion argument. If all data formats are
proprietary, N*(N-1) converters are required. When N
is small, the situation is manageable. When N is
large, the costs of asymmetric data conversion are
exorbitant. A quick review of Born leaves one with a sense
of the magnitude of the problem. Born's desk reference
provides a quick introduction to some of the major
stable file formats: 5 DBMS formats, 10 spreadsheet
formats, 6 word processing formats, 37 graphics formats,
11 Windows-related formats (e.g., clipboard, icon,
etc.), 12 sound formats, and 3 page description
languages. The first obvious issue is reducing the set
of data interchange formats.
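The arithmetic is easy to make concrete. The short sketch
below is a minimal illustration, not part of any standard:
it evaluates the N*(N-1) pairwise-converter count against
the 2N converters that suffice when a single agreed
interchange format serves as a hub, using format counts on
the order of those Born reports.

```python
# A minimal sketch of the conversion arithmetic: N proprietary
# formats with pairwise converters need N*(N-1) converters,
# while a single agreed interchange format needs only 2*N
# (one converter into and one out of the common format).

def pairwise_converters(n: int) -> int:
    return n * (n - 1)

def hub_converters(n: int) -> int:
    return 2 * n

# Format counts on the order of those Born reports
# (37 graphics formats; roughly 84 formats in total).
for n in (5, 10, 37, 84):
    print(f"N={n:3d}  pairwise={pairwise_converters(n):5d}  "
          f"with a common format={hub_converters(n):3d}")
```

For the 37 graphics formats alone, the difference is 1,332
pairwise converters versus 74.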
Second, there is a need to clarify the operations on
interchanged data. For efficiency in the coding of
processing software, it is important to be able to
reduce the number of variants that one must deal with.
A small number is acceptable, and one is ideal.
Consider, for example, a structured document such as
SGML. Raymond et al. identify a number of issues that must be addressed
related to the SGML standard to allow standardized
document operations. These include identity functions,
redundancy control, and verifiable operations. The
authors point out that even with the specificity and
rigor of the SGML syntactic specification many of the
semantics of operations have been left undefined.
Considering the magnitude of the effort required to
make SGML more robust at the semantic level -- let's say
on a par with relational DBMS operations -- one can
only shiver at the thought of developing such rigor
when a document may be copymarked using any of a number
of structural systems (SGML, Scribe, LaTeX, XICS,
RTF, etc.) or any of hundreds of procedural systems
(Word, WordPerfect, etc.). The standards community needs
to address the issue of data interchange formats and
the associated operations on those formats as they
pertain to interchange. That is to say, the data
interchange issue involves not only the specification
of a small number of standards for data entities but
also standards for those operations that are tightly
coupled to them. Thus, it is likely that a standard
for searching structured documents for certain
conditions before transferring them over the network
for local processing will become as important as the
standard format for interchange.
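To make the notion of operations tightly coupled to the
interchanged data concrete, the following is a minimal,
hypothetical sketch. It uses XML, via Python's standard
xml.etree module, as a stand-in for an SGML-like structured
document; the element names, the matches predicate, and the
send stub are assumptions of the illustration, not a
proposed standard.

```python
# Hypothetical sketch: evaluate a search condition against the
# structure of a document before transferring it for local
# processing.  XML stands in for an SGML-like structured
# document; the element names and the send() stub are
# illustrative assumptions, not a proposed standard.
import xml.etree.ElementTree as ET

DOCUMENT = """\
<report>
  <title>Quarterly results</title>
  <section type="summary">Revenue grew.</section>
  <section type="detail">Tables follow.</section>
</report>
"""

def matches(root: ET.Element, section_type: str) -> bool:
    # The "operation": a search over document structure rather
    # than over an opaque byte stream.
    return any(sec.get("type") == section_type
               for sec in root.findall("section"))

def send(doc_text: str) -> None:
    # Stub standing in for an actual network transfer.
    print("transferring", len(doc_text), "bytes")

root = ET.fromstring(DOCUMENT)
if matches(root, "summary"):   # search first ...
    send(DOCUMENT)             # ... transfer only if it qualifies
```

The point is simply that the search is defined over the
document's structure, which is only possible if the
interchange format makes that structure explicit.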
In the early 1990s, the Strategic Planning Committee
of X3 was in the process of developing a long range
plan. The committee invited input from academia, which
resulted in a presentation to the committee by the
author. Based on earlier work by Spring and Carbo-Bearman,
the presentation suggested a number of models that
might be used to guide the choices about
standardization, and suggested a framework for thinking
about the process of managing standardization in the IT
arena.
At that point in time, there were a number of models of
the overall IT process floating around. One of the
more popular was the JTC-1 submission commonly referred
to as the MUSIC model -- Management, User, System,
Interchange, and Communication. The author suggested a
similar conceptualization: Interconnection, Interface
(later Interaction), Interoperation, and Interchange.
The committee spent time talking about the existence or
lack of existence of detailed models in these areas,
the need for architectures, the costs of developing the
models, and the implications of not developing them.
More recently, Spring and Weiss
have presented a theoretical case, with some
supporting empirical evidence, suggesting that reference
standards will be undersupplied in a market-driven
standardization environment. This claim would seem to
be supported by Oksala et al.
Whether one advocates TCP/IP, OSI, SNA, DNA, or some
hybrid, the Open Systems Interconnection Reference
Model (OSIRM) serves as a beacon to guide individual
standards development in the area of interconnection
standards. The OSIRM is THE reference standard used to
explain not only OSI efforts, but the work of
developers from OSF to IETF. Similarly, there is
growing evidence that POSIX will serve as a reference
standard for efforts related to the operating system
interface. More organic than the OSI effort, it
nonetheless serves the purpose. Work on a reference model
for human-computer interaction
sought to provide a reasoned
model of human-computer interaction to serve as the
basis for assessing standardization efforts in this
area. Unfortunately, this model was not endorsed by
the Information Systems Standards Panel of the American
National Standards Institute. While the reasons for
the lack of endorsement are the subject of another
article, the model remains in the literature, along
with others, as a reminder that we need to think about
how the myriad standardization efforts in this area
might be better organized and managed. This brings us
to the last area -- data interchange. As suggested
above, the benefits of standardization are dramatically
clear these days, and the cost of not standardizing is
equally clear. Is there a reference model waiting to
be articulated? Or must we accept the proposition that
the technology is in too rapid a state of evolution to
specify one? Is this a period of incunabula -- the
early stages of development of any industry or
technology? It may well be that, just as during the
incunabula of books, when books were idiosyncratic and
chained to the library bookshelves, so electronic data
during its period of incunabula will be chained to
machines with the gossamer bonds of proprietary
formats.
Standards, public specifications, de facto standards, and
common practices in the area of data interchange are
numerous: TIFF, GIF, ODA, SGML, RTF, ASCII, UNICODE,
ODBC, SQL, STEP, PDES, PDF, SPDL, PCL, HPGL, GKS, CGM,
DIF, IMG, DXF, PICT, DVI, IGES, EPS, WAV, and AU, to name
a few. At one level, the reference model must organize
and classify these standards in some fashion that allows
us to make sense out of the menagerie. At another level,
we would hope that a reference model would enable us to
predict problems that will occur and identify areas in
which further standardization is needed. Over the last
several years, I have set out for students, and for
myself, the challenge of reasoning about the framework
within which one might posit a reference model for
thinking about data interchange. Borrowing liberally
from other architectures and reference models, let me
suggest the following as a beginning point for the
discussion.
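Before laying that out, and purely as a toy illustration
of what "organizing and classifying" the list above might
mean at the crudest level, the sketch below groups a
handful of the named formats into rough functional
classes. The class names and the assignments are
illustrative assumptions only; they are not the reference
model itself.

```python
# A toy, illustrative grouping of some of the formats named
# above.  The class names and assignments are assumptions of
# this sketch, not a standard taxonomy and not the reference
# model discussed in the text.
CLASSES = {
    "character encoding": ["ASCII", "UNICODE"],
    "document markup":    ["SGML", "ODA", "RTF"],
    "page description":   ["PDF", "SPDL", "PCL", "EPS", "DVI"],
    "graphics":           ["TIFF", "GIF", "CGM", "HPGL", "PICT", "IMG"],
    "product/CAD data":   ["IGES", "STEP", "PDES", "DXF"],
    "database/query":     ["SQL", "ODBC", "DIF"],
    "audio":              ["WAV", "AU"],
}

def classify(fmt: str) -> str:
    for cls, members in CLASSES.items():
        if fmt.upper() in members:
            return cls
    return "unclassified"

print(classify("gif"))      # -> graphics
print(classify("UNICODE"))  # -> character encoding
```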