To understand the scope of the broader problem that needs to be addressed, one need only understand two issues. First, the basic tenet of data interchange standards development is the now well-understood conversion problem. If all data formats are proprietary, N*(N-1) converters are required to translate among them. When N is small, the situation is manageable; when N is large, the costs of pairwise data conversion are exorbitant. A quick review of Born leaves one with a sense of the magnitude of the problem. Born's desk reference provides a quick introduction to some of the major stable file formats: 5 DBMS formats, 10 spreadsheet formats, 6 word processing formats, 37 graphics formats, 11 Windows-related formats (e.g. clipboard, icon, etc.), 12 sound formats, and 3 page description languages. The first obvious issue, then, is reducing the set of data interchange formats.
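The arithmetic behind this argument can be made concrete. The sketch below (function names are mine, for illustration only) contrasts pairwise conversion, which needs a converter for every ordered pair of formats, with conversion through a single interchange format, which needs only one converter into and one out of the standard per format:

```python
def pairwise_converters(n: int) -> int:
    """Converters needed when every format converts directly to every other."""
    return n * (n - 1)

def hub_converters(n: int) -> int:
    """Converters needed when all conversion passes through one standard format."""
    return 2 * n

if __name__ == "__main__":
    # Born's tally of stable file formats, summed from the text:
    # 5 DBMS + 10 spreadsheet + 6 word processing + 37 graphics
    # + 11 Windows-related + 12 sound + 3 page description = 84
    n = 5 + 10 + 6 + 37 + 11 + 12 + 3
    print(n, pairwise_converters(n), hub_converters(n))
```

For the 84 formats tallied above, the pairwise approach requires 6,972 converters against 168 for a hub-and-spoke arrangement, which is why the costs grow exorbitant as N grows.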
Second, there is a need to clarify the operations on interchanged data. For efficiency in coding of processing software, it is important to reduce the number of variants that one must deal with. A small number is acceptable, and one is ideal. Consider, for example, a structured document format such as SGML. Raymond et al. identify a number of issues that must be addressed in the SGML standard to allow standardized document operations, including identity functions, redundancy control, and verifiable operations. The authors point out that even with the specificity and rigor of the SGML syntactic specification, many of the semantics of operations have been left undefined. Considering the magnitude of the effort required to make SGML more robust at the semantic level -- let's say on a par with the relational DBMS operations -- one can only shiver at the thought of developing such rigor when a document may be copymarked using any of a number of structural systems (SGML, Scribe, LaTeX, XICS, RTF, etc.) or any of hundreds of procedural systems (Word, WordPerfect, etc.). The standards community needs to address the issue of data interchange formats and the associated operations on those formats as they pertain to interchange. That is to say, the data interchange issue involves not only the specification of a small number of standards for data entities but also standards for those operations that are tightly coupled to them. Thus, it is likely that a standard for searching structured documents for certain conditions before transferring them over the network for local processing will become as important as the standard format for interchange itself.
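The coupling of format and operation can be sketched in miniature. The fragment below is a hypothetical illustration, not anything from the standards discussed here: it uses XML as a stand-in for an SGML-family structural format, and the element names (`report`, `section`, `title`) and the `status` attribute are invented. The point is that a well-defined structure makes it possible to evaluate a search condition at the source and transfer only matching fragments:

```python
import xml.etree.ElementTree as ET

# A toy structured document; element names and attributes are hypothetical.
DOC = """<report>
  <section status="draft"><title>Background</title></section>
  <section status="final"><title>Findings</title></section>
</report>"""

def select_sections(xml_text: str, status: str) -> list:
    """Return titles of sections matching a condition, before any transfer."""
    root = ET.fromstring(xml_text)
    return [s.findtext("title") for s in root.iter("section")
            if s.get("status") == status]

print(select_sections(DOC, "final"))  # only the 'Findings' section would be shipped
```

With hundreds of procedural formats in play, no such operation can be written once; with a small number of structural interchange formats and standardized operations over them, it can.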
In the early 1990s, the Strategic Planning Committee of X3 was in the process of developing a long range plan. The committee invited input from academia, which resulted in a presentation to the committee by the author. Based on earlier work by Spring and Carbo-Bearman, the presentation suggested a number of models that might be used to guide choices about standardization and proposed a framework for thinking about the process of managing standardization in the IT arena. At that time, a number of models of the overall IT process were in circulation. One of the more popular was the JTC-1 submission commonly referred to as the MUSIC model -- Management, User, System, Interchange, and Communication. The author suggested a similar conceptualization: Interconnection, Interface (later Interaction), Interoperation, and Interchange. The committee spent time discussing the existence or absence of detailed models in these areas, the need for architectures, the costs of developing the models, and the implications of not developing them. More recently, Spring and Weiss have presented a theoretical case, with some supporting empirical evidence, suggesting that reference standards will be undersupplied in a market-driven standardization environment. This claim would seem to be supported by Oksala et al.
Whether one advocates TCP/IP, OSI, SNA, DNA, or some hybrid, the Open Systems Interconnection Reference Model (OSIRM) serves as a beacon to guide individual standards development in the area of interconnection standards. The OSIRM is THE reference standard used to explain not only OSI efforts but also the work of developers from the OSF to the IETF. Similarly, there is growing evidence that POSIX will serve as a reference standard for efforts related to the operating system interface. More organic than the OSI effort, it nonetheless serves the purpose. Work on a reference model for human-computer interaction sought to provide a reasoned model of human-computer interaction to serve as the basis for assessing standardization efforts in this area. Unfortunately, this model was not endorsed by the Information Systems Standards Panel of the American National Standards Institute. While the reasons for the lack of endorsement are the subject of another article, the model remains in the literature, along with others, as a reminder that we need to think about how the myriad standardization efforts in this area might be better organized and managed. This brings us to the last area -- data interchange. As suggested above, the benefits of standardization are dramatically clear these days, and the cost of not standardizing is equally clear. Is there a reference model waiting to be articulated? Or must we accept the proposition that the technology is evolving too rapidly to specify one? Is this a period of incunabula -- the early stage in the development of any industry or technology? It may well be that just as books during their incunabula were idiosyncratic and chained to library bookshelves, so electronic data during its period of incunabula will be chained to machines by the gossamer bonds of proprietary formats.
Standards, public specifications, de facto standards, and common practices in the area of data interchange are numerous: TIFF, GIF, ODA, SGML, RTF, ASCII, UNICODE, ODBC, SQL, STEP, PDES, PDF, SPDL, PCL, HPGL, GKS, CGM, DIF, IMG, DXF, PICT, DVI, IGES, EPS, WAV, AU, to name a few. At one level, a reference model must organize and classify these standards in some fashion that allows us to make sense of the menagerie. At another level, we would hope that a reference model would enable us to predict problems that will occur and identify areas in which further standardization is needed. Over the last several years, I have set out for students, and for myself, the challenge of reasoning about the framework within which one might posit a reference model for data interchange. Borrowing liberally from other architectures and reference models, let me suggest the following as a beginning point for the discussion.