To understand the scope of the broader problem that
needs to be addressed, one need only understand two
issues. First, the basic tenet of data interchange
standards development is the now well-understood
conversion argument. If all data formats are
proprietary, N*(N-1) converters are required. When N
is small, the situation is manageable. When N is
large, the costs of asymmetric data conversion are
exorbitant. A quick review of Born leaves one with a sense
of the magnitude of the problem. Born's desk reference
provides a quick introduction to some of the major
stable file formats: 5 DBMS formats, 10 spreadsheet
formats, 6 word processing formats, 37 graphics formats,
11 Windows-related formats (e.g., clipboard, icon,
etc.), 12 sound formats, and 3 page description
languages. The first obvious issue is reducing the set
of data interchange formats.
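The arithmetic is easy to make concrete. The short sketch
below is a minimal illustration, not part of any standard:
it evaluates the N*(N-1) pairwise-converter count against
the 2N converters that suffice when a single agreed
interchange format serves as a hub, using format counts on
the order of those Born reports.

```python
# A minimal sketch of the conversion arithmetic: N proprietary
# formats with pairwise converters need N*(N-1) converters,
# while a single agreed interchange format needs only 2*N
# (one converter into and one out of the common format).

def pairwise_converters(n: int) -> int:
    return n * (n - 1)

def hub_converters(n: int) -> int:
    return 2 * n

# Format counts on the order of those Born reports
# (37 graphics formats; roughly 84 formats in total).
for n in (5, 10, 37, 84):
    print(f"N={n:3d}  pairwise={pairwise_converters(n):5d}  "
          f"with a common format={hub_converters(n):3d}")
```

For the 37 graphics formats alone, the difference is 1,332
pairwise converters versus 74.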
Second, there is a need to clarify the operations on
interchanged data. For efficiency in the coding of
processing software, it is important to be able to
reduce the number of variants that one must deal with.
A small number is acceptable, and one is ideal.
Consider, for example, a structured document such as
SGML. Raymond et al. identify a number of issues that must be addressed
related to the SGML standard to allow standardized
document operations. These include identity functions,
redundancy control, and verifiable operations. The
authors point out that even with the specificity and
rigor of the SGML syntactic specification many of the
semantics of operations have been left undefined.
Considering the magnitude of the effort required to
make SGML more robust at the semantic level -- let's say
on a par with relational DBMS operations -- one can
only shiver at the thought of developing such rigor
when a document may be copymarked using any of a number
of structural systems (SGML, Scribe, LaTeX, XICS,
RTF, etc.) or any of hundreds of procedural systems
(Word, WordPerfect, etc.). The standards community needs
to address the issue of data interchange formats and
the associated operations on those formats as they
pertain to interchange. That is to say, the data
interchange issue involves not only the specification
of a small number of standards for data entities but
also standards for those operations that are tightly
coupled to them. Thus, it is likely that a standard
for searching structured documents for certain
conditions before transferring them over the network
for local processing will become as important as the
standard format for interchange.
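To make the notion of operations tightly coupled to the
interchanged data concrete, the following is a minimal,
hypothetical sketch. It uses XML, via Python's standard
xml.etree module, as a stand-in for an SGML-like structured
document; the element names, the matches predicate, and the
send stub are assumptions of the illustration, not a
proposed standard.

```python
# Hypothetical sketch: evaluate a search condition against the
# structure of a document before transferring it for local
# processing.  XML stands in for an SGML-like structured
# document; the element names and the send() stub are
# illustrative assumptions, not a proposed standard.
import xml.etree.ElementTree as ET

DOCUMENT = """\
<report>
  <title>Quarterly results</title>
  <section type="summary">Revenue grew.</section>
  <section type="detail">Tables follow.</section>
</report>
"""

def matches(root: ET.Element, section_type: str) -> bool:
    # The "operation": a search over document structure rather
    # than over an opaque byte stream.
    return any(sec.get("type") == section_type
               for sec in root.findall("section"))

def send(doc_text: str) -> None:
    # Stub standing in for an actual network transfer.
    print("transferring", len(doc_text), "bytes")

root = ET.fromstring(DOCUMENT)
if matches(root, "summary"):   # search first ...
    send(DOCUMENT)             # ... transfer only if it qualifies
```

The point is simply that the search is defined over the
document's structure, which is only possible if the
interchange format makes that structure explicit.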
In the early 1990s, the Strategic Planning Committee
of X3 was in the process of developing a long range
plan. The committee invited input from academia, which
resulted in a presentation to the committee by the
author. Based on earlier work by Spring and Carbo-Bearman,
the presentation suggested a number of models that
might be used to guide the choices about
standardization, and suggested a framework for thinking
about the process of managing standardization in the IT
arena.
At that point in time, there were a number of models of
the overall IT process floating around. One of the
more popular was the JTC-1 submission commonly referred
to as the MUSIC model -- Management, User, System,
Interchange, and Communication. The author suggested a
similar conceptualization: Interconnection, Interface
(later Interaction), Interoperation, and Interchange.
The committee spent time talking about the existence or
lack of existence of detailed models in these areas,
the need for architectures, the costs of developing the
models, and the implications of not developing them.
More recently, Spring and Weiss
have presented a theoretical case, with some
supporting empirical evidence, suggesting that reference
standards will be undersupplied in a market-driven
standardization environment. This claim would seem to
be supported by Oksala et al.
Whether one advocates TCP/IP, OSI, SNA, DNA, or some
hybrid, the Open Systems Interconnection Reference
Model (OSIRM) serves as a beacon to guide individual
standards development in the area of interconnection
standards. The OSIRM is THE reference standard used to
explain not only OSI efforts, but the work of
developers from OSF to IETF. Similarly, there is
growing evidence that POSIX will serve as a reference
standard for efforts related to the operating system
interface. More organic than the OSI effort, it
nonetheless serves the purpose. Work on a reference model
for human-computer interaction
sought to provide a reasoned
model of human-computer interaction to serve as the
basis for assessing standardization efforts in this
area. Unfortunately, this model was not endorsed by
the Information Systems Standards Panel of the American
National Standards Institute. While the reasons for
the lack of endorsement are the subject of another
article, the model remains in the literature, along
with others, as a reminder that we need to think about
how the myriad standardization efforts in this area
might be better organized and managed. This brings us
to the last area -- data interchange. As suggested
above, the benefits of standardization are dramatically
clear these days, and the cost of not standardizing is
equally clear. Is there a reference model waiting to
be articulated? Or must we accept the proposition that
the technology is in too rapid a state of evolution to
specify one? Is this a period of incunabula -- the
early stages of development of any industry or
technology? It may well be that, just as during the
incunabula of books, when books were idiosyncratic and
chained to the library bookshelves, so electronic data
during its period of incunabula will be chained to
machines with the gossamer bonds of proprietary
formats.
Standards, public specifications, de facto standards, and
common practices in the area of data interchange are
numerous: TIFF, GIF, ODA, SGML, RTF, ASCII, UNICODE,
ODBC, SQL, STEP, PDES, PDF, SPDL, PCL, HPGL, GKS, CGM,
DIF, IMG, DXF, PICT, DVI, IGES, EPS, WAV, and AU, to name
a few. At one level, the reference model must organize
and classify these standards in some fashion that allows
us to make sense out of the menagerie. At another level,
we would hope that a reference model would enable us to
predict problems that will occur and identify areas in
which further standardization is needed. Over the last
several years, I have set out for students, and for
myself, the challenge of reasoning about the framework
within which one might posit a reference model for
thinking about data interchange. Borrowing liberally
from other architectures and reference models, let me
suggest the following as a beginning point for the
discussion.
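Before laying that out, and purely as a toy illustration
of what "organizing and classifying" the list above might
mean at the crudest level, the sketch below groups a
handful of the named formats into rough functional
classes. The class names and the assignments are
illustrative assumptions only; they are not the reference
model itself.

```python
# A toy, illustrative grouping of some of the formats named
# above.  The class names and assignments are assumptions of
# this sketch, not a standard taxonomy and not the reference
# model discussed in the text.
CLASSES = {
    "character encoding": ["ASCII", "UNICODE"],
    "document markup":    ["SGML", "ODA", "RTF"],
    "page description":   ["PDF", "SPDL", "PCL", "EPS", "DVI"],
    "graphics":           ["TIFF", "GIF", "CGM", "HPGL", "PICT", "IMG"],
    "product/CAD data":   ["IGES", "STEP", "PDES", "DXF"],
    "database/query":     ["SQL", "ODBC", "DIF"],
    "audio":              ["WAV", "AU"],
}

def classify(fmt: str) -> str:
    for cls, members in CLASSES.items():
        if fmt.upper() in members:
            return cls
    return "unclassified"

print(classify("gif"))      # -> graphics
print(classify("UNICODE"))  # -> character encoding
```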