SEPARATE CONTENT FROM STRUCTURE - UNLEASH THE POWER OF ELECTRONIC DOCUMENTS

Copyright: Ravi Kalakota & Andrew Whinston

Andrew Whinston

Ravi Kalakota

Center for Information Systems Management

University of Texas at Austin

Austin, Texas 78712-1175

The combination of workflow and electronic publishing is part of a growing list of seemingly disparate applications that leverage electronic documents in new and innovative ways. This raises a crucial question: How can organizations effectively exploit recent technological developments in electronic publishing -- multimedia data types, electronic document architectures and distributed hypermedia -- to better support organizational workflows?

A distinctive characteristic of organizational workflows today is the increasing prevalence of the "pull" philosophy of on-demand publishing rather than the "push" philosophy of E-mail based systems. Organizations have realized that document content is no longer something that is just read or archived; it can be used to control and implement organizational workflows in the same way that data in "just-in-time" (or "pull-based") manufacturing affects the movement of a product on an assembly line. The "pull" philosophy for on-demand publishing requires the presence of two elements: (i) flexibility - separating the document content from the way it is structured, formatted or presented; (ii) interoperability - the ability to distribute electronic documents across multiple platforms and applications. The two are complementary but separate.

Flexibility also implies the ability to support the evolution of data types. The rapidly mutating composition of data types lets you massage the information content in your electronic documents in ways you never imagined possible a few years ago. Data types have gone beyond ASCII characters to multimedia documents containing text, graphics, sound, and video. The structure of information stored in documents has likewise evolved from simple linear text to non-linear distributed hypermedia documents containing objects spread over global networks such as the Internet. Interoperability is a key element of distributed hypermedia across multiple platforms.

The power and versatility of distributed hypermedia documents are most prominent in electronic publishing applications built on the Internet using the World Wide Web (WWW) and the NCSA Mosaic client. The WWW uses a flexible document architecture based on SGML (Standard Generalized Markup Language -- ISO number 8879), which allows it to completely separate the content of a document from the way it is formatted (single or double column), styled (headings in Times 24 or Helvetica 36) or rendered on a variety of output media -- tty, GUI, CD-ROM, Braille etc. We will discuss SGML in detail later in the article.
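
The separation described above can be illustrated with a minimal sketch in Python. It is not how the WWW or any SGML system is implemented; the content model, the element names ("title", "para") and both renderer functions are hypothetical and chosen only for illustration. The point is that the document records what each piece of content is, while each output medium decides independently how to present it.

```python
from html import escape

# Hypothetical content model: a list of (element, text) pairs.
# The document says what each piece *is*, never how it should look.
DOCUMENT = [
    ("title", "Quarterly Report"),
    ("para", "Sales rose in the third quarter."),
    ("para", "Costs were flat & margins improved."),
]


def render_tty(doc):
    """Render for a character terminal: titles are upper-cased and underlined."""
    lines = []
    for element, text in doc:
        if element == "title":
            lines.append(text.upper())
            lines.append("=" * len(text))
        else:
            lines.append(text)
    return "\n".join(lines)


def render_html(doc):
    """Render the same content as HTML; only the element-to-tag mapping differs."""
    tags = {"title": "h1", "para": "p"}
    return "\n".join(
        f"<{tags[element]}>{escape(text)}</{tags[element]}>"
        for element, text in doc
    )


if __name__ == "__main__":
    print(render_tty(DOCUMENT))
    print(render_html(DOCUMENT))
```

Adding a new output medium under this sketch means writing one more renderer; the stored content does not change.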

Let's examine current methods used by organizations for distributing document-based know-how for internal or external consumption. This will enable us to understand the complexity of interoperability and the need for flexible document architectures. In some organizations, distribution is accomplished through paper documents, routing slips, and filing cabinets. In more progressive ones, it is accomplished through groupware such as Lotus Notes.

Notes, since its introduction in 1989, has defined groupware as a document database replicated over many sites, tightly integrated with an E-mail program and accessible through a friendly user interface. It has also provided developers and VARs with accompanying APIs for building add-on products and customizable applications. By sheer market dominance, Notes has carved out an entire market segment on its own terms. In fact, most organizations' and academic institutions' understanding of groupware is dictated by the capabilities of Notes.

This raises an interesting question: Can Notes or similar products of its genre support the new philosophy of "pull"-based workflows utilizing on-demand or customized electronic publishing? The plain answer: No, not until the underlying technology used to implement electronic documents changes. For instance, Notes uses DEC's CDA (Compound Document Architecture) as an under-the-hood technology, meaning that applications interact with CDA directly, while end users are shielded from it completely. CDA is based on ODA (Open Document Architecture -- ISO number 8613).

One advantage of CDA is that it simplifies the task of working with graphics, text, images, and other data types in a single document. CDA partitions a document so that different applications can work on its various pieces and manipulate document parts generated by other applications, such as word processing, spreadsheets, graphics etc. CDA lets you receive documents from different computing environments by encoding them in DDIF (Digital Document Interchange Format). DDIF encodes documents either as the sequence of bits transmitted in a data packet exchanged between platforms, or as in-memory structures called aggregates that an application works on.

DDIF is similar in many respects to PDLs (page-description languages) such as PostScript, which can be exchanged between various printers with the format specification intact. CDA is comparable to Adobe Acrobat, which allows limited interoperability by enabling PostScript to be viewed and manipulated on computer screens.

However, CDA and PostScript are limited in three key respects that form the strengths of another popular document architecture called SGML. First, SGML provides flexibility through document-encoding schemas called Document Type Definitions (DTDs), which can be used to specify the nature of a document's content and how that content is organized logically. This elevates an electronic document from being a flat file to a document database with a well-defined schema structure. Second, DTDs determine how your content is stored physically (e.g., if you use video, which formats and frame rates -- MPEG, JPEG, Video for Windows or QuickTime -- are permissible). Finally, DTDs determine how your content is ultimately formatted -- meaning such styling information as italics, underlines, and bold headings -- and, most importantly, rendered -- Hypertext, CD-ROM, Braille, paper-based output. The ability to render any document to a variety of outputs is the strength of SGML that is crucial in a mixed-platform environment. The importance of these features can be seen in the phenomenally successful World Wide Web (WWW), which uses an SGML DTD called HTML (Hypertext Markup Language).
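
The idea of a DTD as a schema for a document can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the content model below is a hypothetical stand-in for a DTD, the "memo" element names are invented, and XML syntax parsed with Python's standard library is used in place of a real SGML parser. A real DTD also expresses order, repetition and attributes, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a DTD: each element name maps
# to the set of child elements it may contain. Containment only; no order,
# repetition, or attribute rules are checked.
CONTENT_MODEL = {
    "memo": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": {"para"},
    "para": set(),
}


def validate(element, model=CONTENT_MODEL):
    """Return a list of structural errors found at or below `element`."""
    if element.tag not in model:
        return [f"unknown element <{element.tag}>"]
    errors = []
    for child in element:
        if child.tag not in model[element.tag]:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            errors.extend(validate(child, model))
    return errors


# A document that follows the model, and one that does not.
good = ET.fromstring(
    "<memo><to>Sales</to><from>Finance</from>"
    "<body><para>Q3 is closed.</para></body></memo>"
)
bad = ET.fromstring("<memo><to>Sales</to><body><title>Hi</title></body></memo>")

if __name__ == "__main__":
    print(validate(good))
    print(validate(bad))
```

Because the structure is checked against a declared model, an application can query or route a conforming document by element (for example, every "to" field) much as it would query a database record, which is the sense in which the document stops being a flat file.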

In a world of multiple clients, multiple protocols and multiple servers holding a multitude of documents in a variety of formats, it makes sense to use a non-proprietary document encoding scheme such as SGML. This is evident in the various on-line discussion groups on the Internet about the future of Lotus Notes. The general feeling is that Notes cannot survive in its current form given the rapid change in customer requirements, the need for customizing or tailoring to specific customers, and the new standards, protocols and "convergence" taking place. This is true of any large single-vendor system that tries to "do it all." The "mix-and-match" model is becoming more robust than any IS manager thought it would. The glue for mix-and-match lies in widely adopted document architectures such as SGML that are powerful, flexible and universally supported. These architectures enable distributed document sources, when connected electronically through a network, to become important components of emerging, widely accessible corporate workflow support libraries that are much larger than anything accomplished earlier.