XML Content Management System
For Document-Centric XML
Abstract
Content management systems (CMS) are widely used in today's public and private organizations to author and publish document-centric content of many different types. A large number of content management technologies are available, ranging from free and open source solutions to enterprise CMS products that can cost several hundred thousand dollars per license [CMSWatch05]. With the continued growth and acceptance of XML document formats, and the further evolution and availability of XML tools, a content management system oriented to XML is fast becoming an important tool. This paper explores some of the features of such an XML Content Management System (XML CMS) and some of the technologies that naturally enhance its power.
Why XML CMS
The main reason to have an XML CMS is that you have XML content. In this paper we focus on document-centric XML content, meaning XML content authored by humans for humans to read. The other category of XML formats, data-centric XML, is not covered here. The first step in deciding whether you need an XML CMS is therefore to ask why you would have XML content at all. Reasons to have XML content are:
- XML's Data Model is for Documents
- The tree-like structure of XML works well for documents. For example, a tree easily models sections within sections, a common document structure. An XML tree also maintains the order of its content, and order is important in human-readable documents. This tree-like structure underpins the XML data model, which is formally defined in the XML Information Set specification in terms of information items. In fact, XML goes beyond a simple tree representation by allowing text to be interspersed with markup and followed by yet more text. This mixed content model is also necessary to represent documents.
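As a minimal sketch of these two properties, the following Python fragment uses the standard library's xml.etree.ElementTree module to walk nested sections in document order and to read back the mixed content of a paragraph. The sample document and its element names (section, title, para, key) are illustrative assumptions, not part of any particular schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = (
    "<section><title>Install</title>"
    "<para>Press <key>Enter</key> to continue.</para>"
    "<section><title>Options</title><para>None yet.</para></section>"
    "</section>"
)


def outline(section, depth=0):
    """Return (depth, title) pairs for a section and its nested sections,
    in document order."""
    result = [(depth, section.findtext("title"))]
    for child in section.findall("section"):
        result.extend(outline(child, depth + 1))
    return result


def mixed_parts(element):
    """Return the text runs and child tags of an element in document order.

    ElementTree keeps the text before the first child in .text and the
    text that follows each child in that child's .tail.
    """
    parts = []
    if element.text:
        parts.append(element.text)
    for child in element:
        parts.append("<" + child.tag + ">")
        if child.tail:
            parts.append(child.tail)
    return parts


root = ET.fromstring(SAMPLE)
print(outline(root))                    # sections within sections
print(mixed_parts(root.find("para")))   # text, markup, then more text
```

The recursion in outline mirrors the nesting of sections directly, and mixed_parts shows that the text on either side of the inline key element survives parsing in its original order.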
- XML is an Open Set of Standards
- XML is a set of open standards. Open standards allow many vendors to create tools that support the standard, and they encourage information workers to invest in learning it. The result is a diversity of readily available tools for accessing and manipulating XML content. This compares favorably with proprietary solutions, where it is more difficult to access parts of your document, more difficult to re-purpose your documents, and you are often locked into a single vendor for document processing tools.
- XML is extensible
- XML is an extensible standard (the X in XML). XML is fundamentally extensible in that it is a set of rules about how trees of information items are constructed, but it does not dictate the structure or tag names of any particular kind of tree. This leaves domain experts, such as E-Learning information architects and technical writers, free to define which information items are important to them and to structure these items as they see fit. That said, many organizations find this open-endedness daunting: an organization-specific document information model is hard to design and to evolve. In addition, many of the same structures, such as tables, paragraphs and lists, occur in many kinds of documents, so each organization would have to establish them from scratch, independently of other organizations. For these reasons, a standard information model developed outside the organization is most often a better choice. For instance, standard information models (or XML schemas, as they are often called) such as XHTML and DITA (Darwin Information Typing Architecture) define a core set of information items but allow extensibility from that core set. Using the XML namespace mechanism, the schema extension mechanism, or both, standard schemas can be customized for special purposes within an organization.
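The namespace mechanism mentioned above can be sketched as follows, again in Python with xml.etree.ElementTree. The fragment mixes standard XHTML elements with one organization-specific element and then separates the two by namespace. The XHTML namespace URI is the real one; the acme prefix, its URI and the partNumber element are hypothetical names introduced only for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ACME_NS = "http://example.com/ns/acme"  # hypothetical organization namespace

# An XHTML fragment extended with one organization-specific element.
SAMPLE = (
    '<div xmlns="' + XHTML_NS + '" xmlns:acme="' + ACME_NS + '">'
    "<p>Torque the bolt to spec.</p>"
    "<acme:partNumber>AB-1234</acme:partNumber>"
    "</div>"
)


def split_by_namespace(root):
    """Return (standard, custom) lists of local element names.

    ElementTree stores qualified names as "{namespace-uri}local-name",
    so the namespace can be read straight from the tag.
    """
    standard, custom = [], []
    for el in root.iter():
        if el.tag.startswith("{"):
            ns, _, local = el.tag[1:].partition("}")
        else:
            ns, local = "", el.tag
        (standard if ns == XHTML_NS else custom).append(local)
    return standard, custom


root = ET.fromstring(SAMPLE)
standard, custom = split_by_namespace(root)
print(standard)  # elements from the standard schema
print(custom)    # elements added by the organization
```

Because the added element lives in its own namespace, tools that only understand the standard schema can skip it, while in-house tools can find it unambiguously.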
- XML easily supports Metadata
- Another example of the extensibility of XML is that document-centric XML can be annotated with data-centric items such as date, author, publisher or factoryName. This extra data, often called metadata, makes documents easier to search for and to categorize.
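A small sketch of how such metadata helps with searching: the Python fragment below filters a set of documents by a metadata field using xml.etree.ElementTree. The file names, the doc/meta/body layout and the field values are assumptions made for this example; a real schema would define its own metadata container.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents keyed by file name; the layout is illustrative.
DOCS = {
    "pump-guide.xml": (
        "<doc><meta><author>Lee</author><factoryName>North</factoryName>"
        "</meta><body><p>Pump setup.</p></body></doc>"
    ),
    "valve-guide.xml": (
        "<doc><meta><author>Kim</author><factoryName>North</factoryName>"
        "</meta><body><p>Valve setup.</p></body></doc>"
    ),
    "safety.xml": (
        "<doc><meta><author>Lee</author><factoryName>South</factoryName>"
        "</meta><body><p>Safety rules.</p></body></doc>"
    ),
}


def find_by_metadata(docs, field, value):
    """Return the sorted names of documents whose meta/<field> equals value."""
    matches = []
    for name, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        if root.findtext("meta/" + field) == value:
            matches.append(name)
    return sorted(matches)


print(find_by_metadata(DOCS, "author", "Lee"))
print(find_by_metadata(DOCS, "factoryName", "North"))
```

The query never has to interpret the prose in the body; it reads only the data-centric items, which is what makes categorization by author, date or site straightforward.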
- XML is becoming all pervasive
- Most new document formats are XML based, and older formats either are already XML, such as OpenDocument for OpenOffice, or are gaining an XML counterpart: in the case of Microsoft, the binary .doc format will have an XML format called .docx. Aside from a broad range of enterprise-specific XML formats used by large companies such as Boeing and Nortel, an array of standard XML formats has been defined, such as:

