The use of content management systems (CMS) to author and publish document-centric content of many different types is pervasive in today's public and private organizations. A large number of content management technologies are available, ranging from free software and open source solutions to enterprise CMS products that can cost several hundred thousand dollars per license [CMSWatch05]. With the continued growth and acceptance of XML document formats, and the further evolution and availability of XML tools, a content management system that is oriented to XML is fast becoming an important tool. This paper explores some of the features of such an XML Content Management System (XML CMS) and some of the technologies that naturally enhance the power of such a system.
The main reason to have an XML CMS is that you have XML content. In this paper we focus on document-centric XML content, meaning XML content authored by humans for humans to read. The other category of XML formats, data-centric XML, is not covered here. The first step in deciding why to use an XML CMS is to decide why to use XML content. Reasons to have XML content are:
The XML CMS unleashes the value of having your content in XML. An XML CMS allows you to collect on the value proposition and promises of XML content. Below is a description of selected XML CMS features and use cases.
Fundamentally a CMS is used to satisfy two broad use cases:
Within these two broad cases there are a number of sub use cases and variations on these use cases:
Authoring in a CMS has the following use cases:
One of the most important tools for authors is the WYSIWYG (What You See Is What You Get) XML editor. Users expect this because they have used Microsoft Word, WordPerfect or Open Office in the past. Some WYSIWYG editors run on the desktop and some run in the browser. Some are configurable and support many XML vocabularies, while others support just one vocabulary. Below is a table of example WYSIWYG editors.
| Editor | Vocabulary | Environment | Description |
| --- | --- | --- | --- |
| Amaya | W3C XHTML | desktop | Amaya is a free XHTML editor, though it is still in early versions. |
| Open Office | OASIS Open Document 1.0 | desktop | Open Office is a free editor for the Open Document standard. |
| XMetal | many | desktop | XMetal is a Windows-based editor that is configurable for any XML vocabulary. |
| XMetal ActiveX | many | browser | XMetal ActiveX is an ActiveX control based on the core XMetal engine. |
| Epic | many | desktop | Epic is a Windows-based configurable editor. |
| JXHTML | W3C XHTML | browser | JXHTML runs in any browser. |
There is not much use in authoring if there is no ability to publish content. Publishing is the process of preparing the Publication for distribution to readers. This publish process may range in complexity from the simple act of exposing the author's XHTML on the web, to a more complicated publishing pipeline. For example, a print publish pipeline could have the following steps:
The nature of the pipeline and the publish process will vary greatly depending on the final published form. Possible final forms and their publish processes are:
Some will argue that the underlying technology of a CMS is not important and that what matters are its features. This statement is both true and false. Of course the features are important; without them certain use cases cannot be satisfied. But without a sound technical foundation, any software is prone to cost overruns, excessive bugs and maintenance problems. In fact, in many cases features are simply not possible in a system without a sound technical foundation. Poor technology eventually 'leaks' through to the user. For this reason, it is always prudent to ensure that the technology 'under the hood' has a sound footing. Fortunately, XML is a family of sound technologies based on a set of consistent, interlinked standards and tools. In an XML CMS these standards and tools can be exploited to deliver software that is not only full featured but also robust and maintainable. The key XML technologies of a CMS are:
Fundamental XML refers to:
<math xmlns="http://www.w3.org/1998/Math/MathML">
<msqrt>
<mi>x</mi>
</msqrt>
</math>
The challenge, however, is how to display this compound document in your WYSIWYG editor or in your final published form, be it XHTML or PDF. Amaya and XMetal have editor support for MathML. Strategies for browser support range from generating SVG to generating an image on the fly.
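Whatever display strategy is chosen, the publishing step first has to find the MathML fragments inside the host document. The following is a minimal sketch of that step only, using Python's standard `xml.etree.ElementTree` module. The sample XHTML document and the function names `find_math_islands` and `namespaces_used` are assumptions made for illustration; converting each fragment to SVG or to an image is not shown.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical compound document: XHTML with one embedded MathML fragment.
COMPOUND_DOC = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>The square root of x is written as:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <msqrt>
        <mi>x</mi>
      </msqrt>
    </math>
  </body>
</html>
"""


def find_math_islands(xml_text):
    """Return every MathML <math> element found in the document."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced tags as "{namespace-uri}localname".
    return root.findall(f".//{{{MATHML_NS}}}math")


def namespaces_used(xml_text):
    """Return the set of namespace URIs used by elements in the document."""
    root = ET.fromstring(xml_text)
    found = set()
    for element in root.iter():
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found


islands = find_math_islands(COMPOUND_DOC)
print(len(islands), "MathML fragment(s) found")
print(sorted(namespaces_used(COMPOUND_DOC)))
```

A publishing pipeline could hand each element returned by `find_math_islands` to whichever renderer it uses, and could use `namespaces_used` to detect vocabularies that the target form does not support.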
Traditionally, content management systems have a relational database for data storage.
Figure 1. Classic CMS Architecture with RDBMS for Data Storage.
With an XML content management system, the most sensible data store is an XML DBMS (shown in Figure 2).
Figure 2. CMS Architecture using XML DBMS for Data Storage.
Relational systems are poorly suited to storing the individual information items of document-centric data. They typically store an entire document as a single blob, which makes the items inside the document unavailable to the DBMS. This blob treatment of XML makes it impossible to perform full-text searches optimally and flexibly without additional software. Likewise, with a blob layout it is not possible to run queries that generate content derived from the stored documents, such as a table of contents, lists of titles, lists of abstracts or other manipulations of the content. An XML DBMS, on the other hand, stores and retrieves XML documents as fully accessible trees of information items. An XML DBMS will most likely comply with the XML query standard, XQuery, which allows you to query any item within a document or combine items from multiple documents. An XML DBMS also supports security, scheduled backups, transactions, recovery and binary storage, as well as other expected DBMS features. A more extensive discussion of the native XML DBMS is in Chapter 8 of XQuery From the Experts.
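To make the idea of derived content concrete, the sketch below builds a list of titles and a simple table of contents from several documents at once. It is only an approximation of the idea: it uses Python's standard `xml.etree.ElementTree` module over an in-memory dictionary rather than XQuery against an XML DBMS. The element names `article`, `title`, `abstract` and `section`, the document names, and both function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored documents, keyed by document name. In a real system
# these would live in the XML DBMS rather than in a Python dictionary.
STORED_DOCS = {
    "intro.xml": (
        "<article><title>Why XML Content</title>"
        "<abstract>Reasons to adopt XML.</abstract>"
        "<section><title>Reuse</title></section>"
        "<section><title>Longevity</title></section></article>"
    ),
    "publish.xml": (
        "<article><title>Publishing Pipelines</title>"
        "<abstract>From stored form to final form.</abstract>"
        "<section><title>Print</title></section></article>"
    ),
}


def list_titles(docs):
    """Return (document name, title) pairs, one per stored document."""
    titles = []
    for name in sorted(docs):
        root = ET.fromstring(docs[name])
        titles.append((name, root.findtext("title", default="")))
    return titles


def table_of_contents(docs):
    """Return (article title, [section titles]) pairs across all documents."""
    toc = []
    for name in sorted(docs):
        root = ET.fromstring(docs[name])
        sections = [s.findtext("title", default="") for s in root.findall("section")]
        toc.append((root.findtext("title", default=""), sections))
    return toc


for article_title, section_titles in table_of_contents(STORED_DOCS):
    print(article_title)
    for section_title in section_titles:
        print("  -", section_title)
```

Both functions reach inside each document to individual items, which is exactly what a blob layout prevents the database from doing. In an XML DBMS the same result would normally be expressed as a query and evaluated by the database itself.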
XSLT plays a large role in transforming the stored XML into a published form, especially when the published form is different from the stored form. By "stored form" we mean the form or vocabulary you have chosen for your XML content. XSLT is especially useful for transforming stored XML into XHTML or HTML. For instance, if your stored XML vocabulary is DITA or Open Document, then you can use XSLT to produce an XHTML form that can be displayed in any browser. In fact, the DITA toolkit includes the transformations from DITA to HTML and from DITA to PDF. PDF is a very common print format, since Adobe has free printer rendering support for PDF as well as a free browser plugin for viewing PDF. To get to a page-based layout like PDF, publishers often use the XML format called formatting objects (part 2 of the XSL specification). This allows XSLT, and thus XML technologies, to be used for the lion's share of the transformation, with the final step from formatting objects to PDF being a simpler one.
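The shape of such a stored-form to XHTML transformation can be sketched in a few lines. Note that this is not XSLT: it is a hand-written stand-in using Python's standard `xml.etree.ElementTree` module, chosen only because it runs without extra libraries. The stored vocabulary (`article`, `title`, `para`, `emphasis`), the `TAG_MAP` table and the function names are all assumptions made for illustration. An XSLT stylesheet would express the same mapping declaratively as template rules.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical mapping from stored element names to XHTML element names.
TAG_MAP = {"title": "h1", "para": "p", "emphasis": "em"}


def convert(element):
    """Recursively copy an element, renaming it according to TAG_MAP."""
    # Unknown stored elements fall back to a generic <div>.
    out = ET.Element(TAG_MAP.get(element.tag, "div"))
    out.text = element.text
    out.tail = element.tail
    for child in element:
        out.append(convert(child))
    return out


def to_xhtml(stored_xml):
    """Transform a stored <article> document into an XHTML string."""
    source = ET.fromstring(stored_xml)
    # The namespace is written as a plain attribute to keep the sketch short.
    html = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = source.findtext("title", default="")
    body = ET.SubElement(html, "body")
    for child in source:
        body.append(convert(child))
    return ET.tostring(html, encoding="unicode")


STORED = (
    "<article><title>Hello</title>"
    "<para>Some <emphasis>stored</emphasis> text.</para></article>"
)
print(to_xhtml(STORED))
```

The recursive `convert` function plays the role that template matching plays in XSLT: each stored element is visited once and rewritten into its published counterpart, while text content passes through unchanged.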
Organizations that already have document-centric XML will want a CMS that is XML savvy: an XML CMS. For organizations considering a switch to XML content, an XML CMS will help with the authoring and publishing of their documents. Once you have an XML CMS with XML content, a range of XML technologies offers numerous options for publishing, styling, searching and otherwise manipulating your content. With an XML CMS, XML content and XML technologies, you can realize the full value of your document content.
References: