There are two types of document/content: structured and unstructured. When do you need a Component Content Management System?

A Document Management System (DMS) manages unstructured content; however, structured content needs a Component Content Management System (CCMS).

Unstructured Documents/Content

We refer to unstructured document/content as content created using standard office tools such as Microsoft Word or Google Docs. Although many Microsoft Word users never apply paragraph styles, these documents might still follow corporate style guidelines.

The content in these documents is of very little use to other systems: because the documents lack structure, software cannot easily interpret them. Lastly, publications must be updated manually, often with lots of cutting and pasting.

Structured Documents/Content

Here, structured document/content refers to content created with reuse in mind. Structured content is created as XML using authoring applications such as Oxygen Author or structured Adobe FrameMaker.

Background — Paper-based filing and storage

Before computers became the norm, paper copies of documents were stored in filing cabinets and folders with varying success. As late as 2006, 30% of all essential information was still paper-based (InfoTrends). The average time to retrieve a hard copy document in an efficient environment is 6 minutes. Of all filed documents, 7.5% are lost, and 3% are misfiled (PWC).

Background — Document Management Systems

With the advent of desktop computers, organizations needed control over all this new content. Consequently, Document Management Systems soon began to follow, sold on the premise of making content easier to locate and manage.

A DMS manages whole documents in their original format or as PDFs or images. It relies on metadata and indexing to find the correct document. However, with the sheer growth in data and documents, a DMS will often return multiple results for a search, making it difficult to identify the valid document.

In addition, while a DMS will manage versions and support collaboration, it will not prevent many similar documents from being created. As a result, when global changes need to be applied, locating all the documents affected is very hard. Consequently, many companies rely on Excel spreadsheets to track documents for updates.

Component Content Management Systems for websites

There is more than one type of CCMS (Component Content Management System).

A CCMS manages pieces of content rather than entire documents. For example, WordPress is a CCMS for Websites and Blogs. Users build websites from pages and the pages from blocks. Pages are updated when an individual block of content is changed.

Component Content Management Systems for structured content and DITA

This is a CCMS developed to support XML content.

Introduction to XML

XML is a mark-up language like HTML. However, HTML tags such as <b>bold text</b> describe the format of the content between them, whereas XML tags describe the content itself: <BookTitle>The Wizard of Oz</BookTitle>.

The set of tags allowed in a particular kind of XML document, and the rules for combining them, is defined by a schema; you can create your own schema or adopt a shared standard. There are standards to represent almost any type of data, whether recipes, musical scores, articles, books, or anything else. When a community exists around an XML standard, its members can share tools and techniques.

Lastly, because XML separates content from appearance, XML tags identify what content is rather than how content should look. As a result, a single XML document can be published in multiple formats.
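
As a rough illustration of that separation, the Python sketch below publishes one small XML record in two formats. The element names (Book, BookTitle, Author) and both rendering functions are made up for this example and do not come from any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: these element names are assumptions for illustration.
SOURCE = """<Book>
  <BookTitle>The Wizard of Oz</BookTitle>
  <Author>L. Frank Baum</Author>
</Book>"""


def to_html(book):
    """Render the record as an HTML fragment."""
    title = book.findtext("BookTitle")
    author = book.findtext("Author")
    return f"<h1>{title}</h1><p>by {author}</p>"


def to_text(book):
    """Render the same record as plain text."""
    title = book.findtext("BookTitle")
    author = book.findtext("Author")
    return f"{title.upper()}\nby {author}"


book = ET.fromstring(SOURCE)
html_output = to_html(book)
text_output = to_text(book)
print(html_output)
print(text_output)
```

The XML says only what each piece of content is; each function decides how it should look.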

DITA

DITA stands for Darwin Information Typing Architecture. It is an open XML standard, maintained by OASIS, most often used to create technical documentation.

Why is it called DITA? Darwin, because DITA uses principles of inheritance and specialization that echo the naturalist Charles Darwin's ideas about evolutionary adaptation. The rest is self-explanatory.

DITA uses topics; a topic is a unit of information that can be read in isolation or inserted into a larger document. To link together topics, DITA uses a DITAmap file. A DITAmap file is simply an XML file that acts as a table of contents linking a series of topics.
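
To make the table-of-contents idea concrete, here is a minimal sketch in Python that reads a small DITAmap and lists the topics it links, in order. The map and topicref elements and the href attribute are standard DITA; the file names and the outline helper are invented for this example.

```python
import xml.etree.ElementTree as ET

# A small map; the topic file names are hypothetical.
DITAMAP = """<map>
  <title>Widget User Guide</title>
  <topicref href="overview.dita"/>
  <topicref href="installing.dita">
    <topicref href="installing_on_windows.dita"/>
  </topicref>
  <topicref href="specifications.dita"/>
</map>"""


def outline(element, depth=0):
    """Yield (depth, href) for every topicref, in document order."""
    for child in element.findall("topicref"):
        yield depth, child.get("href")
        yield from outline(child, depth + 1)


root = ET.fromstring(DITAMAP)
toc = list(outline(root))
for depth, href in toc:
    print("  " * depth + href)
```

Nesting one topicref inside another is how a map expresses hierarchy, so the same topics can be ordered and nested differently in a different map.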

The term ‘topic’ is generic. DITA allows, however, the generic Topic to be adapted to represent more specific structures. The basic DITA specification includes Concept, Task, and Reference. These content units are more specific versions of the generic Topic. They can be handled with special rules if you want. But if you don’t have special rules, they can also be treated more generically as topics.
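
DITA records this ancestry in each element's class attribute, which lists the element's types from most generic to most specific. The Python sketch below shows one way a tool could use that attribute to fall back to generic topic rules. The class values follow the DITA convention, but the topics wrapper element, the ids, and the handler names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# In real DITA the class attribute is normally supplied as a default by the
# DTD or schema. It is written out here because ElementTree does not load
# those defaults. The <topics> wrapper is a hypothetical container.
TOPICS = """<topics>
  <concept id="what_is_a_widget" class="- topic/topic concept/concept "/>
  <task id="install_widget" class="- topic/topic task/task "/>
  <reference id="widget_specs" class="- topic/topic reference/reference "/>
</topics>"""


def ancestry(element):
    """Return the element's types, from most generic to most specific."""
    tokens = element.get("class", "").split()
    return [token.split("/")[1] for token in tokens if "/" in token]


def handler_for(element, handlers):
    """Pick the most specific handler available, else fall back to a generic one."""
    for name in reversed(ancestry(element)):
        if name in handlers:
            return handlers[name]
    raise LookupError(f"no handler for <{element.tag}>")


# This hypothetical tool has special rules for tasks only.
handlers = {"topic": "generic topic rules", "task": "task rules"}
root = ET.fromstring(TOPICS)
chosen = {el.get("id"): handler_for(el, handlers) for el in root}
print(chosen)
```

The concept and the reference are processed as plain topics, while the task gets its own rules.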

Benefits of a common vocabulary

A common vocabulary means that users can share information, tools, and the code used to handle the content. For example, if you use a DITA-based format, several editing tools can be used. Tools used to process the content can also be shared. For example, DITA includes the code and stylesheets needed to create PDF, HTML, and other output formats. As new output types appear, other DITA-based solutions can use the existing tools to support the new format.

DITA Open Toolkit

For DITA, the community provides the DITA Open Toolkit. This toolkit includes a variety of transforms that can take DITA and render it to HTML, PDF, and other formats. It also provides an extensible architecture. For example, if you customize DITA, you can create a plugin so that DITA solutions can handle the specific requirements.

DITA Open Toolkit plugins can configure editing tools, extend the rules of DITA, or modify the included stylesheets. Because all proprietary extensions are mapped to more generic DITA structures, any DITA tool can process content. For example, if you use a DITA-based vocabulary that defines a ‘chapter,’ systems that do not understand ‘chapter’ can always treat the encoded content as a more generic ‘topic.’

So, XML is a set of rules for creating a particular language to encode your content. DITA is one such language: it can be extended for more specific uses while still sharing a common grammar. DITA provides a base set of stylesheets for rendering your content in various formats. Many XML tools exist to process DITA documents, and they provide extension points so you can adapt them as needed.

Reuse

DITA differs from other standards in that it uses a Topic-based approach to authoring; each Topic should be self-contained in that it makes sense on its own. These topics fall into three established categories:

  • Concept – overview of what something does.
  • Task – information on how to do something.
  • Reference – detailed information to look up, such as specifications or settings.

Once created, the topics are assembled for a particular publication using a “DITAmap” that defines their order. The modular authoring approach and self-contained nature of topics enable them to be easily reused across multiple publications.

This is particularly useful when companies produce products that share components because aspects of one manual are easily incorporated into others. This saves authoring time and can also offer large savings where content is translated into multiple languages. For example, a new product manual may be able to reuse 60% of the topics already created; if those topics have already been translated, translation/localization costs for the new manual can fall by a similar proportion.

Just as Lego blocks can be rearranged to make different shapes and objects, DITA Topics can be arranged to make different publications.
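
The arithmetic behind the 60% example can be sketched in a few lines of Python. The topic names are invented, and the calculation assumes every topic costs about the same to translate; real budgets depend on word counts.

```python
def reuse_ratio(new_manual, already_translated):
    """Fraction of the new manual's topics that have already been translated."""
    reused = [topic for topic in new_manual if topic in already_translated]
    return len(reused) / len(new_manual)


# Hypothetical topic lists: six shared topics plus four written for the new product.
shared = [f"shared_{n}.dita" for n in range(6)]
new_only = [f"new_{n}.dita" for n in range(4)]

new_manual = shared + new_only
already_translated = set(shared)

ratio = reuse_ratio(new_manual, already_translated)
to_translate = [topic for topic in new_manual if topic not in already_translated]
print(f"{ratio:.0%} reused, {len(to_translate)} topics left to translate")
```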

Output

DITA content can be output via an open-source publishing engine called the DITA Open Toolkit. This enables the XML content to be output in multiple formats, including PDF, XHTML, HTML Help, JavaHelp, OpenDocument (ODT), and Rich Text Format (RTF).
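
The toolkit is normally driven from the command line with its dita command. As a sketch, the Python below builds one command per output format for the same map. It assumes the DITA Open Toolkit is installed and on the PATH; the map name and output folders are hypothetical, and the format names available depend on the toolkit version and installed plugins.

```python
import subprocess


def dita_command(ditamap, output_format, output_dir):
    """Build the argument list for one DITA Open Toolkit run."""
    return [
        "dita",
        f"--input={ditamap}",
        f"--format={output_format}",
        f"--output={output_dir}",
    ]


def publish(commands):
    """Run each command in turn; requires the dita executable to be installed."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical map and folders; 'pdf' and 'html5' are two common format names.
commands = [
    dita_command("user_guide.ditamap", fmt, f"out/{fmt}")
    for fmt in ("pdf", "html5")
]
for command in commands:
    print(" ".join(command))
```

Calling publish(commands) would run both builds from the same source map.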

Some additional benefits of DITA

Future Proof

XML is a good choice for documenting products with long lifespans because it is not dependent upon any single authoring application. A ship or train, for example, can be expected to be in use for twenty-plus years; can you open and read a document created in WordPerfect twenty years ago? XML documents are also plain text files that humans can open, read, and understand.

Easy to perform global updates

With a modular XML system such as DITA, a single Topic, for example, a warning or safety notice, might be referenced by multiple publications. Once the Topic is updated, every publication that references it picks up the change the next time it is published, so updates are far less likely to be missed than with a manual process.
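
Finding the affected publications amounts to a "where used" lookup. A minimal sketch in Python, with invented map and topic names standing in for a real repository:

```python
from collections import defaultdict

# Hypothetical publications and the topics each one references.
PUBLICATIONS = {
    "pump_manual.ditamap": ["safety_warning.dita", "pump_overview.dita"],
    "valve_manual.ditamap": ["safety_warning.dita", "valve_overview.dita"],
    "quick_start.ditamap": ["pump_overview.dita"],
}


def where_used(publications):
    """Invert the publication list into a topic -> publications index."""
    index = defaultdict(set)
    for publication, topics in publications.items():
        for topic in topics:
            index[topic].add(publication)
    return index


index = where_used(PUBLICATIONS)
affected = sorted(index["safety_warning.dita"])
print(affected)
```

Because the warning exists once and is only referenced, editing it and republishing the listed maps is the whole update.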

Lower cost of localization

Because DITA Topics are reused, only the Topics that have changed need to be localized when publications are updated. As a result, Bluestream has seen savings of as much as 60% of the localization budget.
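
One simple way to work out which Topics have changed is to compare a fingerprint of each topic with the fingerprint recorded when it was last translated. The Python sketch below illustrates the idea with invented topic content; it does not describe how any particular CCMS tracks changes.

```python
import hashlib


def fingerprint(text):
    """Return a stable hash of a topic's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def topics_to_localize(current_topics, translated_fingerprints):
    """List topics that are new or have changed since they were last translated."""
    return sorted(
        name
        for name, text in current_topics.items()
        if translated_fingerprints.get(name) != fingerprint(text)
    )


# Hypothetical topic sources as they stood at the last translation round.
previous = {
    "safety_warning.dita": "<p>Disconnect power before servicing.</p>",
    "pump_overview.dita": "<p>The pump moves fluid.</p>",
}
translated = {name: fingerprint(text) for name, text in previous.items()}

# Since then, one topic was edited and one new topic was added.
current = dict(previous)
current["pump_overview.dita"] = "<p>The pump moves fluid at up to 5 bar.</p>"
current["valve_overview.dita"] = "<p>The valve regulates flow.</p>"

pending = topics_to_localize(current, translated)
print(pending)
```

The unchanged safety warning is not sent for translation again.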

Easier to find content

DITA uses metadata, which can be used when searching for content.
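
For instance, a topic can carry keywords in its prolog. The Python sketch below searches a handful of topics by keyword. The prolog, metadata, keywords, and keyword elements are standard DITA; the topics and the search helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical topics with keyword metadata in the prolog.
TOPICS = {
    "safety_warning.dita": """<topic id="safety_warning">
  <title>Safety warning</title>
  <prolog><metadata><keywords>
    <keyword>safety</keyword><keyword>pump</keyword>
  </keywords></metadata></prolog>
  <body><p>Disconnect power before servicing.</p></body>
</topic>""",
    "pump_overview.dita": """<topic id="pump_overview">
  <title>Pump overview</title>
  <prolog><metadata><keywords>
    <keyword>pump</keyword>
  </keywords></metadata></prolog>
  <body><p>The pump moves fluid.</p></body>
</topic>""",
}


def keywords(topic_xml):
    """Return the lower-cased keywords declared in a topic's prolog."""
    root = ET.fromstring(topic_xml)
    found = root.findall("prolog/metadata/keywords/keyword")
    return {kw.text.strip().lower() for kw in found if kw.text}


def find_by_keyword(topics, wanted):
    """Return the names of topics tagged with the given keyword."""
    wanted = wanted.lower()
    return sorted(name for name, xml in topics.items() if wanted in keywords(xml))


safety_topics = find_by_keyword(TOPICS, "Safety")
pump_topics = find_by_keyword(TOPICS, "pump")
print(safety_topics, pump_topics)
```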

Easier to share information

Because XML is not ‘tied’ to a single application, it is much easier to share content across systems.

Structured content requires managing — a DITA CCMS

If you are to reap the benefits from structured content, it must be appropriately managed. Using an open-source software code repository system is one option, but using a CCMS (Component Content Management System) is much easier.

A CCMS has been specifically designed to support structured content/DITA. Consequently, it has the features and functionality to support the entire content lifecycle.

  • It will support the reuse of content both as topics and by reference. As a result, content and publications are easy to update.
  • Workflow is built-in, adding additional control to your content lifecycle. Bluestream XDocs DITA CCMS supports BPMN 2.0 open standard for workflow.
  • It will manage metadata, taxonomies, and in some cases, Ontologies. Bluestream XDocs DITA CCMS supports Ontologies and the OWL (Web Ontology Language) standard.
  • A CCMS will support COPE (Create Once, Publish Everywhere). A single DITAmap can publish in multiple formats (PDF, HTML, XHTML, etc.) for multiple devices/platforms.
  • It will also deliver substantial cost-saving on localization.
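
Reuse by reference, mentioned in the first bullet above, is usually done in DITA with the conref attribute, which points at an element in another topic using the form file#topicid/elementid. The Python sketch below is a heavily simplified resolver that copies only the text of the referenced element. The topics are invented, and a real CCMS or the DITA Open Toolkit handles many more cases.

```python
import xml.etree.ElementTree as ET

# Hypothetical library of shared content, keyed by file name.
LIBRARY = {
    "warnings.dita": """<topic id="warnings">
  <title>Shared warnings</title>
  <body>
    <p id="power_off">Disconnect the power supply before servicing.</p>
  </body>
</topic>""",
}

# A task that pulls in the shared warning by reference.
TOPIC = """<task id="replace_filter">
  <title>Replacing the filter</title>
  <taskbody>
    <context><p conref="warnings.dita#warnings/power_off"/></context>
  </taskbody>
</task>"""


def resolve_conrefs(root, library):
    """Replace each conref with the text of the element it points to (simplified)."""
    for element in root.iter():
        target = element.get("conref")
        if target is None:
            continue
        filename, _, fragment = target.partition("#")
        topic_id, _, element_id = fragment.partition("/")
        source_root = ET.fromstring(library[filename])
        if source_root.get("id") != topic_id:
            raise LookupError(f"topic {topic_id!r} not found in {filename}")
        source = next(e for e in source_root.iter() if e.get("id") == element_id)
        element.text = source.text
        del element.attrib["conref"]
    return root


resolved = resolve_conrefs(ET.fromstring(TOPIC), LIBRARY)
paragraph = resolved.find("taskbody/context/p")
print(paragraph.text)
```

Editing the paragraph in warnings.dita changes every topic that references it.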