Open and Industry Standards in Document Management
Documents in formats such as Microsoft Word, PowerPoint, Excel, and Adobe PDF are sub-optimal from a pure document management perspective because they do not adequately separate format from structure.
The organization of the document is conveyed visually through formatting, so a human reader can interpret it reliably but software cannot. Nevertheless, these file formats are entrenched; they are the lingua franca of the "knowledge worker." Such documents also embed metadata attributes within the file, so these embedded metadata must be kept synchronized with the attributes the repository stores for the document. For example, a "title" field may exist in the file's metadata while the same title also appears as a bold heading on the document's first page. If the title changes, it must be updated in both places. Besides these industry standards, a number of open standards have appeared and increasingly form the basis on which document management systems are built. The following table lists the most relevant standards:
The "open" standards such as XML increasingly influence the way
documents are stored and managed. Emerging XML standards such as
DocBook and DITA point the way. For now, however, what document
management systems typically manage as content is metadata: structured
information about a document's content and its classification. Metadata
allow documents to be organized and found more easily, which reduces
redundancy and lets information flow more freely through the
organization.
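Because an XML format such as DocBook marks up the title as an element rather than as a bold heading, a program can read it directly, which also makes the title-synchronization problem described earlier mechanically checkable. The following is a minimal sketch, assuming DocBook 5 input; the function names `docbook_title` and `title_drift` are hypothetical, and `repository_record` is a plain dict standing in for whatever metadata attributes a real repository stores.

```python
import xml.etree.ElementTree as ET

# DocBook 5 elements live in this namespace.
DOCBOOK_NS = {"db": "http://docbook.org/ns/docbook"}


def docbook_title(xml_text):
    """Return the document title from a DocBook 5 document, or None."""
    root = ET.fromstring(xml_text)
    # Metadata normally sits in <info>; a bare <title> child is also allowed.
    title = root.find("db:info/db:title", DOCBOOK_NS)
    if title is None:
        title = root.find("db:title", DOCBOOK_NS)
    if title is None:
        return None
    # itertext() flattens inline markup such as <emphasis> inside the title.
    return "".join(title.itertext()).strip()


def title_drift(xml_text, repository_record):
    """Compare the document's own title with the repository's "title" attribute.

    Hypothetical helper: `repository_record` is a plain dict used as a
    stand-in for repository metadata. Returns None when the titles agree,
    otherwise the pair (document_title, repository_title).
    """
    doc_title = docbook_title(xml_text)
    repo_title = repository_record.get("title")
    return None if doc_title == repo_title else (doc_title, repo_title)


if __name__ == "__main__":
    sample = (
        '<article xmlns="http://docbook.org/ns/docbook" version="5.0">'
        "<info><title>Annual <emphasis>Report</emphasis></title></info>"
        "<para>Body text.</para>"
        "</article>"
    )
    print(title_drift(sample, {"title": "Annual Report"}))  # None
    print(title_drift(sample, {"title": "Draft"}))  # ('Annual Report', 'Draft')
```

Deriving the repository attribute from the structured title, rather than maintaining it by hand, is one way to avoid the duplication in the first place; the drift check is useful when both copies must coexist.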
Standards are highly valuable for document management. Standards such
as JSR-170 (the Java Content Repository API) separate the document
repository from the applications that produce and consume documents,
so that even if a tool is replaced, documents can still be accessed and
exported. File-format standards such as XML, PDF, and ODF improve the
long-term viability of stored and archived documents.
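To illustrate why an open format helps long-term viability: an ODF document is a ZIP package of XML parts, so its metadata can be read with only a ZIP library and an XML parser, without the office suite that produced it. The sketch below assumes the metadata sits in the package's `meta.xml` part as Dublin Core elements; `read_odf_metadata` and `make_minimal_odf` are hypothetical names, and the package built by `make_minimal_odf` is a demonstration stub, not a complete, valid ODF document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# XML namespaces used in an ODF package's meta.xml part.
ODF_NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_odf_metadata(source, fields=("title", "subject", "creator")):
    """Read selected Dublin Core fields from an ODF package.

    `source` is a path or a binary file-like object. Returns a dict that
    contains only the fields actually present in meta.xml.
    """
    with zipfile.ZipFile(source) as pkg:
        if "meta.xml" not in pkg.namelist():
            return {}  # meta.xml is optional, so its absence is not an error
        root = ET.fromstring(pkg.read("meta.xml"))
    meta = root.find("office:meta", ODF_NS)
    if meta is None:
        return {}
    found = {}
    for name in fields:
        elem = meta.find(f"dc:{name}", ODF_NS)
        if elem is not None and elem.text:
            found[name] = elem.text.strip()
    return found


def make_minimal_odf(title):
    """Build a tiny in-memory package for demonstration only.

    It holds just `mimetype` and `meta.xml`; a real ODF document also
    contains content, styles and a manifest.
    """
    meta_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<office:document-meta xmlns:office="{ODF_NS["office"]}"'
        f' xmlns:dc="{ODF_NS["dc"]}" office:version="1.2">'
        f"<office:meta><dc:title>{escape(title)}</dc:title></office:meta>"
        "</office:document-meta>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        # ODF expects "mimetype" first and uncompressed; ZipFile's default
        # compression (ZIP_STORED) leaves it uncompressed.
        pkg.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        pkg.writestr("meta.xml", meta_xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_odf_metadata(make_minimal_odf("Quarterly Report")))
    # → {'title': 'Quarterly Report'}
```

The same few standard-library calls would keep working as long as the format specification is public, which is the practical meaning of "long-term viability" for an archive.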