Last updated October 03, 2014 20:13, by eric_armstrong

RuDI: Ruby Utilities for DITA processing

This project is aimed at improving collaboration and production processes for DITA-based web documents, by providing high-end CMS features using inexpensive open-source tools.

The problems, in general, are these:

  • When a DITA file is renamed or moved, all links to that file need to be found and modified.
  • When a DITA file is deleted, all links to it need to be identified.
  • Outputs from the DITA Open Toolkit, which generates documents from DITA files, are not well-formatted.
    (It runs, but the results aren't anything you can share with customers. So you wind up "customizing" it -- a euphemism for fixing it to generate usable outputs.)
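The first two problems come down to one task: finding every link that resolves to a particular file, relative to the file that contains the link. The Ruby sketch below shows that step. It assumes links are relative paths in double-quoted `href` attributes. The names `HREF_PATTERN`, `links_to`, `rewrite_links`, and `rename_in_tree` are hypothetical and are not part of RuDI.

```ruby
require "pathname"

# Hypothetical sketch, not RuDI code. Matches double-quoted href attributes,
# capturing the file part and an optional "#fragment" (DITA topic/element IDs).
# Single-quoted values and conref attributes are not handled here.
HREF_PATTERN = /\bhref="([^"#]*)(#[^"]*)?"/

# Resolves an href's file part against the directory of the referencing file.
def resolve_href(source_dir, file)
  (source_dir + file).cleanpath
end

# Returns every href in +content+ that points at +target+, e.g. to report the
# links that would break if +target+ were deleted.
# +source_path+ and +target+ are relative to the same root directory.
def links_to(source_path, content, target)
  source_dir = Pathname.new(source_path).dirname
  wanted = Pathname.new(target).cleanpath
  content.scan(HREF_PATTERN).filter_map do |file, fragment|
    next if file.empty? # same-file links such as href="#id"
    "#{file}#{fragment}" if resolve_href(source_dir, file) == wanted
  end
end

# Rewrites links to +old_target+ so they point at +new_target+, keeping any
# fragment and emitting a path relative to the referencing file.
def rewrite_links(source_path, content, old_target, new_target)
  source_dir = Pathname.new(source_path).dirname
  old_path = Pathname.new(old_target).cleanpath
  new_path = Pathname.new(new_target).cleanpath

  content.gsub(HREF_PATTERN) do
    whole, file, fragment = $~[0], $~[1], $~[2]
    if !file.empty? && resolve_href(source_dir, file) == old_path
      rel = new_path.relative_path_from(source_dir)
      %(href="#{rel}#{fragment}")
    else
      whole
    end
  end
end

# Applies rewrite_links to every .dita and .ditamap file under +root+.
def rename_in_tree(root, old_target, new_target)
  Dir.glob("**/*.{dita,ditamap}", base: root).each do |rel|
    full = File.join(root, rel)
    original = File.read(full)
    updated = rewrite_links(rel, original, old_target, new_target)
    File.write(full, updated) unless updated == original
  end
end
```

Suppose `topics/a.dita` moves to `topics/renamed/a.dita`. Calling `rename_in_tree("docs", "topics/a.dita", "topics/renamed/a.dita")` would then update the maps and topics under a hypothetical `docs` directory. A link such as `href="../topics/a.dita#t1/p2"` in `maps/guide.ditamap` becomes `href="../topics/renamed/a.dita#t1/p2"`. The sketch is not a complete tool. It does not move the file itself, it does not adjust relative links inside the moved file, and it ignores `conref` attributes.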

The production problem can be solved using high-end editors like XMetaL (great HTML output) and FrameMaker (great PDF output). Both vendors have resolved the issues with the Open Toolkit, but both products have a high per-seat cost. (Both also have excellent WYSIWYG editing capabilities, which is a plus.)

The other alternative is a high-end, extremely expensive Content Management System (CMS). Those systems solve the file-organization problem as well as the production problem, but they do it by putting the files in a database. That makes it much more difficult to use an open source editor like Notepad++ to search across all the files and make a simple substitution. Those systems do have the advantage that all outputs can be generated at the press of a button, and they make much less expensive editors effective. But with all the money it takes to put the CMS in place (and the administrators to manage it), the cost of a DITA solution skyrockets.

The goal is to solve those problems in a much less expensive way. To do so, the project has several aspects:

  • RUNFAR: RUby-based Notation-language for (site-wide, recursive) Find And Replace processing. (Spec)
        --Spec only, at this point--
    When complete, the idea is to replicate DreamWeaver's site-management capabilities, where links are automatically updated when files are moved or renamed. The package could then be integrated into a version control system (e.g., Subversion), making a poor man's CMS. It could also serve as the basis for an interactive proofreading and style-checking tool. It could also convert XSLT transforms into RXSLT format, at least for basic transforms that don't use conditionals or procedures. Those features could then be added to the RXSLT version much more easily.
  • Ruby-based "Fluent XML" Module: A module that lets you use nested function calls to output HTML, without worrying about closing tags and without having to output a collection of strings. Any undefined method name automatically generates an XML/XHTML tag with that name. Hash-map arguments to the call provide name/value pairs that become attributes.
  • Ruby-based XML Styles and Transforms (RXSLT): A Ruby-based XML transformation engine that lets you write transforms in a Ruby-ized version of XSLT, but which puts the power of Ruby at your disposal to do conditional processing, use subroutines, store values for later use, and do anything else that Ruby lets you do (a lot!). Uses the XML Builder module. Just to be nice, the results are indented to reflect the nesting, so the generated code is readable.
  • DITA Publishing using DreamWeaver Templates: A set of tools that uses the Ruby transformation engine to merge DITA content into DreamWeaver templates. The templates let designers focus on the results they want to produce. They also minimize the amount of code the production team has to write to generate those results. Pipeline processing lets the production team specify transforms in more flexible and more powerful ways. This system doesn't replace the DITA toolkit; it augments it by adding post-processing steps. That also means the DITA toolkit can be upgraded at will, without having to retrofit dozens of customizations.
  • Man Page Processing: A utility and suite of processing scripts that uses Ruby to generate nroff/troff man pages from HTML generated by the DITA Open Toolkit (OT). It solves some problems that prevent OT-generated troff output from working as a man page. And because it exists as a separate utility, it and the OT can be updated independently, so there is no need to re-integrate the utility when you update the OT.
  • Link-Management, Topic Search, and Version Control: [future] One of the really important features of any content management system is the ability to keep links intact in the face of renames, moves, and deletes of files and directories. The goal here is to provide those features in the context of one or more open source version control systems like Subversion. (At this point, the link management algorithms have been defined, but not implemented.) When sophisticated search-and-replace functionality is added to that, the result will be a low-cost solution for topic storage that rivals a high-end CMS.
  • End-To-End Solution: [future] When the DITA publishing solution and link management capabilities are integrated with Claude Vedovini's DITA open platform for web-based editing, the result will be a complete, end-to-end solution for DITA authoring, storage, and production.
  • Collaboration: [future] The goal here is to make it possible to author DITA documents effectively online. Since others are currently working in that space, the idea is to integrate these tools with those systems and support them. (Some worth keeping an eye on are DITAStorm, XOpus, and David Green's Wiki->DITA tool.) Then, given the ability to collaborate on DITA documents, it becomes possible to enable collaborative design and decision-making online using DITA specializations.
  • Localization Support: [future] A version control system accumulates many changes to a document, but localization teams typically translate only a few of those versions. To be of maximum benefit to translators, the system must be able to interact with localization systems. Translators need to be able to store identification information for the last version number they worked with (either in the DITA system or in their own system). They also need to get the differences between that version and the current version, for translation. [Note: Such tools can also benefit collaborators, who need an option to get a display that highlights changes from the last version they read.]
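The Fluent XML item above describes a builder in which any undefined method becomes a tag and Hash arguments become attributes. The following minimal Ruby sketch illustrates that idea with `method_missing`; it is not the project's actual module. The class name `FluentXML`, the convention that a String argument supplies text content, and the two-space indentation (echoing the RXSLT note about readable, nested output) are all assumptions made for illustration.

```ruby
# Hypothetical sketch of a "Fluent XML" builder; not RuDI's actual module.
# Subclassing BasicObject keeps methods that Object inherits from Kernel
# (p, puts, select, ...) from shadowing tag names, so `p "text"` becomes <p>.
class FluentXML < BasicObject
  def initialize
    @out = +""
    @depth = 0
  end

  # Returns the generated markup.
  def to_s
    @out
  end

  # Any undefined method name becomes a tag of the same name.
  # Assumed conventions: a Hash argument supplies attributes, a String
  # argument supplies text content, and a block supplies nested children.
  # (::Hash and ::String are needed because BasicObject subclasses do not
  # see top-level constants.)
  def method_missing(name, *args, &block)
    attrs = args.find { |arg| arg.is_a?(::Hash) } || {}
    text = args.find { |arg| arg.is_a?(::String) }
    attr_str = attrs.map { |key, value| %( #{key}="#{escape(value.to_s)}") }.join
    indent = "  " * @depth

    if block
      # Opening tag, indented children, then the closing tag, so the
      # caller never writes a closing tag by hand.
      @out << "#{indent}<#{name}#{attr_str}>\n"
      @depth += 1
      instance_eval(&block)
      @depth -= 1
      @out << "#{indent}</#{name}>\n"
    elsif text
      @out << "#{indent}<#{name}#{attr_str}>#{escape(text)}</#{name}>\n"
    else
      @out << "#{indent}<#{name}#{attr_str}/>\n"
    end
    self
  end

  private

  # Escapes the characters that are special in XML text and attribute values.
  def escape(str)
    str.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
  end
end

doc = FluentXML.new
doc.html do
  body do
    h1 "RuDI"
    p "Ruby Utilities for DITA processing", class: "intro"
    br
  end
end
puts doc.to_s
# <html>
#   <body>
#     <h1>RuDI</h1>
#     <p class="intro">Ruby Utilities for DITA processing</p>
#     <br/>
#   </body>
# </html>
```

Inheriting from `BasicObject` instead of `Object` is a deliberate choice. The trade-off is that nested blocks run through `instance_eval`, so any helper method called inside a block is also treated as a tag name.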

The result, ideally, will be a fully integrated system that users can easily set up to produce well-styled, highly readable HTML pages. Those pages will include links to editing tools that use Wiki-text, an online editor like DITAStorm, or a desktop editor like oXygen or XMetaL. At the end of the day, changes made to those topics will ripple through the PDFs and other documents that depend on them, and go out through a variety of delivery channels.

A Note on Licensing

I've avoided the whole discussion of licenses for years. Just not something I wanted to spend my time on. (I know it's important. But I have limited brain cycles.) To set up this project, however, I needed to make a selection from the 18 or 20 "recommended" license models.

Google is your friend. I typed in the names of several promising candidates, looking for a page that compared them, and relied on the one I found to make a decision: this page.

To summarize the points made in that page:

  • GPL licenses are "viral", so they're not friendly to a company that wants to use your stuff to make money. But they do ensure that all improvements and extensions go back to the community.
  • BSD licenses (and derivatives like the Apache license) are "use at your own risk" licenses. They're business friendly, but do not ensure the growth of the code in the "software commons".
  • The Mozilla license was really good, but had a couple of bugs.
  • The CDDL license claims to fix those bugs. In the process, it distinguishes between "derived works" (new files) and "modifications" (changes to existing files). You're not required to contribute derived works back to the community (although you can--and should, whenever possible), but you are required to submit modifications back to the community for approval and adoption.

The CDDL seemed like the right idea, so that's the license I chose. (I haven't spent the time reading the license and consulting with lawyers to be sure that it actually does all that, but it sounds like it is trying to do the right thing, at least. So it looked like the best choice available at the time.)

© 2014, Oracle Corporation and/or its affiliates