= RuDI: Ruby Utilities for DITA/XML processing

These utilities take advantage of Ruby's facilities for string processing,
regular expressions, and metaprogramming (dynamic method definitions) in
order to process XML in powerful, flexible, and ultimately extensible ways.

They are targeted at DITA-based XML in particular, to improve the speed and
ease with which user-focused, task-centric, topic-oriented documentation can
be produced. But they are frequently useful for other purposes, as well.

== Overview

The utilities developed so far are:

* <b>RuDI::XML_Builder:</b> Generate readable XML in a readable way

* <b>RuDI::XML_Transform:</b> Specify readable transformations, much like XSLT,
  in a way that gives you access to Ruby's power when you need it.<br>
  (Requires XML_Builder)

* <b>RuDI::XML_Text:</b> A wrapper for the String class that normalizes whitespace
  and does word wrapping for XML text and tags. (It also has a wrap function
  for table cells.) Arguably, it should instead open up the String class and
  add its operations there. But it isn't written that way at the moment.

* <b>RuDI::HTML_Template:</b> Merge generated HTML into a DreamWeaver template
  using labels specified in the template--a process that achieves an important
  "separation of concerns" between design and production, so the production
  process is simpler, more efficient, and more flexible.

* <b>html2man:</b> A processing script that generates man pages from HTML
  source files, accompanied by a Rake script that runs it as a build process
  on multiple files.

Hopefully coming down the pike:

* <b>Production Sample:</b> A reference implementation that puts everything
  together to build web docs from DITA sources--something you can use to see
  how RuDI works, and that can be used as a basis for your own production
  system that you can customize as needed.

* <b>Purple Link generator:</b> Self-referencing links inspired by Doug
  Engelbart. Small, light-purple hash marks at the end of headings that can be
  bookmarked and passed around as URLs, to take a user directly into a document.

  Notes on HTML tags (for class comments):
     <a name="purple_number_links" id="purple_number_links"></a>  --Anchor
     Title Text Goes Here                                        --Title text
     &nbsp;&nbsp;&nbsp;&nbsp;                                    --Spacer
     <a href="#purple_number_links">                             --Target (self)
       <span class="PurpleNumber">#</span></a>                   --Styling

* <b>Link Manager:</b> A set of scripts that keep links from breaking when you
  move a DITA file, rename a directory, etc., so that a basic version control
  system (VCS) begins to acquire features that are typically present only in
  a high-end content management system (CMS). Has three basic parts:
  - <b>Link database:</b> This could be as simple as a YAML file that maps file
    paths to the links they contain. It would be versioned in the VCS, just like
    any other file.
  - <b>Modification commands:</b> These commands perform renames and moves,
    adjust links in affected files, and update the link database as needed.
    Added files could be detected and added to the database using a pre-commit
    hook in the VCS, but deletes need to be done with a command, so the
    LinkManager can tell you which files will be affected by the delete, and
    give you a chance to cancel.
  - <b>VCS Integration:</b> This is the tricky bit. Either VCS commands need
    to invoke link-management functions (ideal, because VCS command-menus are
    already defined for editors and file browsers), or else the link-management
    functions need to invoke VCS commands (or both, so you can do things in
    either order). But if it's only possible to do a one-way mapping from
    link-management commands to VCS commands, then it's desirable to find a way
    to add the link-management commands to the menus in editors and file
    browsers that interact with the VCS.

* <b>Migration Tool:</b> A wrapper for the h2d migration tool in the DITA-OT.
  The h2d tool works on HTML files that have only one level of headings. The wrapper
  runs on HTML files that have more deeply nested headings, splitting them into
  separate files, running h2d on them, and creating a map that ties them all
  back together.

  Implementation Note: The migration tool creates generic topics, rather than
  specific information types, in order to minimize "information loss". While
  it's true that h2d saves all file content, it puts things that are illegal
  for a given information type into regions designated as "to be fixed", and the
  DITA-OT gives no warnings when such regions are processed, so manual inspection
  is needed in any case. The conversion to generic topics is the most robust,
  so it is the most likely way to ensure that the original input remains intact
  and that it appears when documents are processed.

* <b>Conversion Tool(s):</b> To assist in the conversion of generic topics to
  information types. The ideal tool will provide a checklist of files, let
  the user select files to convert, specify the information type to convert
  them to (concept, task, reference, or some specialization), and provide
  warnings if a file won't convert cleanly. (Ideally, such a tool would be built
  into an editor-environment, so an error would link to a file location, and
  corrections could be made interactively.)

* <b>RuDI::XiPi</b> ("Zippy"): An XML Integrated PIpeline processor, like the
  XProc standard, but written in Ruby, so that you're not limited to XSL
  transforms. A production chain might consist of an XML_Transform to convert
  standard DITA output into the form you want to use, a Purple_Link generator
  to add self-referencing section-links to the document, and a simple one-line
  transform like this one:
    gsub!(%r(href="local_root_path), 'href="...')

  A substitution like that replaces references to other documents with relative
  links or http URLs, as appropriate. (Such substitutions can of course be done
  using XSL, or entity references defined in a DTD, or with the "keyref"
  capability coming in DITA 1.2. But a simple global substitution is pretty darn
  easy, and it doesn't require prior planning.)

* <b>RuDI::Rback:</b> A DSL-driven backup tool (like Red Auerbach!)

* <b>RuDI::Drake:</b> A DITA-aware, dependency-driven build utility based on Rake.
  Rather than specifying a "task", you specify a DITA task ("dtask"). The only
  prerequisites you need to specify are a map and, optionally, the ditaval file
  that contains the processing arguments. The DTask establishes the dependencies
  automatically by examining the map, finding the topics, and examining the
  topics to find references to images and conref files. For more, see:
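
As a rough illustration of the dependency discovery described for Drake, here is a minimal sketch using Ruby's REXML library. It reads files through any callable (a hash stands in for the file system here), and the `resolve` and `dita_dependencies` helpers are hypothetical names, not part of RuDI. Same-file conrefs are skipped, and topicrefs to external or non-DITA targets are not filtered out.

```ruby
require "rexml/document"
require "pathname"

# Resolve +href+ relative to the directory of the file that contains it.
# Returns nil for same-file references such as conref="#topic/para".
def resolve(from, href)
  path = href.split("#").first.to_s   # drop the "#topic/element" fragment
  return nil if path.empty?
  Pathname.new(File.dirname(from)).join(path).cleanpath.to_s
end

# Collect the map, its topics, and each topic's images and conref sources.
# +read+ is any callable that returns a file's XML text (an assumption).
def dita_dependencies(map_path, read)
  deps = [map_path]
  map = REXML::Document.new(read.call(map_path))
  topics = REXML::XPath.match(map, "//topicref[@href]").map do |ref|
    resolve(map_path, ref.attributes["href"])
  end
  topics.compact.each do |topic|
    deps << topic
    doc = REXML::Document.new(read.call(topic))
    REXML::XPath.each(doc, "//image[@href]") do |img|
      deps << resolve(topic, img.attributes["href"])
    end
    REXML::XPath.each(doc, "//*[@conref]") do |el|
      deps << resolve(topic, el.attributes["conref"])
    end
  end
  deps.compact.uniq
end

# Hypothetical in-memory "file system" for the example.
FILES = {
  "guide.ditamap" => '<map><topicref href="topics/install.dita"/></map>',
  "topics/install.dita" =>
    '<topic id="install"><title>Install</title><body>' \
    '<image href="../images/setup.png"/><p conref="common.dita#common/note"/>' \
    "</body></topic>"
}.freeze

deps = dita_dependencies("guide.ditamap", FILES.method(:fetch))
p deps
```

The resulting list could serve as the prerequisites of a Rake `file` task, so that the output is rebuilt whenever the map, a topic, an image, or a conref source changes.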

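The link database proposed for the Link Manager could start as small as the following sketch, which assumes a YAML mapping from each file path to the paths it links to. The helper names `files_linking_to` and `record_move` are hypothetical; rewriting the link text inside the affected files would be a separate step.

```ruby
require "yaml"

# Hypothetical link database: each file path maps to the paths it links to.
# Kept as YAML so it can be versioned in the VCS like any other file.
LINKS_YAML = <<~LINKS
  topics/install.dita:
    - topics/overview.dita
    - images/setup.png
  topics/overview.dita:
    - topics/install.dita
LINKS

# Files that link to +target+, i.e. the files a delete would affect.
def files_linking_to(db, target)
  db.select { |_file, links| links.include?(target) }.keys
end

# Record a move in the database: re-key the moved file's entry and
# repoint links to it. Editing the files themselves is not done here.
def record_move(db, from, to)
  db[to] = db.delete(from) if db.key?(from)
  db.each_value { |links| links.map! { |link| link == from ? to : link } }
  db
end

db = YAML.safe_load(LINKS_YAML)
puts "Deleting topics/overview.dita would affect: " \
     "#{files_linking_to(db, 'topics/overview.dita').join(', ')}"
record_move(db, "topics/overview.dita", "topics/intro.dita")
puts db.to_yaml
```

Listing the affected files before a delete is what would let the tool offer a chance to cancel.
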
== Change History
* Version 9.2 (in progress)
** Added html2man processing utility
** Reversed the order of XML-builder arguments to eliminate "syntax noise" in
   the form of extra braces and parens (which are needed when a data value follows
   an attribute list, rather than preceding it).

* Version 9.1
** Initial release. Basic functionality implemented and tested.
** XML builder, XML transform engine, and HTML template-merge program.

== Resources
For detailed discussions of web-page semantics and the production process, see:
* __blog location TBD__

For some of the early thinking behind this project, see:
* Domain Specific "PowerTool" Languages Promote Elegance

* DITA Production Maps -- A Proposal

* Doing DITA Builds Better

== ToDo
* Create a sample project that can be used as a production-template
  o .env file (YAML) for the locations of the DITA-OT and the RuDI processing engine
  o Directory structure (Templates, ditasrc, ditaout, css, js, build, webout),
    where build/ contains the processing scripts, ditaout/ gets files produced
    by DITA OT, and webout/ gets the final results.
  o Sample templates for content & tab pages, DITA src and output files
  o Rake build script
     Dir.foreach("/some/dir") do |entry|  # ...process each entry...
     end
* Release as version 1.0
* Post the series of introductory blogs
* Add references to them here
* Build a gem and post it to RubyGems
* Add instructions for installing the gem

== Utility Summaries

Major features for each utility are listed, along with
any remaining items in their respective implementation checklists.

=== RuDI::XML_Builder


* Wrap text
* Normalize whitespace (tabs & NLs)
* Normalize NLs
* Normalize tabs

* Generate readable, well-formatted XML like this:
* Code it in a readable way (w/minimal syntax)
  like this:
    html {
      body {
        ...
      }
    }
* Generate tags with names that correspond to existing methods by preceding
  the tag name with an underscore. For example, generating a <p> tag is a
  problem, because "p" is the name of a Ruby debugging method: it takes an
  argument and prints the result of calling inspect() on it. Solution: code
  _p { ... }. The first leading underscore is stripped from any tag that has
  one, so <p> is generated.
* Modify indentation amount or turn it off entirely to produce this:
* Allow the insertion of arbitrary XML-generating expressions, like this:
    def generate_some_XML
      ...
    end

    html {
      body {
        generate_some_XML
      }
    }
* Support XML "mixed text" model (interrupted text) like this:
     <p>Some <b>test</b> text</p>
* Support for XML declarations and processing instructions
* Support for DOCTYPE declarations and comments
* Support for unformatted <pre> and CDATA text
* Word wrap text and tags
* Generate tags with namespaces (e.g. <xsl:foo>)
    ns(:xsl)                # Define the "xsl" namespace
    { ... }                 # Generate <xsl:foo ...> ... </xsl:foo>
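
To illustrate how a builder like this can keep syntax noise down, here is a deliberately tiny sketch (not RuDI's implementation) that turns unknown method calls into tags via `method_missing`, strips one leading underscore so that `_p` yields `<p>`, and takes the text value before the attribute hash, matching the argument order noted in the Change History. `TinyBuilder` is a hypothetical name.

```ruby
# Hypothetical, minimal builder (not RuDI's implementation).
class TinyBuilder
  def initialize(indent = "  ")
    @indent = indent   # indentation per nesting level
    @depth = 0
    @out = []
  end

  def to_s
    @out.join("\n")
  end

  # Any unknown method becomes a tag: the text value comes first, then an
  # optional attribute hash, then an optional block for nested content.
  def method_missing(name, text = nil, attrs = {}, &block)
    # Leave conversion probes such as to_ary/to_str to Ruby.
    return super if name.to_s.start_with?("to_")

    tag = name.to_s.sub(/\A_/, "")   # strip only the first leading underscore
    attr_str = attrs.map { |k, v| %( #{k}="#{v}") }.join
    pad = @indent * @depth
    if block
      @out << "#{pad}<#{tag}#{attr_str}>"
      @out << "#{pad}#{@indent}#{text}" if text
      @depth += 1
      instance_eval(&block)
      @depth -= 1
      @out << "#{pad}</#{tag}>"
    else
      @out << "#{pad}<#{tag}#{attr_str}>#{text}</#{tag}>"
    end
    self
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?("to_") || super
  end
end

b = TinyBuilder.new
b.instance_eval do
  html {
    body {
      _p "Some text", class: "intro"   # _p, because p is Kernel#p
    }
  }
end
puts b.to_s
```

Names starting with `to_` are handed to `super` because Ruby probes objects for conversion methods such as `to_ary` (for example, when `puts` is given an object), and a catch-all `method_missing` would otherwise answer those probes with tags.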

Yet To Do
* No unimplemented features

=== RuDI::XML_Transform


* Dynamically remove and replace element transforms
* Copy transforms
* Apply the identity transform, by default
* Dynamically change the behavior of the default transform & other element
  transforms (Copy a transform under a new name and restore it later.)
* Define new transforms in a readable way, like this:
    tx = do
      xfm :t1 do |node|
        ...operate on node to extract data...
        div({:class => "name"}) do      # start a div, passing an attribute
          ul {                          # start a list
            li {                        # create first list entry
              text! "say something"     # add text (could contain XML tags)
              xfm_node(node)            # recurse on node's children
            }                           # close the <li> tag
          }                             # etc.
        end
      end
    end
* Normalize whitespace by default
* Modify indentation or remove it
   tx.indent = "   "                 # 3-space indent
   tx.indent = nil                   # No indentation and no NLs
* Remove incoming whitespace by default, preserve it if requested
   tx.preserve_ws = true             # Preserve existing whitespace
                                     # (turns off auto indent & NLs)
* Call a local function in a transform, for specialized processing, like this:
    def generate_stuff(node)
      ...get stuff from under the node or from elsewhere...
    end

    tx = do
      xfm :t1 do |node|
        _p {                          # create a paragraph
          generate_stuff(node)        # Do arbitrary processing
        }
      end
    end
* Make it easy to extract and modify attributes, like this:
    tx = do
      xfm :t1 do |node|
        ...extract and modify the node's attributes...
      end
    end

* Handle comments
* Handle CDATA
* Handle processing instructions
* Transform a file, an XML string or a REXML node
* Return a string or a REXML node
* Write to a file
* Convert XML CDATA to HTML <pre> (configure or override existing transform)
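
A rough sketch of the transform table behind several of the features above: handlers keyed by element name, an identity copy for everything else, and copy-then-restore of a handler. It uses REXML, which the feature list mentions, but `TinyTransform`, `xfm_children`, and `copy_xfm` are hypothetical names, and unlike the real utility it simply drops comments and processing instructions.

```ruby
require "rexml/document"

# Hypothetical transform table (not RuDI's engine).
class TinyTransform
  def initialize
    @xfms = {}
  end

  # Register a handler for an element name. Handlers run via instance_exec,
  # so they can call xfm_children directly.
  def xfm(name, &handler)
    @xfms[name.to_s] = handler
  end

  # Save a handler under another name so it can be restored later.
  def copy_xfm(from, to)
    @xfms[to.to_s] = @xfms[from.to_s]
  end

  def xfm_node(node)
    case node
    when REXML::Element
      handler = @xfms[node.name]
      handler ? instance_exec(node, &handler) : identity(node)
    when REXML::Text
      node.to_s   # REXML returns text in escaped form here
    else
      ""          # this sketch drops comments, PIs, etc.
    end
  end

  def xfm_children(node)
    node.children.map { |child| xfm_node(child) }.join
  end

  # Default: copy the element and its attributes, transforming the children.
  # (Attribute values are not re-escaped in this sketch.)
  def identity(el)
    attrs = el.attributes.map { |key, value| %( #{key}="#{value}") }.join
    "<#{el.name}#{attrs}>#{xfm_children(el)}</#{el.name}>"
  end

  def apply(xml)
    xfm_node(REXML::Document.new(xml).root)
  end
end

tx = TinyTransform.new
tx.xfm(:topic) { |node| %(<div class="topic">#{xfm_children(node)}</div>) }
tx.xfm(:title) { |node| "<h1>#{xfm_children(node)}</h1>" }
tx.copy_xfm(:title, :saved_title)   # keep a copy to restore later
puts tx.apply("<topic><title>Intro</title><p>Hello <b>there</b></p></topic>")
```

Because the `<p>` and `<b>` elements have no handlers, they pass through the identity copy unchanged, which is the default behavior the list describes.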

Yet ToDo

* No unimplemented features

=== RuDI::XML_Text


* Normalize whitespace
* Word-wrap text for table cells (cell_wrap)
* Word-wrap and indent text (wrap)
* Word-wrap and indent tags with attributes (tag_wrap)
* Reuse single instance for efficiency (set value and call wrapping function)
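
A minimal sketch of the whitespace normalization and indented word wrapping listed above, assuming a greedy line-filling strategy; `normalize_ws` and `wrap` are hypothetical names, not XML_Text's API.

```ruby
# Collapse runs of spaces, tabs, and newlines into single spaces.
def normalize_ws(text)
  text.gsub(/\s+/, " ").strip
end

# Greedy word wrap: fill each line up to +width+ columns (counting the
# indent), then start a new one. A word longer than the width gets a line
# of its own rather than being split.
def wrap(text, width: 72, indent: "")
  lines = []
  line = nil
  normalize_ws(text).split(" ").each do |word|
    if line.nil?
      line = word
    elsif indent.size + line.size + 1 + word.size <= width
      line = "#{line} #{word}"
    else
      lines << indent + line
      line = word
    end
  end
  lines << indent + line if line
  lines.join("\n")
end

puts wrap("The  quick\tbrown fox\njumps over the lazy dog", width: 16, indent: "  ")
```

Greedy filling is the simplest choice; it never looks ahead, so lines can end up uneven, which is usually acceptable for generated XML.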

Yet To Do

* No unimplemented features

=== RuDI::HTML_Template


 * Read a DreamWeaver-compatible HTML template
 * Generate a file that has the declarations needed to "attach" the file to that
   template in a DreamWeaver site, so that changes made to the template (in DW)
   can be automatically applied to all dependent files.
 * Initialize some elements one time, but reuse the template for multiple outputs.
 * Complain if there are regions that haven't been filled in
 * Ability to force output, even if empty regions exist
 * Ability to use content given in the template
   (Could default to that, but then there would be no way to check that
    all regions were filled in before generating output.)
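
A minimal sketch of the region filling, complaint, force, and use-defaults behavior listed above. It assumes the Dreamweaver comment markers `TemplateBeginEditable`/`TemplateEndEditable` in the template and writes `InstanceBeginEditable`/`InstanceEndEditable` markers into the page; writing the declaration that attaches the page to its template is left out, and `fill_template` is a hypothetical name.

```ruby
# Dreamweaver-style editable region in a template (assumed marker syntax).
REGION = /<!-- TemplateBeginEditable name="([^"]+)" -->(.*?)<!-- TemplateEndEditable -->/m

class MissingRegions < StandardError; end

# Fill each region from +content+ (a name => HTML hash).
#   force:        write the page even if some regions have no content
#   use_defaults: keep the template's own content for unfilled regions
def fill_template(template, content, force: false, use_defaults: false)
  missing = []
  page = template.gsub(REGION) do
    name = Regexp.last_match(1)
    default = Regexp.last_match(2)
    body = content.fetch(name) do
      missing << name unless use_defaults
      use_defaults ? default : ""
    end
    %(<!-- InstanceBeginEditable name="#{name}" -->#{body}<!-- InstanceEndEditable -->)
  end
  raise MissingRegions, "unfilled regions: #{missing.join(', ')}" unless missing.empty? || force
  page
end

tpl = '<title><!-- TemplateBeginEditable name="doctitle" -->Untitled<!-- TemplateEndEditable --></title>' \
      '<div><!-- TemplateBeginEditable name="body" -->Body here<!-- TemplateEndEditable --></div>'
puts fill_template(tpl, { "doctitle" => "Intro", "body" => "<p>Hi</p>" })
```

Elements that are initialized once could live in a shared hash that is merged with each page's own content, e.g. `fill_template(tpl, common.merge(page_regions))`.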

Yet ToDo

 * No unimplemented features

=== Man Page Processing


* Html2Man processor that converts HTML pages to (nroff) man pages, bypassing
  known issues with DITA OT processing, as described at
* Rake script that manages processing for multiple files
* Convenience scripts (html2man, makeman, solaris_man, linux_man, remake, zap)
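
To give a feel for the HTML-to-nroff mapping, here is a small regex-based sketch that handles only `<h1>`, `<h2>`, `<p>`, `<b>`, and `<i>` in simple, well-formed input. It is not the html2man processor, and `html_to_man` is a hypothetical name. It emits the standard man macros `.TH`, `.SH`, and `.PP`, and prefixes lines that begin with a period or apostrophe with `\&` so that troff doesn't read them as requests.

```ruby
# Entities unescaped after tags are removed (a deliberately small set).
ENTITIES = { "&lt;" => "<", "&gt;" => ">", "&quot;" => '"', "&amp;" => "&" }.freeze

def html_to_man(html, name:, section: 1)
  lines = [".TH #{name.upcase} #{section}"]
  html.scan(%r{<(h1|h2|p)>(.*?)</\1>}m) do |tag, inner|
    # Map inline markup to troff font escapes, then drop any other tags.
    text = inner.gsub(%r{<b>(.*?)</b>}m) { "\\fB#{Regexp.last_match(1)}\\fR" }
    text = text.gsub(%r{<i>(.*?)</i>}m) { "\\fI#{Regexp.last_match(1)}\\fR" }
    text = text.gsub(/<[^>]+>/, "")
    text = text.gsub(/&(?:lt|gt|quot|amp);/) { |ent| ENTITIES[ent] }
    text = text.gsub(/\s+/, " ").strip
    # A line starting with "." or "'" would be read as a troff request.
    text = "\\&#{text}" if text.start_with?(".", "'")
    if tag == "p"
      lines << ".PP" << text
    else
      lines << ".SH #{text}"   # h1/h2 become section headings
    end
  end
  lines.join("\n") + "\n"
end

puts html_to_man("<h1>NAME</h1><p>ls - list <b>directory</b> contents</p>", name: "ls")
```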

Yet To Do

* Canonical directory hierarchy for man pages and sample files.
  (How to make same structure work for project and for project/samples?)

  Project (defined by NetBeans):
  - doc
  - lib/man
  - lib/rudi
  - pkg
  - spec
  - sample

  Sample ("reference implementation"--or template--for document production):
  - Templates
  - css
  - js
  - mockup (design)
  - src/dita
  - out/tmp
  - out/man
  - out/dita
  - out/web
  - out/pdf
  - out/help

* Make tidy processing optional (off by default--not needed for DITA output)
* Modify man/README.html to be more generic, less JavaSE-specific. Use canonical
  structure above.
* Move README.html to sample/README.html?
* Minimize path length in output-directory.
* Decide: Are linux_man and solaris_man both needed? Both sets of Rake tasks?
* Remove non-man-page things from Rakefile
* Remove man-page-post task and support-functions from Rakefile
* Modify makepubs to reduce the number of tasks it displays.
* Create a test suite
* Add Windows versions of processing scripts (pubsmake.bat and html2man.bat)
  so the troff can be generated there, even if it can't be displayed.
* Implement smart column-weighting heuristics, as described at

=== Sample Files


* None implemented, as yet

Yet To Do

* Canonical directory structure & Readme (see man page processing, above)
* Template-based processing
  - Use JSP includes for nav pane TOC
* Glassfish-test instructions
