= RuDI: Ruby Utilities for DITA/XML processing
These utilities take advantage of Ruby's facilities for string processing,
regular expressions, and metaprogramming (dynamic method definitions) in
order to process XML in powerful, flexible, and ultimately extensible ways.
They are targeted at DITA-based XML in particular, to improve the speed and
ease with which user-focused, task-centric, topic-oriented documentation can
be produced. But they are frequently useful for other purposes, as well.
== Overview
The utilities developed so far are:
* <b>RuDI::XML_Builder:</b> Generate readable XML in a readable way
* <b>RuDI::XML_Transform:</b> Specify readable transformations, much like XSLT,
in a way that gives you access to Ruby's power when you need it.<br>
(Requires XML_Builder)
* <b>RuDI::XML_Text:</b> A wrapper for the String class that normalizes whitespace
and does word wrapping for XML text and tags. (It also has a wrap function
for table cells.) Arguably, this should be a processor
that opens up the String class and adds its operations. But it isn't
written that way at the moment.
* <b>RuDI::HTML_Template:</b> Merge generated HTML into a DreamWeaver template
using labels specified in the template--a process that achieves an important
"separation of concerns" between design and production, so the production
process is simpler, more efficient, and more flexible.
* <b>html2man:</b> A processing script that generates man pages from HTML
source files, accompanied by a Rake script that runs it as a build process
on multiple files.
Hopefully coming down the pike:
* <b>Production Sample:</b> A reference implementation that puts everything
together to build web docs from DITA sources--something you can use to see
how RuDI works, and that can be used as a basis for your own production
system that you can customize as needed.
* <b>Purple Link generator:</b> Self-referencing links inspired by Doug
Engelbart. Small, light-purple hash marks at the end of headings that can be
bookmarked and passed around as URLs, to take a user directly into a document.
Notes on HTML tags (for class comments):
<a name="purple_number_links" id="purple_number_links"></a> --Anchor
Title Text Goes Here --Title text
--spacer
<a href="#purple_number_links"> --Target (self)
<span class="PurpleNumber"># --Styling
</span>
</a>
* <b>Link Manager:</b> A set of scripts that keep links from breaking when you
move a DITA file, rename a directory, etc., so that a basic version control
system (VCS) begins to acquire features that are typically present only in
a high-end content management system (CMS). Has three basic parts:
- <b>Link database:</b> This could be as simple as a YAML file that maps file
paths to the links they contain. It would be versioned in the VCS, just like
any other file.
- <b>Modification commands:</b> These commands do the rename or move,
adjust links in affected files, and adjust the link database, as needed.
Added files could be detected and added to the database using a pre-commit
hook in the VCS, but deletes need to be done with a command, so the
LinkManager can tell you which files will be affected by the delete, and
give you a chance to cancel.
- <b>VCS Integration:</b> This is the tricky bit. Either VCS commands need
to invoke link-management functions (ideal, because VCS command-menus are
already defined for editors and file browsers), or else the link-management
functions need to invoke VCS commands (or both, so you can do things in
either order). But if it's only possible to do a one-way mapping from
link-management commands to VCS commands, then it's desirable to find a way
to add the link-management commands to the menus in editors and file
browsers that interact with the VCS.
* <b>Migration Tool:</b> A wrapper for the h2d migration tool in the DITA-OT.
The h2d tool works on HTML files that have only one level of headings. The wrapper
runs on HTML files that have more deeply nested headings, splitting them into
separate files, running h2d on them, and creating a map that ties them all
back together.
Implementation Note: The migration tool creates generic topics, rather than
specific information types, in order to minimize "information loss". (While
it's true that h2d saves all file content, it puts things that are illegal
for a given information type into regions designated as "to be fixed". The
DITA-OT gives no warnings when such regions are processed, so manual inspection
is needed in any case.) The conversion to generic topics is the most robust,
so it is the most likely way to ensure that the original input remains intact,
and that it will appear when documents are processed.
* <b>Conversion Tool(s):</b> To assist in the conversion of generic topics to
information types. The ideal tool will provide a checklist of files, let
the user select files to convert, specify the information type to convert
them to (concept, task, reference, or some specialization), and provide
warnings if a file won't convert cleanly. (Ideally, such a tool would be built
into an editor-environment, so an error would link to a file location, and
corrections could be made interactively.)
* <b>RuDI::XiPi:</b> ("Zippy") An XML Integrated PIpeline processor, like the XML
XProc standard, but written in Ruby, so that you're not limited to XSL
transforms. A production chain might consist of an XML_Transform to convert
standard DITA output into the form you want to use, a Purple_Link generator
to add self-referencing section-links to the document, and a simple one-line
transform like this one:
gsub!(%r(href="local_root_path), 'href="..."')
A substitution like that replaces references to other documents with relative
links or HTTP URLs, as appropriate. (Such substitutions can of course be done
using XSL, or entity references defined in a DTD, or with the "keyref"
capability coming in DITA 1.2. But a simple global substitution is pretty darn
easy, and it doesn't require prior planning.)
* <b>RuDI::Rback:</b> A DSL-driven backup tool (like Red Auerbach!)
* <b>RuDI::Drake:</b> A DITA-aware, dependency-driven build utility based on Rake.
Rather than specifying a "task", you specify a DITA task ("dtask"). The only
prerequisites you need to specify are a map and, optionally, the ditaval file
that contains the processing arguments. The DTask establishes the dependencies
automatically by examining the map, finding the topics, and examining the
topics to find references to images and conref files. For more, see:
http://blogs.sun.com/coolstuff/resource/DITA_Builds.html
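
The dependency discovery described for RuDI::Drake might look roughly like the
sketch below. It is not Drake's actual code: the method name
+dita_dependencies+ and the in-memory +files+ hash (standing in for the file
system) are assumptions made for illustration, attributes are pulled out with
simple regular expressions rather than an XML parser, and relative paths are
not resolved.

```ruby
# Hypothetical sketch: list the files a DITA map depends on.
# `files` maps a path to its XML content (an assumption, to avoid file I/O).
def dita_dependencies(map_path, files)
  deps = []
  # Topics referenced from the map (any "#fragment" part is dropped)
  topics = files.fetch(map_path).scan(/<topicref\b[^>]*\bhref="([^"#]+)/).flatten
  topics.each do |topic|
    deps << topic
    content = files[topic]
    next if content.nil?            # skip topics we have no content for
    # Images referenced by the topic
    deps.concat(content.scan(/<image\b[^>]*\bhref="([^"]+)"/).flatten)
    # Files pulled in through conref (same-file conrefs are ignored)
    deps.concat(content.scan(/\bconref="([^"#]+)/).flatten)
  end
  deps.uniq
end

files = {
  "guide.ditamap" => '<map><topicref href="intro.dita"/></map>',
  "intro.dita"    => '<topic id="intro"><body><image href="logo.png"/></body></topic>'
}
puts dita_dependencies("guide.ditamap", files)
```

A build rule could then treat the returned list, plus the map itself, as the
prerequisites of the output.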
== Change History
* Version 9.2 (in progress)
** Added html2man processing utility
** Reversed the order of XML-builder arguments to eliminate "syntax noise", in
the form of extra braces and parens (which are needed when a data value follows
an attribute list, rather than preceding it).
* Version 9.1
** Initial release. Basic functionality implemented and tested.
** XML builder, XML transform engine, and HTML template-merge program.
== Resources
For detailed discussions of web-page semantics and the production process, see:
* __blog location TBD__
For some of the early thinking behind this project, see:
* Domain Specific "PowerTool" Languages Promote Elegance
http://blogs.sun.com/coolstuff/entry/domain_specific_languages
* DITA Production Maps -- A Proposal
http://blogs.sun.com/coolstuff/entry/dita_production_maps
* Doing DITA Builds Better
http://blogs.sun.com/coolstuff/resource/DITA_Builds.html
== ToDo
* Create a sample project that can be used as a production-template
o .env file (YAML) for location of dita-ot and RuDI processing engine
o Directory structure (Templates, ditasrc, ditaout, css, js, build, webout),
where build/ contains the processing scripts, ditaout/ gets files produced
by DITA OT, and webout/ gets the final results.
o Sample templates for content & tab pages, DITA src and output files
o Rake build script
Dir.foreach("/some/dir") do |entry|
...
recurse
end
* Release as version 1.0
* Post the series of introductory blogs
* Add references to them here
* Build a gem and post it to RubyGems
* Add instructions for installing the gem
== Utility Summaries
Major features for each utility are listed, along with
any remaining items in their respective implementation checklists.
=== RuDI::XML_Builder
Features
* Wrap text
* Normalize whitespace (tabs and newlines)
- Normalize newlines
- Normalize tabs
* Generate readable, well-formatted XML like this:
<html>
<body/>
</html>
* Code it in a readable way (w/minimal syntax)
like this:
html {
body {
...
}
}
* Generate tags with names that correspond to existing methods.
Precede the tag with an underscore. E.g. when attempting to generate
a <p> tag, there is a problem: "p" is the name of a Ruby debugging method.
It takes an argument and prints the result of invoking the inspect()
method on that argument. Solution: Code _p { ... }. The first leading
underscore is stripped from any tag that has one, so <p> is generated.
* Modify indentation amount or turn it off entirely to produce this:
<html><body/></html>
* Allow the insertion of arbitrary XML-generating expressions, like this:
def generate_some_XML
...
end
html {
body {
generate_some_XML
}
}
* Support XML "mixed text" model (interrupted text) like this:
<p>Some <b>test</b> text</p>
* Support for XML declarations and processing instructions
* Support for DOCTYPE declarations and comments
* Support for unformatted <pre> and CDATA text
* Word wrap text and tags
* Generate tags with namespaces (e.g. <xsl:foo>)
Usage
ns(:xsl) # Define the "xsl" namespace
xsl.foo(args) { ... } # Generate <xsl:foo ...> ... </xsl:foo>
Yet To Do
* No unimplemented features
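
For readers curious how a builder of this kind can work, here is a minimal
sketch of the general technique: +method_missing+ turns unknown method names
into tags, and one leading underscore is stripped so that _p produces <p>. The
class name +TinyBuilder+ is hypothetical, and this is not the
RuDI::XML_Builder implementation (it has no text, namespace, or wrapping
support).

```ruby
# Hypothetical mini-builder illustrating the technique; not RuDI::XML_Builder.
class TinyBuilder
  def initialize(indent = "  ")
    @indent = indent      # nil turns off indentation and newlines
    @depth  = 0
    @out    = String.new
  end

  # Evaluate the block in the builder's context and return the XML string.
  def build(&block)
    instance_eval(&block)
    @out
  end

  # Any unknown method name becomes a tag. One leading underscore is
  # stripped, so _p emits <p> instead of calling Kernel#p.
  def method_missing(name, attrs = {}, &block)
    tag = name.to_s.sub(/\A_/, "")
    attr_text = attrs.map { |key, value| %( #{key}="#{value}") }.join
    if block
      emit "<#{tag}#{attr_text}>"
      @depth += 1
      instance_eval(&block)
      @depth -= 1
      emit "</#{tag}>"
    else
      emit "<#{tag}#{attr_text}/>"
    end
  end

  private

  # Append one tag, indented to the current depth unless indentation is off.
  def emit(text)
    @out << (@indent ? "#{@indent * @depth}#{text}\n" : text)
  end
end

puts TinyBuilder.new.build { html { body { _p } } }
```

Passing +nil+ as the indent gives the single-line form shown in the feature
list above.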
=== RuDI::XML_Transform
Features
* Dynamically remove and replace element transforms
* Copy transforms
* Apply the identity transform, by default
* Dynamically change the behavior of the default transform & other element
transforms (Copy a transform under a new name and restore it later.)
* Define new transforms in a readable way, like this:
tx = RuDI::XML_Transform.new do
xfm :t1 do |node|
...operate on node to extract data...
div({:class => "name"}) do # start a div, passing an attribute
ul { # start a list
li { # create first list entry
text! "say something" # add text (could contain XML tags)
xfm_node(node) # recurse on node's children
} # close the <li> tag
} # etc.
end
end
end
* Normalize whitespace by default
* Modify indentation or remove it
tx.indent = " " # 3-space indent
tx.indent = nil # No indentation and no NL's
* Remove incoming whitespace by default, preserve it if requested
tx.preserve_ws = true # Preserve existing whitespace
# (turns off auto indent & NLs)
* Call a local function in a transform, for specialized processing, like this:
def generate_stuff(node)
...get stuff from under the node or from elsewhere...
end
tx = RuDI::XML_Transform.new do
xfm :t1 do |node|
_p { # create a paragraph
generate_stuff(node) # Do arbitrary processing
}
end
end
* Make it easy to extract and modify attributes, like this:
tx = RuDI::XML_Transform.new do
xfm :t1 do |node|
div
end
end
* Handle comments
* Handle CDATA
* Handle processing instructions
* Transform a file, an XML string or a REXML node
* Return a string or a REXML node
* Write to a file
* Convert XML CDATA to HTML <pre> (configure or override existing transform)
Yet To Do
* No unimplemented features
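
The ideas of a default identity transform, and of copying a transform under a
new name so it can be restored later, can be sketched as follows. The names
+TinyTransform+, +SketchNode+, +copy_xfm+, and +remove_xfm+ are hypothetical
and chosen for illustration. The real RuDI::XML_Transform works on REXML nodes
and its method names may differ.

```ruby
# Hypothetical sketch of a transform registry; not RuDI::XML_Transform.
SketchNode = Struct.new(:name, :children, :text)

class TinyTransform
  def initialize
    @xfms = {}
  end

  # Register (or replace) the transform for an element name.
  def xfm(name, &block)
    @xfms[name.to_sym] = block
  end

  # Copy a transform under a new name, e.g. to save it before overriding it.
  def copy_xfm(from, to)
    @xfms[to.to_sym] = @xfms.fetch(from.to_sym)
  end

  # Remove a transform so the element falls back to the identity transform.
  def remove_xfm(name)
    @xfms.delete(name.to_sym)
  end

  # Use the element's registered transform if any, else copy it unchanged.
  def apply(node)
    handler = @xfms[node.name.to_sym]
    return handler.call(node, self) if handler
    inner = node.text.to_s + node.children.map { |child| apply(child) }.join
    "<#{node.name}>#{inner}</#{node.name}>"
  end
end

doc = SketchNode.new("topic", [SketchNode.new("title", [], "Hi")], nil)
tx = TinyTransform.new
tx.xfm(:title) { |node, _tx| "<h1>#{node.text}</h1>" }
puts tx.apply(doc)
```

To change behavior temporarily, copy the transform to a spare name, register a
replacement, and copy the saved one back when done.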
=== RuDI::XML_Text
Features
* Normalize whitespace
* Word-wrap text for table cells (cell_wrap)
* Word-wrap and indent text (wrap)
* Word-wrap and indent tags with attributes (tag_wrap)
* Reuse single instance for efficiency (set value and call wrapping function)
Yet To Do
* No unimplemented features
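
As a rough illustration of whitespace normalization followed by word wrapping
with an indent, consider the sketch below. The helper names and the +width+
and +indent+ keyword parameters are assumptions made for this example, not
the actual RuDI::XML_Text interface.

```ruby
# Hypothetical helpers; RuDI::XML_Text's real signatures may differ.

# Collapse runs of spaces, tabs, and newlines into single spaces.
def normalize_ws(text)
  text.gsub(/\s+/, " ").strip
end

# Wrap normalized text so no line exceeds `width`, prefixing each line
# with `indent`. A single word longer than the width gets its own line.
def wrap(text, width: 72, indent: "")
  lines = []
  line = indent.dup
  normalize_ws(text).split(" ").each do |word|
    # Start a new line when adding the word would exceed the width
    if line.length > indent.length && line.length + 1 + word.length > width
      lines << line
      line = indent.dup
    end
    line << " " if line.length > indent.length
    line << word
  end
  lines << line if line.length > indent.length
  lines.join("\n")
end

puts wrap("The  quick\tbrown\nfox jumps", width: 12, indent: "  ")
```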
=== RuDI::HTML_Template
Features
* Read a DreamWeaver-compatible HTML template
* Generate a file that has the declarations needed to "attach" the file to that
template in a DreamWeaver site, so that changes made to the template (in DW)
can be automatically applied to all dependent files.
* Initialize some elements one time, but reuse the template for multiple outputs.
* Complain if there are regions that haven't been filled in
* Ability to force output, even if empty regions exist
* Ability to use content given in the template
(Could default to that, but then there would be no way to check that
all regions were filled in before generating output.)
Yet To Do
* No unimplemented features
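
The region-filling behavior listed above (complain about unfilled regions,
allow forced output, fall back to the template's own content) could be
sketched like this. The method name +merge_template+ and its +force+ option
are hypothetical. The sketch assumes editable regions are delimited by
TemplateBeginEditable and TemplateEndEditable comments and written out as
InstanceBeginEditable and InstanceEndEditable comments. It does not add the
declarations that attach the output file to the template.

```ruby
# Hypothetical sketch; not RuDI::HTML_Template's actual API.
# Fill each editable region with content[name]. Unfilled regions raise an
# error unless force is true, in which case the template's own content
# for that region is kept.
def merge_template(template, content, force: false)
  region = /<!--\s*TemplateBeginEditable\s+name="([^"]+)"\s*-->(.*?)<!--\s*TemplateEndEditable\s*-->/m
  missing = []
  html = template.gsub(region) do
    name = Regexp.last_match(1)
    default = Regexp.last_match(2)
    missing << name unless content.key?(name)
    body = content.fetch(name, default)
    %(<!-- InstanceBeginEditable name="#{name}" -->#{body}<!-- InstanceEndEditable -->)
  end
  unless missing.empty? || force
    raise ArgumentError, "Unfilled regions: #{missing.join(', ')}"
  end
  html
end

template = '<body><!-- TemplateBeginEditable name="main" -->Default<!-- TemplateEndEditable --></body>'
puts merge_template(template, { "main" => "<p>Generated</p>" })
```

Checking for unfilled regions before returning keeps a forgotten region from
silently shipping with placeholder text.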
=== Man Page Processing
Features
* Html2Man processor that converts HTML pages to (nroff) man pages, bypassing
known issues with DITA OT processing, as described at
http://kenai.com/projects/rudi/pages/ManPageProcessing
* Rake script that manages processing for multiple files
* Convenience scripts (html2man, makeman, solaris_man, linux_man, remake, zap)
Yet To Do
* Canonical directory hierarchy for man pages and sample files.
(How to make same structure work for project and for project/samples?)
Project (defined by NetBeans):
- doc
- lib/man
- lib/rudi
- pkg
- spec
- sample
Sample ("reference implementation"--or template--for a document production
system):
- Templates
- css
- js
- mockup (design)
- src/dita
- out/tmp
- out/man
- out/dita
- out/web
- out/pdf
- out/help
* Make tidy processing optional (off by default--not needed for DITA output)
* Modify man/README.html to be more generic, less JavaSE-specific. Use canonical
structure above.
* Move README.html to sample/README.html?
* Minimize path length in output-directory.
* Decide: Are linux_man and solaris_man both needed? Both sets of Rake tasks?
* Remove non-man-page things from Rakefile
* Remove man-page-post task and support-functions from Rakefile
* Modify makepubs to reduce the number of tasks it displays.
--------------------------------------
* Create a test suite
* Add Windows versions of processing scripts (pubsmake.bat and html2man.bat)
so the troff can be generated there, even if it can't be displayed.
* Implement smart column-weighting heuristics, as described at
http://kenai.com/projects/rudi/pages/ManPageProcessing
=== Sample Files
Features
* None implemented, as yet
Yet To Do
* Canonical directory structure & Readme (see man page processing, above)
* Template-based processing
- Use JSP includes for nav pane TOC
* Glassfish-test instructions