# subversion/lib/rudi/xml_builder.rb
require 'rudi/xml_text'
module RuDI
# = XML_Builder: Generate Readable XML in a Readable Way
#
# A readable way to generate XML, using nested method calls to govern
# indentation, the opening tag, and the closing tag.
#
# The mechanism was described here: http://beust.com/weblog/archives/000025.html
# Found some code here: http://snippets.dzone.com/posts/show/2528
#
# That code was a good basis, because it provided the method_missing function,
# along with the critical notion of using instance_eval instead of yield to
# process a block. (See method_missing for a deeper explanation.)
#
# It also provided the necessary structural foundation (create list of triples,
# something like RDF, to encode the structure, and then process it to generate
# output.) So it was a terrific starting point. But quite a few modifications
# were needed to make it work well. (It took a good deal of pride in being fast,
# but much of its performance profile derived from the minimal implementation.)
#
# Update (Mar 2009)
# * Discovered that Ruby automatically combines trailing hash elements in a
# parameter list into a single hash table passed as an argument. So it
# turns out that if you pass the data value first (instead of last, as I
# had originally implemented it) then braces aren't needed to define the
# hash table--and, since the braces aren't needed, parentheses are no longer
# needed around the arguments. That simplifies the usage syntax tremendously,
# so I changed the order of parameters. The old pattern followed the
# order of HTML but had a lot of syntax noise:
# <a href="foo">bar</a> --> @xml.a({:href=>"foo"}, "bar")
# The new pattern reverses the order, but is much cleaner:
# <a href="foo">bar</a> --> @xml.a "bar", :href=>"foo", :id=>"bax",...
#
# Modifications (Jan 2009):
# * RSpec tests added to demonstrate behavior and test new features.
# * Comments added to explain how things work.
# * Ability to create a new instance without providing a block
# (needed for RSpec testing)
# * Provide a text! method to include text in the output
# Note: It allows for interrupted text: <p>Some <b>test</b> text</p>
# produced with code like this:
# text! "Some "; b { text! "test" }; text! " text"
# * Bug fix: Allow an element to be invoked without a block
# * Bug fix: The instruct! method only generated XML declarations.
# Modified it to allow generation of true processing instructions.
# * Singleton tag constructed if a block isn't supplied (<foo/>)
# * Special case: _p for "p" (a Ruby debug method)
# (Removed any initial underscore, in case other tag names have problems.)
# * Add indentations and newlines
# - Modify the @indent string to change the indentation amount
# - Default is a two-space string (" ")
# - Set to nil to turn off indentation
# * Modify nesting for @inline elements
# (Change the @inline list to control which elements are affected)
# - Don't nest @inline tags any deeper
# - Only start a new line after text has been output
# * Allow for unformatted output with pre! and cdata!
# * doctype! directive to add a declaration like this:
# <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
# "http://www.w3.org/TR/html4/loose.dtd">
# or, by default, the more common:
# <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
# * Tags that go beyond 80 chars are wrapped
# (only wrap between attributes, never within them)
# * Text is word wrapped to 80 characters, by default, with a 2-space indent
# - can change indentation amount or turn it off entirely
# - PRE & CDATA text is not wrapped
# * Handles namespaces
# * Converts XML CDATA to HTML <pre>, when requested
# * TODO: Rename variables @xml, @indent, etc, to prevent overload when
# the module is mixed in elsewhere. (Consider converting it back to
# a class, and having Transform objects delegate to it. That is
# probably a cleaner design.)
#
# Usage:
# class EnhancedBuilder
# include XML_Builder
# def configure
# # Specify class-level options here
# @indent = nil
# end
#
# # Additional build methods here
# def add_note
# blockquote do
# text! "<b>Note:</b><br>"
# end
# end
# end
#
# # Create an instance of the builder
# @xml = EnhancedBuilder.new
# @xml.indent = nil # Do instance-level configuration here
# @xml = EnhancedBuilder.new do
# html {
# head { # <head>
# title "Page Title" # <title>Page Title</title>
# } # </head>
# body {
# img(:src => "path") # <img src="path"/>
# _p {
# text! "Some "; # <p>Some <b>bold</b> text</p>
# b { text! "bold" }
# text! " text!"
# }
# a("link", :id=>"foo", :href=>"bar") # <a id="foo" href="bar">link</a>
# div(:class => "name") do # <div class="name"> ...
# ...
# end # Use {...} or do...end
# }
# }
# end
#
# Notes:
# * It turns out that attributes can be passed as a hash: {:arg1 => "value1", ...}
# or as a nested list: [["arg1", "value1"], ...]
#
# * The hash syntax is generally cleaner, and preferred for readability, even
# though parentheses are required in some cases.
#
# * XML_Builder started out as a class, but as a class, there were
# two conflicting goals when evaluating a block like:
# @xml.html do
# body { # Goal #1
# do_some_stuff() # Goal #2
# }
# end
# Goal #1: Evaluate the method name in the XML_Builder context.
# That way, you can write "body", which is cleaner, instead of
# having to write @xml.body. (Other systems do that, though,
# and it may turn out to be the best idea in the long run. But
# there's a lot to be said for the readability of the first form.)
# Goal #2: Evaluate the method name in the context of the calling object.
# That's the only way to take advantage of Ruby's power to do
# "interesting stuff" in the calling class.
# It seemed that the best way to achieve both goals was to have a single
# context. Implementing XML_Builder as a module made that possible.
#
# * An alternative implementation is suggested by Jim Weirich's notes on the
# builder gem: http://onestepback.org/index.cgi/Tech/Ruby/BuilderObjects.rdoc
# That implementation sets an @self variable, so that both contexts can be
# accessed--so calling into the other context means you code @self.foo, rather
# than foo(). That's a dual-edged sword though. On the one hand, it's always
# clear where you expect to find the method you're looking for--and you don't
# have to worry about namespace collisions. On the other hand, that syntax
# takes away from the simplicity of the XSL-style templates we want to create
# in XML_Transform objects. So at the moment, I've opted for the mix-in
# implementation.
#
# * Another alternative is the standard Ruby builder. Here are some links
# that discuss it:
#
# - http://www.xml.com/pub/a/2006/01/04/creating-xml-with-ruby-and-builder.html
# - http://ruby.about.com/od/gems/a/builder.htm
# - http://builder.rubyforge.org/classes/Builder.html
#
# Those are good pages, and worth looking at. (In particular, there is a
# CSS-generator that is worth borrowing.) But the one thing I don't like
# about the standard builder is that you always have to specify it when
# generating a tag. So you code xm.p{} and xm.ul(), for example, instead
# of simply coding p{}, and ul{}. That consideration was important for
# the readability of XML transforms. I also added word wrapping, singleton-tag
# detection, and several other features. (So yes, I'm pretty proud of it.)
#--
# Author: Eric Armstrong
#
module XML_Builder
# Blank slate? (This was original code from when the builder was a class.
# Doesn't seem to be needed for a module. Not clear it did much for the class,
# although it didn't do any harm.)
#instance_methods.each { |m| undef_method m unless (m =~ /^__|instance_eval$/)}
# Show results:
# instance_methods.each { |m| puts m.to_s }
# Here's what was left when this module was a class:
# __send__
# __id__
# instance_eval
#####################################################################
# To be visible, all methods and accessors in the class have to
# be defined /after/ instance methods are erased.
#####################################################################
# Planning on an 80-col page for now. May want to make it an accessor
# in future.
LINE_WIDTH = 80
# Amount to indent results. Set to nil to turn off formatting.
# Defaults to a two-space string (" ")
attr_accessor :indent
# List of symbols for @inline tags (not indented, don't start on a new line)
# Defaults to [:b, :i, :u, :tt, :strong, :em]
attr_accessor :inline
# Each item in @doc is an array containing three values:
# type -- :open/:close/:singleton (tag), :text, :instruct -- part[0]
# value -- tag name -- part[1]
# attributes -- a list of name/value pairs: [n1, v1, n2, v2,...] -- part[2]
# Make it a reader so we can peer inside for debugging
attr_reader :doc
# Last (open or closing) tag we processed.
# Participates in decision to return a NL.
@last_tag
# Type of last tag we processed (:open or :close).
# Participates in decision to return a NL.
@last_tag_type
# The RuDI XML_Text object that does line wrapping
@text
# Stores the index of the next character position available on the
# line. Updated when a tag is generated, used when text is generated
# using RuDI::indent_wrap. (Inspectable, for testing)
@line_index
# A pointer to the instance that includes the XML_Builder module.
# Needed by the Namespace_Handler class to invoke XML_Builder methods
# on that instance.
@curr_instance
def initialize(&block)
@indent = " " # Two space indentation, by default
@inline = [:b, :i, :u, :tt, :strong, :em]
@level = 0
@last_tag = ""
@last_tag_type = "" # Same variable that _fmt_ws reads and updates
@doc = []
@text = RuDI::XML_Text.new
@line_index = 0
@curr_instance = self
# Delegate back to the wrapper class for further initialization.
# Used to configure subclass instances.
#
# Originally motivated by the Transform class, which needs to define default
# element-transforms when the class is initialized. (But initialize()
# is defined in this module, so it had no way to do so, unless this
# is a class that Transform delegates to--an alternative implementation
# possibility that deserves further exploration.)
#
configure(&block) if defined? configure # Class-level options, if fcn defined
# This is included for transforms. It's how element transforms get added
# to a (document) Transform instance.
#
# Note: Code was: instance_eval(&block) if block_given?
# instance_eval was necessary when this was a class, because method
# existence is determined by this context, while variable values
# are determined by the calling context. So variable substitutions happen
# appropriately, but "@xml." wasn't needed on every function call.
# (But yield works just as well, now that XML_Builder is a mixin module.)
#
#yield if block_given?
end
# A method is expected to provide either a block or a value, but not both.
# (If a value is provided, it comes before the attribute list.)
# The attribute list consists of a list of pairs, best specified in a hash.
# Ex: @xml.a "foo", :href => "bar", ...
# Ex: @xml.a(:href => "bar", ...) { ...code block... }
def method_missing(tag, *args, &block)
generate_tag(tag, *args, &block)
end
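=begin
The calling convention described above depends on Ruby gathering trailing
key/value pairs into one Hash before method_missing sees them. The sketch
below illustrates only that argument handling. TinyRecorder is a hypothetical
stand-in invented for this example (it is not part of RuDI), and it records
calls instead of generating XML.

```ruby
# Hypothetical stand-in for XML_Builder: records [tag, value, attributes].
class TinyRecorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Every unknown method name is treated as a tag name.
  def method_missing(tag, *args, &block)
    # Brace-less trailing pairs arrive as a single Hash at the end of args.
    attributes = args.last.is_a?(Hash) ? args.pop : nil
    @calls << [tag, args.first, attributes]
    self
  end
end

rec = TinyRecorder.new
rec.a "bar", :href => "foo", :id => "bax"   # value first, attributes last
rec.br                                      # no value, no attributes
```

After these two calls, rec.calls holds one entry per tag, with the attribute
pairs collected into a single Hash for the first and nil for the second.
=end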
# Called by #method_missing to generate a tag element that corresponds to
# the method name.
#
# Note that "p" is a special case. It's a Ruby instruction defined in the
# Kernel that prints the value returned by an object's inspect() method.
# (See http://www.ruby-doc.org/docs/UsersGuide/rg/accessors.html)
#
# Similarly, "y" is a method that shows the YAML representation of an object.
#
# To solve such problems, an initial underscore is removed from the method
# name, if present.
#
# So:
# * To get <p> or <y>, code _p or _y
# * To get an initial underscore (if you happen to need one), code two.
def generate_tag(tag, *args, &block)
# Special case to create an element whose name is specified in a variable.
# (Needed for the identity transform in XML_Transform. Possibly useful for
# other purposes, as well.)
# Ex: If the variable foo holds the value "bar", then
# @xml.foo # => <foo>
# @xml.send(foo) # => <bar>
# Notes:
# * Not sure why, but "send" comes into method_missing. Trying to implement
# that method directly runs into an error when the method tests for
# respond_to?, because that also goes to method_missing. So I put the
# special case here, until I find out why standard methods aren't seen.
if tag == :send
name = args[0] # desired tag name
args.shift # remove first arg
return method_missing(name.to_sym, *args, &block) # splat, so args stay separate
# Note that we convert the block back to a proc, so method_missing
# gets what it expects. (&block, instead of block)
end
# Remove initial underscore
tag_string = tag.to_s
if tag_string[0,1] == "_" # Test first character
tag_string[0,1] = "" # Strip it
end
tag_string.downcase! # for output consistency
tag = tag_string.to_sym
# 6 possible inputs, from 2 optional args and an optional block
# Format: [value],[attribute_hash],[block]
# value = block_given? ? nil : args.pop
value = args.pop
if value.is_a?(Hash) || value.is_a?(Array)
# Pop takes values from the right. If the "value" is a list or hash, it
# must be an attribute list.
#
# Notes:
# * A Ruby "array" is really a list
# * Nested lists can be passed: [["arg1","value1"], ...]
# or a hash table: { :arg1 => "value1", ... }
# * In general, the hash table syntax is much cleaner.
attributes = value
value = args.pop
end # if
if value
# Tag with a single data value: <foo>value</foo>
@doc << [:open, tag, attributes] << [:text, value] << [:close, tag]
else
if block_given?
@doc << [:open, tag, attributes]
# Note: Using instance_eval instead of yield means that the block
# is evaluated in the current context, rather than the context it
# came from. The big benefit: You get to code "a{}" instead of "xml.a{}".
# However, while method evaluation takes place in this context,
# string interpolations and other variable references take place in
# the calling context. So "li listItem" works as desired, producing
# an <li> tag that contains the contents of listItem. (There may be
# no difference now that XML_Builder is a mixin module, but it was
# critical when it was a class. Leaving as-is, because it works.)
instance_eval(&block)
@doc << [:close, tag]
else
# No block and no data value. Generate a singleton tag: <foo/>
@doc << [:singleton, tag, attributes]
end
end
end # generate_tag
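=begin
The note inside generate_tag about instance_eval versus yield can be checked
in isolation. This is a minimal sketch, assuming two hypothetical classes
(ContextProbe and Caller) that exist only for this illustration: method names
in the block resolve against the receiver under instance_eval, while local
variables still come from the place where the block was written.

```ruby
# Hypothetical receiver that evaluates a block in two different ways.
class ContextProbe
  def tag_name
    "from builder"
  end

  # self is rebound to this object while the block runs.
  def with_instance_eval(&block)
    instance_eval(&block)
  end

  # The block keeps the self of the code that created it.
  def with_yield
    yield
  end
end

# Hypothetical caller that defines a method with the same name.
class Caller
  def tag_name
    "from caller"
  end

  def run(probe)
    item = "list item" # local variable, visible inside both blocks
    [probe.with_instance_eval { "#{tag_name}: #{item}" },
     probe.with_yield { "#{tag_name}: #{item}" }]
  end
end

via_instance_eval, via_yield = Caller.new.run(ContextProbe.new)
```

Here via_instance_eval is built from ContextProbe#tag_name and via_yield from
Caller#tag_name, while both strings pick up the same local variable.
=end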
# Set the line width to control wrapping.
def line_width= v
@text.line_width = v
end
# Add text to the document
def text!(text)
@doc << [:text, text]
end
alias_method :TEXT!, :text! # Uppercase alias for calling-code readability
# Add <pre>-formatted text to the document
def pre!(text)
@doc << [:open, "pre"] << [:verbatim, text] << [:close_pre, "pre"]
end
alias_method :PRE!, :pre! # Uppercase alias for calling-code readability
# Generate a CDATA section (unindented):
# <![CDATA[
# This is
# CDATA text.
# ]]>
def cdata!(text)
@doc << [:open_cdata] << [:verbatim, text] << [:close_cdata]
end
alias_method :CDATA!, :cdata! # Uppercase alias for calling-code readability
# This method generates a declaration by default, as though it were a
# "processing instruction". Of course, a declaration isn't a PI, but
# since there is no difference in syntax, it's only programs that process
# PIs that care about the difference. (Parsers quite rightly do not return
# declarations as PIs.)
def instruct!(target = :xml, atts = nil)
@doc << [:instruct, target, atts || {:version => "1.0", :encoding => "UTF-8"}]
end
# Add a doctype declaration. Default to HTML transitional:
# <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
# Args are a list, where the first two are labels. The remainder are strings.
def doctype!(args = [:HTML, :PUBLIC, "-//W3C//DTD HTML 4.01 Transitional//EN"])
@doc << [:doctype, args]
end
# Add a comment
def comment!(text)
@doc << [:comment, text]
end
# Return a string containing generated XML
def to_s
# Make multiple printings efficient.
return @__to_s if @__to_s != nil
# Original code sets a variable, but only if it isn't already defined.
# @__to_s ||= @doc.map{|i| _fmt_part i}.join
# The variable was cleared in method_missing. Not clear why it did that.
result = @doc.map{|e| _fmt_part e}.join
if (result != "" && indent != nil && result[0,1] == "\n") # [0,1] is a String in every Ruby version
# Remove initial NL from string and tack one on at the end
result[0,1] = ""
result += "\n"
end
return @__to_s = result
end
alias_method :to_xml, :to_s
alias_method :inspect, :to_s
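=begin
The @doc list that to_s walks is a flat sequence of [type, value, attributes]
entries. The sketch below shows that idea on its own, assuming a hypothetical
TripleDemo module written for this example. It leaves out the indentation,
wrapping, and newline handling that _fmt_part performs.

```ruby
# Hypothetical, simplified renderer for a list of [type, value, attributes].
module TripleDemo
  def self.render(doc)
    doc.map do |type, value, atts|
      case type
      when :open, :singleton
        attr_text = (atts || {}).map { |k, v| " #{k}=\"#{v}\"" }.join
        closer = type == :singleton ? "/>" : ">"
        "<#{value}#{attr_text}#{closer}"
      when :close
        "</#{value}>"
      when :text
        value.to_s
      else
        raise ArgumentError, "Unexpected structure element: #{type}"
      end
    end.join
  end
end

doc = [
  [:open, :a, {:href => "bar"}],
  [:text, "link"],
  [:close, :a],
  [:singleton, :br, nil]
]
xml = TripleDemo.render(doc)
```

Because the structure is a plain list, new entry types can be added by
appending another branch to the case statement.
=end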
# Return a REXML element node that contains the generated XML structure.
def to_node
# This implementation is slow, but it isn't expected to be used a lot.
# A much faster implementation arises if builder is converted to use
# REXML as a base, instead of the list of ordered triples.
# (But that was a nice easy structure to extend and modify, making it
# easier to develop the usage syntax--the really important point of
# this exercise.)
require 'rexml/document'
doc = REXML::Document.new self.to_s
return doc.root
end
# Write to a file.
def to_file filename
# The block form closes the file even if the write raises.
File.open(filename, "w") { |f| f.write(self.to_s) }
end
def _fmt_part(part) # :NODOC:
type = part[0]
case type
when :instruct
# The "tag" (part[1]) is the processing-target name in the processing
# instruction. ("xml" and all case variants are reserved for the declaration.)
result = _fmt_ws(:instruct, type) + "<?#{part[1]} #{_fmt_atts(part[2])}?>"
@line_index = 0
result
when :doctype
# part[1] holds the doctype arguments: two labels (e.g. HTML, PUBLIC)
# followed by the strings to quote. Destructure without shifting, so
# the entry stored in @doc is left intact.
label1, label2, *values = part[1]
strings = values.map {|v| "\"#{v}\""}.join(" ") # Enclose each in quotes
result = "<!DOCTYPE #{label1} #{label2} #{strings}>"
@line_index = 0
return result
when :open
tag = part[1]
result = _fmt_ws(tag, type) + _fmt_tag(part) # "<...>"
@level += 1 if ! @inline.member?(part[1]) # Subsequent structs indent
return result
when :close
tag = part[1]
@level -= 1 if ! @inline.member?(part[1]) # Outdent for the close tag
result = _fmt_ws(tag, type) + "</#{part[1]}>"
@line_index = result.length
return result
when :close_pre
# No indentation before </pre>
tag = part[1]
@level -= 1 if ! @inline.member?(part[1]) # Outdent for the close tag
_fmt_ws(tag, type) # Record the tag, in case it's useful
result = "\n</#{part[1]}>" # Start tag on a new line
@line_index = 0
return result
when :text
text = part[1].to_s
# Remove final NL, if present (added back during output if indent != nil)
text.chomp! if @indent != nil
result = _fmt_ws(:text, type) + _fmt_text(text)
return result
when :verbatim
# Remove final NL, if present (added back during output if indent != nil)
result = part[1].to_s
result.chomp! if @indent != nil
return result
when :singleton
tag = part[1]
result = _fmt_ws(tag, type) + _fmt_tag(part, true) # "<.../>"
return result
when :comment
result = _fmt_ws(:comment, type) + "<!--#{part[1]}-->"
@line_index = 0
return result
when :open_cdata
_fmt_ws(:open_cdata, type) # Log the tag only (TODO: factor out log_tag)
result = "\n<![CDATA[\n"
return result
when :close_cdata # Log the tag
# Important to log the tag, so we don't generate an extra NL before
# the next element. (But we can't simply remove the final NL in this
# return string, in case formatting has been turned off, in general.)
_fmt_ws(:close_cdata, type) # Log the tag only (TODO: refactor)
result = "\n]]>\n"
@line_index = 0
return result
else
fail "Unexpected structure element: #{type}"
end
end
def _fmt_text(text) # :NODOC:
return text if @indent == nil or @indent == ""
# Use the internal RuDI::XML_Text instance to wrap the tag.
# (Text continuations indent at the incremented level set by the open tag.)
@text.value = text
continuation_indent = @indent * @level
if @line_index > LINE_WIDTH
# A tag with a long, unbreakable attribute went past the page boundary
# Start text on a new line
# Note: May fail if first text segment also extends past page boundary.
start_index = @indent.size * @level
@line_index = @text.wrap(start_index, continuation_indent)
return "\n"+continuation_indent+@text.to_s
end
@line_index = @text.wrap(@line_index, continuation_indent)
return @text.to_s
end
def _fmt_tag(part, singleton = false) # :NODOC:
tag = "<#{_tag_plus_atts(part)}" + (singleton ? "/>" : ">")
return tag if @indent == nil or @indent == ""
# Use the internal RuDI::XML_Text instance to wrap the tag
# (Tag continuations indent one level more.)
@text.value = tag
continuation_indent = @indent * (@level + 1)
if @line_index > LINE_WIDTH
# A long string of unbreakable text went past the page boundary.
# Start the tag on a new line.
# Note: May fail if tag name also extends past page boundary.
start_index = @indent.size * (@level - 1)
@line_index = @text.tag_wrap(start_index, continuation_indent)
return "\n"+continuation_indent+@text.to_s
end
@line_index = @text.tag_wrap(@line_index, continuation_indent)
return @text.to_s
end
def _tag_plus_atts(part) # :NODOC:
result = part[2] ? "#{part[1]} #{_fmt_atts(part[2])}" : "#{part[1]}"
return result
end
def _fmt_atts(atts) # :NODOC:
# Original code: to_s isn't needed, attr quotes missing
# "Inspect" seems clever. It allows the second attribute to be an object
# But it's not clear that's useful, so it's removed for now.
#atts.inject([]) {|m, i| m << "#{i[0]}=#{i[1].to_s.inspect}"}.join(' ')
result = atts.inject([]) {|m, i| m << "#{i[0]}=\"#{i[1]}\""}.join(' ')
end
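=begin
The usage notes say attributes may be given as a hash or as a nested list.
Both work because iterating either one yields name/value pairs. A small
sketch, assuming a hypothetical AttributeDemo.format helper that mirrors the
quoting done in _fmt_atts (values are not escaped here either):

```ruby
# Hypothetical helper: formats name/value pairs as name="value" strings.
module AttributeDemo
  def self.format(atts)
    # Each element is a [name, value] pair, whether atts is a Hash or a
    # nested Array, so the block can destructure both the same way.
    atts.map { |name, value| "#{name}=\"#{value}\"" }.join(" ")
  end
end

from_hash = AttributeDemo.format({:href => "bar", :id => "foo"})
from_list = AttributeDemo.format([["href", "bar"], ["id", "foo"]])
```

With insertion-ordered hashes (Ruby 1.9 and later), both calls produce the
same string. On Ruby 1.8 the hash form does not guarantee attribute order,
which is one reason to use the nested list when order matters.
=end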
# Original code. Nice and compact. But not good for NL's
#def _fmt_ws
# (indent == nil) ? "" : nl(tag) + (indent * @level)
#end
# Format whitespace. Holds newline & indentation string, if any
# NL before structure tags (tags other than text or @inline tags).
# NL before text after a closing tag, otherwise none.
# NL before @inline tags, unless last tag was an @inline tag
# Note:
# First @inline tag /does/ start on a new line. Otherwise,
# word wrapping becomes impossibly difficult (if and when
# we get to it.)
# where:
# tag = tag name (html, body, etc.)
# type = tag type (:open, :close, etc.)
def _fmt_ws(tag, type) # :NODOC:
return "" if (indent == nil)
indentation = (indent * @level)
last_tag = @last_tag; @last_tag = tag
last_type = @last_tag_type; @last_tag_type = type
ws = "\n" + indentation
if @line_index >= LINE_WIDTH
@line_index = indentation.length
return ws
end
ws = indentation if last_type == :close_cdata # initial NL from _fmt_part
if tag == :text
result = last_type == :close ? ws : ""
elsif @inline.member?(tag)
# Pile up sequences of @inline tags like <b><i><a ...
result = last_tag == :text ? ws : ""
else
# Structure tag or processing instruction
result = ! @inline.member?(last_tag) ? ws : ""
end
# If generated ws is "", line_index is unchanged.
# Otherwise, ws is "\n"+indent string. Adjust the line_index, subtracting
# one to correct for the NL.
@line_index = result.length - 1 if result.length != 0
return result
end
# Define an XML namespace.
# Usage:
# ns(:xsl) # => Method named "xsl" that returns an object
# xsl.foo ... # => <xsl:foo> ... </xsl:foo>
def ns(namespace_sym) # :xsl input arg
ns_var_name = "@ns_#{namespace_sym.to_s}" # => @ns_xsl
ns_var_sym = ns_var_name.to_sym # => :@ns_xsl
if ! instance_variable_defined? ns_var_sym
instance_variable_set(
ns_var_sym,
Namespace_Handler.new(namespace_sym, self)
)
end
code = <<-code
def #{namespace_sym}
return #{ns_var_name}
end
code
instance_eval code
end
# An instance of this object is returned by methods defined using #ns.
# Methods can then be invoked on the handler object to create element tags
# in the namespace.
#--
# Implementation Note:
# --Could also be made to work by extending the XML_Builder class and
# invoking super.generate_tag(ns_tagname, ...)
class Namespace_Handler # :NODOC:all
# Keep a pointer to the Builder instance that generated us, so we
# can call its methods.
def initialize _namespace_sym, _defining_instance
@namespace = _namespace_sym
@defining_instance = _defining_instance
end
def method_missing(tag, *args, &block)
ns_tagname = "#{@namespace}:#{tag}"
@defining_instance.generate_tag(ns_tagname, *args, &block)
end
end # Namespace_Handler
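=begin
The namespace mechanism is a small delegation pattern: ns defines an accessor
that returns a handler, and the handler prefixes each tag name before handing
it back to the builder. The sketch below shows that flow in isolation. TagLog
and PrefixHandler are hypothetical names for this example, and it uses
define_singleton_method in place of the string instance_eval used in ns.

```ruby
# Hypothetical handler: prefixes tag names and forwards them to its target.
class PrefixHandler
  def initialize(prefix, target)
    @prefix = prefix
    @target = target
  end

  def method_missing(tag, *args, &block)
    @target.generate_tag("#{@prefix}:#{tag}", *args, &block)
  end
end

# Hypothetical builder stand-in that only logs the tag names it receives.
class TagLog
  attr_reader :tags

  def initialize
    @tags = []
  end

  def generate_tag(tag, *args)
    @tags << tag.to_s
  end

  # Defines a method named after the prefix that returns its handler.
  def ns(prefix)
    handler = PrefixHandler.new(prefix, self)
    define_singleton_method(prefix) { handler }
  end
end

log = TagLog.new
log.ns(:xsl)
log.xsl.template
log.xsl.foo "x"
```

The accessor is defined on the one instance only, so other TagLog objects do
not gain an xsl method unless they call ns themselves.
=end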
end # XML_Builder
end # RuDI module





