rst2html5 Design Notes
These notes describe the knowledge gathered while implementing rst2html5. They are probably incomplete and may be inexact in places, but they might help other people who want to write another rst converter.
Note
rst2html5 had to be renamed to rst2html5_ due to a conflict with docutils’ rst2html5.
Docutils
Docutils is a set of tools for processing plaintext documentation in reStructuredText markup (rst) into other formats such as HTML, PDF and LaTeX. Its design issues and implementation details are described in PEP 258 at http://docutils.sourceforge.net/docs/peps/pep-0258.html
In the early stages of the translation process, the rst document is analyzed and transformed into an intermediary format called doctree which is then passed to a translator to be transformed into the desired formatted output:
                      Translator
                 +-------------------+
                 |    +---------+    |
---> doctree -------->|  Writer |-------> output
                 |    +----+----+    |
                 |         |         |
                 |         |         |
                 |  +------+------+  |
                 |  | NodeVisitor |  |
                 |  +-------------+  |
                 +-------------------+
Doctree
The doctree is a hierarchical structure of the elements of an rst document. It is defined in docutils.nodes and is used internally by Docutils components.
The command rst2pseudoxml.py produces a textual representation of a doctree that is very useful for visualizing the nesting of the elements of an rst document. This information was of great help for both the design and the tests of rst2html5.
Given the following rst snippet:
Title
=====

Text and more text
The textual representation produced by rst2pseudoxml.py is:
<document ids="title" names="title" source="snippet.rst" title="Title">
    <title>
        Title
    <paragraph>
        Text and more text
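As a rough illustration of the nesting that the pseudo-XML output makes visible, the sketch below builds the same three-node tree with a toy class and prints it with matching indentation. ToyNode and its pformat method are assumptions made for this example; they only imitate the much richer classes in docutils.nodes.

```python
class ToyNode:
    """Toy tree node (hypothetical); not the docutils.nodes implementation."""

    def __init__(self, name, *children, **attributes):
        self.name = name
        self.children = children      # child nodes or plain text strings
        self.attributes = attributes

    def pformat(self, level=0):
        """Return an indented, pseudo-XML-like dump of this subtree."""
        indent = '    ' * level
        attrs = ''.join(' %s="%s"' % item for item in self.attributes.items())
        lines = ['%s<%s%s>' % (indent, self.name, attrs)]
        for child in self.children:
            if isinstance(child, str):
                # Text is a leaf: print it one level deeper than its parent.
                lines.append('    ' * (level + 1) + child)
            else:
                lines.append(child.pformat(level + 1))
        return '\n'.join(lines)


doctree = ToyNode(
    'document',
    ToyNode('title', 'Title'),
    ToyNode('paragraph', 'Text and more text'),
    ids='title', names='title', source='snippet.rst', title='Title',
)
print(doctree.pformat())
```

Printing the toy tree reproduces the five lines shown above, which makes it easy to see that title and paragraph are siblings under document.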
Translator, Writer and NodeVisitor
A translator comprises two parts: a Writer and a NodeVisitor. The Writer is responsible for preparing and coordinating the translation performed by the NodeVisitor. The NodeVisitor visits each doctree node and performs all the actions needed to translate the node into the desired format, according to the node’s type and content.
Important
To develop a new docutils translator, you need to specialize these two classes.
Note
These classes correspond to a variation of the Visitor pattern called “Extrinsic Visitor”, which is more commonly used in Python. See The “Visitor Pattern”, Revisited.
                +-------------+
                |             |
                |    Writer   |
                |  translate  |
                |             |
                +------+------+
                       |
                       |    +---------------------------+
                       |    |                           |
                       v    v                           |
                  +------------+                        |
                  |            |                        |
                  |    Node    |                        |
                  |  walkabout |                        |
                  |            |                        |
                  +--+---+---+-+                        |
                     |   |   |                          |
           +---------+   |   +----------+               |
           |             |              |               |
           v             |              v               |
  +----------------+     |    +--------------------+    |
  |                |     |    |                    |    |
  |  NodeVisitor   |     |    |    NodeVisitor     |    |
  | dispatch_visit |     |    | dispatch_departure |    |
  |                |     |    |                    |    |
  +--------+-------+     |    +---------+----------+    |
           |             |              |               |
           |             +--------------|---------------+
           |                            |
           v                            v
  +-----------------+         +------------------+
  |                 |         |                  |
  |   NodeVisitor   |         |   NodeVisitor    |
  | visit_NODE_TYPE |         | depart_NODE_TYPE |
  |                 |         |                  |
  +-----------------+         +------------------+
During the doctree traversal performed by docutils.nodes.Node.walkabout(), two NodeVisitor dispatch methods are called: dispatch_visit() and dispatch_departure(). The former is called at the start of a node’s visit. Then walkabout() is called on each child node, and finally dispatch_departure() is called. Each dispatch method calls another method whose name follows the pattern visit_NODE_TYPE or depart_NODE_TYPE, such as visit_paragraph or depart_title, which should be implemented by the NodeVisitor subclass.
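The traversal described above can be modelled in a few lines. The following is a minimal sketch, assuming toy ToyNode and TracingVisitor classes that are not part of docutils; the real walkabout() and dispatch methods do more than this, but the order of calls is the same idea: visit, recurse into the children, then depart.

```python
class ToyNode:
    """Minimal stand-in for docutils.nodes.Node (assumption, not the real class)."""

    def __init__(self, *children):
        self.children = children

    def walkabout(self, visitor):
        visitor.dispatch_visit(self)          # before the children
        for child in self.children:
            child.walkabout(visitor)          # recurse into each child
        visitor.dispatch_departure(self)      # after the children


# The class name doubles as NODE_TYPE in visit_NODE_TYPE / depart_NODE_TYPE.
class document(ToyNode): pass
class title(ToyNode): pass
class paragraph(ToyNode): pass
class Text(ToyNode): pass


class TracingVisitor:
    """Records every handler name and calls the handler if it is defined."""

    def __init__(self):
        self.calls = []

    def dispatch_visit(self, node):
        self._dispatch('visit_' + type(node).__name__, node)

    def dispatch_departure(self, node):
        self._dispatch('depart_' + type(node).__name__, node)

    def _dispatch(self, method_name, node):
        self.calls.append(method_name)
        handler = getattr(self, method_name, None)
        if handler is not None:
            handler(node)


visitor = TracingVisitor()
document(title(Text()), paragraph(Text())).walkabout(visitor)
print(visitor.calls)
```

The recorded names come out in nested order: every visit_ call for a node is matched by its depart_ call only after all of its children have been processed.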
rst2html5
In rst2html5_, Writer and NodeVisitor are specialized by the HTML5Writer and HTML5Translator classes. rst2html5_.HTML5Translator is a NodeVisitor subclass that implements all the visit_NODE_TYPE and depart_NODE_TYPE methods needed to translate a doctree into HTML5 content. HTML5Translator uses an object of the ElemStack helper class, which maintains a context stack to handle indentation and the nesting of the doctree traversal:
                  rst2html5_
           +-----------------------+
           |    +-------------+    |
doctree ---|--->| HTML5Writer |----|--> HTML5
           |    +------+------+    |
           |           |           |
           |           |           |
           |  +--------+--------+  |
           |  | HTML5Translator |  |
           |  +--------+--------+  |
           |           |           |
           |           |           |
           |     +-----+-----+     |
           |     | ElemStack |     |
           |     +-----------+     |
           +-----------------------+
The standard visit_NODE_TYPE action initiates a new node context:
def default_visit(self, node):
    '''
    Initiate a new context to store inner HTML5 elements.
    '''
    if 'ids' in node and self.once_attr('expand_id_to_anchor', default=True):
        # create an anchor <a id=id></a> on top of the current element
        # for each id found.
        for id in node['ids'][1:]:
            self.context.begin_elem()
            self.context.commit_elem(tag.a(id=id))
        node.attributes['ids'] = node.attributes['ids'][0:1]
    self.context.begin_elem()
    return
The standard depart_NODE_TYPE action creates the HTML5 element according to the saved context:
def default_departure(self, node):
    '''
    Create the node's corresponding HTML5 element and combine it with its
    stored context.
    '''
    tag_name, indent, attributes = self.parse(node)
    elem = getattr(tag, tag_name)(**attributes)
    self.context.commit_elem(elem, indent)
    return
Not all rst elements follow this procedure. The Text element, for example, is a leaf node and thus doesn’t need a specific context. Other elements share common processing and can use the same visit_ and/or depart_ method. To take advantage of these similarities, the rst_terms dict maps each node type to a pair of visit_ and depart_ methods:
rst_terms = {
# 'term': ('tag', 'visit_func', 'depart_func', use_term_in_class,
#          indent_elem)
# use_term_in_class and indent_elem are optional.
# If omitted, they default to False and True, respectively.
'Text': (None, 'visit_Text', None),
'abbreviation': ('abbr', dv, dp),
'acronym': (None, dv, dp),
'address': (None, 'visit_address', None),
'admonition': ('aside', 'visit_aside', 'depart_aside', True),
'attention': ('aside', 'visit_aside', 'depart_aside', True),
'attribution': ('p', dv, dp, True),
'author': (None, 'visit_bibliographic_field', None),
'authors': (None, 'visit_authors', None),
'block_quote': ('blockquote', 'visit_blockquote', dp),
'bullet_list': ('ul', dv, dp, False),
'caption': ('figcaption', dv, dp, False),
'caution': ('aside', 'visit_aside', 'depart_aside', True),
'citation': (None, 'visit_citation', 'depart_citation', True),
'citation_reference': ('a', 'visit_citation_reference',
'depart_reference', True, False),
'classifier': (None, 'visit_classifier', None),
'colspec': (None, pass_, 'depart_colspec'),
'comment': (None, 'visit_comment', None),
'compound': ('div', dv, dp),
'contact': (None, 'visit_bibliographic_field', None),
'container': ('div', dv, dp),
'copyright': (None, 'visit_bibliographic_field', None),
'danger': ('aside', 'visit_aside', 'depart_aside', True),
'date': (None, 'visit_bibliographic_field', None),
'decoration': (None, 'do_nothing', None),
'definition': ('dd', dv, dp),
'definition_list': ('dl', dv, dp),
'definition_list_item': (None, 'do_nothing', None),
'description': ('td', dv, dp),
'docinfo': (None, 'do_nothing', None),
'doctest_block': ('pre', 'visit_literal_block', 'depart_literal_block', True),
'document': (None, 'visit_document', 'depart_document'),
'emphasis': ('em', dv, dp, False, False),
'entry': (None, dv, 'depart_entry'),
'enumerated_list': ('ol', dv, 'depart_enumerated_list'),
'error': ('aside', 'visit_aside', 'depart_aside', True),
'field': (None, 'visit_field', None),
'field_body': (None, 'do_nothing', None),
'field_list': (None, 'do_nothing', None),
'field_name': (None, 'do_nothing', None),
'figure': (None, 'visit_figure', dp),
'footer': (None, dv, dp),
'footnote': (None, 'visit_citation', 'depart_citation', True),
'footnote_reference': ('a', 'visit_citation_reference', 'depart_reference', True, False),
'generated': (None, 'do_nothing', None),
'header': (None, dv, dp),
'hint': ('aside', 'visit_aside', 'depart_aside', True),
'image': ('img', dv, dp),
'important': ('aside', 'visit_aside', 'depart_aside', True),
'inline': ('span', dv, dp, False, False),
'label': ('th', 'visit_reference', 'depart_label'),
'legend': ('div', dv, dp, True),
'line': (None, 'visit_line', None),
'line_block': ('pre', 'visit_line_block', 'depart_line_block', True),
'list_item': ('li', dv, dp),
'literal': ('code', 'visit_literal', 'depart_literal', False, False),
'literal_block': ('pre', 'visit_literal_block', 'depart_literal_block'),
'math': (None, 'visit_math_block', None),
'math_block': (None, 'visit_math_block', None),
'meta': (None, 'visit_meta', None),
'note': ('aside', 'visit_aside', 'depart_aside', True),
'option': ('kbd', 'visit_option', dp, False, False),
'option_argument': ('var', 'visit_option_argument', dp, False, False),
'option_group': ('td', 'visit_option_group', 'depart_option_group'),
'option_list': (None, 'visit_option_list', 'depart_option_list', True),
'option_list_item': ('tr', dv, dp),
'option_string': (None, 'do_nothing', None),
'organization': (None, 'visit_bibliographic_field', None),
'paragraph': ('p', 'visit_paragraph', dp),
'pending': (None, dv, dp),
'problematic': ('a', 'visit_problematic', 'depart_reference', True, False),
'raw': (None, 'visit_raw', None),
'reference': ('a', 'visit_reference', 'depart_reference', False, False),
'revision': (None, 'visit_bibliographic_field', None),
'row': ('tr', 'visit_row', 'depart_row'),
'rubric': ('p', dv, 'depart_rubric', True),
'section': ('section', 'visit_section', 'depart_section'),
'sidebar': ('aside', 'visit_aside', 'depart_aside', True),
'status': (None, 'visit_bibliographic_field', None),
'strong': (None, dv, dp, False, False),
'subscript': ('sub', dv, dp, False, False),
'substitution_definition': (None, 'skip_node', None),
'substitution_reference': (None, 'skip_node', None),
'subtitle': (None, 'visit_target', 'depart_subtitle'),
'superscript': ('sup', dv, dp, False, False),
'system_message': ('div', 'visit_system_message', dp),
'table': (None, 'visit_table', 'depart_table'),
'target': ('a', 'visit_target', 'depart_reference', False, False),
'tbody': (None, dv, dp),
'term': ('dt', dv, dp),
'tgroup': (None, 'do_nothing', None),
'thead': (None, 'visit_thead', 'depart_thead'),
'tip': ('aside', 'visit_aside', 'depart_aside', True),
'title': (None, dv, 'depart_title'),
'title_reference': ('cite', dv, dp, False, False),
'topic': ('aside', 'visit_aside', 'depart_aside', True),
'transition': ('hr', dv, dp),
'version': (None, 'visit_bibliographic_field', None),
'warning': ('aside', 'visit_aside', 'depart_aside', True),
}
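One way such a table could be wired to a translator is sketched below. This is an assumption about the mechanism, not the actual HTML5Translator code: the ToyTranslator class, its term_options helper, and the decision to leave None entries unbound are all hypothetical, and only a handful of entries are copied from the table above (with dv and dp spelled out as method names).

```python
class ToyTranslator:
    """Hypothetical sketch of binding visit_/depart_ names from a term table."""

    # (tag, visit_func, depart_func[, use_term_in_class[, indent_elem]])
    rst_terms = {
        'Text': (None, 'visit_Text', None),
        'definition': ('dd', 'default_visit', 'default_departure'),
        'emphasis': ('em', 'default_visit', 'default_departure', False, False),
        'note': ('aside', 'visit_aside', 'depart_aside', True),
    }

    def __init__(self):
        for term, spec in self.rst_terms.items():
            visit_name, depart_name = spec[1], spec[2]
            # Expose the shared handlers under the names the dispatcher expects.
            if visit_name:
                setattr(self, 'visit_' + term, getattr(self, visit_name))
            if depart_name:
                setattr(self, 'depart_' + term, getattr(self, depart_name))

    def term_options(self, term):
        """Return (tag, use_term_in_class, indent_elem) with defaults applied."""
        spec = self.rst_terms[term]
        use_term_in_class = spec[3] if len(spec) > 3 else False
        indent_elem = spec[4] if len(spec) > 4 else True
        return spec[0], use_term_in_class, indent_elem

    # Placeholder handlers; real ones would build HTML5 elements.
    def default_visit(self, node): pass
    def default_departure(self, node): pass
    def visit_aside(self, node): pass
    def depart_aside(self, node): pass
    def visit_Text(self, node): pass


translator = ToyTranslator()
print(translator.term_options('definition'))   # defaults filled in
print(translator.term_options('emphasis'))     # both options given explicitly
```

With this arrangement, adding support for a node type that behaves like an existing one only requires a new row in the table rather than a new pair of methods.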
HTML5 Tag Construction
HTML5 tags are constructed by the genshi.builder.tag object.
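The expression getattr(tag, tag_name)(**attributes) in default_departure relies on a factory object that turns attribute access into an element builder. The sketch below imitates that idea with the standard library only; ToyTagFactory is a hypothetical stand-in written for this example and does not reproduce Genshi's actual classes or behaviour.

```python
import xml.etree.ElementTree as ET


class ToyTagFactory:
    """Toy stand-in for a tag factory: attribute access yields a builder."""

    def __getattr__(self, name):
        def build(*children, **attributes):
            elem = ET.Element(name, {k: str(v) for k, v in attributes.items()})
            for child in children:
                if isinstance(child, ET.Element):
                    elem.append(child)
                elif len(elem):
                    # Text that follows a child element is stored as its tail.
                    last = elem[-1]
                    last.tail = (last.tail or '') + str(child)
                else:
                    elem.text = (elem.text or '') + str(child)
            return elem
        return build


tag = ToyTagFactory()

# Static attribute access, as in tag.a(id=id).
heading = tag.h1('Title')

# Dynamic lookup by name, as in getattr(tag, tag_name)(**attributes).
tag_name, attributes = 'p', {'id': 'intro'}
paragraph = getattr(tag, tag_name)('Text and more text', **attributes)

print(ET.tostring(heading, encoding='unicode'))
print(ET.tostring(paragraph, encoding='unicode'))
```

Because the tag name is just a string at the point of lookup, the translator can take it from a table entry instead of hard-coding one method per HTML5 element.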
ElemStack
For the previous doctree example, the sequence of visit_… and depart_… calls is this:
1. visit_document
2. visit_title
3. visit_Text
4. depart_Text
5. depart_title
6. visit_paragraph
7. visit_Text
8. depart_Text
9. depart_paragraph
10. depart_document
For this sequence, the behavior of an ElemStack context object is:
Initial State. The context stack is empty:
    context = []
1. visit_document. A new context for document is reserved:
    context = [ [] ]
                 \
                  document
                  context
2. visit_title. A new context for title is pushed onto the context stack:
                      title
                      context
                     /
    context = [ [], [] ]
                 \
                  document
                  context
3. visit_Text. A Text node doesn’t need a new context because it is a leaf node. Its text is simply added to the context of its parent node:
                      title
                      context
                     /
    context = [ [], ['Title'] ]
                 \
                  document
                  context
4. depart_Text. No action is performed. The context stack remains the same.
5. depart_title. This is the end of the title processing. The title context is popped from the context stack to form an h1 tag, which is then inserted into the context of the title’s parent node (the document context):
    context = [ [tag.h1('Title')] ]
                 \
                  document
                  context
6. visit_paragraph. A new context is added:
                                     paragraph
                                     context
                                    /
    context = [ [tag.h1('Title')], [] ]
                 \
                  document
                  context
7. visit_Text. Again, the text is inserted into the context of its parent node:
                                     paragraph
                                     context
                                    /
    context = [ [tag.h1('Title')], ['Text and more text'] ]
                 \
                  document
                  context
8. depart_Text. No action is performed.
9. depart_paragraph. This follows the standard procedure: the current context is popped to form a new tag, which is appended to the context of the parent node:
    context = [ [tag.h1('Title'), tag.p('Text and more text')] ]
                 \
                  document
                  context
10. depart_document. The document node doesn’t have an HTML tag. Its context is simply combined with the outer context to form the body of the HTML5 document:
    context = [tag.h1('Title'), tag.p('Text and more text')]
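The ten steps above can be replayed with a small list-of-lists model. This is a simplified sketch, not the real ElemStack: begin_elem and commit_elem are named after the calls seen in default_visit and default_departure, but their bodies, the append and pop helpers, and the use of (tag, children) tuples in place of real elements are assumptions, and indentation handling is left out entirely.

```python
class ToyContextStack:
    """Simplified model of a context stack (no indentation handling)."""

    def __init__(self):
        self.stack = []

    def begin_elem(self):
        # Open a fresh context for the node being visited.
        self.stack.append([])

    def append(self, item):
        # Add a leaf (such as text) to the innermost open context.
        self.stack[-1].append(item)

    def commit_elem(self, tag_name):
        # Close the innermost context, wrap its content in an element
        # and hand that element to the parent context.
        children = self.stack.pop()
        self.stack[-1].append((tag_name, children))

    def pop(self):
        # Close the innermost context without wrapping it.
        return self.stack.pop()


context = ToyContextStack()
context.begin_elem()                    # 1. visit_document
context.begin_elem()                    # 2. visit_title
context.append('Title')                 # 3. visit_Text (4. depart_Text: no-op)
state_after_step_3 = [list(c) for c in context.stack]
context.commit_elem('h1')               # 5. depart_title
context.begin_elem()                    # 6. visit_paragraph
context.append('Text and more text')    # 7. visit_Text (8. depart_Text: no-op)
context.commit_elem('p')                # 9. depart_paragraph
body = context.pop()                    # 10. depart_document
print(body)
```

The key property is that a node's element is only created on departure, once everything produced by its children is already sitting in its own context.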
rst2html5 Tests
The tests executed in rst2html5_.tests.test_html5writer are based on generators. The test cases are located at tests/cases.py, and each test case is a dictionary whose main keys are:
rst: text snippet in rst format
out: expected output
part: specifies which part of rst2html5_ output will be compared to out. Possible values are head, body or whole.
Other possible keys are rst2html5_ configuration settings such as indent_output, script, script-defer, html-tag-attr or stylesheet.
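To make the shape of a test case concrete, here is a minimal sketch of one case dictionary and a generator that yields one check per case. Everything in it is illustrative: the case name, the placeholder out value, the indent_output value, and the split_case, generate_tests and show helpers are assumptions, not the contents of tests/cases.py or test_html5writer.

```python
# Hypothetical test case; the 'out' value is a placeholder, not real output.
cases = {
    'title_and_paragraph': {
        'rst': 'Title\n=====\n\nText and more text',
        'out': '<placeholder for the expected HTML5>',
        'part': 'body',
        'indent_output': False,   # any other key is treated as a setting
    },
}

MAIN_KEYS = ('rst', 'out', 'part')


def split_case(case):
    """Separate the comparison keys from the configuration settings."""
    main = {key: case[key] for key in MAIN_KEYS}
    settings = {k: v for k, v in case.items() if k not in MAIN_KEYS}
    return main, settings


def generate_tests(cases, check):
    """Yield one (function, name, main, settings) tuple per test case."""
    for name in sorted(cases):
        main, settings = split_case(cases[name])
        yield check, name, main, settings


def show(name, main, settings):
    # Stand-in for a real check; it only reports what it was given.
    print(name, main['part'], settings)


for func, *args in generate_tests(cases, show):
    func(*args)
```

Keeping the cases as plain data means a new test is a new dictionary entry, and the generator turns each entry into its own reported test.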
When a test fails, three auxiliary files are created in the temporary directory (/tmp):
- TEST_CASE_NAME.rst contains the rst snippet of the test case;
- TEST_CASE_NAME.result contains the result produced by rst2html5_; and
- TEST_CASE_NAME.expected contains the expected result.
Their differences can be easily visualized by a diff tool:
$ kdiff3 /tmp/TEST_CASE_NAME.result /tmp/TEST_CASE_NAME.expected
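The failure handling described above can be sketched as follows. This is a minimal illustration under stated assumptions: check_case, its convert parameter and its tmpdir parameter are hypothetical names, and str.upper stands in for the real converter so that the example does not depend on rst2html5_ being installed.

```python
import os
import tempfile


def check_case(name, case, convert, tmpdir=None):
    """Compare convert(case['rst']) with case['out']; dump files on mismatch."""
    tmpdir = tmpdir or tempfile.gettempdir()
    result = convert(case['rst'])
    if result == case['out']:
        return True
    # Write the three auxiliary files so that a diff tool can compare them.
    for suffix, content in (('rst', case['rst']),
                            ('result', result),
                            ('expected', case['out'])):
        path = os.path.join(tmpdir, '%s.%s' % (name, suffix))
        with open(path, 'w', encoding='utf-8') as output:
            output.write(content)
    return False


# Demonstration with a dummy converter and a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    case = {'rst': 'text', 'out': 'something else'}
    passed = check_case('demo', case, convert=str.upper, tmpdir=tmp)
    created = sorted(os.listdir(tmp))

print(passed, created)
```

Writing the result and the expectation to separate files keeps long HTML output out of the test report and lets any diff tool show exactly where they diverge.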