======================
rst2html5 Design Notes
======================
The following documentation describes the knowledge collected during the implementation of **rst2html5**.
It is probably neither complete nor entirely accurate,
but it might be helpful to other people who want to create another rst converter.

.. note::

    **rst2html5** had to be renamed to **rst2html5_**
    due to a conflict with **docutils**' **rst2html5**.

Docutils
========
Docutils_ is a set of tools for processing plaintext documentation in reStructuredText_ markup (rst)
into other formats such as HTML, PDF and LaTeX.
Its design issues and implementation details are documented at
http://docutils.sourceforge.net/docs/peps/pep-0258.html
In the early stages of the translation process,
the rst document is analyzed and transformed into an intermediary format called *doctree*
which is then passed to a translator to be transformed into the desired formatted output::

                          Translator
                     +-------------------+
                     |    +---------+    |
    ---> doctree -------->|  Writer |-------> output
                     |    +----+----+    |
                     |         |         |
                     |         |         |
                     |  +------+------+  |
                     |  | NodeVisitor |  |
                     |  +-------------+  |
                     +-------------------+

Doctree
-------
The doctree_ is a hierarchical structure of the elements of an rst document.
It is defined in **docutils.nodes** and is used internally by Docutils components.
The command :command:`rst2pseudoxml.py` produces a textual representation of a doctree
that is very useful for visualizing the nesting of the elements of an rst document.
This information was of great help for both **rst2html5** design and tests.
Given the following rst snippet:

.. code-block:: rst

    Title
    =====

    Text and more text

The textual representation produced by :command:`rst2pseudoxml.py` is
(attributes of the ``document`` element omitted):

.. code-block:: xml

    <document>
        <title>
            Title
        <paragraph>
            Text and more text

Translator, Writer and NodeVisitor
----------------------------------
A translator is composed of two parts: a |Writer| and a |NodeVisitor|.
The |Writer| is responsible for preparing
and coordinating the translation made by the |NodeVisitor|.
The |NodeVisitor| visits each doctree node and
performs all the actions needed to translate the node to the desired format
according to its type and content.

.. important::

    To develop a new docutils translator, you need to specialize these two classes.

.. note::

    Those classes correspond to a variation of the Visitor pattern
    called "Extrinsic Visitor", which is more commonly used in Python.
    See
    `The "Visitor Pattern", Revisited `_.

.. seealso::

    `Double Dispatch and the "Visitor" Pattern `_.

::

                  +-------------+
                  |             |
                  |    Writer   |
                  |  translate  |
                  |             |
                  +------+------+
                         |
                         |    +---------------------------+
                         |    |                           |
                         v    v                           |
                    +------------+                        |
                    |            |                        |
                    |    Node    |                        |
                    |  walkabout |                        |
                    |            |                        |
                    +--+---+---+-+                        |
                       |   |   |                          |
             +---------+   |   +----------+               |
             |             |              |               |
             v             |              v               |
    +----------------+     |    +--------------------+    |
    |                |     |    |                    |    |
    |   NodeVisitor  |     |    |     NodeVisitor    |    |
    | dispatch_visit |     |    | dispatch_departure |    |
    |                |     |    |                    |    |
    +--------+-------+     |    +---------+----------+    |
             |             |              |               |
             |             +--------------|---------------+
             |                            |
             v                            v
    +-----------------+         +------------------+
    |                 |         |                  |
    |   NodeVisitor   |         |    NodeVisitor   |
    | visit_NODE_TYPE |         | depart_NODE_TYPE |
    |                 |         |                  |
    +-----------------+         +------------------+

.. http://www.asciiflow.com/#Draw
During the doctree traversal through :func:`docutils.nodes.Node.walkabout`,
two |NodeVisitor| dispatch methods are called:
:func:`~docutils.nodes.NodeVisitor.dispatch_visit` and
:func:`~docutils.nodes.NodeVisitor.dispatch_departure`.
The former is called as soon as the traversal enters a node.
Then :func:`~docutils.nodes.Node.walkabout` is called on each child node and,
finally, the latter is called when the traversal leaves the node.
Each dispatch method calls another method whose name follows the pattern
*visit_NODE_TYPE* or *depart_NODE_TYPE*,
such as *visit_paragraph* or *depart_title*,
which should be implemented by the |NodeVisitor| subclass.
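
The dispatch mechanism described above can be modelled in a few lines of plain Python.
This is only a minimal sketch, not the Docutils implementation:
``ToyNode``, ``ToyVisitor`` and ``RecordingVisitor`` are hypothetical stand-ins,
and the real classes handle more cases (such as skipping nodes) than are shown here.

.. code-block:: python

    class ToyNode:
        """Hypothetical stand-in for a doctree node."""

        def __init__(self, name, children=()):
            self.name = name
            self.children = list(children)

        def walkabout(self, visitor):
            # Enter the node, traverse every child, then leave the node.
            visitor.dispatch_visit(self)
            for child in self.children:
                child.walkabout(visitor)
            visitor.dispatch_departure(self)


    class ToyVisitor:
        """Hypothetical stand-in for the dispatch half of a NodeVisitor."""

        def dispatch_visit(self, node):
            # Look up visit_NODE_TYPE by name and call it.
            return getattr(self, 'visit_' + node.name)(node)

        def dispatch_departure(self, node):
            # Look up depart_NODE_TYPE by name and call it.
            return getattr(self, 'depart_' + node.name)(node)


    class RecordingVisitor(ToyVisitor):
        """Records the name of every method that the dispatchers call."""

        def __init__(self):
            self.calls = []

        def __getattr__(self, name):
            # Only reached for attributes that are not defined explicitly,
            # that is, for every visit_* and depart_* method in this sketch.
            if name.startswith(('visit_', 'depart_')):
                return lambda node: self.calls.append(name)
            raise AttributeError(name)


    doctree = ToyNode('document', [
        ToyNode('title', [ToyNode('Text')]),
        ToyNode('paragraph', [ToyNode('Text')]),
    ])
    visitor = RecordingVisitor()
    doctree.walkabout(visitor)
    print('\n'.join(visitor.calls))

Running the sketch prints ten method names, one per line:
each node is visited, its children are traversed, and only then is the node departed.
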
rst2html5
=========
In :mod:`rst2html5_`,
|Writer| and |NodeVisitor| are specialized through
:class:`~rst2html5_.HTML5Writer` and :class:`~rst2html5_.HTML5Translator` classes.
:class:`rst2html5_.HTML5Translator` is a |NodeVisitor| subclass
that implements all *visit_NODE_TYPE* and *depart_NODE_TYPE* methods
needed to translate a doctree to its HTML5 content.
The :class:`rst2html5_.HTML5Translator` uses
an object of the :class:`~rst2html5_.ElemStack` helper class that controls a context stack
to handle indentation and the nesting of the doctree traversal::

                      rst2html5_
               +-----------------------+
               |    +-------------+    |
    doctree ---|--->| HTML5Writer |----|--> HTML5
               |    +------+------+    |
               |           |           |
               |           |           |
               |  +--------+--------+  |
               |  | HTML5Translator |  |
               |  +--------+--------+  |
               |           |           |
               |           |           |
               |     +-----+-----+     |
               |     | ElemStack |     |
               |     +-----------+     |
               +-----------------------+

The standard *visit_NODE_TYPE* action initiates a new node context:

.. literalinclude:: ../src/rst2html5_.py
    :pyobject: HTML5Translator.default_visit
    :emphasize-lines: 12

The standard *depart_NODE_TYPE* action creates the HTML5 element
according to the saved context:

.. literalinclude:: ../src/rst2html5_.py
    :pyobject: HTML5Translator.default_departure
    :emphasize-lines: 6-8

Not all rst elements follow this procedure.
The *Text* element, for example, is a leaf node and thus doesn't need a context of its own.
Other elements are processed in the same way and can share the same *visit_* and/or *depart_* method.
To take advantage of these similarities,
the *rst_terms* dict maps each node type to a pair of *visit_* and *depart_* methods:

.. literalinclude:: ../src/rst2html5_.py
    :pyobject: HTML5Translator
    :lines: 3-108

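
The idea of a lookup table can be sketched as follows.
This is an illustration under assumptions, not the actual content of *rst_terms*:
the entries, the tuple layout and the ``handle`` method are hypothetical
and chosen only to show how several node types can share one pair of methods.

.. code-block:: python

    class TableDrivenVisitor:
        """Resolves handlers through a dict instead of one method per node type."""

        # Hypothetical table: node type -> (visit method name, depart method name).
        # None means that nothing is done for that step.
        rst_terms = {
            'Text': ('visit_Text', None),
            'title': ('default_visit', 'default_departure'),
            'paragraph': ('default_visit', 'default_departure'),
        }

        def __init__(self):
            self.log = []

        def default_visit(self, node_type):
            self.log.append('open ' + node_type)

        def default_departure(self, node_type):
            self.log.append('close ' + node_type)

        def visit_Text(self, node_type):
            self.log.append('text')

        def handle(self, node_type, step):
            # step is 0 for the visit and 1 for the departure.
            method_name = self.rst_terms[node_type][step]
            if method_name is not None:
                getattr(self, method_name)(node_type)


    visitor = TableDrivenVisitor()
    for node_type, step in [('title', 0), ('Text', 0), ('Text', 1), ('title', 1)]:
        visitor.handle(node_type, step)
    print(visitor.log)

In this sketch *title* and *paragraph* share the default pair of methods,
while *Text* has its own visit method and no departure action.
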
HTML5 Tag Construction
----------------------
HTML5 Tags are constructed by the :class:`genshi.builder.tag` object.
ElemStack
---------
For the previous doctree example,
the sequence of *visit_...* and *depart_...* calls is this::

    1. visit_document
    2. visit_title
    3. visit_Text
    4. depart_Text
    5. depart_title
    6. visit_paragraph
    7. visit_Text
    8. depart_Text
    9. depart_paragraph
    10. depart_document

For this sequence,
the behavior of an ElemStack context object is:

0. **Initial State**. The context stack is empty::

       context = []

1. **visit_document**. A new context for *document* is reserved::

       context = [ [] ]
                    \
                     document
                     context

2. **visit_title**. A new context for *title* is pushed onto the context stack::

                          title
                          context
                         /
       context = [ [], [] ]
                    \
                     document
                     context

3. **visit_Text**. A *Text* node doesn't need a new context because it is a leaf node.
   Its text is simply added to the context of its parent node::

                          title
                          context
                         /
       context = [ [], ['Title'] ]
                    \
                     document
                     context

4. **depart_Text**. No action is performed. The context stack remains the same.

5. **depart_title**. This is the end of the title processing.
   The title context is popped from the context stack to form an *h1* tag
   that is then inserted into the context of the title's parent node (*document context*)::

       context = [ [tag.h1('Title')] ]
                    \
                     document
                     context

6. **visit_paragraph**. A new context is added::

                                         paragraph
                                         context
                                        /
       context = [ [tag.h1('Title')], [] ]
                    \
                     document
                     context

7. **visit_Text**. Again, the text is inserted into the context of its parent node::

                                         paragraph
                                         context
                                        /
       context = [ [tag.h1('Title')], ['Text and more text'] ]
                    \
                     document
                     context

8. **depart_Text**. No action is performed.

9. **depart_paragraph**. Follows the standard procedure:
   the current context is popped to form a new tag, which is appended to
   the context of the parent node::

       context = [ [tag.h1('Title'), tag.p('Text and more text')] ]
                    \
                     document
                     context

10. **depart_document**. The document node doesn't have an HTML tag.
    Its context is simply merged into the outer context to form the body of the HTML5 document::

        context = [tag.h1('Title'), tag.p('Text and more text')]

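
The ten steps above can be reproduced with a simplified model of the context stack.
This is a hedged sketch, not the actual :class:`~rst2html5_.ElemStack`:
the class ``ToyElemStack``, its method names and the helper ``make_tag`` are hypothetical,
and plain strings stand in for the ``genshi`` tag objects.

.. code-block:: python

    class ToyElemStack:
        """Hypothetical, simplified model of the context stack."""

        def __init__(self):
            self.stack = []

        def begin_elem(self):
            # visit_NODE_TYPE: reserve a new, empty context.
            self.stack.append([])

        def append(self, item):
            # Add text or a finished element to the innermost open context.
            self.stack[-1].append(item)

        def pop(self):
            # depart_NODE_TYPE: remove and return the innermost context.
            return self.stack.pop()


    def make_tag(name, children):
        # Plain strings stand in for genshi tag objects in this sketch.
        return '<{0}>{1}</{0}>'.format(name, ''.join(children))


    context = ToyElemStack()
    context.begin_elem()                     # 1. visit_document
    context.begin_elem()                     # 2. visit_title
    context.append('Title')                  # 3. visit_Text
    #                                          4. depart_Text: nothing to do
    title = context.pop()                    # 5. depart_title
    context.append(make_tag('h1', title))
    context.begin_elem()                     # 6. visit_paragraph
    context.append('Text and more text')     # 7. visit_Text
    #                                          8. depart_Text: nothing to do
    paragraph = context.pop()                # 9. depart_paragraph
    context.append(make_tag('p', paragraph))
    body = context.pop()                     # 10. depart_document
    print(body)  # ['<h1>Title</h1>', '<p>Text and more text</p>']

After the last step the stack is empty again and ``body`` holds the two top-level elements in document order.
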
.. _tests:
rst2html5 Tests
===============
The tests executed in :mod:`rst2html5_.tests.test_html5writer` are based on `generators
`_.
The test cases are located at :file:`tests/cases.py` and
each test case is a dictionary whose main keys are:
:rst: text snippet in rst format
:out: expected output
:part: specifies which part of **rst2html5_** output will be compared to **out**.
Possible values are **head**, **body** or **whole**.
Other possible keys are **rst2html5_** configuration settings such as
*indent_output*, *script*, *script-defer*, *html-tag-attr* or *stylesheet*.
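
The shape of such a dictionary can be illustrated with a short sketch.
The case below and the helper functions ``split_case`` and ``generate_checks`` are hypothetical;
they are not taken from :file:`tests/cases.py` and the expected output is illustrative only.

.. code-block:: python

    MAIN_KEYS = ('rst', 'out', 'part')

    # Hypothetical test case; its values are illustrative only.
    cases = {
        'title_only': {
            'rst': 'Title\n=====\n',
            'out': '<h1>Title</h1>',
            'part': 'body',
            'indent_output': False,  # any other key is a configuration setting
        },
    }


    def split_case(case):
        """Separate the main keys from the configuration settings."""
        main = {key: case[key] for key in MAIN_KEYS}
        settings = {key: value for key, value in case.items()
                    if key not in MAIN_KEYS}
        return main, settings


    def generate_checks(all_cases):
        """Yield one (name, main, settings) tuple per test case."""
        for name, case in sorted(all_cases.items()):
            main, settings = split_case(case)
            yield name, main, settings


    for name, main, settings in generate_checks(cases):
        print(name, main['part'], settings)

A generator of this kind lets a test runner treat every yielded tuple as a separate test.
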
When a test fails,
three auxiliary files are created in the temporary directory (:file:`/tmp`):

#. :file:`TEST_CASE_NAME.rst` contains the rst snippet of the test case;
#. :file:`TEST_CASE_NAME.result` contains the result produced by **rst2html5_**; and
#. :file:`TEST_CASE_NAME.expected` contains the expected result.

Their differences can be easily visualized with a diff tool::

    $ kdiff3 /tmp/TEST_CASE_NAME.result /tmp/TEST_CASE_NAME.expected
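
When no graphical diff tool is at hand, the same comparison can be made with the standard library.
The following is a minimal sketch under assumptions, not the code used by the test suite:
``dump_failure`` and ``text_diff`` are hypothetical helpers,
and the sample result and expected strings are invented for the illustration.

.. code-block:: python

    import difflib
    import os
    import tempfile


    def dump_failure(name, rst, result, expected, directory=None):
        """Write NAME.rst, NAME.result and NAME.expected; return their paths."""
        directory = directory or tempfile.gettempdir()
        paths = {}
        contents = (('rst', rst), ('result', result), ('expected', expected))
        for suffix, content in contents:
            path = os.path.join(directory, '{}.{}'.format(name, suffix))
            with open(path, 'w', encoding='utf-8') as stream:
                stream.write(content)
            paths[suffix] = path
        return paths


    def text_diff(result, expected):
        """Return a unified diff between the produced and the expected output."""
        lines = difflib.unified_diff(
            result.splitlines(), expected.splitlines(),
            fromfile='result', tofile='expected', lineterm='')
        return '\n'.join(lines)


    result = '<h1>Titel</h1>'      # invented faulty output
    expected = '<h1>Title</h1>'

    # A throwaway directory keeps the example from leaving files behind.
    with tempfile.TemporaryDirectory() as directory:
        paths = dump_failure('title_only', 'Title\n=====\n',
                             result, expected, directory)
        all_written = all(os.path.exists(path) for path in paths.values())

    diff = text_diff(result, expected)
    print(diff)

The printed diff marks the produced line with ``-`` and the expected line with ``+``.
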