SxHTML - Generate HTML from S-Expressions
HTML can be represented as a symbolic expression, also called an s-expression or sexpr for short. This approach is similar to SXML, an attempt to encode XML as s-expressions.
For example, the following simple HTML text:
<html>
<head><title>Example</title></head>
<body>
<h1 id="main">Title</h1>
<p>This is some example text.</p>
<hr>
<div class="small" id="footnote">Small text.</div>
</body>
</html>
An s-expression representation could be:
(html
(head (title "Example"))
(body
(h1 (@ (id "main")) "Title")
(p "This is some example text.")
(hr)
(div (@ (class "small") (id "footnote")) "Small text.")
)
)
The s-expression representation has the advantage of being easier to parse than the HTML text. In addition, an s-expression can be analysed and possibly optimized more easily than a string representation. For example, ((p) (p)) can be simplified to ((p)). Similarly, there are circumstances where (li (p "text")) should be transformed to (li "text").
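As a minimal sketch of such a transformation, the following Go program unwraps a paragraph that is the only child of a list item. The Node type and the unwrapParagraph function are hypothetical and exist only for this illustration; they are not part of Sx or SxHTML, and the sketch applies the rewrite unconditionally, whereas a real tool would decide when it is appropriate.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Node is a hypothetical stand-in for an s-expression node (not an Sx type).
// A node with an empty Tag is a text node.
type Node struct {
	Tag      string
	Text     string
	Children []*Node
}

// String renders the node in s-expression notation.
func (n *Node) String() string {
	if n.Tag == "" {
		return strconv.Quote(n.Text)
	}
	var sb strings.Builder
	sb.WriteString("(" + n.Tag)
	for _, c := range n.Children {
		sb.WriteString(" " + c.String())
	}
	sb.WriteString(")")
	return sb.String()
}

// unwrapParagraph replaces (li (p ...)) with (li ...) when the paragraph
// is the only child of the list item.
func unwrapParagraph(n *Node) {
	if n.Tag == "li" && len(n.Children) == 1 && n.Children[0].Tag == "p" {
		n.Children = n.Children[0].Children
	}
}

func main() {
	li := &Node{Tag: "li", Children: []*Node{
		{Tag: "p", Children: []*Node{{Text: "text"}}},
	}}
	fmt.Println(li) // (li (p "text"))
	unwrapParagraph(li)
	fmt.Println(li) // (li "text")
}
```

Doing the same on an HTML string would require parsing it first, which is the point of keeping the structure around.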
This library generates HTML from s-expressions created by Sx.
Often, HTML is generated by using string template libraries, like Mustache (many programming languages), Jinja (Python), or html/template (Go).
One problem area is escaping certain characters that have a special meaning in various parts of the HTML text. Obviously, the less-than character "<" signals the beginning of a tag and cannot be used literally in normal text. It must be replaced by "&lt;". The ampersand character "&" has a special meaning too. It must be replaced with "&amp;". But this is only true for ordinary HTML content. Within HTML attributes (for example "href" in "<a href="...">...</a>"), other characters must not occur. If you embed JavaScript in your HTML text, there is another set of rules.
Most string template libraries fail in certain scenarios. Mustache provides replacement characters only for HTML content, but not for HTML attributes. The same holds for Jinja. The html/template library for Go requires the developer to correctly specify the adequate escaping mode.
This is because string template libraries operate only at the string level. All structure of the HTML text is lost.
By using a structured representation of HTML, the HTML generator knows about the specific context and can automatically select the appropriate escape mode.
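To illustrate why the context matters, the sketch below escapes the same input once for ordinary HTML content and once for a query value inside a URL attribute, using only the Go standard library. The function names escapeText and escapeQueryValue are assumptions made for this example; this is not how SxHTML implements its escaping.

```go
package main

import (
	"fmt"
	"html"
	"net/url"
)

// escapeText escapes a string for use as ordinary HTML content.
func escapeText(s string) string {
	return html.EscapeString(s)
}

// escapeQueryValue escapes a string for use as a query value in a URL.
// The result contains only characters that are also safe inside a
// double-quoted HTML attribute.
func escapeQueryValue(s string) string {
	return url.QueryEscape(s)
}

func main() {
	input := "Tom & Jerry <3"

	// Same input, two contexts, two different results.
	fmt.Printf("<p>%s</p>\n", escapeText(input))
	// <p>Tom &amp; Jerry &lt;3</p>
	fmt.Printf("<a href=\"/search?q=%s\">search</a>\n", escapeQueryValue(input))
	// <a href="/search?q=Tom+%26+Jerry+%3C3">search</a>
}
```

A generator that knows whether it is writing content or an attribute value can pick between such functions itself, instead of leaving the choice to the template author.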
Language
SxHTML is based on Sx.
SxHTML is relatively lenient about the supported HTML language. However, if in doubt, it targets HTML5. All tag and attribute names must be lowercase symbols. Do not use strings to specify a tag or an attribute. SxHTML does not check whether a symbol specifies a valid HTML tag or attribute. Some tag and attribute symbols have a special meaning.
https://html.spec.whatwg.org/multipage/syntax.html#void-elements specifies the list of void elements, which do not have an end tag. All other tags will have an end tag.
https://html.spec.whatwg.org/multipage/indices.html#attributes-1 associates attribute names with expected content. This results in an additional escaping mechanism for specific content types. Currently, only URL content is recognized and escaped.
In addition to the list above, there are some heuristics for detecting the content type based on the attribute name.
- A prefix of "data-" is stripped. For example, data-href is also treated as a URL attribute.
- If there is no "data-" prefix, any namespace prefix is stripped. For example, svg:href is also treated as a URL attribute, but not svg:data-href.
- The namespace "xmlns" will always result in treating the attribute as a URL attribute, e.g. xmlns:svg.
- If the attribute name contains one of the strings "url", "uri", "src", it will be treated as a URL attribute.
- If the attribute name starts with "on", it will be treated in future versions as JavaScript.
- An attribute name "style" will treat the attribute value as CSS in the future.
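The URL-related heuristics above can be sketched in Go as follows. This is one reading of the rules, not the library's code: the function name isURLAttr is hypothetical, and the knownURLAttrs table is a small illustrative subset rather than the full list from the HTML specification.

```go
package main

import (
	"fmt"
	"strings"
)

// knownURLAttrs is an illustrative subset of attributes with URL content.
var knownURLAttrs = map[string]bool{
	"action": true, "cite": true, "formaction": true,
	"href": true, "poster": true, "src": true,
}

// isURLAttr reports whether an attribute name should be treated as a URL
// attribute, following the heuristics described in the text.
func isURLAttr(name string) bool {
	// The "xmlns" namespace always marks a URL attribute.
	if strings.HasPrefix(name, "xmlns:") {
		return true
	}
	if strings.HasPrefix(name, "data-") {
		// A "data-" prefix is stripped.
		name = strings.TrimPrefix(name, "data-")
	} else if i := strings.IndexByte(name, ':'); i >= 0 {
		// Without a "data-" prefix, a namespace prefix is stripped.
		name = name[i+1:]
	}
	if knownURLAttrs[name] {
		return true
	}
	for _, part := range []string{"url", "uri", "src"} {
		if strings.Contains(name, part) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{
		"data-href", "svg:href", "svg:data-href", "xmlns:svg", "my-url", "class",
	} {
		fmt.Printf("%-14s %v\n", name, isURLAttr(name))
	}
}
```

Only one prefix is stripped per name, which is why svg:data-href ends up as data-href and is not recognized.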
SxHTML defines some additional symbols, all starting with "@":
- @ specifies the attribute list of an HTML tag. It must immediately follow the tag symbol and contains a list of pairs, where the first component is a symbol and the second component is a string, the nil value, or a number.
- @C marks some content that should be written as <![CDATA[...]]>.
- @H specifies some HTML content that must not be escaped. For example, (@H "&") is transformed to &, but not &amp;.
- @L contains elements that should just be transformed, without specifying a tag. It is used by generating software that wants to generate HTML for a sequence of elements that do not belong to a certain tag.
- @@ specifies an HTML comment, e.g. (@@ "comment") is transformed to <!-- comment -->.
- @@@ specifies a multiline HTML comment, e.g. (@@@ "line1" "line2") is transformed to \n<!--\nline1\nline2\n-->\n.
- @@@@ specifies the doctype statement, e.g. (@@@@ (html ...)) is transformed to <!DOCTYPE html>\n<html>...</html>.
Tags
HTML defines some tags as void elements. A void element has no content; it has a start tag only. End tags must not be specified, and SxHTML will not generate them. Any content except attributes is ignored. Void elements are: area, base, br, col, embed, hr, img, input, link, meta, source, track, and wbr.
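A minimal sketch of this rule in Go, assuming a hypothetical renderElement helper that receives already escaped content (attributes are left out for brevity):

```go
package main

import "fmt"

// voidElements lists the HTML void elements named in the text.
var voidElements = map[string]bool{
	"area": true, "base": true, "br": true, "col": true, "embed": true,
	"hr": true, "img": true, "input": true, "link": true, "meta": true,
	"source": true, "track": true, "wbr": true,
}

// renderElement writes a start tag, the content, and an end tag.
// For void elements the content is ignored and no end tag is written.
func renderElement(tag, escapedContent string) string {
	if voidElements[tag] {
		return "<" + tag + ">"
	}
	return "<" + tag + ">" + escapedContent + "</" + tag + ">"
}

func main() {
	fmt.Println(renderElement("hr", "ignored")) // <hr>
	fmt.Println(renderElement("p", "text"))     // <p>text</p>
}
```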
Attributes
Attributes are always in the second position of a list containing a tag symbol. For example, (a (@ (href . "https://t73f.de/r/sxhtml")) "SxHTML") specifies a link to the page of this library. It will be transformed to <a href="https://t73f.de/r/sxhtml">SxHTML</a>.
The syntax for attributes is as follows:
- The first element of the attribute list must be the symbol @.
- Remaining elements must be lists, where the first element of each list is a symbol, which names the attribute.
- If there is no second element in the list, the attribute is an empty attribute. For example, (input (@ (disabled))) will be transformed to <input disabled>.
- If there is a second element in the list, it must be an atomic value, preferably a string. For example, (input (@ (disabled "yes"))) will be transformed to <input disabled="yes">.
- If the list contains more elements, they are ignored.
- If the list is a pair, the second element of the pair must be an atomic value, preferably a string. For example, (input (@ (disabled . "yes"))) will be transformed to <input disabled="yes">.
Since the attribute list is just a list, there might be duplicate symbols as attribute names. Only the first occurrence of the symbol will create an attribute. For example, (input (@ (disabled "no") (disabled . "yes"))) will be transformed to <input disabled="no">. This allows you to extend the list of attributes at the front if you later want to override the value of an attribute.
If you want to prohibit the generation of some attribute while still extending the list of attributes at the front, use the nil value () as the value of the attribute. For example, (input (@ (disabled ()) (disabled . "yes"))) will be transformed to <input>.
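The attribute rules (empty attributes, first occurrence wins, nil suppresses) can be sketched in Go as shown below. The attr type, its kinds, and renderStartTag are hypothetical names for this illustration and do not reflect the Sx data model; values are escaped with html.EscapeString only, without the URL-specific escaping described earlier.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

type attrKind int

const (
	attrEmpty attrKind = iota // (name): empty attribute
	attrValue                 // (name "value") or (name . "value")
	attrNil                   // (name ()): suppress the attribute
)

// attr is a hypothetical, flattened form of one attribute list entry.
type attr struct {
	name  string
	kind  attrKind
	value string
}

// renderStartTag writes a start tag. Only the first occurrence of an
// attribute name is considered; a nil value produces no attribute at all.
func renderStartTag(tag string, attrs []attr) string {
	var sb strings.Builder
	sb.WriteString("<" + tag)
	seen := map[string]bool{}
	for _, a := range attrs {
		if seen[a.name] {
			continue // later duplicates are ignored
		}
		seen[a.name] = true
		switch a.kind {
		case attrEmpty:
			sb.WriteString(" " + a.name)
		case attrValue:
			sb.WriteString(" " + a.name + `="` + html.EscapeString(a.value) + `"`)
		case attrNil:
			// write nothing, but keep the name marked as seen
		}
	}
	sb.WriteString(">")
	return sb.String()
}

func main() {
	yes := attr{name: "disabled", kind: attrValue, value: "yes"}
	no := attr{name: "disabled", kind: attrValue, value: "no"}
	none := attr{name: "disabled", kind: attrNil}

	fmt.Println(renderStartTag("input", []attr{{name: "disabled"}})) // <input disabled>
	fmt.Println(renderStartTag("input", []attr{no, yes}))            // <input disabled="no">
	fmt.Println(renderStartTag("input", []attr{none, yes}))          // <input>
}
```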
Content
HTML is not just about tags and attributes. These are needed to structure the content. To specify content, preferably use strings. Numbers are allowed too; you do not have to convert them into strings. Other Sx types, such as symbols, vectors, and undefined values, are ignored.
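As a rough sketch of this behaviour, the following Go function accepts strings and numbers and silently skips everything else. Plain Go values stand in for Sx objects here, and renderContent is a hypothetical name, so the actual type handling in SxHTML may differ.

```go
package main

import (
	"fmt"
	"html"
	"strconv"
	"strings"
)

// renderContent escapes strings, formats numbers, and ignores other values.
func renderContent(items ...any) string {
	var sb strings.Builder
	for _, item := range items {
		switch v := item.(type) {
		case string:
			sb.WriteString(html.EscapeString(v))
		case int:
			sb.WriteString(strconv.Itoa(v))
		case float64:
			sb.WriteString(strconv.FormatFloat(v, 'g', -1, 64))
		default:
			// other types are ignored
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(renderContent("a < b, count: ", 42, struct{}{}))
	// a &lt; b, count: 42
}
```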