McSinyx

Comments for Static Sites without JavaScripts

I'm open for criticism
But really, is it any room for criticism?

Recently, I've switched my feed reader from Newsboat to Liferea. The latter has a GUI and some extra features which make the experience a lot more comfy. For instance, custom enclosure handling lets me finally migrate all of my YouTube subscriptions to Atom and conveniently browse and watch videos using mpv. Image support also allows me to directly view web comics.[1] One of them, The Monster Under the Bed,[2] does not embed the strips in its feed, but it has comments.

Yes, RSS includes support for <comments>, and I was not aware of it until very recently. I suppose many other people late to the (web feed) party were not either. Since the rise of static sites, feeds have regained popularity, enough for even Google to reconsider its direction. Compared to RSS or Atom, alternatives have the following shortcomings:

On the other hand, news feeds are commonly read-only: only a few readers can render comments and even fewer are able to post one. On the server side, a dynamic server is needed to accept comments. Traditionally, it's the same as the system serving the website. Although this works, it is significantly more costly than a server dedicated to static sites, which scale a lot better.

Hackers have come up with multiple workarounds such as using microblogging or instant messaging to add comments to their static sites, but all require client-side code execution, which is an option for neither RSS nor Atom. Furthermore, JavaScript hurts portability and performance on the WWW, hence it should be avoided unless it is absolutely impossible to implement a feature otherwise. Commenting is not an exception.

Following is my adventure implementing a comment section for this very blog. If you're also up to the task, I think you should view what I did as an inspiration (rather than a reference) and don't be afraid to experiment until you are satisfied.

  1. Choosing Back-End
  2. Designing Data Flow
  3. Implementation
    1. Accepting Replies
    2. Rendering Comments
    3. Injecting Comments
  4. Moderation

Choosing Back-End

As mentioned earlier, static sites or not, there still needs to be a dynamic component to accept incoming replies. HTTP requests would be the most portable since all netizens obviously have a web browser, but those are what we're trying to replace here. What else does everyone have nowadays? Something so common that it can be used to identify people upon service registrations? Exactly, emails and phone numbers!

OK, Imma stop horsing around. My back-end of choice would be email. It's global, cheap and federated. Cellular services almost fit the bill, except that they would cost an arm and a leg for one to comment around the web every day via SMS, whose character limit does not facilitate thoughtful discussions either. As for forums, social media or instant messaging, no platform has nearly as large a user base as electronic mail.

HTML is often a trojan horse for JavaScript

It's not like any email would fit the comment section though. Especially not the HTML kind with a few hundred kilobytes of embedded CSS, JS and non-content images. From the security standpoint alone 'tis already a no-go. A light markup language like Markdown[4] would be much better.

One great thing about using a mature technology like email is that we have all use cases covered. Filtering, exporting and parsing emails work out of the box regardless of one's provider, MUA and programming preferences. I have a SourceHut account with which I can create mailing lists on demand, so I'm using it; however, there's no reason exporting from your private inbox would be any more difficult, presuming you have set up offline email.

Tips and tricks

Speaking of SourceHut, exporting a mailing list archive is rather easy: one could either use the button on the web UI or download from the API. As the operation is not exactly cost-free, the former is protected by a CSRF token and the latter by OAuth 2.0. If you are a fellow sr.ht user, you can use acurl on the build service with the URL from the GraphQL query { me { lists { results { name, archive } } } }.

Designing Data Flow

I promise, this sounds bigger than it really is, but first, let's have a glance at how static generators work. Typically, there are three times templating happens:

  1. Conversion of individual articles into HTML content

  2. Insertion of each article's content into a page template to create a complete HTML document

  3. Insertion of multiple HTML contents into one RSS or Atom feed template

At completion, two kinds of output are generated: website and web feed. Similarly, comments have to be rendered for both targets: an HTML comment section for web browsing and a separate RSS feed for each article's <wfw:commentRss>.[5] Therefore, injections should be done separately at stages 2 and 3. The overall process of static site generation with email comments is illustrated as follows.

Data transformation during generation process

For clarity, the HTML and RSS input templates for comments and for their parent page and web feed are omitted. The path to each comment feed output, which is injected into the respective web feed item, is not shown in the figure either.

Implementation

At the time of writing, this personal website of mine was generated by Julia Franklin, who was neither fast[6] nor semantic, but was the only one I knew supporting LaTeX prerendering out of the box. Franklin is also rather extendable via Julia functions.

Accepting Replies

Let's start with how each article can be programmatically and uniquely identified. By default in RSS, a GUID[7] is the permanent URL of the associated web page. I am not exactly a creative person, so I mirrored this idea, although I only used the difference between URLs, i.e. minus the scheme, network location and trailing index.html (Franklin always appends it to the target path of any source file that is neither index.md nor index.html):

dir_url() = strip(dirname(locvar(:fd_url)), '/')
message_id() = "%3C$(dir_url())@cnx%3E"

For maximum portability, threading identification is used in emails' In-Reply-To header, which expects a message ID, which must match <.+@.+>. Once again, to avoid having to think, I opted for the path difference for the left hand side and my nickname cnx for the right. The mailto URI could then be constructed accordingly:

using Printf: @sprintf

function hfun_mailto_comment()
  @sprintf("mailto:%s?%s=%s&%s=Re: %s",
           "~cnx/site@lists.sr.ht",
           "In-Reply-To", message_id(),
           "Subject", locvar(:title))
end

The anchor was then added to the page foot:

<a href="{{mailto_comment}}"
   title="Reply via email">{{author}}</a>
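For readers not using Julia, here is a minimal Python sketch of the same idea: derive a message ID from the page path and put it in the In-Reply-To field of a mailto URI. It is not the code behind this site. The function names and the sample path /blog/reply/ are assumptions for illustration, and unlike the snippet above it percent-encodes every header value (including the space in the subject) with urllib.parse.quote.

```python
from urllib.parse import quote, unquote

def message_id(path, domain='cnx'):
    """Build a message ID such as <blog/reply@cnx> from a page path."""
    return f'<{path.strip("/")}@{domain}>'

def mailto_comment(address, path, title):
    """Return a mailto URI whose header values are percent-encoded."""
    fields = {'In-Reply-To': message_id(path), 'Subject': f'Re: {title}'}
    query = '&'.join(f'{key}={quote(value, safe="")}'
                     for key, value in fields.items())
    # The address is left as-is, mirroring the Julia snippet.
    return f'mailto:{address}?{query}'

# The path and title below are hypothetical examples.
uri = mailto_comment('~cnx/site@lists.sr.ht', '/blog/reply/',
                     'Comments for Static Sites')
print(uri)
```

Decoding the In-Reply-To value with urllib.parse.unquote gives back the original message ID, which is what a mail client is expected to place in the header.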

Rendering Comments

This is when the fun begins. Julia's standard library does not include an email parser, and I doubt your favorite language does either, unless it is named after a British comedy troupe. Python is often described as batteries included, or at least it used to be (seemingly the consensus among current core devs has shifted towards favoring third-party libraries).

Off-topic rambling

Standard library inclusion wasn't really the deal breaker here though. I still needed a Markdown engine and an HTML sanitizer (because Markdown can include HTML), and AFAICT no stdlib has them. The real issue was the lack of Julia packaging on most distributions (apart from Guix), most certainly including NixOS, my current distro. For the same reason, the idea of rewriting Franklin in Python has been running in my head for a while now. Python packaging is much more downstream-friendly and, unlike with Julia, compilation overhead is almost non-existent.

On the other hand, it's trivial to pipe an external program's output to Julia, e.g. readchomp(`echo foo bar`) would give you the string "foo bar". Thus, the to-be-written comment generator should take (the path to) a mail box, the message ID of the article and a template, and write the result to stdout. Argument parsing is, again, thankfully in Python's stdlib:

from argparse import ArgumentParser
from pathlib import Path
from urllib.parse import unquote

parser = ArgumentParser()
parser.add_argument('mbox')
parser.add_argument('id', type=unquote)
parser.add_argument('template', type=Path)
args = parser.parse_args()

I then parsed the mbox into a mapping indexed by parent message IDs as follows. The IDs in the email headers are not percent-encoded, which is why the input message ID needed to be unquoted as well.

from collections import defaultdict
from email.utils import parsedate_to_datetime
from mailbox import mbox

date = lambda m: parsedate_to_datetime(m['Date']).date()
archive = defaultdict(list)
for message in sorted(mbox(args.mbox), key=date):
    archive[message['In-Reply-To']].append(message)
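To see what this mapping is good for, here is a self-contained sketch that builds a throwaway mbox with two messages, groups them by In-Reply-To in the same way, and walks the resulting tree. The message IDs, dates, bodies and the helpers make_message and thread are made up for illustration and are not taken from formbox.

```python
import os.path
from collections import defaultdict
from email.message import EmailMessage
from email.utils import parsedate_to_datetime
from mailbox import mbox
from tempfile import TemporaryDirectory

def make_message(msg_id, parent, date, body):
    """Create a plain-text reply to the message identified by parent."""
    message = EmailMessage()
    message['Message-ID'] = msg_id
    message['In-Reply-To'] = parent
    message['Date'] = date
    message.set_content(body)
    return message

def thread(archive, parent, depth=0):
    """Yield (depth, message ID) for every descendant of parent."""
    for message in archive.get(parent, []):
        yield depth, message['Message-ID']
        yield from thread(archive, message['Message-ID'], depth + 1)

with TemporaryDirectory() as tmp:
    box = mbox(os.path.join(tmp, 'comments.mbox'))
    # Deliberately added out of order: the nested reply comes first.
    box.add(make_message('<b@example.org>', '<a@example.org>',
                         'Tue, 11 Jan 2022 10:00:00 +0000', 'A reply'))
    box.add(make_message('<a@example.org>', '<blog/reply@cnx>',
                         'Mon, 10 Jan 2022 09:00:00 +0000', 'A comment'))
    box.flush()
    archive = defaultdict(list)
    by_date = lambda m: parsedate_to_datetime(m['Date'])
    for message in sorted(box, key=by_date):
        archive[message['In-Reply-To']].append(message)
    box.close()

replies = list(thread(archive, '<blog/reply@cnx>'))
print(replies)
```

Starting from the article's message ID, each level of the tree is just another lookup in the mapping, so no explicit tree structure has to be built.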

As said earlier, arbitrary HTML content is not exactly suitable for comments. However, it is undeniable that HTML emails have taken over the world and compromises must be made: allowing multipart/alternative of both text/plain and text/html. It is not the only kind of multipart message: attachments and cryptographic signatures are delivered the same way. Since we are only interested in the plaintext part, extracting it is actually easier done than said:

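from bleach import clean, linkify
from markdown import markdown

def get_body(message):
    """Return the first plaintext part rendered as sanitized HTML."""
    if message.is_multipart():
        # Search the parts depth-first for the first usable body.
        for payload in map(get_body, message.get_payload()):
            if payload is not None: return payload
    elif message.get_content_type() == 'text/plain':
        # get_payload(decode=True) returns bytes but markdown expects str.
        charset = message.get_content_charset() or 'utf-8'
        body = message.get_payload(decode=True).decode(charset)
        return clean(linkify(markdown(body, output_format='html5')),
                     tags=..., protocols=...)
    return None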
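The traversal can be tried without any third-party package. The sketch below uses only the standard library and skips the Markdown rendering and sanitizing steps; plain_text is a hypothetical name and the sample message is made up.

```python
from email.message import EmailMessage

def plain_text(message):
    """Return the first text/plain part of a message as str, or None."""
    if message.is_multipart():
        for part in message.get_payload():
            text = plain_text(part)
            if text is not None:
                return text
    elif message.get_content_type() == 'text/plain':
        charset = message.get_content_charset() or 'utf-8'
        return message.get_payload(decode=True).decode(charset)
    return None

# A multipart/alternative message carrying both flavors of the body.
message = EmailMessage()
message.set_content('Nice *work*!\n')
message.add_alternative('<p>Nice <em>work</em>!</p>\n', subtype='html')

# An HTML-only message has nothing to offer to the comment section.
html_only = EmailMessage()
html_only.set_content('<p>Hello</p>\n', subtype='html')

print(plain_text(message))
print(plain_text(html_only))
```

Returning None for HTML-only emails is a design choice that matches the rule of only accepting plaintext comments.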

Now all that's left is to render that body and relevant headers as an HTML segment or an RSS item. This is when we revisit the template. Jinja is probably the most popular in Python, thanks to Django and Flask, but its complexity is rather unnecessary. Instead, I went with the built-in str.format.

Double braces are brilliant, but I prefer single ones

What are templates for, exactly? Not the complete document, apparently, because that would differ from article to article and increase the complexity of injection. Nor a single comment, as comments are threaded into trees (or a forest) and their relationship can be useful. We gotta meet in tha middle and use recursive templates instead, e.g. for nested comments:

<div class=comment>
  ...
  {children}
</div>

To render linear comments, such as for <wfw:commentRss>, simply move the children out of the item as follows.

<item>
  ...
</item>
{children}
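A rough sketch of how one recursive function could serve both layouts with str.format follows. Only the {children} placeholder comes from the templates above; the other placeholders, the dictionary-based archive and the render function are assumptions for illustration, not the actual formbox code.

```python
from html import escape

# Nested layout: replies are rendered inside their parent.
NESTED = '<div class=comment><p>{author}: {body}</p>{children}</div>'
# Linear layout: replies follow their parent as siblings.
LINEAR = ('<item><author>{author}</author>'
          '<description>{body}</description></item>{children}')

def render(template, archive, parent):
    """Recursively render all replies to parent using the template."""
    return ''.join(
        template.format(author=escape(comment['author']),
                        body=escape(comment['body']),
                        children=render(template, archive, comment['id']))
        for comment in archive.get(parent, []))

# Hypothetical comments indexed by the ID of what they reply to.
archive = {
    'article': [{'id': 'a', 'author': 'Alice', 'body': 'First'}],
    'a': [{'id': 'b', 'author': 'Bob', 'body': 'Second'}],
}

nested = render(NESTED, archive, 'article')
linear = render(LINEAR, archive, 'article')
print(nested)
print(linear)
```

The only difference between the two outputs is where {children} sits in the template, so the rendering code does not need to know which target it is generating.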

The remaining substitutions are mostly just extracted from the email's headers. Another bit that needs some extra decisions, though, is the set of parameters for the mailto URI to reply to each comment:

This is getting boring with a lot of trivial code, so I'll leave you with a pointer to the completed script named formbox and move on to more interesting stuff.

Injecting Comments

Inserting HTML comment sections is pretty simple. First I wrote a simple Julia function render_comments calling formbox under the hood, then

hfun_comments_rendered() = render_comments("comment.html")

comments_rendered is then injected below the article. For RSS, it took two extra steps:

  1. Insert render_comments("comment.xml") into the comment feed template comments.xml (notice they are two different templates) and write it next to the article's output index.html

  2. Insert the path of the written comment feed into the <wfw:commentRss> tag in the article's feed item
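For illustration, the second step could look like the Python sketch below, which builds a feed item with xml.etree.ElementTree. The site itself does this through Franklin templates, so this is only an approximation: feed_item and the example.org URLs are hypothetical, and the wfw prefix is assumed to be bound to the Well-Formed Web CommentAPI namespace.

```python
from xml.etree import ElementTree as ET

# Namespace assumed for the wfw prefix (Well-Formed Web CommentAPI).
WFW = 'http://wellformedweb.org/CommentAPI/'
ET.register_namespace('wfw', WFW)

def feed_item(title, link, comment_feed):
    """Build an RSS item pointing to its per-article comment feed."""
    item = ET.Element('item')
    ET.SubElement(item, 'title').text = title
    ET.SubElement(item, 'link').text = link
    ET.SubElement(item, 'guid').text = link
    ET.SubElement(item, f'{{{WFW}}}commentRss').text = comment_feed
    return item

# Hypothetical URLs: the comment feed sits next to the article.
item = feed_item('Comments for Static Sites',
                 'https://example.org/blog/reply/',
                 'https://example.org/blog/reply/comments.xml')
xml = ET.tostring(item, encoding='unicode')
print(xml)
```

In a complete feed the namespace declaration would normally live on the root rss element rather than on each item.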

That's it!

Moderation

I don't want a Terms of Service page, it'd feel too corporate for my personal website, so I will list the rules here:

  1. Please be excellent to each other. Disagreements are okay, personal insults are not.

  2. Stay on topic. If you want to discuss something else with me publicly, start a new thread on a mailing list or reach me via social media.

  3. Use plaintext emails and do not top post. Markdown inline markups, block quotes, lists and code blocks are supported.

  4. Comments are implied to be under CC BY-SA 4.0 unless declared otherwise.

  5. I reserve the right to remove any comment I don't like. I generally don't delete comments, but if you want to exercise your freedom of speech, publish it yourself.

  6. I do not warrant the availability of the comments either. I will try my best but one day all comments may just disappear, just like this website itself. Archive what you deem important.

  7. These rules are subject to change according to my personal liking without notice.

Replies will only be rendered on the website and feed after I see them, so please expect a delay of at least 24 hours. If you are eager to reply to each other, subscribe to the site's mailing list instead.

[1] TBF there are image preview scripts in Newsboat's contrib.
[2] Content warning: occasionally NSFW
[3] Federation is getting there for social media; not so much for fora.
[4] But don't use text/markdown for your emails.
[5] Unfortunately there's no equivalent for Atom.
[6] Over 30 seconds to generate a few hundred kB of web pages.
[7] Not to be confused with the micro soft hijacked term for UUID.

Tags: fun recipe net

Nguyễn Gia Phong, 2022-01-09

Comments

Hmm I often see separate feed for comments on some sites (though I can't find them now), which is quite annoying to me because these comments are separated from the post. I'm not sure if it's just my feed reader not understanding this field, but I guess it's unlikely the case.

Follow the anchor in an author's name to reply

Why not just make a "Reply" link instead? Is it intentional to be unintuitive?

Re: deleting comments, I wonder how would you do that. Do you have to manually goes through the comments in mbox and delete it every time you regenerate the comments?

Anyway, nice work, I think I might look into implementing something similar at some point.

FYI this email sets Reply-To instead of In-Reply-To so it won't show up on the site. I've drafted the reply for this, could you please resend the comment with the corrected header?

FWIW, the workaround works:

neomutt "mailto:..."

It is a bit annoying, though, because for some reason Firefox stripped the parameters when I copy it.

Another way that works for neomutt is to use "edit header" (hot key: E). ↑ Note for future me

Ngô Ngọc Đức Huy, 2022-01-11

Hmm I often see separate feed for comments on some sites (though I can't find them now), which is quite annoying to me because these comments are separated from the post. I'm not sure if it's just my feed reader not understanding this field, but I guess it's unlikely the case.

Yea not many readers support wfw:commentRss, which is the only semantic way to deliver comments AFAICT, with WFW pepsi for quite a while. It should not be difficult to add it to your favorite reader though so it might be worth a shot discussing with its maintainers.

Follow the anchor in an author's name to reply

Why not just make a "Reply" link instead? Is it intentional to be unintuitive?

Thank you for the feedback, it was not a conscious decision (-;

Re: deleting comments, I wonder how would you do that. Do you have to manually goes through the comments in mbox and delete it every time you regenerate the comments?

lists.sr.ht has moderation features so I'm using them. If one uses a personal mail box, deleting or moving inappropriate comments to a separate folder should do.

Anyway, nice work, I think I might look into implementing something similar at some point.

Thanks d-; I'm looking forward to seeing it!

FWIW, the workaround works:

neomutt "mailto:..."

It is a bit annoying, though, because for some reason Firefox stripped the parameters when I copy it.

Seconded, Firefox does not treat URIs indiscriminately and pretends that mailto is only about the email address )-; Try setting the following script for mailto handling in Firefox:

#!/bin/sh
exec urxvt -e neomutt "$@"

Nguyễn Gia Phong, 2022-01-11

Follow the anchor in an author's name to reply. Please read the rules before commenting.