Agora: an SSG after eleven months of crawling
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad ( void, when )
import Data.Char ( isAlphaNum, toLower )
import Data.List ( sortBy )
import Data.Map.Strict ( Map )
import qualified Data.Map.Strict as Map
import Data.Ord ( Down (..), comparing )
import Data.Text ( Text )
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import Data.Void ( Void )
import Options.Applicative ( ParserInfo, argument, execParser,
fullDesc, header, help, helper,
info, metavar, str )
import System.Directory ( createDirectoryIfMissing,
doesDirectoryExist, listDirectory )
import System.Exit ( ExitCode (..) )
import System.FilePath ( takeBaseName, takeExtension, (</>) )
import System.Process.Typed ( proc, readProcess )
import qualified Text.Megaparsec as M
import Text.Megaparsec ( Parsec, anySingle, between, choice,
eof, many, manyTill, optional,
parse, parseMaybe, satisfy,
skipMany, some, takeWhile1P,
takeWhileP, try, (<|>) )
import Text.Megaparsec.Char ( char, eol, hspace1, newline,
string )
import Text.Read ( readMaybe )
The crawling to ἀγορά 1.0.0
Please note that ἀγορά is absolutely opinionated & has zero configuration support. Unless we share the same tastes, it would be confusing and inefficient for you to use.
In my research for the perfect static site generator (SSG), only Bagatto truly caught my eye. It describes itself as a transparent, extensible SSG, and with its minimal, elegant frontpage, I appreciated Bagatto immediately.
Yet I was still searching for a mechanism that fits my needs exactly. Rather than extensibility/flexibility, I prefer a more compact and accurate tool that aligns with my idiomatic preferences. I believe that only in an absolutely customized, tailored environment can I create work of quality.
I (naïvely) thought a month would be more than enough to formalize this SSG. I started the prototyping phase in april.2025. The first blog was published in july. A month later my mind shifted completely: I erased the project and began to rebuild it from scratch, in a completely opposite manner. I temporarily leaped to Lisp markup but soon transitioned to literate programming. The first blog under the new design was published in november. Another four months later, I now consider the SSG stable. In other words, this is how I crafted ἀγορά, the SSG that does exactly what I want.
Genesis of στοά: a minimal markup language
(Un)fortunately, the revision history of this version of ἀγορά is completely lost.
The first attempt to actually build my own SSG began in april.2025. It was written in NimLang, named stoa/στοά. To be precise, στοά is the surface markup language for ἀγορά. The directory where στοά files are located is called stoae/στοές (the plural of stoa), and stoac is the compiler that transforms στοές into the polis/πόλις (the website). Other aliases follow this central Greek theme.
Why not markdown
It is fundamentally an aesthetic matter. For example, I find | visually prettier than # as the delimiter for headings. As I've mentioned earlier, I believe we are more likely to create something of value when operating in a medium we find most comfortable. If I consider | prettier than #, this single aesthetic deviation is sufficient justification to design an entirely new markup language. From there, I am free to redefine every delimiter and keep a strictly idiomatic set of markup types.
Let's construct a tiny implementation to capture this initial design. We will use the Gen* prefix so this genesis vocabulary stays distinct from the later formal definitions.
Data models for genesis markup
We start from an empty set of markup kinds. The foundational rule is strict minimalism: a markup is added only when absolutely necessary, and it must be given the most preferable, minimal delimiters.
The lowest level of στοά is lexis/λέξις (inline). An inline is either plain text or a decorated text element (like bold or a link). We define the AST for inlines as a simple sum type.
data GenInl
= GenTxt Text
| GenBld Text
| GenLnk Text Text -- label, link
deriving (Show, Eq)
The next structural level is meros/μέρος (block). A block is a discrete paragraph, heading, list, or code snippet. Blocks are generally composed of an array of inlines, except for code blocks which preserve their content as raw, unparsed text.
data GenBlk
= GenPgr [GenInl]
| GenHdg Int [GenInl] -- level, content
| GenLst Int [GenInl] -- level, content
| GenCde Text Text -- lang, content
deriving (Show, Eq)
Parsing without backtracking
We use megaparsec to build our parser.
type GenPar = Parsec Void Text
By default, megaparsec does not backtrack automatically. If a parser consumes any input before failing, alternative branches (<|>) are not tried. While you can force backtracking using the try combinator, this would slow down the parsing process by keeping arbitrary input in memory.
We can treat this limitation as another design constraint: every inline and block markup must possess a globally unique starting delimiter.
For instance, Markdown delimits bold with ** and italic with *. A parser seeing an * doesn't immediately know which branch to commit to, which requires backtracking. In στοά, we avoid this entirely: delimit bold with * and italic with / (similar to Neorg). Because the first character strictly defines the node type, our parser becomes incredibly fast and structurally predictable.
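To see the commit-on-consumption behavior concretely, here is a minimal, self-contained sketch; the ab/ac parsers are illustrative toys, not part of στοά:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type P = Parsec Void String

-- Both branches start with 'a'. Without `try`, the first branch consumes
-- the 'a' before failing on 'b', so the second branch is never attempted.
noTry :: P String
noTry = ((\x y -> [x, y]) <$> char 'a' <*> char 'b')
    <|> ((\x y -> [x, y]) <$> char 'a' <*> char 'c')

-- `try` rewinds the input on failure, at the cost of buffering it.
withTry :: P String
withTry = try ((\x y -> [x, y]) <$> char 'a' <*> char 'b')
      <|> ((\x y -> [x, y]) <$> char 'a' <*> char 'c')

main :: IO ()
main = do
  print (parseMaybe noTry "ac")   -- Nothing: committed after consuming 'a'
  print (parseMaybe withTry "ac") -- Just "ac": `try` restored the input
```

With globally unique leading delimiters, every branch fails (if it fails at all) before consuming input, so the noTry shape is always safe.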
With this zero-backtracking rule in hand, we can implement the lexer/parser.
genInl is the root inline parser. It attempts to parse bold text, a link, or defaults to plain text. The choice combinator evaluates these in order without backtracking, which is safe because *, [, and plain characters do not overlap.
genInl :: GenPar GenInl
genInl = choice [genBld, genLnk, genTxt]
where
-- `genBld` expects an opening `*`, reads characters until a newline or closing `*`,
-- and consumes the closing `*`.
genBld = GenBld <$> between (char '*') (char '*') (takeWhile1P (Just "bold text") (`notElem` ("*\n" :: String)))
-- `genLnk` parses `[label](url)`. It leverages `between` for the label, expecting the exact literal `](` as the bridge,
-- then parses the URL until the closing `)`.
genLnk =
GenLnk
<$> between (char '[') (string "](") (takeWhile1P (Just "link label") (`notElem` ("]\n" :: String)))
<*> takeWhile1P (Just "link URL") (`notElem` (")\n" :: String)) <* char ')'
-- `genTxt` is the fallback inline element.
-- It greedily consumes any character that is not a newline or the start of another inline markup (`*`, `[`).
genTxt = GenTxt <$> takeWhile1P (Just "plain text") (`notElem` ("*[\n" :: String))
Parsing block elements
For simplicity, we won't implement indented blocks here, meaning blocks will always span a single physical line (except for raw code blocks). A block logically terminates at either a newline or the end of the file. We factor this logic into endBlk so the block parsers don't repeat it.
endBlk :: GenPar ()
endBlk = void eol <|> eof
The block lexer evaluates four distinct block types. Since paragraphs do not have a distinct leading symbol, genPgr acts as the unconditional fallback at the end of choice.
genBlk :: GenPar GenBlk
genBlk = choice [genCde, genHdg, genLst, genPgr]
where
-- `genLvl` counts consecutive occurrences of a delimiter character (e.g., `|||`)
-- and uses its length as the integer "level", consuming trailing horizontal whitespace before the content.
genLvl c = T.length <$> takeWhile1P (Just "block marker") (== c) <* hspace1
-- `genInls` applies `genInl` one or more times to construct a block's content array.
genInls = some genInl
-- `genHdg` constructs a heading. It reads the level from `|`, followed by parsed inlines.
genHdg = GenHdg <$> genLvl '|' <*> (genInls <* endBlk)
-- `genLst` constructs a list item. It reads the level from `-`, followed by parsed inlines.
genLst = GenLst <$> genLvl '-' <*> (genInls <* endBlk)
-- `genCde` handles raw code blocks. The code block delimiter is `@` followed by the language string,
-- terminated by a standalone `@end`. It consumes the language line,
-- then greedily collects raw character data (`manyTill anySingle`) until the closing `@end` is reached.
genCde = do
_ <- char '@'
lng <- takeWhileP (Just "language") (/= '\n') <* eol
bod <- T.pack <$> manyTill anySingle (string "@end" *> endBlk)
pure (GenCde lng bod)
-- `genPgr` is the fallback paragraph block.
-- It simply accumulates parsed inline elements until the end of the line.
genPgr = GenPgr <$> (some genInl <* endBlk)
Testing the genesis parser
We can now define a simple test suite to ensure our minimal genesis parser functions correctly.
-- sample test document generated by Gemini-3.1
sampleGenDoc :: Text
sampleGenDoc = T.unlines
[ "| Syntax and Semantics"
, "|| *The Role of Delimiters*"
, "||| [A minimal case study](https://example.com/study)"
, ""
, "The foundation of any markup language relies on plain text, followed by *syntactic forms*, and finally [references](https://example.com/ref) to external domains."
, ""
, "- Structural boundaries"
, "-- *Lexical analysis*"
, "--- [Abstract syntax trees](https://example.com/ast)"
, ""
, "@haskell"
, "genInl :: GenPar GenInl"
, "genInl = choice [genBld, genLnk, genTxt]"
, "@end"
, ""
, "@sh"
, "echo \"a minimal invocation\""
, "echo done"
, "@end"
]
testGenDoc :: IO ()
testGenDoc =
case parseMaybe genDoc sampleGenDoc of
Just blks -> putStrLn . T.unpack $ ppDoc blks
Nothing -> putStrLn "parse failed"
where
genDoc = many (genBlk <* skipMany eol) <* eof
λ> testGenDoc
Hdg 1 "Syntax and Semantics"
Hdg 2 *The Role of Delimiters*
Hdg 3 [A minimal case study](https://example.com/study)
Par "The foundation of any markup language relies on plain text, followed by " *syntactic forms* ", and finally " [references](https://example.com/ref) " to external domains."
Lst 1 "Structural boundaries"
Lst 2 *Lexical analysis*
Lst 3 [Abstract syntax trees](https://example.com/ast)
Cod haskell
genInl :: GenPar GenInl
genInl = choice [genBld, genLnk, genTxt]
Cod sh
echo "a minimal invocation"
echo done
Pretty-printer generated by Sonnet-4.6
You can safely skip this section.
indent :: Int -> Text -> Text
indent n =
T.unlines
. map (T.replicate n " " <>)
. T.lines
quote :: Text -> Text
quote t = "\"" <> escape t <> "\""
where
escape =
T.concatMap $ \c -> case c of
'"' -> "\\\""
'\\' -> "\\\\"
'\n' -> "\\n"
'\t' -> "\\t"
_ -> T.singleton c
ppInl :: GenInl -> Text
ppInl = \case
GenTxt t -> quote t
GenBld t -> "*" <> t <> "*"
GenLnk lbl url -> "[" <> lbl <> "](" <> url <> ")"
ppInls :: [GenInl] -> Text
ppInls = T.unwords . map ppInl
ppBlk :: GenBlk -> Text
ppBlk = \case
GenHdg lvl inls -> "Hdg " <> T.pack (show lvl) <> " " <> ppInls inls
GenLst lvl inls -> "Lst " <> T.pack (show lvl) <> " " <> ppInls inls
GenPgr inls -> "Par " <> ppInls inls
GenCde lng bod -> "Cod " <> lng <> "\n" <> indent 3 bod
ppDoc :: [GenBlk] -> Text
ppDoc = T.unlines . map ppBlk
Essentially, the elements above are already sufficient for a workable (but not decent) markup language. The design of ἀγορά will start from this.
Interlude: an aching departure from S-expressions
Before settling down to literate programming, I once experimented with a completely different paradigm: modeling the markup as an S-expression. The genesis design (which forbids backtracking by requiring unique leading symbols) remains entirely valid here. If we replace surface symbols like * or | with Lisp-style forms, we can maintain the same markup coverage while adopting a strictly nested, minimal syntax.
The transformation is straightforward. For inlines, plain text is wrapped in a basic group, while semantic markups use a keyword tag like :b or :a.
(plain text)
(:b bold text)
(:a (label) (url))
For blocks, we adopt a similar structure. A block starts with a keyword like :p (paragraph), :h (heading), or :l (list), followed by its arguments or inline contents.
(:p <inline> ...)
(:h <level> <inline> ...)
(:l <level> <inline> ...)
(:c lang (line 1) ( line 2))
Notice how the :c form handles code blocks. Instead of quoted strings (which would be tedious to type and require escaping), each code line sits in its own (...) group as raw text. This preserves leading spaces organically without introducing a secondary delimiter scheme. Inline forms also use raw text with no quoted string syntax. Prefixing markup forms with : cleanly avoids ambiguity against plain text groups.
Let's prototype this S-expression approach. We will use Lis* naming to keep this Lisp branch distinct from the earlier Gen* and the upcoming Lit* models.
Data models for Lisp markup
We define the AST using the same basic taxonomy: LisInl for inlines and LisBlk for blocks.
data LisInl
= LisTxt Text
| LisBld Text
| LisLnk Text Text
deriving (Show, Eq)
data LisBlk
= LisPgr [LisInl]
| LisHdg Int [LisInl]
| LisLst Int [LisInl]
| LisCde Text [Text]
deriving (Show, Eq)Parsers for Lisp markup
To parse S-expressions easily, we first need a few foundational helpers. S-expressions depend heavily on whitespace and parentheses.
lisWsp consumes optional whitespace characters. It is used around parentheses where spacing doesn't affect semantics.
lisWsp :: GenPar ()
lisWsp = void $ many (char ' ' <|> char '\t' <|> char '\n' <|> char '\r')
lisSep consumes mandatory whitespace characters. It is used to separate the leading tag from the content (e.g. between :h and 1).
lisSep :: GenPar ()
lisSep = void $ some (char ' ' <|> char '\t' <|> char '\n' <|> char '\r')
lisPrn is a combinator that wraps another parser p in parentheses, absorbing any surrounding padding spaces.
lisPrn :: GenPar a -> GenPar a
lisPrn p = between (lisWsp *> char '(' <* lisWsp) (lisWsp *> char ')' <* lisWsp) p
Now we define how to extract raw text content from inside a group. lisRaw greedily consumes any character until it hits a closing parenthesis. This effectively removes the need for quotes.
lisRaw :: GenPar Text
lisRaw = T.pack <$> many (satisfy (/= ')'))
lisGrp is a full plain text group parser. It expects an opening parenthesis, extracts the raw text via lisRaw, and consumes the closing parenthesis.
lisGrp :: GenPar Text
lisGrp = lisWsp *> char '(' *> lisRaw <* char ')'
For things like heading/list levels, we need to parse integers. lisInt reads continuous digit characters and converts them to an Int.
lisInt :: GenPar Int
lisInt = read . T.unpack <$> takeWhile1P (Just "integer") (`elem` ("0123456789" :: String))
The language label in a code block is parsed as an atom, which is any continuous sequence of non-whitespace, non-parenthesis characters.
lisAtm :: GenPar Text
lisAtm =
takeWhile1P
(Just "atom")
(\c -> c /= '(' && c /= ')' && c /= ' ' && c /= '\t' && c /= '\n' && c /= '\r')
lisTag is a simple helper to match explicit keywords (like :h or :p).
lisTag :: Text -> GenPar ()
lisTag t = void $ string t
With the helpers in place, we can parse inlines. We use M.try for :b and :a because if they fail to match the tag, we need to backtrack and try parsing them as plain text (...), which has no tag.
lisInl :: GenPar LisInl
lisInl = choice [M.try lisBld, M.try lisLnk, lisTxt]
where
lisTxt = LisTxt <$> lisGrp
lisBld = lisPrn $ LisBld <$> (lisTag ":b" *> lisSep *> lisRaw)
lisLnk = lisPrn $ LisLnk <$> (lisTag ":a" *> lisSep *> lisGrp) <*> (lisSep *> lisGrp)
Code lines are just groups of raw text, exactly like plain text inlines.
lisCdeLne :: GenPar Text
lisCdeLne = lisGrp
Blocks follow the same tagged S-expression pattern. We try to match specific block tags (:c, :h, :l), and if none match, we fall back to a paragraph (:p).
lisBlk :: GenPar LisBlk
lisBlk = choice [M.try lisCde, M.try lisHdg, M.try lisLst, lisPgr]
where
lisInls = some lisInl
lisPgr = lisPrn $ LisPgr <$> (lisTag ":p" *> lisSep *> lisInls)
lisHdg = lisPrn $ LisHdg <$> (lisTag ":h" *> lisSep *> lisInt) <*> (lisSep *> lisInls)
lisLst = lisPrn $ LisLst <$> (lisTag ":l" *> lisSep *> lisInt) <*> (lisSep *> lisInls)
lisCde = lisPrn $ LisCde <$> (lisTag ":c" *> lisSep *> lisAtm) <*> (lisSep *> some lisCdeLne)
A whole Lisp document is simply a sequence of blocks surrounded by optional whitespace.
lisDoc :: GenPar [LisBlk]
lisDoc = lisWsp *> many lisBlk <* lisWsp
Testing the Lisp parser
We can now verify this approach with a comprehensive sample document.
sampleLisDoc :: Text
sampleLisDoc = T.unlines
[ "(:h 1 (Syntax and Semantics))"
, "(:h 2 (:b The Role of Delimiters))"
, "(:h 3 (:a (A minimal case study) (https://example.com/study)))"
, "(:p (The foundation of any markup language relies on plain text, followed by ) (:b syntactic forms) (, and finally ) (:a (references) (https://example.com/ref)) ( to external domains.))"
, "(:l 1 (Structural boundaries))"
, "(:l 2 (:b Lexical analysis))"
, "(:l 3 (:a (Abstract syntax trees) (https://example.com/ast)))"
, "(:c haskell (genInl :: GenPar GenInl) (genInl = choice [genBld, genLnk, genTxt]))"
, "(:c sh (echo \"a minimal invocation\") (echo done))"
]
testLisDoc :: IO ()
testLisDoc =
case parseMaybe (lisDoc <* eof) sampleLisDoc of
Just blks -> putStrLn . T.unpack $ ppLisDoc blks
Nothing -> putStrLn "parse failed"
λ> testLisDoc
Hdg 1 "Syntax and Semantics"
Hdg 2 *The Role of Delimiters*
Hdg 3 [A minimal case study](https://example.com/study)
Par "The foundation of any markup language relies on plain text, followed by " *syntactic forms* ", and finally " [references](https://example.com/ref) " to external domains."
Lst 1 "Structural boundaries"
Lst 2 *Lexical analysis*
Lst 3 [Abstract syntax trees](https://example.com/ast)
Cod haskell
genInl :: GenPar GenInl
genInl = choice [genBld, genLnk, genTxt]
Cod sh
echo "a minimal invocation"
echo done
ppLisInl :: LisInl -> Text
ppLisInl = \case
LisTxt txt -> quote txt
LisBld txt -> "*" <> txt <> "*"
LisLnk lbl url -> "[" <> lbl <> "](" <> url <> ")"
ppLisInls :: [LisInl] -> Text
ppLisInls = T.unwords . map ppLisInl
ppLisBlk :: LisBlk -> Text
ppLisBlk = \case
LisHdg lvl inls -> "Hdg " <> T.pack (show lvl) <> " " <> ppLisInls inls
LisLst lvl inls -> "Lst " <> T.pack (show lvl) <> " " <> ppLisInls inls
LisPgr inls -> "Par " <> ppLisInls inls
LisCde lng lns -> "Cod " <> lng <> "\n" <> indent 3 (T.unlines lns)
ppLisDoc :: [LisBlk] -> Text
ppLisDoc = T.unlines . map ppLisBlk
Why not lisp
In pure aesthetic terms, lisp markup can be strikingly beautiful. It surfaces the structural reality of the AST directly into the text, eliminating ad-hoc delimiter collisions and escaping rules entirely. Adding a new markup tag is as trivial as picking a new symbol.
But aesthetics and ergonomics are not always perfectly aligned. The cost of this uniform purity is raw friction in the act of authoring.
Take a standard mixed-format sentence: (:p (The foundation of any markup language relies on plain text, followed by ) (:b syntactic forms) (, and finally ) (:a (references) (https://example.com/ref)) ( to external domains.))
For a structural engineer or a parser, this is crystal clear. For a writer in flow, it's a nightmare. You're forced to aggressively slice your continuous thoughts into explicitly segmented nodes while maintaining parenthesis balance manually. What used to be a simple fluid string (plain text, *a claim*, and [evidence]...) becomes a cognitive load of opening, closing, and nesting groups.
Also, editing existing prose becomes brittle. Deleting a word that happens to cross a boundary forces you to restructure the tree. If you want to bold a phrase that was previously part of a larger (...) block, you must fracture that block into three new sibling elements.
So while Lisp markup solved our parsing rigidity and uniformity problems gracefully, the writing experience was unbearable for long-form prose. This realization pushed me toward the final iteration of ἀγορά.
A leap to literate programming
Most of my blogs are code-heavy, and it is strictly necessary to guarantee that all the embedded code correctly compiles. In the traditional workflow where we write in a dedicated markup file (e.g., foo.stoa), we're forced to build tooling that extracts code blocks and passes them to the underlying compiler (like ghc).
This creates massive friction: compiler errors map to the wrong line numbers, language servers fail to provide inline feedback, formatters break, and editor syntax highlighting struggles.
This structural pain naturally led me to literate programming. Instead of writing markup that contains code, we write code that contains markup. The source file is a completely valid foo.hs or foo.rkt file that natively compiles in its ecosystem. The prose resides inside the native comments of that language.
To keep this parser language-agnostic, the core invariant rules are:
Code blocks are no longer a markup construct.
Prose is written exclusively inside standalone, single-line comments.
A comment block is considered "prose" if it is contiguous and isolated (i.e., bounded by empty lines or other prose lines, but never hugging code directly).
Comments attached directly to code (like a type signature explanation) remain raw code.
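As a tiny illustration, consider this hypothetical literate fragment (itself a valid Haskell file); the comments annotate how the rules above classify each line:

```haskell
-- | An isolated comment block: blank lines on both sides, so this is prose.

-- Also prose: bounded by empty lines above and below.

-- This comment hugs the code below, so it stays inside the code block.
answer :: Int
answer = 42 -- trailing comments are not standalone lines; they remain code
```

Only the first two comment blocks would be parsed as markup; everything from the third comment onward survives verbatim as code.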
Let's construct this final, stable version of ἀγορά using the Lit* prefix.
Data models for literate markup
Because we are back in plain text comments, we can revert to the exact Gen* syntax rules (e.g., *bold*, [label](url), | heading) for our inline elements.
data LitInl
= LitTxt Text
| LitBld Text
| LitLnk Text Text
deriving (Show, Eq)
litInl :: GenPar LitInl
litInl = choice [litBld, litLnk, litTxt]
where
litBld = LitBld <$> between (char '*') (char '*') (takeWhile1P (Just "bold text") (`notElem` ("*\n" :: String)))
litLnk =
LitLnk
<$> between (char '[') (string "](") (takeWhile1P (Just "link label") (`notElem` ("]\n" :: String)))
<*> takeWhile1P (Just "link URL") (`notElem` (")\n" :: String)) <* char ')'
litTxt = LitTxt <$> takeWhile1P (Just "plain text") (`notElem` ("*[\n" :: String))
The block model is also identical to the genesis version, except the code block (LitCde) no longer takes a language string. The entire file is assumed to be written in the host language (e.g., Haskell), so we just store the raw code text.
data LitBlk
= LitHdg Int [LitInl]
| LitLst Int [LitInl]
| LitPgr [LitInl]
| LitCde Text
deriving (Show, Eq)
Parsing literate blocks
The parser for the prose block is a simple choice among headings, lists, and paragraphs. Notice there is no code block parser here, because code is handled completely outside the markup parsing logic.
litPrsBlk :: GenPar LitBlk
litPrsBlk = choice [litHdg, litLst, litPgr]
where
litLvl c = T.length <$> takeWhile1P (Just "block marker") (== c) <* hspace1
litInls = some litInl
litHdg = LitHdg <$> litLvl '|' <*> (litInls <* eof)
litLst = LitLst <$> litLvl '-' <*> (litInls <* eof)
litPgr = LitPgr <$> (litInls <* eof)Extracting prose from code
We assume a Haskell file here, so a single-line comment is prefixed with --. In a generalized engine, this prefix would be selected dynamically based on the file extension.
isEmp checks if a line is completely empty (or just spaces).
isEmp :: Text -> Bool
isEmp = T.null . T.strip
isCmt checks if a line starts with the host language's comment prefix.
isCmt :: Text -> Bool
isCmt = T.isPrefixOf "--" . T.stripStart
The heuristic for isolating prose from code documentation is adjacency. isPrs determines whether the current comment line is prose by checking its neighbors. A line is prose if it is a comment and both its previous and next lines are either empty lines or other comments. If a comment touches code, it is treated as a code comment.
isPrs :: Maybe Text -> Text -> Maybe Text -> Bool
isPrs prv cur nxt =
isCmt cur
&& maybe True (\ln -> isEmp ln || isCmt ln) prv
&& maybe True (\ln -> isEmp ln || isCmt ln) nxt
Once we've identified a line as prose, we need to extract its content. strCmt drops the comment prefix and the single mandatory space following it.
strCmt :: Text -> Text
strCmt ln =
if isCmt ln
then T.dropWhile (== ' ') . T.drop 2 . T.stripStart $ ln
else ln
prsLne applies our markup parser litPrsBlk to the stripped comment body. If the line is empty after stripping, it returns nothing. If the parser fails (which shouldn't happen due to the LitPgr fallback), it defaults to a plain text paragraph.
prsLne :: Text -> [LitBlk]
prsLne ln =
case T.strip bod of
"" -> []
_ -> [maybe (LitPgr [LitTxt bod]) id (parseMaybe litPrsBlk bod)]
where
bod = strCmt ln
Lexing the literate document
We lex the whole literate file line by line, grouping lines into contiguous chunks of either prose or code.
toCde safely bundles a list of accumulated raw code lines into a single LitCde block.
toCde :: [Text] -> [LitBlk]
toCde lns =
case T.intercalate "\n" lns of
"" -> []
txt -> [LitCde txt]
litFromLns is the core lexer. It steps through the file using a sliding window of three lines (previous, current, next) to satisfy the isPrs adjacency rules. It accumulates consecutive code lines into cde and parses prose lines instantly into acc, flushing the code buffer whenever a prose block begins.
litFromLns :: [Text] -> [LitBlk]
litFromLns lns = go trip [] []
where
prv = Nothing : map Just lns
nxt = map Just (drop 1 lns) <> [Nothing]
trip = zip3 prv lns nxt
go [] acc cde = acc <> toCde cde
go ((p, l, n) : rst) acc cde
| isPrs p l n = go rst (acc <> toCde cde <> prsLne l) []
| isEmp l =
case cde of
[] -> go rst acc []
_ -> go rst acc (cde <> [l])
| otherwise = go rst acc (cde <> [l])Finally, litDoc ties it all together: it consumes the entire file as raw text, splits it into lines, and feeds it through the lexer.
litDoc :: GenPar [LitBlk]
litDoc = litFromLns . T.lines <$> takeWhileP (Just "source text") (const True)
testLitDoc :: IO ()
testLitDoc =
case parseMaybe (litDoc <* eof) sampleLitDoc of
Just blks -> putStrLn . T.unpack $ ppLitDoc blks
Nothing -> putStrLn "parse failed"
sampleLitDoc :: Text
sampleLitDoc = T.unlines
[ "-- | Syntax and Semantics"
, "--"
, "-- The foundation of any markup language relies on plain text, followed by *syntactic forms*, and finally [references](https://example.com/ref) to external domains."
, "-- - Structural boundaries"
, "-- -- *Lexical analysis*"
, "-- --- [Abstract syntax trees](https://example.com/ast)"
, ""
, "genInl :: GenPar GenInl"
, "genInl = choice [genBld, genLnk, genTxt]"
, "-- comment after code: stays in code block"
, ""
, "-- comment before code: stays in code block"
, "echo :: Text"
, "echo = \"a minimal invocation\""
]
λ> testLitDoc
Hdg 1 "Syntax and Semantics"
Par "The foundation of any markup language relies on plain text, followed by " *syntactic forms* ", and finally " [references](https://example.com/ref) " to external domains."
Lst 1 "Structural boundaries"
Lst 2 *Lexical analysis*
Lst 3 [Abstract syntax trees](https://example.com/ast)
Cod
genInl :: GenPar GenInl
genInl = choice [genBld, genLnk, genTxt]
-- comment after code: stays in code block
-- comment before code: stays in code block
echo :: Text
echo = "a minimal invocation"
ppLitInl :: LitInl -> Text
ppLitInl = \case
LitTxt txt -> quote txt
LitBld txt -> "*" <> txt <> "*"
LitLnk lbl url -> "[" <> lbl <> "](" <> url <> ")"
ppLitInls :: [LitInl] -> Text
ppLitInls = T.unwords . map ppLitInl
ppLitBlk :: LitBlk -> Text
ppLitBlk = \case
LitHdg lvl inls -> "Hdg " <> T.pack (show lvl) <> " " <> ppLitInls inls
LitLst lvl inls -> "Lst " <> T.pack (show lvl) <> " " <> ppLitInls inls
LitPgr inls -> "Par " <> ppLitInls inls
LitCde txt -> "Cod\n" <> indent 3 txt
ppLitDoc :: [LitBlk] -> Text
ppLitDoc = T.unlines . map ppLitBlk
The anatomy of ἀγορά
Having walked through three approaches to markup (surface syntax, S-expressions, and literate programming), we can now build the actual site generator. The preceding sections established the markup vocabulary: headings (| heading, || heading, ||| heading), paragraphs, lists (- item, -- item), and literate code blocks. This section implements the full pipeline: from a directory of stoa-haskell files (στοές) to a published website with blog pages, a homepage, tag pages, and an RSS feed (πόλις).
The pipeline has five stages.
A file is scanned to separate isolated comment blocks (prose) from the surrounding code.
The first comment block is parsed as metadata.
The remaining blocks are parsed into an AST.
Code blocks are syntax-highlighted by invoking tree-sitter on the original source file.
The AST is rendered to HTML and written to disk.
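The five stages can be sketched as a toy pipeline of stubbed functions; every name and body below (scanRgns, readMeta, parseAst, highlight, render) is a placeholder I invented to show the staging, not the actual implementation that follows:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- A stubbed sketch of the five-stage pipeline; only the stage boundaries
-- mirror the description above, every body is a placeholder.
import Data.Text (Text)
import qualified Data.Text as T

type Rgns = [Text]   -- stand-in for the region list of stage 1
type Meta = Text     -- stand-in for the parsed metadata of stage 2
type Ast  = [Text]   -- stand-in for the block AST of stage 3

scanRgns :: Text -> Rgns                 -- 1. separate prose from code
scanRgns = T.lines

readMeta :: Rgns -> (Meta, Rgns)         -- 2. the first block is metadata
readMeta []       = ("", [])
readMeta (m : bs) = (m, bs)

parseAst :: Rgns -> Ast                  -- 3. parse the remaining blocks
parseAst = id

highlight :: Ast -> Ast                  -- 4. would shell out to tree-sitter
highlight = map (\l -> "<code>" <> l <> "</code>")

render :: Meta -> Ast -> Text            -- 5. emit HTML
render met ast = T.unlines (("<h1>" <> met <> "</h1>") : ast)

pipeline :: Text -> Text
pipeline src =
  let (met, rest) = readMeta (scanRgns src)
  in render met (highlight (parseAst rest))
```

The real types (Rgn, Met, Blk, Doc) and stage implementations are developed in the rest of this section.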
Inline and block types
The inline and block types mirror those from the genesis and literate sections, with one structural addition: headings carry a pre-computed element id alongside their content, used both as an HTML anchor and as the identifier for the sidebar outline. Code blocks store a string of line-number placeholders ({{n}}) which are later substituted with highlighted HTML by the tree-sitter stage.
data Inl
= Txt Text
| Bld Text
| Lnk Text Text
deriving (Show, Eq)
data Blk
= Hdg Int [Inl] Text -- level, content, element id
| Par [Inl]
| Lst Int [Inl]
| Cod Text -- highlighted HTML lines (or raw code in dry mode)
deriving (Show, Eq)
Each article page carries a sidebar outline extracted from the document's headings. Only headings at levels one through three are included.
data Out = Out
{ outTxt :: Text
, outLvl :: Int
, outId :: Text
} deriving (Show, Eq)
The metadata lives in the first isolated comment block of every stoa-haskell file, encoded as key = value pairs. The description field may span multiple lines.
data Met = Met
{ metTtl :: Text
, metPub :: Text
, metTag :: Text
, metDsc :: Text
} deriving (Show, Eq)
A parsed article combines its metadata, block list, outline, and original file path. The path is kept so that tree-sitter can be invoked on the correct source file.
data Doc = Doc
{ docMet :: Met
, docBlk :: [Blk]
, docOut :: [Out]
, docPth :: FilePath
} deriving (Show, Eq)
Tags aggregate documents under a shared label, used to build per-tag index pages.
data Tag = Tag
{ tagNam :: Text
, tagDoc :: [Doc]
} deriving (Show, Eq)
Region extraction
A stoa-haskell file interleaves prose (isolated comment blocks) and code. The first task is to partition the file into a sorted sequence of regions. A region is either a comment block or a code span.
data Rgn
= RCom Text Int Int -- stripped comment text, first line, last line
| RCod Int Int -- code span, first line, last line
deriving (Show, Eq)As described in literate section, a comment block is isolated when the line immediately before it and the line immediately after it are both empty (or do not exist). Comments attached directly to code are treated as code.
findComs scans the indexed line list for sequences of comment lines that satisfy the isolation condition.
findComs :: [(Int, Text)] -> [(Text, Int, Int)]
findComs [] = []
findComs all = go all
where
go [] = []
go ((n, l) : rest)
| isCmt l =
let (blk, aft) = span (isCmt . snd) ((n, l) : rest)
befOk = n == 1 || maybe True isEmp (lookup (n - 1) all)
aftOk = case aft of { [] -> True; ((_, h) : _) -> isEmp h }
in if befOk && aftOk
then mkCom blk : go aft
else go rest
| otherwise = go rest
mkCom bundles a contiguous sequence of comment lines into (strippedText, firstLine, lastLine). It strips the -- prefix and the single space following it from each line.
mkCom :: [(Int, Text)] -> (Text, Int, Int)
mkCom [] = error "mkCom: empty block"
mkCom lns@((b, _) : _) =
( T.intercalate "\n" (map (strip1 . snd) lns)
, b
, fst (last lns)
)
strip1 removes the -- comment prefix and the leading space after it.
strip1 :: Text -> Text
strip1 ln =
case T.stripPrefix "--" (T.stripStart ln) of
Just rst -> case T.uncons rst of
Just (' ', r) -> r
_ -> rst
Nothing -> ln
codeGaps fills in the code regions that exist between and around the comment blocks. It takes the sorted comment block list and the total line count, then generates RCod entries for every line range not covered by a comment block.
codeGaps :: [(Text, Int, Int)] -> Int -> [Rgn]
codeGaps [] tot = [RCod 1 tot | tot > 0]
codeGaps all@((_, b0, _) : _) tot = bef ++ gaps ++ aft
where
bef = [RCod 1 (b0 - 1) | b0 > 1]
gaps = [ RCod (ea + 1) (bb - 1)
| ((_, _, ea), (_, bb, _)) <- zip all (drop 1 all)
, ea + 1 <= bb - 1
]
(_, _, le) = last all
aft = [RCod (le + 1) tot | le < tot]
rgnBeg extracts the start line from a region, used for sorting.
rgnBeg :: Rgn -> Int
rgnBeg (RCom _ b _) = b
rgnBeg (RCod b _) = b
extLns is the pure core of extraction. It takes the file's lines, finds all isolated comment blocks, fills in code gaps, and returns a sorted region list.
extLns :: [Text] -> [Rgn]
extLns lns =
let idx = zip [1..] lns
tot = length lns
cms = findComs idx
cds = codeGaps cms tot
in sortBy (comparing rgnBeg) (map toCom cms ++ cds)
where
toCom (txt, b, e) = RCom txt b e
ext is the IO wrapper: it reads the file from disk and delegates to extLns.
ext :: FilePath -> IO [Rgn]
ext pth = extLns . T.lines <$> TIO.readFile pth
Metadata parsing
The first isolated comment block of every stoa-haskell file is its metadata header. It contains four required fields in key = value format. The description field may continue on subsequent indented lines.
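A header therefore has the shape sketched below. The naive splitter `pairs` is a hypothetical illustration of the key = value shape only; the real pMet below additionally folds indented description continuation lines.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- naive "key = value" splitter (sketch): keep only lines containing '='
pairs :: Text -> [(Text, Text)]
pairs t =
  [ (T.strip k, T.strip (T.drop 1 v))
  | ln <- T.lines t
  , let (k, v) = T.breakOn "=" ln
  , not (T.null v)
  ]

hdr :: Text
hdr = T.unlines
  [ "title = A small parsing exercise"
  , "pubDate = 2026-01-15"
  , "tags = haskell, parsing"
  , "description = Walking through a small parser."
  ]

main :: IO ()
main = mapM_ print (pairs hdr)
```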
pMet reads a sequence of key-value pairs and folds them into a Met record, which is then validated before being returned.
pMet :: GenPar Met
pMet = do
kvs <- many (pKV <* optional newline)
eof
pure (foldl apply (Met "" "" "" "") kvs)
pKV reads a single key = value entry. For the description key it delegates to pCont to collect any continuation lines.
pKV :: GenPar (Text, Text)
pKV = do
key <- T.strip <$> takeWhile1P (Just "key") (/= '=')
_ <- char '='
val <- T.strip <$> takeWhileP (Just "value") (/= '\n')
rst <- if key == "description" then pCont val else pure val
pure (key, rst)
pCont greedily consumes continuation lines. A line continues the description if it follows a newline and does not look like a new key = ... entry (detected by pKVLook).
pCont :: Text -> GenPar Text
pCont acc = do
nxt <- optional (try (newline *> M.notFollowedBy (eof <|> void newline <|> void pKVLook) *> pLn))
case nxt of
Nothing -> pure acc
Just ln -> pCont (acc <> " " <> T.strip ln)
where
pLn = takeWhileP Nothing (/= '\n')
pKVLook = takeWhile1P Nothing (\c -> c /= '=' && c /= '\n') *> char '=' *> pure ()
apply folds a single key-value pair into the Met record.
apply :: Met -> (Text, Text) -> Met
apply m ("title", v) = m { metTtl = v }
apply m ("pubDate", v) = m { metPub = v }
apply m ("tags", v) = m { metTag = v }
apply m ("description", v) = m { metDsc = v }
apply m _ = m
validate checks that all four required fields are present and that pubDate has the yyyy-mm-dd shape (validDate checks length and dash positions only).
validate :: Met -> Either String Met
validate m
| T.null (metTtl m) = Left "missing 'title'"
| T.null (metPub m) = Left "missing 'pubDate'"
| not (validDate (metPub m)) = Left "invalid pubDate, expected yyyy-mm-dd"
| T.null (metTag m) = Left "missing 'tags'"
| T.null (metDsc m) = Left "missing 'description'"
| otherwise = Right m
validDate :: Text -> Bool
validDate t =
T.length t == 10
&& T.index t 4 == '-'
&& T.index t 7 == '-'
met is the public entry point: it parses the stripped metadata text and validates the result.
met :: Text -> Either String Met
met txt = case parse pMet "<meta>" txt of
Left err -> Left (M.errorBundlePretty err)
Right m -> validate m
tags splits the comma-separated tag string into a list of trimmed tag names.
tags :: Text -> [Text]
tags = filter (not . T.null) . map T.strip . T.splitOn ","
pageId derives the page identifier from the source file's base name (without extension).
pageId :: FilePath -> Text
pageId = T.pack . takeBaseName
Content parsing
With metadata extracted, the remaining regions are converted into Blk values. Comment regions are parsed line-by-line into headings, lists, and paragraphs. Code regions become Cod blocks whose content is a sequence of line-number placeholder strings like {{42}}\n{{43}}\n..., later resolved by the highlighting stage.
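The line-oriented comment markup can be pictured with a tiny classifier (a sketch; `kind` is a hypothetical helper, and the real parseLines below additionally folds paragraph continuation lines):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- classify one stripped comment line by its leading delimiter
kind :: Text -> String
kind ln
  | "|" `T.isPrefixOf` ln =
      "heading level " ++ show (T.length (T.takeWhile (== '|') ln))
  | "-" `T.isPrefixOf` ln =
      "list item level " ++ show (T.length (T.takeWhile (== '-') ln))
  | otherwise = "paragraph text"

main :: IO ()
main = mapM_ (putStrLn . kind) ["| top", "|| nested", "- item", "plain prose"]
```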
mrk is the top-level content parser. It drops the first region (the metadata comment) and maps the rest to blocks, then extracts the outline.
mrk :: [Rgn] -> ([Blk], [Out])
mrk rgns =
let blks = concatMap rgnToBlks rgns
in (blks, mkOut blks)
rgnToBlks dispatches on region type. For comment regions it calls parseLines on the stripped text; for code regions it synthesises one Cod block whose body is a newline-joined sequence of {{n}} placeholders for each line in the range.
rgnToBlks :: Rgn -> [Blk]
rgnToBlks (RCom txt _ _) = parseLines (T.lines txt)
rgnToBlks (RCod s e) =
let ph = T.intercalate "\n" ["{{" <> T.pack (show n) <> "}}" | n <- [s..e]]
in [Cod ph]
parseLines iterates over the lines of a comment block. Blank lines are skipped. Heading and list lines are consumed one at a time. Paragraphs accumulate continuation lines until they hit a blank line or a block start.
parseLines :: [Text] -> [Blk]
parseLines = go
where
go [] = []
go (l : ls)
| isEmp (T.strip l) = go ls
| "|" `T.isPrefixOf` ln =
let (lvl, rst) = prefLvl '|' ln
in Hdg lvl (pInls rst) (genId rst) : go ls
| "-" `T.isPrefixOf` ln =
let (lvl, rst) = prefLvl '-' ln
in Lst lvl (pInls rst) : go ls
| otherwise =
let (cont, rest) = span (\x -> not (isEmp (T.strip x)) && not (isBlkStart (T.strip x))) ls
txt = T.intercalate " " (ln : map T.strip cont)
in Par (pInls txt) : go rest
where ln = T.strip l
prefLvl counts leading repetitions of a delimiter character and returns the level alongside the remaining content. For example, prefLvl '|' "|| heading" gives (2, "heading").
prefLvl :: Char -> Text -> (Int, Text)
prefLvl c t =
let n = T.length (T.takeWhile (== c) t)
in (n, T.strip (T.drop n t))
isBlkStart recognises lines that open a new block, so paragraph continuation stops at the right point.
isBlkStart :: Text -> Bool
isBlkStart t = "|" `T.isPrefixOf` t || "-" `T.isPrefixOf` t
Inline parsing reuses the zero-backtracking design from the genesis section. pInls applies pInl repeatedly over a single line of text.
pInls :: Text -> [Inl]
pInls t = case parseMaybe pManyInl t of
Just is -> is
Nothing -> [Txt t]
pManyInl :: GenPar [Inl]
pManyInl = many pInl <* eof
pInl :: GenPar Inl
pInl = choice [try pBld, try pLnk, pTxt']
pBld :: GenPar Inl
pBld = Bld <$> between (char '*') (char '*')
(takeWhile1P (Just "bold text") (`notElem` ("*\n" :: String)))
pLnk :: GenPar Inl
pLnk =
Lnk
<$> between (char '[') (string "](") (takeWhile1P (Just "link label") (`notElem` ("]\n" :: String)))
<*> takeWhile1P (Just "url") (`notElem` (")\n" :: String))
<* char ')'
pTxt' :: GenPar Inl
pTxt' = Txt . T.singleton <$> anySinglegenId transforms heading text into a valid HTML id: spaces become dashes, non-alphanumeric characters are dropped, and everything is lowercased.
genId :: Text -> Text
genId = T.pack . go . T.unpack
where
go [] = []
go (c : cs)
| c == ' ' || c == '\t' = '-' : go cs
| isAlphaNum c || c == '-' || c == '_' = toLower c : go cs
| otherwise = go cs
plain extracts the plain text from a list of inlines, used when generating outline entries.
plain :: [Inl] -> Text
plain = T.concat . map go
where
go (Txt t) = t
go (Bld t) = t
go (Lnk lbl _) = lbl
mkOut collects headings at levels one through three into the document outline.
mkOut :: [Blk] -> [Out]
mkOut blks =
[ Out { outTxt = plain cnt, outLvl = lvl, outId = eid }
| Hdg lvl cnt eid <- blks
, lvl <= 3
]
Syntax highlighting
The syntax highlighting design works in two passes. During content parsing, every code region becomes a Cod block whose body is a newline-separated list of placeholder lines: {{1}}\n{{2}}\n...{{n}}. These refer back to the line numbers of the original source file.
In the second pass, tree-sitter is invoked on the original file with --css-classes and HTML output. It returns a table of per-line highlighted HTML. hlt reads this table into a Map Int Text and substitutes each {{n}} placeholder with the corresponding highlighted line.
Because tree-sitter operates on the whole file, every language construct is highlighted correctly, and the result integrates into the blog page as pre-escaped HTML.
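A toy end-to-end illustration of the two-pass scheme, using a hand-built line map in place of tree-sitter output (a sketch; `resolve` and `placeholders` are hypothetical names):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T
import Text.Read (readMaybe)

-- pass 1: a code region covering source lines 2..3 becomes placeholders
placeholders :: Text
placeholders = "{{2}}\n{{3}}"

-- pass 2: resolve each {{n}} line against a per-line map (here the raw
-- source; in the real pipeline the values are highlighted HTML)
resolve :: Map Int Text -> Text -> Text
resolve lmp = T.intercalate "\n" . map one . T.lines
  where
    one l = maybe l (\n -> Map.findWithDefault "" n lmp) (ph l)
    ph l
      | "{{" `T.isPrefixOf` l && "}}" `T.isSuffixOf` l =
          readMaybe (T.unpack (T.drop 2 (T.dropEnd 2 l)))
      | otherwise = Nothing

main :: IO ()
main = print (resolve (Map.fromList [(1, "a"), (2, "b"), (3, "c")]) placeholders)
```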
parsePh recognises a {{n}} placeholder and extracts the line number.
parsePh :: Text -> Maybe Int
parsePh t
| "{{" `T.isPrefixOf` t && "}}" `T.isSuffixOf` t =
readMaybe (T.unpack (T.drop 2 (T.dropEnd 2 t)))
| otherwise = Nothing
subst replaces every {{n}} line in a Cod block with its corresponding entry from the line map.
subst :: Map Int Text -> Blk -> Blk
subst lmp (Cod txt) =
let lns = T.lines txt
out = map (\l -> case parsePh l of
Just n -> Map.findWithDefault "" n lmp
Nothing -> l) lns
res = T.dropWhileEnd (== '\n') (T.intercalate "\n" out)
in Cod res
subst _ blk = blk
findAfter advances the cursor past the first occurrence of a needle string, returning the remainder.
findAfter :: Text -> Text -> Maybe Text
findAfter needle hay =
let (_, aft) = T.breakOn needle hay
in if T.null aft then Nothing
else Just (T.drop (T.length needle) aft)
parseHlt parses the HTML table emitted by tree-sitter highlight -H --css-classes into a map from line number to highlighted HTML. Each row looks like: <tr><td class=line-number>N</td><td class=line>...highlighted HTML...</td></tr>
parseHlt :: Text -> Map Int Text
parseHlt = go Map.empty
where
trS = "<tr><td class=line-number>"
trM = "</td><td class=line>"
trE = "</td></tr>"
go acc txt
| T.null txt = acc
| Just aft <- findAfter trS txt =
case T.breakOn trM aft of
(num, rst) | T.null rst -> acc
| otherwise ->
let cnt = T.drop (T.length trM) rst
in case T.breakOn trE cnt of
(ln, rst') | T.null rst' -> acc
| otherwise ->
case readMaybe (T.unpack (T.strip num)) of
Just n -> go (Map.insert n (T.dropWhileEnd (== '\n') ln) acc)
(T.drop (T.length trE) rst')
Nothing -> go acc (T.drop 1 rst')
| otherwise = acc
treeSit calls tree-sitter on the source file and returns the parsed line map.
treeSit :: FilePath -> IO (Map Int Text)
treeSit pth = do
(code, out, _) <- readProcess (proc "tree-sitter" ["highlight", "-H", "--css-classes", pth])
case code of
ExitFailure _ -> pure Map.empty
ExitSuccess -> pure (parseHlt (TL.toStrict (TLE.decodeUtf8 out)))
hlt runs the full highlighting pass for a document.
hlt :: FilePath -> [Blk] -> IO [Blk]
hlt pth blks = do
lmp <- treeSit pth
pure (map (subst lmp) blks)
hltDry is a pure variant used in tests. Instead of calling tree-sitter, it resolves {{n}} placeholders directly from the source lines, yielding the original (unhighlighted) code text.
hltDry :: [Text] -> [Blk] -> [Blk]
hltDry lns blks =
let lmp = Map.fromList (zip [1..] lns)
in map (go lmp) blks
where
go lmp (Cod txt) =
let out = map (\l -> case parsePh l of
Just n -> Map.findWithDefault "" n lmp
Nothing -> l) (T.lines txt)
in Cod (T.intercalate "\n" out)
go _ blk = blk
HTML generation
HTML is generated as plain Text using string concatenation. This keeps the implementation self-contained and avoids coupling the blog post to a template library.
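For instance, a generic element wrapper under this approach is nothing but concatenation (a sketch; `tag` is a hypothetical helper, not part of the generator):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- wrap a body in an opening and closing tag by plain concatenation
tag :: Text -> Text -> Text
tag name body = "<" <> name <> ">" <> body <> "</" <> name <> ">"

main :: IO ()
main = putStrLn (T.unpack (tag "p" (tag "strong" "hi")))
```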
escHtm escapes the five characters that are significant in HTML: &, <, >, ", and '. It is applied to all user-supplied text before it is embedded in HTML.
escHtm :: Text -> Text
escHtm = T.concatMap go
where
go '&' = "&amp;"
go '<' = "&lt;"
go '>' = "&gt;"
go '"' = "&quot;"
go '\'' = "&#39;"
go c = T.singleton c
inlHtm renders a list of inline elements to HTML. Bold becomes <strong>, links become <a href=...>.
inlHtm :: [Inl] -> Text
inlHtm = T.concat . map go
where
go (Txt t) = escHtm t
go (Bld t) = "<strong>" <> escHtm t <> "</strong>"
go (Lnk lbl u) = "<a href=\"" <> escHtm u <> "\">" <> escHtm lbl <> "</a>"
blkHtm renders a single block. Code blocks emit pre-escaped HTML (already processed by tree-sitter); all other content goes through escHtm.
blkHtm :: Blk -> Text
blkHtm (Hdg lvl cnt eid) = hN lvl eid (inlHtm cnt)
blkHtm (Par cnt) = "<p>" <> inlHtm cnt <> "</p>\n"
blkHtm (Lst lvl cnt) = "<p class=\"l" <> T.pack (show lvl) <> "-list\">"
<> inlHtm cnt <> "</p>\n"
blkHtm (Cod txt) = "<pre><code>" <> txt <> "</code></pre>\n"
hN wraps content in the appropriate heading tag with an id attribute.
hN :: Int -> Text -> Text -> Text
hN n eid cnt =
let tag = "h" <> T.pack (show (min n 6))
in "<" <> tag <> " id=\"" <> eid <> "\">" <> cnt <> "</" <> tag <> ">\n"
page is the full HTML shell. The css parameter is the relative path to the stylesheet directory, which differs between blog pages (../styles/) and the home page (styles/).
page :: Text -> Text -> Text -> Text
page css ttl bdy = T.concat
[ "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n"
, "<meta charset=\"UTF-8\">\n"
, "<title>" <> escHtm ttl <> "</title>\n"
, "<link rel=\"stylesheet\" href=\"" <> css <> "prima.css\">\n"
, "</head>\n<body>\n"
, bdy
, "</body>\n</html>\n"
]
outItem renders a single outline entry as a navigation link.
outItem :: Out -> Text
outItem o =
"<a href=\"#" <> outId o <> "\" class=\"h" <> T.pack (show (outLvl o))
<> "\">" <> escHtm (outTxt o) <> "</a>\n"
tagLinks renders a comma-separated list of tag links given a path prefix.
tagLinks :: Text -> [Text] -> Text
tagLinks pfx tgs =
T.intercalate ", "
(map (\t -> "<a href=\"" <> pfx <> t <> ".html\">" <> escHtm t <> "</a>") tgs)
blogPage assembles the full article page: an aside with the document outline and a main section with the title, publication date, tags, description, and rendered blocks.
blogPage :: Doc -> [Text] -> Text
blogPage doc tgs = page "../styles/" (metTtl m) bdy
where
m = docMet doc
bdy = T.concat
[ "<aside class=\"outline-sidebar\">\n"
, "<nav class=\"outline\">\n"
, T.concat (map outItem (docOut doc))
, "</nav>\n</aside>\n"
, "<main class=\"main-content\">\n"
, "<h1>" <> escHtm (metTtl m) <> "</h1>\n"
, "<div class=\"metadata\">"
, "<span class=\"date\">" <> metPub m <> "</span> "
, "<span class=\"tags\">" <> tagLinks "../tags/" tgs <> "</span>"
, "</div>\n"
, "<div class=\"description\">" <> escHtm (metDsc m) <> "</div>\n"
, T.concat (map blkHtm (docBlk doc))
, "</main>\n"
]
blogEntry renders the summary card for a post as it appears on the home page and tag pages.
blogEntry :: Text -> Text -> Doc -> [Text] -> Text -> Text
blogEntry blgPfx tagPfx doc tgs pid = T.concat
[ "<article class=\"blog-entry\">\n"
, "<h2><a href=\"" <> blgPfx <> pid <> ".html\">"
, escHtm (metTtl (docMet doc)) <> "</a></h2>\n"
, "<div class=\"date\">" <> metPub (docMet doc) <> "</div>\n"
, "<div class=\"description\">" <> escHtm (metDsc (docMet doc)) <> "</div>\n"
, "<div class=\"tags\">" <> tagLinks tagPfx tgs <> "</div>\n"
, "</article>\n"
]
homePage lists all posts in reverse-chronological order.
homePage :: [(Doc, [Text], Text)] -> Text
homePage docs = page "styles/" "home" bdy
where
bdy = "<main class=\"main-content\">\n<h1>all posts</h1>\n"
<> T.concat (map (\(d, t, p) -> blogEntry "blogs/" "tags/" d t p) docs)
<> "</main>\n"
tagPage lists all posts sharing a given tag.
tagPage :: Tag -> [(Doc, [Text], Text)] -> Text
tagPage tag docs = page "../styles/" ("tag: " <> tagNam tag) bdy
where
bdy = "<main class=\"main-content\">\n<h1>tag: "
<> escHtm (tagNam tag) <> "</h1>\n"
<> T.concat (map (\(d, t, p) -> blogEntry "../blogs/" "../tags/" d t p) docs)
<> "</main>\n"
RSS feed
The RSS feed lists every article as an <item> element. The description is wrapped in a <![CDATA[...]]> section so it can contain arbitrary text without needing XML escaping.
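A typical item (with hypothetical values) looks like:

```xml
<item>
<title>A small parsing exercise</title>
<link>https://example.com/blogs/sample1.html</link>
<pubDate>2026-01-15</pubDate>
<description><![CDATA[Walking through a small parser.]]></description>
</item>
```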
escXml escapes the five XML-significant characters, used in the title and link elements.
escXml :: Text -> Text
escXml = T.concatMap go
where
go '&' = "&amp;"
go '<' = "&lt;"
go '>' = "&gt;"
go '"' = "&quot;"
go '\'' = "&apos;"
go c = T.singleton c
rssItem renders a single feed entry.
rssItem :: Text -> (Doc, [Text], Text) -> Text
rssItem url (doc, _, pid) = T.concat
[ "<item>\n"
, "<title>" <> escXml (metTtl m) <> "</title>\n"
, "<link>" <> escXml url <> "/blogs/" <> pid <> ".html</link>\n"
, "<pubDate>" <> metPub m <> "</pubDate>\n"
, "<description><![CDATA[" <> metDsc m <> "]]></description>\n"
, "</item>\n"
]
where m = docMet doc
rss assembles the full feed document.
rss :: Text -> [(Doc, [Text], Text)] -> Text
rss url docs = T.concat
[ "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
, "<rss version=\"2.0\">\n<channel>\n"
, "<title>blog</title>\n"
, "<link>" <> escXml url <> "</link>\n"
, T.concat (map (rssItem url) docs)
, "</channel>\n</rss>\n"
]
Site builder
The site builder composites the full pipeline. scn finds all .hs files in a source directory recursively. processDoc runs the five-stage pipeline on a single file. Finally bld coordinates everything.
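Concretely, a build produces an output tree of this shape (file names per the writes in bld; <id> and <tag> stand for page ids and tag names):

```
out/
  index.html
  feed.xml
  blogs/<id>.html
  tags/<tag>.html
  styles/
```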
scn traverses the source directory and collects all .hs files.
scn :: FilePath -> IO [FilePath]
scn dir = do
ent <- listDirectory dir
fmap concat . mapM go $ map (dir </>) ent
where
go pth = do
isD <- doesDirectoryExist pth
if isD then scn pth
else pure [pth | takeExtension pth == ".hs"]
mkEntry bundles a Doc with its parsed tag list and page id, forming the tuple used throughout the rendering stage.
mkEntry :: Doc -> (Doc, [Text], Text)
mkEntry doc = (doc, tags (metTag (docMet doc)), pageId (docPth doc))
processDoc is the per-file pipeline: extract regions, parse metadata, parse content, highlight.
processDoc :: FilePath -> IO Doc
processDoc pth = do
raw <- TIO.readFile pth
let lns = T.lines raw
rgns = extLns lns
(txt, rest) <- case rgns of
(RCom t _ _ : rs) -> pure (t, rs)
_ -> ioError (userError (pth <> ": no metadata block"))
m <- case met txt of
Left err -> ioError (userError err)
Right m -> pure m
let (blks, ots) = mrk rest
blks' <- hlt pth blks
pure Doc { docMet = m, docBlk = blks', docOut = ots, docPth = pth }
buildTagMap groups documents by tag, building a map from tag name to Tag.
buildTagMap :: [(Doc, [Text], Text)] -> Map Text Tag
buildTagMap = foldl go Map.empty
where
go acc (doc, tgs, _) = foldl (addTag doc) acc tgs
addTag doc acc t =
Map.alter (\case
Nothing -> Just (Tag t [doc])
Just tg -> Just (tg { tagDoc = tagDoc tg ++ [doc] })
) t acc
writeTags generates one HTML page per tag; a separate tag listing page is intentionally omitted.
writeTags :: FilePath -> [(Doc, [Text], Text)] -> IO ()
writeTags out entries = do
let tgm = buildTagMap entries
tgs = Map.elems tgm
mapM_ (\tg ->
TIO.writeFile (out </> "tags" </> T.unpack (tagNam tg) <> ".html")
(tagPage tg (map mkEntry (tagDoc tg)))
) tgs
bld is the main entry point. It scans the source directory, processes each file, sorts the results by publication date (newest first), writes all output files, and reports progress to stdout.
bld :: FilePath -> FilePath -> Text -> IO ()
bld src out url = do
putStrLn "scanning source files"
fps <- scn src
when (null fps) $ error "no .hs source files found"
putStrLn $ "found " <> show (length fps) <> " files"
putStrLn "processing files"
docs <- mapM processDoc fps
let sorted = sortBy (comparing (Down . metPub . docMet)) docs
entries = map mkEntry sorted
putStrLn "writing output"
createDirectoryIfMissing True (out </> "blogs")
createDirectoryIfMissing True (out </> "tags")
createDirectoryIfMissing True (out </> "styles")
mapM_ (\(doc, tgs, pid) ->
TIO.writeFile (out </> "blogs" </> T.unpack pid <> ".html") (blogPage doc tgs)
) entries
TIO.writeFile (out </> "index.html") (homePage entries)
writeTags out entries
TIO.writeFile (out </> "feed.xml") (rss url entries)
putStrLn "done"
Sample documents
To test the pipeline, we define three complete stoa-haskell documents as Text values. Each has a metadata block, content headings (levels 1-3), paragraphs, lists, and code sections. The code sections are trivial Haskell so they parse cleanly as Cod blocks.
sampleDoc1 :: Text
sampleDoc1 = T.unlines
[ "-- title = A small parsing exercise"
, "-- pubDate = 2026-01-15"
, "-- tags = haskell, parsing"
, "-- description = Walking through a small parser for comma-separated values in Haskell."
, ""
, "import Data.Text (Text)"
, "import qualified Data.Text as T"
, ""
, "-- | Parsing CSV"
, "--"
, "-- A comma-separated value file is one of the simplest formats to parse. Each row is"
, "-- a newline-terminated sequence of fields separated by a comma."
, ""
, "-- || The field parser"
, "--"
, "-- We build from the bottom up. A field is any text without a comma."
, "-- - Split on commas to obtain the fields."
, "-- - Trim leading and trailing whitespace from each one."
, ""
, "parseRow :: Text -> [Text]"
, "parseRow = map T.strip . T.splitOn \",\""
, ""
, "-- || Testing"
, "--"
, "-- A quick sanity check confirms the parser handles basic input."
, ""
, "testParse :: Bool"
, "testParse = parseRow \"alpha, beta, gamma\" == [\"alpha\", \"beta\", \"gamma\"]"
]
sampleDoc2 :: Text
sampleDoc2 = T.unlines
[ "-- title = Minimal typeclasses"
, "-- pubDate = 2026-02-10"
, "-- tags = haskell, design"
, "-- description = How typeclasses replace explicit dispatch tables."
, ""
, "-- | What is a typeclass"
, "--"
, "-- A typeclass in Haskell is an interface for overloading operations across types. Unlike"
, "-- object-oriented interfaces, instances are declared outside the type definition, which"
, "-- cleanly decouples the two."
, ""
, "-- || Defining a typeclass"
, "--"
, "-- A simple typeclass for pretty-printing illustrates the structure."
, "-- - The class declares a single method `pp`."
, "-- - Instances provide the implementation."
, "-- - Any type has at most one canonical instance."
, ""
, "class Pretty a where"
, " pp :: a -> String"
, ""
, "-- || Instances"
, "--"
, "-- Two simple instances show how dispatch resolves at compile time."
, ""
, "instance Pretty Int where"
, " pp n = \"Int: \" ++ show n"
, ""
, "instance Pretty Bool where"
, " pp b = if b then \"yes\" else \"no\""
]
sampleDoc3 :: Text
sampleDoc3 = T.unlines
[ "-- title = On immutable data"
, "-- pubDate = 2026-02-28"
, "-- tags = haskell, design, 2026"
, "-- description = Persistent data structures eliminate a large class of bugs by forbidding mutation."
, ""
, "-- | Persistence by default"
, "--"
, "-- In Haskell, all data is immutable. A function that appears to modify a list"
, "-- actually produces a new one, sharing the unchanged tail with the original."
, ""
, "-- || What sharing looks like"
, "--"
, "-- Consider a simple prepend operation on a linked list."
, "-- - The new list points to the same tail as the old."
, "-- - The old list is unaffected and still accessible."
, "-- - No copying occurs: only a new head node is allocated."
, ""
, "prepend :: a -> [a] -> [a]"
, "prepend x xs = x : xs"
, ""
, "-- ||| A concrete example"
, "--"
, "-- With persistence, both `xs` and `ys` coexist in memory without conflict."
, ""
, "example :: ([Int], [Int])"
, "example ="
, " let xs = [1, 2, 3]"
, " ys = prepend 0 xs"
, " in (xs, ys)"
]
Testing the anatomy pipeline
docPure runs the full pipeline in pure mode. It uses hltDry in place of hlt, resolving {{n}} placeholders directly from the source lines rather than calling tree-sitter. This makes the test self-contained and runnable without any external tools.
docPure :: FilePath -> Text -> Either String Doc
docPure pth raw = do
let lns = T.lines raw
rgns = extLns lns
(txt, rest) <- case rgns of
(RCom t _ _ : rs) -> Right (t, rs)
_ -> Left (pth ++ ": first region is not a comment")
m <- met txt
let (blks, ots) = mrk rest
blks' = hltDry lns blks
pure (Doc m blks' ots pth)
testAnatomy processes all three sample documents and prints the key results: each document's metadata and outline, a truncated blog page for inspection, the homepage listing, and the RSS feed.
testAnatomy :: IO ()
testAnatomy = do
let samples = [ ("sample1.hs", sampleDoc1)
, ("sample2.hs", sampleDoc2)
, ("sample3.hs", sampleDoc3)
]
docs <- mapM (\(pth, raw) -> case docPure pth raw of
Left err -> ioError (userError err)
Right d -> pure d) samples
putStrLn "=== Metadata ==="
mapM_ (\d -> do
let m = docMet d
putStrLn $ T.unpack (metTtl m)
++ " [" ++ T.unpack (metPub m) ++ "]"
++ " tags: " ++ T.unpack (metTag m)
) docs
putStrLn "\n=== Outlines ==="
mapM_ (\d -> do
putStrLn $ "-- " ++ T.unpack (metTtl (docMet d))
mapM_ (\o -> putStrLn $ replicate (outLvl o * 2) ' ' ++ T.unpack (outTxt o)) (docOut d)
) docs
let entries = map mkEntry docs
putStrLn "\n=== Blog Page (sample1, truncated) ==="
case docs of
(d : _) -> TIO.putStrLn (T.take 400 (blogPage d (tags (metTag (docMet d))))) >> putStrLn "..."
[] -> pure ()
putStrLn "\n=== Home Page (truncated) ==="
TIO.putStrLn (T.take 400 (homePage entries))
putStrLn "..."
putStrLn "\n=== RSS Feed (truncated) ==="
TIO.putStrLn (T.take 400 (rss "https://example.com" entries))
putStrLn "..."
λ> testAnatomy
=== Metadata ===
A small parsing exercise [2026-01-15] tags: haskell, parsing
Minimal typeclasses [2026-02-10] tags: haskell, design
On immutable data [2026-02-28] tags: haskell, design, 2026
=== Outlines ===
-- A small parsing exercise
Parsing CSV
The field parser
Testing
-- Minimal typeclasses
What is a typeclass
Defining a typeclass
Instances
-- On immutable data
Persistence by default
What sharing looks like
A concrete example
=== Blog Page (sample1, truncated) ===
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>A small parsing exercise</title>
<link rel="stylesheet" href="../styles/prima.css">
</head>
<body>
<aside class="outline-sidebar">
<nav class="outline">
<a href="#parsing-csv" class="h1">Parsing CSV</a>
<a href="#the-field-parser" class="h2">The field parser</a>
<a href="#testing" class="h2">Testing</a>
</nav>
</aside>
<main class="main-content">
<h1>A small parsing exercise</h1>
...
=== Home Page (truncated) ===
<!DOCTYPE html>
<html lang="en">
...
<h2><a href="blogs/sample1.html">A small parsing exercise</a></h2>
<div class="date">2026-01-15</div>
...
=== RSS Feed (truncated) ===
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>blog</title>
<link>https://example.com</link>
<item>
<title>A small parsing exercise</title>
<link>https://example.com/blogs/sample1.html</link>
<pubDate>2026-01-15</pubDate>
<description><![CDATA[Walking through a small parser...]]></description>
</item>
...
Final step: CLI with optparse-applicative
optparse-applicative provides a declarative interface for building command-line argument parsers. The site tool takes three positional arguments: the source directory of stoa-haskell files, the output directory, and the base URL of the published site.
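Assuming the compiled binary is installed as site (a hypothetical name), a build run looks like:

```
$ site posts public https://example.com
```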
data Opt = Opt
{ optSrc :: FilePath
, optOut :: FilePath
, optUrl :: String
}
opt :: ParserInfo Opt
opt = info (helper <*> prs) (fullDesc <> header "site - static site generator")
where
prs = Opt
<$> argument str (metavar "SRC_DIR" <> help "source directory")
<*> argument str (metavar "OUT_DIR" <> help "output directory")
<*> argument str (metavar "BASE_URL" <> help "base URL for the site")
main :: IO ()
main = do
o <- execParser opt
bld (optSrc o) (optOut o) (T.pack (optUrl o))
Reflexions
Eleven months is a long time to spend on an SSG. The honest accounting is: one month prototyping στοά in Nim, two months getting a first blog published, one month with S-expressions, three months rebuilding as literate Haskell, four months stabilizing. A weekend with mdbook would have produced a working blog back in april.2025. I chose not to do that because I wanted the writing environment to be mine in every detail.
The three iterations look wasteful in retrospect, yet each one clarified something. στοά proved that custom delimiters and zero-backtracking parsers were viable and fast. The S-expression experiment proved that structural purity and authoring fluency pull in opposite directions. Literate programming resolved both: the parser stays simple (isolated comments, code gaps), the writing stays fluid (plain comment syntax), and the code stays real (the compiler sees it directly). None of these conclusions were available before building and discarding the earlier versions.
I do not expect anyone else to use ἀγορά. The warning at the top of this essay is genuine. This tool encodes my preferences so precisely that it would be actively confusing for someone who doesn't share them. Whether the eleven months were worth it depends entirely on whether the writing produced in this environment turns out to be good.