Nomsu Under the Hood

This article describes the process that a Nomsu program goes through.

From source code to execution, Nomsu’s pipeline looks like:

Nomsu source code (.nom file):

if ($x == 10):
    say "hello"

…is parsed into a syntax tree:

Syntax Tree:

Action: if () ()
- Action: () == ()
  - Variable: $x
  - Number: 10
- Block:
  - Action: say ()
    - Text: hello

…which is translated to Lua code:

if x == 10 then
    say("hello")
end

…which is executed in the Lua virtual machine:

hello

Phase 1: Parsing

The first job of a compiler is to figure out the structure of the code in a text file: understanding that 10 means a number, "hello" means a string, and so forth. The output of this phase is a syntax tree, a hierarchical representation of the code’s structure. Many programming languages take a multi-step approach to parsing, breaking the process down into separate phases (scanning, tokenizing, lexing, and so on). Nomsu eschews that complexity and parses the source text directly into a syntax tree in one pass using LPEG, a Lua library for Parsing Expression Grammars. This is the same approach used by Moonscript, which was one of Nomsu’s inspirations.

In a Parsing Expression Grammar (PEG), you define a set of rules for parsing, such as:

file <- (%nl / " " / action / expression)*

action <- (
    (word (" "* word)* ("!" / (" "* (expression / word))+))
    / (expression (" "* (expression / word))+)
)

word <- [a-zA-Z]+

expression <- (
    number / variable / ("(" " "* (action / expression) " "* ")")
)

number <- [0-9]+

variable <- "$" ("(" (" "* word)+ " "* ")" / word)?

PEGs have a lot of similarity to Regular Expressions and context-free grammars defined in Backus-Naur form, but there are some key differences.

Unlike most other grammar formalisms, Parsing Expression Grammars define exactly when the parser will backtrack: the choice operator (/) tries its alternatives in order and commits to the first one that matches. This limited backtracking might seem restrictive, but it actually makes PEGs unambiguous, and with memoization (packrat parsing) arbitrarily long strings can be parsed in linear time using linear space, while the grammars remain extremely expressive.
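To make ordered choice concrete, here is a minimal sketch in Python. It is not Nomsu’s parser (which uses LPEG), and the helper names lit, seq, and choice are invented for this illustration. It shows how the order of alternatives decides whether a string matches:

```python
# Minimal PEG-style combinators (illustrative only; Nomsu itself uses LPEG).
# A parser takes (text, pos) and returns the new position on success, or None.

def lit(s):
    """Match the literal string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Match each parser in order; fail if any one of them fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice: commit to the FIRST alternative that matches."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

# rule <- ("a" / "ab") "c"
rule = seq(choice(lit("a"), lit("ab")), lit("c"))
# reordered <- ("ab" / "a") "c"
reordered = seq(choice(lit("ab"), lit("a")), lit("c"))

print(rule("ac", 0))        # 2
print(rule("abc", 0))       # None: "a" matched first, so "ab" is never tried
print(reordered("abc", 0))  # 3
print(reordered("ac", 0))   # 2
```

A context-free grammar would accept "abc" under either ordering. In a PEG the choice is never revisited once an alternative has succeeded, which is what keeps the parser’s behavior predictable.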

Nomsu’s full Parsing Expression Grammar can be viewed in the source code repository. The whole language’s definition fits into only around 230 lines of PEG definition (using the format for LPEG’s re module), and the parser is defined in only 63 lines of Moonscript helper code in parser.moon. For comparison, GCC’s C parser is written in nearly 20,000 lines of C code.

Phase 2: Compiling

The Nomsu compiler has some rather straightforward code that takes a syntax tree and outputs the corresponding Lua code. For example, a Nomsu list like [1, 2, $(my var)] would be compiled to the Lua code List{1, 2, my_var}. The rules for compiling things like numbers and variables are simple: numbers stay the same, while variables need to be modified to become valid Lua identifiers, so spaces are replaced with underscores, Unicode characters with escape sequences, and so on.
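As a rough illustration of these rules, the following Python sketch walks a tiny syntax tree and emits Lua text. The tuple-based tree format and the function names are assumptions made for this example, and the escape format for non-ASCII characters is made up, since the exact scheme Nomsu uses is not described here:

```python
import re

def lua_identifier(name):
    """Turn a Nomsu variable name into a valid Lua identifier.

    Spaces become underscores. The "x" + hex escape used for other
    characters is a hypothetical format, not Nomsu's real one.
    """
    def escape(match):
        return "x%X" % ord(match.group(0))
    name = name.replace(" ", "_")
    return re.sub(r"[^A-Za-z0-9_]", escape, name)

def compile_tree(tree):
    """Compile a (kind, value) syntax tree node into Lua source text."""
    kind, value = tree
    if kind == "Number":
        return str(value)  # numbers are emitted unchanged
    if kind == "Variable":
        return lua_identifier(value)
    if kind == "List":
        items = ", ".join(compile_tree(item) for item in value)
        return "List{" + items + "}"
    raise ValueError("unhandled node type: " + kind)

# The Nomsu list [1, 2, $(my var)] as a syntax tree:
tree = ("List", [("Number", 1), ("Number", 2), ("Variable", "my var")])
print(compile_tree(tree))  # List{1, 2, my_var}
```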

Where Nomsu’s compiler really shines is in compiling actions. “Action” is a broad term encompassing function calls, control flow statements, math operations, macros, and that sort of thing: essentially bits of code that do interesting stuff, as opposed to literal values like numbers. Nomsu maintains a table of actions that have custom compilation rules. If a custom compilation rule exists for an action, the compiler hands the action’s syntax tree over to the custom rule and uses whatever output it produces. Because Nomsu’s mixfix syntax is very flexible and expressive, control flow is defined using this kind of custom compilation rule. This means that the parser can be very dumb, and not know anything about keywords or conditionals or control flow. For example, the if conditional is defined in core/control_flow.nom as:

(if $condition $body) compiles to ("
    if \($condition as lua expr) then
        \($body as lua)
    end
")

And once that is defined, Nomsu code like:

if $foo:
    say "it works!"

will work! Any actions that don’t have custom compilation rules will simply be translated to Lua function calls. For instance, $x as lua does not have a custom compilation rule; it is just a function call to _1_as_lua(x) (a library function that compiles syntax trees into Lua).
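The dispatch described above can be sketched in Python as a table lookup with a fallback. Everything here is a simplified assumption rather than Nomsu’s actual implementation: the tree format, the COMPILE_RULES table, and in particular the name-mangling scheme in lua_function_name, which is only a guess chosen to reproduce the two names that appear in this article (say and _1_as_lua):

```python
# An action is a list of parts: plain strings are words, tuples are argument nodes.

def stub(parts):
    """Describe an action's shape, e.g. "if () ()"."""
    return " ".join(p if isinstance(p, str) else "()" for p in parts)

def lua_function_name(parts):
    """Hypothetical mangling: number the arguments, drop trailing ones."""
    last_word = max(i for i, p in enumerate(parts) if isinstance(p, str))
    tokens, arg_count = [], 0
    for part in parts[:last_word + 1]:
        if isinstance(part, str):
            tokens.append(part)
        else:
            arg_count += 1
            tokens.append(str(arg_count))
    name = "_".join(tokens)
    return "_" + name if name[0].isdigit() else name

def compile_node(node):
    kind, value = node
    if kind == "Number":
        return str(value)
    if kind == "Variable":
        return value.replace(" ", "_")
    if kind == "Text":
        return '"' + value + '"'  # naive quoting, no escaping
    if kind == "Block":
        return "\n".join(compile_node(line) for line in value)
    if kind == "Action":
        args = [p for p in value if not isinstance(p, str)]
        rule = COMPILE_RULES.get(stub(value))
        if rule:
            return rule(*args)  # a custom rule decides the output
        # Fallback: an ordinary Lua function call.
        compiled = ", ".join(compile_node(a) for a in args)
        return lua_function_name(value) + "(" + compiled + ")"
    raise ValueError("unhandled node type: " + kind)

def compile_if(condition, body):
    """Custom rule for "if () ()", mirroring the Nomsu definition above."""
    lines = compile_node(body).splitlines()
    indented = "\n".join("    " + line for line in lines)
    return "if " + compile_node(condition) + " then\n" + indented + "\nend"

COMPILE_RULES = {"if () ()": compile_if}

tree = ("Action", ["if", ("Variable", "foo"),
                   ("Block", [("Action", ["say", ("Text", "it works!")])])])
print(compile_node(tree))
print(compile_node(("Action", [("Variable", "x"), "as", "lua"])))  # _1_as_lua(x)
```

Note that the parser side of this sketch has no notion of if as a keyword. The only thing that makes it a conditional is the entry in the rule table.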

Phase 3: Executing

There’s not too much to say about what happens after the Lua code is generated. Lua provides a very useful function called load(), which takes a string of Lua code and returns a function that, when called, will execute it. It also allows you to provide an environment to run the code in, which Nomsu uses for module namespacing.
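For readers who don’t know Lua, a rough Python analogy may help. Python has no load(), but its built-in compile() and exec() can play a similar role: turn a source string into something runnable, and run it against a chosen environment. The load_chunk helper and the module names below are hypothetical, and this is only an analogy, not how Nomsu itself works:

```python
# Python analogy for Lua's load(): compile a source string once,
# then run it inside whichever environment (namespace) is supplied.

source = "greeting = 'hello from ' + module_name"

def load_chunk(source, env):
    """Return a function that runs `source` with `env` as its globals."""
    code = compile(source, "<generated>", "exec")
    def run():
        exec(code, env)
        return env
    return run

# Two separate environments keep each module's globals apart.
env_a = {"module_name": "module A"}
env_b = {"module_name": "module B"}
load_chunk(source, env_a)()
load_chunk(source, env_b)()

print(env_a["greeting"])  # hello from module A
print(env_b["greeting"])  # hello from module B
```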

Lua’s ability to run arbitrary Lua code on the fly is a huge advantage of the language, but Nomsu files can also be precompiled into Lua ahead of time, which allows them to run a little faster.

Metaprogramming

You might have noticed that, in the example above, the custom rule for compiling if is written in Nomsu code. When a language allows you to write code describing how things should compile, it’s known as metaprogramming. Nomsu has a number of extremely powerful metaprogramming tools built in. The lowest-level version is the $x compiles to $y action, which essentially lets you define a function that takes a syntax tree and returns Lua code. The example above used a very simple form of text substitution, similar to how C macros work. However, it is possible to do arbitrarily complex logic and error-checking from within a custom compilation rule. For example:

(unroll $n times $body) compiles to:
    $lua = (Lua "")

    if ($n.type != "Number"):
        compile error at $n ("
            'unroll' requires a literal number
        ")

    if ($n.1 <= 1):
        compile error at $n ("
            'unroll' must be given a number greater than 1
        ")

    for $i in (1 to $n.1):
        if ($i > 1):
            $lua, add "\n"
        $lua, add ($body as lua)

    return $lua

After defining the above rule, it can be used as follows:

$x = 1
unroll 3 times:
    say $x
    $x += 1

…which will compile to the Lua code:

x = 1
say(x)
x = x + 1
say(x)
x = x + 1
say(x)
x = x + 1
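The same logic can be sketched in Python to highlight that both checks run at compile time, before any Lua exists. This is a simplified stand-in, not Nomsu’s code: CompileError and compile_unroll are hypothetical names, and the body is passed in as already-compiled Lua text:

```python
class CompileError(Exception):
    """Raised while generating code, before anything is executed."""

def compile_unroll(n_node, body_lua):
    """Emit `body_lua` n times, where n must be a literal number node."""
    kind, value = n_node
    if kind != "Number":
        raise CompileError("'unroll' requires a literal number")
    if value <= 1:
        raise CompileError("'unroll' must be given a number greater than 1")
    # Copies are separated by newlines, as in the Nomsu rule above.
    return "\n".join([body_lua] * value)

body = "say(x)\nx = x + 1"
print(compile_unroll(("Number", 3), body))

try:
    compile_unroll(("Variable", "n"), body)
except CompileError as err:
    print("compile error:", err)
```

The first call prints the six lines of Lua that follow x = 1 in the output above. The second fails during compilation, because the value of a variable is not known until the program runs.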

At a slightly higher level of abstraction, Nomsu also provides a way to define custom compilation rules in terms of pre-existing Nomsu structures:

(make sure $x is okay) parses as
    $tmp = ($x * $x)
    if ($tmp > 100):
        fail "\$x is not okay!"

This form is easier to use for simple cases, and comes with some handy guarantees: $x parses as $y produces a hygienic macro, so in this example, $tmp will actually be compiled to a unique variable that won’t collide with any other variables called $tmp that are already in use elsewhere.
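The idea behind hygiene can be sketched in Python as renaming a macro’s own variables to fresh names on every expansion. This is a deliberately simplified assumption: it works on a text template rather than on syntax trees, and expand_hygienic and its naming scheme are invented for this example:

```python
import itertools
import re

_counter = itertools.count(1)

def expand_hygienic(template, local_names, args):
    """Expand a template, giving the macro's own locals fresh names."""
    suffix = next(_counter)  # a new suffix for every expansion
    def rename(match):
        name = match.group(1)
        if name in args:
            return args[name]  # substitute the caller's argument
        if name in local_names:
            return "_%s_%d" % (name, suffix)  # fresh, collision-free name
        return match.group(0)
    return re.sub(r"\$(\w+)", rename, template)

template = "local $tmp = $x * $x\nif $tmp > 100 then error('not okay') end"

# The caller happens to pass its own variable named "tmp" as $x.
print(expand_hygienic(template, {"tmp"}, {"x": "tmp"}))
# first line printed: local _tmp_1 = tmp * tmp
```

Without the renaming step, the expansion would begin with local tmp = tmp * tmp, and the macro’s temporary would shadow the caller’s variable for the rest of the scope.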

Operators and Math

You’ve already seen some examples of how actions with words like if and else can be compiled into control flow statements, but the same logic also applies to symbols. Nomsu’s parser is perfectly happy with an action like 1 != 2, so it’s easy to define a rule that compiles Nomsu’s $x != $y into Lua’s idiomatic x ~= y. The same is true of variable assignment like $x = 5 and math operations like $x + $y.

The only hitch is that Nomsu requires you to use parentheses to explicitly group actions. If all math operations were defined this way, a simple expression like 1 + 2*3 + 4 would have to be written as ((1 + (2 * 3)) + 4), which is overly cumbersome. So the Nomsu compiler makes a special concession for math: if no other custom rule is found and an action looks like a bunch of math operators, it uses a built-in math rule that compiles the expression in a reasonable way.
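One plausible reading of “a reasonable way” is to emit the operands and operators in order and let Lua’s own operator precedence do the grouping. The Python sketch below takes that approach. It is an assumption about the strategy rather than a description of Nomsu’s real math rule, and the operator set and function names are chosen for illustration:

```python
MATH_OPERATORS = {"+", "-", "*", "/", "%", "^"}

def compile_operand(node):
    kind, value = node
    if kind == "Number":
        return str(value)
    if kind == "Variable":
        return value.replace(" ", "_")
    if kind == "Action":
        lua = compile_math(value)
        if lua is None:
            raise ValueError("not a math action")
        return "(" + lua + ")"  # explicit Nomsu grouping is preserved
    raise ValueError("unhandled node type: " + kind)

def compile_math(parts):
    """Return Lua for a math-looking action, or None if it is not one."""
    words = [p for p in parts if isinstance(p, str)]
    if not words or any(w not in MATH_OPERATORS for w in words):
        return None  # let the normal function-call fallback handle it
    return " ".join(p if isinstance(p, str) else compile_operand(p)
                    for p in parts)

# 1 + 2*3 + 4 parsed as a single flat action:
action = [("Number", 1), "+", ("Number", 2), "*", ("Number", 3), "+", ("Number", 4)]
print(compile_math(action))                  # 1 + 2 * 3 + 4
print(compile_math(["say", ("Number", 1)]))  # None
```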

The API

The best way to learn the basics is by running the tutorial:

nomsu -t tutorial

Or you can look at the overview, which lists a bunch of common use cases.

Closing Remarks

The Nomsu core is mostly built up using the tools discussed here: $x compiles to $y to define the very basic language primitives, and $x parses as $y to combine those primitives in useful and ergonomic ways. Thanks to this bootstrapping approach, Nomsu needs only a very small number of predefined compile rules and just a dash of inlined Lua code; most of its functionality is actually written in Nomsu code. The Nomsu compiler exposes all of this to users, so it’s just as easy to write a 4-line extension to the language as it is to write a function in other programming languages.