Nomsu Under the Hood
This article describes the process that a Nomsu program goes through.
From source code to execution, Nomsu’s pipeline looks like:
Nomsu source code (.nom file):
    if ($x == 10):
        say "hello"
Syntax Tree:
    - Action: if () ()
        - Action: () == ()
            - Variable: $x
            - Number: 10
        - Block:
            - Action: say ()
                - Text: hello
…which is translated to Lua code:
    if x == 10 then
        say("hello")
    end
…which is executed in the Lua virtual machine:
hello
Phase 1: Parsing
The first job of a compiler is to figure out the structure of the code in a text file: understanding that 10 means a number, "hello" means a string, and so forth. The output of this phase is a syntax tree, a hierarchical representation of the code’s structure. Many programming languages take a multi-step approach to parsing, breaking the process down into separate phases (scanning, tokenizing, lexing, and so on). Nomsu eschews that complexity and parses the source text directly into a syntax tree in one pass using LPEG (a Parsing Expression Grammar library for Lua). This is the same approach used by Moonscript, which was one of Nomsu’s inspirations.
In a Parsing Expression Grammar (PEG), you define a set of rules for parsing, such as:
    file <- (%nl / " " / action / expression)*
    action <- (
        (word (" "* word)* ("!" / (" "* (expression / word))+))
        / (expression (" "* (expression / word))+)
    )
    word <- [a-zA-Z]+
    expression <- (
        number / variable / ("(" " "* (action / expression) " "* ")")
    )
    number <- [0-9]+
    variable <- "$" ("(" (" "* word)+ " "* ")" / word)?
PEGs have a lot in common with regular expressions and with context-free grammars written in Backus-Naur form, but there are some key differences.
Unlike those other formalisms, a Parsing Expression Grammar is deterministic: alternatives are tried in order, and the grammar defines exactly when the parser will backtrack. This might seem very limiting, but it actually allows PEGs to guarantee (given memoization, as in packrat parsing) that arbitrarily long strings can be parsed in linear time, using linear space, while still being extremely expressive.
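To make the ordered-choice behavior concrete, here is a minimal sketch of PEG-style combinators in Python. It is not how LPEG or Nomsu is implemented; the helper names (lit, pattern, seq, choice, matches) are made up for illustration, and the variable rule is simplified from the grammar above.

```python
import re

# A parser is a function (text, pos) -> new position, or None on failure.

def lit(s):
    """Match the exact string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def pattern(regex):
    """Match a regular expression anchored at the current position."""
    compiled = re.compile(regex)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*parsers):
    """Match each parser in turn; fail if any of them fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice: commit to the first alternative that succeeds."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def matches(parser, text):
    """True if the parser consumes the whole input."""
    return parser(text, 0) == len(text)

number = pattern(r"[0-9]+")
word = pattern(r"[a-zA-Z]+")
variable = seq(lit("$"), word)  # simplified: no "$(my var)" form
expression = choice(number, variable)

print(matches(expression, "$x"))                   # True
print(matches(expression, "10"))                   # True
# Once "a" succeeds, the "ab" alternative is never reconsidered:
print(matches(choice(lit("a"), lit("ab")), "ab"))  # False
print(matches(choice(lit("ab"), lit("a")), "ab"))  # True
```

The last two lines show the key difference from a context-free grammar: the order of alternatives matters, so the grammar author decides up front which reading wins.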
Nomsu’s full Parsing Expression Grammar can be viewed in the source code repository. The whole language’s definition fits into only around 230 lines of PEG definition (using the format for LPEG’s re module), and the parser is defined in only 63 lines of Moonscript helper code in parser.moon. For comparison, GCC’s C parser is written in nearly 20,000 lines of C code.
Phase 2: Compiling
The Nomsu compiler has some rather straightforward code that takes a syntax tree and outputs corresponding Lua code. For example, a Nomsu list like [1, 2, $(my var)] would be compiled to the Lua code List{1, 2, my_var}.
The rules for compiling things like numbers and variables are fairly straightforward: numbers stay the same, while variable names need to be modified to become valid Lua identifiers, so spaces are replaced with underscores, Unicode characters with escape sequences, and so on.
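As a rough illustration of that renaming step, the following Python sketch turns a Nomsu variable name into a string that is a valid Lua identifier. The function name to_lua_identifier and the escape format (an "x" followed by the hexadecimal code point) are assumptions made for this example; the article does not say which escape sequences Nomsu actually uses, and this sketch does not handle Lua keywords.

```python
def to_lua_identifier(name: str) -> str:
    """Convert a Nomsu variable name into a valid Lua identifier (sketch)."""
    out = []
    for ch in name:
        if ch == " ":
            out.append("_")  # spaces become underscores
        elif ch.isascii() and (ch.isalnum() or ch == "_"):
            out.append(ch)   # already legal in a Lua identifier
        else:
            # Hypothetical escape format: "x" plus the hex code point.
            out.append("x%X" % ord(ch))
    ident = "".join(out)
    # Lua identifiers cannot start with a digit.
    if ident[:1].isdigit():
        ident = "_" + ident
    return ident

print(to_lua_identifier("my var"))  # my_var
```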
The place where Nomsu’s compiler really shines is when actions are compiled. “Action” is a broad term encompassing function calls, control flow statements, math operations, macros, and that sort of thing: essentially bits of code that do interesting stuff, as opposed to just literal values like numbers. Nomsu maintains a table of actions that have custom compilation rules. If a custom compilation rule exists for an action, the compiler will hand the action’s syntax tree over to the custom rule and use whatever output it produces. Because Nomsu’s mixfix syntax is very flexible and expressive, control flow is defined using this type of custom compilation rule. This means that the parser can be very dumb, and not know anything about keywords or conditionals or control flow. For example, the if conditional is defined in core/control_flow.nom as:
    (if $condition $body) compiles to ("
        if \($condition as lua expr) then
            \($body as lua)
        end
    ")
And once that is defined, Nomsu code like:
    if $foo:
        say "it works!"
will work! Any actions that don’t have custom compilation rules will simply be translated to Lua function calls. For instance, $x as lua does not have a custom compilation rule; it is just a function call to _1_as_lua(x) (a library function that compiles syntax trees into Lua).
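The lookup-then-fallback behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Nomsu’s actual code: the Lua wrapper class, the COMPILE_RULES table, and the stub format are hypothetical, and the function-naming scheme is only inferred from the two examples in this article (say "hello" becoming say("hello"), and $x as lua becoming _1_as_lua(x)).

```python
from dataclasses import dataclass

@dataclass
class Lua:
    """An argument that has already been compiled to Lua source text."""
    code: str

def stub_of(parts):
    """Replace each argument with its position: [$x, as, lua] -> '1 as lua'."""
    out, n = [], 0
    for part in parts:
        if isinstance(part, Lua):
            n += 1
            out.append(str(n))
        else:
            out.append(part)
    return " ".join(out)

def lua_function_name(stub):
    """Guess at a naming scheme consistent with the article's two examples."""
    tokens = stub.split(" ")
    while len(tokens) > 1 and tokens[-1].isdigit():
        tokens.pop()  # drop trailing argument placeholders
    name = "_".join(tokens)
    return "_" + name if name[0].isdigit() else name

# Hypothetical table of custom compilation rules, keyed by stub.
COMPILE_RULES = {
    "if 1 2": lambda cond, body: f"if {cond} then\n    {body}\nend",
}

def compile_action(parts):
    args = [p.code for p in parts if isinstance(p, Lua)]
    rule = COMPILE_RULES.get(stub_of(parts))
    if rule is not None:
        return rule(*args)  # a custom rule decides the output
    # No custom rule: fall back to a plain Lua function call.
    return f"{lua_function_name(stub_of(parts))}({', '.join(args)})"

print(compile_action([Lua("x"), "as", "lua"]))  # _1_as_lua(x)
print(compile_action(["if", Lua("x == 10"), Lua('say("hello")')]))
```

The design point is that the parser never needs to know that if is special; only the rule table does.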
Phase 3: Executing
There’s not too much to say about what happens after the Lua code is generated. Lua provides a very useful function called load(), which takes a string of Lua code and returns a function that, when called, will execute it. It also allows you to provide an environment to run the code in, which Nomsu uses for module namespacing.
Lua’s ability to run arbitrary Lua code on the fly is a huge advantage of the language, but Nomsu files can also be precompiled into Lua ahead of time, which allows them to run a little faster.
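For readers more familiar with Python than Lua, the following is a rough analogue of that pattern using Python’s built-in compile() and exec(). It only mirrors the idea (turn a string of generated code into a callable, and run it inside a chosen environment); Nomsu itself calls Lua’s load() on generated Lua code, and the load helper defined here is a stand-in written for this example.

```python
def load(source, env=None):
    """Compile a string of code and return a function that runs it in env."""
    code = compile(source, "<generated>", "exec")
    namespace = {} if env is None else env
    def chunk():
        exec(code, namespace)  # names are resolved in the given environment
        return namespace
    return chunk

# The environment controls which names the generated code can see.
output = []
env = {"say": output.append}
chunk = load('x = 10\nif x == 10:\n    say("hello")', env)
chunk()
print(output)  # ['hello']
```

Giving each module its own environment table is what keeps one module’s definitions from leaking into another.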
Metaprogramming
You might have noticed in the example above that the custom rule for compiling if is written in Nomsu code. When a language allows you to write code describing how things should compile, it’s known as metaprogramming. Nomsu has a number of extremely powerful metaprogramming tools built in. The lowest-level one is the $x compiles to $y action, which essentially lets you define a function that takes a syntax tree and returns Lua code. The example above used a very simple form of text substitution, similar to how C macros work, but it is possible to do arbitrarily complex logic and error-checking from within a custom compilation rule. For example:
    (unroll $n times $body) compiles to:
        $lua = (Lua "")
        if ($n.type != "Number"):
            compile error at $n ("
                'unroll' requires a literal number
            ")
        if ($n.1 <= 1):
            compile error at $n ("
                'unroll' must be given a number greater than 1
            ")
        for $i in (1 to $n.1):
            if ($i > 1):
                $lua, add "\n"
            $lua, add ($body as lua)
        return $lua
After defining the above rule, it can be used as follows:
    $x = 1
    unroll 3 times:
        say $x
        $x += 1
…which will compile to the Lua code:
    x = 1
    say(x)
    x = x + 1
    say(x)
    x = x + 1
    say(x)
    x = x + 1
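The same compile-time logic can be mirrored in Python to show that the rule is just an ordinary function from a syntax tree to Lua text. This is a sketch under assumptions: the dictionary-based node format and the name compile_unroll are invented for this example, and the body is passed in as already-compiled Lua.

```python
def compile_unroll(n_node, body_lua):
    """Emit body_lua repeated n times, checking n at compile time."""
    if n_node["type"] != "Number":
        raise SyntaxError("'unroll' requires a literal number")
    count = n_node["value"]
    if count <= 1:
        raise SyntaxError("'unroll' must be given a number greater than 1")
    # Copies of the body are separated by newlines, as in the Nomsu rule.
    return "\n".join([body_lua] * count)

body = "say(x)\nx = x + 1"
print(compile_unroll({"type": "Number", "value": 3}, body))
```

Because the repeat count is checked while compiling, a mistake such as passing a variable instead of a literal is reported before any of the program runs.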
At a slightly higher level of abstraction, Nomsu also provides a way to define custom compilation rules in terms of pre-existing Nomsu structures:
    (make sure $x is okay) parses as:
        $tmp = ($x * $x)
        if ($tmp > 100):
            fail "\$x is not okay!"
This form is easier to use for simple cases, and comes with some handy guarantees: $x parses as $y produces a hygienic macro, so in this example, $tmp will actually be compiled to a unique variable that won’t collide with any other variables called $tmp that are already in use elsewhere.
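One common way to get this kind of hygiene is to rename every variable the macro introduces each time it is expanded. The Python sketch below shows the idea on plain text; it is an assumption about the general technique, not a description of how Nomsu does it, and the expand_hygienic helper and the numeric suffix scheme are hypothetical.

```python
import itertools
import re

_unique_ids = itertools.count(1)

def expand_hygienic(template, arguments):
    """Substitute macro arguments and rename the macro's own variables."""
    suffix = next(_unique_ids)  # fresh for every expansion
    def substitute(match):
        name = match.group(1)
        if name in arguments:
            return arguments[name]    # caller-supplied code, left untouched
        return f"${name}_{suffix}"    # macro-internal variable, made unique
    return re.sub(r"\$(\w+)", substitute, template)

# The caller happens to pass its own variable named $tmp as the argument.
print(expand_hygienic("$tmp = ($x * $x)", {"x": "$tmp"}))
# $tmp_1 = ($tmp * $tmp)
```

Without the renaming, the expansion would read $tmp = ($tmp * $tmp) and silently overwrite the caller’s variable.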
Operators and Math
You’ve already seen some examples of how actions with words like if and else can be compiled into control flow statements, but the same logic also applies to symbols. Nomsu’s parser is perfectly happy with an action like 1 != 2, so it’s easy to define a rule that compiles Nomsu’s $x != $y into Lua’s idiomatic x ~= y. The same is true of variable assignment like $x = 5 and math operations like $x + $y.
The only hitch is that Nomsu requires you to use parentheses to explicitly group actions. So if all math operations were defined this way, a simple expression like 1 + 2*3 + 4 would have to be written as ((1 + (2 * 3)) + 4), which is overly cumbersome. So the Nomsu compiler makes a special concession for math: if no other custom rules are found and an action looks like a bunch of math operators, it will use a custom math rule that just compiles the math expression in a reasonable way.
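The article does not spell out what that math rule does, so the following Python sketch is only one plausible reading: recognize a flat sequence of operands separated by arithmetic operators, emit it as a single parenthesized Lua expression, and let Lua’s own precedence rules do the grouping. The names MATH_OPERATORS, looks_like_math, and compile_math are made up for this example.

```python
# Arithmetic operators that are spelled the same way in Lua.
MATH_OPERATORS = {"+", "-", "*", "/", "%", "^"}

def looks_like_math(tokens):
    """True for operand, operator, operand, ..., operand."""
    if len(tokens) < 3 or len(tokens) % 2 == 0:
        return False
    operands, operators = tokens[0::2], tokens[1::2]
    return (all(op in MATH_OPERATORS for op in operators)
            and all(t not in MATH_OPERATORS for t in operands))

def compile_math(tokens):
    """Emit the tokens as one Lua expression; Lua handles precedence."""
    if not looks_like_math(tokens):
        raise ValueError("not a math expression")
    return "(" + " ".join(tokens) + ")"

print(compile_math(["1", "+", "2", "*", "3", "+", "4"]))  # (1 + 2 * 3 + 4)
```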
The API
The best way to learn the basics is by running the tutorial:
nomsu -t tutorial
Or you can look at the overview, which lists a bunch of common use cases.
Closing Remarks
The Nomsu core is mostly built up using the tools discussed here: $x compiles to $y to define the very basic language primitives, and $x parses as $y to combine those primitives in useful and ergonomic ways. Because of this bootstrapping approach, with a very small number of predefined compile rules and just a dash of inlined Lua code, most of Nomsu’s functionality is actually written in Nomsu code. The Nomsu compiler exposes all of this to users, so it’s just as easy to write a 4-line extension to the language as it is to write a function in other programming languages.