# Parsing and evaluating mathematical expressions in Swift / Part 1: Tokenization

Ever needed to interpret mathematical expressions with variables, like `a.field1 + (a.field2 - b.field1) * 2`, in Swift? I did. This series of blog posts will walk you through my solution. This is part 1 of the series:

1. Tokenization
2. Building a syntax tree
3. Evaluating the syntax tree

## Background

I needed to handle math expressions with a solution that would allow me to parse them in one place and later evaluate the parsed expressions in another place. The second place had to be Swift and the first place could be Swift, so all-Swift it is.

I must lead with a disclaimer: I have no background in interpreters or compilers, so it's possible I'm doing something very unoptimal here. But it does seem to work.

## Tokenization with parser combinators

First thing you have to do is break the input expression into tokens. Tokens in the case of the example expression above are `a.field1`, `+`, `(`, `a.field2`, `-`, `b.field1`, `)`, `*`, and `2`.

A nice tool for tokenization is parser combinators. They're a parsing technique from the functional programming world, focusing on small, composable parser functions. For an in-depth introduction, I heartily recommend Point-Free's multi-episode series on them. Some of my code is cribbed straight from their examples, and I'm leaving out some building blocks they had.

Parser combinators consist of two things: simple parser functions that extract data, and other functions that combine those parsers into larger parsers.

A parser function takes in an input and consumes anything at the start of the input it's interested in. It may return the prefix it consumed or just swallow it, and it'll always return rest of the input it didn't care about. A parser that parsed integers would, given the input of `"123abc"` return `123` and `"abc"`. If you fed it only the `"abc"`, you'd get back `nil` and still the same `"abc"`.

In fully generic Swift we'd write the type as `(Input) -> (Output?, Input)`. We can be a bit more specific here: first of all, we're dealing with strings and can optimize things by using the `Substring` type. And instead of accepting the `Substring` and returning it, we can use an `inout` parameter. Our type will look like this: `(inout Substring) -> Output?`.

To make the functions easier to deal with, it's useful to wrap them in a struct. Like this:

``````public struct Parser<A> {
public let run: (inout Substring) -> A?
}
``````

A parser that returns the first character of input looks like this:

``````let char = Parser<Character> { str in
guard !str.isEmpty else { return nil }
return str.removeFirst()
}
``````

You can imagine plenty of other, similar parsers: ones that parse a fixed string, ones that parse integers, ones that parse decimal numbers, ones that parse dates, etc. The other half of the parser combinator story is the combinators, the thing you use to glue parsers together. The most fundamental ones are things like `map` (convert the return value to something else), `flatMap` (convert the return value to another parser) and `zip` (run multiple parsers in sequence):

``````extension Parser {
func map<B>(_ f: @escaping (A) -> B) -> Parser<B> {
Parser<B> { str -> B? in
self.run(&str).map(f)
}
}

func flatMap<B>(_ f: @escaping (A) -> Parser<B>) -> Parser<B> {
Parser<B> { str -> B? in
let original = str
let parserB = self.run(&str).map(f)
guard let matchB = parserB?.run(&str) else {
str = original
return nil
}
return matchB
}
}
}

func zip<A, B>(_ a: Parser<A>, _ b: Parser<B>) -> Parser<(A, B)> {
Parser<(A, B)> { str -> (A, B)? in
let original = str
guard let matchA = a.run(&str) else { return nil }
guard let matchB = b.run(&str) else {
str = original
return nil
}
return (matchA, matchB)
}
}
``````

There's a lot you can build on those. I won't reproduce my complete hierarchy of parsers here, but let's take a closer look at parsing the dotted names. I don't need to get too ambitious with the language; just ASCII and some pretty basic rules.

One additional thing I want to do is break apart those dotted names: they represent hierarchical name spaces so it makes sense to deal with an array of name components instead of one long string.

Let's start with a couple of rules, one for the first character of the name and another for the rest of it:

``````func isValidNameStart(_ c: Character) -> Bool { c.isASCII && (c.isLetter || c == "_") }
func isValidNameContinuation(_ c: Character) -> Bool { isValidNameStart(c) || (c.isASCII && c.isNumber) }
``````

Then let's define a parser for the first character. Use the `char` parser defined earlier and `flatMap` the result with a function that checks if the character from the `char` parser is valid and if so, return a `Parser` that returns the character, otherwise return a `Parser` that returns `nil`:

``````let nameStart: Parser<Character> = Parsers.char.flatMap {
isValidNameStart(\$0) ? Parsers.always(\$0) : Parsers.failing()
}
``````

Next define a parser for the rest of the name. Use a parser called `prefix` that eats up 0–n first characters that satisfy a test:

``````let nameContinuation: Parser<Substring> = Parsers.prefix(while: isValidNameContinuation)
``````

Then declare combinations: a `namePart` is `nameStart` followed by `nameContinuation`, a `dottedNamePart` is a period followed by a `namePart`, a `nameTail` is zero or more `dottedNamePart`s, and a complete name is a `namePart` followed by a `nameTail`.

``````let namePart: Parser<String> = zip(nameStart, nameContinuation).map { "\(\$0)\(\$1)" }
let dottedNamePart: Parser<String> = zip(Parsers.literal("."), namePart).map { \$1 }
let nameTail: Parser<[String]> = Parsers.zeroOrMore(dottedNamePart, separatedBy: Parsers.always(()))

let namePartsParser: Parser<[String]> = zip(namePart, nameTail)
.map { [\$0] + \$1 }
``````

Then just declare all the things we support and join them up with a couple of other combinators left as an excercise to the reader, `oneOf` and `zeroOrMore`.

``````let numberParser: Parser<ExpressionToken> = Parsers.double.map(ExpressionToken.number)
let subtractionParser: Parser<ExpressionToken> = Parsers.literal("-").map { ExpressionToken.subtraction }
let multiplicationParser: Parser<ExpressionToken> = Parsers.literal("*").map { ExpressionToken.multiplication }
let divisionParser: Parser<ExpressionToken> = Parsers.literal("/").map { ExpressionToken.division }
let openParenParser: Parser<ExpressionToken> = Parsers.literal("(").map { ExpressionToken.openParen }
let closeParenParser: Parser<ExpressionToken> = Parsers.literal(")").map { ExpressionToken.closeParen }
let spaceParser: Parser<ExpressionToken> = Parsers.oneOrMoreSpaces.map { ExpressionToken.space }

let nameParser = namePartsParser.map(ExpressionToken.name)

let tokenParser = Parsers.oneOf([
numberParser,
subtractionParser,
multiplicationParser,
divisionParser,
openParenParser,
closeParenParser,
nameParser,
spaceParser,
])

let tokensParser = Parsers.zeroOrMore(tokenParser, separatedBy: Parsers.always(()))
``````

`tokensParser` is the final product. Run it on an expression and you'll get the a list of tokens out, like this:

``````[
name(["a", "field1"]),
space,
space,
openParen,
name(["a", "field2"]),
space,
subtraction,
space,
name(["b", "field1"]),
closeParen,
space,
multiplication,
space,
number(2.0)
]
``````

All right, we're getting somewhere with this. Next up: how to mangle that list into a shape that when evaluated gives us expected results? We'll look into that it part 2.