Parsing and evaluating mathematical expressions in Swift / Part 1: Tokenization

Ever needed to interpret mathematical expressions with variables, like a.field1 + (a.field2 - b.field1) * 2, in Swift? I did. This series of blog posts will walk you through my solution. This is part 1 of the series:

  1. Tokenization
  2. Building a syntax tree
  3. Evaluating the syntax tree

Background

I needed to handle math expressions with a solution that would allow me to parse them in one place and later evaluate the parsed expressions in another place. The second place had to be Swift and the first place could be Swift, so all-Swift it is.

I must lead with a disclaimer: I have no background in interpreters or compilers, so it's possible I'm doing something very unoptimal here. But it does seem to work.

Tokenization with parser combinators

First thing you have to do is break the input expression into tokens. Tokens in the case of the example expression above are a.field1, +, (, a.field2, -, b.field1, ), *, and 2.

A nice tool for tokenization is parser combinators. They're a parsing technique from the functional programming world, focusing on small, composable parser functions. For an in-depth introduction, I heartily recommend Point-Free's multi-episode series on them. Some of my code is cribbed straight from their examples, and I'm leaving out some building blocks they had.

Parser combinators consist of two things: simple parser functions that extract data, and other functions that combine those parsers into larger parsers.

A parser function takes in an input and consumes anything at the start of the input it's interested in. It may return the prefix it consumed or just swallow it, and it'll always return rest of the input it didn't care about. A parser that parsed integers would, given the input of "123abc" return 123 and "abc". If you fed it only the "abc", you'd get back nil and still the same "abc".

In fully generic Swift we'd write the type as (Input) -> (Output?, Input). We can be a bit more specific here: first of all, we're dealing with strings and can optimize things by using the Substring type. And instead of accepting the Substring and returning it, we can use an inout parameter. Our type will look like this: (inout Substring) -> Output?.

To make the functions easier to deal with, it's useful to wrap them in a struct. Like this:

public struct Parser<A> {
    public let run: (inout Substring) -> A?
}

A parser that returns the first character of input looks like this:

let char = Parser<Character> { str in
    guard !str.isEmpty else { return nil }
    return str.removeFirst()
}

You can imagine plenty of other, similar parsers: ones that parse a fixed string, ones that parse integers, ones that parse decimal numbers, ones that parse dates, etc. The other half of the parser combinator story is the combinators, the thing you use to glue parsers together. The most fundamental ones are things like map (convert the return value to something else), flatMap (convert the return value to another parser) and zip (run multiple parsers in sequence):

extension Parser {
    func map<B>(_ f: @escaping (A) -> B) -> Parser<B> {
        Parser<B> { str -> B? in
            self.run(&str).map(f)
        }
    }

    func flatMap<B>(_ f: @escaping (A) -> Parser<B>) -> Parser<B> {
        Parser<B> { str -> B? in
            let original = str
            let parserB = self.run(&str).map(f)
            guard let matchB = parserB?.run(&str) else {
                str = original
                return nil
            }
            return matchB
        }
    }
}

func zip<A, B>(_ a: Parser<A>, _ b: Parser<B>) -> Parser<(A, B)> {
    Parser<(A, B)> { str -> (A, B)? in
        let original = str
        guard let matchA = a.run(&str) else { return nil }
        guard let matchB = b.run(&str) else {
            str = original
            return nil
        }
        return (matchA, matchB)
    }
}

There's a lot you can build on those. I won't reproduce my complete hierarchy of parsers here, but let's take a closer look at parsing the dotted names. I don't need to get too ambitious with the language; just ASCII and some pretty basic rules.

One additional thing I want to do is break apart those dotted names: they represent hierarchical name spaces so it makes sense to deal with an array of name components instead of one long string.

Let's start with a couple of rules, one for the first character of the name and another for the rest of it:

func isValidNameStart(_ c: Character) -> Bool { c.isASCII && (c.isLetter || c == "_") }
func isValidNameContinuation(_ c: Character) -> Bool { isValidNameStart(c) || (c.isASCII && c.isNumber) }

Then let's define a parser for the first character. Use the char parser defined earlier and flatMap the result with a function that checks if the character from the char parser is valid and if so, return a Parser that returns the character, otherwise return a Parser that returns nil:

let nameStart: Parser<Character> = Parsers.char.flatMap {
    isValidNameStart($0) ? Parsers.always($0) : Parsers.failing()
}

Next define a parser for the rest of the name. Use a parser called prefix that eats up 0–n first characters that satisfy a test:

let nameContinuation: Parser<Substring> = Parsers.prefix(while: isValidNameContinuation)

Then declare combinations: a namePart is nameStart followed by nameContinuation, a dottedNamePart is a period followed by a namePart, a nameTail is zero or more dottedNameParts, and a complete name is a namePart followed by a nameTail.

let namePart: Parser<String> = zip(nameStart, nameContinuation).map { "\($0)\($1)" }
let dottedNamePart: Parser<String> = zip(Parsers.literal("."), namePart).map { $1 }
let nameTail: Parser<[String]> = Parsers.zeroOrMore(dottedNamePart, separatedBy: Parsers.always(()))

let namePartsParser: Parser<[String]> = zip(namePart, nameTail)
    .map { [$0] + $1 }

Then just declare all the things we support and join them up with a couple of other combinators left as an excercise to the reader, oneOf and zeroOrMore.

let numberParser: Parser<ExpressionToken> = Parsers.double.map(ExpressionToken.number)
let additionParser: Parser<ExpressionToken> = Parsers.literal("+").map { ExpressionToken.addition }
let subtractionParser: Parser<ExpressionToken> = Parsers.literal("-").map { ExpressionToken.subtraction }
let multiplicationParser: Parser<ExpressionToken> = Parsers.literal("*").map { ExpressionToken.multiplication }
let divisionParser: Parser<ExpressionToken> = Parsers.literal("/").map { ExpressionToken.division }
let openParenParser: Parser<ExpressionToken> = Parsers.literal("(").map { ExpressionToken.openParen }
let closeParenParser: Parser<ExpressionToken> = Parsers.literal(")").map { ExpressionToken.closeParen }
let spaceParser: Parser<ExpressionToken> = Parsers.oneOrMoreSpaces.map { ExpressionToken.space }

let nameParser = namePartsParser.map(ExpressionToken.name)

let tokenParser = Parsers.oneOf([
    numberParser,
    additionParser,
    subtractionParser,
    multiplicationParser,
    divisionParser,
    openParenParser,
    closeParenParser,
    nameParser,
    spaceParser,
])

let tokensParser = Parsers.zeroOrMore(tokenParser, separatedBy: Parsers.always(()))

tokensParser is the final product. Run it on an expression and you'll get the a list of tokens out, like this:

[
    name(["a", "field1"]), 
    space, 
    addition, 
    space, 
    openParen, 
    name(["a", "field2"]), 
    space, 
    subtraction, 
    space, 
    name(["b", "field1"]), 
    closeParen, 
    space, 
    multiplication, 
    space, 
    number(2.0)
]

All right, we're getting somewhere with this. Next up: how to mangle that list into a shape that when evaluated gives us expected results? We'll look into that it part 2.

Looser dependencies with Swift

What kind of types do you use to manage dependencies between objects in Swift? How do you keep your objects that depend on others testable? How do you prevent an addition of a method from causing changes to test code? Couplings should be loose, but how to achieve that?

Assumptions

I’m making a few assumptions in this article:

  1. Your software has semi-permanent pieces — let’s call them services — that other parts of your software depend on. Model, networking, database layers, etc.
  2. Your services aren’t laser-focused, perfectly factored little gems that each provide just the right set of methods that everyone using them requires.
  3. You want to test the code that uses those services.

Everyone gets an interface

If you spent time in the enterprisey Java mines of the 90s and 00s like I did, you probably learned that your classes should have separate interfaces. And on the projects I saw, more often than not, there was nothing abstract about the interfaces: it was one interface per class, because most of them didn’t describe behavior, they just were there so it was possible to substitute them with dummies/mocks/fakes/whatevers¹ when writing tests.

In Objective-C land, it’s possible to substitute a class with another because it’s not particularly strongly typed, but that same paradigm has been pretty popular there too, with the difference that it’s called a protocol, not an interface. And the same thing in Swift.

So you have a ProfileService, and it implements ProfileServiceInterface, with the exact same set of public methods. You can agonize over whether it should be called ProfileServiceInterface or just ProfileService and have the implementation carry a more awkward name, if you want to pretend there’s abstraction happening here. Whatever makes you feel good.

struct Profile {
    let name: String
}

protocol ProfileServiceProtocol {
    func readLoggedInProfile() -> Profile
}

class ProfileService: ProfileServiceProtocol {
    func readLoggedInProfile() -> Profile {
        fatalError()
    }
}


class ProfileViewModel {
    private let profileService: ProfileServiceProtocol

    init(profileService: ProfileServiceProtocol) {
        self.profileService = profileService
    }
}

Square pegs

There’s a bunch of problems with the approach I described above, including:

  1. It’s arguably a misuse of protocols. Protocols are well suited to things where you have different behaviors hiding behind similar interfaces — collection type hierarchies being a favorite example — and in Swift’s case about code reuse. You aren’t describing behavior here, you’re just copying the implementation declarations.
  2. It cements a supply side view of the interface. Services evolve and rarely manage to stay tightly focused over their lifetimes. Once your interface has more than one method and more than one user, it’s almost certain some users are only going to care about a subset.
  3. Adding things to the protocol causes all implementations of that protocol — of which there can be several in tests — to change.

While we ignore that first point, Swift gives us a way to address the other issues: you can declare conformance to protocols outside the implementation. For dependencies, this means that the units that need profiles can each define their own protocol and declare that ProfileService implements it, moving from a supply side definition to a demand side one. Like this:

class ProfileViewModel {
    private let profileService: ProfileViewModelProfileServiceAPI

    init(profileService: ProfileViewModelProfileServiceAPI) {
        self.profileService = profileService
    }
}

protocol ProfileViewModelProfileServiceAPI {
    func readLoggedInProfile() -> Profile
}

extension ProfileService: ProfileViewModelProfileServiceAPI {}

And voilà: as long as the service implements the method described in the protocol, you’re done. You don’t need to worry about implementing the rest of it in tests and you can even use that protocol to evolve ProfileViewModel at a different pace from ProfileService.

Less protocol, more action

Really the above, with the depender defining the protocol of its dependency, is not too bad.

There’s a few small downsides:

  1. Protocols are top level objects in Swift, at least for now.
  2. Having a protocol means you have to have a type that implements it. That’s fine with the “just protocolify a concrete service” case, but can be a hassle in tests.
  3. Not usually an issue in this case, but protocols can be treacherous in that their behavior radically changes if you decide an associated type would be appropriate.

None of these are deal breakers, but there’s a way I find to be nicer: instead of defining a protocol for your dependency, just define a struct that carries closures. Swift's structs are both syntactically and performance-wise extremely lightweight and perfectly suited to the task of providing an insulation layer.

class ProfileViewModel {
    struct ProfileServiceAPI {
        let readLoggedInProfile: () -> Profile

        init(readLoggedInProfile: @escaping () -> Profile) {
            self.readLoggedInProfile = readLoggedInProfile
        }
    }

    private let profileService: ProfileServiceAPI

    init(profileService: ProfileServiceAPI) {
        self.profileService = profileService
    }
}

extension ProfileViewModel.ProfileServiceAPI {
    init(profileService: ProfileService) {
        self.init(
            readLoggedInProfile: profileService.readLoggedInProfile)
    }
}

Now you don’t have an extra top-level protocol and you don’t need a type to implement the protocol with, which can make the code a lot nicer to test: with protocols, more often than not, I find myself writing the equivalent of ProfileViewModel.ProfileServiceAPI in tests anyway, and have it implement the protocol. This cuts out that one extra step. And the struct retains basically all the benefits of the one protocol per depender approach, keeping your dependencies loose and adaptable.

One downside can be that closures don’t have named parameters. If you’re wrapping an API with lots of those, it’s simple enough to add methods to your struct that restore the names:

class ProfileService {
    func readLoggedInProfile() -> Profile {
        fatalError()
    }

    func readProfile(ofUser name: String) -> Profile? {
        fatalError()
    }
}

class OtherUserViewModel {
    struct ProfileServiceAPI {
        private let readUserProfile: (String) -> Profile?

        init(readUserProfile: @escaping (String) -> Profile?) {
            self.readUserProfile = readUserProfile
        }

        func readProfile(ofUser name: String) -> Profile? {
            return self.readUserProfile(name)
        }
    }

    private let profileService: ProfileServiceAPI

    init(profileService: ProfileServiceAPI) {
        self.profileService = profileService
    }
}

extension OtherUserViewModel.ProfileServiceAPI {
    init(profileService: ProfileService) {
        self.init(
            readUserProfile: profileService.readProfile(ofUser:))
    }
}

You could continue one step further: in some cases it may make sense to forego that struct completely and instead just inject the closures directly into the class using them. But having the struct there may make it easier to see afterwards where they're coming from.

Over the course of these changes we've made the view model depend only on the signatures of the functions it requires, regardless of who implements them and what else they implement. It's now easier to test and to evolve. None of this will prevent you from tying your codebase and yourself into knots with a badly designed dependency graph, but if should help at least locally.

Footnotes

  1. If you remember the difference between a dummy, a mock and a fake you’re better, or more thoroughly brainwashed, than me. Take your pick.

GraphQL with Swift

Know that thing where you start writing a tool for something, discover it needs (for some values of “need”) something slightly complicated, you decide to write a library for said complicated thing, then discover you don’t need the tool you were working on in the first place but decide to write the library anyway?

So I wrote GraphQLer (pronounced “graph quiller”), a library for generating GraphQL from Swift. It’s not the only Swift GraphQL library around, but I believe it’s far simpler — mostly by being far less ambitious — than the other solutions. Throwing in the kitchen sink seems to be popular in this space, forcing you to shape your code around the libraries. That may work out great in some situations, but it wasn’t what I wanted.

GraphQLer just defines a bunch of data types that model the things found in the spec, and functions for turning those data types into strings. Its only purpose is to make it easier to write syntactically valid GraphQL documents. It has no dependencies — it doesn’t even import Foundation — and you handle all networking and responses as you see fit.

It works with the Swift Package Manager. There’s also a Xcode workspace and a playground, but you need to run SPM to get an Xcode project. I'll add an Xcode project if I have to to make it work with Carthage, but the WWDC 2019 keynote is tomorrow and I'm hoping there will be surprises related to SPM.

PS. A shout out to jazzy and GitHub Pages: it's ridiculous how easy it is to publish documentation for a Swift project these days.

© Juri Pakaste 2023