Swift networking with AsyncHTTPClient

When you need to access resources over HTTP in Swift, in most cases the answer is URLSession from Foundation. On the server side that's most probably not the right choice; there you are most likely running on SwiftNIO and you'll want something that integrates with it. On the command line it's a toss-up; on macOS URLSession is great, on other platforms… well, hope you don't run into any unimplemented corners.

I needed a command line tool that fetches a JSON file, parses it, downloads the files listed in the JSON file, and saves the JSON file too. I'm mostly on macOS so URLSession would have been fine, but I wanted to explore my options. SwiftNIO ships with a low-level HTTP client implementation, but that's not the right choice for a quick utility. The good news is there's also a higher-level implementation: AsyncHTTPClient. It's a lovely future-based (while we wait for async/await in Swift) asynchronous HTTP client that makes this task a breeze.

The format of the JSON manifest file looks like this:

{
    "files": [
        {
            "file": "filename1",
            "bytes": 42,
            "sha256": "8e079926d7340822e6b4c501811af5d1edc47d796b97f56c1cbe3177b47d588b"
        },

        {
            "file": "filename2",
            "bytes": 10,
            "sha256": "4998ab8c155e03ebf843d4adf51d702b9560dc0fbafe35405c826d9a76460289"
        }
    ]
}

And so on. That's just a couple of structs:

struct ManifestFile: Codable {
    let file: String
}

struct Manifest: Codable {
    let files: [ManifestFile]
}

I'll just ignore bytes and sha256. They are relevant in other contexts, but here they don't matter.
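
Leaving those keys out is safe because JSONDecoder skips keys that the Codable type doesn't declare. A minimal sketch of that, with the sha256 values shortened to placeholders:

```swift
import Foundation

struct ManifestFile: Codable {
    let file: String
}

struct Manifest: Codable {
    let files: [ManifestFile]
}

// Sample manifest; the sha256 values are shortened placeholders.
let sampleJSON = """
{
    "files": [
        { "file": "filename1", "bytes": 42, "sha256": "8e07" },
        { "file": "filename2", "bytes": 10, "sha256": "4998" }
    ]
}
"""

// Keys not declared on ManifestFile (bytes, sha256) are ignored while decoding.
let sampleManifest = try JSONDecoder().decode(Manifest.self, from: Data(sampleJSON.utf8))
let fileNames = sampleManifest.files.map { $0.file }
print(fileNames) // prints ["filename1", "filename2"]
```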

Command line

Let's start by defining a command line interface. We'll use Swift Argument Parser:

import ArgumentParser
import Foundation

struct DownloadManifestFiles: ParsableCommand {
    static var configuration = CommandConfiguration(
        commandName: "manifestdl",
        abstract: "Download manifest files"
    )
    
    @Argument(help: "URL to the manifest")
    var url: String

    @Option(help: "Target directory, defaults to working directory")
    var directory: String = "."

    mutating func validate() throws {
        guard URL(string: self.url) != nil else {
            throw ValidationError("url \(self.url) was not valid")
        }
    }

    mutating func run() throws {
        guard let reallyURL = URL(string: self.url) else {
            throw ValidationError("URL \(self.url) was not valid")
        }
        let cwd = URL(fileURLWithPath: FileManager.default.currentDirectoryPath, isDirectory: true)
        let directory = URL(fileURLWithPath: self.directory, isDirectory: true, relativeTo: cwd)

        fatalError("TODO")
    }
}

DownloadManifestFiles.main()

Nice and easy. An interesting wrinkle is that ArgumentParser doesn't support URLs, probably because they're in Foundation and ArgumentParser uses only stdlib. And NSURL, which backs URL, is a class with far more responsibilities than is comfortable in a simple data wrapper.
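
One way around the missing URL support is the transform parameter of @Argument, which lets the property be a URL while the parsing stays in your own code. This is only a sketch: ManifestURLExample and parseManifestURL are names made up for illustration, and requiring a scheme is my assumption rather than something the tool above does.

```swift
import ArgumentParser
import Foundation

// Hypothetical helper: turns the raw argument into a URL or throws a
// ValidationError that ArgumentParser reports to the user.
func parseManifestURL(_ string: String) throws -> URL {
    guard let url = URL(string: string), url.scheme != nil else {
        throw ValidationError("\(string) is not a valid absolute URL")
    }
    return url
}

// Hypothetical command, only here to show the property declaration.
struct ManifestURLExample: ParsableCommand {
    @Argument(help: "URL to the manifest", transform: parseManifestURL)
    var url: URL

    func run() throws {
        print(self.url.absoluteString)
    }
}
```

With this in place there's no need for a separate validate() step or for re-parsing the string in run(). The command is started with ManifestURLExample.main(), the same way as DownloadManifestFiles above.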

Downloading…

Next we need the actual networking. Let's wrap it in a helper class:

import AsyncHTTPClient
import Foundation
import NIO
import NIOFoundationCompat // ByteBuffer <-> Data conversions (readData, ByteBuffer(data:))
import NIOHTTP1

class Downloader {
    let httpClient = HTTPClient(eventLoopGroupProvider: .createNew)

    func syncShutdown() throws {
        try self.httpClient.syncShutdown()
    }

    func downloadListedFiles(url: URL, directory: URL) -> EventLoopFuture<Void> {
        fatalError("TODO")
    }
}

We don't really care about any details of the download; we just want to know whether it succeeded, hence EventLoopFuture<Void>. EventLoopFuture uses untyped errors, so we don't need to include an error in the type signature. It looks a bit odd next to Result and Combine's Publisher, but it does help when integrating with Swift's throwing functions.

Next let's implement the downloadListedFiles method.

class Downloader {
    /* … */
    func downloadListedFiles(url manifestURL: URL, directory: URL) -> EventLoopFuture<Void> {
        let future = self.downloadManifest(url: manifestURL)
            .flatMap { manifest, data in
                self.downloadManifestContent(
                    manifest: manifest,
                    manifestURL: manifestURL,
                    directory: directory
                ).map { data }
            }
            .flatMap { data in
                self.saveManifest(directory: directory, data: data)
            }
        return future
    }

    func downloadManifest(url: URL) -> EventLoopFuture<(manifest: Manifest, data: Data)> {
        fatalError("TODO")
    }

    func downloadManifestContent(
        manifest: Manifest,
        manifestURL: URL,
        directory: URL
    ) -> EventLoopFuture<Void> {
        fatalError("TODO")
    }

    func saveManifest(directory: URL, data: Data) -> EventLoopFuture<Void> {
        fatalError("TODO")
    }
}

That looks like an acceptable outline for it. Give downloadListedFiles a URL to a manifest file and a directory to download to and it'll download the manifest, parse it, then download the listed files, and finally save the manifest too. I'll fill in the blanks one by one.

Next let's see what downloadManifest should look like.

// Thrown when the response has no readable body.
struct MissingBodyError: Error {}

func downloadManifest(url: URL) -> EventLoopFuture<(manifest: Manifest, data: Data)> {
    self.httpClient.get(url: url.absoluteString)
        .flatMapThrowing { response in
            guard var body = response.body,
                    let data = body.readData(length: body.readableBytes)
            else {
                throw MissingBodyError()
            }
            return (manifest: try JSONDecoder().decode(Manifest.self, from: data), data: data)
        }
}

As you can see, just like ArgumentParser, HTTPClient eschews URL as a Foundation type. Other than that, HTTPClient gives us a really easy interface. Just .get a String containing a URL, and then do whatever you need to do with a chaining method like .map, .flatMap, or as in this case, .flatMapThrowing.
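
If the three chaining methods are new to you, here's a small sketch of how they differ, using plain SwiftNIO and no networking. OddNumberError, halve and runChainingDemo are made-up names for this example only.

```swift
import NIO

// Hypothetical error type, only for this sketch.
struct OddNumberError: Error {}

// flatMapThrowing: a synchronous transformation that is allowed to throw.
func halve(_ future: EventLoopFuture<Int>) -> EventLoopFuture<Int> {
    future.flatMapThrowing { value in
        guard value % 2 == 0 else { throw OddNumberError() }
        return value / 2
    }
}

func runChainingDemo() throws -> (String, Bool) {
    let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
    defer { try? group.syncShutdownGracefully() }
    let loop = group.next()

    // flatMap: the closure returns another future, which gets flattened.
    // map: a synchronous, non-throwing transformation of the value.
    let description = try halve(loop.makeSucceededFuture(42))
        .flatMap { half in loop.makeSucceededFuture(half + 1) }
        .map { "result: \($0)" }
        .wait()

    // A thrown error fails the future, and wait() rethrows it.
    var failed = false
    do {
        _ = try halve(loop.makeSucceededFuture(7)).wait()
    } catch is OddNumberError {
        failed = true
    }
    return (description, failed)
}

let (chained, oddFailed) = try runChainingDemo()
print(chained, oddFailed) // prints "result: 22 true"
```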

Next we can tackle downloadManifestContent. It's the function that's responsible for downloading all the listed files.

func downloadManifestContent(
    manifest: Manifest,
    manifestURL: URL,
    directory: URL
) -> EventLoopFuture<Void> {
    let baseURL = manifestURL.deletingLastPathComponent()
    let requestFutures: [EventLoopFuture<Void>]
    do {
        requestFutures = try manifest.files.map { manifestFile in
            let localURL = directory.appendingPathComponent(manifestFile.file)
            try FileManager.default.createDirectory(
                at: localURL.deletingLastPathComponent(),
                withIntermediateDirectories: true,
                attributes: nil
            )
            let localPath = localURL.path
            // self.threadPool is set up in the File I/O section below.
            let delegate = try FileDownloadDelegate(path: localPath, pool: self.threadPool)
            let request = try HTTPClient.Request(
                url: baseURL.appendingPathComponent(manifestFile.file).absoluteString
            )
            return self.httpClient.execute(request: request, delegate: delegate)
                .futureResult
                .map { _ in () }
        }
    } catch {
        return self.httpClient.eventLoopGroup.next().makeFailedFuture(error)
    }
    return EventLoopFuture.andAllSucceed(requestFutures, on: self.eventLoopGroup.next())
}

var eventLoopGroup: EventLoopGroup {
    self.httpClient.eventLoopGroup
}

This one's not quite as simple, but it's not too bad. For each listed file we create a future for downloading it. The download to disk, as opposed to memory, happens with the help of FileDownloadDelegate, a delegate included in AsyncHTTPClient that can write downloads to disk and report progress. Then once we have a list of futures, we smash them all together with andAllSucceed. Again we don't care about anything other than success, so Void is a perfectly fine value type.
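
To see what andAllSucceed does with a failure, here's a sketch with stand-in futures instead of real downloads. DownloadFailed and allSucceeded are hypothetical names for this example.

```swift
import NIO

// Hypothetical error standing in for a failed download.
struct DownloadFailed: Error {}

// Builds one already-completed future per outcome, combines them with
// andAllSucceed, and reports whether the combined future succeeded.
func allSucceeded(_ outcomes: [Bool]) -> Bool {
    let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
    defer { try? group.syncShutdownGracefully() }
    let loop = group.next()

    let futures = outcomes.map { ok -> EventLoopFuture<Void> in
        if ok {
            return loop.makeSucceededFuture(())
        } else {
            return loop.makeFailedFuture(DownloadFailed())
        }
    }

    do {
        try EventLoopFuture<Void>.andAllSucceed(futures, on: loop).wait()
        return true
    } catch {
        return false
    }
}

print(allSucceeded([true, true]))  // prints "true"
print(allSucceeded([true, false])) // prints "false"
```

A single failed future fails the combined one, which is what we want here: the manifest shouldn't be saved if any of the files is missing.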

One detail I need to point out here is the eventLoopGroup property. SwiftNIO works with EventLoops, and EventLoops are apparently usually backed by threads. While we're working only with networking code it's probably not a problem to ask HTTPClient for its EventLoopGroup instance.

File I/O

We have read the manifest and written the files listed in it to disk. One thing left to do: saving the manifest. Writing the file with SwiftNIO isn't quite as friendly as AsyncHTTPClient is, and if you were doing more of this you'd want to put a nicer façade on it, but here we just need it this once.

To prepare for this, let's first set up a bit of scaffolding. It feels cleaner to move the management of the EventLoopGroup to our own code now that we're using it for more than just the HTTP client, and we'll also need a thread pool for the file I/O.

class Downloader {
    let eventLoopGroup: EventLoopGroup // replaces the computed property
    let httpClient: HTTPClient
    let threadPool: NIOThreadPool

    init() {
        // NIOTSEventLoopGroup comes from the NIOTransportServices module; import it too.
        self.eventLoopGroup = NIOTSEventLoopGroup()
        self.httpClient = HTTPClient(eventLoopGroupProvider: .shared(self.eventLoopGroup))
        self.threadPool = NIOThreadPool(numberOfThreads: 1)

        self.threadPool.start()
    }

    func syncShutdown() throws {
        try self.httpClient.syncShutdown()
        try self.threadPool.syncShutdownGracefully()
        try self.eventLoopGroup.syncShutdownGracefully()
    }

    /* … */
}

Forcing NIOTSEventLoopGroup here ties this code to Apple platforms, as it's built on Apple's Network framework. For portability, there are other implementations. Here's what AsyncHTTPClient does when you ask it to create the event loop group itself:

#if canImport(Network)
    if #available(OSX 10.14, iOS 12.0, tvOS 12.0, watchOS 6.0, *) {
        self.eventLoopGroup = NIOTSEventLoopGroup()
    } else {
        self.eventLoopGroup = MultiThreadedEventLoopGroup(numberOfThreads: 1)
    }
#else
    self.eventLoopGroup = MultiThreadedEventLoopGroup(numberOfThreads: 1)
#endif

Doing something similar in your own code should help make this more cross-platform.
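
Here's one way that could look as a free function. It's a sketch that mirrors the snippet above; makePlatformEventLoopGroup is a name I made up, and it assumes your package depends on swift-nio-transport-services when building for Apple platforms.

```swift
import NIO
#if canImport(Network)
import NIOTransportServices
#endif

// Picks NIOTSEventLoopGroup where the Network framework is available and
// falls back to a single-threaded MultiThreadedEventLoopGroup elsewhere.
func makePlatformEventLoopGroup() -> EventLoopGroup {
    #if canImport(Network)
    if #available(macOS 10.14, iOS 12.0, tvOS 12.0, watchOS 6.0, *) {
        return NIOTSEventLoopGroup()
    }
    #endif
    return MultiThreadedEventLoopGroup(numberOfThreads: 1)
}
```

In Downloader.init the first assignment would then become self.eventLoopGroup = makePlatformEventLoopGroup(), with the rest of the class unchanged.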

With that setup done, we can dive into the file writing itself.

func saveManifest(directory: URL, data: Data) -> EventLoopFuture<Void> {
    let io = NonBlockingFileIO(threadPool: self.threadPool)
    let eventLoop = self.eventLoopGroup.next()
    let buffer = ByteBuffer(data: data)
    return io
        .openFile(
            path: directory.appendingPathComponent("manifest.json").path,
            mode: .write,
            flags: .allowFileCreation(),
            eventLoop: eventLoop
        ).flatMap { handle in
            io.write(fileHandle: handle, buffer: buffer, eventLoop: eventLoop)
                .map { handle }
        }.flatMapThrowing { handle in
            try handle.close()
        }
}

It's two async operations and one synchronous one in a pipeline. Open file asynchronously, write file asynchronously, close it synchronously. The flatMaps can feel a little daunting if you're not used to them, as always with future libraries. But once you get used to them, it's pretty OK. Async/await will hopefully help.

After all that work we're ready to loop back to our run method. We left it calling fatalError after processing the arguments. Now we can finish it up:

mutating func run() throws {
    guard let reallyURL = URL(string: self.url) else {
        throw ValidationError("URL \(self.url) was not valid")
    }
    let cwd = URL(fileURLWithPath: FileManager.default.currentDirectoryPath, isDirectory: true)
    let directory = URL(fileURLWithPath: self.directory, isDirectory: true, relativeTo: cwd)

    let downloader = Downloader()
    let dlFuture = downloader.downloadListedFiles(url: reallyURL, directory: directory)

    defer {
        do {
            try downloader.syncShutdown()
        } catch {
            print("Error shutting down: \(error)")
        }
    }
    try dlFuture.wait()
}

And that's it! Create a Downloader, call it to get a future, set up cleanup, then wait until the future is done.

Conclusion

SwiftNIO is a fantastic library that powers the Swift on the server ecosystem. It's also great with command line tooling. It can occasionally be a bit more involved than Foundation, but especially with HTTP requests the difference is negligible. You'd have had to bring in Combine too to make URLSessions composable.

The Foundation/standard library split is a bit awkward here, as it often is when working with Swift command line tools. It's not that Foundation doesn't work, but it's clear that often there's the Swift way and then there's the Foundation way. And Foundation's cross-platform story has been a bit rocky.

As Swift's async story progresses a lot of this code can be simplified, I hope. In the ideal case the structure would stay pretty much as is, but those nested maps and flatMaps could be replaced with more straightforward code. However, I don't think you need to wait for async/await and all the related enhancements to arrive. This is already pretty great.

Database connections in Vapor 4

Version 4 of the Swift web framework Vapor was released a while ago. Vapor emphasizes their ORM, Fluent, and it seems that version 4 has changed how a database connection can be acquired if you prefer to write the SQL yourself. They've also skipped documenting it, so getting things working requires some digging. In this post I'll explain how to do it. I'm using PostgreSQL.

You need a connection pool. The right place to set it up is your app's configure(_:Application) method. Use an environment variable to feed a database URL to your app:

import Vapor

public func configure(_ app: Application) throws {
    guard let dbUrlString = Environment.get("DBURL") else {
        preconditionFailure("Missing DBURL")
    }
    /* … */

    // register routes
    try routes(app)
}

Now that you have a URL, import PostgresKit and set up the pool:

import PostgresKit
import Vapor

public func configure(_ app: Application) throws {
    guard let dbUrlString = Environment.get("DBURL") else {
        preconditionFailure("Missing DBURL")
    }
    
    let postgresConfiguration = PostgresConfiguration(url: dbUrlString)!
    let pool = EventLoopGroupConnectionPool(
        source: PostgresConnectionSource(configuration: postgresConfiguration),
        on: app.eventLoopGroup
    )
    /* … */

    // register routes
    try routes(app)
}

Next we need to make the pool available to our request handlers and make sure it's shut down correctly. Making it available to request handlers happens by inserting it into the Application object's storage. Shutdown requires implementing Vapor's LifecycleHandler and registering it with the Application.

First define a struct that wraps the pool:

struct DatabaseService {
    let pool: EventLoopGroupConnectionPool<PostgresConnectionSource>
}

To keep the service in Application.storage, we need a key type:

struct DatabaseServiceKey: StorageKey {
    typealias Value = DatabaseService
}

Add an Application extension property to make it easier to access the service in the storage:

extension Application {
    var databaseService: DatabaseService? {
        get { self.storage[DatabaseServiceKey.self] }
        set { self.storage[DatabaseServiceKey.self] = newValue }
    }
}

The lifecycle implementation looks like this:

extension DatabaseService: LifecycleHandler {
    func shutdown(_ application: Application) {
        self.pool.shutdown()
    }
}

Now you just have to slot these pieces in place in configure:

import PostgresKit
import Vapor

public func configure(_ app: Application) throws {
    guard let dbUrlString = Environment.get("DBURL") else {
        preconditionFailure("Missing DBURL")
    }
    
    let postgresConfiguration = PostgresConfiguration(url: dbUrlString)!
    let pool = EventLoopGroupConnectionPool(
        source: PostgresConnectionSource(configuration: postgresConfiguration),
        on: app.eventLoopGroup
    )

    let dbService = DatabaseService(pool: pool)
    app.databaseService = dbService
    app.lifecycle.use(dbService)

    // register routes
    try routes(app)
}

Now you have a working database setup. If you want to run migrations, the configure method is probably a good place to do it, before you set up the routes. I keep the DB code in a domain specific DBClient type; you can use any division of responsibilities you like.

    app.lifecycle.use(dbService)

    let db = dbService.pool.database(logger: app.logger)
    let dbClient = DBClient(database: db)
    app.logger.info("Will run migrate on DB")
    _ = try dbClient.migrate().wait()
    app.logger.info("DB migration done")

    // register routes
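
DBClient is a type of my own and its contents depend entirely on your domain. As a rough sketch of the shape, assuming PostgresKit's SQLKit bridge (the sql() method on PostgresDatabase), it could look like this. The foo table, countFoos and MissingRowError are all made up for illustration.

```swift
import NIO
import PostgresKit

// Hypothetical error for a query that unexpectedly returns no rows.
struct MissingRowError: Error {}

struct DBClient {
    let database: PostgresDatabase

    // Creates a hypothetical table if it doesn't exist yet.
    func migrate() -> EventLoopFuture<Void> {
        self.database.sql()
            .raw("CREATE TABLE IF NOT EXISTS foo (id SERIAL PRIMARY KEY, name TEXT NOT NULL)")
            .run()
    }

    // Runs a hand-written query and decodes a single column from the first row.
    func countFoos() -> EventLoopFuture<Int> {
        self.database.sql()
            .raw("SELECT COUNT(*) AS count FROM foo")
            .first()
            .flatMapThrowing { row in
                guard let row = row else { throw MissingRowError() }
                return try row.decode(column: "count", as: Int.self)
            }
    }
}
```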

When handling requests, you'll just have to get the database service from Request.application. I like to create a struct called RequestEnvironment that encapsulates acquisition of the service and creation of domain logic services. Something like this:

struct RequestEnvironment {
    var makeFooService: () -> FooService

    static func makeDefault(req: Request) -> RequestEnvironment {
        guard let dbService = req.application.databaseService else {
            fatalError("Missing DatabaseService")
        }
        let db = dbService.pool.database(logger: req.logger)
        let dbClient = DBClient(database: db)
        return RequestEnvironment(
            makeFooService: { FooService(dbClient: dbClient, request: req) }
        )
    }    
}

Now when your controller handles a request, create the RequestEnvironment object and use it to call your services with the database client:

struct FooController {
    func create(_ req: Request) throws -> EventLoopFuture<FooExternal> {
        let newFoo = try req.content.decode(NewFooIncoming.self)
        let env = RequestEnvironment.makeDefault(req: req)
        return env.makeFooService().makeFoo(newFoo)
    }
}

That's it! Go forth and SQL, swiftly.

Alfred Script Filter with find and jq

Looks like this is a jq blog now, so here's another one.

I work on an iOS repository that's used to create a large number of apps and a few frameworks. Each app has a directory with configuration and a script that regenerates the associated Xcode project with XcodeGen.

You can run the script from the shell, or from Finder. Both of these require that you navigate to the appropriate directory or find a window that's already there. Both approaches work, and both are unsatisfactory.

I use Alfred for launching apps and all sorts of other things on macOS. One of the things it allows is workflows, a sort of Automator-like thing where after typing in a keyword Alfred will prompt you for input and execute things and so on. I built a workflow for helping with launching those regenerate scripts. Alfred's workflow sharing thing isn't great, as it creates hard to inspect zip files, and besides my specific circumstances probably aren't relevant to many people. I'll explain here in prose how it works. Adapt it to your needs as necessary.

The repository contains publisher folders. Inside the publisher folders are app folders. In each app folder is a script called regenerate-project.command. The hierarchy looks something like this:

├── publisher1
│   ├── app1
│   │   └── regenerate-project.command
│   └── app2
│       └── regenerate-project.command
└── publisher2
    └── app1
        └── regenerate-project.command

We want Alfred to ask us which one of the scripts to run after we've typed a keyword.

Let's see how we can make it happen. First, to get a list of those files we can run find in the terminal:

find . -maxdepth 3 -mindepth 3 -name regenerate-project.command -print

This gives us a list of files, one per line, like:

./publisher1/app1/regenerate-project.command
./publisher1/app2/regenerate-project.command
./publisher2/app1/regenerate-project.command

etc. [1]

Now, looking at Alfred's documentation, it looks like we need to create a document in the Script Filter JSON Format. It should look like this:

{
    "items": [
        {
            "uid": "publisher1/app1",
            "type": "file",
            "title": "publisher1/app1",
            "arg": "publisher1/app1",
            "match": "publisher1: app1"
        }
    ]
}

And so on. The one thing that breaks the monotony of identical keys is the match value. Its purpose there is to make Alfred give better completions. Alfred has a "word boundary" matching logic, but apparently / doesn't count as a word boundary.

What do we do when we need to handle JSON on the command line? We reach for jq.

jq has a number of parameters that modify how it handles input. To get it to ingest the list of strings produced by find, what seemed to work was using a combination of the --raw-input/-R and --null-input/-n flags, and the inputs builtin function. So the first thing to do is to build the wrapping object.

find . -maxdepth 3 -mindepth 3 -name regenerate-project.command -print | jq -nR '{ "items": [inputs] }'

Running that produces output like this:

{
  "items": [
    "./publisher1/app2/regenerate-project.command",
    "./publisher1/app1/regenerate-project.command",
    "./publisher2/app1/regenerate-project.command"
  ]
}

You could pipe find through sort or you could use jq's sort function, but the order doesn't matter as Alfred will reorder the choices by usage anyway, which is nice.

Next, just because we're careful developers, let's filter out empty entries, just in case we're ever using this with some other source of data:

find … | jq -nR '{ "items": [inputs | select(length>0)] }'

When you're running this with find, it shouldn't affect the output, but if you ever end up feeding it a text file it might be a different story.

Next drop the extra bits from the lines. We don't care about the leading ./ or the script name. They're all the same on all the lines. To lose them split the line into path components, take the two central elements and recombine them:

find … | jq -nR '{
    "items": [
        inputs |
        select(length>0) |
        split("/")[1:3] |
        join("/")
    ]
}'

{
  "items": [
    "publisher1/app2",
    "publisher1/app1",
    "publisher2/app1"
  ]
}

One thing we have to do before we can build the object literals is capture the values (both the parts array and the combined string) in variables. This is a slightly longer version of the above jq snippet. It produces exactly the same output, but it defines the variables we need in the next step:

find … | jq -nR '{
    "items": [
        inputs |
        select(length>0) |
        split("/")[1:3] as $parts |
        $parts |
        join("/") as $file |
        $file
    ]
}'

OK, good. Now we have the two folders as an array in $parts and as a string in $file. Then just replace that last bit that produces the array elements with an object literal.

find … | jq -nR '{
    "items": [
        inputs |
        select(length>0) |
        split("/")[1:3] as $parts |
        $parts |
        join("/") as $file |
        {
            "uid": $file,
            "type": "file",
            "title": $file,
            "arg": $file,
            match: $parts | join(": ")
        }
    ]
}'

That's a whole lot of $file and one special element that produces the value for the match field. Now the output looks like this:

{
  "items": [
    {
      "uid": "publisher1/app2",
      "type": "file",
      "title": "publisher1/app2",
      "arg": "publisher1/app2",
      "match": "publisher1: app2"
    },
    {
      "uid": "publisher1/app1",
      "type": "file",
      "title": "publisher1/app1",
      "arg": "publisher1/app1",
      "match": "publisher1: app1"
    },
    {
      "uid": "publisher2/app1",
      "type": "file",
      "title": "publisher2/app1",
      "arg": "publisher2/app1",
      "match": "publisher2: app1"
    }
  ]
}
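
If you're more at home in Swift than in jq, the same per-line transformation can be sketched like this. It's only meant as a way to read the jq filter; Item, ScriptFilter and makeScriptFilter are names made up for this example, and the workflow itself keeps using jq.

```swift
import Foundation

// One entry of Alfred's Script Filter JSON, limited to the keys used above.
struct Item: Codable, Equatable {
    let uid: String
    let type: String
    let title: String
    let arg: String
    let match: String
}

struct ScriptFilter: Codable, Equatable {
    let items: [Item]
}

func makeScriptFilter(lines: [String]) -> ScriptFilter {
    let items = lines
        .filter { !$0.isEmpty } // select(length>0)
        .map { line -> Item in
            // split("/")[1:3]: drop the leading ".", keep the two folder names
            let parts = Array(line.components(separatedBy: "/").dropFirst().prefix(2))
            let file = parts.joined(separator: "/") // join("/")
            return Item(
                uid: file,
                type: "file",
                title: file,
                arg: file,
                match: parts.joined(separator: ": ") // join(": ")
            )
        }
    return ScriptFilter(items: items)
}

let filter = makeScriptFilter(lines: [
    "./publisher1/app1/regenerate-project.command",
    "",
])

// JSONEncoder may write "/" as "\/", which is still valid JSON.
let encoder = JSONEncoder()
encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
let encoded = try encoder.encode(filter)
print(String(decoding: encoded, as: UTF8.self))
```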

All right, that's what we were after! Now we need to glue things together. In Alfred's Preferences, go to Workflows and create a new blank workflow. First tap on the "[𝑥]" button to set up variables. You'll need at least one, to specify where your project lives. Call it root, specify your folder as the value, and uncheck "Don't Export" as you want it as an environment variable in your script.

Next ctrl-click in the workflow background to get the context menu and select Inputs > Script Filter. In the filter configuration panel, give your workflow a keyword — I call mine regenios, this is how I invoke it in Alfred — uncheck "with space", and select "Argument Required". Select /bin/bash as the script language, and as text add this:

cd "$root"
find . -maxdepth 3 -mindepth 3 -name regenerate-project.command -print | jq -nR '{
    "items": [
        inputs |
        select(length>0) |
        split("/")[1:3] as $parts |
        $parts |
        join("/") as $file |
        {
            "uid": $file,
            "type": "file",
            "title": $file,
            "arg": $file,
            match: $parts | join(": ")
        }
    ]
}'

Now click Save to save your Script Filter. Then ctrl-click in the workflow background again and this time select Actions > Terminal Command. Insert the following as the terminal command:

{var:root}/{query}/regenerate-project.command && exit

Again click Save. Finally in the workflow editor drag a connection from the Script Filter shape to the Terminal Command box and you're done.

Now when you open the Alfred command window and type regenios and two spaces, you should get a full list of all the items your script produced. If you start typing after the first space, Alfred will match the beginning of each word of the match field of the JSON objects we produced and give a list of the matching items.

As I said at the start of this article, this probably isn't of much use to you as is. But it might be useful as inspiration.

[1] Yes, I'm aware of -print0, but it seems jq isn't.

© Juri Pakaste 2021