Published on September 1, 2023

Restarting macOS apps automatically on crash

Normally apps crash because of bugs introduced by the developer, but in my experience, there are cases where crashes can be out of the developer’s control.

For example, the app might work closely with hardware, and the system’s expectations of that hardware are not met when a power or connection cord is yanked out (or when there’s electromagnetic interference caused by your IKEA chair).

Or maybe the app works with a network shared disk which disconnects abruptly in the middle of busy I/O, or a rogue driver changes the hardware tree while the app is traversing it, or God forbid, there is a kernel bug.

In all those cases, the crash is just a nuisance for the end user, as they now have to relaunch the app. The crash report is also of no use to the developer, and mostly serves as a time-consuming source of confusion.

After we reach a certain confidence in the app stability, we can explore some ways to have the app restart automatically when the inevitable crash happens.

# The launchd way

com.example.MyApp.restartOnCrash.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>KeepAlive</key>
        <dict>
            <key>Crashed</key>
            <true/>
        </dict>

        <key>Label</key>
        <string>com.example.MyApp.restartOnCrash</string>

        <key>ProgramArguments</key>
        <array>
            <string>/usr/bin/open</string>
            <string>-W</string>
            <string>/Applications/MyApp.app</string>
        </array>
    </dict>
</plist>

launchd is the daemon that manages all macOS processes. By placing a plist file in the right place, you can have a process start at boot, run at specific times or on specific events, be kept alive, and so on.

So with a few lines of XML, we can have launchd start our app and keep it alive on crashes.

The problem with this is that the first app launch needs to be a false start that registers the agent and exits. The agent is the one who will actually start the app.

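import ServiceManagement

// Register the launch agent (macOS 13+), then exit so that launchd starts the app
let service = SMAppService.agent(plistName: "com.example.MyApp.restartOnCrash.plist")
try! service.register()

exit(0)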

Also, this API is only available on macOS 13 and later.

Older versions of macOS would need to use a .pkg installer to copy the plist file to ~/Library/LaunchAgents and then start the agent using the launchctl command.
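
The manual steps for older systems can be sketched roughly as below. This is only an illustration: the `launchAgentInstallPlan` helper and its parameters are hypothetical, and it assumes the `launchctl bootstrap gui/<uid>` form for loading a per-user agent (very old systems may need `launchctl load -w` instead). It only computes the destination path and the command, leaving the actual copying to the installer.

```swift
import Foundation

/// Hypothetical helper: describes where the agent plist should be copied and
/// which launchctl arguments would load it for the given user.
struct LaunchAgentInstallPlan {
    let destination: URL
    let launchctlArguments: [String]
}

func launchAgentInstallPlan(label: String, homeDirectory: URL, uid: UInt32) -> LaunchAgentInstallPlan {
    // Per-user agents live in ~/Library/LaunchAgents/<label>.plist
    let destination = homeDirectory
        .appendingPathComponent("Library/LaunchAgents")
        .appendingPathComponent("\(label).plist")

    // Assumption: `launchctl bootstrap gui/<uid> <plist>` loads the agent
    // into the user's GUI session.
    let arguments = ["bootstrap", "gui/\(uid)", destination.path]
    return LaunchAgentInstallPlan(destination: destination, launchctlArguments: arguments)
}

// Example values only; a real installer would use the current user's home and uid.
let plan = launchAgentInstallPlan(
    label: "com.example.MyApp.restartOnCrash",
    homeDirectory: URL(fileURLWithPath: "/Users/me"),
    uid: 501
)
print(plan.destination.path)
print((["/bin/launchctl"] + plan.launchctlArguments).joined(separator: " "))
```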

# The signal handler way

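import Foundation

func restart() {
    // Only restart when the app was launched without extra arguments
    guard CommandLine.arguments.count == 1 else {
        exit(1)
    }

    // Launch a new instance of the app through /usr/bin/open
    let proc = Process()
    proc.executableURL = URL(fileURLWithPath: "/usr/bin/open")
    proc.arguments = ["-n", Bundle.main.bundlePath]
    try! proc.run()

    exit(0)
}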

func restartOnCrash() {
    NSSetUncaughtExceptionHandler { _ in restart() }
    signal(SIGABRT) { _ in restart() }
    signal(SIGILL) { _ in restart() }
    signal(SIGSEGV) { _ in restart() }
    signal(SIGFPE) { _ in restart() }
    signal(SIGBUS) { _ in restart() }
    signal(SIGPIPE) { _ in restart() }
    signal(SIGTRAP) { _ in restart() }
    signal(SIGHUP) { _ in restart() }
}

Installing this is as simple as calling restartOnCrash() once at the start of the app lifecycle (usually in applicationDidFinishLaunching).

This is the method I use in Lunar and Clop and it is reliable enough while also being easy to use and compatible with old macOS versions.

Those signals are emitted by various parts of system code when the app gets into a crashing state. It could be because it accessed an invalid memory address, or because panic() was called, etc.

For a more complete implementation that uses execve for instant restart and also checks for a restart loop, check this gist: RestartOnCrash.swift
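
To give an idea of what a restart loop check could look like, here is a minimal sketch. It is not taken from the gist: the `RestartLoopGuard` type, the limit of 3 restarts, and the 60 second window are all assumptions made for illustration.

```swift
import Foundation

/// Hypothetical restart-loop guard: allows a restart only if fewer than
/// `maxRestarts` restarts happened within the last `window` seconds.
struct RestartLoopGuard {
    let maxRestarts: Int
    let window: TimeInterval

    func shouldRestart(previousRestarts: [Date], now: Date) -> Bool {
        // Keep only the restarts that are recent enough to count
        let recent = previousRestarts.filter { now.timeIntervalSince($0) <= window }
        return recent.count < maxRestarts
    }
}

let loopGuard = RestartLoopGuard(maxRestarts: 3, window: 60)
let now = Date()

// Two restarts in the last minute: another restart is still allowed.
let fewRestarts = [now.addingTimeInterval(-50), now.addingTimeInterval(-10)]
print(loopGuard.shouldRestart(previousRestarts: fewRestarts, now: now))

// Three restarts in the last minute: probably a crash loop, so give up.
let manyRestarts = fewRestarts + [now.addingTimeInterval(-1)]
print(loopGuard.shouldRestart(previousRestarts: manyRestarts, now: now))
```

The restart timestamps would have to survive across launches, for example by storing them in UserDefaults and checking them early at launch.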

# Bonus: restart on hang

If the app is mostly a menubar utility sitting in the background, it could be useful to detect and recover from app hangs.

An app hang is when the app does not respond to user input anymore. It might appear as Not responding in Activity Monitor and show the beach ball cursor when hovering the cursor over its icon.

For example, Lunar is well suited for this because it mostly sits in the background adapting monitor brightness automatically, and responding to brightness and volume keys. It is also prone to deadlocks caused by the macOS Gamma APIs.

A document-based app like Sketch, Pages or Pixelmator would not be suited for this, as the hang might be caused by saving a file or waiting for a CPU-heavy operation. In that case it would be best to wait it out, rather than restarting and causing data corruption.

So how can a hang be detected? By definition, all user inputs and UI renders happen on the main thread, so an app that’s not responding will most likely have its main thread busy or locked.

We can periodically schedule a ping on the main thread, and if the ping is not received for n seconds, an app hang happened.

var lastPing = Date()
let hangThreshold = 60.0
let hangDetectionQueue = DispatchQueue(label: "hangDetectionQueue")

func restartOnAppHang() {
    let timer = Timer.scheduledTimer(withTimeInterval: 1, repeats: true) { _ in
        lastPing = Date()
    }
    timer.tolerance = 1
    RunLoop.main.add(timer, forMode: .common)

    hangDetectionQueue.async {
        while true {
            Thread.sleep(forTimeInterval: 1)

            if Date().timeIntervalSince(lastPing) > hangThreshold {
                restart()
            }
        }
    }
}

To make it more robust, you might need to listen to sleep/wake events, pause the detection queue while in standby and reset the ping time on wake.
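
The pause and reset logic could be sketched like this. The `HangTracker` type and its method names are hypothetical. The sketch only models the state, so that time spent asleep is never counted as a hang.

```swift
import Foundation

/// Hypothetical hang tracker that can be paused while the Mac sleeps.
struct HangTracker {
    let threshold: TimeInterval
    private(set) var lastPing: Date
    private(set) var paused = false

    init(threshold: TimeInterval, now: Date) {
        self.threshold = threshold
        self.lastPing = now
    }

    /// Called from the main-thread timer.
    mutating func ping(at now: Date) { lastPing = now }

    /// Called when the system is about to sleep.
    mutating func pause() { paused = true }

    /// Called on wake: reset the ping so time spent asleep is not counted.
    mutating func resume(at now: Date) {
        paused = false
        lastPing = now
    }

    /// Checked from the background detection loop.
    func isHung(at now: Date) -> Bool {
        !paused && now.timeIntervalSince(lastPing) > threshold
    }
}

let start = Date()
var tracker = HangTracker(threshold: 60, now: start)

tracker.pause()                                    // system goes to sleep
let wake = start.addingTimeInterval(3600)          // an hour later
let hungWhileAsleep = tracker.isHung(at: wake)     // paused, so never reported
tracker.resume(at: wake)                           // ping time is reset on wake
let hungSoonAfterWake = tracker.isHung(at: wake.addingTimeInterval(30))
let hungLongAfterWake = tracker.isHung(at: wake.addingTimeInterval(90))
print(hungWhileAsleep, hungSoonAfterWake, hungLongAfterWake)
```

In an app, pause() and resume(at:) could be called from observers for NSWorkspace.willSleepNotification and NSWorkspace.didWakeNotification registered on NSWorkspace.shared.notificationCenter. Since the tracker would be touched from both the main thread and the detection queue, access to it would also need a lock or a single serial queue.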

I use Sentry's app hang detection, which not only handles this already, but also sends a traceback of where the hang happened so that I can investigate and fix it in case an app bug is causing it.

SentrySDK.start { options in
    options.appHangTimeoutInterval = 60
    options.beforeSend = { event in
        guard let exc = event.exceptions?.first, let mech = exc.mechanism, mech.type == "AppHang" else {
            return event
        }

        // Allow Sentry to send the event before restarting the app
        DispatchQueue.global().asyncAfter(deadline: .now() + 5) {
            restart()
        }
        return event
    }
}