Webapp Error Reporting

Lately I’ve been playing around with JavaScript error reporting. It started last September, when my team began thinking about ways to offer Socorro as a Service. A month later I found myself at the Mozilla Festival, where Popcorn Maker 1.0 had just launched, complete with a client-side error reporter. I got to spend some time chatting with Dave Humphrey and David Bruant about the challenges and rewards of web app error reporting, and brought those discussions back to the WebTools team.

In February, WebTools got together in Mountain View to build some prototypes. I had spent some time investigating the modern ecosystem, and found heavy active development of error reporters and aggregation software – including the open source Sentry + Raven.js combo – across platforms. I also noted some undeveloped areas for FFOS, and set about investigating. I created a simple client library, Ripcord.js, that provided an interface like Raven.js, but with drastically simplified internals. There was also Cypress, a simple web app that created JS errors in a variety of circumstances. On a Firefox OS phone I eventually supplemented my client library with a local web app for queuing and submitting errors. A custom web activity sent the error data to the app, which could then act on the errors intelligently.

Through these experiments we’ve learned a lot.

With FFOS it can be useful to have additional information about the state of the device. As you pile on information about the hardware, in addition to any information about the crash itself, reports can get large and quickly eat up a capped data plan. It can also impact network performance and extend the time a user spends dealing with each error.

Ripcord.js can use information about the device APIs with a local queue to bulk submit errors, conditionally submit based on Internet connectivity, present a unified user experience for handling errors, and locally aggregate common errors between submissions. However, any app wanting to use this library would have to go through extra certification steps and a potential reorganization to accommodate. By centralizing error submission in a separate app we can extend these features to the simplest web apps and provide a central location for historic information about errors on your device. Only the local submitter would have to go through extra certification steps. In order to ensure its ubiquity, it would have to come packaged with the OS, meaning it will have to undergo the extra certification anyway. Ripcord.js can then detect the presence of a submitter, and in its absence fall back to a simpler behavior based on the developer’s preference. This has the added benefit of reducing the size of the client library.
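To make the queuing idea concrete, here is a rough sketch of a local queue that folds repeated errors into one entry and only submits a batch when the device is online. It is not taken from Ripcord.js: createErrorQueue, isOnline, and send are hypothetical names, and the fingerprint (message, file, line) is just one plausible way to decide that two errors are "the same".

```javascript
// Hypothetical sketch; none of these names come from Ripcord.js itself.
function createErrorQueue(options) {
  const isOnline = options.isOnline; // () => boolean, e.g. () => navigator.onLine
  const send = options.send;         // (batch) => void, an assumed transport
  const entries = new Map();         // fingerprint -> aggregated report

  // Two reports are treated as the same error when these fields match.
  function fingerprint(report) {
    return [report.message, report.file, report.line].join('|');
  }

  return {
    // Record an error, folding repeats into a single entry with a count.
    push(report) {
      const key = fingerprint(report);
      const existing = entries.get(key);
      if (existing) {
        existing.count += 1;
        existing.lastSeen = report.time;
      } else {
        entries.set(key, Object.assign({}, report, {
          count: 1,
          firstSeen: report.time,
          lastSeen: report.time
        }));
      }
    },
    // Submit everything as one batch, but only when a connection is available.
    flush() {
      if (entries.size === 0 || !isOnline()) return false;
      send(Array.from(entries.values()));
      entries.clear();
      return true;
    },
    // Number of distinct errors currently waiting.
    size() {
      return entries.size;
    }
  };
}
```

A submitter app could call flush() on a timer or when connectivity returns; keeping only a count per distinct error is what keeps the payload small on a capped data plan.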

The experience of handling an error can be significantly improved by an error reporter. Uncaught errors can cause the UI to stop responding to the user with no warning or notification, which is frustrating. By capturing the error and triggering a MozActivity, we gain some control over the user experience when an app enters a bad state. I did not get to investigate this very thoroughly, but there’s opportunity here.
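As a sketch of that handoff, the function below passes an error report to a submitter app through a web activity and falls back to a caller-supplied handler when that is not possible. MozActivity is the Firefox OS constructor; the activity name 'report-error', the env object, and the fallback callback are all invented for illustration, and a real submitter would have to declare whatever activity name it handles.

```javascript
// Hypothetical sketch. 'report-error' is an invented activity name.
function reportError(report, env) {
  const Activity = env.MozActivity; // Firefox OS constructor, if present
  if (typeof Activity === 'function') {
    const request = new Activity({ name: 'report-error', data: report });
    // A MozActivity behaves like a DOMRequest; use the fallback if the
    // request fails, for example because no app handles the activity.
    request.onerror = function () {
      env.fallback(report);
    };
    return 'activity';
  }
  // No web activities on this platform: go straight to the simpler behavior.
  env.fallback(report);
  return 'fallback';
}

// Example wiring, only when running in a browser-like environment.
if (typeof window !== 'undefined') {
  window.onerror = function (message, file, line) {
    reportError({ message: String(message), file: file, line: line }, {
      MozActivity: window.MozActivity,
      fallback: function (r) { console.error('Unreported error', r); }
    });
  };
}
```

Passing the environment in as an argument is only a convenience for testing; the same logic could read window.MozActivity directly.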

There’s lots of potential for improving error reporting in FFOS, but there are more fundamental challenges to error reporting from web apps in general. The top-level window.onerror is supposed to catch unhandled errors, but it doesn’t fire when errors originate in certain event listeners, or in web workers. When it does catch the error, it provides only line number, file of origin, and the error message. There’s no access to the error object itself – for that you have to instrument try/catch blocks throughout your code, and possibly add event listeners. Even when you manage to catch an error, you don’t get much more information. In the Firefox family of browsers the stack trace remains a mystery, though we’ve been able to revive a longstanding bug to improve this behavior that is now seeing a flurry of activity.
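The difference between the two approaches can be sketched as follows. installGlobalHandler and wrap are hypothetical helper names, and report is an assumed callback supplied by the caller. The global handler only sees the three values described above, while the try/catch wrapper gets the error object, including the non-standard stack property where the browser provides one.

```javascript
// Sketch only; `report` is a hypothetical callback supplied by the caller.
function installGlobalHandler(target, report) {
  // window.onerror receives just the message, the file URL and the line number.
  target.onerror = function (message, file, line) {
    report({ message: String(message), file: file, line: line, stack: null });
    return false; // do not suppress the browser's default error handling
  };
}

function wrap(fn, report) {
  // Wrapping a callback in try/catch gives access to the error object itself.
  return function wrapped() {
    try {
      return fn.apply(this, arguments);
    } catch (err) {
      report({
        message: err && err.message ? err.message : String(err),
        name: err && err.name ? err.name : 'Error',
        // `stack` is non-standard and may be missing in some browsers.
        stack: err && err.stack ? err.stack : null
      });
      throw err; // rethrow so the app's existing behavior is unchanged
    }
  };
}
```

In a page this might be used as installGlobalHandler(window, report) plus button.addEventListener('click', wrap(onClick, report)) for each listener you want covered, which is exactly the kind of instrumentation burden described above.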

The few values you do get are nearly useless, because production JavaScript is compacted to minimize the number of lines and files. In practice, this means that most errors occur on the same line and file, making it difficult or impossible to differentiate between multiple errors of the same type, or isolate where a single error came from. In WebKit-based browsers you can access a trace. In nightly WebKit and Chrome Canary you can even use source maps, which can map compressed JavaScript back to its source. The Firefox implementation will likely yield column number before a full browser source maps implementation, but column is enough for an external program to use, given the map and original files. When that is complete, error reporters will be able to usefully trace an error’s origin to the source – even if the source is a transpiled language like CoffeeScript or Fey. With full source maps support in the browser, error reporters will be able to skip the intermediate step and get the error against the original source.
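To show why a line and column are enough for an external program, here is a minimal sketch of a lookup against a version 3 source map. It decodes the Base64 VLQ "mappings" string and returns the original file and line for a generated position. The function names are my own, the positions are 0-based as in the source map format, and it ignores the "names" field and other details a real library would handle.

```javascript
// Minimal sketch of a source map (version 3) lookup; not a full implementation.
const BASE64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode one segment of Base64 VLQ digits into a list of signed integers.
function decodeVlq(segment) {
  const values = [];
  let value = 0;
  let shift = 0;
  for (const ch of segment) {
    const digit = BASE64.indexOf(ch);
    if (digit < 0) throw new Error('Invalid Base64 VLQ character: ' + ch);
    value += (digit & 31) << shift;
    if (digit & 32) {
      shift += 5; // continuation bit set: more digits follow
    } else {
      const negative = value & 1; // lowest bit carries the sign
      value >>= 1;
      values.push(negative ? -value : value);
      value = 0;
      shift = 0;
    }
  }
  return values;
}

// Map a 0-based generated line and column to the original position.
function originalPositionFor(map, line, column) {
  let source = 0;
  let origLine = 0;
  let origColumn = 0;
  let found = null;
  const lines = map.mappings.split(';');
  for (let i = 0; i < lines.length && i <= line; i++) {
    let genColumn = 0; // the generated column resets on every generated line
    if (lines[i] === '') continue;
    for (const segment of lines[i].split(',')) {
      const fields = decodeVlq(segment);
      if (fields.length === 0) continue;
      genColumn += fields[0];
      if (fields.length < 4) continue; // segment has no original position
      // These fields are deltas that carry over across segments and lines.
      source += fields[1];
      origLine += fields[2];
      origColumn += fields[3];
      // Keep the last segment that starts at or before the requested column.
      if (i === line && genColumn <= column) {
        found = { source: map.sources[source], line: origLine + 1, column: origColumn };
      }
    }
  }
  return found;
}
```

With a made-up map such as { version: 3, sources: ['app.coffee'], names: [], mappings: 'AAAA,SACA,UACA' }, an error reported at generated line 0, column 12 resolves to line 2 of app.coffee, even though every statement in the minified file shares the same line.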

A lot of the information that an error reporter could want is already fully or partially implemented in browser developer tools, which is encouraging. Not only does it make providing similar info in the browser window easier, but it hints at a possible future where a client-side error reporter could hook into the browser’s developer tools directly.

The group within WebTools exploring this issue will be formalizing our efforts a bit more in the next few months, in a new project we’re calling Bixie. We will work towards a v1 with a minimal feature set. It will accept JavaScript errors, store them, and present them to users through a web app. To achieve this quickly we will build on other open source projects – leveraging what we can from the modular infrastructure of the Socorro crash reporter and relying on Raven.js as our client library. Subsequent iterations may introduce additional features like custom reports derived from raw errors, the client library improvements discussed above, and improved integration with other parts of the developer’s ecosystem.

Moving forward we will also work closely with other teams inside Mozilla to tackle the aforementioned bugs and more, in order to bring improvements to the web platform. The tools-bixie mailing list is pending, but until then we’ll be coordinating in irc://irc.mozilla.org#bixie. More details about the Bixie prototype and how you can get involved will follow as we ramp up.

UPDATE: the mailing list is available at https://mail.mozilla.org/listinfo/tools-bixie