Improving Startup Time in the NYTimes Android App

Improving application startup and load time has been a priority for The New York Times Android development team, and we’re not alone. As device manufacturers continue to offer faster and more fluid experiences, users expect their native apps to be faster still.

Our team recently rewrote our news app to take advantage of modern patterns such as dependency injection and reactive programming. The rewrite made the code more modern, more modular and easier to maintain, but it required some tuning to perform well.

When we first released our new app, which we nicknamed Phoenix, startup time was 5.6 seconds on a Nexus 5. This was far longer than our goal of 2 seconds or less, which motivated us to put some effort into improving our performance.

We found that most of the slowdown was caused by issues with reflection. After addressing these and fixing other smaller things, we’ve reduced our current startup time to 1.6 seconds.

How We Did It

First, we captured our app’s startup time with Android’s method tracing feature. We measured from the Application class constructor until the progress indicator finished appearing on screen (see the documentation).

We then loaded the resulting trace files into DDMS and looked for the largest performance offenders. We eventually switched to NimbleDroid, which also offered a simple way to identify bottlenecks and made it easier to compare performance across traces.

The Low Hanging Fruit

The first major slowdown we found came from Groovy, which brought a large number of classes, a memory-intensive runtime and expensive calls to load JAR resources. The JAR loading cost had previously been identified as a problem in other libraries such as Joda-Time. We primarily used Groovy for closures, and improvements in code folding within Android Studio have removed that need. Since we didn’t use any other constructs of the language, we decided to move away from it: we stripped Groovy from our codebase and reverted to plain old Java 7 syntax. We’re currently exploring other options, but enhancements in IDE support for viewing anonymous classes in Java have made it less of a priority.

Next, we found some slowdowns within RxJava which were costing us about a second. Thankfully, a fix was issued in the next release.

Several third-party analytics clients in our app made blocking calls during startup. We changed how they were instantiated and worked with vendors to reduce their large upfront instantiation cost.

We also uncovered a number of small technical debt issues in our codebase: blocking object instantiation on an MD5 calculation, blocking in constructors and generally doing too much work up front.
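As a minimal sketch of deferring that kind of up-front work, the class below keeps its constructor cheap and computes an MD5 hash only when it is first requested. The class name `CachedAsset` and its methods are hypothetical, chosen for illustration; they are not from our codebase.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class for illustration; not taken from the NYTimes codebase.
final class CachedAsset {
    private final String url;
    private String key; // stays null until the hash is first requested

    CachedAsset(String url) {
        // The constructor only stores state, so creating the object is cheap.
        this.url = url;
    }

    // Reports whether the hash has been computed yet.
    synchronized boolean isKeyComputed() {
        return key != null;
    }

    // Computes the MD5 hex digest of the URL on first use and reuses it afterwards.
    synchronized String cacheKey() {
        if (key == null) {
            key = md5Hex(url);
        }
        return key;
    }

    private static String md5Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to an unsigned value so each byte prints as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

Constructing a `CachedAsset` during startup costs almost nothing; the hashing happens on the first call to `cacheKey()`, ideally off the main thread.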

After these tweaks were implemented, we cut our startup time to about 3.2 seconds, a little more than half of what it had been and faster than before the rewrite. Our next focus was optimizing data flows, which is especially critical for a data-intensive app like ours.

Introducing Data Stores

Since all of our data observations are asynchronous, we started with a single content manager that acted as an abstraction between the presentation and data layers. While this helped with encapsulation, it caused slowdowns: any time data was needed from persistent storage, we had to instantiate what gradually became a god object.

This became a pain point when, for example, an analytics configuration value was needed from the disk during app launch but had to wait for blocking SSL-enabled network clients to instantiate. As the app grew and more dependencies were added, our content manager instantiation time increased, causing subsequent app starts to be as slow as initial installs.

Rather than using a content manager as a single gatekeeper, we have moved toward individualized data stores, which let us load cached data as quickly as possible. Similar to recent work done by the Facebook Android team, we made sure to optimize the code path from disk to UI as much as possible.

We began by breaking our content manager into individual singleton data stores backed by disk and network DAOs. For example, we broke out a ConfigStore backed by ConfigNetworkDAO and ConfigDiskDAO from our content manager.
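The shape of such a store can be sketched roughly as follows. Only the names ConfigStore, ConfigNetworkDAO and ConfigDiskDAO come from the description above; the method signatures and the two test doubles are assumptions made for illustration, and the sketch is synchronous for brevity, whereas our real stores expose asynchronous observables.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interfaces: the method names are assumptions, not the real DAOs.
interface ConfigDiskDAO {
    String read(String key);               // returns null when nothing is cached
    void write(String key, String value);
}

interface ConfigNetworkDAO {
    String fetch(String key);
}

// A small store that prefers disk and only falls back to the network on a miss.
final class ConfigStore {
    private final ConfigDiskDAO disk;
    private final ConfigNetworkDAO network;

    ConfigStore(ConfigDiskDAO disk, ConfigNetworkDAO network) {
        this.disk = disk;
        this.network = network;
    }

    String get(String key) {
        String cached = disk.read(key);
        if (cached != null) {
            return cached; // fast path: no network involved
        }
        String fresh = network.fetch(key);
        disk.write(key, fresh); // persist so the next read is served from disk
        return fresh;
    }
}

// In-memory stand-in for disk storage, used only to exercise the sketch.
final class InMemoryConfigDiskDAO implements ConfigDiskDAO {
    private final Map<String, String> values = new HashMap<>();

    @Override public String read(String key) { return values.get(key); }
    @Override public void write(String key, String value) { values.put(key, value); }
}

// Fake network DAO that counts how often it is called.
final class CountingConfigNetworkDAO implements ConfigNetworkDAO {
    int fetchCount = 0;

    @Override public String fetch(String key) {
        fetchCount++;
        return "remote-" + key;
    }
}
```

With these test doubles, two consecutive calls to `get("analytics")` reach the fake network DAO only once; the second call is served from the disk stand-in.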

We leaned on Dagger’s Lazy instantiation, which allowed us to inject a lazy network client and not instantiate it until we actually perform a network operation. This is important when the device is offline or after the first load. Our architecture relies heavily on downloading data using background services. As a result, the UI mostly loads data from disk storage rather than making a network call. After we were able to create an optimal path from disk to screen, we came to our next major speed hog: reflection.
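The lazy network client described above can be illustrated without Dagger itself. The sketch below hand-rolls a `Lazy<T>` interface with the same single `get()` method that `dagger.Lazy<T>` exposes; `Provider`, `MemoizingLazy`, `NetworkClient` and `SectionFetcher` are hypothetical names, and in a real app Dagger would supply the lazy instance through injection.

```java
// Stand-in for dagger.Lazy<T>, which exposes the same single get() method.
interface Lazy<T> {
    T get();
}

// Hypothetical factory interface, similar in spirit to javax.inject.Provider.
interface Provider<T> {
    T get();
}

// Creates the wrapped instance on the first get() call and reuses it afterwards.
final class MemoizingLazy<T> implements Lazy<T> {
    private final Provider<T> provider;
    private T instance;

    MemoizingLazy(Provider<T> provider) {
        this.provider = provider;
    }

    @Override
    public synchronized T get() {
        if (instance == null) {
            instance = provider.get();
        }
        return instance;
    }
}

// Hypothetical client; imagine costly SSL setup happening in its constructor.
final class NetworkClient {
    String download(String url) {
        return "{\"source\":\"" + url + "\"}";
    }
}

// Holds a lazy client, so constructing the fetcher never constructs the client.
final class SectionFetcher {
    private final Lazy<NetworkClient> client;

    SectionFetcher(Lazy<NetworkClient> client) {
        this.client = client;
    }

    // Returns the cached JSON when present; only a cache miss touches the client.
    String load(String url, String cachedJson) {
        if (cachedJson != null) {
            return cachedJson;
        }
        return client.get().download(url);
    }
}
```

As long as `load` is given cached JSON, the provider is never invoked, so the expensive client is not built on the disk-only path.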

Removing Reflection

While trying to improve performance in data loading, we found that it was taking 700 milliseconds or more to parse the data for our Top Stories section, regardless of whether we were fetching from persistent storage or over the network. It surprised us to see how poorly Gson performed by default on Android for a largely data-driven app like ours. By analyzing startup traces, we zoomed in on calls to Reflective Type Adapters as the culprit.

We tried to minimize and remove reflective calls from Gson, but the only viable approach was to painstakingly write type adapters by hand. This led to a long and fruitless search for serialization techniques that neither use reflection nor carry an expensive startup cost. We were left with only a few options, all of which required adding code to our models. So we went back to the simplest working solution: Gson with custom type adapters.

We saw a tenfold improvement in parsing performance after writing our own type adapters. To keep developer overhead to a minimum, we have leaned heavily on the Immutables library. This generates custom type adapters for our data models at compile time and also gives us the added benefit of immutability in a manner similar to AutoValue.
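To make the distinction concrete, here is a plain-Java illustration of reflective versus hand-written binding. It deliberately does not use Gson’s API: the `StorySummary` model and both binder classes are hypothetical, and the input is an already-parsed map rather than a JSON stream. The point is only that the reflective version has to discover and set fields at runtime, while the hand-written version is ordinary field assignment of the kind a generated adapter would contain.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical model class for illustration.
final class StorySummary {
    String headline;
    String byline;
    int wordCount;
}

// Reflective binding: looks up the model's fields at runtime and sets them by name.
final class ReflectiveBinder {
    static StorySummary bind(Map<String, Object> parsed) {
        StorySummary story = new StorySummary();
        try {
            for (Field field : StorySummary.class.getDeclaredFields()) {
                if (parsed.containsKey(field.getName())) {
                    field.setAccessible(true);
                    field.set(story, parsed.get(field.getName()));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Could not set field reflectively", e);
        }
        return story;
    }
}

// Hand-written binding: no reflection, just direct assignments known at compile time.
// Assumes every key is present; a missing "wordCount" would fail on unboxing.
final class StorySummaryAdapter {
    static StorySummary bind(Map<String, Object> parsed) {
        StorySummary story = new StorySummary();
        story.headline = (String) parsed.get("headline");
        story.byline = (String) parsed.get("byline");
        story.wordCount = (Integer) parsed.get("wordCount");
        return story;
    }
}
```

Both binders produce the same object for the same input; the hand-written one simply avoids the runtime field lookups.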

Our current data flow looks something like this:

A background Service subscribes to an RxStore, which lazily instantiates a network client and downloads fresh data as JSON. This is done on a schedule or in response to a push alert. We then stream the JSON data to disk. We use the streaming API rather than saving the JSON as a single object because it spares us from instantiating value objects larger than one megabyte in memory.
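The idea of streaming a response to disk instead of materializing it first can be sketched with plain `java.io`. This is an assumption about the general technique rather than our exact code: the class name `StreamToDisk` and the 8 KB buffer size are illustrative, and the sketch copies raw bytes without parsing them.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration.
final class StreamToDisk {
    private static final int BUFFER_SIZE = 8 * 1024; // illustrative chunk size

    // Copies the stream to the target file chunk by chunk and returns the bytes written.
    // Only one buffer's worth of data is held in memory at any time.
    static long save(InputStream source, File target) throws IOException {
        long total = 0;
        byte[] buffer = new byte[BUFFER_SIZE];
        try (OutputStream out = new FileOutputStream(target)) {
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}
```

Memory use stays bounded by the buffer size regardless of how large the downloaded payload is.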

Now when the UI needs data, it subscribes to immutable data from a data store that is backed by caches in memory (Guava) and on disk. We only instantiate a network client if disk values are not present or if the format has changed since the last save. Data flows unidirectionally from persistent storage to UI and never directly from the network to our UI. This unidirectional data flow means that 99 percent of subscriptions to data store observables will never have to hit anything but disk values.

We will continue to explore other methods such as FlatBuffers for serializing data for disk storage; however, we are generally pleased with our current results. A relaunch of our app offers users a full view of the home page in under two seconds.

The Bottom Line

Reflection can cause significant performance issues on Android, especially for large, data-driven apps. For this reason, it should be avoided whenever possible, especially during app startup.

Finally, we’d like to challenge our fellow Android developers working on performance issues: Tweet your startup times with the hashtag #StartupTrace. Our current time on a Nexus 5 is 1.6 seconds and always improving.

For our typical users on newer devices (Nexus 6P, for example), relaunches take under 1.5 seconds to go from the home screen to all the news that’s fit to scroll (at 60 frames per second, of course).

Interested in working on ambitious Android problems at The New York Times? Come join us.