Instinctive Performance Optimization Wastes Time

As we were gearing up for the commercial release of Gibraltar, one of the last things on our plate was a final pass of performance and memory optimization. You’ve probably read a lot about the dangers of premature optimization, and while I’m generally a fan of that philosophy, as a team we just can’t shake the desire to write the fastest code we can as we go along. Usually that means our early betas are pretty responsive right out of the gate and we don’t have many hot spots to look into.

When it does come time to look into problem areas, it never ceases to amaze me that every developer has an opinion on what’s causing the performance issue that they’re completely confident in… and invariably wrong about. This is one reason I’m a big fan of having folks on your team who are experienced at code profiling, typically using one or more profiling tools. These tools (and the experienced developer who knows how to use them) produce a set of facts about what is really happening that is far more accurate than developer instinct, and far more useful for improving real-world performance.

One reason for this is a natural tendency to assume that less code will run faster than more code. This is instinctive because:

  • It’s natural to assume that each line of code represents work, and therefore that more lines mean more work.
  • Opaque lines (such as keywords, runtime calls, or calls to third-party libraries) tend to be viewed as having similar performance because they are black boxes.

Interestingly, making a program faster almost always involves giving it more lines of code, sometimes a lot more. This is more apparent in environments like .NET than in thinner environments (like Visual C++/MFC) because the underlying code libraries are so large. When you can display a sortable, pageable grid of data from a database in 10 lines of code, it can feel very wrong to write two hundred lines to make it fast.
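To make that contrast concrete, here’s a minimal, hypothetical sketch (not Gibraltar code): the short method recomputes an expensive value on every call, while the longer version adds a cache and a staleness check - more lines, far less work at runtime.

```csharp
// Minimal, hypothetical sketch: the short method recomputes an expensive
// summary on every call, while the longer one adds a cache and a staleness
// check. More lines of code, far less work at runtime.
using System.Collections.Generic;
using System.Linq;

public class SummaryProvider
{
    private readonly List<int> _values = new List<int>();
    private int _cachedSum;
    private int _cachedVersion = -1;
    private int _version;

    public void Add(int value)
    {
        _values.Add(value);
        _version++; // track changes so we know when the cache goes stale
    }

    // Short and obvious: walks the whole list on every call.
    public int SumNaive()
    {
        return _values.Sum();
    }

    // Longer, but only pays the full cost when the data actually changed.
    public int SumCached()
    {
        if (_cachedVersion != _version)
        {
            _cachedSum = _values.Sum();
            _cachedVersion = _version;
        }
        return _cachedSum;
    }
}
```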

Code Gravity

Another reason that instinctive performance optimization is generally not effective is that the slowest operations are also the most abstracted away from the developer. Even on a modern computer there are some operations slow enough to measure with a stopwatch:

  • Network communication: The time it takes to establish a network connection, move one byte of data to another computer, and get one byte in response is massive. While there are a lot of layers of abstraction to make this look fast, those facades can’t cheat nature all of the time.
  • Graphical User Interfaces: Drawing a UI page on screen, whether it’s rendering a web page or displaying a window, takes a lot of time.

Unfortunately, these areas are likely to stay slow because there are natural laws at work that thwart performance improvements. That means that while the days of worrying about which index of an array you iterate over first are behind us, you still need to worry about every call that might go across a network, even if it’s just one line (and it often is).

Connecting to a remote server, even when things go well, is much slower than doing a very inefficient sort of 1,000 objects - at least 100 times slower (the rough timing sketch after this list illustrates the gap). It’s even worse than that: a sort is a very consistent operation; on a given computer it’ll take basically the same amount of time every time. Connecting to a remote server is a highly variable operation:

  • Is there a network interface available?
  • Does it have its IP address?
  • Do you have the server in your DNS cache, or do you have to look it up in DNS? (which takes a separate connection, query, and result before you can even start what you wanted)
  • Is the other server still available?
  • and on and on…
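As a rough illustration of the gap (not a benchmark from the article), here’s a sketch that times a deliberately inefficient sort of 1,000 values against a single web request; the URL is just a placeholder and the numbers will vary wildly from machine to machine and network to network.

```csharp
// Rough sketch: time a deliberately inefficient sort of 1,000 objects against
// a single web request. The URL is a placeholder; results depend entirely on
// your machine and network.
using System;
using System.Diagnostics;
using System.Net;

class NetworkVersusSort
{
    static void Main()
    {
        var random = new Random(42);
        var data = new int[1000];
        for (int i = 0; i < data.Length; i++)
            data[i] = random.Next();

        var sortWatch = Stopwatch.StartNew();
        BubbleSort(data);                                 // very inefficient O(n^2) sort
        sortWatch.Stop();

        var networkWatch = Stopwatch.StartNew();
        using (var client = new WebClient())
        {
            client.DownloadString("http://example.com/"); // DNS, connect, request, response
        }
        networkWatch.Stop();

        Console.WriteLine("Inefficient sort:   {0:F2} ms", sortWatch.Elapsed.TotalMilliseconds);
        Console.WriteLine("Network round trip: {0:F2} ms", networkWatch.Elapsed.TotalMilliseconds);
    }

    static void BubbleSort(int[] values)
    {
        for (int i = 0; i < values.Length - 1; i++)
        {
            for (int j = 0; j < values.Length - 1 - i; j++)
            {
                if (values[j] > values[j + 1])
                {
                    int temp = values[j];
                    values[j] = values[j + 1];
                    values[j + 1] = temp;
                }
            }
        }
    }
}
```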

It’s all Relative

Finally, when you look at a section of code you’ll tend to think in terms of the percentage you could improve that area, but you can’t mentally put that in the context of the total runtime of the application. Perhaps that filter routine could be made 30% faster by having it check for some common boundary cases first, but if the filter is 0.1% of the total runtime of your application, that’s a 0.03% overall gain - wasted time, and possibly very costly because you’ll need to test each of those optimizations.
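The arithmetic is simple enough to sketch: the overall gain is roughly the local improvement scaled by that code’s share of total runtime. This throwaway example just runs the numbers used in this article.

```csharp
// The overall gain from a local optimization is roughly the local improvement
// scaled by that code's share of total runtime.
using System;

class OverallImpact
{
    // fractionOfRuntime: share of total runtime spent in the optimized code (0..1)
    // localImprovement:  how much faster that code got (0.30 = 30% faster)
    static double Overall(double fractionOfRuntime, double localImprovement)
    {
        return fractionOfRuntime * localImprovement;
    }

    static void Main()
    {
        // A filter made 30% faster that is only 0.1% of runtime: 0.03% overall.
        Console.WriteLine("{0:P2}", Overall(0.001, 0.30));
        // Code made 80% faster that is only 5% of runtime: 4% overall.
        Console.WriteLine("{0:P2}", Overall(0.05, 0.80));
    }
}
```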

With Gibraltar Analyst, we had a few counterintuitive optimizations that made a big difference:

  • With smaller data views, the time it takes to rebind the grid views we’re using is relatively large (even though it’s just one line of code), so it was worth comparing updated datasets with the displayed dataset to make sure there were actual changes before rebinding (see the sketch after this list).
  • We are calculating some joins, tops, and filters manually in memory instead of running more optimized queries, because it lets us do one database query and use the results in many places - which is particularly fast on newer multi-core computers where we can process the variations in parallel.
  • We were going to some lengths to avoid reading file data unnecessarily. It turns out that even on a basic laptop, disk performance for small files is not bad - slower than most things, but much faster than it used to be. When we guessed wrong, users were refreshing the entire display to force it to read the data, which was a far more expensive operation. We were better off just reading the files each time.
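Here’s a hypothetical sketch of that first optimization - checking for changes before rebinding. The types and names are illustrative (assuming a Windows Forms DataGridView), not Gibraltar’s actual code.

```csharp
// Hypothetical sketch of "check before rebinding": comparing the new data to
// what is already displayed is cheap next to rebinding and repainting a grid,
// so we only rebind when something actually changed.
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

public static class GridBinding
{
    public static void RebindIfChanged<T>(DataGridView grid, IList<T> displayed, IList<T> updated)
    {
        // Cheap in-memory comparison...
        if (displayed != null && updated != null && displayed.SequenceEqual(updated))
            return; // nothing changed: skip the expensive rebind and repaint

        // ...versus the expensive one-liner that rebuilds and redraws the whole grid.
        grid.DataSource = updated == null ? null : new List<T>(updated);
    }
}
```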

Interestingly, as a team we’d marked several areas ahead of time that we were sure would need optimization later. When we profiled the application in use, none of the marked areas rose above the noise - in fact, in total they accounted for less than 5% of the application’s processor utilization. Even with an 80% performance improvement in those areas, the user would only see a 4% change in the application, well under the threshold of perception.

And make it look fast

The final odd thing we ran into is something akin to what Microsoft discovered with Vista: people have poor mental stopwatches. After a range of improvements that we knew objectively made our startup time and other key operations significantly faster, we got feedback from our user community that they thought the app was actually slower.

When we dug into this, we discovered that users weren’t treating an operation as finished when the wait cursor went away and they could interact with the application, but rather when all visible changes were complete.

We’d improved performance by pushing more work into asynchronous events, but when these completed they triggered minor user interface recalculations that caused very small shifts in small parts of the screen (as it recalculated the correct size of the data displays). Users had learned to read this small visual twitch as the application being busy, and were sure they needed to wait until they saw all of it stop.

In this case, we did two things: we added checks at the end of these routines to make sure the data had actually changed before allowing the size calculation, which eliminated the visual twitches (at the expense of some processor time - but remember that comparing data is much faster than moving pixels around on screen), and we moved many things back into the foreground to make sure they completed as quickly as possible.
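As a sketch of the first fix - and again, this is illustrative rather than Gibraltar’s actual code - the idea is to let an asynchronous completion touch the layout only when the data (and therefore the computed size) has actually changed.

```csharp
// Hypothetical sketch: when a background refresh completes, only touch the
// layout if the computed size actually changed. Types and names are
// illustrative, not Gibraltar's real code.
using System.Windows.Forms;

public class DataPanelController
{
    private readonly Control _dataPanel;
    private int _lastRowCount = -1;

    public DataPanelController(Control dataPanel)
    {
        _dataPanel = dataPanel;
    }

    // Called on the UI thread when an asynchronous refresh finishes.
    public void OnRefreshCompleted(int newRowCount, int rowHeight)
    {
        if (newRowCount == _lastRowCount)
            return; // same data, same size: no resize, no visual twitch

        _lastRowCount = newRowCount;
        _dataPanel.Height = newRowCount * rowHeight; // one deliberate layout change
    }
}
```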

So in the end, for systems that interact with users it isn’t about achieving paper performance but about doing what feels fastest to the end users. Often they’re the same thing, but the only way to know for sure is with user experience testing.
