Optimization, part 1

All timestamps are based on your local time of:

Posted by: stak
Posted on: 2008-03-30 10:57:28

Having to write software for an embedded platform with limited resources means I have to spend a good chunk of time optimizing code. After thinking about it for a while, I've noticed that performance optimizations usually fall into three categories, two of which are good and one of which is bad.

The first good kind of optimization is using or coming up with more efficient data structures and algorithms. If you're doing something which takes O(n^2) time, and you reduce that to O(n) using a different algorithm, that's obviously good. Similarly, using a custom-built data structure instead of patching together one from pre-built hashtables/lists/arrays/etc. can also change the runtime of your algorithm by orders of magnitude. All software developers are probably aware of this kind of optimization, even if they don't necessarily use it. It may require a lot of thought to actually come up with better algorithms and data structures, and in some cases there may be none, but still, this is one well-known avenue for optimization so I'm not going to dwell on it.

When the above optimization doesn't result in enough performance gains, most software developers (and I'm guilty of this too) will start resorting to the bad kind of optimization. The bad kind of optimization is the one where you start "packing" the code and taking shortcuts. One example is changing private instance variables to be more visible (protected, package, public in Java, may vary for your language of choice) and accessing them directly rather than using an accessor method. Another example is (in C++) changing a virtual method to be non-virtual to avoid the vtable lookup when invoking it. Optimizations like these can have an beneficial impact on performance, specially if those methods are invoked lots of times in an inner loop somewhere, but they also have a detrimental impact on code maintainability. (Note: I've seen both of these examples used in practice and can attest to the fact that they did in fact help performance noticeably).

The reason I called this method of optimization "packing" is because I find it conceptually similar to compression. If you take a text file and zip it, you still have the same data, but in a packed format. It is possible to edit the zip file in a binary editor such that you change that data, but it is much harder to do. The same applies to the code - after you do packing optimizations, the code is that much harder to debug and edit. If you need to change a member variable that is directly accessed from a dozen classes, those dozen classes will have to change too, just like how editing a line of text in the zipped file will require you to adjust everything following it, as well as any checksums and header info. It's possible, but it's incredibly error-prone and is always turns out to be a bad idea in the long run.

When Hoare said that premature optimization is the root of all evil, I believe these are the kinds of "small efficiencies" he was talking about. The packing optimizations are acceptable if that code never needs to be modified again. But if it does, then such optimizations are premature and yes, they are the root of all evil. And really, when does code get written that never needs to be modified again? I don't think I've ever come across any code so perfect that it doesn't ever need to change.

The third kind of optimization is the one that requires the most work, but also provides the most benefit. It's the one that most developers don't even consider simply because it's so much work. But if you're really serious about squeezing every ounce of performance out of the code, as well as keeping it maintainable for the long run, it's the only solution that I can think of. This category of optimization involves going down a layer in the software stack. Go down to the code that's running your code, and make it run your code faster. Make the compiler smarter. Make the VM smarter. Make it detect those cases where the vtable lookup isn't needed and optimize them out. Why should you have to do it manually at the expense of code maintainability when the compiler can do it for you? That's what it's there for, after all. The more intelligence you add to those lower layers, the faster your code will run, and without the need for ugly hacks.

The benefits of this approach are obviously many. It speeds up not just your code, but anything else running on top of it too. The techniques you use will likely be reusable and portable in other places. It's also different from the first two kinds of optimization in that you can think of it as simply adding features to the compiler rather than "optimization" in the traditional sense. But it does have an optimizing effect. Going back to the analogy of the zipped text file, the packing method is similar to manually zipping the text file one byte at a time, whereas a smarter compiler is like having a compression engine to do it for you. Since the compression is automated, it's easily repeatable, and far more useful. You can change the original text file/code all you want, and then just re-run the compression engine/compiler on it again to regenerate the binary.

Most of the opposition to using this third method boils down to "it's not worth it". I think that in 99% of the cases, it is worth it. Code that's worth optimizing to this extent will likely be around for a very very long time (just ask those people getting paid millions of dollars to keep 30-year-old mainframes going) and if you add up the benefits over the entire life of the software, it's well worth it. But more on that in another post :)

Posted by Fai at 2008-03-31 23:39:43

I agree with the analysis on the first and second methods, but I agree on the third. Optimizing the compiler doesn't mean "make it better"; it means making a certain aspect of it better at the cost of another. The optimizations would make your program run faster, but will likely make other programs run slower, and possibly even your own program, if it's changed significantly. The compiler as it is today is making bets on what the average case is. If you skew it towards your less common case, the average case will suffer.
As our perf prof used to say, you can't just optimize. you have to optimize something.

[ Reply to this ]

Posted by stak at 2008-04-02 07:54:21

There are certain kinds of optimizations that will always have a tradeoff, yes. If you decide to cache a variable rather than recalculate it every time, that's always going to increase memory and decrease runtime. Those are the kind of optimizations that belong in groups 1 and 2, but those aren't the optimizations I'm talking about in group 3.

Every program written in a high-level programming language has an equivalent "perfectly optimized" (for some definition of perfectly optimized that depends on your requirements and the hardware) program in machine code. When we write code in a high-level language, it becomes impossible to represent that minimal program exactly; the high-level language adds much more, which reduces efficiency in favor of maintainability.

What group 3 optimizations attempt to do is bring those two representations closer. A compiler optimization will do things like strip out null checks if it knows the variable at that point would never be null. If you had written that minimal machine-code version of your program by hand, you would have known that the variable would never be null there and wouldn't have put the check in in the first place, so having the compiler strip out the null check brings the final generated code closer to that ideal program without any real tradeoffs that need to be dealt with.

[ Reply to this ]

[ Add a new comment ]