Thursday, December 10, 2015

The awesomeness that is ClangFormat

While working on the Linux project at Valve we started using this code formatting tool from the LLVM folks:

Coding StandardsMike Sartain (now at Rad) recommended we try it. ClangFormat is extremely configurable, and can handle more or less any convention you could want. It's a top notch tool. Here's vogl's .clang-format configuration file to see how easy it is to configure.

One day on the vogl GL debugger project, I finally got tired of trying to enforce or even worry about the lowest level bits of coding conventions. Things like if there should be a space before/after pointers, space after commas, bracket styles, switch styles, etc.

I find codebases with too much style diversity increase the mental tax of parsing code, especially for newcomers. Even worse, for both newcomers and old timers: you need to switch styles on the fly as you're modifying code spread throughout various files (to ensure the new code fits in to its local neighbor). After some time in such a codebase, you do eventually build a mental model and internalize each local neighborhood's little style oddities. This adds no real value I can think of, it only subtracts as far as I can tell.

I have grown immune to most styles now, but switching between X different styles to write locally conforming code seems like an inefficient use of programmer time.

In codebases like this, trying to tell newcomers to use a single nice and neat company sanctioned style convention isn't practically achievable. A codebase's style diversity tends to increase over time unless you stop it early. Your codebase is stuck in a style vortex.

So sometimes it makes sense to just blow away the current borked style and do a drive-by ClangFormat on all files and check them back in. Of course this can make diff'ing and 3-way merging harder, but after a while that issue mostly becomes moot as churn occurs. It's a traumatic thing, yes, but doable.

Next, you can integrate ClangFormat into your checkin process, or into the editor. Require all submitted code to be first passed through ClangFormat. Detecting divergence can be part of the push request process, or something.

On new codebases, be sure to figure this out early or you'll get stuck in the style diversity vortex.

Microsoft's old QuickBASIC product from the 80's would auto-format to a standard style as lines were being entered and edited. 

Perhaps in the far future (or a current alternate universe), the local coding style can just be an editor, diff-time, and grep-time only thing. Let's ditch text completely and switch to some sort of parse tree or AST that also preserves human things like comments.

Then, the editor just loads and converts to text, you edit there, and it converts back to binary for saving and checkin. With this approach, it should be possible to switch to different code views to visualize and edit the code in different ways. (I'm sure this has all been thought of 100 times already, and it ended in tears each time it was tried.)

As a plus, once the code has been pre-parsed perhaps builds can go a little faster.

Anyhow, here we are in 2015. We individually have personal super computers, and yet we're still editing text on glass TTY's (or their LCD equivalents). I now have more RAM than the hard drives I owned until 2000 or so. Hardware technology has improved massively, but software technology hasn't caught up yet.

No comments:

Post a Comment