2005-06-15

On beyond Makefiles

Makefiles are ugly technology, but we're mostly stuck with them for now. People often misunderstand and misuse Makefile dependency declarations. For example, if foo.o is the compiled form of foo.c, which in turn includes foo.h, people wind up writing something like this:

foo.o: foo.c
foo.c: foo.h
        touch foo.c

But that's Just Wrong. It's not true that if you change foo.h you must rebuild foo.c; indeed, foo.c is maintained by hand and can't be "rebuilt". The Right Thing is of course:

foo.o: foo.c foo.h

GCC can generate rules like this automatically with the -M switch, and there is a makedepend program packaged with the X Window System that slurps up preprocessor output and figures out the dependencies from that. But people have to remember to rerun these tools and fold the results back into their Makefiles whenever the includes change, and careless edits to the Makefile can wreck the generated rules.
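To make concrete what a rule like "foo.o: foo.c foo.h" asks of the build tool, here is a minimal Python sketch that parses one such rule and applies the usual "older than any prerequisite" test. The names parse_rule and is_stale are hypothetical, and the parser is deliberately naive: it handles a single target and backslash-newline continuations, and ignores escaped spaces and paths containing colons. It illustrates the idea only; it is not how make is implemented.

```python
import os


def parse_rule(text):
    """Parse a make-style rule such as 'foo.o: foo.c foo.h'.

    Backslash-newline continuations, which dependency generators emit
    for long prerequisite lists, are joined first.
    Returns (target, [prerequisites]).
    """
    joined = text.replace("\\\n", " ")
    target, _, deps = joined.partition(":")
    return target.strip(), deps.split()


def is_stale(target, deps, mtime=os.path.getmtime):
    """Return True if target is missing or older than any prerequisite.

    mtime is injectable so the logic can be exercised without real files.
    """
    try:
        built = mtime(target)
    except OSError:
        return True  # no object file yet, so it must be built
    return any(mtime(dep) > built for dep in deps)
```

With the correct rule, a change to foo.h makes foo.o stale directly; nothing needs to pretend that foo.c was rebuilt.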

Furthermore, there are ugly problems with programs that have multiple source directories; people tend to write one Makefile per directory, and when there are cross-directory dependencies, things can go very wrong indeed. See Peter Miller's excellent paper Recursive Make Considered Harmful.

Fortunately, there are other possibilities out there. The Ada language imposes requirements that object files always be consistent with all the source files they depend on (at least two and frequently more). Classical Ada translators manage this with an "Ada software library", which keeps the programmer out of the frying pan -- and into the fire. But the implementors of GNAT (the GNU Ada Translator) have evolved an excellent general solution.

(I am not advocating a wholesale rewrite of the world's code in Ada! Though that might not be such a bad long-term idea in a few cases: Ada's design point is stand-alone high-reliability embedded programs, which is what an operating system kernel really is. The downside is the scarcity of Ada programmers relative to C programmers, of course.)

The GNAT approach is to maintain a parallel file for each .o file, called the .ali (Ada Library Information) file. The .ali file is conceptually part of the .o file, and is not physically incorporated into it only because GNAT has to handle a.out and other inflexible object formats. Every successful GNAT compilation run produces both an .o and an .ali file.

An .ali file contains the pathnames of all the source files read by GNAT to produce the corresponding .o file. It also contains the last modification times of those source files. An .o file is considered up-to-date if there is a corresponding .ali file and all the source files mentioned in it have the same timestamps as the actual sources. (If a particular source cannot be found, that may or may not be an error, depending on switch settings.)
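The up-to-date test described above can be sketched in a few lines of Python. This is only an illustration of the idea, under loud assumptions: the sidecar here is a made-up JSON file named with a ".deps.json" suffix, not the real .ali format, and write_sidecar and is_up_to_date are hypothetical names. A missing source is simply treated as "out of date", whereas GNAT makes that depend on switch settings.

```python
import json
import os

SIDECAR_SUFFIX = ".deps.json"  # hypothetical; GNAT's real sidecar is the .ali file


def write_sidecar(obj_path, sources):
    """Record each source path and its modification time beside obj_path."""
    record = {src: os.stat(src).st_mtime_ns for src in sources}
    with open(obj_path + SIDECAR_SUFFIX, "w") as f:
        json.dump(record, f)


def is_up_to_date(obj_path):
    """True if the object and its sidecar exist and every recorded
    timestamp is identical to the source's current timestamp."""
    try:
        with open(obj_path + SIDECAR_SUFFIX) as f:
            record = json.load(f)
    except (OSError, ValueError):
        return False  # no sidecar, or an unreadable one
    if not os.path.exists(obj_path):
        return False
    for src, recorded in record.items():
        try:
            if os.stat(src).st_mtime_ns != recorded:
                return False
        except OSError:
            return False  # missing source: treated as stale in this sketch
    return True
```

Note that the comparison is for equality, not "newer than". A source file replaced by an older copy still invalidates the object, which make's newer-than comparison would miss.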

Otherwise, the .o file is recompiled by compiling the Ada source with the same name, and the regenerated or newly generated .ali file is then examined to determine what to do next. Make-ing in GNAT therefore requires no error-prone Makefiles; just say gnatmake vmlinux :-) and everything needed will be compiled, pre-linked ("bound" in Ada jargon), and linked.

For more detail on the tao of GNAT compilation, read The GNAT Compilation Model, part of the GNAT documentation. There's an excellent paper of the same name by Robert Dewar, but it seems the only online copy is behind the ACM's content firewall.
