Makefile tricks
GNU make is an extremely powerful program, and can be used to automate the building and testing of software. The only problem with using it is that makefile syntax is rather cryptic, and debugging complex makefiles can be difficult. If a dependency relationship is incorrect, then a file may not be rebuilt when it needs to be. This can cause intermittent bugs as inconsistent interfaces cause undefined behavior. Another possibility is rebuilding too much, which slows down the build-test development cycle. What we would like is a perfect dependency graph that allows make to rebuild exactly what is needed and no more.
Fortunately, it is possible to automate the production of dependencies within a makefile. This is possible due to the gcc -M flag. This outputs dependency information in the form make requires for the given source file. By collating these outputs for every source file, we may automatically construct the dependency graph.
Unfortunately, this isn't as simple as it appears. One could, of course, call the compiler to get the dependency information on every invocation of make. This is obviously not ideal, as most of the time a file isn't modified, and so its dependency information is still up to date. By sending the dependency output to separate files, we can then use make's own dependency tracking to determine when to update these files. The GNU make manual explains how this can be done by using the following rule:
%.d: %.c
	$(SHELL) -ec '$(CC) -M $(CPPFLAGS) $< | sed '\''s/$*\.o/& $@/g'\'' > $@'
This pipes the output through the sed program to change
foo.o : foo.c foo.h
into:
foo.o foo.d : foo.c foo.h
It does this by searching for the .o part of the target, and then rewriting the target to include the output dependency file name, which is within the $@ makefile variable.
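The rewrite can be tried outside of make. The following is a minimal sketch: the helper name add_dep_target is hypothetical, and the canned line stands in for real gcc -M output.

```shell
#!/bin/sh
# add_dep_target STEM DEPFILE  (hypothetical helper name)
# Filters compiler dependency output on stdin, adding DEPFILE as an
# extra target after STEM.o, in the same way as the sed command in the rule.
add_dep_target() {
    sed "s/$1\.o/& $2/g"
}

# Canned input stands in for the output of `gcc -M foo.c`.
printf 'foo.o : foo.c foo.h\n' | add_dep_target foo foo.d
```

Within the makefile rule, $* supplies the stem and $@ supplies the dependency file name.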
The extra target means that the *.d dependency file will be rebuilt if the source file, or any of its dependencies, are altered. This is perfect: by including all of the *.d files together, we no longer have to edit the makefile whenever header interfaces are refactored. The only requirement is to add new source files and target programs when needed, which is much less work than before.
The above isn't the last word on dependency calculation though. The first problem is that it includes all dependencies, even those on system headers. This can be fixed by using the -MM gcc flag instead of -M. (This flag is non-portable though, and may not be implemented by other compilers, whereas the -M flag is quite standard.)
The second problem is slightly annoying. When a header is deleted, the dependency files may still mention it as a prerequisite. Since make doesn't know how to make the header file, it will stop with a warning or error until you delete the broken dependency files. To fix this, all we need to do is wrap the prerequisite list with a $(wildcard ) function call. This will filter out all missing files, preventing the error. The sed script needs to change to:
sed -n "H;$$ {g;s@.*:\(.*\)@$*.o $@: \$$\(wildcard\1\)@;p}"
This sed script first collects the whole input into the hold space via the H command. On the last line of input, the $ address triggers the {} block. (Note that the dollar sign needs to be doubled because the script runs within make.) The block copies the hold space into the pattern space with g, and searches it for the colon. It wraps the part to the right of the colon with the function call, and creates the correct targets on the left. Note that we use '@' as a separator instead of a slash because slashes within filenames would confuse sed. The reason the whole input needs to be read is that gcc can output multi-line dependency information, all of which needs to be wrapped by the $(wildcard) function call.
The third problem is that the above method doesn't work for whole-program compilation. This is where all the *.c files in a program are listed on the compiler's command line, and are compiled together to produce the result. This allows inter-file optimization, producing faster programs and libraries. Since the object files are not used, the dependency graph is wrong, and thus the above needs to be modified.
The simplest way to do this is to also record the dependency information within variables. We may then use a function to calculate what is required from the variables, depending on which sources the target is compiled from. Thus the sed script needs to change once more:
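To see the effect on multi-line input, the script can be run by hand. This is only a sketch: wrap_wildcard is a hypothetical helper name, the input is canned rather than real compiler output, and GNU sed is assumed. Outside of make the dollar signs are escaped for the shell instead of being doubled.

```shell
#!/bin/sh
# wrap_wildcard STEM DEPFILE  (hypothetical helper name)
# Collects all of stdin in the hold space, then rewrites it on the last
# line so that the prerequisites are wrapped in $(wildcard ...).
wrap_wildcard() {
    sed -n "H;\$ {g;s@.*:\(.*\)@$1.o $2: \$(wildcard\1)@;p}"
}

# Canned two-line dependency output, using the backslash continuation
# that gcc emits for long prerequisite lists.
printf 'foo.o: foo.c foo.h \\\n bar.h\n' | wrap_wildcard foo foo.d
```

The continuation backslash ends up inside the $(wildcard ...) call, where make treats it as an ordinary line continuation.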
sed -n "H;$$ {g;s@.*:\(.*\)@$< := \$$\(wildcard\1\)\n$*.o $@: $$\($<\)@;p}"
This version uses a trick of setting a variable with the same name as the source file to a list of its dependencies. i.e. we end up with something like:
foo.c := $(wildcard foo.c foo.h)
foo.o foo.c.d: $(foo.c)
Where we may use $(foo.c) to obtain the dependencies of foo.c when required. Note how we have used the .c.d extension instead of the more obvious .d extension. This is done because there may be multiple files with the same root used to create an executable. For example:
g++ foo.c foo.S foo.cpp -o foo
Is a perfectly reasonable command line, creating the program foo from a C file, an assembly file that needs preprocessing, and a C++ file. Thus multiple pattern rules are required. The nicest way to write this may be to use a macro:
create_d = $(SHELL) -ec 'gcc -MM $(CPPFLAGS) $< | sed -n "H;$$ {g;s@.*:\(.*\)@$< := \$$\(wildcard\1\)\n$*.o $@: $$\($<\)@;p}" > $@'
%.S.d: %.S
	$(create_d)
%.c.d: %.c
	$(create_d)
%.cpp.d: %.cpp
	$(create_d)
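The text that such a rule writes into a dependency file can be previewed without a compiler. In this sketch the helper name record_deps and the canned input are assumptions, and GNU sed is required because the replacement uses \n to produce a newline.

```shell
#!/bin/sh
# record_deps SRC STEM DEPFILE  (hypothetical helper name)
# Emits two makefile lines: a variable named after SRC holding its
# prerequisites, and a rule that makes STEM.o and DEPFILE depend on them.
record_deps() {
    sed -n "H;\$ {g;s@.*:\(.*\)@$1 := \$(wildcard\1)\n$2.o $3: \$($1)@;p}"
}

# Canned input stands in for the output of `gcc -MM foo.cpp`.
printf 'foo.o: foo.cpp foo.h\n' | record_deps foo.cpp foo foo.cpp.d
```

Because the variable and the dependency file are both named after the full source file name, foo.c, foo.S and foo.cpp each get their own entry.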
Finally, how do we use the dependency information we now have stashed away in variables? The trick here is constructing a make macro-function to expand the list of program sources into program dependencies. A set of macros that does this is:
dependless = %.o %.a %.d %.h
expand = $($(var)) $(var) $(var).d
depend_test = $(if $(filter $(dependless),$(var)),$(var),$(expand))
depend = $(sort $(foreach var,$(1),$(depend_test)))
With these, we may use the target:prerequisite pair
$(PROGRAM): $(call depend,$(SRCS))
To automatically calculate the exact dependencies required for our program to be built. The only problem remaining is that the prerequisite list no longer contains just source files. We cannot use the helpful $^ automatic makefile variable in the compilation command line as it will drag in extra things we do not want. To fix this, another macro is required:
& = $(filter-out %.h %.d,$^)
This introduces the magic variable $&, which will act exactly as $^ would have done. Note that since its name is one character long, we do not have to use the syntax $(&), and can use the nice short representation.
Now only one thing remains: we need to find the best way of including the dependency files. The obvious way of doing this is enumerating all of the source files, and then using pattern substitution to get the list of dependency files. This suffers from the problem that there may not be such a list in a complex makefile, with multiple programs being built. It turns out that such a list isn't required. The simple wildcard-include works:
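The macros can be exercised in a scratch directory to see what they expand to. The sketch below assumes GNU make is installed; the function name demo_depend, the empty stand-in files, and the hand-written foo.c variable (normally defined by an included foo.c.d) are all illustrative.

```shell
#!/bin/sh
# demo_depend (hypothetical name): expand the depend macros and $& in a
# scratch directory and print what make computes.
demo_depend() {
    dir=$(mktemp -d) || return 1
    # Empty stand-ins for the real sources and the generated foo.c.d.
    touch "$dir/foo.c" "$dir/foo.h" "$dir/foo.c.d" "$dir/bar.o"
    cat > "$dir/Makefile" <<'EOF'
# Normally set by including foo.c.d; written by hand for this sketch.
foo.c := foo.c foo.h

dependless = %.o %.a %.d %.h
expand = $($(var)) $(var) $(var).d
depend_test = $(if $(filter $(dependless),$(var)),$(var),$(expand))
depend = $(sort $(foreach var,$(1),$(depend_test)))
& = $(filter-out %.h %.d,$^)

$(info prerequisites: $(call depend,foo.c bar.o))
prog: $(call depend,foo.c bar.o) ; @echo "compile: $&"
EOF
    (cd "$dir" && make -s prog)
    status=$?
    rm -rf "$dir"
    return $status
}

demo_depend
```

The first line of output lists everything the program depends on, including the header and the dependency file. The second shows that $& keeps only what belongs on the compiler command line.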
include $(wildcard *.d)
This misses the dependencies for source files that haven't been built yet. However, if a source file hasn't been built, then we know it is out of date, and don't really care about its exact dependency information. Thus this quick trick really does work.
More complex build systems have multiple directories of source code. For many years, people have used recursive build systems, where sub-makes are spawned in the subdirectories, and so on down the directory tree. The problem with this technique is that the dependency graph is broken into pieces. The missing dependencies mean that make does not do its job properly, and either too much or too little is rebuilt. Another problem is that parallel builds are less efficient, since each sub-make only sees its own piece of the graph and jobs cannot be scheduled across directories.
The solution to this is to use a non-recursive build system. One way to do this that obviously doesn't scale is to use a single monolithic makefile in the root directory. A better method is to have the makefiles for the subdirectories included into the main makefile somehow via the include directive. The problem there is that the paths in the included makefiles are relative to the root directory. Thus all the files need to have the path prefixed. This can be a major pain to do.
Fortunately, the path calculation and recursive inclusion can be automated. We define the following macros:
inc_before := mak_before.mk
inc_after := mak_after.mk
rest = $(wordlist 2,$(words $(1)),$(1))
get_dirlist = $(inc_before) $~$(firstword $(1))/makefile.inc $(wildcard $~$(firstword $(1))/*.d) $(inc_after)\
$(if $(rest),$(call get_dirlist,$(rest)),)
scan_subdirs = $(eval dirlist:=$(SUBDIRS) $(dirlist)) $(call get_dirlist,$(SUBDIRS))
This allows one to use the syntax:
SUBDIRS := dir1 dir2 dir3
include $(scan_subdirs)
To include the makefiles (called makefile.inc) within each of the directories dir1, dir2 and dir3. The above also makes sure to correctly include the *.d dependency files. However, what are mak_before.mk and mak_after.mk? These are files used to keep track of the directory information. The source for mak_before.mk is:
# Called before including a sub directory makefile
# Save directory name
dstackp := $(dstackp).X
dstack$(dstackp) := $~
# Build directory name
~ := $~$(firstword $(dirlist))/
# Remove current directory from list
dirlist := $(call rest,$(dirlist))
This file records the previous path, and then sets the $~ variable to be the path for the current directory. It obtains the current path via the $(dirlist) variable which is itself updated in the $(scan_subdirs) macro. Finally, the mak_after.mk file is:
# Called after including a sub directory makefile
# Add defaults to clean files
cleanfiles += $(addprefix $~,$(clean_default))
# Restore directory name
~ := $(dstack$(dstackp))
dstackp := $(basename $(dstackp))
This resets the path variable, and also updates the list of files to be cleaned. An example list of default files to be removed is
clean_default = *.o *.a *.so *.so.* *.d
This will be expanded in every source directory and saved into the $(cleanfiles) variable for later usage with the clean: target in the main makefile.
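The whole mechanism can be watched in action with a throwaway directory tree. This is a sketch that assumes GNU make; the function name demo_subdirs, the directory names, and the $(info) lines in each makefile.inc are illustrative, while the macros and the two helper files are copied from above without their comments.

```shell
#!/bin/sh
# demo_subdirs (hypothetical name): build a scratch tree with nested
# makefile.inc files and print the value of $~ seen inside each one.
demo_subdirs() {
    dir=$(mktemp -d) || return 1
    mkdir -p "$dir/dir1/sub" "$dir/dir2"
    cat > "$dir/mak_before.mk" <<'EOF'
dstackp := $(dstackp).X
dstack$(dstackp) := $~
~ := $~$(firstword $(dirlist))/
dirlist := $(call rest,$(dirlist))
EOF
    cat > "$dir/mak_after.mk" <<'EOF'
cleanfiles += $(addprefix $~,$(clean_default))
~ := $(dstack$(dstackp))
dstackp := $(basename $(dstackp))
EOF
    cat > "$dir/Makefile" <<'EOF'
inc_before := mak_before.mk
inc_after := mak_after.mk
rest = $(wordlist 2,$(words $(1)),$(1))
get_dirlist = $(inc_before) $~$(firstword $(1))/makefile.inc $(wildcard $~$(firstword $(1))/*.d) $(inc_after)\
 $(if $(rest),$(call get_dirlist,$(rest)),)
scan_subdirs = $(eval dirlist:=$(SUBDIRS) $(dirlist)) $(call get_dirlist,$(SUBDIRS))
clean_default = *.o *.d
cleanfiles :=
SUBDIRS := dir1 dir2
include $(scan_subdirs)
$(info cleanfiles: $(strip $(cleanfiles)))
all: ; @:
EOF
    # dir1 has a nested subdirectory; dir2 is a leaf.
    printf '%s\n' '$(info in dir1: $~)' 'SUBDIRS := sub' \
        'include $(scan_subdirs)' '$(info back in dir1: $~)' \
        > "$dir/dir1/makefile.inc"
    printf '%s\n' '$(info in sub: $~)' > "$dir/dir1/sub/makefile.inc"
    printf '%s\n' '$(info in dir2: $~)' > "$dir/dir2/makefile.inc"
    (cd "$dir" && make -s)
    status=$?
    rm -rf "$dir"
    return $status
}

demo_subdirs
```

Each $(info) line reports the path that was current while that makefile.inc was being read, and the last line shows the clean list accumulated by mak_after.mk.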
The above will make the $~ variable contain the path relative to the root makefile. This allows the sub-makefiles to include rules like:
$~foo_prog:$(call depend,$~foo.c)
which will be expanded into something like
directory/path/to/foo_prog:$(call depend,directory/path/to/foo.c)
Thus a large amount of error-prone typing may be avoided. However, there is one problem. make processes a makefile in two phases. It is only in the second phase, when the recipes are run, that variables within shell commands are expanded. Thus if you use $~ within a compiler command line, then it will use the last definition for it. Since the commands are evaluated at "top level", this will result in an empty path, and thus an empty variable. This is typically not what is desired.
Fortunately, there is a work-around for this problem. Typically, the target resides within the directory of the included sub-makefile, thus one may use the $(@D) variable with a trailing slash instead. To prevent misuse of the $~ variable, I suggest setting it to some invalid value after including the subdirectories in the root makefile. i.e.
SUBDIRS := some_dir some_other_dir
include $(scan_subdirs)
~:= tilde is broken in commands
Since it is unlikely you have a target or source called "tilde" or "is", this should flag an error.
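The difference between the two phases is easy to reproduce. The following sketch assumes GNU make; the function name demo_two_phase and the target name are illustrative, and $~ is set by hand in place of mak_before.mk and mak_after.mk.

```shell
#!/bin/sh
# demo_two_phase (hypothetical name): show that $~ in a target name is
# expanded while reading, but $~ in a recipe is expanded when it runs.
demo_two_phase() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/Makefile" <<'EOF'
# What mak_before.mk would set while dir1/makefile.inc is being read.
~ := dir1/
$(info while reading: [$~])
# The target name is expanded immediately; the recipe is expanded later.
$~prog: ; @echo "in recipe: [$~] but target dir is [$(@D)/]"
# What mak_after.mk would restore once dir1 has been processed.
~ :=
EOF
    (cd "$dir" && make -s dir1/prog)
    status=$?
    rm -rf "$dir"
    return $status
}

demo_two_phase
```

The recipe sees an empty $~, while $(@D) with a trailing slash still yields the directory of the target.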
A similar problem can exist with variables updated by the += construct. This is often used to update a list of targets within sub-makefiles, so that invocation by the root makefile does what is wanted. (An example usage is the $(cleanfiles) variable above.) The problem is that if the variable is first defined with '=' instead of ':=', then the $~ evaluation will wait until after the first makefile pass. The wrong value will be used, and wrong results will be obtained. To fix this, simply use ':=' to initialize the appended-to variable in the root makefile.
The above tricks should hopefully remove much of the complexity from using a non-recursive build system based on make, and hide it within macros. The individual sub-makefiles may then be quite simple, and only list sources and targets. The dependencies can be calculated as required by the automation. Simple makefiles are much easier to maintain, and keep up to date. They also make debugging dependency problems easier. No longer will you have to worry about building too much or too little.
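A small experiment shows the two flavors side by side. This is a sketch assuming GNU make; the function name demo_append and the variable names lazy and eager are hypothetical, and $~ is again set by hand.

```shell
#!/bin/sh
# demo_append (hypothetical name): compare += on a variable first
# defined with '=' against one first defined with ':='.
demo_append() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/Makefile" <<'EOF'
lazy =
eager :=
# While reading dir1/makefile.inc:
~ := dir1/
lazy += $~foo.o
eager += $~foo.o
# While reading dir2/makefile.inc:
~ := dir2/
lazy += $~bar.o
eager += $~bar.o
# Back at the top level:
~ :=
$(info lazy: [$(strip $(lazy))])
$(info eager: [$(strip $(eager))])
all: ; @:
EOF
    (cd "$dir" && make -s)
    status=$?
    rm -rf "$dir"
    return $status
}

demo_append
```

Only the variable initialized with ':=' keeps the directory prefixes, because its appended text is expanded at the moment of each +=.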
Comments
Daz said...Amazing.
Can you put up a tarball with all the code and a README (or this HTML file) somewhere?
Can't we use something elegant in the year 2010?
The root makefile there also shows a demo use of autoconf with this build system (although none of the source actually uses it.)
$(LIB): $(call depend,$(SRCS))
	$(CC) $& $(CFLAGS) $(LDFLAGS) $(WFLAGS) -shared -fpic -Wl,-soname,$(LIBSONAME) -o $@ $(LIBS)
sed filter doesn't handle files in other directories. This works better:
$(CXX) -MM $(CPPFLAGS) $< | sed 's,\($*\)\.o[ :]*,\1.o $@ : ,g' > $@
This tutorial is awesome!
Keep going...
One needs to explicitly checkout the files to edit/change.
I am not sure if a makefile can use the checked-out file list to compute the dependencies and build in minimal time, assuming that only a fraction of the source code changes in code-build-test iterations most of the time.
I don't think current makefiles use such an optimization.
Sriram
assume your makefile defines ...
- src_files as a list of source files, e.g. file1.cxx here/file2.cpp there/file3.cc
- build_dir for *.o and *.d
then try this snippet from my make tools (ignore $(build_link)):
# Compile objects (one rule per file allows arbitrary suffixes and directories)
define obj_rule
$(1): $(2) | $(obj_dirs) $(build_link)
	$(CXX) -c $$< -o $$@ $(CPPFLAGS) $(CXXFLAGS)
endef
$(foreach src, $(src_files), $(eval $(call obj_rule, $(addprefix $(build_dir)/, $(addsuffix .o, $(basename $(src)))), $(src))))
# Compile dependencies (one rule per file allows arbitrary suffixes and directories)
define dep_rule
$(1): $(2) | $(obj_dirs) $(build_link)
	@set -e
	$(CXX) -MM -MP -MT $(1) -MT $(1) $(CPPFLAGS) $$< | sed 's/.d /.o /' > $$@
endef
$(foreach src, $(src_files), $(eval $(call dep_rule, $(addprefix $(build_dir)/, $(addsuffix .d, $(basename $(src)))), $(src))))
It creates objects and deps in the same subdirs under $(build_dir) as the sources are located :)
Forget the above, it works a bit differently; I needed to post the whole stuff. Sorry.