Thursday, November 22, 2007

Multi-file projects and the GNU Make utility

==============================================================================
C-Scene Issue #2
Multi-file projects and the GNU Make utility
Author: George Foot
Email: george.foot@merton.ox.ac.uk
Occupation: Student at Merton College, Oxford University, England
IRC nick: gfoot
==============================================================================

Disclaimer: The author accepts no liability whatsoever for any
damage this may cause to anything, real, abstract or
virtual, that you may or may not own. Any damage
caused is your responsibility, not mine.

Ownership: The section `Multi-file projects' remains the
property of the author, and is copyright (c) George
Foot May-July 1997. The remaining sections are the
property of CScene and are copyright (c) 1997 by
CScene, all rights reserved. Distribution of this
article, in whole or in part, is subject to the same
conditions as any other CScene article.


0) Introduction
~~~~~~~~~~~~~~~

This article will explain firstly why, when and how to split
your C source code between several files sensibly, and it will
then go on to show you how the GNU Make utility can handle all
your compilation and linking automatically. Users of other make
utilities may still find the information useful, but it may
require some adaptation to work on other utilities. If in doubt,
try it out, but check the manual first.


1) Multi-file projects
~~~~~~~~~~~~~~~~~~~~~~

1.1 Why use them?
-----------------

Firstly, then, why are multi-file projects a good thing?
They appear to complicate things no end, requiring header
files, extern declarations, and meaning you need to search
through more files to find the function you're looking
for.

In fact, though, there are strong reasons to split up
projects. When you modify a line of your code, the
compiler has to recompile everything to create a new
executable. However, if your project is in several files
and you modify one of them, the object files for the other
source files are already on disk, so there's no point in
recompiling them. All you need to do is recompile the file
that was changed, and relink the object files. In a large
project this can mean the difference between a lengthy
(several minutes to several hours) rebuild and a ten or
twenty second adjustment.

With a little organisation, splitting a project between
files can make it much easier to find the piece of code
you are looking for. It's simple - you split the code
between the files based upon what the code does. Then if
you're looking for a routine you know exactly where to
find it.

It is much better to create a library from many object
files than from a single object file. Whether or not this
is a real advantage depends what system you're using, but
when gcc/ld links a library into a program at link time it
tries not to link in unused code. It can only exclude
entire object files from the library at a time, though, so
if you reference any symbols from a particular object file
of a library the whole object file must be linked in. If
the library is very segmented, the resulting executables
can be much smaller than they would be if the library
consisted of a single object file.

Also, since your program is very modular with the minimum
amount of sharing between files there are many other
benefits -- bugs are easier to track down, modules can
often be reused in another project, and last but not
least, other people will find it much easier to understand
what your code is doing.


1.2 When to split up your projects
----------------------------------

It is obvisouly not sensible to split up *everything*;
small programs like `Hello World' can't really be split
anyway since there's nothing to split. Splitting up small
throwaway test programs is pretty pointless too. In
general, though, I split things whenever doing so seems to
improve the layout, development and readability of the
program. This is in fact true most of the time.

The decision about what to split and how is of course
yours; I can only make general suggestions here, which you
may or may not choose to follow.

If you are developing a fairly large project, you should
think before you start how you are going to implement it,
and create several (appropriately named) files initially
to hold your code. Of course, don't hesitate to create new
files later in development, but if you do then you are
changing your mind and should perhaps think about whether
some other structural changes would be appropriate.

For medium-sized projects, you can use the above technique
of course, or you might be able to just start typing, and
split the file up later when it is getting hard to manage.
In my experience, though, it is a great deal simpler to
start off with a scheme in mind and stick to it or adapt
it as the program's needs change during development.


1.3 How to split up projects
----------------------------

Again, this is strictly my opinion; you may (probably
will?) prefer to lay things out differently. This is
touching on the controversial topic of coding style; what
I present here is simply my personal preference (along
with reasons for each of these guidelines):

i) Don't make header files which span several source
files (exception: library header files). It's much
easier to track and usually more efficient if each
header file only declares symbols from one source
file. Otherwise, changing the structure of one
source file (and its header file) may cause more
files to be rebuilt that is really necessary.

ii) Where appropriate, do use more than one header file
for a source file. It is often useful to seperate
function prototypes, type definitions, etc, from the
C source file into a header file even when they are
not publicly available. Making one header file for
public symbols and one for private symbols means
that if you change the internals of the file you can
recompile it without having to recompile other files
that use the public header file.

iii) Don't duplicate information in several header files.
If you need to, #include one in the other, but don't
write out the same header information twice. The
reason for this is that if you change the
information in the future you will only need to
change it once, rather than hunting for duplicates
which would also need modifying.

iv) Make each source file #include all the header files
which declare information in the source file. Doing
this means that the compiler is more likely to pick
out mistakes, where you have declared something
differently in the header file to what it is in the
source file.


1.4 Notes on common errors
--------------------------

a) Identifier clashes between source files: In C,
variables and functions are by default public, so that
any C source file may refer to global variables and
functions from another C source file. This is true even
if the file in question does not have a declaration or
prototype for the variable or function. You must,
therefore, ensure that the same symbol name is not used
in two different files. If you don't do this you will
get linker errors and possibly warnings during
compilation.

One way of doing this is to prefix public symbols with
some string which depends on the source file they appear
in. For example, all the routines in gfx.c might begin
with the prefix `gfx_'. If you are careful with the way
you split up your program, use sensible function names,
and don't go overboard with global variables, this
shouldn't be a problem anyway.

To prevent a symbol from being visible from outside the
source file it is defined in, prefix its definition
with the keyword `static'. This is useful for small
functions which are used internally by a file, and
won't be needed by any other file.

b) Multiply defined symbols (again): A header file is
literally substituted into your C code in place of the
#include statement. Consequently, if the header file is
#included in more than one source file all the
definitions in the header file will occur in both
source files. This causes them to be defined more than
once, which gives a linker error (see above).

Solution: don't define variables in header files. You
only want to declare them in the header file, and
define them (once only) in the appropriate C source
file, which should #include the header file of course
for type checking. The distinction between a
declaration and a definition is easy to miss for
beginners; a declaration tells the compiler that the
named symbol should exist and should have the specified
type, but it does not cause the compiler to allocate
storage space for it, while a definition does allocate
the space. To make a declaration rather than a
definition, put the keyword `extern' before the
definition.

So, if we have an integer called `counter' which we
want to be publicly available, we would define it in a
source file (one only) as `int counter;' at top level,
and declare it in a header file as `extern int
counter;'.

Function prototypes are implicitly extern, so they do
not create this problem.

c) Redefinitions, redeclarations, conflicting types:
Consider what happens if a C source file #includes both
a.h and b.h, and also a.h #includes b.h (which is
perfectly sensible; b.h might define some types that
a.h needs). Now, the C source file #includes b.h twice.
So every #define in b.h occurs twice, every declaration
occurs twice (not actually a problem), every typedef
occurs twice, etc. In theory, since they are exact
duplicates it shouldn't matter, but in practice it is
not valid C and you will probably get compiler errors
or at least warnings.

The solution to this problem is to ensure that the body
of each header file is included only once per source
file. This is generally achieved using preprocessor
directives. We will #define a macro for each header
file, as we enter the header file, and only use the
body of the file if the macro is not already defined.
In practice it is as simple as putting this at the
start of each header file:

#ifndef FILENAME_H
#define FILENAME_H

and then putting this at the end of it:

No comments:

Blog Archive