Thursday, December 13, 2007

Mixed-Language Programming and External Linkage

http://developers.sun.com/solaris/articles/external_linkage.html
Dec 13, 2007

Article
Mixed-Language Programming and External Linkage




By Giri
Mandalika,
March 2005,
Updated
March 2006

Abstract: This article introduces
the concept of linkage and shows how
a simple C++ program fails without
language linkage, but can succeed
with proper linkage.

Contents:

* Introduction
* The Problem
* The Reason
* The Solution
* Resources

It is a common practice to call
functions of a C library from a C++
program. This works out well as long
as developers restrict themselves to
the standard headers and libraries
that were supplied with the
operating system. But novice
programmers may stumble with some
link-time errors, as soon as they
try to call methods of their own C
library from a C++ program.
Potential reasons for the failure
could include unfamiliarity with
linkage specifications and how C/C++
compilers handle symbols during the
compilation.

This article briefly introduces the
concept of linkage and shows how a
simple C++ program fails without
language linkage, and succeeds with
proper linkage. Mixing code written
in C++ with code written in C is
relatively straightforward, as C++
is mostly a superset of C. Although
mixing C++ objects with objects in
languages other than C is allowed,
it is a bit more complicated, hence
this article restricts the
discussion to C and C++ objects.

Introduction

The C++ standard provides a
mechanism called linkage
specification for mixing code that
was written in different programming
languages and was compiled by the
respective compilers, in the same
program. Linkage specification
refers to the protocol for linking
functions or procedures written in
different languages. Linkage is the
term used by the C++ standard to
describe the accessibility of
objects from one file to another or
even within the same file. Three
types of linkage exist:

* No linkage
* Internal linkage
* External linkage

Something internal to a function, in
regard to its arguments, variables,
and so on, always has no linkage and
hence can be accessed only within
the function.

Sometimes it is necessary to declare
functions and other objects within a
single file in a way that allows
them to reference each other, but
not to be accessible from outside
that file. This can be done through
internal linkage. Symbols with
internal linkage only refer to the
same object within a single source
file. Prefixing the declarations
with the keyword static changes the
linkage of external objects from
external linkage to internal
linkage.

Objects that have external linkage
are all considered to be located at
the outermost level of the program.
This is the default linkage for
functions and anything declared
outside of a function. All instances
of a particular name with external
linkage refer to the same object in
the program. If two or more
declarations of the same symbol have
external linkage but with
incompatible types (for example,
mismatch of declaration and
definition), then the program may
either crash or show abnormal
behavior. The rest of the article
discusses one of the issues with
mixed code and provides a
recommended solution with external
linkage.

The Problem

In the real world, it is very common
to use the functionality of code
written in one programming language
from code written in another. A
trivial example is a C++ programmer
relying on a standard C library
(libc) for sorting a series of
integers with the "quick sort"
technique. It works because the C
implementation takes care of the
language linkage for us. But we need
to take additional care if we use
our own libraries written in C, from
a C++ program. Otherwise the
compilation may fail with link
errors caused by unresolved symbols.
Consider the following example:

Assume that we're writing C++ code
and wish to call a C function from C
++ code. Here's the code for the
callee, for example, C routine:

%cat greet.h
extern char *greet();

%cat greet.c
#include "greet.h"

char *greet() {
return ((char *) "Hello!");
}

%cc -G -o libgreet.so greet.c


Note: The extern keyword declares a
variable or function and specifies
that it has external linkage, i.e.,
its name is visible from files other
than the one in which it's defined.

Let's try to call the C function
greet() from a C++ program.

%cat mixedcode.cpp
#include <iostream.h>
#include "greet.h"

int main() {
char *greeting = greet();
cout << greeting << "\n";
return (0);
}

%CC -lgreet mixedcode.cpp
Undefined first referenced
symbol in file
char*greet() mixedcode.o
ld: fatal: Symbol referencing errors. No output written to a.out


Though the C++ code is linked with
the dynamic library that holds the
implementation for greet(),
libgreet.so, the linking failed with
undefined symbol error. What went
wrong?

The Reason

The reason for the link error is
that a typical C++ compiler mangles
(encodes) function names to support
function overloading. So, the symbol
greet is changed to something else
depending on the algorithm
implemented in the compiler during
the name mangling process. Hence the
object file does not have the symbol
greet anywhere in the symbol table.
The symbol table of mixedcode.o
confirms this. Let's have a look at
the symbol tables of both
libgreet.so and mixedcode.o:

%elfdump1 -s libgreet.so

Symbol Table Section: .symtab
index value size type bind oth ver shndx name
...
[1] 0x00000000 0x00000000 FILE LOCL D 0 ABS libgreet.so
...
[37] 0x00000268 0x00000004 OBJT GLOB D 0 .rodata _lib_version
[38] 0x000102f3 0x00000000 OBJT GLOB D 0 .data1 _edata
[39] 0x00000228 0x00000028 FUNC GLOB D 0 .text greet
[40] 0x0001026c 0x00000000 OBJT GLOB D 0 .dynamic _DYNAMIC

%elfdump -s mixedcode.o

Symbol Table Section: .symtab
index value size type bind oth ver shndx name
[0] 0x00000000 0x00000000 NOTY LOCL D 0 UNDEF
[1] 0x00000000 0x00000000 FILE LOCL D 0 ABS mixedcode.cpp
[2] 0x00000000 0x00000000 SECT LOCL D 0 .rodata
[3] 0x00000000 0x00000000 FUNC GLOB D 0 UNDEF
__1cDstd2l6Frn0ANbasic_ostream4Ccn0ALchar_traits4Cc____pkc_2_
[4] 0x00000000 0x00000000 FUNC GLOB D 0 UNDEF __1cFgreet6F_pc_
[5] 0x00000000 0x00000000 NOTY GLOB D 0 UNDEF __1cDstdEcout_
[6] 0x00000010 0x00000050 FUNC GLOB D 0 .text main
[7] 0x00000000 0x00000000 NOTY GLOB D 0 ABS __fsr_init_value

%dem2 __1cFgreet6F_pc_

__1cFgreet6F_pc_ == char*greet()


char*greet() has been mangled to
__1cFgreet6F_pc_ by the Sun Studio 9
C++ compiler. That's the reason why
the static linker (ld) couldn't
match the symbol in the object file.

Note that a C compiler that complies
with the C99 standard may mangle
some names. For example, on systems
in which linkers cannot accept
extended characters, a C compiler
may encode the universal character
name in forming valid external
identifiers.

The Solution

The C++ standard provides a
mechanism called linkage
specification to enables smooth
compilation of mixed code. Linkage
between C++ and non-C++ code
fragments is called language
linkage. All function types,
function names, and variable names
have a default C++ language linkage.
Language linkage can be achieved
using the following linkage
specification.

Linkage specification:

extern string-literal {
function-declaration
function-declaration
}
extern string-literal function-declaration;


The string-literal specifies the
linkage associated with a particular
function, for example, C and C++.
Every C++ implementation provides
for linkage to functions written in
C language ("C") and linkage to C++
("C++").

The solution to the problem under
discussion is to ask the C++
compiler to use C mangling for the
external functions to be called, so
we can use the functionality of
external C functions from C++ code,
without any issues. We can
accomplish this using the linkage to
C. The following declaration of
greet() in greet.h should resolve
the problem:

extern "C" char *greet();


Because we were calling C code from
a C++ program, C linkage was used
for the routine greet(). The linkage
directive extern "C" tells the
compiler to change from C++ mangling
to C mangling for the function, and
to use C calling conventions while
sending external information to the
linker. In other words, the C
linkage specification forces the C++
compiler to adopt C conventions,
which are not the same as C++
conventions.

So, let's modify the header greet.h,
and recompile:

%cat greet.h
#if defined __cplusplus
extern "C" {
#

Thursday, November 22, 2007

Multi-file projects and the GNU Make utility

==============================================================================
C-Scene Issue #2
Multi-file projects and the GNU Make utility
Author: George Foot
Email: george.foot@merton.ox.ac.uk
Occupation: Student at Merton College, Oxford University, England
IRC nick: gfoot
==============================================================================

Disclaimer: The author accepts no liability whatsoever for any
damage this may cause to anything, real, abstract or
virtual, that you may or may not own. Any damage
caused is your responsibility, not mine.

Ownership: The section `Multi-file projects' remains the
property of the author, and is copyright (c) George
Foot May-July 1997. The remaining sections are the
property of CScene and are copyright (c) 1997 by
CScene, all rights reserved. Distribution of this
article, in whole or in part, is subject to the same
conditions as any other CScene article.


0) Introduction
~~~~~~~~~~~~~~~

This article will explain firstly why, when and how to split
your C source code between several files sensibly, and it will
then go on to show you how the GNU Make utility can handle all
your compilation and linking automatically. Users of other make
utilities may still find the information useful, but it may
require some adaptation to work on other utilities. If in doubt,
try it out, but check the manual first.


1) Multi-file projects
~~~~~~~~~~~~~~~~~~~~~~

1.1 Why use them?
-----------------

Firstly, then, why are multi-file projects a good thing?
They appear to complicate things no end, requiring header
files, extern declarations, and meaning you need to search
through more files to find the function you're looking
for.

In fact, though, there are strong reasons to split up
projects. When you modify a line of your code, the
compiler has to recompile everything to create a new
executable. However, if your project is in several files
and you modify one of them, the object files for the other
source files are already on disk, so there's no point in
recompiling them. All you need to do is recompile the file
that was changed, and relink the object files. In a large
project this can mean the difference between a lengthy
(several minutes to several hours) rebuild and a ten or
twenty second adjustment.

With a little organisation, splitting a project between
files can make it much easier to find the piece of code
you are looking for. It's simple - you split the code
between the files based upon what the code does. Then if
you're looking for a routine you know exactly where to
find it.

It is much better to create a library from many object
files than from a single object file. Whether or not this
is a real advantage depends what system you're using, but
when gcc/ld links a library into a program at link time it
tries not to link in unused code. It can only exclude
entire object files from the library at a time, though, so
if you reference any symbols from a particular object file
of a library the whole object file must be linked in. If
the library is very segmented, the resulting executables
can be much smaller than they would be if the library
consisted of a single object file.

Also, since your program is very modular with the minimum
amount of sharing between files there are many other
benefits -- bugs are easier to track down, modules can
often be reused in another project, and last but not
least, other people will find it much easier to understand
what your code is doing.


1.2 When to split up your projects
----------------------------------

It is obvisouly not sensible to split up *everything*;
small programs like `Hello World' can't really be split
anyway since there's nothing to split. Splitting up small
throwaway test programs is pretty pointless too. In
general, though, I split things whenever doing so seems to
improve the layout, development and readability of the
program. This is in fact true most of the time.

The decision about what to split and how is of course
yours; I can only make general suggestions here, which you
may or may not choose to follow.

If you are developing a fairly large project, you should
think before you start how you are going to implement it,
and create several (appropriately named) files initially
to hold your code. Of course, don't hesitate to create new
files later in development, but if you do then you are
changing your mind and should perhaps think about whether
some other structural changes would be appropriate.

For medium-sized projects, you can use the above technique
of course, or you might be able to just start typing, and
split the file up later when it is getting hard to manage.
In my experience, though, it is a great deal simpler to
start off with a scheme in mind and stick to it or adapt
it as the program's needs change during development.


1.3 How to split up projects
----------------------------

Again, this is strictly my opinion; you may (probably
will?) prefer to lay things out differently. This is
touching on the controversial topic of coding style; what
I present here is simply my personal preference (along
with reasons for each of these guidelines):

i) Don't make header files which span several source
files (exception: library header files). It's much
easier to track and usually more efficient if each
header file only declares symbols from one source
file. Otherwise, changing the structure of one
source file (and its header file) may cause more
files to be rebuilt that is really necessary.

ii) Where appropriate, do use more than one header file
for a source file. It is often useful to seperate
function prototypes, type definitions, etc, from the
C source file into a header file even when they are
not publicly available. Making one header file for
public symbols and one for private symbols means
that if you change the internals of the file you can
recompile it without having to recompile other files
that use the public header file.

iii) Don't duplicate information in several header files.
If you need to, #include one in the other, but don't
write out the same header information twice. The
reason for this is that if you change the
information in the future you will only need to
change it once, rather than hunting for duplicates
which would also need modifying.

iv) Make each source file #include all the header files
which declare information in the source file. Doing
this means that the compiler is more likely to pick
out mistakes, where you have declared something
differently in the header file to what it is in the
source file.


1.4 Notes on common errors
--------------------------

a) Identifier clashes between source files: In C,
variables and functions are by default public, so that
any C source file may refer to global variables and
functions from another C source file. This is true even
if the file in question does not have a declaration or
prototype for the variable or function. You must,
therefore, ensure that the same symbol name is not used
in two different files. If you don't do this you will
get linker errors and possibly warnings during
compilation.

One way of doing this is to prefix public symbols with
some string which depends on the source file they appear
in. For example, all the routines in gfx.c might begin
with the prefix `gfx_'. If you are careful with the way
you split up your program, use sensible function names,
and don't go overboard with global variables, this
shouldn't be a problem anyway.

To prevent a symbol from being visible from outside the
source file it is defined in, prefix its definition
with the keyword `static'. This is useful for small
functions which are used internally by a file, and
won't be needed by any other file.

b) Multiply defined symbols (again): A header file is
literally substituted into your C code in place of the
#include statement. Consequently, if the header file is
#included in more than one source file all the
definitions in the header file will occur in both
source files. This causes them to be defined more than
once, which gives a linker error (see above).

Solution: don't define variables in header files. You
only want to declare them in the header file, and
define them (once only) in the appropriate C source
file, which should #include the header file of course
for type checking. The distinction between a
declaration and a definition is easy to miss for
beginners; a declaration tells the compiler that the
named symbol should exist and should have the specified
type, but it does not cause the compiler to allocate
storage space for it, while a definition does allocate
the space. To make a declaration rather than a
definition, put the keyword `extern' before the
definition.

So, if we have an integer called `counter' which we
want to be publicly available, we would define it in a
source file (one only) as `int counter;' at top level,
and declare it in a header file as `extern int
counter;'.

Function prototypes are implicitly extern, so they do
not create this problem.

c) Redefinitions, redeclarations, conflicting types:
Consider what happens if a C source file #includes both
a.h and b.h, and also a.h #includes b.h (which is
perfectly sensible; b.h might define some types that
a.h needs). Now, the C source file #includes b.h twice.
So every #define in b.h occurs twice, every declaration
occurs twice (not actually a problem), every typedef
occurs twice, etc. In theory, since they are exact
duplicates it shouldn't matter, but in practice it is
not valid C and you will probably get compiler errors
or at least warnings.

The solution to this problem is to ensure that the body
of each header file is included only once per source
file. This is generally achieved using preprocessor
directives. We will #define a macro for each header
file, as we enter the header file, and only use the
body of the file if the macro is not already defined.
In practice it is as simple as putting this at the
start of each header file:

#ifndef FILENAME_H
#define FILENAME_H

and then putting this at the end of it:

Blog Archive