Thursday, December 13, 2007

Mixed-Language Programming and External Linkage

By Giri
March 2005,
March 2006

Abstract: This article introduces the concept of linkage and shows how a simple C++ program fails without language linkage, but can succeed with proper linkage.


* Introduction
* The Problem
* The Reason
* The Solution
* Resources

Introduction

It is a common practice to call functions of a C library from a C++ program. This works well as long as developers restrict themselves to the standard headers and libraries supplied with the operating system. But novice programmers may stumble over link-time errors as soon as they try to call functions of their own C library from a C++ program. Potential reasons for the failure include unfamiliarity with linkage specifications and with how C and C++ compilers handle symbols during compilation and linking.

This article briefly introduces the concept of linkage and shows how a simple C++ program fails without language linkage and succeeds with proper linkage. Mixing code written in C++ with code written in C is relatively straightforward, because C++ is mostly a superset of C. Mixing C++ with languages other than C is also possible but more complicated, so this article restricts the discussion to C and C++.


The C++ standard provides a mechanism called linkage specification for combining, in a single program, code written in different programming languages and compiled by the respective compilers. A linkage specification states the protocol for linking functions or procedures written in another language. Linkage itself is the term the C++ standard uses to describe whether a name declared in one place can refer to the same entity as the same name declared elsewhere, either in another file or within the same file. Three kinds of linkage exist:

* No linkage
* Internal linkage
* External linkage

Names declared inside a function, such as its parameters and ordinary local variables, have no linkage. Each such declaration denotes its own entity, which can be referred to only within the function.
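A minimal C++ sketch of names with no linkage follows. The helpers count_up_to and next_id are hypothetical and exist only for illustration. Note that a static local variable still has no linkage: static changes how long the object lives, not who can refer to it by name.

```cpp
// Illustrative only: count_up_to and next_id are hypothetical helpers.

// `count` is an ordinary local variable: it has no linkage, so it can
// be referred to only inside count_up_to.
int count_up_to(int n) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        ++count;
    }
    return count;
}

// `last` also has no linkage. Declaring it static gives the object
// static storage duration (it keeps its value between calls), but the
// name is still visible only inside next_id.
int next_id() {
    static int last = 0;
    return ++last;
}
```

Calling next_id() twice returns 1 and then 2, yet no other function can name `last`.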

Sometimes functions and other objects in a single file must be able to reference each other without being accessible from outside that file. Internal linkage provides this. A name with internal linkage refers to the same object only within a single source file (translation unit). Prefixing a file-scope declaration with the keyword static changes its linkage from external to internal.
Objects that have external linkage are all considered to be located at the outermost level of the program. External linkage is the default for functions and for variables declared outside any function. In C++, namespace-scope const variables are an exception: they default to internal linkage. All instances of a particular name with external linkage refer to the same object in the program. Suppose two or more declarations of the same symbol have external linkage but incompatible types, for example a declaration that does not match the definition. In that case the behavior is undefined, and the program may crash or behave abnormally. The rest of the article discusses one of the issues with mixed code and provides a recommended solution with external linkage.
The Problem

In the real world, it is very common for code written in one programming language to use functionality written in another. A trivial example is a C++ program that relies on the standard C library (libc) to sort a series of integers with qsort(), the library's "quick sort" routine. This works because the implementation's headers take care of the language linkage for us: they declare the C library functions with C linkage when they are included from C++. But we need to take additional care when a C++ program uses our own libraries written in C. Otherwise the build may fail at link time with unresolved symbol errors. Consider the following example:

Assume that we are writing C++ code and wish to call a C function from it. Here is the code for the callee, a C routine:

%cat greet.h
extern char *greet();

%cat greet.c
#include "greet.h"

char *greet() {
    return ((char *) "Hello!");
}

%cc -G -o libgreet.so greet.c

Note: The extern keyword declares a variable or function and specifies that it has external linkage, i.e., its name is visible from files other than the one in which it is defined. For functions, external linkage is the default, so the keyword is optional in greet.h.

Let's try to call the C function
greet() from a C++ program.

%cat mixedcode.cpp
#include <iostream>
#include "greet.h"

int main() {
    char *greeting = greet();
    std::cout << greeting << "\n";
    return (0);
}

%CC -lgreet mixedcode.cpp
Undefined                       first referenced
 symbol                             in file
char*greet()                        mixedcode.o
ld: fatal: Symbol referencing errors. No output written to a.out

Although mixedcode.cpp is linked against the shared library libgreet.so, which contains the implementation of greet(), the link fails with an undefined symbol error. What went wrong?

The Reason

The link error occurs because a C++ compiler mangles (encodes) function names, adding information such as parameter types, so that overloaded functions receive distinct symbols. greet.h declares greet() without a linkage specification, so the C++ compiler treats it as a C++ function. It therefore references greet() under a mangled name whose form depends on the compiler's mangling scheme. As a result, mixedcode.o refers to that mangled name rather than to plain greet, which is the name libgreet.so defines. Let's look at the symbol tables of both libgreet.so and mixedcode.o:

%elfdump -s libgreet.so

Symbol Table Section: .symtab
index value size type bind oth ver shndx name
[1] 0x00000000 0x00000000 FILE LOCL D 0 ABS
[37] 0x00000268 0x00000004 OBJT GLOB D 0 .rodata _lib_version
[38] 0x000102f3 0x00000000 OBJT GLOB D 0 .data1 _edata
[39] 0x00000228 0x00000028 FUNC GLOB D 0 .text greet
[40] 0x0001026c 0x00000000 OBJT GLOB D 0 .dynamic _DYNAMIC

%elfdump -s mixedcode.o

Symbol Table Section: .symtab
index value size type bind oth ver shndx name
[0] 0x00000000 0x00000000 NOTY LOCL D 0 UNDEF
[1] 0x00000000 0x00000000 FILE LOCL D 0 ABS mixedcode.cpp
[2] 0x00000000 0x00000000 SECT LOCL D 0 .rodata
[3] 0x00000000 0x00000000 FUNC GLOB D 0 UNDEF
[4] 0x00000000 0x00000000 FUNC GLOB D 0 UNDEF __1cFgreet6F_pc_
[5] 0x00000000 0x00000000 NOTY GLOB D 0 UNDEF __1cDstdEcout_
[6] 0x00000010 0x00000050 FUNC GLOB D 0 .text main
[7] 0x00000000 0x00000000 NOTY GLOB D 0 ABS __fsr_init_value

%dem __1cFgreet6F_pc_

__1cFgreet6F_pc_ == char*greet()

The Sun Studio 9 C++ compiler mangled char*greet() to __1cFgreet6F_pc_. As a result, mixedcode.o has an undefined reference to __1cFgreet6F_pc_, while libgreet.so defines only greet. The static linker (ld) therefore could not resolve the reference.
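Other compilers mangle differently. The helper below plays the role of the dem utility. It is a sketch that assumes GCC or Clang, which follow the Itanium C++ ABI and provide the non-standard abi::__cxa_demangle function in <cxxabi.h>. Under that scheme char* greet() becomes _Z5greetv. Unlike the Sun Studio symbol above, the return type is not encoded for ordinary functions. Overloads such as a hypothetical area(int) and area(double) receive distinct symbols, which is exactly why mangling exists. The demangle helper is also hypothetical.

```cpp
// Assumes GCC or Clang (Itanium C++ ABI); <cxxabi.h> is not standard C++.
#include <cstdlib>
#include <string>
#include <cxxabi.h>

// Returns the readable form of an Itanium-mangled name, or the input
// unchanged when it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);  // __cxa_demangle allocates the result with malloc
    return result;
}
```

For example, demangle("_Z5greetv") yields "greet()". demangle("_Z4areai") and demangle("_Z4aread") yield "area(int)" and "area(double)".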

Note that a C compiler that complies with the C99 standard may also transform some names. For example, on systems whose linkers cannot accept extended characters, a C compiler may encode universal character names when forming valid external identifiers.
The Solution

The C++ standard provides a mechanism called linkage specification for linking code compiled by compilers for different languages into one program. Linkage between C++ and non-C++ code fragments is called language linkage. All function types, function names, and variable names have a language linkage, which is C++ language linkage by default. A different language linkage is requested with a linkage specification, which takes one of two forms:

extern string-literal { declaration-seq(opt) }
extern string-literal declaration

The string-literal names the language linkage of the declarations it applies to, for example "C" or "C++". Every C++ implementation provides linkage to functions written in the C language ("C") and linkage to C++ ("C++").
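The following C++ sketch illustrates both forms. The functions add_ints, sub_ints, mul_ints and add_ints_cpp are hypothetical. Because the first three have C language linkage, a C++ compiler typically emits them under their plain C names, so C code could link against them. A symbol listing such as the elfdump output above is one way to check this. One consequence is that at most one function with a given name can have C linkage, so such functions cannot be overloaded.

```cpp
// Illustrative only: the function names are hypothetical.

// Single-declaration form: one function with C language linkage.
extern "C" int add_ints(int a, int b);

// Block form: every declaration inside the braces gets C language linkage.
extern "C" {
    int sub_ints(int a, int b);
    int mul_ints(int a, int b);
}

// Definitions. A function first declared with C linkage keeps that
// linkage when it is later defined without a linkage specification.
int add_ints(int a, int b) { return a + b; }
int sub_ints(int a, int b) { return a - b; }
int mul_ints(int a, int b) { return a * b; }

// C++ linkage is the default, but it can also be written explicitly.
extern "C++" int add_ints_cpp(int a, int b) { return a + b; }
```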

The solution is to tell the C++ compiler that the external functions to be called have C linkage. The compiler then refers to them by their C names instead of mangled C++ names, and the C functions can be used from C++ code without any issues. We can accomplish this with a C linkage specification. The following declaration of greet() in greet.h should resolve the problem:

extern "C" char *greet();

Because we are calling C code from a C++ program, C linkage is used for the routine greet(). The linkage directive extern "C" tells the C++ compiler to refer to the function by its C name, greet, exactly as libgreet.so defines it, and to use the C calling convention when calling it. In other words, the C linkage specification forces the C++ compiler to adopt C conventions, which are not the same as C++ conventions.
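So, let's modify the header greet.h so that it can be included from both C and C++ code, and recompile. The __cplusplus guard hides the extern "C" lines from the C compiler, which does not recognize them:

%cat greet.h
#if defined __cplusplus
extern "C" {
#endif
extern char *greet();
#if defined __cplusplus
}
#endif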
