Tuesday, March 10, 2009

linker - relocation


http://www.airs.com/blog/archives/category/programming/page/14/

Basic Linker Data Types

The linker operates on a small number of basic data types: symbols, relocations, and contents. These are defined in the input object files. Here is an overview of each of these.

A symbol is basically a name and a value. Many symbols represent static objects in the original source code–that is, objects which exist in a single place for the duration of the program. For example, in an object file generated from C code, there will be a symbol for each function and for each global and static variable. The value of such a symbol is simply an offset into the contents. This type of symbol is known as a defined symbol. It’s important not to confuse the value of the symbol representing the variable my_global_var with the value of my_global_var itself. The value of the symbol is roughly the address of the variable: the value you would get from the expression &my_global_var in C.

Symbols are also used to indicate a reference to a name defined in a different object file. Such a reference is known as an undefined symbol. There are other less commonly used types of symbols which I will describe later.

During the linking process, the linker will assign an address to each defined symbol, and will resolve each undefined symbol by finding a defined symbol with the same name.

A relocation is a computation to perform on the contents. Most relocations refer to a symbol and to an offset within the contents. Many relocations will also provide an additional operand, known as the addend. A simple, and commonly used, relocation is “set this location in the contents to the value of this symbol plus this addend.” The types of computations that relocations do are inherently dependent on the architecture of the processor for which the linker is generating code. For example, RISC processors which require two or more instructions to form a memory address will have separate relocations to be used with each of those instructions; for example, “set this location in the contents to the lower 16 bits of the value of this symbol.”

During the linking process, the linker will perform all of the relocation computations as directed. A relocation in an object file may refer to an undefined symbol. If the linker is unable to resolve that symbol, it will normally issue an error (but not always: for some symbol types or some relocation types an error may not be appropriate).

The contents are what memory should look like during the execution of the program. Contents have a size, an array of bytes, and a type. They contain the machine code generated by the compiler and assembler (known as text). They contain the values of initialized variables (data). They contain static unnamed data like string constants and switch tables (read-only data or rdata). They contain uninitialized variables, in which case the array of bytes is generally omitted and assumed to contain only zeroes (bss). The compiler and the assembler work hard to generate exactly the right contents, but the linker really doesn’t care about them except as raw data. The linker reads the contents from each file, concatenates them all together sorted by type, applies the relocations, and writes the result into the executable file.

Basic Linker Operation

At this point we already know enough to understand the basic steps used by every linker.

* Read the input object files. Determine the length and type of the contents. Read the symbols.
* Build a symbol table containing all the symbols, linking undefined symbols to their definitions.
* Decide where all the contents should go in the output executable file, which means deciding where they should go in memory when the program runs.
* Read the contents data and the relocations. Apply the relocations to the contents. Write the result to the output file.
* Optionally write out the complete symbol table with the final values of the symbols.


In the common case, code may be compiled in two different modes. By default, code is position dependent. Putting position dependent code into a shared library will cause the program linker to generate a lot of relocation information, and cause the dynamic linker to do a lot of processing at runtime. Code may also be compiled in position independent mode, typically with the -fpic option. Position independent code is slightly slower when it calls a non-static function or refers to a global or static variable. However, it requires much less relocation information, and thus the dynamic linker will start the program faster.

Position independent code will call non-static functions via the Procedure Linkage Table or PLT. This PLT does not exist in .o files. In a .o file, use of the PLT is indicated by a special relocation. When the program linker processes such a relocation, it will create an entry in the PLT. It will adjust the instruction such that it becomes a PC-relative call to the PLT entry. PC-relative calls are inherently position independent and thus do not require a relocation entry themselves. The program linker will create a relocation for the PLT entry which tells the dynamic linker which symbol is associated with that entry. This process reduces the number of dynamic relocations in the shared library from one per function call to one per function called.


A detailed example can be found here:

http://bottomupcs.sourceforge.net/csbu/x3867.htm

Libraries
The Procedure Lookup Table

Libraries may contain many functions, and a program may end up including many libraries to get its work done. A program may only use one or two functions from each library of the many available, and depending on the run-time path through the code may use some functions and not others.

As we have seen, the process of dynamic linking is a fairly computationally intensive one, since it involves looking up and searching through many tables. Anything that can be done to reduce the overheads will increase performance.

The Procedure Lookup Table (PLT) facilitates what is called lazy binding in programs. Binding is synonymous with the fix-up process described above for variables located in the GOT. When an entry has been "fixed-up" it is said to be "bound" to its real address.

As we mentioned, sometimes a program will include a function from a library but never actually call that function, depending on user input. The process of binding this function is quite intensive, involving loading code, searching through tables and writing memory. To go through the process of binding a function that is not used is simply a waste of time.

Lazy binding defers this expense until the actual function is called by using a PLT.

Each library function has an entry in the PLT, which initially points to some special dummy code. When the program calls the function, it actually calls the PLT entry (in the same was as variables are referenced through the GOT).

This dummy function will load a few parameters that need to be passed to the dynamic linker for it to resolve the function and then call into a special lookup function of the dynamic linker. The dynamic linker finds the real address of the function, and writes that location into the calling binary over the top of the dummy function call.

Thus, the next time the function is called the address can be loaded without having to go back into the dynamic loader again. If a function is never called, then the PLT entry will never be modified but there will be no runtime overhead.
The PLT in action

Things start to get a bit hairy here! If nothing else, you should begin to appreciate that there is a fair bit of work in resolving a dynamic symbol!

Let us consider the simple "hello World" application. This will only make one library call to printf to output the string to the user.

Example 9-8. Hello World PLT example

$ cat hello.c
#include

int main(void)
5 {
printf("Hello, World!\n");
return 0;
}

10 $ gcc -o hello hello.c

$ readelf --relocs ./hello

Relocation section '.rela.dyn' at offset 0x3f0 contains 2 entries:
15 Offset Info Type Sym. Value Sym. Name + Addend
6000000000000ed8 000700000047 R_IA64_FPTR64LSB 0000000000000000 _Jv_RegisterClasses + 0
6000000000000ee0 000900000047 R_IA64_FPTR64LSB 0000000000000000 __gmon_start__ + 0

Relocation section '.rela.IA_64.pltoff' at offset 0x420 contains 3 entries:
20 Offset Info Type Sym. Value Sym. Name + Addend
6000000000000f10 000200000081 R_IA64_IPLTLSB 0000000000000000 printf + 0
6000000000000f20 000800000081 R_IA64_IPLTLSB 0000000000000000 __libc_start_main + 0
6000000000000f30 000900000081 R_IA64_IPLTLSB 0000000000000000 __gmon_start__ + 0;

We can see above that we have a R_IA64_IPLTLSB relocation for our printf symbol. This is saying "put the address of symbol printf into memory address 0x6000000000000f10". We have to start digging deeper to find the exact procedure that gets us the function.

Below we have a look at the disassembly of the main() function of the program.

Example 9-9. Hello world main()

4000000000000790
:
4000000000000790: 00 08 15 08 80 05 [MII] alloc r33=ar.pfs,5,4,0
4000000000000796: 20 02 30 00 42 60 mov r34=r12
400000000000079c: 04 08 00 84 mov r35=r1
5 40000000000007a0: 01 00 00 00 01 00 [MII] nop.m 0x0
40000000000007a6: 00 02 00 62 00 c0 mov r32=b0
40000000000007ac: 81 0c 00 90 addl r14=72,r1;;
40000000000007b0: 1c 20 01 1c 18 10 [MFB] ld8 r36=[r14]
40000000000007b6: 00 00 00 02 00 00 nop.f 0x0
10 40000000000007bc: 78 fd ff 58 br.call.sptk.many b0=4000000000000520 <_init+0xb0>
40000000000007c0: 02 08 00 46 00 21 [MII] mov r1=r35
40000000000007c6: e0 00 00 00 42 00 mov r14=r0;;
40000000000007cc: 01 70 00 84 mov r8=r14
40000000000007d0: 00 00 00 00 01 00 [MII] nop.m 0x0
15 40000000000007d6: 00 08 01 55 00 00 mov.i ar.pfs=r33
40000000000007dc: 00 0a 00 07 mov b0=r32
40000000000007e0: 1d 60 00 44 00 21 [MFB] mov r12=r34
40000000000007e6: 00 00 00 02 00 80 nop.f 0x0
40000000000007ec: 08 00 84 00 br.ret.sptk.many b0;;;

The call to 0x4000000000000520 must be us calling the printf function. We can find out where this is by looking at the sections with readelf.

Example 9-10. Hello world sections

$ readelf --sections ./hello
There are 40 section headers, starting at offset 0x25c0:

Section Headers:
5 [Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
...
10 [11] .plt PROGBITS 40000000000004c0 000004c0
00000000000000c0 0000000000000000 AX 0 0 32
[12] .text PROGBITS 4000000000000580 00000580
00000000000004a0 0000000000000000 AX 0 0 32
[13] .fini PROGBITS 4000000000000a20 00000a20
15 0000000000000040 0000000000000000 AX 0 0 16
[14] .rodata PROGBITS 4000000000000a60 00000a60
000000000000000f 0000000000000000 A 0 0 8
[15] .opd PROGBITS 4000000000000a70 00000a70
0000000000000070 0000000000000000 A 0 0 16
20 [16] .IA_64.unwind_inf PROGBITS 4000000000000ae0 00000ae0
00000000000000f0 0000000000000000 A 0 0 8
[17] .IA_64.unwind IA_64_UNWIND 4000000000000bd0 00000bd0
00000000000000c0 0000000000000000 AL 12 c 8
[18] .init_array INIT_ARRAY 6000000000000c90 00000c90
25 0000000000000018 0000000000000000 WA 0 0 8
[19] .fini_array FINI_ARRAY 6000000000000ca8 00000ca8
0000000000000008 0000000000000000 WA 0 0 8
[20] .data PROGBITS 6000000000000cb0 00000cb0
0000000000000004 0000000000000000 WA 0 0 4
30 [21] .dynamic DYNAMIC 6000000000000cb8 00000cb8
00000000000001e0 0000000000000010 WA 5 0 8
[22] .ctors PROGBITS 6000000000000e98 00000e98
0000000000000010 0000000000000000 WA 0 0 8
[23] .dtors PROGBITS 6000000000000ea8 00000ea8
35 0000000000000010 0000000000000000 WA 0 0 8
[24] .jcr PROGBITS 6000000000000eb8 00000eb8
0000000000000008 0000000000000000 WA 0 0 8
[25] .got PROGBITS 6000000000000ec0 00000ec0
0000000000000050 0000000000000000 WAp 0 0 8
40 [26] .IA_64.pltoff PROGBITS 6000000000000f10 00000f10
0000000000000030 0000000000000000 WAp 0 0 16
[27] .sdata PROGBITS 6000000000000f40 00000f40
0000000000000010 0000000000000000 WAp 0 0 8
[28] .sbss NOBITS 6000000000000f50 00000f50
45 0000000000000008 0000000000000000 WA 0 0 8
[29] .bss NOBITS 6000000000000f58 00000f50
0000000000000008 0000000000000000 WA 0 0 8
[30] .comment PROGBITS 0000000000000000 00000f50
00000000000000b9 0000000000000000 0 0 1
50 [31] .debug_aranges PROGBITS 0000000000000000 00001010
0000000000000090 0000000000000000 0 0 16
[32] .debug_pubnames PROGBITS 0000000000000000 000010a0
0000000000000025 0000000000000000 0 0 1
[33] .debug_info PROGBITS 0000000000000000 000010c5
55 00000000000009c4 0000000000000000 0 0 1
[34] .debug_abbrev PROGBITS 0000000000000000 00001a89
0000000000000124 0000000000000000 0 0 1
[35] .debug_line PROGBITS 0000000000000000 00001bad
00000000000001fe 0000000000000000 0 0 1
60 [36] .debug_str PROGBITS 0000000000000000 00001dab
00000000000006a1 0000000000000001 MS 0 0 1
[37] .shstrtab STRTAB 0000000000000000 0000244c
000000000000016f 0000000000000000 0 0 1
[38] .symtab SYMTAB 0000000000000000 00002fc0
65 0000000000000b58 0000000000000018 39 60 8
[39] .strtab STRTAB 0000000000000000 00003b18
0000000000000479 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
70 I (info), L (link order), G (group), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific);

That address is (unsurprisingly) in the .plt section. So there we have our call into the PLT! But we're not satisfied with that, let's keep digging further to see what we can uncover. We disassemble the .plt section to see what that call actually does.

Example 9-11. Hello world PLT

40000000000004c0 <.plt>:
40000000000004c0: 0b 10 00 1c 00 21 [MMI] mov r2=r14;;
40000000000004c6: e0 00 08 00 48 00 addl r14=0,r2
40000000000004cc: 00 00 04 00 nop.i 0x0;;
5 40000000000004d0: 0b 80 20 1c 18 14 [MMI] ld8 r16=[r14],8;;
40000000000004d6: 10 41 38 30 28 00 ld8 r17=[r14],8
40000000000004dc: 00 00 04 00 nop.i 0x0;;
40000000000004e0: 11 08 00 1c 18 10 [MIB] ld8 r1=[r14]
40000000000004e6: 60 88 04 80 03 00 mov b6=r17
10 40000000000004ec: 60 00 80 00 br.few b6;;
40000000000004f0: 11 78 00 00 00 24 [MIB] mov r15=0
40000000000004f6: 00 00 00 02 00 00 nop.i 0x0
40000000000004fc: d0 ff ff 48 br.few 40000000000004c0 <_init+0x50>;;
4000000000000500: 11 78 04 00 00 24 [MIB] mov r15=1
15 4000000000000506: 00 00 00 02 00 00 nop.i 0x0
400000000000050c: c0 ff ff 48 br.few 40000000000004c0 <_init+0x50>;;
4000000000000510: 11 78 08 00 00 24 [MIB] mov r15=2
4000000000000516: 00 00 00 02 00 00 nop.i 0x0
400000000000051c: b0 ff ff 48 br.few 40000000000004c0 <_init+0x50>;;
20 4000000000000520: 0b 78 40 03 00 24 [MMI] addl r15=80,r1;;
4000000000000526: 00 41 3c 70 29 c0 ld8.acq r16=[r15],8
400000000000052c: 01 08 00 84 mov r14=r1;;
4000000000000530: 11 08 00 1e 18 10 [MIB] ld8 r1=[r15]
4000000000000536: 60 80 04 80 03 00 mov b6=r16
25 400000000000053c: 60 00 80 00 br.few b6;;
4000000000000540: 0b 78 80 03 00 24 [MMI] addl r15=96,r1;;
4000000000000546: 00 41 3c 70 29 c0 ld8.acq r16=[r15],8
400000000000054c: 01 08 00 84 mov r14=r1;;
4000000000000550: 11 08 00 1e 18 10 [MIB] ld8 r1=[r15]
30 4000000000000556: 60 80 04 80 03 00 mov b6=r16
400000000000055c: 60 00 80 00 br.few b6;;
4000000000000560: 0b 78 c0 03 00 24 [MMI] addl r15=112,r1;;
4000000000000566: 00 41 3c 70 29 c0 ld8.acq r16=[r15],8
400000000000056c: 01 08 00 84 mov r14=r1;;
35 4000000000000570: 11 08 00 1e 18 10 [MIB] ld8 r1=[r15]
4000000000000576: 60 80 04 80 03 00 mov b6=r16
400000000000057c: 60 00 80 00 br.few b6;;;

Let us step through the instructions. Firstly, we add 80 to the value in r1, storing it in r15. We know from before that r1 will be pointing to the GOT, so this is saying "store in r15 80 bytes into the GOT". The next thing we do is load into r16 the value stored in this location in the GOT, and post increment the value in r15 by 8 bytes. We then store r1 (the location of the GOT) in r14 and set r1 to be the value in the next 8 bytes after r15. Then we branch to r16.

In the previous chapter we discussed how functions are actually called through a function descriptor which contains the function address and the address of the global pointer. Here we can see that the PLT entry is first loading the function value, moving on 8 bytes to the second part of the function descriptor and then loading that value into the gp register before calling the function.

But what exactly are we loading? We know that r1 will be pointing to the GOT. We go 80 bytes past the got (0x50)

Example 9-12. Hello world GOT

$ objdump --disassemble-all ./hello
Disassembly of section .got:

6000000000000ec0 <.got>:
5 ...
6000000000000ee8: 80 0a 00 00 00 00 data8 0x02a000000
6000000000000eee: 00 40 90 0a dep r0=r0,r0,63,1
6000000000000ef2: 00 00 00 00 00 40 [MIB] (p20) break.m 0x1
6000000000000ef8: a0 0a 00 00 00 00 data8 0x02a810000
10 6000000000000efe: 00 40 50 0f br.few 6000000000000ef0 <_GLOBAL_OFFSET_TABLE_+0x30>
6000000000000f02: 00 00 00 00 00 60 [MIB] (p58) break.m 0x1
6000000000000f08: 60 0a 00 00 00 00 data8 0x029818000
6000000000000f0e: 00 40 90 06 br.few 6000000000000f00 <_GLOBAL_OFFSET_TABLE_+0x40>
Disassembly of section .IA_64.pltoff:
15
6000000000000f10 <.IA_64.pltoff>:
6000000000000f10: f0 04 00 00 00 00 [MIB] (p39) break.m 0x0
6000000000000f16: 00 40 c0 0e 00 00 data8 0x03b010000
6000000000000f1c: 00 00 00 60 data8 0xc000000000
20 6000000000000f20: 00 05 00 00 00 00 [MII] (p40) break.m 0x0
6000000000000f26: 00 40 c0 0e 00 00 data8 0x03b010000
6000000000000f2c: 00 00 00 60 data8 0xc000000000
6000000000000f30: 10 05 00 00 00 00 [MIB] (p40) break.m 0x0
6000000000000f36: 00 40 c0 0e 00 00 data8 0x03b010000
25 6000000000000f3c: 00 00 00 60 data8 0xc000000000;

0x6000000000000ec0 + 0x50 = 0x6000000000000f10, or the .IA_64.pltoff section. Now we're starting to get somewhere!

We can decode the objdump output so we can see exactly what is being loaded here. Swapping the byte order of the first 8 bytes f0 04 00 00 00 00 00 40 we end up with 0x4000000000004f0. Now that address looks familiar! Looking back up at the assemble output of the PLT we see that address.

The code at 0x4000000000004f0 firstly puts a zero value into r15, and then branches back to 0x40000000000004c0. Wait a minute! That's the start of our PLT section.

We can trace this code through too. Firstly we save the value of the global pointer (r2) then we load three 8 byte values into r16, r17 and finally, r1. We then branch to the address in r17. What we are seeing here is the actual call into the dynamic linker!

We need to delve into the ABI to understand exactly what is being loaded at this point. The ABI says two things -- dynamically linked programs must have a special section (called the DT_IA_64_PLT_RESERVE section) that can hold three 8 byte values. There is a pointer where this reserved area in the dynamic segment of the binary.

Example 9-13. Dynamic Segment

Dynamic segment at offset 0xcb8 contains 25 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6.1]
0x000000000000000c (INIT) 0x4000000000000470
5 0x000000000000000d (FINI) 0x4000000000000a20
0x0000000000000019 (INIT_ARRAY) 0x6000000000000c90
0x000000000000001b (INIT_ARRAYSZ) 24 (bytes)
0x000000000000001a (FINI_ARRAY) 0x6000000000000ca8
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
10 0x0000000000000004 (HASH) 0x4000000000000200
0x0000000000000005 (STRTAB) 0x4000000000000330
0x0000000000000006 (SYMTAB) 0x4000000000000240
0x000000000000000a (STRSZ) 138 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
15 0x0000000000000015 (DEBUG) 0x0
0x0000000070000000 (IA_64_PLT_RESERVE) 0x6000000000000ec0 -- 0x6000000000000ed8
0x0000000000000003 (PLTGOT) 0x6000000000000ec0
0x0000000000000002 (PLTRELSZ) 72 (bytes)
0x0000000000000014 (PLTREL) RELA
20 0x0000000000000017 (JMPREL) 0x4000000000000420
0x0000000000000007 (RELA) 0x40000000000003f0
0x0000000000000008 (RELASZ) 48 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x40000000000003d0
25 0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x40000000000003ba
0x0000000000000000 (NULL) 0x0;

Do you notice anything about it? It's the same value as the GOT. This means that the first three 8 byte entries in the GOT are actually the reserved area; thus will always be pointed to by the global pointer.

When the dynamic linker starts it is its duty to fill these values in. The ABI says that the first value will be filled in by the dynamic linker giving this module a unique ID. The second value is the global pointer value for the dynamic linker, and the third value is the address of the function that finds and fixes up the symbol.

Example 9-14. Code in the dynamic linker for setting up special values (from libc sysdeps/ia64/dl-machine.h)

/* Set up the loaded object described by L so its unrelocated PLT
entries will jump to the on-demand fixup code in dl-runtime.c. */

static inline int __attribute__ ((unused, always_inline))
5 elf_machine_runtime_setup (struct link_map *l, int lazy, int profile)
{
extern void _dl_runtime_resolve (void);
extern void _dl_runtime_profile (void);

10 if (lazy)
{
register Elf64_Addr gp __asm__ ("gp");
Elf64_Addr *reserve, doit;

15 /*
* Careful with the typecast here or it will try to add l-l_addr
* pointer elements
*/
reserve = ((Elf64_Addr *)
20 (l->l_info[DT_IA_64 (PLT_RESERVE)]->d_un.d_ptr + l->l_addr));
/* Identify this shared object. */
reserve[0] = (Elf64_Addr) l;

/* This function will be called to perform the relocation. */
25 if (!profile)
doit = (Elf64_Addr) ((struct fdesc *) &_dl_runtime_resolve)->ip;
else
{
if (GLRO(dl_profile) != NULL
30 && _dl_name_match_p (GLRO(dl_profile), l))
{
/* This is the object we are looking for. Say that we really
want profiling and the timers are started. */
GL(dl_profile_map) = l;
35 }
doit = (Elf64_Addr) ((struct fdesc *) &_dl_runtime_profile)->ip;
}

reserve[1] = doit;
40 reserve[2] = gp;
}

return lazy;
};

We can see how this gets setup by the dynamic linker by looking at the function that does this for the binary. The reserve variable is set from the PLT_RESERVE section pointer in the binary. The unique value (put into reserve[0]) is the address of the link map for this object. Link maps are the internal representation within glibc for shared objects. We then put in the address of _dl_runtime_resolve to the second value (assuming we are not using profiling). reserve[2] is finally set to gp, which has been found from r2 with the __asm__ call.

Looking back at the ABI, we see that the relocation index for the entry must be placed in r15 and the unique identifier must be passed in r16.

r15 has previously been set in the stub code, before we jumped back to the start of the PLT. Have a look down the entries, and notice how each PLT entry loads r15 with an incremented value? It should come as no surprise if you look at the relocations the printf relocation is number zero.

r16 we load up from the values that have been initialised by the dynamic linker, as previously discussed. Once that is ready, we can load the function address and global pointer and branch into the function.

What happens at this point is the dynamic linker function _dl_runtime_resolve is run. It finds the relocation; remember how the relocation specified the name of the symbol? It uses this name to find the right function; this might involve loading the library from disk if it is not already in memory, or otherwise sharing the code.

The relocation record provides the dynamic linker with the address it needs to "fix up"; remember it was in the GOT and loaded by the initial PLT stub? This means that after the first time the function is called, the second time it is loaded it will get the direct address of the function; short circuiting the dynamic linker.
Summary

You've seen the exact mechanism behind the PLT, and consequently the inner workings of the dynamic linker. The important points to remember are

*

Library calls in your program actually call a stub of code in the PLT of the binary.
*

That stub code loads an address and jumps to it.
*

Initially, that address points to a function in the dynamic linker which is capable of looking up the "real" function, given the information in the relocation entry for that function.
*

The dynamic linker re-writes the address that the stub code reads, so that the next time the function is called it will go straight to the right address.

No comments:

Blog Archive