Chapter 2 Basics: C language
2.1 Introduction
Most of the firmware we encounter today is written in C. In recent years Rust has gained some traction in embedded development, but the majority of firmware is still written in C. To reverse engineer firmware effectively, it helps to understand some very basic concepts from C first, which is what we are going to cover in this chapter.
2.2 C language: General terms and concepts
Terms explained in this chapter:
- compile
- assemble
- link
- translation unit
- object file
- relocatable object file
- executable object file
- ELF
Before we demystify how firmware is built from source code, we need to introduce some terms, which are going to be important throughout the whole chapter. Figure 2.1 shows the different build stages of a firmware file, where source code files are translated into machine code.
There are 3 basic steps which happen each time you build your software from its source code:
- compile
- assemble
- link
A compiler converts a high-level language like C to assembly, then an assembler converts assembly to machine code.
The compiler translates one translation unit to assembly: Once you have written your code in a .c file (and maybe your .h), you have built your first translation unit. After the C preprocessor combines the .c and .h files into one, the translation unit is ready to be translated by the compiler. Roughly you can say: one .c file is one translation unit.
The next step is to assemble the compiler's output (which is written in assembly) into an object file. At the most abstract level, firmware consists of data and machine code. This data and machine code need to be organized into a well-known format, so that the different tools and the developer can manage and understand them. An object file is exactly that: the organized version of the data and instructions we wrote in C. Additionally, there is some meta information, as we’ll see soon.
The object file format which we are going to focus on throughout this book is ELF (there are others). At the stage of the linker (see figure 2.1) the object files are called relocatable object files. Once all relocatable object files are linked into one, this file is called an executable object file.
The linker links the relocatable object files into an executable object file as specified in a linker script. Linker scripts instruct the linker on how to glue the different object files together to build the final executable. The format of the executable object file is also ELF. Chapter 5 explains a secure world linker script in detail.
You can find a graphical summary on the build process in figure 2.1.
To execute your ELF file, your Linux kernel uses an ELF loader, which reads the ELF file you want to execute and loads it into memory. When we talk about firmware, most often one last step is done to create a firmware image from the ELF file, because an SoC normally does not load ELF files.
In the following, I am going to use the term build process to refer to all three steps. When I try to reference a specific step, I’ll use the appropriate term: compile, assemble or link.
2.3 C objects and identifiers: Terms and Concepts
Terms explained in this chapter:
- object
- scope
- visibility
- identifier
- declaration
- definition
- initialization
- storage class
- linkage
- storage duration
- extern specifier
- static specifier
An object in C is a region of data storage. Objects have different properties, which influence how they are handled during the build process. Keep in mind that the compiler and assembler always translate and assemble one translation unit. The linker then links those, so the following concepts and terms are relevant for one translation unit.
Objects, like functions, enums, structs and so on, are denoted by an identifier. An identifier has a scope, which is the portion of the source in which it is visible. C has different kinds of scopes:
- block scope
- file scope
- function scope
int a; // scope: file
static int b; // scope: file
int main(int argc, char *argv[]){ // argc and argv: block scope of the function body
int a; // scope: block. Hides the previous a (which has file scope)
...
}
The scope is an implicit property of an identifier. An identifier can only be used in the scope in which it is visible, and often the object it denotes is even destroyed when its scope ends, e.g. a in main() is destroyed after the function main ends.
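The hiding rule described above can be checked with a small sketch. The function names inner_a and outer_a are made up for this illustration and do not appear in the original example:

```c
int a = 10;                 /* file scope */

/* Returns the block-scope a, which hides the file-scope a. */
int inner_a(void)
{
    int a = 20;             /* block scope: hides the file-scope a */
    {
        int a = 30;         /* nested block: hides both outer declarations */
        (void)a;            /* only visible until the closing brace */
    }
    return a;               /* the nested a is gone, this is the a set to 20 */
}

/* No local declaration here, so a refers to the file-scope object. */
int outer_a(void)
{
    return a;
}
```

Calling inner_a() yields the value of the block-scope object, while outer_a() still sees the file-scope one.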
To influence the visibility and lifetime of an object, you can change its storage class. Two storage class specifiers are of interest here. They influence two properties, the linkage and the storage duration:
- extern: external linkage, static duration
- static: internal linkage, static duration
Static duration means that the variable lives through the whole program lifetime (and not only in a block scope or function). All objects declared as static or extern have static storage duration. Linkage means that the variable can be used in other scopes: internal linkage means that the object can be used in all scopes of the current translation unit, external linkage means that the variable can even be used from scopes in other translation units!
static int a; // storage duration: static, linkage: internal
extern int b; // storage duration: static, linkage: external
int c; // storage duration: static, linkage: external
int main(void){
static int d; // storage duration: static, linkage: no linkage
extern int e; // storage duration: static, linkage: external
int f; // storage duration: automatic, linkage: no linkage
}
When objects are defined at file scope, static and extern influence only an object's linkage, since file-scope objects have static storage duration by definition. File-scope identifiers have external linkage and static storage duration by default.
It is absolutely possible, that an object has no linkage. Then an object is only visible in the scope where it is declared.
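To make the difference between static and automatic storage duration tangible, here is a minimal sketch. All function names (count_static, count_automatic, helper, api) are hypothetical and only serve the illustration:

```c
/* The counter has static storage duration and no linkage:
 * it is set up once and keeps its value between calls. */
int count_static(void)
{
    static int calls;       /* implicitly initialized to 0 */
    calls++;
    return calls;
}

/* The counter has automatic storage duration:
 * it is created anew on every call. */
int count_automatic(void)
{
    int calls = 0;
    calls++;
    return calls;
}

/* Internal linkage: only usable inside this translation unit. */
static int helper(void)
{
    return 7;
}

/* External linkage: other translation units may call this function. */
int api(void)
{
    return helper();
}
```

Repeated calls to count_static() keep counting up, while count_automatic() returns 1 every time.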
Identifiers can be in three different states:
- declared
- defined
- initialized
Examples: Declared variables:
extern int a; // declaration only, no storage is reserved here
struct myStruct;
Defined variables:
int a;
struct myStruct {...};
Initialized variables:
int a = 1;
struct myStruct a = {...};
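The three states can be put side by side in one translation unit. The following is a sketch with made-up names (point, counter, limit, origin, scale), not code from the book's examples:

```c
/* Declared: the names are introduced, nothing is reserved yet. */
struct point;                       /* incomplete struct type */
extern int counter;                 /* object declaration */
int scale(int v);                   /* function declaration */

/* Defined: the type gets its members, the object gets storage. */
struct point { int x; int y; };
int counter;                        /* file scope, no initializer: reads as 0 */

/* Initialized: defined and given an explicit start value. */
int limit = 5;
struct point origin = { 1, 2 };

/* Function definition matching the declaration above. */
int scale(int v)
{
    return v * limit;
}
```

Note that counter is defined without an initializer and still reads as 0, because it has static storage duration.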
2.4 Compiler, Assembler and Linker: Terms and concepts
Terms explained in this chapter:
- section
- symbol
- symbol table
- input section
- output section
In the previous chapter we learned how storage duration and linkage influence the visibility of an object. In the following chapter we are going to look into how these properties are actually implemented by the compiler. State, storage duration and linkage all influence how an object is stored in memory. What happens in the background when an object is declared with external linkage? What happens when an object is only defined but not initialized? Does internal linkage also influence the memory location?
To answer all these questions, we first need to introduce a new term: a section. A section is a range of addresses in a memory region. All data in one section serves one particular purpose. We already talked about how an object file consists of data and machine code, which you can see as two “purposes.” These are the first two sections we are going to encounter in the following chapters.
Sections are used on both the relocatable object file and the executable object file level: all machine code and data from one translation unit end up in sections of the relocatable object file. During the link step, i.e. when the executable object file is linked, these input sections (input from relocatable object files) are joined into output sections (output to the executable object file). Which sections from the object files are joined into which sections of the executable is defined in the linker script.
2.4.1 Assembler terms
Terms explained in this chapter:
- directive
- (assembly) instruction
- label
- statement
In the previous chapter two important concepts were defined: sections and symbols. The compiler translates C code to assembler code, in which these symbols and sections are defined. The Assembler translates the assembler code into machine code and builds an object file (ELF in our case).
In assembler, symbol is used as a kind of umbrella term for different kinds of symbols: directives, assembly instructions and labels are all symbols. Only some symbols from the assembler will be put into the symbol table of the object file (see 2.6).
The type of symbol is determined by special characters:
- A “.” in front of a symbol determines that the symbol is a directive.
- A “:” following a symbol, defines a label
label1: @ this is a label
.word variable1 @ this is the directive .word, defining a variable
add R1, #1 @ this is an assembly instruction
A graphical UML overview:
Then there are statements, which consist of an optional label and a directive or an assembler instruction. The whole example above is a statement.
2.5 ELF
Terms explained in this chapter:
- execution view
- linking view
This chapter introduces the ELF file format on a high level. We are going to see in the corresponding chapters how sections and symbols are represented in an ELF file. For now we are going to look into the header and the overall structure. Let’s start with a graphical summary of the structural elements of an ELF file. Remember, all object files are in ELF file format:
The life of an ELF file consists of two important tasks:
- Structure data and machine code during the build process, which is especially important in the linking step of the building process.
- Structure data and machine code for execution.
These two tasks are represented in two so-called views: The execution view and the linking view of an ELF file.
All information in an object file is in a section, except the ELF header itself. A section can be empty (existing but not holding any data). All sections have a section header. As we are going to see in chapter 2.7, there are some predefined sections.
A segment is a collection of one or more sections, and each segment has a program header. The program header defines metadata for the execution of the ELF file. For the linking view, the segments are irrelevant.
We are going to go through all the relevant parts of an ELF file in the following chapters. Here is the list, in case you want to skip directly to a specific one:
- Sections are explained in chapter 2.7
- Segments are explained in chapter 2.11
- Symbol Tables are explained in chapter 2.6
- Relocation Tables are explained in chapter 2.9
A quick recap of three types of object files:
- relocatable object file: holds code and data, suitable for linking with other object files.
- executable object file: product of the linking of multiple relocatable object files.
- shared object file
The type of an object file is specified in its ELF header.
An ELF file always begins with a well-defined header. The ELF header is the only component of an ELF file which has a fixed place in the file. All other components are referenced: the ELF header contains references to the section headers and the program headers.
Let’s take a look at the most important fields in the ELF header. The name used in the ELF specification is in parentheses (see [1] and, for Arm specifics, [2]).
- Type (e_type): Specifies the mentioned type:
- relocatable object file (ET_REL)
- executable object file (ET_EXEC)
- shared object file (ET_DYN)
- Entry point (e_entry): Specifies where an object file starts execution. This is only for ET_EXEC. If bit[0] of the address is 1, the entry point contains Arm Thumb code; 0 means Arm code.
- Details for finding the section header table: offset (e_shoff) and number of entries (e_shnum).
- Details for finding the program header table: offset (e_phoff) and number of entries (e_phnum).
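The two fields e_type and e_entry from the list above can be interpreted with a few lines of C. This is only a sketch: the numeric values of ET_REL, ET_EXEC and ET_DYN come from the ELF specification, while the function names are made up here and the address 0x20000001 in the usage note is just a sample value:

```c
#include <stdint.h>

/* Object file types as stored in e_type. */
enum { ET_REL = 1, ET_EXEC = 2, ET_DYN = 3 };

/* Maps e_type to the name used in the ELF specification. */
const char *elf_type_name(uint16_t e_type)
{
    switch (e_type) {
    case ET_REL:  return "ET_REL";
    case ET_EXEC: return "ET_EXEC";
    case ET_DYN:  return "ET_DYN";
    default:      return "other";
    }
}

/* Returns 1 if bit[0] of the entry address is set (Thumb), else 0 (Arm). */
int entry_is_thumb(uint32_t e_entry)
{
    return (e_entry & 1u) != 0;
}

/* Address of the first instruction: the entry address with bit[0] cleared. */
uint32_t entry_code_address(uint32_t e_entry)
{
    return e_entry & ~(uint32_t)1;
}
```

For an entry point of 0x20000001, entry_is_thumb() reports Thumb code and entry_code_address() gives 0x20000000 as the location of the first instruction.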
An example header of a relocatable object file:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: ARM
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 1008 (bytes into file)
Flags: 0x5000000, Version5 EABI
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 10
Section header string table index: 9
An example header of an executable object file:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0x20000001
Start of program headers: 52 (bytes into file)
Start of section headers: 132028 (bytes into file)
Flags: 0x5000200, Version5 EABI, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 10
Section header string table index: 9
ARM processors have two different instruction sets, a larger one called ARM and a smaller subset called Thumb. To distinguish between these two modes, Bit[0] of the address is used. If it is set to 0, the code will be executed in ARM mode, if it is set to 1, the code is executed in Thumb mode. In this case, Bit[0] of the Entry Point address is 1, so the processor will start in Thumb mode. The compiler does specify that using the directive .thumb.
example1/main.s:
The ELF header examples show the magic bytes (e_ident) of the ELF header, which are the very first bytes in any ELF object file. They encode different things, but one thing is noteworthy, especially when examining binaries: byte 5 of the e_ident field encodes whether the data is little endian (e_ident[5] = 1, ELFDATA2LSB) or big endian (e_ident[5] = 2, ELFDATA2MSB). In both examples above this byte is 01, i.e. little endian.
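As a rough sketch of how a tool might check these bytes, the following C code inspects the first bytes of e_ident. The index and value constants follow the ELF specification. The function names and the sample_ident array (filled with the first magic bytes from the readelf output above) are assumptions made for this illustration:

```c
/* Indices into e_ident and the values stored there. */
enum { EI_CLASS = 4, EI_DATA = 5, EI_NIDENT = 16 };
enum { ELFCLASS32 = 1, ELFCLASS64 = 2 };
enum { ELFDATA2LSB = 1, ELFDATA2MSB = 2 };

/* The first bytes of the headers shown above; the rest is zero. */
const unsigned char sample_ident[EI_NIDENT] = {
    0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00
};

/* Checks the four magic bytes: 0x7f followed by "ELF". */
int is_elf(const unsigned char *ident)
{
    return ident[0] == 0x7f && ident[1] == 'E' &&
           ident[2] == 'L' && ident[3] == 'F';
}

/* e_ident[4] distinguishes 32-bit from 64-bit objects. */
int is_32bit(const unsigned char *ident)
{
    return ident[EI_CLASS] == ELFCLASS32;
}

/* e_ident[5] holds the data encoding (endianness). */
int is_little_endian(const unsigned char *ident)
{
    return ident[EI_DATA] == ELFDATA2LSB;
}
```

Applied to sample_ident, all three checks succeed, which matches the Class: ELF32 and little endian lines printed by readelf.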
As already mentioned, one segment holds one or many sections. Properties like the type of the segment, the addresses in memory, size are given in the program header entry of the segment.
Example program headers:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x010000 0x01000000 0x01000000 0x0009c 0x000a4 RWE 0x10000
LOAD 0x020000 0x20000000 0x20000000 0x00016 0x00016 R E 0x10000
There are two program headers, one is loaded at the address 0x01000000, the other one at 0x20000000. The first one is Read-Write-Execute and the second one Read-Execute only (see flags). When we go deep into linker scripts, we are going to see where segments and their properties are defined!
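The Flg column is a rendering of the p_flags field of the program header. A minimal sketch of that decoding, assuming the flag values PF_X, PF_W and PF_R from the ELF specification and a made-up helper name segment_flags:

```c
#include <stdint.h>

/* Segment permission bits as stored in p_flags. */
#define PF_X 0x1u   /* execute */
#define PF_W 0x2u   /* write */
#define PF_R 0x4u   /* read */

/* Returns a readelf-style three-character string such as "RWE" or "R E". */
const char *segment_flags(uint32_t p_flags)
{
    /* Indexed by the three permission bits: R = 4, W = 2, X = 1. */
    static const char *const names[8] = {
        "   ", "  E", " W ", " WE", "R  ", "R E", "RW ", "RWE"
    };
    return names[p_flags & (PF_R | PF_W | PF_X)];
}
```

With this sketch, a p_flags value of 7 is rendered as RWE (the first segment above) and a value of 5 as R E (the second segment).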
2.6 Symbols
Terms explained in this chapter:
- symbol
Symbols are a very central concept. They are basically names or references for data and code in the object files, and they are used by every step of the build process in some form. Even after the build process, when debugging your code, symbols are used to name code and data.
ELF defines a symbol table, where all symbols are stored in a list. Each symbol has a binding, which represents its visibility from C, but on an object file level.
An object file’s symbol table holds information needed to locate and relocate a program’s symbolic definitions and references. [1]
2.6.1 Symbols: Tools
Tools to examine the symbol table of an ELF file are:
- nm or its Arm equivalent, e.g. arm-none-eabi-nm
- readelf or arm-none-eabi-readelf
In nm, the section where the symbol is defined is denoted as a character in front of the symbol name. Lowercase means the symbol is local, uppercase means global. The character shows the section:
nm output | description |
---|---|
U | undefined |
T/t | text section |
D/d | data section |
B/b | bss section |
The whole list can be found in the documentation of nm.
The first example:
arm-none-eabi-gcc -c -mcpu=cortex-m33 -Os -nostdlib main11.c
int global_linkage_file_scope_var = 1;
extern int extern_file_scope_var;
static int static_file_scope_defined_var = 3;
extern int extern_function(int p1);
int main(){
return extern_function( extern_file_scope_var) + extern_function(static_file_scope_defined_var);
}
And their symbol tables:
arm-none-eabi-nm main11.o
U extern_file_scope_var
U extern_function
00000000 D global_linkage_file_scope_var
00000000 T main
The same file with readelf: arm-none-eabi-readelf --syms main11.o
Symbol table '.symtab' contains 15 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS main11.c
2: 00000000 0 SECTION LOCAL DEFAULT 1
3: 00000000 0 SECTION LOCAL DEFAULT 2
4: 00000000 0 SECTION LOCAL DEFAULT 3
5: 00000000 0 SECTION LOCAL DEFAULT 4
6: 00000000 0 NOTYPE LOCAL DEFAULT 4 $t
7: 00000018 0 NOTYPE LOCAL DEFAULT 4 $d
8: 00000000 0 NOTYPE LOCAL DEFAULT 2 $d
9: 00000000 0 SECTION LOCAL DEFAULT 6
10: 00000000 0 SECTION LOCAL DEFAULT 7
11: 00000001 28 FUNC GLOBAL DEFAULT 4 main
12: 00000000 0 NOTYPE GLOBAL DEFAULT UND extern_function
13: 00000000 0 NOTYPE GLOBAL DEFAULT UND extern_file_scope_var
14: 00000000 4 OBJECT GLOBAL DEFAULT 2 global_linkage_file_scope
readelf does show some more symbols. There is a symbol of type SECTION defined for each section in the object file, which nm treats as a debug symbol and does not show without flags.
- Objects with external linkage (global_linkage_file_scope_var and extern_file_scope_var) have global binding.
- Functions have global binding, both extern referenced (extern_function) and defined (main).
extern functions and objects which are only referenced have an undefined (UND) Ndx value, which is the section index. This tells the linker to resolve that dependency and insert the address of the symbol from another object file, which has that symbol defined.
2.6.2 Symbols: ELF
The symbol table is a well-defined section in an ELF file (sh_name: .symtab, sh_type: SHT_SYMTAB), which holds an array of symbol table entry structures: Elf32_Sym. We’ll discuss the following subset of members of this structure:
- st_name: Name of the symbol
- st_value: Context dependent, see below.
- st_info: Defines binding and type of the symbol, see below.
st_info is a bitfield, where the 4 most significant bits encode the binding of the symbol (ELF32_ST_BIND) and the 4 least significant bits the symbol type (ELF32_ST_TYPE).
ELF32_ST_BIND:
Name | Description | Value |
---|---|---|
STB_LOCAL | Local symbols are only visible in the object file, where the symbol was defined. These symbols are listed in the symbol table of the relocatable object file and the executable object file. The same name might appear multiple times! | 0 |
STB_GLOBAL | Global symbols are visible across object files and are able to satisfy another object files undefined symbols. These names must be unique. | 1 |
STB_WEAK | Global symbol but with lower precedence than STB_GLOBAL | 2 |
ELF32_ST_TYPE:
Name | Description | Value |
---|---|---|
STT_NOTYPE | The type is not specified | 0 |
STT_OBJECT | Data object: array, variable,… | 1 |
STT_FUNC | A function. | 2 |
STT_SECTION | A section. | 3 |
STT_FILE | The source files name | 4 |
In relocatable object files, st_value is the offset from the beginning of the section in which the symbol is defined. In executable and shared object files, st_value holds the virtual address of the symbol.
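The packing of binding and type into st_info can be written down with the macros given in the ELF specification. The sketch below repeats them together with the constants from the two tables above, so that it compiles without any ELF header file:

```c
/* Binding values (upper 4 bits of st_info). */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };

/* Type values (lower 4 bits of st_info). */
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2, STT_SECTION = 3, STT_FILE = 4 };

/* Extract binding and type from st_info, or combine them into it. */
#define ELF32_ST_BIND(i)    ((i) >> 4)
#define ELF32_ST_TYPE(i)    ((i) & 0xf)
#define ELF32_ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))
```

For the symbol main from the readelf output (Type FUNC, Bind GLOBAL), this gives an st_info value of 0x12: binding 1 in the upper nibble, type 2 in the lower one.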
2.6.3 Symbols: Assembler
Let’s now look at how symbols are defined at assembly level. To do that, we take the same example as above and only compile it:
arm-none-eabi-gcc -S -mcpu=cortex-m33 -Os -nostdlib main11.c
int global_linkage_file_scope_var = 1;
extern int extern_file_scope_var;
static int static_file_scope_defined_var = 3;
extern int extern_function(int p1);
int main(){
return extern_function( extern_file_scope_var) + extern_function(static_file_scope_defined_var);
}
The output:
.cpu cortex-m33
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 4
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "main11.c"
.text
.section .text.startup,"ax",%progbits
.align 1
.global main
.arch armv8-m.main
.arch_extension dsp
.syntax unified
.thumb
.thumb_func
.fpu softvfp
.type main, %function
main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
ldr r3, .L2
push {r4, lr}
ldr r0, [r3]
bl extern_function
mov r4, r0
movs r0, #3
bl extern_function
add r0, r0, r4
pop {r4, pc}
.L3:
.align 2
.L2:
.word extern_file_scope_var
.size main, .-main
.global global_linkage_file_scope_var
.data
.align 2
.type global_linkage_file_scope_var, %object
.size global_linkage_file_scope_var, 4
global_linkage_file_scope_var:
.word 1
.ident "GCC: (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]"
Labels, like main: or global_linkage_file_scope_var:, define a symbol, which can be referenced by assembly in other parts of the code. Two directives are then used to set the properties of these symbols:
Directive .global: Changes the visibility and the binding of a symbol to global, making it visible outside the current object file to the linker. This changes the ELF st_info ELF32_ST_BIND to STB_GLOBAL.
.global global_linkage_file_scope_var
Defines global binding for the symbol global_linkage_file_scope_var.
Directive .type: Defines the type of the symbol and sets the ELF32_ST_TYPE field of a symbol in the ELF file.
.type global_linkage_file_scope_var, %object
Defines that the symbol global_linkage_file_scope_var references an object, in this case a variable. The effect of .global and .type in the symbol table (as shown by readelf):
14: 00000000 4 **OBJECT** **GLOBAL** DEFAULT 2 global_linkage_file_scope
The directive .word expects an expression as parameter. Symbols are expressions, so if the symbol extern_file_scope_var is used as an expression and an appropriate symbol is not defined in the current object file (as in the example), the assembler will emit an undefined symbol with global binding. In the example, this resulted in a symbol in the symbol table with an undefined section index (Ndx).
.word extern_file_scope_var
Results in an undefined symbol in the symbol table (as shown by readelf):
13: 00000000 0 NOTYPE GLOBAL DEFAULT **UND** extern_file_scope_var
The prefix .L denotes a local label. These symbols are not put into the symbol table, but are only used in the assembly code to reference things.
2.7 Sections
Terms explained in this chapter:
- section
In the introduction to this chapter we already defined what a section is: roughly a region in memory which holds any kind of data, for example machine code or object values. All data with the same purpose, e.g. all machine code, is grouped into one section. An object file can have many sections. During linking, input sections from multiple relocatable object files are grouped into output sections of one executable object file. This step is called relocation. An input section is the smallest relocatable unit. We are going to examine relocation in detail later in this chapter, when we look into the linker.
The ELF specification predefines some sections. C objects are placed in different sections by the compiler, depending on the combination of their state and storage class:
section name | objects in this section |
---|---|
.data | initialized objects with static storage duration |
.text | machine code |
.bss | uninitialized objects with static storage duration; objects with static storage duration initialized with zero |
The compiler assigns objects and functions to sections and the linker defines which sections appear in which memory region, when input sections are merged into output sections during linking. In the following chapter we are going to see how the compiler uses assembler directives to define in which section objects and functions are located.
A graphical summary:
2.7.1 Sections: Tools
readelf or an Arm version, for example arm-none-eabi-readelf, can be used to examine the sections of an object file.
Let’s compile the following example:
arm-none-eabi-gcc -S -mcpu=cortex-m33 -Os -nostdlib main11.c
int global_linkage_file_scope_var = 1;
extern int extern_file_scope_var;
static int static_file_scope_defined_var = 3;
extern int extern_function(int p1);
int main(){
int local_var = extern_function( extern_file_scope_var) + extern_function(static_file_scope_defined_var);
return local_var;
}
To show all section headers, use readelf on the compiled object file:
arm-none-eabi-readelf --sections a.out
There are 11 section headers, starting at offset 0x2dc:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000000 00 AX 0 0 2
[ 2] .data PROGBITS 00000000 000034 000004 00 WA 0 0 4
[ 3] .bss NOBITS 00000000 000038 000000 00 WA 0 0 1
[ 4] .text.startup PROGBITS 00000000 000038 00001c 00 AX 0 0 4
[ 5] .rel.text.startup REL 00000000 00026c 000018 08 I 8 4 4
[ 6] .comment PROGBITS 00000000 000054 00007a 01 MS 0 0 1
[ 7] .ARM.attributes ARM_ATTRIBUTES 00000000 0000ce 000034 00 0 0 1
[ 8] .symtab SYMTAB 00000000 000104 000100 10 9 13 4
[ 9] .strtab STRTAB 00000000 000204 000066 00 0 0 1
[10] .shstrtab STRTAB 00000000 000284 000057 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
y (purecode), p (processor specific)
We see 10 sections (the first entry, index 0, is a reserved empty placeholder). .text, .bss and .data are sections you are going to encounter in almost all binaries. They are predefined by the ELF specification. Some others are Arm specific. In chapter 2.6 we had a deep look into symbols: the section .symtab holds the symbol table.
In the following chapters we are going to see how objects and functions from C are mapped into sections, especially into .text, .bss and .data.
2.7.2 Sections: ELF
The section header table is an array of well-defined structures (Elf32_Shdr). Each element of this array holds the information for one section. The ELF header e_shoff value points to the start of the section header table. Since there are empty sections, a section header can exist without its corresponding data. Each section has exactly one section header entry.
Each section header has the following fields:
- sh_name: Name of the section.
- sh_addr: Holds the address at which the section should reside in memory during execution. This is obviously important for executable object files and mostly zero in relocatable object files, as the name (relocatable) already implies.
- sh_offset: The offset in the ELF file where the section contents start.
- sh_size: Size of the section in bytes.
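To illustrate how a section header is located, here is a sketch of the 32-bit section header layout and the offset calculation. The field names follow the ELF specification. The type name Shdr32 and the function section_header_offset are made up for this example, and the numbers in the usage note are taken from the relocatable header shown in chapter 2.5:

```c
#include <stdint.h>

/* Layout of one 32-bit section header entry: ten 4-byte fields, 40 bytes. */
typedef struct {
    uint32_t sh_name;       /* index into the section header string table */
    uint32_t sh_type;       /* SHT_PROGBITS, SHT_SYMTAB, ... */
    uint32_t sh_flags;      /* W, A, X, ... */
    uint32_t sh_addr;       /* address in memory, 0 if not loaded */
    uint32_t sh_offset;     /* offset of the section contents in the file */
    uint32_t sh_size;       /* size of the section in bytes */
    uint32_t sh_link;
    uint32_t sh_info;
    uint32_t sh_addralign;
    uint32_t sh_entsize;
} Shdr32;

/* File offset of section header number index. */
uint32_t section_header_offset(uint32_t e_shoff, uint16_t e_shentsize,
                               uint16_t index)
{
    return e_shoff + (uint32_t)e_shentsize * index;
}
```

With e_shoff = 1008 and a header size of 40 bytes, the header of section 9 (the section header string table in that example) starts at file offset 1368.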
The Type (sh_type) defines the semantic of the data the section holds:
Name | Description | Value |
---|---|---|
SHT_PROGBITS | Program defined data. Semantic defined by program | 1 |
SHT_SYMTAB | Section which holds the symbol table | 2 |
SHT_NOBITS | This section holds no data, but the semantic is defined by the program | 8 |
SHT_REL | Section holds Relocation Entries (relocation table) | 9 |
[1] lists some special sections, which have a predefined sh_type and sh_name:
Name | sh_type | Description |
---|---|---|
.bss | SHT_NOBITS | uninitialized data |
.data / .data1 | SHT_PROGBITS | initialized data |
.text | SHT_PROGBITS | executable instructions, machine code |
.rel$name | SHT_REL | Holds relocation entries for the section $name, could be .rel.text for example. |
.symtab | SHT_SYMTAB | Holds the symbol table |
2.7.3 Sections: Compiler and Assembler
2.7.3.1 .data
The section .data is used for C objects, i.e. variables, which have static storage duration and are initialized with a non-zero value. Objects have static storage duration when they are declared static or extern, or when they are defined at file scope. The scope (e.g. file scope or block scope) does not matter for the storage duration. See chapter 2.3.
arm-none-eabi-gcc -S -mcpu=cortex-m33 -nostdlib data1.c
Produces the following assembler code:
.cpu cortex-m33
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "data1.c"
.text
.data
.align 2
.type a, %object
.size a, 4
a:
.word 1
.text
.align 1
.global main
.arch armv8-m.main
.arch_extension dsp
.syntax unified
.thumb
.thumb_func
.fpu softvfp
.type main, %function
main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 1, uses_anonymous_args = 0
@ link register save eliminated.
push {r7}
add r7, sp, #0
ldr r3, .L3
ldr r3, [r3]
mov r0, r3
mov sp, r7
@ sp needed
pop {r7}
bx lr
.L4:
.align 2
.L3:
.word a
.size main, .-main
.ident "GCC: (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]"
The directive .data instructs the assembler to assemble all of the following statements into the section .data. After defining the symbol a using the label a:, the directive .word is used to write the value 1 into the .data section. Also, .type a, %object is used to define the symbol a as an object type in the symbol table of the object file.
Non-Zero initialized objects with static storage duration declared in a block-scope are also added to the section .data.
2.7.3.2 .bss
The section .bss is used for C objects, i.e. variables, which have static storage duration and are either uninitialized or initialized with zero. Objects have static storage duration when they are declared static or extern, or when they are defined at file scope. The scope (e.g. file scope or block scope) does not matter for the storage duration. See chapter 2.3.
Let’s first examine a C object declared as static at file scope. Compile with the following line:
arm-none-eabi-gcc -S -mcpu=cortex-m33 -nostdlib bss1.c
Produces the following assembler code:
.cpu cortex-m33
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "bss1.c"
.text
.bss
.align 2
a:
.space 4
.size a, 4
.text
.align 1
.global main
.arch armv8-m.main
.arch_extension dsp
.syntax unified
.thumb
.thumb_func
.fpu softvfp
.type main, %function
main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 1, uses_anonymous_args = 0
@ link register save eliminated.
push {r7}
add r7, sp, #0
ldr r3, .L3
ldr r3, [r3]
mov r0, r3
mov sp, r7
@ sp needed
pop {r7}
bx lr
.L4:
.align 2
.L3:
.word a
.size main, .-main
.ident "GCC: (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]"
The directive .bss instructs the assembler to assemble all of the following statements into the section named .bss. Alternatively, the directive .section .bss could be used. The directive .section allows for arbitrary section names, whereas with .bss the section name is always .bss. The label a: defines the symbol and .space 4 reserves 4 bytes, all assembled into the section .bss.
Uninitialized objects with static storage duration declared in a block-scope are also added to the section .bss. The same applies for objects which are initialized with zero and have static storage duration.
2.7.3.3 Uninitialized objects with external linkage
In the previous explanations on the section .bss, the example showed only an object with internal linkage: an object declared static. When objects with external linkage are declared but not initialized (file-scope objects without the static keyword have external linkage and static storage duration), the compiler and assembler have to handle them in a particular way.
Imagine that the compiler runs through multiple files. Each of them declares the same file-scope object with external linkage, but none of them initializes it. Since each of these files is its own translation unit, neither the compiler nor the assembler knows at compile and assembly time in which section to place this uninitialized object (.bss or .data?). Only when a translation unit turns up in which the object is declared and initialized is it clear: this should be placed in .data.
If, however, none of the translation units initializes the object, space must be reserved in .bss!
Let’s examine how this situation is handled. The following example declares an uninitialized object with external linkage:
arm-none-eabi-gcc -S -c -mcpu=cortex-m33 -nostdlib bss2.c
Produces the following assembly code:
.cpu cortex-m33
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "bss2.c"
.text
.comm c,4,4
.align 1
.global main
.arch armv8-m.main
.arch_extension dsp
.syntax unified
.thumb
.thumb_func
.fpu softvfp
.type main, %function
main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 1, uses_anonymous_args = 0
@ link register save eliminated.
push {r7}
add r7, sp, #0
ldr r3, .L3
ldr r3, [r3]
mov r0, r3
mov sp, r7
@ sp needed
pop {r7}
bx lr
.L4:
.align 2
.L3:
.word c
.size main, .-main
.ident "GCC: (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]"
The directive .comm declares a common symbol. During linking, all common or defined symbols with the same name from all object files are merged into one. When the linker does not encounter a defined symbol with the same name, space is reserved in the .bss section. In the example above, .comm c,4,4 would reserve 4 bytes with an alignment of 4 bytes.
2.7.3.4 .text
If you worked through the chapters on .bss and .data, you have understood the principle by now, so we are going to leave out an example for .text. The GNU assembler (as) has a directive .text, which has the same effect as .data and .bss: all of the instructions following this directive are assembled into the section .text.
2.7.4 Sections: Summary
variable is … | handling in translation unit |
---|---|
initialized and static storage duration (static or file-scope objects) | .data |
uninitialized and static storage duration (static or file-scope objects) | .bss |
external linkage, not initialized | .comm directive; ends up in .data or .bss after linking, depending on the initialization status in other translation units. |
2.8 Literal Pools
Terms explained in this chapter:
- literal pool
Literal pools are lookup tables for the addresses of objects, e.g. variables. They are located close to where the values of the objects are used, e.g. directly following the function in the .text section, and are created automatically by the compiler. Let’s first take a look at how the compiler creates literal pools, before we explain why they are needed:
arm-none-eabi-gcc -S -mcpu=cortex-m33 -Os -nostdlib main12.c
extern int externVar1;
extern int externVar2;
int main(){
return externVar1 + externVar2;
}
int myFunction(int a){
return externVar1 + a;
}
The output:
.cpu cortex-m33
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "main12.c"
.text
.align 1
.global main
.arch armv8-m.main
.arch_extension dsp
.syntax unified
.thumb
.thumb_func
.fpu softvfp
.type main, %function
main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 1, uses_anonymous_args = 0
@ link register save eliminated.
push {r7}
add r7, sp, #0
ldr r3, .L3
ldr r2, [r3]
ldr r3, .L3+4
ldr r3, [r3]
add r3, r3, r2
mov r0, r3
mov sp, r7
@ sp needed
pop {r7}
bx lr
.L4:
.align 2
.L3:
.word externVar1
.word externVar2
.size main, .-main
.align 1
.global myFunction
.syntax unified
.thumb
.thumb_func
.fpu softvfp
.type myFunction, %function
myFunction:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
@ link register save eliminated.
push {r7}
sub sp, sp, #12
add r7, sp, #0
str r0, [r7, #4]
ldr r3, .L7
ldr r2, [r3]
ldr r3, [r7, #4]
add r3, r3, r2
mov r0, r3
adds r7, r7, #12
mov sp, r7
@ sp needed
pop {r7}
bx lr
.L8:
.align 2
.L7:
.word externVar1
.size myFunction, .-myFunction
.ident "GCC: (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]"
The directives following the labels .L3 and .L7 are the literal pools. The former is the literal pool for main() and contains externVar1 and externVar2; the latter is the one for myFunction(), which contains externVar1. Neither holds the value of externVar1 or externVar2 but rather a reference to the actual value: In both cases the value stored at .word externVarX is dereferenced before being used:
ldr r3, .L7
ldr r2, [r3]
We are going to see in the section on relocation how the addresses of the symbols externVar1 and externVar2 are inserted.
If you are already a little bit familiar with Arm assembly, you might wonder: Why doesn’t the compiler produce something like:
ldr r3, externVar1
ldr r2, [r3]
The instruction LDR works perfectly fine with labels. The assembler would calculate the relative offset from the current PC value to externVar1, and the instruction would then load the value of externVar1 into R3. The problem lies in the size of the calculated offset! Let’s look at how the LDR (literal) instruction is encoded. The bits of the Armv8-M T1 LDR (literal) instruction are:
LDR {Register} {label}
Bit | 15 | 14 | 13 | 12 | 11 | 10-8 | 7-0 |
Value | 0 | 1 | 0 | 0 | 1 | register | immediate offset to label |
- Bits 15-11 are fixed; they encode the instruction LDR
- Bits 10-8 encode the register
- Bits 7-0 encode the offset calculated.
Since only 8 bits are available to encode the immediate (the relative offset from PC to the label), and since this variant of LDR multiplies the immediate by 4, the reachable range is 0 to 1020 bytes beyond the word-aligned PC. If the label is outside of that range, i.e. at an address greater than PC + 255 * 4, the value can’t be loaded.
The LDR instruction is only one example of this problem. The Arm architecture is optimized for short jumps, and therefore the same problem occurs for all kinds of instructions, for example branches. The solution is to use literal pools: they hold the full 32-bit address, which is loaded in full into a register. Using that concept, the full 32-bit address range can be reached.
On a side note: Labels prefixed with .L denote local labels, which do not appear in the symbol table. GCC puts a literal pool after each function that needs one.
2.9 Relocation
Terms explained in this chapter:
- relocation
- relocation table
In the previous chapters we saw how one translation unit results in a relocatable object file after compilation and assembling, and talked about how many relocatable object files are linked into one executable object file. We also saw how symbols are listed in the symbol table, depending on their status in the object file: if they are not defined in the current object file, they appear as NOTYPE and UND (symbol table extract from the example in chapter 2.6):
13: 00000000 0 NOTYPE GLOBAL DEFAULT UND extern_file_scope_var
14: 00000000 4 OBJECT GLOBAL DEFAULT 2 global_linkage_file_scope
The symbol global_linkage_file_scope is defined in the current object file, whereas extern_file_scope_var was declared extern and thus not defined but only referenced as being available in another object file. Relocation is part of the linking process and is about connecting symbolic references with symbolic definitions.
In the chapter on sections (2.7) you might have already noticed a special section named .rel$name (e.g. .rel.text for the section .text). These sections have the sh_type SHT_REL and hold the relocation table entries for the section $name. Each relocation table entry describes how the particular relocatable object file has to be modified - i.e. how to connect symbolic references with symbolic definitions - for successful linking and finally execution of the executable object file.
Relocation entries are defined in the specification as a structure named Elf32_Rel, which has the following members:
- r_offset: Same as st_value in the symbol table entries: For relocatable object files this is an offset from the beginning of the section, for executable and shared object files this is a virtual address.
- r_info: Bitfield which encodes two pieces of information: the symbol table index the relocation refers to (ELF32_R_SYM) and the type of relocation (ELF32_R_TYPE).
- Bits 0-7: ELF32_R_TYPE
- Bits 8-31: ELF32_R_SYM
Let’s take a look at the following example (same as (2.7))
arm-none-eabi-gcc -c -mcpu=cortex-m33 -nostdlib main11.c
int global_linkage_file_scope_var = 1;
extern int extern_file_scope_var;
static int static_file_scope_defined_var = 3;
extern int extern_function(int p1);
int main(){
int local_var = extern_function( extern_file_scope_var) + extern_function(static_file_scope_defined_var);
return local_var;
}
And then check the relocation table:
arm-none-eabi-readelf -r main11.o
Relocation section '.rel.text' at offset 0x288 contains 4 entries:
Offset Info Type Sym.Value Sym. Name
0000000c 00000d0a R_ARM_THM_CALL 00000000 extern_function
00000018 00000d0a R_ARM_THM_CALL 00000000 extern_function
0000002c 00000e02 R_ARM_ABS32 00000000 extern_file_scope_var
00000030 00000302 R_ARM_ABS32 00000000 .data
During linking, the sections in which these symbols are defined are placed into a memory region; afterwards their addresses are known. Together with the offset of the place being relocated - e.g. the call to extern_function at 0x0000000c in the example above - the linker can resolve the reference and put the correct value into the place being relocated.
As you can see in the relocation table, there are different types of relocation. Obviously, the value written into the place being relocated depends on the instruction which uses the value. Remember the literal pools? externVar1 appears with the relocation type R_ARM_ABS32 in the relocation table, which is the simplest example: we really just need the final address of externVar1 inserted at .L7: during relocation.
2.10 Veneers
Terms explained in this chapter:
- veneer
Now imagine you want to design a linker whose responsibility is to merge multiple object files into one. We saw some support by the compiler and assembler in the chapter on relocation (see 2.9). As already mentioned, not only the LDR instruction is subject to a limited number of bits which encode the PC-relative offset, but also branch instructions. The branch (b) instruction in Armv8-M, for example, has - depending on the variant - 11 or 8 bits in its 16-bit encodings. This limits the range that can be branched to. If the linker encounters a situation where the branch target is too far away, a veneer is inserted.
Let’s take a look at the following example (same as (2.7)) and implement a library, which defines the extern function and variable:
main11.c
int global_linkage_file_scope_var = 1;
extern int extern_file_scope_var;
static int static_file_scope_defined_var = 3;
extern int extern_function(int p1);
int main(){
int local_var = extern_function( extern_file_scope_var) + extern_function(static_file_scope_defined_var);
return local_var;
}
main11-lib.c:
First compile both source code files into relocatable object files:
arm-none-eabi-gcc -c -mcpu=cortex-m33 -nostdlib main11.c
arm-none-eabi-gcc -c -mcpu=cortex-m33 -nostdlib main11-lib.c
This will result in the two files main11.o and main11-lib.o. The next step is to link both. To actually see the veneers, we simulate that the .text sections from main11.o and main11-lib.o will be located in different memory regions, which are farther apart than a branch could encode. To achieve that, we use the following linker script:
memory.ld
MEMORY {
RAMMAIN : ORIGIN = 0x01000000, LENGTH = 32K
RAMLIB : ORIGIN = 0x20000000, LENGTH = 32K
}
SECTIONS {
ENTRY(function_in_library)
.text : {
main11.o(.text)
} > RAMMAIN
.lib : {
main11-lib.o(.text)
} > RAMLIB
.data : {
*(.data)
} > RAMMAIN
}
Now link the object files to an executable object file:
arm-none-eabi-ld -T memory.ld main11-lib.o main11.o -o main11
Linker scripts are explained in chapter 2.11.
Linking the files will result in an executable object file named main11. Throw this file into a disassembler of your choice:
Using arm-none-eabi-objdump --disassemble=XXXXX main11 will give you the following for the function XXXX:
01000000 <main>:
1000000: b590 push {r4, r7, lr}
1000002: b083 sub sp, #12
1000004: af00 add r7, sp, #0
1000006: 4b09 ldr r3, [pc, #36] ; (100002c <main+0x2c>)
1000008: 681b ldr r3, [r3, #0]
100000a: 4618 mov r0, r3
100000c: f000 f814 bl 1000038 <__extern_function_veneer>
1000010: 4604 mov r4, r0
1000012: 4b07 ldr r3, [pc, #28] ; (1000030 <main+0x30>)
1000014: 681b ldr r3, [r3, #0]
1000016: 4618 mov r0, r3
1000018: f000 f80e bl 1000038 <__extern_function_veneer>
100001c: 4603 mov r3, r0
100001e: 4423 add r3, r4
1000020: 607b str r3, [r7, #4]
1000022: 687b ldr r3, [r7, #4]
1000024: 4618 mov r0, r3
1000026: 370c adds r7, #12
1000028: 46bd mov sp, r7
100002a: bd90 pop {r4, r7, pc}
100002c: 01000050 .word 0x01000050
1000030: 0100004c .word 0x0100004c
01000038 <__extern_function_veneer>:
1000038: b401 push {r0}
100003a: 4802 ldr r0, [pc, #8] ; (1000044 <__extern_function_veneer+0xc>)
100003c: 4684 mov ip, r0
100003e: bc01 pop {r0}
1000040: 4760 bx ip
1000042: bf00 nop
1000044: 20000001 .word 0x20000001
20000000 <extern_function>:
20000000: b480 push {r7}
20000002: b083 sub sp, #12
20000004: af00 add r7, sp, #0
20000006: 6078 str r0, [r7, #4]
20000008: 2301 movs r3, #1
2000000a: 4618 mov r0, r3
2000000c: 370c adds r7, #12
2000000e: 46bd mov sp, r7
20000010: bc80 pop {r7}
20000012: 4770 bx lr
The first thing to note is the different addresses (first column) at which the machine code is located: extern_function is at 0x20000000 and following, whereas main is at 0x01000000. The linker script worked as expected! The next thing to notice is the function __extern_function_veneer, which was inserted. Originally main called extern_function directly, but since the branch was too far, the linker inserted a veneer to cope with the distance.
Inside the veneer, the address of extern_function is stored in a literal pool (0x1000044) and loaded PC-relative (0x100003a) into R0, then moved into IP. Note that the stored value is 0x20000001: bit 0 is set to mark the target as Thumb code. Finally the branch to IP happens. Using that technique the full 32 bits of the address space can be reached.
As you can see, R0 is saved and restored in the veneer. The IP register, on the other hand, is not. That is because the IP register is defined exactly for that purpose and is allowed to be corrupted between function calls. See chapter 6.4 for details on the R12 register and why R0 is used.
2.11 Linker Script
Terms explained in this chapter:
- linker script
- linker command
- (memory) region
- segment
- relocatable unit
A linker script is used to
- map input sections (from relocatable object files) to output sections (in the executable object file)
- map output sections to memory regions
- define some other properties of the executable object file.
In the chapter on veneers (2.10) we used a linker script to showcase veneers:
MEMORY {
RAMMAIN : ORIGIN = 0x01000000, LENGTH = 32K
RAMLIB : ORIGIN = 0x20000000, LENGTH = 32K
}
SECTIONS {
ENTRY(function_in_library)
.text : {
main11.o(.text)
} > RAMMAIN
.lib : {
main11-lib.o(.text)
} > RAMLIB
.data : {
*(.data)
} > RAMMAIN
}
A linker script is parsed from top to bottom. MEMORY and SECTIONS are commands in the linker script. Using MEMORY you describe the location and size of the memory regions of the target and then instruct the linker, using the SECTIONS command, to place the data of the sections into these regions.
The MEMORY command has the following syntax:
MEMORY
{
name [(attr)] : ORIGIN = origin, LENGTH = len
…
}
- ORIGIN: The address where the segment starts. This can be an expression which evaluates to a constant.
- LENGTH: The size of the segment.
In chapter 2.5 we introduced the execution view and the linking view of an ELF file. The segments defined here are the core part of the execution view of the ELF file. They contain important information on how to place the developer's code into memory and also which permissions - read, write or execute - are assigned to these segments. In embedded systems, however, these properties are often ignored, since memory permissions are set up by the code during reset of the system.
The SECTIONS command maps (multiple) input sections to an output section. Input sections are the sections from a relocatable object file, output sections the sections in an executable object file.
The SECTIONS command is defined as:
SECTIONS
{
sections-command
sections-command
…
}
sections-command being - most often - an output section description (see the ld documentation linked in the References), which is a command selecting an output section. For example:
SECTIONS
{
"name_of_output_section" : {
input-command
} > segment_name
…
}
Where input-command is a command to select input sections:
SECTIONS
{
"name_of_output_section" : {
file_selector(input_section_name)
} > segment_name
…
}
Using the file selector you can select file names, and the input_section_name selects the input sections in these files. With this very basic example we can understand the example linker script:
- To create the output section .text, take the input section .text from main11.o. Put the output section .text into segment RAMMAIN.
- To create the output section .lib, take the input section .text from main11-lib.o. Put the output section .lib into segment RAMLIB.
- To create the output section .data, take the input sections .data from all relocatable object files. Put the output section .data into segment RAMMAIN.
A section is the smallest relocatable unit, as we can see in the input-command definition. By using the linker script we can define where in memory specific sections from specific relocatable object files are placed when executed. As we saw in the chapters on veneers (2.10) and relocation (2.9), this had the consequence that veneers and literal pools had to be used to resolve long jumps.
2.12 Excursus: Arm-Thumb interworking and .glue_7 and .glue_7t
Interworking enables the developer to mix Thumb and Arm code. This book currently only covers Armv8-M. Since this architecture only supports Thumb mode, no interworking is necessary. But since the details are interesting and might come in handy as background knowledge, we’ll make a little excursus to understand Arm-Thumb interworking in more detail.
When compiling a translation unit, the compiler may or may not know for which instruction set the called function was compiled. So when calling a function with an instruction set change - either from Thumb to Arm or back - there are details which the compiler has to solve. Obviously, when the compiler knows the destination instruction set, the BLX or BX instruction could be used. However, BX was introduced in Armv4T and BLX in Armv5T. So there might be a situation where a Thumb function is called from Arm code without a suitable instruction being available. When linking these object files, the linker then has to create a stub which changes the state to Thumb.
An example:
__attribute__((target("thumb")))
int add_ten_in_thumb_mode(int a) {
return a + 10;
}
__attribute__((target("arm")))
int add_ten_in_arm_mode(int a) {
return a + 10;
}
__attribute__((target("arm")))
int sum_in_arm(){
return add_ten_in_arm_mode(10) + add_ten_in_thumb_mode(10);
}
__attribute__((target("thumb")))
int sum_in_thumb(){
return add_ten_in_arm_mode(10) + add_ten_in_thumb_mode(10);
}
Compiling with arm-none-eabi-gcc --specs=nosys.specs -nostdlib -march=armv4t -o glue_test.o glue_test.c added interworking veneers:
00008040 <sum_in_arm>:
8040: e92d4830 push {r4, r5, fp, lr}
8044: e28db00c add fp, sp, #12
8048: e3a0000a mov r0, #10
804c: ebfffff1 bl 8018 <add_ten_in_arm_mode>
8050: e1a04000 mov r4, r0
8054: e3a0000a mov r0, #10
8058: eb00000e bl 8098 <__add_ten_in_thumb_mode_from_arm>
805c: e1a03000 mov r3, r0
8060: e0843003 add r3, r4, r3
8064: e1a00003 mov r0, r3
8068: e24bd00c sub sp, fp, #12
806c: e8bd4830 pop {r4, r5, fp, lr}
8070: e12fff1e bx lr
00008074 <sum_in_thumb>:
8074: b5b0 push {r4, r5, r7, lr}
8076: af00 add r7, sp, #0
8078: 200a movs r0, #10
807a: f000 f813 bl 80a4 <__add_ten_in_arm_mode_from_thumb>
807e: 0004 movs r4, r0
8080: 200a movs r0, #10
8082: f7ff ffbd bl 8000 <add_ten_in_thumb_mode>
8086: 0003 movs r3, r0
8088: 18e3 adds r3, r4, r3
808a: 0018 movs r0, r3
808c: 46bd mov sp, r7
808e: bcb0 pop {r4, r5, r7}
8090: bc02 pop {r1}
8092: 4708 bx r1
8094: 0000 movs r0, r0
...
00008098 <__add_ten_in_thumb_mode_from_arm>:
8098: e59fc000 ldr ip, [pc] ; 80a0 <__add_ten_in_thumb_mode_from_arm+0x8>
809c: e12fff1c bx ip
80a0: 00008001 .word 0x00008001
000080a4 <__add_ten_in_arm_mode_from_thumb>:
80a4: 4778 bx pc
80a6: e7fd b.n 80a4 <__add_ten_in_arm_mode_from_thumb>
80a8: eaffffda b 8018 <add_ten_in_arm_mode>
80ac: 00000000 andeq r0, r0, r0
The first example: The linker generated a veneer called __add_ten_in_thumb_mode_from_arm for add_ten_in_thumb_mode(). So when add_ten_in_thumb_mode() is called from Arm code (sum_in_arm()) with BL, the veneer:
- keeps LR unchanged: BX R1 will change back to Arm when add_ten_in_thumb_mode() finishes.
- switches to Thumb by loading 0x8001 (address of add_ten_in_thumb_mode() + 1) into IP and branching to it
A similar construct is used for the veneer __add_ten_in_arm_mode_from_thumb: Thumb code calls __add_ten_in_arm_mode_from_thumb, which uses BX PC to branch to 0x80a8 and change the instruction set, and then branches to add_ten_in_arm_mode.
Now compile the example as a shared library: arm-none-eabi-gcc -shared -Xlinker "--unique=.glue_7" glue_test.c -o glue_test.o
- -shared creates a shared object file
- -Xlinker "--unique=.glue_7" instructs the linker to keep the section .glue_7, which would otherwise be joined with the .text section by the default linker script.
Then disassemble the .glue_7 section using arm-none-eabi-objdump -j ".glue_7" -S -a glue_test.o:
Disassembly of section .glue_7:
00000244 <add_ten_in_thumb_mode>:
244: e59fc004 ldr ip, [pc, #4] ; 250 <add_ten_in_thumb_mode+0xc>
248: e08cc00f add ip, ip, pc
24c: e12fff1c bx ip
250: ffffff61 .word 0xffffff61
00000254 <sum_in_thumb>:
254: e59fc004 ldr ip, [pc, #4] ; 260 <sum_in_thumb+0xc>
258: e08cc00f add ip, ip, pc
25c: e12fff1c bx ip
260: ffffffc5 .word 0xffffffc5
For shared object files the section .glue_7 is used to put the veneer stubs into.
When an object file which declares add_ten_in_thumb_mode() as external is linked against this example, the veneer is called first, because the symbol add_ten_in_thumb_mode was actually put at the start of the veneer. Using this technique the object file can use BL add_ten_in_thumb_mode, which actually first jumps to the stub; the stub then uses BX IP to branch and exchange to the actual implementation of add_ten_in_thumb_mode().
.glue_7 is a section which is defined internally by the linker to put such stubs into. When the ELF file is linked, these stubs are joined with the .text section.
References
This field actually holds an index of the corresponding string in the String Table. The String Table was left out for now.
This field actually holds an index of the corresponding string in the String Table. The String Table was left out for now.
https://sourceware.org/binutils/docs-2.34/ld/Output-Section-Description.html#Output-Section-Description