Chapter 6 AAPCS Arm ABI and Runtime view
A procedure call standard, or calling convention, defines how different compilation units can work together, even when they were compiled by different compilers. It defines how parameters are passed from one function to another, which registers and other program state must be preserved by caller and callee, and what the callee may alter. The procedure call standard is one of a set of standards that together define the application binary interface (ABI) a compilation unit has to respect.
When we develop software we often encounter programming interfaces. For example, the Twitter API (application programming interface) defines how we have to “talk” to Twitter to retrieve meaningful results. The same applies to compiled files. Imagine your executable targets Linux as its operating system and intends to establish a network connection. Linux provides the connect() syscall for this. connect() has an associated system call number in Linux: 283. In chapter 3.6 we learned how the SVCall exception is triggered by the svc instruction in assembler. Maybe you now assume that your code can simply execute svc #283 (i.e. the compiler could emit it for your C code) to ask Linux for a connect(). In fact, that worked until the Old Application Binary Interface (OABI), the Arm ABI for Linux, was replaced by the new Arm EABI for Linux.
The Arm EABI for Linux now expects the syscall number in R7 and 0 as the argument to SVC. A summary of the syscall interface for Linux is given in the man page for syscall(2):
The first table shows how a syscall is called:
Arch/ABI    Instruction    System    Ret    Ret    Error    Notes
                           call #    val    val2
───────────────────────────────────────────────────────────────────
arm/OABI    swi NR         -         a1     -      -        2
arm/EABI    swi 0x0        r7        r0     r1     -
And the second how arguments are passed to the syscalls:
Arch/ABI    arg1  arg2  arg3  arg4  arg5  arg6  arg7  Notes
──────────────────────────────────────────────────────────────
alpha       a0    a1    a2    a3    a4    a5    -
arc         r0    r1    r2    r3    r4    r5    -
arm/OABI    a1    a2    a3    a4    v1    v2    v3
arm/EABI    r0    r1    r2    r3    r4    r5    r6
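To make the idea concrete that the syscall number travels as a value rather than inside the instruction, here is a minimal sketch that uses the C library's syscall() wrapper on Linux. On arm/EABI the wrapper is expected to place the number in r7 and execute swi 0x0, as listed in the first table. The helper name raw_getpid is made up for this illustration, and getpid is chosen only because it takes no arguments.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue the getpid system call by number instead of
 * calling the getpid() library function. The syscall() wrapper moves
 * SYS_getpid into the syscall-number register of the architecture and then
 * executes the trap instruction. */
long raw_getpid(void)
{
    long pid = syscall(SYS_getpid);
    assert(pid > 0); /* getpid does not fail on Linux */
    return pid;
}
```

The same wrapper accepts further arguments after the number, which it places in the argument registers listed in the second table.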
For bare-metal projects that do not run on Linux, the EABI for Linux is not relevant. Instead the ABI for the Arm Architecture, specified by Arm, is used. As already mentioned, the ABI is a set of standards. One element, the procedure call standard, defines an interface for function calls between compilation units. Another important element of the ABI set of standards is the ELF extensions for Arm binaries, parts of which we already saw in chapter 2.
For reverse engineering the procedure call standard is the most relevant to understand, so the Procedure Call Standard for Arm Architecture is the focus of this chapter.
Over the years Arm released multiple versions of their procedure call standard. In this chapter we will rely on the Procedure Call Standard for Arm Architecture (AAPCS), which is the most recent release. You might encounter other terms:
- AAPCS: Procedure Call Standard for Arm Architecture
- ATPCS: Arm-Thumb Procedure Call Standard (precursor of AAPCS).
- APCS: Arm Procedure Call Standard (obsolete).
- TPCS: Thumb Procedure Call Standard (obsolete).
When compiling with gcc, for example, you can encounter these in the -mabi= option, where you can specify atpcs or apcs-gnu to compile for the respective target ABI.
6.1 Data Types
The AAPCS defines fundamental types and composite types. A fundamental type is any one of the C types listed in table 6.1 or 6.2. There are also fundamental types for floating point values; for a full list, check the AAPCS. A composite type consists of one or more fundamental or other composite data types. For example, a struct in C is handled as a composite type. Each member of the struct is either another composite data type (another struct) or a fundamental data type. Since the definition is recursive (struct in struct, i.e. composite data type in composite data type), each composite data type ultimately consists of a collection of fundamental data types.
struct myStruct7Word (shown in chapter 6.2.1) is a composite type that consists of 7 fundamental types: int a to int f and char g.
Tables 6.1 and 6.2 show the mapping from C types to machine types, their size in memory and their alignment boundary.
Terms explained in this chapter:
- composite type
- fundamental type
- machine type
Machine Type | Size in Bytes | Alignment Boundary | C types |
---|---|---|---|
(signed and unsigned) byte | 1 | 1 | (signed and unsigned) char |
(signed and unsigned) half-word | 2 | 2 | (signed and unsigned) short |
(signed and unsigned) word | 4 | 4 | (signed and unsigned) int and long int |
(signed and unsigned) double-word | 8 | 8 | (signed and unsigned) long long int |
Machine Type | Size in Bytes | Alignment Boundary | C types |
---|---|---|---|
pointer (data and code) | 4 | 4 | T*, T (*F)(), T& |
Composite types can be split into their fundamental types when they are passed by value into a subroutine. We are going to examine that in great detail in chapter 6.2. Fundamental types are handled as a single entity when a subroutine is called and are not split further.
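The size and alignment columns of tables 6.1 and 6.2 also determine how much memory a composite type occupies. The following is a minimal sketch, assuming the usual C layout rule that each member starts on its own alignment boundary and the whole composite is padded to the strictest member alignment. The names machine_type, composite_size and my_struct_7_word_size are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Size and alignment boundary of one fundamental machine type, in bytes
 * (values as in tables 6.1 and 6.2). */
typedef struct {
    size_t size;
    size_t align;
} machine_type;

#define MT_BYTE {1, 1}
#define MT_WORD {4, 4}

/* Round value up to the next multiple of align. */
static size_t round_up(size_t value, size_t align)
{
    assert(align != 0);
    return (value + align - 1) / align * align;
}

/* Hypothetical helper: total size of a composite type whose members are
 * laid out in declaration order, each on its own alignment boundary. */
size_t composite_size(const machine_type *members, size_t count)
{
    size_t offset = 0;
    size_t max_align = 1;
    for (size_t i = 0; i < count; i++) {
        offset = round_up(offset, members[i].align) + members[i].size;
        if (members[i].align > max_align)
            max_align = members[i].align;
    }
    /* The composite as a whole is padded to its strictest member alignment. */
    return round_up(offset, max_align);
}

/* struct myStruct7Word: int a to int f, followed by char g. */
size_t my_struct_7_word_size(void)
{
    const machine_type members[] = {
        MT_WORD, MT_WORD, MT_WORD, MT_WORD, MT_WORD, MT_WORD, MT_BYTE
    };
    return composite_size(members, sizeof members / sizeof members[0]);
}
```

Under these assumptions the 25 bytes of members are padded to 28 bytes, which matches the 7 words the name of the struct suggests.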
6.2 Subroutine Call
Terms explained in this chapter:
- parameter
- argument
- caller-saved register
- callee-saved register
- stack frame
- stack layout
Let me first introduce the difference between parameters and arguments. When I speak about an argument, I mean the actual value passed into a function. The parameter, on the other hand, is the corresponding variable that is part of the function's declaration.
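A tiny illustration of the two terms; the function names twice and example are made up for this sketch.

```c
#include <assert.h>

/* 'value' is the parameter: the variable named in the declaration. */
int twice(int value)
{
    return 2 * value;
}

int example(void)
{
    /* 21 is the argument: the actual value passed in this particular call. */
    int result = twice(21);
    assert(result == 42);
    return result;
}
```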
Before we dive into the subroutine call, we need to sort the registers into two broad categories: caller-saved and callee-saved registers. The AAPCS does not mention these categories directly, but rather divides registers into preserved and not preserved ones. For reverse engineering firmware, it is important to know the purpose of each register within a function and for a function call:
- The general-purpose registers R0-R3 are used to pass arguments to a function and to return values. The callee does not need to preserve them: they are caller-saved.
- R4-R8, R10 and R11 are used to hold local variables within a function. A caller can expect them to be unchanged when the called function returns: they are callee-saved.
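When annotating a disassembly it can help to have this classification at hand as a small lookup. The sketch below encodes only the two groups named above; the enum and function names are made up, and every register the text does not list (R9, R12, SP, LR, PC) is simply lumped into a third class.

```c
#include <assert.h>

typedef enum { CALLER_SAVED, CALLEE_SAVED, SPECIAL_PURPOSE } reg_class;

/* Hypothetical helper: classify core register Rn (n = 0..15) following the
 * two bullets above. */
reg_class classify_register(unsigned n)
{
    assert(n <= 15);
    if (n <= 3)
        return CALLER_SAVED;            /* R0-R3: arguments and return values */
    if ((n >= 4 && n <= 8) || n == 10 || n == 11)
        return CALLEE_SAVED;            /* R4-R8, R10, R11: local variables */
    return SPECIAL_PURPOSE;             /* not covered by the two bullets */
}
```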
To pass a fundamental data type argument to a function, the compiler first maps the parameter to a machine type (see tables 6.1 and 6.2). Then the argument is bound to a register, which is used to pass the argument to the function. When all argument registers (R0 to R3) are bound to arguments, the stack is used to pass the remaining arguments:
- Parameters are passed from left to right as declared.
- Arguments are passed in registers and on the stack.
- Composite types can be split between registers and stack. Their contained fundamental types are not split (see the last rule).
- The first 4 word-sized arguments are passed in R0-R3. All following arguments are passed on the stack.
- If an argument is bigger than 32 bit (e.g. a long long int), it is split across consecutive registers, e.g. R0 and R1 or R2 and R3.
- Fundamental types can’t be split between registers and stack. So when R0-R2 are already taken by the first three parameters of a function, a fourth long long int parameter with 64 bit won’t be split but rather passed entirely on the stack.
Composite types are split into their fundamental data types, which are then handled the same way as described above.
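The rules above can be sketched as a small model that assigns a location to each fundamental-type argument. This is a simplified illustration, not the full AAPCS algorithm: it only handles 4-byte and 8-byte arguments, and all names are made up. One detail is taken from the AAPCS itself rather than from the text above: a 64-bit value starts in an even-numbered register.

```c
#include <assert.h>
#include <stddef.h>

#define ON_STACK (-1)

typedef struct {
    int first_reg;     /* 0..3 for R0..R3, or ON_STACK */
    int stack_offset;  /* byte offset from SP at the call, -1 if in registers */
} arg_location;

/* Hypothetical model: assign locations to fundamental-type arguments of
 * 4 or 8 bytes, from left to right. */
void assign_arguments(const size_t *sizes, size_t count, arg_location *out)
{
    unsigned next_reg = 0;    /* next free argument register */
    unsigned next_stack = 0;  /* next free stack offset in bytes */

    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        unsigned words = (unsigned)(sizes[i] / 4);

        /* Assumption from the AAPCS, not from the text above:
         * a 64-bit value starts in an even-numbered register. */
        if (sizes[i] == 8)
            next_reg = (next_reg + 1u) & ~1u;

        if (next_reg + words <= 4) {
            out[i].first_reg = (int)next_reg;
            out[i].stack_offset = -1;
            next_reg += words;
        } else {
            /* A fundamental type is never split. Once the stack is used,
             * all remaining arguments go there as well. */
            next_reg = 4;
            if (sizes[i] == 8)
                next_stack = (next_stack + 7u) & ~7u;
            out[i].first_reg = ON_STACK;
            out[i].stack_offset = (int)next_stack;
            next_stack += (unsigned)sizes[i];
        }
    }
}
```

For three ints followed by a long long int, this model puts the ints in R0-R2 and the long long int at stack offset 0, leaving R3 unused.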
6.2.1 Passing arguments
Let’s quickly look at an example for the last rule, which shows how fundamental types are passed either entirely in registers or entirely on the stack.
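The C source of this example is not reproduced here. A caller consistent with the disassembly further down could look like the following sketch. The constants 0, 1, 2 and 3 are read off the disassembly, the parameter names a to c are assumptions, and callee is given a stand-in body only so that the snippet is self-contained (in the original it is presumably an external function).

```c
#include <assert.h>

/* Records the 64-bit argument so the effect of the call can be observed. */
static long long received_d;

/* Hypothetical stand-in for the external callee. */
void callee(int a, int b, int c, long long d)
{
    (void)a;
    (void)b;
    (void)c;
    received_d = d;
}

/* Three ints (R0-R2) and one long long int (stack). */
void caller(void)
{
    callee(0, 1, 2, 3LL);
    assert(received_d == 3);
}
```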
The call to callee() in this example takes three word-sized arguments (int) and one 8-byte double-word argument (long long int), d. What do we expect if we follow the rules above? The ints should be passed in R0 to R2. R3 would still be free, but since long long int is a fundamental data type it can’t be split further, so d should be passed on the stack.
main8.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <caller>:
0: b507 push {r0, r1, r2, lr}
2: 2300 movs r3, #0
4: 2203 movs r2, #3
6: 2101 movs r1, #1
8: e9cd 2300 strd r2, r3, [sp]
c: 2000 movs r0, #0
e: 2202 movs r2, #2
10: f7ff fffe bl 0 <callee>
14: b003 add sp, #12
16: f85d fb04 ldr.w pc, [sp], #4
We see how the long long int is first loaded into R2 and R3 and then stored on the stack: the first 32 bits occupy SP+0 to SP+4, the next 32 bits SP+4 to SP+8. SP itself is left unchanged. Then R0, R1 and R2 are prepared with the other arguments and callee() is called.
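How the 64-bit value ends up as two words on the stack can be sketched as follows, assuming a little-endian target (the object file is elf32-littlearm). The helper name is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the two 32-bit words that make up a 64-bit value on
 * a little-endian Arm target. words[0] is the word stored at SP+0,
 * words[1] the word stored at SP+4. */
void split_double_word(uint64_t value, uint32_t words[2])
{
    words[0] = (uint32_t)(value & 0xFFFFFFFFu); /* low half, R2 in the listing */
    words[1] = (uint32_t)(value >> 32);         /* high half, R3 in the listing */
    assert((((uint64_t)words[1] << 32) | words[0]) == value);
}
```

For the value 3 used in the listing this gives 3 for the word at SP+0 and 0 for the word at SP+4, which are exactly the values moved into R2 and R3 before the strd.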
Let’s next look at an example for the third rule, where
- a composite type is split into its fundamental types and passed in registers and
- a composite type is split between registers and stack
struct myStruct1Word {
    int a;
};
struct myStruct7Word {
    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    char g;
};
extern void callee1(struct myStruct1Word);
extern void callee7(struct myStruct7Word);
void caller(){
    struct myStruct1Word s;
    s.a = 1111;
    callee1(s);
    struct myStruct7Word t;
    t.a = 2222;
    t.g = 0xFF;
    callee7(t);
}
There are two structs defined: myStruct1Word and myStruct7Word. Note that myStruct7Word actually consists of 6 ints, each mapped to a machine type of 4 bytes, and a char, which is 1 byte. Both s and t are local variables, which are located on the stack. Also remember: the structs are passed by value. The disassembly of caller():
main6.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <caller>:
0: b500 push {lr}
2: f240 4057 movw r0, #1111 ; 0x457
6: b08d sub sp, #52 ; 0x34
8: f7ff fffe bl 0 <callee1>
c: f640 03ae movw r3, #2222 ; 0x8ae
10: 9305 str r3, [sp, #20]
12: 23ff movs r3, #255 ; 0xff
14: f88d 302c strb.w r3, [sp, #44] ; 0x2c
18: ab0c add r3, sp, #48 ; 0x30
1a: e913 0007 ldmdb r3, {r0, r1, r2}
1e: e88d 0007 stmia.w sp, {r0, r1, r2}
22: ab05 add r3, sp, #20
24: cb0f ldmia r3, {r0, r1, r2, r3}
26: f7ff fffe bl 0 <callee7>
2a: b00d add sp, #52 ; 0x34
2c: f85d fb04 ldr.w pc, [sp], #4
How does gcc set up the stack? R0 is loaded with s.a (offset 0x2) and stack space is reserved (offset 0x6). Note: despite s being a stack variable, s.a is never stored in the stack frame, since it is not used again in caller(). That store was optimized away! The composite data type myStruct1Word is split into its fundamental type int and passed in register R0. Then callee1() is called.
Afterwards the preparations to call callee7() begin. For t, the following happens: first, the values 2222 and 0xFF are written into memory (offsets 0xc to 0x14). Then the values t.e, t.f and t.g are copied to the top of the stack (SP+0 to SP+8), to be passed as arguments to callee7().
After that the remaining 4 arguments (t.a to t.d) are loaded into R0-R3 and callee7() is called.
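The split of t between registers and stack can be modelled in plain C. The following is a minimal sketch, assuming a 4-byte int (written as int32_t here so the model behaves the same on any host). The type call_image and the function pass_by_value are made up for this illustration and only mimic what the caller prepares before the branch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct myStruct7Word {
    int32_t a, b, c, d, e, f;
    char g;
};

/* With padding after g, the struct occupies exactly 7 words. */
static_assert(sizeof(struct myStruct7Word) == 7 * sizeof(uint32_t),
              "myStruct7Word is expected to occupy 7 words");

/* Hypothetical model of what the caller sets up before the branch. */
typedef struct {
    uint32_t regs[4];   /* R0..R3 */
    uint32_t stack[3];  /* words at SP+0, SP+4, SP+8 */
} call_image;

call_image pass_by_value(const struct myStruct7Word *t)
{
    uint32_t words[7];
    call_image image;

    memcpy(words, t, sizeof words);
    memcpy(image.regs, &words[0], sizeof image.regs);   /* t.a to t.d */
    memcpy(image.stack, &words[4], sizeof image.stack); /* t.e, t.f, t.g */
    return image;
}
```

With t.a = 2222 and t.g = 0xFF as in the example, regs[0] holds 2222 and the first byte of stack[2] holds 0xFF.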
A more generalized stack layout would be:
The dark grey parts of figure 6.2 depict memory regions the currently running function owns. The arguments passed to the callee are at the bottom and the dark grey parts in the middle are used for local variables.
The light grey parts are:
- previous function state storage: values the callee preserved for the caller, i.e. the stored LR and the callee-saved registers at the bottom of the stack.
- the prepared arguments for the next function call, i.e. the arguments on top of the stack.
6.2.2 Return function values
How values are returned from functions is also part of the AAPCS. The rules are very similar to those for passing arguments, with some important details differing:
- For fundamental types and composite types of a size smaller than or equal to 32 bit, R0 is used.
- For fundamental types only: if the return value is bigger than 32 bit (e.g. a long long int), the value is split across consecutive registers, e.g. R0 and R1, up to R3 if necessary (128 bit return value).
- For composite data types bigger than 32 bit only: the caller places a reference to memory it reserved into R0, which is then used by the callee to return the values.
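The three rules can be condensed into a small decision function. This is only a sketch of the rules as stated above; the enum and function names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    RET_R0,             /* value fits into one word */
    RET_R0_R1,          /* 64-bit fundamental type */
    RET_R0_R3,          /* 128-bit fundamental type */
    RET_MEMORY_VIA_R0   /* caller passes the address of a buffer in R0 */
} return_location;

/* Hypothetical helper: where does a return value of the given size live? */
return_location classify_return(size_t size_bytes, bool is_composite)
{
    assert(size_bytes > 0);
    if (size_bytes <= 4)
        return RET_R0;
    if (is_composite)
        return RET_MEMORY_VIA_R0;
    if (size_bytes <= 8)
        return RET_R0_R1;
    return RET_R0_R3;
}
```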
The first two rules are obvious, and we saw them in action in the previous chapter when we examined how arguments are passed into a function. When a function returns a composite type things get interesting, so let’s examine an example of that:
struct myStruct1Word {
    int a;
};
struct myStruct2Word {
    int a;
    int b;
};
struct myStruct7Word {
    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    char g;
};
extern struct myStruct1Word callee1();
extern struct myStruct2Word callee2();
extern struct myStruct7Word callee7();
int caller(){
    struct myStruct1Word s;
    struct myStruct2Word t;
    struct myStruct7Word u;
    s = callee1();
    t = callee2();
    u = callee7();
    return s.a + t.a + u.a;
}
main9.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <caller>:
0: e92d4010 push {r4, lr}
4: e24dd028 sub sp, sp, #40 ; 0x28
8: ebfffffe bl 0 <callee1>
c: e1a04000 mov r4, r0
10: e28d0004 add r0, sp, #4
14: ebfffffe bl 0 <callee2>
18: e28d000c add r0, sp, #12
1c: ebfffffe bl 0 <callee7>
20: e59d0004 ldr r0, [sp, #4]
24: e0844000 add r4, r4, r0
28: e59d000c ldr r0, [sp, #12]
2c: e0840000 add r0, r4, r0
30: e28dd028 add sp, sp, #40 ; 0x28
34: e8bd4010 pop {r4, lr}
38: e12fff1e bx lr
First, stack space is reserved at offset 0x4.
callee1() returns only a one-word composite type (myStruct1Word), so according to the rules described above only R0 is used to return s. That is exactly what happens at offset 0xc, where R0 is saved to R4.
The return values of callee2() and callee7() are bigger than 32 bit, so we expect a reference to be passed to them in R0:
The address SP+4 is put into R0 and callee2() is called. The value of t.a is loaded from SP+4 at offset 0x20. The same process, but with the address SP+12, is done for callee7(). caller() reserved space for myStruct2Word and myStruct7Word on its stack and passed a reference to these locations in R0 to callee2() and callee7() respectively. Towards the end, the return value of caller() is calculated and placed in R0 (offset 0x2c). Finally, caller() returns.
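Seen from C, returning a composite type bigger than 32 bit behaves as if the function took a hidden pointer parameter. The sketch below shows both views side by side. The function names and the values 10 and 20 are made up for this illustration, and the second function only approximates what the compiler generates.

```c
#include <assert.h>
#include <stddef.h>

struct myStruct2Word {
    int a;
    int b;
};

/* What the programmer writes: return a two-word struct by value. */
struct myStruct2Word callee2_by_value(void)
{
    struct myStruct2Word result = { 10, 20 };
    return result;
}

/* Roughly what the call looks like at machine level: the caller passes the
 * address of its own buffer (in R0 on Arm) and the callee fills it in. */
void callee2_lowered(struct myStruct2Word *result)
{
    assert(result != NULL);
    result->a = 10;
    result->b = 20;
}
```

In the disassembly above, add r0, sp, #4 corresponds to taking the address of the buffer before calling the lowered form.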
6.3 Exception Entry and Return
Terms explained in this chapter:
- exception entry
- exception phases
- state context
An exception is an interruption of the normal execution flow. The code currently running is stopped unexpectedly and the core switches to the exception handler. This can happen right in the middle of any executing function. In the previous chapters we saw how the compiler and linker prepare function calls in a structured and well-defined way to make sure the state context of one function is saved before execution is transferred to another function. An interrupted function gets no chance to prepare like this, so to make exceptions possible nevertheless, the core itself takes care of a lot of state context saving, as we are going to see in this chapter.
More on Arm Exception System:
- Chapter 3.6.2: Arm Architecture: NVIC
- Chapter 3.6.3: Arm Architecture: Masking
- Chapter 6.3: Exception Entry and Return
- Chapter 10.2.5.2: STM32L5: TrustZone Illegal Access Controller (TZIC)
- Chapter 3.4: Arm Architecture: Execution Modes and Privilege Levels
- Chapter 3.6: Arm Architecture: Exceptions, Interrupts, Faults
- Chapter 7.5: Secure and non-secure exceptions
On exception entry the core saves the state context on the stack. The state context is a set of registers:
- RETPSR (Combined Exception Return Program Status Registers)
- ReturnAddress
- LR
- R12
- R0-R3
The following steps are done by the core on exception entry:
- Save the state context
- Set LR to EXC_RETURN
- Clear some registers
- Switch to the Main Stack Pointer
- Set PC to the exception handler
If the floating point extension is available in the core and the floating point context is active (CONTROL.FPCA bit set), the floating point context is stored on the stack first, followed by the state context mentioned above. The stack on exception entry then looks like:
The exception return phase is started by the core when the exception handler loads EXC_RETURN into PC. During the return phase the saved contexts are restored and execution resumes where it was interrupted, by setting PC to the Return Address stored on the stack.
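For inspecting a stacked state context in a memory dump, it can be handy to describe it as a C struct. The sketch below covers only the basic state context without floating point context. The text above lists the registers but not their order in memory; the order used here (R0 at the lowest address, RETPSR at the highest) is an assumption based on the Armv7-M and Armv8-M architecture manuals. The type and function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic state context as stacked on exception entry, lowest address first.
 * Assumption: no floating point context and no additional state context. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;
    uint32_t return_address;
    uint32_t retpsr;
} exception_frame;

static_assert(sizeof(exception_frame) == 32, "basic frame is 8 words");
static_assert(offsetof(exception_frame, return_address) == 24,
              "ReturnAddress is the seventh word of the frame");

/* Hypothetical helper: address at which execution resumes after the
 * exception return. */
uint32_t interrupted_pc(const exception_frame *frame)
{
    assert(frame != NULL);
    return frame->return_address;
}
```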
6.4 Excursus: IP register for Interworking
In previous chapters we often saw veneers, which are inserted by the linker to make long branches possible. To implement these, the linker inserts stubs that load a 32 bit value from memory into a register and then jump to that location by referencing that register. In this way the full 32 bit address space can be reached, instead of the limited range that Arm branch instructions with an immediate operand provide.
Since these veneers change the content of a register, it is necessary to have a register that neither caller nor callee rely on. This is R12 (IP), the intra-procedure-call scratch register. The AAPCS defines R12 as the register the linker can use to implement these long branches.
In chapter 2.9 we saw how the linker resolves a symbol with relocation type R_ARM_THM_CALL, which defines a relocation of a BL #immediate Thumb instruction. There is obviously a lot of complexity hidden in the linker, and each relocation type differs a little from the others, for example in the calculation needed to determine the destination address.
For the relocation type R_ARM_THM_CALL the binutils linker defines the following veneer, which gets inserted when an R_ARM_THM_CALL symbol is relocated and a long branch is needed:
/* Thumb -> Thumb long branch stub. Used on M-profile architectures. */
static const insn_sequence elf32_arm_stub_long_branch_thumb_only[] =
{
  THUMB16_INSN (0xb401),             /* push {r0} */
  THUMB16_INSN (0x4802),             /* ldr r0, [pc, #8] */
  THUMB16_INSN (0x4684),             /* mov ip, r0 */
  THUMB16_INSN (0xbc01),             /* pop {r0} */
  THUMB16_INSN (0x4760),             /* bx ip */
  THUMB16_INSN (0xbf00),             /* nop */
  DATA_WORD (0, R_ARM_ABS32, 0),     /* dcd R_ARM_ABS32(X) */
};
PC+8 is the location in memory to which the linker writes the destination address. This value is loaded into R0 and moved into IP, and finally R0 is restored.
Why the detour via R0? The 16-bit LDR instructions in Thumb mode have only 3 bits to encode the destination register, so you can only use R0-R7. Although the Armv8-M Main Extension has an LDR encoding in which the destination register is encoded in 4 bits, this does not seem to be used in the stubs.
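The 3-bit register field can be seen by decoding the LDR instruction of the stub by hand. The following is a minimal sketch for the 16-bit Thumb "LDR Rt, [PC, #imm]" encoding only (bit pattern 01001, then 3 bits Rt, then 8 bits immediate in units of words). The type and function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned rt;      /* destination register number, 0..7 */
    unsigned offset;  /* byte offset added to the aligned PC */
} ldr_literal;

/* Hypothetical helper: decode a 16-bit Thumb LDR (literal) instruction. */
ldr_literal decode_ldr_literal(uint16_t insn)
{
    ldr_literal decoded;

    assert((insn & 0xF800u) == 0x4800u); /* top five bits must be 01001 */
    decoded.rt = (insn >> 8) & 0x7u;     /* only 3 bits: R0..R7 */
    decoded.offset = (insn & 0xFFu) * 4u;
    return decoded;
}
```

Decoding 0x4802 from the stub yields Rt = 0 and an offset of 8, i.e. ldr r0, [pc, #8]. With only 3 bits for Rt there is no way to name R12 directly in this encoding.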