Chapter 6 AAPCS Arm ABI and Runtime view

A procedure call standard, or calling convention, defines how different compilation units can work together, even when they were compiled by different compilers. It defines how parameters are passed from one function to another, which registers and other program state must be preserved by caller and callee, and what the callee may alter. The procedure call standard is one of a set of standards that together define the application binary interface (ABI) a compilation unit has to respect.

When we develop software we often encounter programming interfaces. For example, the Twitter API (application programming interface) defines how we have to “talk” to Twitter to retrieve meaningful results. The same applies to compiled files. Imagine your executable targets Linux as its operating system and intends to establish a network connection. Linux offers the connect() syscall for this, and connect() has an associated system call number in Linux: 283. In chapter 3.6 we learned how the SysCall exception is triggered by the svc instruction in assembler. You might now assume that your code (i.e. the compiler, when it compiles your C code) can simply issue svc #283 to ask Linux for a connect(). In fact, that worked until the old application binary interface (OABI), the original Arm ABI for Linux, was replaced by the new Arm EABI for Linux.

The new Arm EABI for Linux now expects the syscall number in R7 and 0 as the argument to SVC (SWI, as used in the tables below, is the older mnemonic for SVC). A summary of the syscall interface for Linux is given in the man page for syscall(2):

The first table shows how a syscall is called:

    Arch/ABI    Instruction           System  Ret  Ret  Error    Notes
                                      call #  val  val2
    ───────────────────────────────────────────────────────────────────
    arm/OABI    swi NR                -       a1   -    -        2
    arm/EABI    swi 0x0               r7      r0   r1   -

And the second how arguments are passed to the syscalls:

    Arch/ABI      arg1  arg2  arg3  arg4  arg5  arg6  arg7  Notes
    ──────────────────────────────────────────────────────────────
    alpha         a0    a1    a2    a3    a4    a5    -
    arc           r0    r1    r2    r3    r4    r5    -
    arm/OABI      a1    a2    a3    a4    v1    v2    v3
    arm/EABI      r0    r1    r2    r3    r4    r5    r6
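
As a rough illustration of the arm/EABI rows, the following sketch issues a system call by hand. It uses getpid() instead of connect() only because getpid() takes no arguments; that choice and the helper name raw_getpid are assumptions made for this example. On any target other than 32-bit Arm EABI Linux the sketch falls back to the libc syscall() wrapper, so the register convention is only exercised when it is built for Arm.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for the process ID without
 * going through libc's getpid(). */
static long raw_getpid(void)
{
#if defined(__linux__) && defined(__arm__) && defined(__ARM_EABI__)
    /* EABI: syscall number in r7, result in r0, svc immediate is 0. */
    register long nr  __asm__("r7") = SYS_getpid;
    register long ret __asm__("r0");
    __asm__ volatile("svc #0" : "=r"(ret) : "r"(nr) : "memory");
    return ret;
#else
    /* Any other target: let libc apply the convention of that platform. */
    return syscall(SYS_getpid);
#endif
}
```

Note that GCC may refuse to bind a variable to r7 in inline assembly when it uses r7 as the Thumb frame pointer, for example in unoptimized Thumb builds. In that case the libc wrapper is the safer route.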

For bare-metal projects that do not run on Linux, the EABI for Linux is not relevant. Instead, the ABI for the Arm Architecture, specified by Arm, is used. As already mentioned, the ABI is a set of standards. One element, the procedure call standard (AAPCS), defines the interface for function calls between compilation units. Another important element of the ABI set of standards is the ELF extensions for Arm binaries, parts of which we already saw in chapter 2.

6.1 Data Types

The AAPCS defines composite types and fundamental types. A fundamental type is any one of the C types listed in table 6.1 or 6.2. There are also fundamental types for floating-point values; if you want the full list, check the AAPCS. A composite type consists of one or more fundamental or other composite types. A struct in C, for example, is handled as a composite type: each member of the struct is either another composite type (another struct) or a fundamental type. Since the definition is recursive (struct in struct, i.e. composite type in composite type), each composite type ultimately consists of a collection of fundamental types.

struct myStruct7Word{
        int a;
        int b;
        int c;
        int d;
        int e;
        int f;
        char g;
};

struct myStruct7Word is a composite type, which consists of 7 fundamental types: int a to int f and char g.

Tables 6.1 and 6.2 show the mapping from C types to machine types, their size in memory and their alignment boundary.

Terms explained in this chapter:

composite type
fundamental type
machine type

Table 6.1: Integral types and their mapping to C types
Machine Type	Size in Bytes	Alignment Boundary	C types
(signed and unsigned) byte	1	1	(signed and unsigned) char
(signed and unsigned) half-word	2	2	(signed and unsigned) short
(signed and unsigned) word	4	4	(signed and unsigned) int and long int
(signed and unsigned) double-word	8	8	(signed and unsigned) long long int

Table 6.2: Pointer types and their mapping to C types
Machine Type	Size in Bytes	Alignment Boundary	C types
pointer (data and code)	4	4	T *, T (*F)(), T &
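
To see how the sizes and alignment boundaries from the tables combine in a composite type, here is a small layout calculator. It is only a sketch: it assumes the usual C rule that each member is placed at the next offset that satisfies its alignment and that the total size is rounded up to the largest member alignment. The names struct member, struct_layout and my_struct7_size are made up for this example.

```c
#include <stddef.h>

/* Size and alignment boundary of one fundamental type, in bytes
 * (the two numeric columns of tables 6.1 and 6.2). */
struct member {
    size_t size;
    size_t align;
};

/* Round x up to the next multiple of align (align is a power of two). */
static size_t round_up(size_t x, size_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Place the members in declaration order and return the total size.
 * The alignment of the whole struct is stored in *align_out if given. */
static size_t struct_layout(const struct member *m, size_t n, size_t *align_out)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < n; i++) {
        offset = round_up(offset, m[i].align) + m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    if (align_out)
        *align_out = max_align;
    return round_up(offset, max_align);
}

/* struct myStruct7Word: six ints (4 bytes each) followed by one char. */
static size_t my_struct7_size(void)
{
    const struct member m[] = {
        {4, 4}, {4, 4}, {4, 4}, {4, 4}, {4, 4}, {4, 4}, {1, 1}
    };
    return struct_layout(m, sizeof m / sizeof m[0], NULL);
}
```

Under these assumptions the six ints and the char add up to 25 bytes, which are padded to 28 bytes, i.e. seven words. That is where the name myStruct7Word comes from.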

Composite types can be split into their fundamental types when they are passed by value into a subroutine. We are going to examine that in detail in chapter 6.2. Fundamental types are handled as a single entity when a subroutine is called and are not split further.

6.2 Subroutine Call

Terms explained in this chapter:

parameter
argument
caller-saved register
callee-saved register
stack frame
stack layout

Let me first introduce the difference between parameters and arguments. When I speak about an argument, I mean the actual value passed into a function. The parameter, on the other hand, is the corresponding variable that is part of the function's declaration.
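
A two-function sketch makes the distinction concrete. The function names are invented for this illustration.

```c
/* x is the parameter: the variable named in the function's declaration. */
static int square(int x)
{
    return x * x;
}

/* 7 is the argument: the actual value passed in this particular call. */
static int square_of_seven(void)
{
    return square(7);
}
```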

6.2.1 Passing arguments

Let’s quickly look at an example for step number 5, to show how fundamental types are either passed entirely in registers or on the stack:

→ chapter 7, example 1

arm-none-eabi-gcc -c -mcpu=cortex-m33 -mthumb -Os -nostdlib main.c

extern void callee(int a, int b, int c, long long int d);

void caller(){
        callee(0,1,2,3);
}

We see three word-sized arguments (int) and one 8-byte double-word (long long int). What do we expect if we follow the routine above? The ints should be passed in R0 to R2. R3 would still be free, but since long long int is a fundamental data type it can't be split further, so d should be passed on the stack.

arm-none-eabi-objdump --disassemble=caller main.o

main8.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <caller>:
   0:   b507            push    {r0, r1, r2, lr}
   2:   2300            movs    r3, #0
   4:   2203            movs    r2, #3
   6:   2101            movs    r1, #1
   8:   e9cd 2300       strd    r2, r3, [sp]
   c:   2000            movs    r0, #0
   e:   2202            movs    r2, #2
  10:   f7ff fffe       bl      0 <callee>
  14:   b003            add     sp, #12
  16:   f85d fb04       ldr.w   pc, [sp], #4

We see how the long long int is first loaded into R2 and R3 and then stored on the stack with strd: the low 32 bits end up at SP+0 to SP+3, the high 32 bits at SP+4 to SP+7. The stack space for this was already reserved by the initial push, so SP is kept as it is. R0, R1 and R2 are loaded with the other arguments (R2 is reused after the strd) and callee() is called.
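
The placement seen in this disassembly can be reproduced with a small model of the register assignment. The sketch below is a simplified reading of the AAPCS rules for core registers only (no floating-point registers, no special cases). All names in it are invented for this example, and it is not meant as a replacement for the rules in the AAPCS itself.

```c
#include <stddef.h>

struct arg {
    size_t size;   /* size in bytes */
    size_t align;  /* alignment boundary in bytes: 4 or 8 */
};

struct slot {
    int first_reg;       /* first core register used, -1 if none */
    int nregs;           /* number of core registers used */
    long stack_off;      /* offset from SP at the call, -1 if not stacked */
    size_t stack_bytes;  /* bytes of this argument placed on the stack */
};

/* Assign each argument to r0..r3 and/or the stack, in declaration order. */
static void assign_args(const struct arg *a, size_t n, struct slot *out)
{
    int next_reg = 0;       /* next free core register */
    size_t next_stack = 0;  /* next free stack offset */

    for (size_t i = 0; i < n; i++) {
        size_t bytes = (a[i].size + 3) & ~(size_t)3;
        int words = (int)(bytes / 4);

        out[i] = (struct slot){ -1, 0, -1, 0 };

        /* 8-byte aligned types start in an even-numbered register. */
        if (a[i].align == 8)
            next_reg = (next_reg + 1) & ~1;

        if (next_reg + words <= 4) {            /* fits into registers */
            out[i].first_reg = next_reg;
            out[i].nregs = words;
            next_reg += words;
            continue;
        }
        if (next_reg < 4 && next_stack == 0) {  /* split: registers + stack */
            out[i].first_reg = next_reg;
            out[i].nregs = 4 - next_reg;
            bytes -= 4 * (size_t)out[i].nregs;
        }
        next_reg = 4;                           /* registers are used up */
        if (a[i].align == 8)
            next_stack = (next_stack + 7) & ~(size_t)7;
        out[i].stack_off = (long)next_stack;
        out[i].stack_bytes = bytes;
        next_stack += bytes;
    }
}
```

Fed with three (4, 4) entries and one (8, 8) entry, the model puts the ints into R0 to R2, skips R3 because of the even-register rule, and places the double-word at SP+0, which matches the strd to [sp] in the listing.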

Let’s next look at an example for 3), where

a composite type is split into its fundamental types and passed in registers, and
a composite type is split between registers and the stack

→ chapter 7, example 2

arm-none-eabi-gcc -c -mcpu=cortex-m33 -mthumb -Os -nostdlib main.c

struct myStruct1Word {
        int a;
};

struct myStruct7Word{
        int a;
        int b;
        int c;
        int d;
        int e;
        int f;
        char g;
};

extern void callee1(struct myStruct1Word);
extern void callee7(struct myStruct7Word);

void caller(){
        struct myStruct1Word s;
        s.a = 1111;
        callee1(s);
        struct myStruct7Word t;
        t.a = 2222;
        t.g = 0xFF;
        callee7(t);
}

There are two structs defined: myStruct1Word and myStruct7Word. Note that myStruct7Word actually consists of six ints, each mapped to a 4-byte machine type, and one char, which is 1 byte; padded to a multiple of 4 bytes it occupies 28 bytes, i.e. 7 words. The variables s and t are local variables, which are located on the stack. Also remember: the structs are passed by value. The disassembly of caller():

arm-none-eabi-objdump --disassemble=caller main.o

main6.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <caller>:
   0:   b500            push    {lr}
   2:   f240 4057       movw    r0, #1111       ; 0x457
   6:   b08d            sub     sp, #52 ; 0x34
   8:   f7ff fffe       bl      0 <callee1>
   c:   f640 03ae       movw    r3, #2222       ; 0x8ae
  10:   9305            str     r3, [sp, #20]
  12:   23ff            movs    r3, #255        ; 0xff
  14:   f88d 302c       strb.w  r3, [sp, #44]   ; 0x2c
  18:   ab0c            add     r3, sp, #48     ; 0x30
  1a:   e913 0007       ldmdb   r3, {r0, r1, r2}
  1e:   e88d 0007       stmia.w sp, {r0, r1, r2}
  22:   ab05            add     r3, sp, #20
  24:   cb0f            ldmia   r3, {r0, r1, r2, r3}
  26:   f7ff fffe       bl      0 <callee7>
  2a:   b00d            add     sp, #52 ; 0x34
  2c:   f85d fb04       ldr.w   pc, [sp], #4

How does gcc set up the stack? R0 is loaded with s.a (offset 0x2) and stack space is reserved (offset 0x6). Note: despite s being a local variable, s.a is never stored in the stack frame, since it is never used again in caller(). That store was optimized away! The composite data type myStruct1Word is split into its fundamental type int and passed in register R0. Then callee1() is called.

Afterwards the preparations to call callee7() begin. For t, the following happens: first, the values 2222 and 0xFF are written into memory (offsets 0xc to 0x14). Then the words holding t.e, t.f and t.g are copied to the top of the stack, to SP+0 through SP+11, to be passed as arguments to callee7() (offsets 0x18 to 0x1e):

Figure 6.1: Stack Layout during execution

After that the remaining four words (t.a to t.d) are loaded into R0-R3 (offsets 0x22 and 0x24) and callee7() is called.
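
Which member of t travels where follows directly from its offset inside the struct. The sketch below assumes the split described above (first four words in R0 to R3, the rest on the stack). It uses int32_t instead of int so that the offsets are the same on a development host as on the 32-bit Arm target. The helper names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as in the example, with the 4-byte int made explicit. */
struct myStruct7Word {
    int32_t a, b, c, d, e, f;
    char g;
};

/* Core register (0..3) that carries the word at this struct offset,
 * or -1 if that word is passed on the stack. */
static int core_register(size_t member_offset)
{
    long word = (long)(member_offset / 4);
    return word < 4 ? (int)word : -1;
}

/* Offset from SP at the call for a stacked word, or -1 if the word
 * is passed in a core register. */
static long stack_offset(size_t member_offset)
{
    long word = (long)(member_offset / 4);
    return word < 4 ? -1 : (word - 4) * 4;
}
```

For example, core_register(offsetof(struct myStruct7Word, d)) yields 3, while stack_offset(offsetof(struct myStruct7Word, g)) yields 8, the last of the three words that the stmia instruction writes to the top of the stack.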

A more generalized stack layout would be:

Figure 6.2: Generalized stack layout

The dark grey parts of figure 6.2 depict the memory regions the currently running function owns. The arguments passed to the callee are at the bottom and the dark grey parts in the middle are used for local variables.

The light grey parts are:

previous function state storage: Values the callee preserved for the caller. The stored LR and the callee-saved registers at the bottom of stack.
the prepared arguments for the next function call, i.e. the arguments on top of the stack.

6.2.2 Return function values

How values are returned from functions is also part of the AAPCS. The rules are very similar to passing arguments, with some important details differing:

For fundamental types and composite types no larger than 32 bits, R0 is used.
For fundamental types only: if the return value is larger than 32 bits (e.g. a long long int), it is split across consecutive registers, starting with R0 and R1 and going up to R3 if necessary (128-bit return value).
For composite data types larger than 32 bits only: the caller places a reference to memory it reserved into R0, which the callee then uses to return the value.
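
The three rules can be condensed into a small decision function. This is a sketch of the rules exactly as listed here, not of the complete AAPCS (floating-point registers, for instance, are ignored). The enum and function names are invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

enum ret_location {
    RET_R0,             /* rule 1: value fits into one word */
    RET_R0_R1,          /* rule 2: 64-bit fundamental type */
    RET_R0_TO_R3,       /* rule 2: 128-bit fundamental type */
    RET_CALLER_MEMORY,  /* rule 3: written through the reference in R0 */
    RET_NOT_COVERED     /* outside the three rules modelled here */
};

/* Decide where a return value of the given kind and size travels. */
static enum ret_location classify_return(bool is_composite, size_t size_bytes)
{
    if (size_bytes <= 4)
        return RET_R0;
    if (is_composite)
        return RET_CALLER_MEMORY;
    if (size_bytes <= 8)
        return RET_R0_R1;
    if (size_bytes <= 16)
        return RET_R0_TO_R3;
    return RET_NOT_COVERED;
}
```

A long long int is classified as RET_R0_R1, whereas a struct of two ints, although it has the same size, is classified as RET_CALLER_MEMORY.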

Rules 1 and 2 are straightforward, and we saw the same mechanics in action in the previous chapter when we examined how arguments are passed into a function. When a function returns a composite type things get interesting, so let’s examine an example of that:

→ chapter 7, example 3

arm-none-eabi-gcc -c -marm -Os -nostdlib main.c

struct myStruct1Word {
        int a;
};

struct myStruct2Word {
        int a;
        int b;
};
struct myStruct7Word{
        int a;
        int b;
        int c;
        int d;
        int e;
        int f;
        char g;
};

extern struct myStruct1Word callee1();
extern struct myStruct2Word callee2();
extern struct myStruct7Word callee7();

int caller(){
        struct myStruct1Word s;
        struct myStruct2Word t;
        struct myStruct7Word u;

        s = callee1();
        t = callee2();
        u = callee7();
        return s.a + t.a + u.a;
}

arm-none-eabi-objdump --disassemble=caller main.o

main9.o:     file format elf32-littlearm
                                                           
                                                           
Disassembly of section .text:
                                                           
00000000 <caller>:    
   0:   e92d4010        push    {r4, lr}
   4:   e24dd028        sub     sp, sp, #40     ; 0x28
   8:   ebfffffe        bl      0 <callee1>
   c:   e1a04000        mov     r4, r0                                                                                 
  10:   e28d0004        add     r0, sp, #4
  14:   ebfffffe        bl      0 <callee2>
  18:   e28d000c        add     r0, sp, #12
  1c:   ebfffffe        bl      0 <callee7>
  20:   e59d0004        ldr     r0, [sp, #4]
  24:   e0844000        add     r4, r4, r0
  28:   e59d000c        ldr     r0, [sp, #12]
  2c:   e0840000        add     r0, r4, r0
  30:   e28dd028        add     sp, sp, #40     ; 0x28
  34:   e8bd4010        pop     {r4, lr}
  38:   e12fff1e        bx      lr

The first thing, again, is to reserve some space on the stack (offset 0x4).

callee1() returns only a one-word composite type (myStruct1Word), so according to the rules described above only R0 is used to return s. That is exactly what happens at offset 0xc, where the result is taken from R0.

The return values of callee2() and callee7() are bigger than 32 bits, so we expect a reference to be passed to them in R0:

The address SP+4 is put into R0 and callee2() is called. The value of t.a is later loaded from SP+4 (offset 0x20). The same is done for callee7(), but with the address SP+12. caller() reserved space for myStruct2Word and myStruct7Word on its stack and passed references to these locations in R0 to callee2() and callee7() respectively. Towards the end, the return value of caller() is calculated and placed in R0 (offset 0x2c). Finally, caller() returns.
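
Expressed in C, the convention behaves roughly as if the compiler had rewritten the struct-returning function into one that takes a pointer to the result. The following sketch shows both views side by side for myStruct2Word. The function bodies, the values 10 and 20, and the _lowered names are made up for this illustration, since the original callees are only declared extern.

```c
struct myStruct2Word {
    int a;
    int b;
};

/* Source-level view: the struct is returned by value. */
static struct myStruct2Word callee2(void)
{
    struct myStruct2Word r = { 10, 20 };  /* made-up values */
    return r;
}

/* Hand-lowered view: the caller passes the address of a result buffer.
 * Under the AAPCS this address is what arrives in R0. */
static void callee2_lowered(struct myStruct2Word *result)
{
    result->a = 10;
    result->b = 20;
}

static int caller_lowered(void)
{
    struct myStruct2Word t;  /* space reserved in the caller's stack frame */

    callee2_lowered(&t);     /* compare: add r0, sp, #4 ; bl callee2 */
    return t.a;              /* compare: ldr r0, [sp, #4] */
}
```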

6.3 Exception Entry and Return

Terms explained in this chapter:

exception entry
exception phases
state context

Figure 6.3: Exception Phases

An exception is an interruption of the normal execution flow. The code currently running is stopped at an unforeseen point and the core switches to the exception handler. This can happen right in the middle of any executing function. In the previous chapters we saw how compiler and linker prepare function calls in a structured and well-defined way to make sure the state context of one function is saved before execution is transferred to another function. An interrupted function gets no chance to make such preparations. To make exceptions work nevertheless, the core itself takes care of a lot of state context saving, as we are going to see in the following chapter.
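
As a preview of what this state context looks like, the sketch below describes the basic stack frame that an Armv8-M core such as the Cortex-M33 pushes on exception entry. It assumes that no floating-point context and no additional security state are stacked. The struct and field names are chosen for this example and do not come from any vendor header.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic frame pushed by the core on exception entry, lowest address first. */
struct basic_exception_frame {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;              /* LR of the interrupted code */
    uint32_t return_address;  /* where execution resumes afterwards */
    uint32_t xpsr;            /* saved program status register */
};

/* Eight words in total. */
_Static_assert(sizeof(struct basic_exception_frame) == 32,
               "basic exception frame is 8 words");
```

R0 to R3, R12 and LR are exactly the registers a callee is allowed to overwrite under the AAPCS. Because the core saves them itself, an exception handler can be written as an ordinary C function.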

Embedded Systems Security and TrustZone
