Chapter 5 Bare-metal firmware build and boot process (CMSIS Core(M))

5.1 Introduction

This chapter examines a bare-metal firmware example based on the Cortex Microcontroller Software Interface Standard (CMSIS). CMSIS is a “vendor-independent hardware abstraction layer for microcontrollers that are based on Arm Cortex processors.” Chances are high that you will encounter CMSIS in your embedded research projects, and even if you do not, the overall concepts shown here carry over to other environments largely unchanged.

Since the linker file is crucial for understanding the firmware layout, we look at the linker file and the CMSIS source code in parallel and examine how they work together to form the firmware. We examine a secure world firmware, but again: the concepts for non-secure and secure firmware are the same; the non-secure side merely lacks the non-secure callable region.

Topics and concepts covered in this chapter:

  • The Boot process
  • Initialization of the RAM layout (.data, .bss)
  • Setup of the stack and heap
  • Setup of the non-secure callable region

The following example is based on an STM32L5 SoC; however, it only serves as a descriptive, real-world example with real memory addresses to explain the concepts. Everything covered here is very similar for other systems and vendors.

5.2 Memory segments

In this section we walk through an actual linker script, which is used in example-stm32-p1. The linker files for the non-secure and secure world are very similar, so we leave the detailed examination of the non-secure linker script as an exercise for the reader and only examine the secure world linker script.

Let’s assume that we have decided to use the following memory map for our firmware:

Table 5.1: secure / non-secure callable regions in an example system
Type   Segment name  Security attribute  From..to                   Section overview
Flash  ROM_NSC       NSC                 0x0C03 E000 - 0x0C03 FFFF  Explored in chapter 5.2.3
Flash  ROM           S                   0x0C00 0000 - 0x0C03 DFFF  Explored in chapter 5.2.1
SRAM   RAM           S                   0x3000 0000 - 0x3001 7FFF  Explored in chapter 5.2.2

The system memory map is predefined at a high level by Arm for M-profile architectures: see chapter 3.5. The further separation of the memory map into secure / non-secure / NSC depends on the SoC design and configuration. An example in which bit[28] of the address defines the security attribution of the respective memory address is shown in chapter 4.5.


More on Linker Script:

  • Chapter 2.11: Linker Script
  • Chapter 5.2: CMSIS: Memory segments
  • Chapter 5.6: CMSIS: Heap and Stack

Let’s start our examination of the secure world linker script on the very top. The following code block shows the beginning of the script:

→ example-stm32-p1(P1_TZEN/Secure/STM32L552ZETXQ_FLASH.ld)

/* Entry Point */
ENTRY(Reset_Handler)

/* Highest address of the user mode stack */
_estack = ORIGIN(RAM) + LENGTH(RAM);    /* end of "RAM" Ram type memory */

_Min_Heap_Size = 0x200 ;    /* required amount of heap  */
_Min_Stack_Size = 0x400 ;   /* required amount of stack */

/* Memories definition */
MEMORY
{
  RAM   (xrw)   : ORIGIN = 0x30000000,  LENGTH = 96K    /* Memory is divided. Actual start is 0x30000000 and actual length is 256K */
  ROM   (rx)    : ORIGIN = 0x0C000000,  LENGTH = 248K    /* Memory is divided. Actual start is 0x0C000000 and actual length is 512K */
  ROM_NSC   (rx)    : ORIGIN = 0x0C03E000,  LENGTH = 8K    /* Non-Secure Call-able region */
}

The ENTRY command sets the entry point of the executable object file, which is the location where program execution starts. On M-profile cores, execution starts at whatever address the core loads into PC from the Reset Vector. In the present example the vector table, and with it the Reset Vector, is located at the start of ROM at 0x0C00.0000.

_estack is a symbol whose value is the address the initial stack pointer will point to. Loading the initial stack pointer into SP is the very first thing the core does after coming out of reset. The initial stack pointer value is the first (topmost) entry in the vector table, directly before the Reset Vector.

More on Stack:

  • Chapter 5.6: CMSIS: Heap and Stack
  • Chapter 3.4: Arm Architecture: Execution Modes and Privilege Levels
  • Chapter 3.7: Arm Architecture: General-Purpose registers, special-purpose registers
  • Chapter 6.2: AAPCS: Subroutine Call

In table 5.1 we defined three memory regions, called segments, into which code and data are placed. The MEMORY command defines these three segments. Chapter 5.2.1, chapter 5.2.2 and chapter 5.2.3 each describe one of the three segments by showing which input sections are written into the particular segment and what their purpose is.

  • Chapter 5.2.1 examines the segment “ROM.”
  • Chapter 5.2.2 examines the segment “RAM.”
  • Chapter 5.2.3 examines the segment “ROM_NSC.”
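
As a quick cross-check of the MEMORY definitions against table 5.1, the following sketch computes the end of each segment from its ORIGIN and LENGTH. The struct and function names are made up for illustration; only the numbers are taken from the linker script above.

```c
#include <stdint.h>

/* Hypothetical helper type: one entry per MEMORY region of the linker script. */
struct mem_region {
    const char *name;
    uint32_t origin;   /* ORIGIN(...) */
    uint32_t length;   /* LENGTH(...) in bytes */
};

/* Values copied from the MEMORY command shown above. */
static const struct mem_region regions[] = {
    { "RAM",     0x30000000u,  96u * 1024u },
    { "ROM",     0x0C000000u, 248u * 1024u },
    { "ROM_NSC", 0x0C03E000u,   8u * 1024u },
};

/* First address after the region, i.e. ORIGIN + LENGTH. */
uint32_t region_end(const struct mem_region *r)
{
    return r->origin + r->length;
}

/* Last address that still belongs to the region. */
uint32_t region_last(const struct mem_region *r)
{
    return region_end(r) - 1u;
}
```

region_last() reproduces the upper bounds listed in the "from..to" column of table 5.1, and region_end() of ROM equals the ORIGIN of ROM_NSC, so the two flash segments are adjacent without a gap.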

Let’s quickly recap the relation between an object file’s input regions, output regions and segments:

Figure 5.1: connection between object file input regions, output regions and segments

More on Linking:

  • Chapter 2.9: Relocation
  • Chapter 2.10: Veneers
  • Chapter 2.11: Linker Script
  • Chapter 5.3: CMSIS: ROM segment and Boot Process

5.2.1 ROM segment

The following output sections are written into the ROM segment:

Table 5.2: ROM output sections
Output section                            Explanation
.isr_vector                               Vector table. Important part of the boot process. See chapter 5.3.
.text                                     Combines the following input sections:
                                            - .text: compiled and assembled code.
                                            - .glue_7 and .glue_7t: see chapter 2.12.
                                            - .init and .fini: libc related functions, see chapter 5.3.
.rodata                                   Constants in the code.
.ARM and .ARM.extab                       Information for unwinding the stack when an exception is thrown in C++. For C projects these should be empty.
.preinit_array, .init_array, .fini_array  Libc related functions. See chapter 5.3 for details.
.data                                     Initialized objects with static storage duration. Explored in chapter 5.3.

More on Section:

  • Chapter 2.7: Sections
  • Chapter 2.7.3.1: .data section
  • Chapter 2.7.3.2: .bss section
  • Chapter 2.8: Literal Pool

5.2.2 RAM segment

The following output sections are written into the RAM segment:

Table 5.3: RAM output sections
section explanation
.data Details discussed in chapter 5.3 and chapter 5.6
._user_heap_stack Details discussed in chapter 5.6
.bss Details discussed in chapter 5.3

More on Section:

  • Chapter 2.7: Sections
  • Chapter 2.7.3.1: .data section
  • Chapter 2.7.3.2: .bss section
  • Chapter 2.8: Literal Pool

5.2.3 ROM_NSC segment

The following output sections are written into the ROM_NSC segment:

Table 5.4: ROM_NSC output sections
section explanation
.gnu.sgstubs See chapter 5.4

More on Secure function calls:

  • Chapter 5.4: CMSIS: Non-Secure Callable segment
  • Chapter 6.2: AAPCS: Subroutine Call
  • Chapter 4.4.1: Banked Registers
  • Chapter 7.3: Details: Secure function call
  • Chapter 4.6.1: Overview: Secure function call

5.3 ROM segment and Boot Process

Terms explained in this chapter:

  • .word
  • .weak
  • Load Memory Address (LMA)
  • Virtual Memory Address (VMA)

We defined 0x0C00.0000, the start of the ROM segment, as the address the system boots from. Since the linker script is processed from top to bottom, the very first output section the linker writes into the ROM segment is .isr_vector:

→ example-stm32-p1(P1_TZEN/Secure/STM32L552ZETXQ_FLASH.ld)

SECTIONS
{
  /* The startup code into "ROM" Rom type memory */
  .isr_vector :
  {
    . = ALIGN(8);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(8);
  } >ROM

More on Vector Table:

  • Chapter 3.6: Arm Architecture:Exceptions, Interrupts, Faults
  • Chapter 5.3: CMSIS: ROM segment and Boot Process

The section .isr_vector is defined in the CMSIS startup file named startup_<device>.s:

→ example-stm32-p1(P1_TZEN/Secure/Core/Startup/startup_stm32l552zetxq.s)

    .section    .isr_vector,"a",%progbits
    .type   g_pfnVectors, %object
    .size   g_pfnVectors, .-g_pfnVectors


g_pfnVectors:
    .word   _estack
    .word   Reset_Handler
    .word   NMI_Handler

→ example-stm32-p1(P1_TZEN/Secure/Core/Startup/startup_stm32l552zetxq.s)

    .weak   NMI_Handler
    .thumb_set NMI_Handler,Default_Handler

    .weak   HardFault_Handler
    .thumb_set HardFault_Handler,Default_Handler

More on Assembler Terms:

  • Chapter 2.4.1: Assembler Terms

The label g_pfnVectors: marks the beginning of the exception vector list, which consists of .word <exception vector symbol name> directives. Each .word directive emits one 32-bit word holding the address of the named symbol. In startup_<device>.s each of the exception handlers is declared as a .weak symbol and aliased to Default_Handler via .thumb_set, which makes sure that Default_Handler is only used when no other symbol with the same name is defined in any of the other linked object files. Hence, to implement an exception handler you just have to define a function with the name of the particular exception vector. This overrides the weakly bound Default_Handler. If an exception handler is not implemented, Default_Handler is used.
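
The same weak-alias mechanism can be expressed in C, which may make it easier to experiment with on a host. The following is a minimal sketch, assuming GCC or Clang on an ELF target (the weak and alias attributes are compiler extensions). The counter and its accessor are hypothetical and exist only to make the call observable; a real Default_Handler typically does not return.

```c
#include <stdint.h>

/* Hypothetical counter so the effect is observable on a host build. */
static volatile uint32_t default_handler_hits;

void Default_Handler(void)
{
    default_handler_hits++;
}

/* C counterpart of ".weak NMI_Handler" plus
 * ".thumb_set NMI_Handler,Default_Handler": a weak alias that is used
 * only if no strong NMI_Handler is defined in another object file. */
void NMI_Handler(void) __attribute__((weak, alias("Default_Handler")));

/* Hypothetical accessor for the counter. */
uint32_t default_handler_hit_count(void)
{
    return default_handler_hits;
}
```

As long as no other object file defines NMI_Handler, calling it ends up in Default_Handler. Adding a regular (strong) definition of NMI_Handler in a separate source file replaces the alias at link time without any change to this file.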

The Reset_Handler for example is defined in the startup_<device>.s file (see below.)


_estack, the topmost value in the vector table, is the initial value of the main stack pointer. This symbol was defined in the very beginning of the linker script. See chapter 5.2.

More on Stack:

  • Chapter 5.6: CMSIS: Heap and Stack
  • Chapter 3.4: Arm Architecture: Execution Modes and Privilege Levels
  • Chapter 3.7: Arm Architecture: General-Purpose registers, special-purpose registers
  • Chapter 6.2: AAPCS: Subroutine Call

More on Vector Table:

  • Chapter 3.6: Arm Architecture:Exceptions, Interrupts, Faults
  • Chapter 5.3: CMSIS: ROM segment and Boot Process

Now that we know how the exception vectors are set up, let’s look at what happens on reset. The core is set up to boot from 0x0C00 0000, where the vector table is located. On reset, the core first loads the first entry of the vector table (vector_table[0]), the initial main stack pointer, into the SP register. It then loads the second entry (vector_table[1]), the address of Reset_Handler, into PC and starts executing there.
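
The role of the first two vector table entries can be modelled in a few lines of C. This is only a sketch of the behaviour described above, not code that runs on the target; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the two core registers set up at reset. */
struct core_state {
    uint32_t sp;   /* main stack pointer */
    uint32_t pc;   /* program counter */
};

/* Model of the reset sequence: word 0 of the vector table becomes SP,
 * word 1 (the reset vector) becomes PC. Bit 0 of the vector only marks
 * Thumb state and is not part of the instruction address. */
struct core_state model_reset(const uint32_t *vector_table)
{
    struct core_state s;

    s.sp = vector_table[0];
    s.pc = vector_table[1] & ~UINT32_C(1);
    return s;
}
```

Fed with a table that starts with the value of _estack followed by the address of Reset_Handler (with bit 0 set, as the assembler emits it for Thumb functions), the model yields the stack pointer and the address of the first instruction that is executed.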

The core register VTOR can be used to relocate the vector table from its default to a new base address. Armv8-M cores such as the Cortex-M33 support this feature. In Armv6-M, VTOR is optional; on cores that do not implement it, the vector table is fixed at 0x0000.0000 and can’t be moved. Some SoC designers work around that by implementing a remapping feature in their bootloader.

→ example-stm32-p1(P1_TZEN/Secure/Core/Startup/startup_stm32l552zetxq.s)

  .section  .text.Reset_Handler
    .weak   Reset_Handler
    .type   Reset_Handler, %function
Reset_Handler:
  ldr   sp, =_estack    /* set stack pointer */

/* Copy the data segment initializers from flash to SRAM */
  movs  r1, #0
  b LoopCopyDataInit

CopyDataInit:
    ldr r3, =_sidata
    ldr r3, [r3, r1]
    str r3, [r0, r1]
    adds    r1, r1, #4

LoopCopyDataInit:
    ldr r0, =_sdata
    ldr r3, =_edata
    adds    r2, r0, r1
    cmp r2, r3
    bcc CopyDataInit
    ldr r2, =_sbss
    b   LoopFillZerobss
/* Zero fill the bss segment. */
FillZerobss:
    movs    r3, #0
    str r3, [r2], #4

LoopFillZerobss:
    ldr r3, = _ebss
    cmp r2, r3
    bcc FillZerobss

/* Call the clock system intitialization function.*/
    bl  SystemInit
/* Call static constructors */
    bl __libc_init_array
/* Call the application's entry point.*/
    bl  main

LoopForever:
    b LoopForever

.size   Reset_Handler, .-Reset_Handler

Reset_Handler starts and sets up the .data and .bss sections:

  • copy the .data section from flash into SRAM (CopyDataInit: and LoopCopyDataInit:)
  • initialize .bss with zeros in SRAM

The symbol _sidata holds the address in flash to which .data is written when the system is flashed. The symbols _sdata and _edata, on the other hand, hold the start and end address of the region in SRAM into which .data is copied by the Reset_Handler in the startup code. The labels CopyDataInit: and LoopCopyDataInit: mark the routines that copy the contents of .data from flash into SRAM.

→ example-stm32-p1(P1_TZEN/Secure/STM32L552ZETXQ_FLASH.ld)

  /* Used by the startup to initialize data */
  _sidata = LOADADDR(.data);

  /* Initialized data sections into "RAM" Ram type memory */
  .data : 
  {
    . = ALIGN(8);
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */

    . = ALIGN(8);
    _edata = .;        /* define a global symbol at data end */
    
  } >RAM AT> ROM

In many systems there are different types of memory, most often flash and RAM. Firmware is programmed into flash; during reset the system is initialized and some data, e.g. the .data section, is copied from flash into RAM. The linker supports this by distinguishing between a Virtual Memory Address (VMA) and a Load Memory Address (LMA). The VMA is the address a section has while the program runs, i.e. where its data is copied to; the LMA is the address the section is loaded (flashed) to, i.e. where the startup code copies it from. In systems where the whole ELF file is loaded into RAM, as on your personal computer at home, LMA and VMA do not differ.

By specifying >RAM AT> ROM, we define that .data has its VMA in RAM and its LMA in ROM. The linker script function LOADADDR() returns the LMA of a section. In our example the linker script assigns LOADADDR(.data) to _sidata, and the startup code in the reset handler uses _sidata as the address to read .data from.

→ example-stm32-p1(P1_TZEN/Secure/Core/Startup/startup_stm32l552zetxq.s)

  b LoopCopyDataInit
  
CopyDataInit:
    ldr r3, =_sidata
    ldr r3, [r3, r1]
    str r3, [r0, r1]
    adds    r1, r1, #4

LoopCopyDataInit:
    ldr r0, =_sdata
    ldr r3, =_edata
    adds    r2, r0, r1
    cmp r2, r3
    bcc CopyDataInit

The two routines load R3 with the LMA (_sidata) and R0 with the VMA (_sdata) of .data. CopyDataInit: loads a word relative to R3 and stores it relative to R0 into RAM. R1 holds the byte offset used with both R0 and R3 and thereby tracks how much has been copied. The loop finishes as soon as R0 + R1 reaches _edata, i.e. when all words are copied.
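
For readers who prefer C over assembly, the copy loop can be sketched as follows. The function and parameter names are hypothetical; the parameters stand in for the linker symbols _sidata, _sdata and _edata, and the sketch uses a word index where the assembly uses a byte offset in R1.

```c
#include <stddef.h>
#include <stdint.h>

/* C sketch of CopyDataInit/LoopCopyDataInit.
 * load_addr plays the role of _sidata (LMA, flash),
 * start/end play the roles of _sdata/_edata (VMA, SRAM). */
void copy_data_init(const uint32_t *load_addr, uint32_t *start, const uint32_t *end)
{
    size_t index = 0;                  /* r1, counted in words here */

    while (&start[index] < end) {      /* cmp r2, r3 / bcc CopyDataInit */
        start[index] = load_addr[index];
        index++;                       /* adds r1, r1, #4 */
    }
}
```

On a host the function can be tried with two ordinary arrays, one standing in for flash and one for SRAM. If start equals end, i.e. .data is empty, nothing is copied.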


Next up is the .bss section:

→ example-stm32-p1(P1_TZEN/Secure/STM32L552ZETXQ_FLASH.ld)

  /* Uninitialized data section into "RAM" Ram type memory */
  . = ALIGN(8);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(8);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

We already learned in chapter 2 that .bss does not hold any data. All variables in .bss are initialized with zero, as you can see for yourself in the routines labeled FillZerobss and LoopFillZerobss in the startup code:

→ example-stm32-p1(P1_TZEN/Secure/Core/Startup/startup_stm32l552zetxq.s)

    ldr r2, =_sbss
    b   LoopFillZerobss
/* Zero fill the bss segment. */
FillZerobss:
    movs    r3, #0
    str r3, [r2], #4

LoopFillZerobss:
    ldr r3, = _ebss
    cmp r2, r3
    bcc FillZerobss

The startup code uses the symbols provided by the linker script, which define the start (_sbss) and end address (_ebss) of the output section .bss, to fill the section in RAM with zeros. Nothing is loaded from flash.
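
The zero-fill loop translates to C just as directly. Again this is a sketch with hypothetical names; start and end stand in for the linker symbols _sbss and _ebss.

```c
#include <stdint.h>

/* C sketch of FillZerobss/LoopFillZerobss.
 * start/end play the roles of _sbss/_ebss. */
void fill_zero_bss(uint32_t *start, const uint32_t *end)
{
    uint32_t *p = start;               /* r2 */

    while (p < end) {                  /* cmp r2, r3 / bcc FillZerobss */
        *p++ = 0;                      /* str r3, [r2], #4 (post-increment) */
    }
}
```

Because both symbols are aligned to 8 bytes in the linker script, filling word by word never writes past _ebss.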


After initializing .data and .bss, Reset_Handler calls SystemInit, which initializes the SAU (details in chapter 7.2) and VTOR for NS world. Before calling main, __libc_init_array is called, which in turn calls all libc initialization functions located in the sections .preinit_array, .init_array and .init. These were also written by the linker into the ROM segment:

→ example-stm32-p1(P1_TZEN/Secure/STM32L552ZETXQ_FLASH.ld)

  .preinit_array     :
  {
    . = ALIGN(8);
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
    . = ALIGN(8);
  } >ROM
  
  .init_array :
  {
    . = ALIGN(8);
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
    . = ALIGN(8);
  } >ROM

The symbols defined here, e.g. __init_array_start, are used by the C library (newlib, init.c) to calculate the number of entries in each array:

  #ifdef HAVE_INITFINI_ARRAY

  /* These magic symbols are provided by the linker.  */
  extern void (*__preinit_array_start []) (void) __attribute__((weak));
  extern void (*__preinit_array_end []) (void) __attribute__((weak));
  extern void (*__init_array_start []) (void) __attribute__((weak));
  extern void (*__init_array_end []) (void) __attribute__((weak));

  extern void _init (void);

  /* Iterate over all the init routines.  */
  void
  __libc_init_array (void)
  {
    size_t count;
    size_t i;

    count = __preinit_array_end - __preinit_array_start;
    for (i = 0; i < count; i++)
      __preinit_array_start[i] ();

    _init ();

    count = __init_array_end - __init_array_start;
    for (i = 0; i < count; i++)
      __init_array_start[i] ();
  }
  #endif

Each address placed into the sections .preinit_array and .init_array is called via __preinit_array_start[i]() and __init_array_start[i](), respectively.
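
One way to get a function pointer into .init_array from C is the constructor attribute, a GCC and Clang extension. The following minimal sketch uses a hypothetical counter to show that such a function has already run by the time main() is entered.

```c
/* Hypothetical counter, only here to make the call observable. */
static int init_calls;

/* GCC and Clang emit a pointer to this function into .init_array,
 * so the startup code calls it before main(). */
__attribute__((constructor))
static void early_setup(void)
{
    init_calls++;
}

/* Hypothetical accessor for the counter. */
int init_call_count(void)
{
    return init_calls;
}
```

In a C++ project the constructors of objects with static storage duration end up in .init_array in the same way, which is why Reset_Handler labels the call to __libc_init_array with "Call static constructors".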


A graphical summary of this chapter:

Figure 5.2: boot process graphical summary

5.4 Non-Secure Callable segment

Terms explained in this chapter:

  • SG veneer
  • CMSE import library

This chapter applies only to secure world firmware. As discussed in chapter 4, the security state (secure or non-secure) is determined by the memory region the processor executes from. When the core executes code from non-secure memory, the state is NS. When the processor executes code from a region that is attributed as secure, it runs in secure state and must fetch instructions from secure memory. The security attribution of memory regions is defined by the IDAU and SAU.

More on Implementation Defined Attribution Unit:

  • Chapter 4.5: IDAU and SAU: Security attribution
  • Chapter 10.2.4: STM32L5: Memory Map

More on Security Attribution Unit:

  • Chapter 4.5: IDAU and SAU: Security attribution
  • Chapter 5.3: CMSIS: ROM segment and Boot Process
  • Chapter 7.2: SAU Initialization
  • Chapter 10.2.4: STM32L5: Memory Map

A quick recap on the transition from NS to S: Three requirements must be met for a successful transition to secure world:

  • the first instruction of the transition must be a SG instruction
  • the processor must be in NS state, when SG is executed
  • there is a special region, which is the only one allowed to hold SG instructions: the NSC region

The following shows how the NSC region is defined in the linker script.

→ example-stm32-p1(P1_TZEN/Secure/STM32L552ZETXQ_FLASH.ld)

 .gnu.sgstubs :
  {
    . = ALIGN(8);
    *(.gnu.sgstubs*)   /* Secure Gateway stubs */
    . = ALIGN(8);
  } >ROM_NSC

The output section .gnu.sgstubs is written into the ROM_NSC segment. The input sections .gnu.sgstubs* contain the secure gateway veneers (SG veneers) generated by the linker. Let’s go through the process using a very simple example and assume you defined the following function in your secure world application (nsc-test.c):

int __attribute__((cmse_nonsecure_entry)) add10(int a){
        return 10+a;
}

Compile the example with: arm-none-eabi-gcc -o nsc-test.o -mcpu=cortex-m33 --specs=nosys.specs -mcmse -mthumb -c nsc-test.c

To make a function in S world reachable from NS world you must add the function attribute cmse_nonsecure_entry to it. The compiler then adds two global symbols for add10(int) to the symbol table (readelf -s nsc-test.o):

    8: 00000001    34 FUNC    GLOBAL DEFAULT    1 add10
    9: 00000001     0 FUNC    GLOBAL DEFAULT    1 __acle_se_add10

Using the following simple linker script we link nsc-test.o: arm-none-eabi-ld -o nsc-test --cmse-implib --out-implib=./nsclib.o -T gnustubs.ld nsc-test.o

gnustubs.ld:

MEMORY {
        ROM_NSC (rx)    : ORIGIN = 0x0C03E000,  LENGTH = 8K
}
SECTIONS {
  .gnu.sgstubs :
    {
      . = ALIGN(8);
      *(.gnu.sgstubs*)
  } >ROM_NSC
}

When the linker finds two symbols at the same address, one named __acle_se_$name and the other $name, it creates an SG veneer for that function and places it in an input section named .gnu.sgstubs. Obviously the linker script shown is far from complete, and the linker will use default values for all other memory regions and output sections. Normally GCC uses a default linker script, but when building a CMSE enabled firmware we need to define the location of the non-secure callable segment explicitly to build the firmware successfully. In the example, the final binary has the SG veneers placed in the ROM_NSC segment.

The parameters --cmse-implib and --out-implib=./nsclib.o create an import library named nsclib.o. This file contains a symbol for add10(int) without any code, pointing to the SG veneer in ROM_NSC:

Symbol table '.symtab' contains 2 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
   0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
   1: 0c03e041     8 FUNC    GLOBAL DEFAULT  ABS add10

You can now use nsclib.o when linking object files for NS world. The linker resolves calls to add10() to the address of the SG veneer in the NSC segment (0x0c03e041; bit 0 marks a Thumb function). Having an import library for the secure world firmware allows the non-secure firmware to be built completely independently of the secure world firmware. The developers of the secure world just pass the import library to the developers of the non-secure world; the toolchain used to build the non-secure firmware does not even need to be CMSE aware.


More on Secure function calls:

  • Chapter 5.4: CMSIS: Non-Secure Callable segment
  • Chapter 6.2: AAPCS: Subroutine Call
  • Chapter 4.4.1: Banked Registers
  • Chapter 7.3: Details: Secure function call
  • Chapter 4.6.1: Overview: Secure function call

Let’s quickly have a look at the disassembly of a secure function call. In the following example toggle_led_secure() is the secure function, which is called from the non-secure main(). The linker not only added a SG veneer in NSC, but also a long-branch veneer, leading to the call flow:

  1. main() (NS) calls __toggle_led_secure_veneer() (NS)
  2. __toggle_led_secure_veneer() (NS) branches to the SG veneer toggle_led_secure() (NSC)
  3. the SG veneer toggle_led_secure() (NSC) branches to __acle_se_toggle_led_secure() (S), the actual implementation (shown further below)

arm-none-eabi-objdump --disassemble=main P1_TZEN_NonSecure.elf

Disassembly of section .text:

08040238 <main>:
 8040238:       b580            push    {r7, lr}
 804023a:       af00            add     r7, sp, #0
 804023c:       f000 f9af       bl      804059e <HAL_Init>
 8040240:       f000 f890       bl      8040364 <MX_GPIO_Init>
 8040244:       f000 f80c       bl      8040260 <MX_ADC1_Init>
 8040248:       f000 f884       bl      8040354 <MX_RTC_Init>
 804024c:       f44f 7100       mov.w   r1, #512        ; 0x200
 8040250:       4802            ldr     r0, [pc, #8]    ; (804025c <main+0x24>)
 8040252:       f001 fb97       bl      8041984 <HAL_GPIO_TogglePin>
 8040256:       f002 f8c3       bl      80423e0 <__toggle_led_secure_veneer>
 804025a:       e7f7            b.n     804024c <main+0x14>
 804025c:       42020000        .word   0x42020000

More on Veneers:

  • Chapter 2.10: Veneers
  • Chapter 4.6.1: Overview: Secure function call

In main(), before jumping into NSC to the SG veneer, the linker inserted an additional veneer, __toggle_led_secure_veneer(). From previous chapters we know why:

  • The SG veneer for toggle_led_secure() is at 0x0c03.e009 (0x0c03.e008 with the Thumb bit set).
  • The bl __toggle_led_secure_veneer in NS main() is at 0x0804.0256.
  • A direct branch would therefore have to span 0x03FF.DDB3 (0x0c03.e009 - 0x0804.0256).
  • bl in Armv8-M allows offsets from –16777216 to 16777214 (0x00FF.FFFE).
  • Since the jump is too far (0x03FF.DDB3 > 0x00FF.FFFE), a long-branch veneer (__toggle_led_secure_veneer()) is inserted.
  • The address of the SG veneer is stored in the literal pool (0x80423ec) and loaded from there in __toggle_led_secure_veneer().
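
The range check from the list above can be written down as a small helper. This is a sketch with hypothetical function names, not the linker's actual implementation. It clears the Thumb bit of the target before subtracting and accounts for the fact that the branch offset is relative to the address of the bl plus 4, which shifts the numbers slightly compared to the rough calculation in the list but does not change the outcome.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range of the bl immediate, as quoted in the list above. */
#define BL_MIN_OFFSET (-16777216LL)
#define BL_MAX_OFFSET (16777214LL)

/* Distance in bytes from the bl instruction to its target.
 * Bit 0 of the target (Thumb bit) is not part of the address. */
int64_t branch_distance(uint32_t bl_addr, uint32_t target)
{
    return (int64_t)(target & ~UINT32_C(1)) - (int64_t)bl_addr;
}

/* True if a direct bl cannot reach the target, so a veneer is required.
 * The encoded offset is relative to the address of the bl plus 4. */
bool needs_long_branch_veneer(uint32_t bl_addr, uint32_t target)
{
    int64_t offset = branch_distance(bl_addr, target) - 4;

    return offset < BL_MIN_OFFSET || offset > BL_MAX_OFFSET;
}
```

For the bl at 0x08040256, the SG veneer at 0x0c03e009 is out of range, whereas the long-branch veneer at 0x080423e0 is only a few kilobytes away and can be reached directly.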

arm-none-eabi-objdump --disassemble=__toggle_led_secure_veneer P1_TZEN_NonSecure.elf

Disassembly of section .text:

080423e0 <__toggle_led_secure_veneer>:
 80423e0:       b401            push    {r0}
 80423e2:       4802            ldr     r0, [pc, #8]    ; (80423ec <__toggle_led_secure_veneer+0xc>)
 80423e4:       4684            mov     ip, r0
 80423e6:       bc01            pop     {r0}
 80423e8:       4760            bx      ip
 80423ea:       bf00            nop
 80423ec:       0c03e009        .word   0x0c03e009

The NS world firmware was linked against the import library, so the call to toggle_led_secure() in NS world was resolved to the address of the SG veneer at 0x0c03.e008 in NSC (stored as 0x0c03.e009, with bit 0 set to indicate Thumb code):

arm-none-eabi-objdump --disassemble=toggle_led_secure P1_TZEN_Secure.elf

P1_TZEN_Secure.elf:     file format elf32-littlearm
                                                                                                                       
0c03e008 <toggle_led_secure>:                                                                                                                                                                                                                  
 c03e008:       e97f e97f       sg                         
 c03e00c:       f7c2 b942       b.w     c000294 <__acle_se_toggle_led_secure>                

The SG veneer switches to S world (SG) and then branches to the actual implementation of toggle_led_secure() at __acle_se_toggle_led_secure():

arm-none-eabi-objdump --disassemble=__acle_se_toggle_led_secure P1_TZEN_Secure.elf

Disassembly of section .text:                                                                                          
                                                           
0c000294 <__acle_se_toggle_led_secure>:                                                                                
 c000294:       b580            push    {r7, lr}                                                                       
 c000296:       af00            add     r7, sp, #0                                                                     
 c000298:       2180            movs    r1, #128        ; 0x80      
 c00029a:       481d            ldr     r0, [pc, #116]  ; (c000310 <__acle_se_toggle_led_secure+0x7c>)
 c00029c:       f000 ffea       bl      c001274 <HAL_GPIO_TogglePin>                                                   
 c0002a0:       bf00            nop                                                                                    
 c0002a2:       46bd            mov     sp, r7                                                                         
 c0002a4:       e8bd 4080       ldmia.w sp!, {r7, lr}                                                                  
 c0002a8:       4670            mov     r0, lr                                                                         
 c0002aa:       4671            mov     r1, lr                                                                         
 c0002ac:       4672            mov     r2, lr                                                                         
 c0002ae:       4673            mov     r3, lr                                                                         
 c0002b0:       eeb7 0a00       vmov.f32        s0, #112        ; 0x3f800000  1.0
 c0002b4:       eef7 0a00       vmov.f32        s1, #112        ; 0x3f800000  1.0
 c0002b8:       eeb7 1a00       vmov.f32        s2, #112        ; 0x3f800000  1.0
 c0002bc:       eef7 1a00       vmov.f32        s3, #112        ; 0x3f800000  1.0
 c0002c0:       eeb7 2a00       vmov.f32        s4, #112        ; 0x3f800000  1.0
 c0002c4:       eef7 2a00       vmov.f32        s5, #112        ; 0x3f800000  1.0
 c0002c8:       eeb7 3a00       vmov.f32        s6, #112        ; 0x3f800000  1.0
 c0002cc:       eef7 3a00       vmov.f32        s7, #112        ; 0x3f800000  1.0
 c0002d0:       eeb7 4a00       vmov.f32        s8, #112        ; 0x3f800000  1.0
 c0002d4:       eef7 4a00       vmov.f32        s9, #112        ; 0x3f800000  1.0
 c0002d8:       eeb7 5a00       vmov.f32        s10, #112       ; 0x3f800000  1.0
 c0002dc:       eef7 5a00       vmov.f32        s11, #112       ; 0x3f800000  1.0
 c0002e0:       eeb7 6a00       vmov.f32        s12, #112       ; 0x3f800000  1.0
 c0002e4:       eef7 6a00       vmov.f32        s13, #112       ; 0x3f800000  1.0
 c0002e8:       eeb7 7a00       vmov.f32        s14, #112       ; 0x3f800000  1.0
 c0002ec:       eef7 7a00       vmov.f32        s15, #112       ; 0x3f800000  1.0
 c0002f0:       f38e 8c00       msr     CPSR_fs, lr        
 c0002f4:       b410            push    {r4}
 c0002f6:       eef1 ca10       vmrs    ip, fpscr
 c0002fa:       f64f 7460       movw    r4, #65376      ; 0xff60                                                       
 c0002fe:       f6c0 74ff       movt    r4, #4095       ; 0xfff                                                        
 c000302:       ea0c 0c04       and.w   ip, ip, r4
 c000306:       eee1 ca10       vmsr    fpscr, ip
 c00030a:       bc10            pop     {r4}
 c00030c:       46f4            mov     ip, lr
 c00030e:       4774            bxns    lr
 c000310:       52020800        .word   0x52020800

5.5 Calling non-secure world from secure world

This section is here for reference. The linker does not need to generate special regions for S to NS transitions. Everything you need to know is covered in chapter 4.6.2.

More on Nonsecure function calls:

  • Chapter 6.2: AAPCS: Subroutine Call
  • Chapter 4.4.1: Banked Registers
  • Chapter 4.6.2: Overview: Non-Secure function call
  • Chapter 7.4: Details: Non-secure function call

5.6 Heap and Stack

From the linker file we can also learn a lot about the RAM layout of the device, for example where the heap and the stack will be located. The output sections listed in table 5.3 are written sequentially into the RAM segment, in the order in which they are defined in the linker script:

Table 5.3: RAM output sections
section explanation
.data See chapter 5.3 and chapter 5.6
.bss See chapter 5.3
._user_heap_stack See chapter 5.6

The output section ._user_heap_stack sets up the heap and stack regions. Three important symbols are defined at the beginning of the linker script:

  • _estack
  • _Min_Heap_Size
  • _Min_Stack_Size

→ example-stm32-p1(P1_TZEN/Secure/STM32L552ZETXQ_FLASH.ld)

/* Highest address of the user mode stack */
_estack = ORIGIN(RAM) + LENGTH(RAM);    /* end of "RAM" Ram type memory */

_Min_Heap_Size = 0x200 ;    /* required amount of heap  */
_Min_Stack_Size = 0x400 ;   /* required amount of stack */

/* Memories definition */
MEMORY
{
  RAM   (xrw)   : ORIGIN = 0x30000000,  LENGTH = 96K    /* Memory is divided. Actual start is 0x30000000 and actual length is 256K */
  ROM   (rx)    : ORIGIN = 0x0C000000,  LENGTH = 248K    /* Memory is divided. Actual start is 0x0C000000 and actual length is 512K */
  ROM_NSC   (rx)    : ORIGIN = 0x0C03E000,  LENGTH = 8K    /* Non-Secure Call-able region */
}

They are used to set up the size of the output section ._user_heap_stack:

→ example-stm32-p1(P1_TZEN/Secure/STM32L552ZETXQ_FLASH.ld)

 /* User_heap_stack section, used to check that there is enough "RAM" Ram  type memory left */
  ._user_heap_stack :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
    . = . + _Min_Stack_Size;
    . = ALIGN(8);
  } >RAM

The location counter . in a linker script always contains the current output location. Assigning a larger value to the location counter, as done here, moves the location forward and thereby reserves free space in the output section.

The final layout of the RAM segment:

  • The segment will start at 0x3000.0000, as defined in the MEMORY command of the linker script
  • First .data, then .bss is written into RAM.
  • Then the location counter is moved by _Min_Heap_Size and _Min_Stack_Size forward.

This results in the RAM layout shown in figure 5.3.

Figure 5.3: RAM layout

If the sum of the lengths of .data, .bss, heap and stack exceeds the LENGTH of RAM, the linker reports an error and the build fails.
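
The way the location counter walks through the RAM segment can be replayed in C. The sketch below is a simplified model with hypothetical function names: data_size and bss_size are inputs chosen by the caller, and the alignment statements inside .data and .bss are only approximated by aligning the end of each section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the linker script. */
#define RAM_ORIGIN      0x30000000u
#define RAM_LENGTH      (96u * 1024u)
#define MIN_HEAP_SIZE   0x200u
#define MIN_STACK_SIZE  0x400u

/* Equivalent of ". = ALIGN(n)" for a power-of-two n. */
uint32_t align_up(uint32_t addr, uint32_t n)
{
    return (addr + n - 1u) & ~(n - 1u);
}

/* Walks the location counter through .data, .bss and ._user_heap_stack.
 * Returns the address after ._user_heap_stack; *heap_start receives the
 * value of the symbols end/_end. */
uint32_t ram_layout_end(uint32_t data_size, uint32_t bss_size, uint32_t *heap_start)
{
    uint32_t dot = RAM_ORIGIN;

    dot = align_up(dot + data_size, 8u);       /* .data */
    dot = align_up(dot + bss_size, 8u);        /* .bss */
    *heap_start = dot;                         /* PROVIDE ( end = . ) */
    dot += MIN_HEAP_SIZE + MIN_STACK_SIZE;     /* reserve heap and stack */
    return align_up(dot, 8u);
}

/* The link fails when this would be false. */
bool ram_layout_fits(uint32_t data_size, uint32_t bss_size)
{
    uint32_t heap_start;

    return ram_layout_end(data_size, bss_size, &heap_start) <= RAM_ORIGIN + RAM_LENGTH;
}
```

With 0x10 bytes of .data and 0x58 bytes of .bss (example values, chosen to be consistent with the program headers shown in chapter 5.7), the heap starts at 0x30000068 and the reserved area ends at 0x30000668, far below the end of RAM.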

The symbol _estack is set to the end of RAM, 0x3001 8000 (0x3000 0000 + 96 KB), which is the address directly after the highest RAM address 0x3001 7FFF. The stack grows downwards as you put data on it, towards lower addresses and thus towards the heap. The beginning of the heap is defined by the symbol end (or _end). These symbols are provided to the C library and used to initialize the heap. When memory on the heap is allocated, malloc() calls sbrk(), implemented here as _sbrk(), to acquire additional heap space:

→ example-stm32-p1(P1_TZEN/Secure/Core/Src/sysmem.c)

register char * stack_ptr asm("sp");

/* Functions */

/**
 _sbrk
 Increase program data space. Malloc and related functions depend on this
**/
caddr_t _sbrk(int incr)
{
    extern char end asm("end");
    static char *heap_end;
    char *prev_heap_end;

    if (heap_end == 0)
        heap_end = &end;

    prev_heap_end = heap_end;
    if (heap_end + incr > stack_ptr)
    {
        errno = ENOMEM;
        return (caddr_t) -1;
    }

    heap_end += incr;

    return (caddr_t) prev_heap_end;
}

_sbrk() uses the provided symbol end to initialize the lowest address of the heap when called for the first time. To make sure that stack and heap do not collide, it checks whether the current heap end plus the requested number of bytes would exceed the current stack pointer. If so, it sets errno to ENOMEM and returns -1; otherwise it advances the heap end and returns its previous value.
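
The collision check is easy to try out on a host by replacing the real memory with a small array. In the following sketch, arena, sim_stack_ptr and sim_sbrk() are hypothetical stand-ins for the linker symbol end, the SP register and _sbrk(); the logic is the same as in sysmem.c.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real memory: the start of the arena plays
 * the role of the linker symbol end, its upper part is treated as stack. */
static char arena[64];
static char *sim_stack_ptr = arena + 48;   /* pretend SP currently sits here */

/* Same logic as _sbrk(), but against the simulated arena. */
void *sim_sbrk(ptrdiff_t incr)
{
    static char *heap_end;
    char *prev_heap_end;

    if (heap_end == NULL)
        heap_end = arena;                  /* heap_end = &end */

    prev_heap_end = heap_end;
    if (heap_end + incr > sim_stack_ptr) { /* heap would run into the stack */
        errno = ENOMEM;
        return (void *)-1;
    }

    heap_end += incr;
    return prev_heap_end;
}
```

A first request for 32 bytes succeeds and returns the start of the arena. A second request for 32 bytes would move the heap end past the simulated stack pointer and fails with ENOMEM, while a request for 16 bytes still fits exactly.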

More on Stack:

  • Chapter 5.6: CMSIS: Heap and Stack
  • Chapter 3.4: Arm Architecture: Execution Modes and Privilege Levels
  • Chapter 3.7: Arm Architecture: General-Purpose registers, special-purpose registers
  • Chapter 6.2: AAPCS: Subroutine Call

5.7 ELF to .bin

When flashing an SoC we do not write an ELF file into flash; an additional step of firmware image creation is required. How the final firmware image is built from an ELF file depends on the target, but for STM32L5 it is simply:

arm-none-eabi-objcopy -O binary "P1_TZEN_Secure.elf" "P1_TZEN_Secure.bin"

objcopy makes a memory dump of the ELF file. It starts at the load memory address (PhysAddr) of the lowest segment and copies the contents of the ELF segments into the .bin file, starting at file offset 0x0. Gaps between segments are filled with zeros. For example the S world ELF:

arm-none-eabi-readelf -l P1_TZEN_Secure.elf (program headers and section to segment mapping shown)

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x010000 0x0c000000 0x0c000000 0x03270 0x03270 RWE 0x10000
  LOAD           0x020000 0x30000000 0x0c003270 0x00010 0x00010 RW  0x10000
  LOAD           0x02e000 0x0c03e000 0x0c03e000 0x00020 0x00020 R E 0x10000
  LOAD           0x030010 0x30000010 0x30000010 0x00000 0x00658 RW  0x10000

 Section to Segment mapping:
  Segment Sections...
   00     .isr_vector .text .rodata .init_array .fini_array 
   01     .data 
   02     .gnu.sgstubs 
   03     .bss ._user_heap_stack 

  • The .bin file starts with the contents of segment 00 (0x0C00.0000).
  • It is directly followed by the contents of segment 01, which starts at file offset 0x0000.3270 (0x0c00.3270 - 0x0c00.0000).
  • The space between segment 01 and 02 is filled with zeros up to file offset 0x0003.e000, where segment 02 starts.
  • The contents of segment 02 follow.
  • Segment 03 has no file content (FileSiz is 0), so nothing is written into the .bin for it.

The final binary blob is written into the STM32L5 at flash address 0x0C00.0000. When the device boots from 0x0c00.0000, all offsets in the binary line up with the addresses assigned by the linker and the firmware runs successfully.
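
The offsets inside the .bin file follow directly from the PhysAddr column. The sketch below is a simplified model of that calculation, not a description of how objcopy is implemented; the struct and function names are hypothetical, the numbers are taken from the readelf output above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a LOAD program header. */
struct load_segment {
    uint32_t paddr;    /* PhysAddr (LMA) */
    uint32_t filesz;   /* FileSiz: bytes actually present in the ELF */
};

/* Values taken from the readelf output above. */
static const struct load_segment segments[] = {
    { 0x0c000000u, 0x03270u },   /* 00: .isr_vector .text .rodata ... */
    { 0x0c003270u, 0x00010u },   /* 01: .data */
    { 0x0c03e000u, 0x00020u },   /* 02: .gnu.sgstubs */
    { 0x30000010u, 0x00000u },   /* 03: .bss ._user_heap_stack, no file content */
};

/* Lowest PhysAddr among the segments that contribute bytes. */
uint32_t image_base(const struct load_segment *seg, size_t n)
{
    uint32_t base = UINT32_MAX;

    for (size_t i = 0; i < n; i++)
        if (seg[i].filesz != 0 && seg[i].paddr < base)
            base = seg[i].paddr;
    return base;
}

/* Offset of a segment inside the .bin file. */
uint32_t bin_offset(const struct load_segment *s, uint32_t base)
{
    return s->paddr - base;
}

/* Size of the resulting .bin: end of the highest contributing segment. */
uint32_t bin_size(const struct load_segment *seg, size_t n)
{
    uint32_t base = image_base(seg, n);
    uint32_t size = 0;

    for (size_t i = 0; i < n; i++) {
        if (seg[i].filesz == 0)
            continue;                /* e.g. .bss: nothing to write */
        uint32_t end = bin_offset(&seg[i], base) + seg[i].filesz;
        if (end > size)
            size = end;
    }
    return size;
}
```

Under this model the image is 0x3e020 bytes long: segment 02 starts at file offset 0x3e000 and contributes 0x20 bytes, and everything between the end of .data and 0x3e000 is fill.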