Chapter 3 Basics: Arm M-profile architectures and Cortex-M
3.1 Introduction
This chapter provides an overview of the concepts that are universal across all Arm M-profile architecture cores and all Cortex-M implementations. Once you understand these universal concepts, you become much faster and more efficient at understanding implementations from different vendors, for example systems built by STMicroelectronics or Silicon Labs around Arm Cortex-M cores.
3.2 Architectures and Core Implementations
The Arm architecture defines things like the instruction sets, registers and the memory system. Vendors can either buy only the architecture license and implement their own core using the licensed intellectual property (IP) or they can buy an additional IP-core license, which then includes IP for a whole microcontroller, including debug interfaces, interfaces to the bus, an interrupt controller, maybe even MPUs and FPUs.
Arm has three architecture profiles:
Application profile (-A): This is what you would most likely encounter in a smartphone, some notebooks or even servers.
Application profiles implement a traditional ARM architecture with multiple modes and support a virtual memory system architecture based on an MMU. These profiles support both ARM and Thumb instruction sets. [3]
Real-Time profile (-R): An R-profile core could also be found in a smartphone, but more likely in the baseband processor. One example is the Cortex-R8, which was designed for 5G modems.
Real-time profiles implement a traditional ARM architecture with multiple modes and support a protected memory system architecture based on an MPU. [3]
Microcontroller profile (-M): This profile is most common in the small devices that are going to make up a huge part of our smart world. Maybe your next refrigerator?
Microcontroller profiles implement a programmers’ model designed for fast interrupt processing, with hardware stacking of registers and support for writing interrupt handlers in high-level languages. The processor is designed for integration into an FPGA and is ideal for use in very low power applications. [3]
In this chapter we focus on the M-profile, of which three versions exist:
- Armv6-M
- Armv7-M
- Armv8-M
These architecture versions are implemented by different cores:
- Cortex-M0, -M0+, -M1 implement Armv6-M
- Cortex-M3, -M4, -M7 implement Armv7-M
- Cortex-M23, -M33 implement Armv8-M
The architecture versions and the cores implementing them differ a lot in features, for example in the supported instruction set or hardware features. The following chapters aim to show the basic concepts universal to all of them. Even with some major differences it is important to know the commonalities, because knowing those eases the process of learning about the differences and about new, embedded-related topics.
When an address or feature in the following chapters is marked as being “implementation defined”, that means that Arm does not predefine the value, but the implementers need to define that value.
3.3 Arm -M profile (Armv6-M, Armv7-M, Armv8-M)
The Arm M-profile ranges from low-performance, small microcontrollers (e.g. Cortex-M0) to higher-performance, bigger ones (e.g. Cortex-M33). In general, M-profile cores are optimized for cost and energy efficiency. They are used in millions and millions of consumer devices, and I am pretty convinced that you already have at least a handful of them deployed somewhere around you.
Wikipedia has a nice overview on features available in the different Cortex-M cores. There you will also find a list of microcontrollers using Cortex-M cores.
3.4 Execution Modes and Privilege Levels
Terms explained in this chapter:
- Execution Mode
- Privilege Level
- Shadowed Stack Pointer
- Process Stack Pointer
- Main Stack Pointer
- Handler Mode
- Thread Mode
If you have a bit of experience in the world of classical computers, and especially security, you might already be familiar with the idea of different privileges in a system. The kernel on your Linux box, for example, has higher privileges than your non-root user. The M-profile implements features which allow running an operating system with similar compartmentalization, that is, a higher privileged kernel and lower privileged applications. This has the advantage that a crashed application does not affect the stability of the whole system.
Arm cores support this compartmentalization by defining so-called privilege levels. A normal application would execute in thread mode without privileges (unprivileged). The operating system would also run in thread mode, but privileged. Privileged execution allows write access to some registers which influence the operation of the whole core. When the core runs an exception handler (details in 3.6), for example when a peripheral needs the attention of the core to handle arriving data, the core runs in handler mode. Handler mode is always privileged.
When a Memory Protection Unit (MPU) is available in the core, different memory access permissions can be assigned to unprivileged thread mode, which prevents applications from influencing the OS kernel.
The core always starts in privileged thread mode.
Simple use cases obviously can ignore unprivileged thread mode and switch only between privileged handler and thread mode. Complex embedded operating systems like FreeRTOS or Zephyr use these features to separate user threads and kernel threads.
The Monodon firmware, which I implemented for the Embedded Protocols section, uses Interrupts to send and receive data via USART and I2C. It also has priorities configured for the respective interrupt handlers. The STM32 HAL is used to configure the NVIC. Check it out to see an example for the Cortex-M NVIC in a real world system like the Bluepill.
A shadowed stack pointer allows the stacks for privileged and unprivileged code to be in different regions of memory, which further improves the stability of the system. M-profile cores have a Main Stack Pointer (MSP) and a Process Stack Pointer (PSP). By using MSP in privileged code and PSP in unprivileged code, setups like the ones shown in figure 3.2 are possible. Developers can either use the MSP or PSP registers directly or use the alias SP, which always refers to the currently selected stack pointer. To use PSP, the core has to be configured accordingly. In the default state only MSP is used.
3.4.1 Configuration
Privilege levels and the shadowed stack pointer are configured in the following register fields:
- The privilege level in Thread mode is configured in CONTROL.nPRIV.
- Which stack pointer is used is configured in CONTROL.SPSEL.
The CONTROL register is written and read using special register access instructions. See 3.7.2 for details.
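To make the two CONTROL fields more tangible, here is a minimal sketch that interprets a CONTROL value the way you would when reading a register dump. The bit positions (nPRIV in bit 0, SPSEL in bit 1) follow the Arm architecture manuals. The `ipsr` parameter (exception number of the running handler, 0 in Thread mode) and all type and function names are assumptions made for this illustration; they do not come from any vendor header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the two CONTROL fields named in the text. */
#define CONTROL_NPRIV (1u << 0) /* 0 = privileged Thread mode, 1 = unprivileged */
#define CONTROL_SPSEL (1u << 1) /* 0 = MSP, 1 = PSP (only relevant in Thread mode) */

/* Hypothetical summary of the state a debugger would show. */
typedef struct {
    bool handler_mode; /* true while an exception handler is running */
    bool privileged;   /* true if the current code runs privileged */
    bool uses_psp;     /* true if SP currently refers to PSP */
} core_state_t;

/* control: value of the CONTROL register.
 * ipsr:    exception number of the running handler, 0 in Thread mode. */
static core_state_t decode_core_state(uint32_t control, uint32_t ipsr)
{
    core_state_t s;
    s.handler_mode = (ipsr != 0u);
    /* Handler mode is always privileged; Thread mode depends on nPRIV. */
    s.privileged = s.handler_mode || ((control & CONTROL_NPRIV) == 0u);
    /* Handler mode always uses MSP; Thread mode depends on SPSEL. */
    s.uses_psp = !s.handler_mode && ((control & CONTROL_SPSEL) != 0u);
    return s;
}
```

A CONTROL value of 0 with `ipsr` 0 describes the reset state from above: privileged Thread mode on MSP. A value of 3 in Thread mode describes a typical unprivileged application thread running on PSP.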
3.5 Memory System
Terms explained in this chapter:
- System Control Block
- System Control Space
- System Address Map
M-profile implementations have a flat 32-bit address space, resulting in 4 gigabytes of addressable memory. Some regions of the memory map are predefined by Arm, so they are common across all implementations such as the Cortex-M cores. These regions are:
- code
- SRAM
- internal peripherals
- external RAM
- external peripheral
- the system region.
The cores differ in how they are connected to memory and peripherals. Some Cortex-M cores have multiple bus interfaces: one for instruction fetches when the core accesses an address in the code region, a second one for reads and writes to SRAM, and maybe even a third one for accesses to peripherals. Others have only one bus interface.
The Cortex-M33 for example has the C-AHB and the S-AHB bus. Fetches for code regions are made with C-AHB, accesses to other regions using S-AHB interface.
Future versions of this resource will include a chapter on the Arm bus system.
The system address map is split into regions of 0.5 GB each. Vendors might split each region further to implement different features, but having a rough understanding of the structure of the Arm M-profile system address map is especially helpful when examining firmware through reverse engineering. The following address map applies to all cores implementing an M-profile architecture, for example all Cortex-M cores.
Address | Region | Example Usage [4] |
---|---|---|
0x0000.0000- 0x1FFF.FFFF (0.5 GB) | Code | ROM or flash memory |
0x2000.0000- 0x3FFF.FFFF (0.5 GB) | SRAM | on-chip RAM |
0x4000.0000- 0x5FFF.FFFF (0.5 GB) | Peripheral | on-chip peripherals |
0x6000.0000- 0x7FFF.FFFF (0.5 GB) | RAM | cached memory (write-back, write-allocate) |
0x8000.0000- 0x9FFF.FFFF (0.5 GB) | RAM | write-through memory |
0xA000.0000- 0xDFFF.FFFF (1 GB) | Devices | shared (0.5GB) and non-shared devices (0.5GB) |
0xE000.0000- 0xE00F.FFFF (1 MB) | Private Peripheral Bus (PPB) | Used by the core. Includes System Control Space, Debug |
0xE010.0000- 0xFFFF.FFFF (511 MB) | Vendor Region | Vendor specific |
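When you triage addresses in a disassembly, it helps to have the table above as a small lookup function. The following is a minimal sketch of such a helper; the enum and function names are made up for this example, and the boundaries are simply the ones from the table.

```c
#include <stdint.h>

/* Regions of the M-profile system address map, as listed in the table. */
typedef enum {
    REGION_CODE,
    REGION_SRAM,
    REGION_PERIPHERAL,
    REGION_RAM,
    REGION_DEVICE,
    REGION_PPB,
    REGION_VENDOR
} region_t;

/* Map a 32-bit address to its architecturally defined region. */
static region_t region_of(uint32_t addr)
{
    if (addr <= 0x1FFFFFFFu) return REGION_CODE;
    if (addr <= 0x3FFFFFFFu) return REGION_SRAM;
    if (addr <= 0x5FFFFFFFu) return REGION_PERIPHERAL;
    if (addr <= 0x9FFFFFFFu) return REGION_RAM;    /* both external RAM regions */
    if (addr <= 0xDFFFFFFFu) return REGION_DEVICE; /* shared and non-shared devices */
    if (addr <= 0xE00FFFFFu) return REGION_PPB;
    return REGION_VENDOR;
}

/* Human-readable name for log output. */
static const char *region_name(region_t r)
{
    switch (r) {
    case REGION_CODE:       return "Code";
    case REGION_SRAM:       return "SRAM";
    case REGION_PERIPHERAL: return "Peripheral";
    case REGION_RAM:        return "RAM";
    case REGION_DEVICE:     return "Devices";
    case REGION_PPB:        return "Private Peripheral Bus";
    case REGION_VENDOR:     return "Vendor Region";
    }
    return "unknown";
}
```

For example, `region_name(region_of(0x20000100u))` yields "SRAM", which tells you that a pointer with this value most likely refers to a variable in on-chip RAM. Keep in mind that vendors subdivide these regions further, so this only gives a first, coarse classification.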
Suppose you find the following function call while reverse engineering a firmware:
c000294: b580 push {r7, lr}
c000296: af00 add r7, sp, #0
c000298: 2180 movs r1, #128
c00029a: 481d ldr r0, [pc, #116] ; 0x29a+2+116 = 0x310
c00029c: f000 ffea bl c001274
[... snip ...]
c000310: 52020800 .word 0x52020800
0x52020800 is loaded into R0, 128 (0x80) into R1. Then a branch to c001274 happens. Without having the system address map in mind, 0x52020800 could be basically anything, but if you consult the address map you see that it is within the region of on-chip peripherals. Now you have your first lead to find the purpose of that function: you can assume it is accessing a peripheral. 0x80 could be an index or a bit mask and 0x52020800 the address of a peripheral register block. In fact, the example shows code which toggles a GPIO to enable an LED on an STM32L5 development board.
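If you want to double check the address arithmetic in such a listing, you can recompute the targets from the raw opcodes. The sketch below does this for the two instructions of interest, assuming the `ldr` sits at 0x0c00029a (as the inline comment 0x29a+2+116 suggests) and the `bl` directly after it at 0x0c00029c. The encodings used are the standard 16-bit Thumb `LDR (literal)` and 32-bit Thumb `BL`; the function names are made up for this example.

```c
#include <stdint.h>

/* Address read by a 16-bit Thumb "LDR Rt, [PC, #imm]" (encoding: 01001 Rt imm8). */
static uint32_t thumb_ldr_literal_addr(uint32_t insn_addr, uint16_t insn)
{
    uint32_t imm = (uint32_t)(insn & 0xFFu) * 4u; /* imm8 counts 32-bit words */
    uint32_t pc = insn_addr + 4u;                 /* PC reads as instruction address + 4 */
    return (pc & ~3u) + imm;                      /* Align(PC, 4) + imm */
}

/* Branch target of a 32-bit Thumb BL
 * (hw1 = 11110 S imm10, hw2 = 11 J1 1 J2 imm11). */
static uint32_t thumb_bl_target(uint32_t insn_addr, uint16_t hw1, uint16_t hw2)
{
    uint32_t s = (hw1 >> 10) & 1u;
    uint32_t imm10 = hw1 & 0x3FFu;
    uint32_t j1 = (hw2 >> 13) & 1u;
    uint32_t j2 = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7FFu;
    uint32_t i1 = (~(j1 ^ s)) & 1u;
    uint32_t i2 = (~(j2 ^ s)) & 1u;
    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1);

    if (s != 0u) {
        imm |= 0xFE000000u; /* sign-extend the 25-bit offset */
    }
    return insn_addr + 4u + imm; /* offset is relative to PC = address + 4 */
}
```

With the opcodes from the listing, `thumb_ldr_literal_addr(0x0c00029au, 0x481du)` gives 0x0c000310, the location of the `.word 0x52020800`, and `thumb_bl_target(0x0c00029cu, 0xf000u, 0xffeau)` gives 0x0c001274, the callee.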
The Private Peripheral Bus is a core-internal bus which connects debug ports and similar components to the execution unit. Registers related to the core and some core-internal peripherals like the NVIC are contained in the System Control Space (SCS).
The SCS provides registers for control, configuration and status reporting of the core. The SCS ranges from 0xE000.E000 to 0xE000.EFFF and contains configuration registers for core components like the NVIC (see 3.6.2). Within the SCS there is the System Control Block (SCB), which ranges from 0xE000.ECFC to 0xE000.ED8C. The SCB contains registers to read the status of and control system exceptions and to relocate the vector table (see 3.6).
3.6 Exceptions and Interrupts
Terms explained in this chapter:
- Exceptions
- System Exceptions
- Interrupts
- Faults
- Vector Table
- Interrupt Service Routine
- Fault handlers
- System handlers
- Exception Masking
- NVIC
If an event happens, e.g. in the core or on any peripheral, the core has to handle that event. This handling is done through exceptions. There are three categories of exceptions:
- Interrupts: Exceptions issued by peripherals.
- System Exceptions: Exceptions defined by the core; they are part of the Arm M-profile specifications.
- Faults: A special kind of system exception. They are also part of the specifications but are mostly irrecoverable.
Besides being triggered by an event in the core or a peripheral, an exception can also be triggered by software.
The core handles exceptions using exception handlers:
- Interrupt Service Routines: handle Interrupts
- Fault handlers: handle faults
- System handlers: handle system exceptions, which are not faults.
Each of these handlers has a different priority.
The exception handlers are organized in a vector table, which is basically an organized list of references to the first instruction of an exception handler.
A-profile cores also have a vector table, but their tables do not contain references but instructions. So an A-profile vector table is basically a list of branch instructions. The M-profile vector table, on the other hand, is a list of references.
We just introduced a ton of new and important terms. Let’s quickly summarize them in a graphical overview to really get familiar:
The vector table is an organized list of addresses of exception handlers. It is organized in two sections and each vector has an associated exception number. Numbers 1-15 are reserved for the core and contain the fault and system handlers. Exception numbers 16+N are implementation defined and used for interrupts. How many interrupts a core can handle differs across the M-profile cores. The Cortex-M33, for example, can handle up to 480 interrupts.
Each exception has a priority, which is a signed integer. The smaller the number, the higher the priority.
Exception | Exception Number | Priority | Description |
---|---|---|---|
Reset | 1 | -3 | Taken when system starts |
NMI | 2 | -2 | implementation specific. |
HardFault | 3 | -1 | last fault escalation, taken when no other fault handler applies |
MemManage Fault | 4 | configurable | MPU fault, e.g. XN-bit violation |
BusFault | 5 | configurable | Error on the AHB bus |
UsageFault | 6 | configurable | Exception taken e.g. when code tries to access a co-processor |
Reserved | 7-10 | configurable | |
SVCall | 11 | configurable | SuperVisor Call: OS-call |
DebugMonitor | 12 | configurable | Debug Events, breakpoints,.. |
Reserved | 13 | configurable | |
PendSV | 14 | configurable | OS-specific |
SysTick | 15 | configurable | System Tick Timer, Exception generated by a Timer which is on the core. |
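When a debugger or crash dump shows you an exception number, you usually want the name from the table above, or the interrupt index N for numbers 16+N. A minimal sketch of that mapping follows; the function names are assumptions for this illustration, and the names returned are the ones used in the table.

```c
#include <stdint.h>

/* Returns the interrupt index N for exception number 16+N,
 * or -1 if the number belongs to a system exception (or is 0). */
static int irq_from_exception(uint32_t exception_number)
{
    return (exception_number >= 16u) ? (int)(exception_number - 16u) : -1;
}

/* Name of an exception number according to the table above. */
static const char *exception_name(uint32_t exception_number)
{
    switch (exception_number) {
    case 0u:  return "none (Thread mode)";
    case 1u:  return "Reset";
    case 2u:  return "NMI";
    case 3u:  return "HardFault";
    case 4u:  return "MemManage Fault";
    case 5u:  return "BusFault";
    case 6u:  return "UsageFault";
    case 11u: return "SVCall";
    case 12u: return "DebugMonitor";
    case 14u: return "PendSV";
    case 15u: return "SysTick";
    default:
        /* 7-10 and 13 are reserved in the table; everything from 16 on is an interrupt. */
        return (exception_number >= 16u) ? "Interrupt" : "Reserved";
    }
}
```

Exception number 53, for example, is reported as "Interrupt" and `irq_from_exception(53u)` returns 37, which is the index you would then look up in the vendor's interrupt list.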
When different privilege levels are used and an operating system runs on the microcontroller, system exceptions like SVCall and PendSV are important. The SVCall is a supervisor call, which lower privileged applications can use to request privileged operations or access resources governed by the operating system. To indicate which operation the application requests, an SVCall has an SVC number associated with it.
Linux does not use the SVC number but rather ignores it. The actual system call requested by an application is passed to Linux in registers, so the SVC number can be basically anything.
See chapter 6 for some details.
The NMI is an interrupt which is not maskable (see 3.6.3). The condition that triggers this interrupt is implementation defined by the SoC vendor.
On some cores the location of the vector table can be configured using the VTOR register, which is part of the SCB. The address on reset is implementation defined. For Armv6-M, VTOR is optional.
In table 3.1 the exception number shows the position of the exception handler in the vector table. The vector table starts with the Reset vector at position 1; position 0 holds the initial value of the Main Stack Pointer (MSP). The very first thing Cortex-M cores do is load the main stack pointer to set up the stack and then execute the Reset exception handler.
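For reverse engineering this layout is very handy: the first two words of a raw firmware image usually tell you where the stack lives and where execution starts. The sketch below reads them from a byte buffer. It assumes a little-endian image that begins with the vector table, and it clears bit 0 of the handler address, which on M-profile cores is set to mark Thumb code. All names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from a raw firmware image. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t initial_msp;   /* vector table position 0 */
    uint32_t reset_handler; /* vector table position 1, Thumb bit cleared */
} boot_vectors_t;

/* Returns 0 on success, -1 if the image is too short to hold both entries.
 * Assumption: the image starts with the vector table. */
static int read_boot_vectors(const uint8_t *image, size_t len, boot_vectors_t *out)
{
    if (len < 8u) {
        return -1;
    }
    out->initial_msp = read_le32(image);
    out->reset_handler = read_le32(image + 4) & ~1u;
    return 0;
}
```

If the first word points into the SRAM region and the second into the code region, you are very likely looking at a vector table. If not, the image may have a vendor header in front of it or may be loaded at a different offset.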
Each exception handler has either a configurable or a fixed priority. The smaller the number, the higher the priority: -3 is the highest priority, which only the reset handler has. When an exception with a higher priority than the current execution priority is triggered, the new exception preempts the current one. This is called a nested exception.
When an exception handler for one exception is executing, it has the state Active. There are 4 types of exception states:
- Active: The core has started executing the corresponding handler. An exception is active even when another exception preempted it. It stays active as long as it has not yet returned from the handler.
- Pending: The exception was generated, but the core has not yet started executing the corresponding handler.
- Inactive: Neither pending nor active.
- Active and Pending: One instance of the exception is active and a second instance of the exception is pending. This state is only available for asynchronous exceptions.
Obviously there are many configuration and implementation details associated with exceptions. There are different available priority levels, strategies for handling exception entry, exception return, nested scenarios, and so on. All these details might be interesting, but they are not important for reverse engineering and hacking embedded firmware.
System exceptions (exception number 1-15) are part of the core. Interrupts are reported to the NVIC (see 3.6.2), which is a separate but closely related block to the processing unit in the core. Since system exceptions are part of the core, they are configured in the SCB. Interrupts on the other hand are configured in the NVIC.
More on Arm Exception System:
- Chapter 3.6.2: Arm Architecture:NVIC
- Chapter 3.6.3: Arm Architecture: Masking
- Chapter 6.3: Exception Entry and Return
- Chapter 10.2.5.2: STM32L5: TrustZone Illegal Access Controller (TZIC)
- Chapter 3.4: Arm Architecture: Execution Modes and Privilege Levels
- Chapter 3.6: Arm Architecture:Exceptions, Interrupts, Faults
- Chapter 7.5: Secure and non-secure exceptions
3.6.1 System Exception and Vector Table Configuration
System exceptions and the location of the vector table are configured in the System Control Block. The most important registers to configure system exceptions are:
- Clearing pending state and setting pending state of NMI, PendSV, SysTick: ICSR (Interrupt Control and State Register)
- Enabling faults and reading the pending status of all system exceptions: SHCSR (System Handler Control and State Register)
- Offset from 0x0000.0000 where the vector table is located: VTOR (Vector Table Offset Register)
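As a rough illustration of what these registers look like from the firmware side, the sketch below lists their addresses inside the SCB and decodes a few ICSR fields from a raw value, for example one read from a memory dump. The addresses and bit positions are taken from the Arm architecture manuals; the struct and function names are assumptions for this example and not part of CMSIS or any HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Addresses of the three registers within the System Control Block. */
#define SCB_ICSR_ADDR  0xE000ED04u
#define SCB_VTOR_ADDR  0xE000ED08u
#define SCB_SHCSR_ADDR 0xE000ED24u

/* Selected ICSR fields. */
#define ICSR_NMIPENDSET      (1u << 31) /* NMI pending */
#define ICSR_PENDSVSET       (1u << 28) /* PendSV pending */
#define ICSR_PENDSTSET       (1u << 26) /* SysTick pending */
#define ICSR_VECTACTIVE_MASK 0x1FFu     /* exception number of the active handler */

typedef struct {
    bool nmi_pending;
    bool pendsv_pending;
    bool systick_pending;
    uint32_t active_exception; /* 0 means Thread mode */
} icsr_info_t;

/* Decode a raw ICSR value. */
static icsr_info_t decode_icsr(uint32_t icsr)
{
    icsr_info_t info;
    info.nmi_pending = (icsr & ICSR_NMIPENDSET) != 0u;
    info.pendsv_pending = (icsr & ICSR_PENDSVSET) != 0u;
    info.systick_pending = (icsr & ICSR_PENDSTSET) != 0u;
    info.active_exception = icsr & ICSR_VECTACTIVE_MASK;
    return info;
}
```

An ICSR value of 0x1000000B, for instance, decodes to "SVCall handler (exception 11) active, PendSV pending", a pattern you could expect to see in an RTOS that requests a context switch from within a system call. In firmware, a write of 0x10000000 to 0xE000ED04 is the typical way to set PendSV pending.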
3.6.2 NVIC
The M-profile architecture provides an interrupt controller which is closely integrated with the core. The number of external interrupt lines supported depends on the architecture, and how many interrupts are actually used depends on the SoC vendor, e.g. on the number of peripherals connected to the interrupt controller. The integrated interrupt controller in M-profile cores is called the Nested Vectored Interrupt Controller (NVIC).
Each interrupt can be enabled and disabled by software, and they all share the same life cycle of inactive, pending and active states, as mentioned in 3.6.
Interrupt lines connect the peripherals to the NVIC and the NVIC to the core. When an interrupt is triggered, this is reported to the core, which then runs the corresponding handler.
3.6.2.1 Interrupt Configuration
Depending on the architecture, a maximum number of possible interrupts is defined. For Armv8-M that is 480; the register index n in the names below would then be n = {0..15}.
Enable or disable interrupts:
- Enable or read enable state of interrupts: NVIC_ISERn registers (Interrupt Set-Enable Register)
- Disable or read enable state of interrupts: NVIC_ICERn registers (Interrupt Clear-Enable Register)
Set priority:
- Priorities are configured using the NVIC_IPRn registers (Interrupt Priority Register)
Set or clear pending state and active state:
- Set or read pending state: NVIC_ISPRn (Interrupt Set-Pending Register)
- Clear or read pending state: NVIC_ICPRn (Interrupt Clear-Pending Register)
- Read active state: NVIC_IABRn (Interrupt Active Bit Register)
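To see how an interrupt number maps onto these register banks, consider the following minimal sketch. It relies on two facts from the Arm architecture manuals: the set-enable bank starts at 0xE000E100 with one bit per interrupt and 32 interrupts per register, and the priority bank starts at 0xE000E400 with one byte per interrupt. The struct and function names are invented for this example.

```c
#include <stdint.h>

#define NVIC_ISER_BASE 0xE000E100u /* NVIC_ISER0 */
#define NVIC_IPR_BASE  0xE000E400u /* NVIC_IPR0 */

typedef struct {
    uint32_t reg_index;     /* the n in NVIC_ISERn, NVIC_ICERn, ... */
    uint32_t bit;           /* bit position inside that register */
    uint32_t iser_addr;     /* address of NVIC_ISERn */
    uint32_t ipr_byte_addr; /* address of the priority byte of this interrupt */
} nvic_slot_t;

/* Locate the enable bit and priority byte for interrupt number irq. */
static nvic_slot_t nvic_slot(uint32_t irq)
{
    nvic_slot_t s;
    s.reg_index = irq / 32u;
    s.bit = irq % 32u;
    s.iser_addr = NVIC_ISER_BASE + 4u * s.reg_index;
    s.ipr_byte_addr = NVIC_IPR_BASE + irq;
    return s;
}
```

For interrupt 37 this yields register index 1, bit 5, so enabling it means writing `1 << 5` to NVIC_ISER1 at 0xE000E104. When you see a store of a single-bit constant to an address in the 0xE000E100 range during reverse engineering, you can run this calculation backwards to find out which interrupt is being enabled.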
3.6.3 Masking
As you might know from work, sometimes it is important to have some time for uninterrupted work. The same applies to a core executing an important task, for example something real-time critical. To get some uninterrupted time the core can mask exceptions. Masking means that the current execution priority is boosted to a specific value, with the result that exceptions of the same or lower priority are not executed. There are three masking registers, and each boosts the priority to a specific level:
- PRIMASK: Set current execution priority to 0, masking all exceptions except NMI and HardFault.
- FAULTMASK: Set current execution priority to -1, masking even HardFault.
- BASEPRI: Configurable Priority Masking
Armv6-M has some limitations on masking, for example only PRIMASK is available.
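The boosting rule can be written down as a small model. The sketch below is a simplification under stated assumptions: it ignores priority grouping and the fact that real cores implement only some of the priority bits, and it treats a BASEPRI value of 0 as "disabled", as the architecture defines. The type and function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified view of the three masking registers. */
typedef struct {
    bool primask;    /* PRIMASK set: boost to priority 0 */
    bool faultmask;  /* FAULTMASK set: boost to priority -1 */
    uint8_t basepri; /* BASEPRI: 0 = disabled, otherwise boost to this value */
} mask_regs_t;

/* Execution priority after applying the masks.
 * base_priority is the priority of the running code; for Thread mode
 * pass a value larger than any configurable priority, e.g. 256. */
static int execution_priority(int base_priority, mask_regs_t m)
{
    int prio = base_priority;

    if (m.basepri != 0u && (int)m.basepri < prio) {
        prio = (int)m.basepri;
    }
    if (m.primask && prio > 0) {
        prio = 0;
    }
    if (m.faultmask && prio > -1) {
        prio = -1;
    }
    return prio;
}

/* An exception is taken only if its priority value is strictly smaller
 * (that is, more urgent) than the current execution priority. */
static bool can_preempt(int exception_priority, int base_priority, mask_regs_t m)
{
    return exception_priority < execution_priority(base_priority, m);
}
```

With PRIMASK set, an interrupt with priority 5 has to wait, while HardFault (-1) and NMI (-2) still get through. With FAULTMASK set, only NMI remains.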
3.7 Registers
Terms explained in this chapter:
- General-Purpose Register
- Special-Purpose Register
- variable registers
- argument registers
- Link Register
- Stack Pointer
3.7.1 General-purpose registers
There are 15 general-purpose 32-bit registers in all M-profile architecture cores:
- R0-R12
Two more of them have dedicated names:
- R13: SP (Stack Pointer)
- R14: LR (Link Register)
Additionally there is the Program Counter:
- R15: PC (Program Counter)
Sometimes registers are referred to by their aliases, which are:
Register | Alias |
---|---|
R9 | SB or Static Base |
R10 | SL or Stack Limit |
R11 | FP or Frame Pointer |
R12 | IP or intra-procedure-call scratch |
R13 | SP or Stack Pointer |
R14 | LR or Link Register |
R15 | PC or Program counter |
Registers can also be categorized into argument (or scratch) registers and variable registers. Argument registers are used to pass arguments between functions (more on that in chapter 6) and variable registers are used within functions to store and work with variable values.
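As a rough orientation, the split between argument and variable registers can be expressed as a lookup. The ranges used below (R0-R3 as argument registers, R4-R11 as variable registers) are taken from the Arm procedure call standard (AAPCS), not from this chapter, so treat them as an assumption until the calling convention is covered in detail. The enum and function names are invented for this example.

```c
/* Typical role of a core register under the AAPCS calling convention. */
typedef enum {
    ROLE_ARGUMENT, /* R0-R3: arguments, return value, scratch */
    ROLE_VARIABLE, /* R4-R11: values a function keeps across calls */
    ROLE_IP,       /* R12: intra-procedure-call scratch */
    ROLE_SP,       /* R13: stack pointer */
    ROLE_LR,       /* R14: link register */
    ROLE_PC,       /* R15: program counter */
    ROLE_INVALID   /* not a core register number */
} reg_role_t;

/* Classify register Rn by its number. */
static reg_role_t register_role(unsigned n)
{
    if (n <= 3u) {
        return ROLE_ARGUMENT;
    }
    if (n <= 11u) {
        return ROLE_VARIABLE;
    }
    switch (n) {
    case 12u: return ROLE_IP;
    case 13u: return ROLE_SP;
    case 14u: return ROLE_LR;
    case 15u: return ROLE_PC;
    default:  return ROLE_INVALID;
    }
}
```

When reading a disassembly, this tells you where to look first: values placed in R0-R3 right before a `bl` are likely arguments, while values in R4-R11 are likely local state of the calling function.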
3.7.2 Special-purpose registers
Which special-purpose registers are available depends on the M-profile version. For Armv8-M the special-purpose registers include the registers used for masking exceptions (3.6.3). For all M-profile architecture cores the CONTROL register is also part of the special-purpose registers. The CONTROL register holds bit fields to control the privilege level Thread mode is running in, which stack pointer is used (MSP and PSP or only MSP) and other important values, depending on the core.
Special-purpose registers are accessible using two dedicated instructions, which you might encounter when reverse engineering code:
- MSR: Write. Moves a value from a general-purpose register to a special-purpose register (Move to Special register from Register).
- MRS: Read. Moves the content of a special-purpose register to a general-purpose register (Move to Register from Special register).
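If your disassembler does not annotate these instructions, you can recognize them from the raw opcodes. The sketch below decodes the 32-bit Thumb encodings of MRS and MSR as used on M-profile cores and names a few common special-purpose registers by their SYSm number (8 = MSP, 9 = PSP, 16 = PRIMASK, 17 = BASEPRI, 19 = FAULTMASK, 20 = CONTROL). It is a simplified decoder for illustration: it only checks the fixed opcode bits, it does not cover every SYSm value, and all names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool valid;      /* true if the halfwords encode MRS or MSR */
    bool is_write;   /* true for MSR (write), false for MRS (read) */
    unsigned gp_reg; /* number of the general-purpose register involved */
    unsigned sysm;   /* number selecting the special-purpose register */
} sysreg_access_t;

/* Decode the two halfwords of a 32-bit Thumb instruction. */
static sysreg_access_t decode_mrs_msr(uint16_t hw1, uint16_t hw2)
{
    sysreg_access_t a = { false, false, 0u, 0u };

    if (hw1 == 0xF3EFu && (hw2 & 0xF000u) == 0x8000u) {
        /* MRS Rd, spec_reg: hw2 = 1000 Rd SYSm */
        a.valid = true;
        a.gp_reg = (hw2 >> 8) & 0xFu;
        a.sysm = hw2 & 0xFFu;
    } else if ((hw1 & 0xFFF0u) == 0xF380u && (hw2 & 0xF300u) == 0x8000u) {
        /* MSR spec_reg, Rn: hw1 = 1111 0011 1000 Rn, hw2 low byte = SYSm */
        a.valid = true;
        a.is_write = true;
        a.gp_reg = hw1 & 0xFu;
        a.sysm = hw2 & 0xFFu;
    }
    return a;
}

/* Name of a few common special-purpose registers. */
static const char *sysm_name(unsigned sysm)
{
    switch (sysm) {
    case 8u:  return "MSP";
    case 9u:  return "PSP";
    case 16u: return "PRIMASK";
    case 17u: return "BASEPRI";
    case 19u: return "FAULTMASK";
    case 20u: return "CONTROL";
    default:  return "other";
    }
}
```

The opcode pair `f3ef 8014` decodes as a read of CONTROL into R0, and `f380 8808` as a write of R0 to MSP. Both are patterns you will typically find in startup code and in the context switch routines of an RTOS.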