Chapter 3 Basics: Arm M-profile architectures and Cortex-M

3.1 Introduction

This chapter provides an overview of the concepts that are universal across all Arm M-profile architecture cores and all Cortex-M implementations. Once you understand these universal concepts, you become much faster and more efficient at understanding different implementations from different vendors, for example systems implemented by STMicroelectronics or Silicon Labs using Arm Cortex-M cores.

3.2 Architectures and Core Implementations

The Arm architecture defines things like the instruction sets, registers and the memory system. Vendors can either buy only the architecture license and implement their own core based on the licensed architecture, or they can buy an additional IP-core license, which then includes the intellectual property (IP) for a ready-made core design, including debug interfaces, interfaces to the bus, an interrupt controller, and possibly an MPU and an FPU.

Arm has three architecture profiles:

Application profile (-A): This is what you would most likely encounter in a smartphone, some notebooks or even servers.

Application profiles implement a traditional ARM architecture with multiple modes and support a virtual memory system architecture based on an MMU. These profiles support both ARM and Thumb instruction sets. [3]

Real-Time profile (-R): An R-profile core could also be found in a smartphone, but more likely in the baseband processor. One example is the Cortex-R8, designed for 5G modems.

Real-time profiles implement a traditional ARM architecture with multiple modes and support a protected memory system architecture based on an MPU. [3]

Microcontroller profile (-M): This profile is most common in the devices that are going to make up a huge part of our smart world. Maybe your next refrigerator?

Microcontroller profiles implement a programmers’ model designed for fast interrupt processing, with hardware stacking of registers and support for writing interrupt handlers in high-level languages. The processor is designed for integration into an FPGA and is ideal for use in very low power applications. [3]

In this chapter we focus on the M-profile, of which three architecture versions are covered:

Armv6-M
Armv7-M
Armv8-M

These architecture versions are implemented by different cores:

Cortex-M0, -M0+, -M1 implement Armv6-M
Cortex-M3, -M4, -M7 implement Armv7-M
Cortex-M23, -M33 implement Armv8-M

The profiles and the cores implementing the profiles differ a lot in features, for example in the supported instruction set or hardware features. The following chapters aim to show the basic concepts universal to all cores and profiles. Even with some major differences it is important to know the commonalities, because knowing those eases the process of learning about the differences and about new embedded-related topics.

When an address or feature in the following chapters is marked as being “implementation defined”, that means that Arm does not predefine the value, but the implementers need to define that value.

3.3 Arm M-profile (Armv6-M, Armv7-M, Armv8-M)

The Arm M-profile ranges from low-performance, small microcontrollers (e.g. Cortex-M0) to higher-performance, bigger ones (e.g. Cortex-M33). In general, M-profile cores are optimized for cost and energy efficiency. They are used in millions and millions of consumer devices, and I am pretty convinced that you have at least a handful of them already deployed somewhere around you.

Wikipedia has a nice overview of the features available in the different Cortex-M cores. There you will also find a list of microcontrollers using Cortex-M cores.

3.4 Execution Modes and Privilege Levels

Terms explained in this chapter:

Execution Mode
Privilege Level
Shadowed Stack Pointer
Process Stack Pointer
Main Stack Pointer
Handler Mode
Thread Mode

If you have a bit of experience in the world of classical computers, and especially security, you might already be familiar with the idea of different privileges in a system. The kernel on your Linux box, for example, has higher privileges than your non-root user. The M-profile implements features which allow running an operating system with similar compartmentalization: a higher-privileged kernel and lower-privileged applications. This has the advantage that a crashed application does not affect the whole system's stability.

Arm cores support this compartmentalization by defining so-called privilege levels. A normal application would execute in thread mode without privileges (unprivileged). The operating system would also run in thread mode, but with higher privileges (privileged). Privileged execution allows write access to some registers which influence the operation of the whole core. When the core runs an exception handler (details in 3.6), for example when a peripheral needs the attention of the core to handle arriving data, the core runs in handler mode. Handler mode is always privileged.

When a Memory Protection Unit (MPU) is available in the core, different memory access permissions can be assigned to unprivileged thread mode, which prevents applications from influencing the OS kernel.

The core always starts in privileged thread mode.

Figure 3.1: M-profile Execution Modes

Simple use cases obviously can ignore unprivileged thread mode and switch only between privileged handler and thread mode. Complex embedded operating systems like FreeRTOS or Zephyr use these features to separate user threads and kernel threads.

The Monodon firmware, which I implemented for the Embedded Protocols section, uses interrupts to send and receive data via USART and I2C. It also has priorities configured for the respective interrupt handlers. The STM32 HAL is used to configure the NVIC. Check it out to see an example of the Cortex-M NVIC in a real-world system like the Bluepill.

A Shadowed Stack Pointer allows the stacks for privileged and unprivileged code to be in different regions of memory, which further improves the stability of the system. M-profile cores have a Main Stack Pointer (MSP) and a Process Stack Pointer (PSP). By using the MSP in privileged code and the PSP in unprivileged code, setups like the ones shown in figure 3.2 are possible. Developers can either use the MSP or PSP registers directly or use the alias SP, which always refers to the currently selected stack pointer. To use the PSP, the core has to be configured accordingly. In the default state only the MSP is used.

Figure 3.2: MSP and PSP: Stack pointers

3.4.1 Configuration

Privilege levels and the shadowed stack pointer are configured in the following register fields:

The privilege level in Thread mode is configured in CONTROL.nPRIV.
Which stack pointer should be used is configured in CONTROL.SPSEL.

The CONTROL register is written and read using special register configuration instructions. See 3.7.2 for details.
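When you find a value being written to CONTROL during reverse engineering, it helps to decode the two fields by hand. The following is a minimal sketch in Python, assuming the architectural bit positions nPRIV = bit 0 and SPSEL = bit 1; the function name `decode_control` and the result keys are made up for illustration.

```python
# Bit positions of the two CONTROL fields discussed above.
CONTROL_NPRIV = 1 << 0  # 0 = Thread mode is privileged, 1 = unprivileged
CONTROL_SPSEL = 1 << 1  # 0 = MSP is the current stack pointer, 1 = PSP


def decode_control(value):
    """Return a readable view of CONTROL.nPRIV and CONTROL.SPSEL (hypothetical helper)."""
    return {
        "thread_privileged": not (value & CONTROL_NPRIV),
        "stack_pointer": "PSP" if value & CONTROL_SPSEL else "MSP",
    }


if __name__ == "__main__":
    # Walk through all four combinations of the two bits.
    for raw in (0b00, 0b01, 0b10, 0b11):
        print(f"CONTROL={raw:#04b} -> {decode_control(raw)}")
```

A value of 0 corresponds to the reset state described above: privileged thread mode using only the MSP.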

3.5 Memory System

Terms explained in this chapter:

System Control Block
System Control Space
System Address Map

M-profile implementations have a flat 32-bit address space, resulting in 4 GB of addressable memory. Some regions of the memory map are predefined by Arm, so they are common across all implementations, such as the Cortex-M cores. These regions are:

code
SRAM
internal peripherals
external RAM
external peripherals
system region

The cores differ in how they are connected to memory and peripherals. Some Cortex-M cores have multiple bus interfaces: one for instruction fetches when the core accesses an address in the code region, a second one for reads and writes to SRAM, and maybe even a third one for accesses to peripherals. Others have only one bus interface.

The Cortex-M33, for example, has the C-AHB and the S-AHB bus interfaces. Accesses to the code region are made via C-AHB, accesses to other regions via the S-AHB interface.

Future versions of this resource will include a chapter on the Arm bus system.

The system address map is split into regions of 0.5 GB each. Vendors might split each region further to implement different features, but having a rough understanding of the structure of the Arm M-profile system address map is especially helpful when examining firmware through reverse engineering. The following address map applies to all cores implementing an M-profile architecture, for example all Cortex-M cores.

System address map of Armv6-M, Armv7-M, Armv8-M
Address	Region	Example Usage [4]
0x0000.0000 - 0x1FFF.FFFF (0.5 GB)	Code	ROM or flash memory
0x2000.0000 - 0x3FFF.FFFF (0.5 GB)	SRAM	on-chip RAM
0x4000.0000 - 0x5FFF.FFFF (0.5 GB)	Peripheral	on-chip peripherals
0x6000.0000 - 0x7FFF.FFFF (0.5 GB)	RAM	cached memory (write-back, write-allocate)
0x8000.0000 - 0x9FFF.FFFF (0.5 GB)	RAM	write-through memory
0xA000.0000 - 0xDFFF.FFFF (1 GB)	Devices	shared (0.5 GB) and non-shared devices (0.5 GB)
0xE000.0000 - 0xE00F.FFFF (1 MB)	Private Peripheral Bus (PPB)	Used by the core. Includes System Control Space, Debug
0xE010.0000 - 0xFFFF.FFFF (511 MB)	Vendor Region	Vendor specific

Suppose you find the following function call while reverse engineering a firmware image:

0c000294:
 c000294:       b580            push    {r7, lr}
 c000296:       af00            add     r7, sp, #0
 c000298:       2180            movs    r1, #128        
 c00029a:       481d            ldr     r0, [pc, #116]   ;0x29a+2+116 = 0x310
 c00029c:       f000 ffea       bl      c001274 
[... snip ...]
 c000310:       52020800        .word   0x52020800

0x52020800 is loaded into R0 and 128 (0x80) into R1. Then a branch to c001274 happens. Without the system address map in mind, 0x52020800 could be basically anything, but if you consult the address map you see that it lies within the region of on-chip peripherals. Now you have your first lead on the purpose of that function: you can assume it accesses a peripheral. 0x52020800 could be the base address of a peripheral and 0x80 an index or a bit mask. In fact, the example shows code which toggles a GPIO to enable an LED on an STM32L5 development board.
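The two manual steps from this example, resolving the PC-relative load and looking the loaded value up in the address map, can be sketched in a few lines of Python. This is only an illustration: the helper names `literal_address` and `region_of` are made up, the region list is a condensed copy of the table above, and the load calculation assumes a Thumb `ldr rX, [pc, #imm]`, where the PC reads as the instruction address plus 4, aligned down to a word boundary.

```python
# Condensed copy of the system address map: (start, end, region name).
REGIONS = [
    (0x00000000, 0x1FFFFFFF, "Code"),
    (0x20000000, 0x3FFFFFFF, "SRAM"),
    (0x40000000, 0x5FFFFFFF, "Peripheral"),
    (0x60000000, 0x9FFFFFFF, "RAM"),
    (0xA0000000, 0xDFFFFFFF, "Devices"),
    (0xE0000000, 0xE00FFFFF, "Private Peripheral Bus"),
    (0xE0100000, 0xFFFFFFFF, "Vendor Region"),
]


def region_of(address):
    """Return the name of the address map region containing `address`."""
    for start, end, name in REGIONS:
        if start <= address <= end:
            return name
    raise ValueError(f"{address:#x} is outside the 32-bit address space")


def literal_address(insn_address, imm):
    """Address read by a Thumb `ldr rX, [pc, #imm]` located at `insn_address`."""
    # In Thumb state the PC reads as instruction address + 4, word aligned.
    return ((insn_address + 4) & ~3) + imm


if __name__ == "__main__":
    # Values taken from the disassembly above.
    print(hex(literal_address(0x0C00029A, 116)))
    print(region_of(0x52020800))
```

For the listing above, the load at c00029a with offset 116 resolves to the literal at c000310, and the word stored there falls into the Peripheral region.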

The Private Peripheral Bus is a core-internal bus, which connects debug components and other core-internal peripherals to the execution unit. Registers related to the core and some core-internal peripherals like the NVIC are contained in the System Control Space (SCS).

The SCS provides registers for control, configuration and status reporting of the core. The SCS ranges from 0xE000.E000 to 0xE000.EFFF and contains configuration registers for core components like the NVIC (see 3.6.2). Within the SCS there is the System Control Block (SCB), which ranges from 0xE000.ECFC to 0xE000.ED8C. The SCB contains registers to read the status of system exceptions, to control them, and to relocate the vector table (see 3.6).

Figure 3.3: SCS, SCB in the PPB

3.6 Exceptions and Interrupts

Terms explained in this chapter:

Exceptions
System Exceptions
Interrupts
Faults
Vector Table
Interrupt Service Routine
Fault handlers
System handlers
Exception Masking
NVIC

When an event happens, e.g. in the core or in a peripheral, the core has to handle that event. This handling is done through exceptions. There are three categories of exceptions:

Interrupts: Exceptions issued by peripherals.
System Exceptions: Exceptions defined by the core; they are part of the Arm M-profile specifications.
Faults: A special kind of system exception. They are also part of the specifications but are mostly irrecoverable.

In addition to events triggered by the core or a peripheral, software can also trigger an exception.

The core handles exceptions using exception handlers:

Interrupt Service Routines: handle Interrupts
Fault handlers: handle faults
System handlers: handle system exceptions, which are not faults.

Each of these handlers has a priority.

The exception handlers are organized in a vector table, which is basically an organized list of references to the first instruction of an exception handler.

A-profile cores also have a vector table, but their tables do not contain references but instructions. So an A-profile vector table is basically a list of branch instructions. The M-profile vector table, on the other hand, is a list of references.

We just introduced a ton of new and important terms. Let’s quickly summarize them in a graphical overview to really get familiar:

Figure 3.4: UML summary of terms in the exception context

The vector table is an organized list of addresses of exception handlers. The vector table is organized in two sections and each vector has an associated exception number. Numbers 1-15 are reserved for the core and contain the fault and system handlers. Exception numbers 16+N are implementation defined and used for interrupts. How many interrupts a core can handle differs across the M-profile cores. The Cortex-M33, for example, can handle up to 480 interrupts.

Each Exception has a priority, which is a signed integer. The smaller the number, the higher the priority.

Table 3.1: Exceptions 1-15 common across Armv6-M, Armv7-M, Armv8-M
Exception	Exception Number	Priority	Description
Reset	1	-3	Taken when the system starts
NMI⁵	2	-2	Implementation specific
HardFault	3	-1	Fault escalation target, taken when no other fault handler applies
MemManage Fault⁶	4	configurable	MPU fault, e.g. XN-bit violation
BusFault	5	configurable	Error on the AHB bus
UsageFault	6	configurable	Taken e.g. when code executes an undefined instruction or tries to access an unavailable coprocessor
Reserved	7-10	-
SVCall	11	configurable	SuperVisor Call: OS call
DebugMonitor⁷	12	configurable	Debug events, breakpoints, ...
Reserved	13	-
PendSV	14	configurable	OS-specific
SysTick	15	configurable	System tick timer: exception generated by a timer which is part of the core

When different privilege levels are used and an operating system runs on the microcontroller, system exceptions like SVCall and PendSV are important. The SVCall is a supervisor call, which lower-privileged applications can use to request privileged operations or access to resources governed by the operating system. To indicate which operation the application requests, an SVCall has an SVC number associated with it.

Linux does not use the SVC number but ignores it. The actual system call requested by an application is passed to Linux in registers, so the SVC number of an SVCall can be basically anything.

See chapter 6 for some details.

The NMI is an interrupt which cannot be masked (see 3.6.3). What triggers this interrupt is implementation defined by the SoC vendor.

On some cores the location of the vector table can be configured using the VTOR register, which is part of the SCB. The address on reset is implementation defined. For Armv6-M, VTOR is optional.

In table 3.1 the exception number shows the position of the exception handler in the vector table. The vector table starts with the Reset vector at position 1; position 0 holds the initial value of the Main Stack Pointer (MSP). The very first thing Cortex-M cores do is load the main stack pointer to set up the stack and then execute the Reset exception handler.
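This layout makes the vector table one of the first things worth extracting from a raw firmware image. The sketch below, in Python, assumes the image starts with the vector table, that entries are 32-bit little-endian words, and that bit 0 of a handler address only marks Thumb state and is not part of the code address. The function names and the example values are made up for illustration.

```python
import struct


def vector_offset(exception_number):
    """Byte offset of a vector inside the table: 4 bytes per entry."""
    return 4 * exception_number


def parse_vector_table(image, entries=16):
    """Split the first `entries` words of `image` into initial MSP and handlers."""
    words = struct.unpack_from(f"<{entries}I", image, 0)
    initial_msp = words[0]  # position 0: initial Main Stack Pointer
    handlers = {}
    for number in range(1, entries):
        raw = words[number]
        if raw == 0:
            continue  # reserved or unused entry
        # Clear bit 0 (Thumb marker) to get the address of the first instruction.
        handlers[number] = raw & ~1
    return initial_msp, handlers


if __name__ == "__main__":
    # Hypothetical image: MSP, Reset, NMI, HardFault, then 12 empty entries.
    image = struct.pack("<16I", 0x20018000, 0x0C000295, 0x0C0002A1,
                        0x0C0002A3, *([0] * 12))
    msp, handlers = parse_vector_table(image)
    print(f"initial MSP: {msp:#010x}")
    for number, address in handlers.items():
        print(f"exception {number} at offset {vector_offset(number):#x}: {address:#010x}")
```

With a larger `entries` value the same function also returns the interrupt vectors, where exception number 16+N belongs to interrupt N.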

Each exception handler has either a configurable or a fixed priority. The smaller the number, the higher the priority: -3 is the highest priority, which only the reset handler has. When an exception is triggered that has a higher priority than the currently executing code, the new exception preempts the current one. This is called exception nesting.

When an exception handler for one exception is executing, it has the state Active. There are 4 types of exception states:

Active: The core has started executing the corresponding handler. An exception is active even when another exception has preempted it. It stays active as long as it has not yet returned from the handler.
Pending: The exception was generated, but the core has not yet started executing the corresponding handler.
Inactive: Neither pending nor active.
Active and Pending: One instance of the exception is active and a second instance of the exception is pending. This state is only available for asynchronous exceptions.

Obviously there are many configuration and implementation details associated with exceptions. There are different available priority levels, strategies for handling exception entry, exception return, nested scenarios, and so on. All these details might be interesting, but they are not essential for reverse engineering and hacking embedded firmware.

System exceptions (exception number 1-15) are part of the core. Interrupts are reported to the NVIC (see 3.6.2), which is a separate but closely related block to the processing unit in the core. Since system exceptions are part of the core, they are configured in the SCB. Interrupts on the other hand are configured in the NVIC.

3.6.1 System Exception and Vector Table Configuration

System exceptions and the location of the vector table are configured in the System Control Block. The most important registers to configure system exceptions are:

Setting and clearing the pending state of NMI, PendSV and SysTick: ICSR (Interrupt Control and State Register)
Enabling faults and reading the pending status of system exceptions: SHCSR (System Handler Control and State Register)
Offset from 0x0000.0000 at which the vector table is located: VTOR (Vector Table Offset Register)
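When these registers show up as constants in a disassembly or as values in a memory dump, a small lookup helps to recognize them. The following Python sketch assumes the usual SCB addresses (ICSR at 0xE000.ED04, VTOR at 0xE000.ED08, SHCSR at 0xE000.ED24) and decodes only a handful of ICSR fields: VECTACTIVE in bits [8:0], the SysTick pending bit 26, the PendSV pending bit 28 and the NMI pending bit 31. The function name and the result keys are made up for illustration.

```python
# SCB register addresses inside the System Control Space.
SCB_REGISTERS = {
    0xE000ED04: "ICSR",
    0xE000ED08: "VTOR",
    0xE000ED24: "SHCSR",
}


def decode_icsr(value):
    """Decode a few ICSR fields from a raw 32-bit register value."""
    return {
        "active_exception": value & 0x1FF,            # VECTACTIVE, bits [8:0]
        "systick_pending": bool(value & (1 << 26)),   # SysTick pending bit
        "pendsv_pending": bool(value & (1 << 28)),    # PendSV pending bit
        "nmi_pending": bool(value & (1 << 31)),       # NMI pending bit
    }


if __name__ == "__main__":
    print(SCB_REGISTERS.get(0xE000ED08, "unknown"))
    # Hypothetical ICSR snapshot: SysTick handler running, PendSV pending.
    print(decode_icsr((1 << 28) | 15))
```

An `active_exception` value of 0 means the core is in thread mode; any other value is the exception number from table 3.1 or an interrupt number plus 16.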

3.6.2 NVIC

The M-profile architecture provides an interrupt controller which is closely integrated with the core. The number of external interrupt lines supported depends on the architecture, and how many interrupts are actually used depends on the SoC vendor, e.g. on the number of peripherals connected to the interrupt controller. The integrated interrupt controller in M-profile cores is called the Nested Vectored Interrupt Controller (NVIC).

Each interrupt can be enabled and disabled by software, and they all share the same life cycle with the states inactive, pending and active, as mentioned in 3.6.

Figure 3.5: NVIC and core structure

Interrupt lines connect the peripherals to the NVIC and the NVIC to the core. When an interrupt is triggered, the NVIC reports it to the core, which then runs the corresponding handler.

3.6.2.1 Interrupt Configuration

Depending on the architecture, a maximum number of possible interrupts is defined. For Armv8-M that is 480. The n in the register names below is the index of the register; in the enable, pending and active registers each register holds one bit for each of 32 interrupts, so n = {0..15}.

Enable or disable interrupts:

Enable interrupts or read their enable state: NVIC_ISERn registers (Interrupt Set-Enable Register)
Disable interrupts or read their enable state: NVIC_ICERn registers (Interrupt Clear-Enable Register)

Set priority:

Priorities are configured using the NVIC_IPRn registers (Interrupt Priority Register).

Set or clear pending state and active state:

Set or read pending state: NVIC_ISPRn (Interrupt Set-Pending Register)
Clear or read pending state: NVIC_ICPRn (Interrupt Clear-Pending Register)
Read active state: NVIC_IABRn (Interrupt Active Bit Register)
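To find out which interrupt a given NVIC access belongs to, you need the mapping between interrupt number, register index n and bit position. The Python sketch below assumes the usual NVIC base addresses inside the SCS (NVIC_ISER0 at 0xE000.E100, NVIC_ICER0 at 0xE000.E180, NVIC_ISPR0 at 0xE000.E200, NVIC_ICPR0 at 0xE000.E280, NVIC_IABR0 at 0xE000.E300, NVIC_IPR0 at 0xE000.E400), one bit per interrupt in the enable, pending and active registers, and one priority byte per interrupt in the NVIC_IPRn registers. The helper names are made up for illustration.

```python
# Base address of register 0 for each bit-per-interrupt bank.
NVIC_BANKS = {
    "ISER": 0xE000E100,
    "ICER": 0xE000E180,
    "ISPR": 0xE000E200,
    "ICPR": 0xE000E280,
    "IABR": 0xE000E300,
}
NVIC_IPR_BASE = 0xE000E400


def nvic_bit(bank, irq):
    """Return (register name, register address, bit position) for interrupt `irq`."""
    n, bit = divmod(irq, 32)  # 32 interrupts per 32-bit register
    return f"NVIC_{bank}{n}", NVIC_BANKS[bank] + 4 * n, bit


def nvic_priority_byte(irq):
    """Address of the priority byte for interrupt `irq` (4 bytes per NVIC_IPRn)."""
    return NVIC_IPR_BASE + irq


if __name__ == "__main__":
    # Interrupt 37 is exception number 16 + 37 = 53 in the vector table.
    name, address, bit = nvic_bit("ISER", 37)
    print(f"{name} at {address:#010x}, bit {bit}")
    print(f"priority byte at {nvic_priority_byte(37):#010x}")
```

Read the other way round, a write of 1 << 5 to 0xE000.E104 enables interrupt 37, whose handler you would then look up at position 53 in the vector table.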

3.6.3 Masking

As you might know from work, sometimes it is important to have some time for uninterrupted work. The same applies to a core executing an important task, for example something real-time critical. To get some uninterrupted time, the core can mask exceptions. Masking means that the current execution priority is boosted to a specific value, so that exceptions with the same or a lower priority are not executed. There are three masking registers, and each boosts the priority to a specific level:

PRIMASK: Sets the current execution priority to 0, masking all exceptions except NMI and HardFault.
FAULTMASK: Sets the current execution priority to -1, masking even HardFault.
BASEPRI: Configurable priority masking. Masks all exceptions whose priority value is equal to or greater than the configured value; a value of 0 disables this masking.

Armv6-M has some limitations on masking, for example only PRIMASK is available.
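The effect of the three masking registers can be expressed as a small model. The Python sketch below is a simplification under stated assumptions: it ignores priority grouping and the Armv8-M Security Extension, it represents plain thread mode with the value 256 (lower than any configurable priority), and the function names are made up for illustration.

```python
THREAD_PRIORITY = 256  # assumed placeholder: lower than any configurable priority


def execution_priority(active_priorities, primask=False, faultmask=False, basepri=0):
    """Current execution priority given the active exceptions and mask registers."""
    priority = min(active_priorities, default=THREAD_PRIORITY)
    if basepri:  # BASEPRI == 0 means no BASEPRI masking
        priority = min(priority, basepri)
    if primask:
        priority = min(priority, 0)
    if faultmask:
        priority = min(priority, -1)
    return priority


def can_preempt(exception_priority, current_priority):
    """An exception is taken only if its priority value is strictly smaller."""
    return exception_priority < current_priority


if __name__ == "__main__":
    masked = execution_priority([], primask=True)
    print("HardFault with PRIMASK set:", can_preempt(-1, masked))
    print("IRQ at priority 0x40 with PRIMASK set:", can_preempt(0x40, masked))
```

The model also reproduces the nesting rule from 3.6: with an active handler at priority 0x40, a new exception at priority 0x20 preempts it, while one at 0x40 or 0x80 stays pending.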

3.7 Registers

Terms explained in this chapter:

General-Purpose Register
Special-Purpose Register
variable registers
argument registers
Link Register
Stack Pointer

3.7.1 General-purpose registers

There are 15 general-purpose 32-bit registers in all M-profile architecture cores. Thirteen of them are only numbered:

R0-R12

The remaining two have dedicated names:

R13: SP (Stack Pointer)
R14: LR (Link Register)

Additionally there is the Program Counter:

R15: PC (Program Counter)

Sometimes registers are referred to by their aliases, which are:

Register	Alias
R9	SB or Static Base
R10	SL or Stack Limit
R11	FP or Frame Pointer
R12	IP or intra-procedure-call scratch
R13	SP or Stack Pointer
R14	LR or Link Register
R15	PC or Program counter

Registers can also be categorized into argument (or scratch) registers and variable registers. Argument registers are used to pass arguments between functions (more on that in chapter 6) and variable registers are used within functions to store and work with variable values.

3.7.2 Special-purpose registers

Which special-purpose registers are available depends on the M-profile version. For Armv8-M the special-purpose registers include the registers used for masking exceptions (3.6.3). For all M-profile architecture cores the CONTROL register is also part of the special-purpose registers. The CONTROL register holds bitfields to control the privilege level thread mode is running in, which stack pointer is used (MSP and PSP or only MSP) and other important values, depending on the core.

Special-purpose registers are accessible using two dedicated instructions, which you might encounter when reverse engineering code:

MSR: Write: Moves a value from a general-purpose register to a special-purpose register (Move to Special register from Register).
MRS: Read: Moves the content of a special-purpose register to a general-purpose register (Move to Register from Special register).

References

[3] Arm Ltd, “ARM architecture profiles.” 2015, [Online]. Available: https://developer.arm.com/docs/dui0471/l/key-features-of-arm-architecture-versions/arm-architecture-profiles.

[4] Arm Ltd, “Armv8-M Architecture Reference Manual.” 2017, [Online]. Available: https://developer.arm.com/docs/ddi0553/bj/armv8-m-architecture-reference-manual.

⁵ Non-Maskable Interrupt
⁶ Except Armv6-M. This exception number is reserved in Armv6-M.
⁷ Except Armv6-M. This exception number is reserved in Armv6-M.