ARM Cortex-M
from Wikipedia
ARM Cortex-M0 and Cortex-M3 microcontroller ICs from NXP and Silicon Labs (Energy Micro)
Die from a STM32F100C4T6B IC.
24 MHz ARM Cortex-M3 microcontroller with 16 KB flash memory, 4 KB RAM. Manufactured by STMicroelectronics.

The ARM Cortex-M is a group of 32-bit RISC ARM processor cores licensed by ARM Limited. These cores are optimized for low-cost and energy-efficient integrated circuits, which have been embedded in tens of billions of consumer devices.[1] Though they are most often the main component of microcontroller chips, they are sometimes embedded inside other types of chips too. The Cortex-M family consists of the Cortex-M0,[2] Cortex-M0+,[3] Cortex-M1,[4] Cortex-M3,[5] Cortex-M4,[6] Cortex-M7,[7] Cortex-M23,[8] Cortex-M33,[9] Cortex-M35P,[10] Cortex-M52,[11] Cortex-M55,[12] and Cortex-M85.[13] A floating-point unit (FPU) option is available for Cortex-M4 / M7 / M33 / M35P / M52 / M55 / M85 cores, and when included in the silicon these cores are sometimes known as "Cortex-MxF", where 'x' is the core variant.

Overview

32-bit
| Year | Core |
| --- | --- |
| 2004 | Cortex-M3 |
| 2007 | Cortex-M1 |
| 2009 | Cortex-M0 |
| 2010 | Cortex-M4 |
| 2012 | Cortex-M0+ |
| 2014 | Cortex-M7 |
| 2016 | Cortex-M23 |
| 2016 | Cortex-M33 |
| 2018 | Cortex-M35P |
| 2020 | Cortex-M55 |
| 2022 | Cortex-M85 |
| 2023 | Cortex-M52 |

The ARM Cortex-M family are ARM microprocessor cores that are designed for use in microcontrollers, ASICs, ASSPs, FPGAs, and SoCs. Cortex-M cores are commonly used as dedicated microcontroller chips, but also are "hidden" inside of SoC chips as power management controllers, I/O controllers, system controllers, touch screen controllers, smart battery controllers, and sensor controllers.

The main difference from Cortex-A cores is that Cortex-M cores have no memory management unit (MMU) for virtual memory, considered essential for "full-fledged" operating systems. Cortex-M programs instead run bare metal or on one of the many real-time operating systems which support a Cortex-M.

Though 8-bit microcontrollers were very popular in the past, Cortex-M has slowly been chipping away at the 8-bit market as the prices of low-end Cortex-M chips have moved downward. Cortex-M cores have become popular replacements for 8-bit chips in applications that benefit from 32-bit math operations, and for older legacy ARM cores such as ARM7 and ARM9.

For example, the embedded wear-leveling controller inside most SD cards and flash drives is either an 8-bit 8051 microcontroller or an ARM CPU.[14]

License


ARM Limited neither manufactures nor sells CPU devices based on its own designs, but rather licenses the processor architecture to interested parties. ARM offers a variety of licensing terms, varying in cost and deliverables. To all licensees, ARM provides an integratable hardware description of the ARM core, as well as a complete software development toolset and the right to sell manufactured silicon containing the ARM CPU.

Silicon customization


Integrated Device Manufacturers (IDM) receive the ARM Processor IP as synthesizable RTL (written in Verilog). In this form, they have the ability to perform architectural level optimizations and extensions. This allows the manufacturer to achieve custom design goals, such as higher clock speed, very low power consumption, instruction set extensions (including floating point), optimizations for size, debug support, etc. To determine which components have been included in a particular ARM CPU chip, consult the manufacturer datasheet and related documentation.

Some of the silicon options for the Cortex-M cores are:

  • SysTick timer: A 24-bit system timer that extends the functionality of both the processor and the Nested Vectored Interrupt Controller (NVIC). When present, it also provides an additional configurable priority SysTick interrupt.[15][16][17] Though the SysTick timer is optional for the M0/M0+/M1/M23, it is extremely rare to find a Cortex-M microcontroller without it. If a Cortex-M33/M35P/M52/M55/M85 microcontroller has the Security Extension option, then it optionally can have two SysTicks (one Secure, one Non-secure).
  • Bit-Band: Maps a complete word of memory onto a single bit in the bit-band region. For example, writing to an alias word will set or clear the corresponding bit in the bit-band region. This allows every individual bit in the bit-band region to be directly accessible from a word-aligned address. In particular, individual bits can be set, cleared, or toggled from C/C++ without performing a read-modify-write sequence of instructions.[15][16][17] Though the bit-band is optional, it is less common to find a Cortex-M3 and Cortex-M4 microcontroller without it. Some Cortex-M0 and Cortex-M0+ microcontrollers have bit-band.
  • Memory Protection Unit (MPU): Provides support for protecting regions of memory through enforcing privilege and access rules. It supports up to sixteen different regions, each of which can be split further into equal-size sub-regions.[15][16][17]
  • Tightly-Coupled Memory (TCM): Low-latency (zero wait state) SRAM that can be used to hold the call stack, RTOS control structures, interrupt data structures, interrupt handler code, and speed-critical code. Other than CPU cache, TCM is the fastest memory in an ARM Cortex-M microcontroller. Since TCM isn't cached and is accessible at the same speed as the processor and cache, it could be conceptually described as "addressable cache". There is an ITCM (Instruction TCM) and a DTCM (Data TCM) to allow a Harvard architecture processor to read from both simultaneously. The DTCM can't contain any instructions, but the ITCM can contain data. Since TCM is tightly connected to the processor core, DMA engines might not be able to access TCM on some implementations.
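
The bit-band mapping described in the list above comes down to simple address arithmetic. The following is a minimal sketch that computes the alias-word address for one bit, assuming the usual ARMv7-M SRAM layout (bit-band region at 0x20000000, alias region at 0x22000000). The helper name bitband_alias and the macro names are illustrative and do not come from any vendor header. Because bit-band is a silicon option, the addresses should be confirmed in the device datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ARMv7-M SRAM bit-band layout; confirm against the device datasheet. */
#define BITBAND_SRAM_BASE  0x20000000u  /* start of the 1 MB bit-band region */
#define BITBAND_SRAM_ALIAS 0x22000000u  /* start of the 32 MB alias region   */

/* Hypothetical helper: alias-word address for bit `bit` of the byte at `byte_addr`. */
static uint32_t bitband_alias(uint32_t byte_addr, uint32_t bit)
{
    assert(bit < 8u);
    assert(byte_addr >= BITBAND_SRAM_BASE);
    assert(byte_addr - BITBAND_SRAM_BASE < 0x100000u);

    uint32_t byte_offset = byte_addr - BITBAND_SRAM_BASE;

    /* Every bit owns one 32-bit alias word: 32 bytes per source byte, 4 per bit. */
    return BITBAND_SRAM_ALIAS + byte_offset * 32u + bit * 4u;
}
```

On a target that actually implements bit-band, storing 1 or 0 through a volatile uint32_t pointer to the returned address would set or clear that single bit; on a host machine the function is only useful for checking the arithmetic.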
ARM Cortex-M optional components
| ARM Core | Cortex-M0[18] | Cortex-M0+[19] | Cortex-M1[20] | Cortex-M3[21] | Cortex-M4[22] | Cortex-M7[23] | Cortex-M23[24] | Cortex-M33[25] | Cortex-M35P[10] | Cortex-M52[26] | Cortex-M55[27] | Cortex-M85[28] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SysTick 24-bit Timer | Optional (0, 1) | Optional (0, 1) | Optional (0, 1) | Yes (1) | Yes (1) | Yes (1) | Optional (0, 1, 2) | Yes (1, 2) | Yes (1, 2) | Yes (1, 2) | Yes (1, 2) | Yes (1, 2) |
| Single-cycle I/O port | No | Optional | No | No | No | No | Optional | No | No | No | No | No |
| Bit-Band memory | No[29] | No[29] | No* | Optional | Optional | Optional | No | No | No | No | No | No |
| Memory Protection Unit (MPU) | No | Optional (0, 8) | No | Optional (0, 8) | Optional (0, 8) | Optional (0, 8, 16) | Optional (0, 4, 8, 12, 16) | Optional (0, 4, 8, 12, 16) | Optional (up to 16)* | Optional (0, 4, 8, 12, 16) | Optional (0, 4, 8, 12, 16) | Optional (0, 4, 8, 12, 16) |
| Security Attribution Unit (SAU) and Stack Limits | No | No | No | No | No | No | Optional (0, 4, 8) | Optional (0, 4, 8) | Optional (up to 8)* | Optional (0, 4, 8) | Optional (0, 4, 8) | Optional (0, 4, 8) |
| Instruction Cache | No[30] | No[30] | No[30] | No[30] | No[30] | Optional (up to 64 KB) | No | No | Optional (up to 16 KB) | Optional (up to 64 KB) | Optional (up to 64 KB) | Optional (up to 64 KB) |
| Data Cache | No[30] | No[30] | No[30] | No[30] | No[30] | Optional (up to 64 KB) | No | No | No | Optional (up to 64 KB) | Optional (up to 64 KB) | Optional (up to 64 KB) |
| Instruction TCM (ITCM) Memory | No | No | Optional (up to 1 MB) | No | No | Optional (up to 16 MB) | No | No | No | Optional (up to 16 MB) | Optional (up to 16 MB) | Optional (up to 16 MB) |
| Data TCM (DTCM) Memory | No | No | Optional (up to 1 MB) | No | No | Optional (up to 16 MB) | No | No | No | Optional (up to 16 MB) | Optional (up to 16 MB) | Optional (up to 16 MB) |
| ECC for TCM and Cache | No | No | No | No | No | No | No | No | Optional | Optional | Optional | Optional |
| Vector Table Offset Register (VTOR) | No | Optional (0, 1) | Optional (0, 1) | Optional (0, 1) | Optional (0, 1) | Optional (0, 1) | Optional (0, 1, 2) | Yes (1, 2) | Yes (1, 2) | Yes (1, 2) | Yes (1, 2) | Yes (1, 2) |
  • Note: Most Cortex-M3 and M4 chips have bit-band and MPU. The bit-band option can be added to the M0/M0+ using the Cortex-M System Design Kit.[29]
  • Note: Software should validate the existence of each feature before attempting to use it.[17]
  • Note: Limited public information is available for the Cortex-M35P until its Technical Reference Manual is released.
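
One way to follow the note about validating a feature before using it is to read the core's own identification registers. The sketch below decodes the MPU_TYPE register, whose DREGION field (bits 15:8) reports the number of MPU regions, with zero meaning that no MPU is fitted. It takes the raw register value as a parameter so that it can be exercised on a host; the function names are hypothetical, and on a real target the value would be read from the address given in the macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architecturally defined address of MPU_TYPE; only readable on a real target. */
#define MPU_TYPE_ADDR 0xE000ED90u

/* Hypothetical helper: number of MPU regions encoded in a raw MPU_TYPE value. */
static unsigned mpu_region_count(uint32_t mpu_type)
{
    return (unsigned)((mpu_type >> 8) & 0xFFu);  /* DREGION, bits [15:8] */
}

/* Hypothetical helper: true when the MPU offers at least `needed` regions. */
static bool mpu_supports(uint32_t mpu_type, unsigned needed)
{
    /* DREGION is an 8-bit field, so a larger request is a caller error. */
    assert(needed >= 1u && needed <= 255u);
    return mpu_region_count(mpu_type) >= needed;
}
```

On hardware, a caller would pass the value read through a volatile pointer to MPU_TYPE_ADDR and fall back to an unprotected configuration when mpu_supports returns false.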

Additional silicon options:[15][16]

  • Data endianness: Little-endian or big-endian. Unlike legacy ARM cores, the Cortex-M is permanently fixed in silicon as one of these choices.
  • Interrupts: 1 to 32 (M0/M0+/M1), 1 to 240 (M3/M4/M7/M23), 1 to 480 (M33/M35P/M52/M55/M85).
  • Wake-up interrupt controller: Optional.
  • Vector Table Offset Register: Optional. (not available for M0).
  • Instruction fetch width: 16-bit only, or mostly 32-bit.
  • User/privilege support: Optional.
  • Reset all registers: Optional.
  • Single-cycle I/O port: Optional. (M0+/M23).
  • Debug Access Port (DAP): none, SWD only, or JTAG and SWD. (optional for all Cortex-M cores)
  • Halting debug support: Optional.
  • Number of watchpoint comparators: 0 to 2 (M0/M0+/M1), 0 to 4 (M3/M4/M7/M23/M33/M35P/M52/M55/M85).
  • Number of breakpoint comparators: 0 to 4 (M0/M0+/M1/M23), 0 to 8 (M3/M4/M7/M33/M35P/M52/M55/M85).

Instruction sets


The Cortex-M0 / M0+ / M1 implement the ARMv6-M architecture,[15] the Cortex-M3 implements the ARMv7-M architecture,[16] the Cortex-M4 / Cortex-M7 implement the ARMv7E-M architecture,[16] the Cortex-M23 / M33 / M35P implement the ARMv8-M architecture,[31] and the Cortex-M52 / M55 / M85 implement the ARMv8.1-M architecture.[31] The architectures are upward binary-compatible from ARMv6-M to ARMv7-M to ARMv7E-M. Binary instructions available for the Cortex-M0 / Cortex-M0+ / Cortex-M1 can execute without modification on the Cortex-M3 / Cortex-M4 / Cortex-M7. Binary instructions available for the Cortex-M3 can execute without modification on the Cortex-M4 / Cortex-M7 / Cortex-M33 / Cortex-M35P.[15][16] Only the Thumb-1 and Thumb-2 instruction sets are supported in Cortex-M architectures; the legacy 32-bit ARM instruction set isn't supported.

All Cortex-M cores implement a common subset of instructions that consists of most of Thumb-1 and some of Thumb-2, including a 32-bit result multiply. The Cortex-M0 / Cortex-M0+ / Cortex-M1 / Cortex-M23 were designed to create the smallest silicon die, and thus have the fewest instructions of the Cortex-M family.

The Cortex-M0 / M0+ / M1 include Thumb-1 instructions, except new instructions (CBZ, CBNZ, IT) which were added in ARMv7-M architecture. The Cortex-M0 / M0+ / M1 include a minor subset of Thumb-2 instructions (BL, DMB, DSB, ISB, MRS, MSR).[15] The Cortex-M3 / M4 / M7 / M33 / M35P have all base Thumb-1 and Thumb-2 instructions. The Cortex-M3 adds three Thumb-1 instructions, all Thumb-2 instructions, hardware integer divide, and saturation arithmetic instructions. The Cortex-M4 adds DSP instructions and an optional single-precision floating-point unit (VFPv4-SP). The Cortex-M7 adds an optional double-precision FPU (VFPv5).[23][16] The Cortex-M23 / M33 / M35P / M52 / M55 / M85 add TrustZone instructions.
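
The mapping from core to architecture described above can be captured in a small lookup, which build scripts or portable libraries sometimes need when choosing a code path. This is only a sketch: the enum and function names are made up for illustration, and the hardware-divide rule is taken from the instruction table in this section (absent on the ARMv6-M cores, present elsewhere).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative identifiers; they do not come from any ARM header. */
typedef enum {
    CM0, CM0PLUS, CM1, CM3, CM4, CM7,
    CM23, CM33, CM35P, CM52, CM55, CM85,
    CORE_COUNT
} core_t;

typedef enum { ARMV6_M, ARMV7_M, ARMV7E_M, ARMV8_M, ARMV8_1_M } arch_t;

/* Architecture implemented by each core, as listed in the text. */
static arch_t arch_of(core_t core)
{
    assert(core < CORE_COUNT);
    switch (core) {
    case CM0: case CM0PLUS: case CM1:  return ARMV6_M;
    case CM3:                          return ARMV7_M;
    case CM4: case CM7:                return ARMV7E_M;
    case CM23: case CM33: case CM35P:  return ARMV8_M;
    default:                           return ARMV8_1_M;  /* M52, M55, M85 */
    }
}

/* Hardware SDIV/UDIV is missing only on the ARMv6-M cores. */
static bool has_hw_divide(core_t core)
{
    return arch_of(core) != ARMV6_M;
}
```

A caller could use has_hw_divide to decide whether a software division routine has to be linked in for a given target.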

ARM Cortex-M instruction variations
| Arm Core | Cortex-M0[18] | Cortex-M0+[19] | Cortex-M1[20] | Cortex-M3[21] | Cortex-M4[22] | Cortex-M7[23] | Cortex-M23[24] | Cortex-M33[25] | Cortex-M35P | Cortex-M52[26] | Cortex-M55[27] | Cortex-M85[28] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ARM architecture | ARMv6-M[15] | ARMv6-M[15] | ARMv6-M[15] | ARMv7-M[16] | ARMv7E-M[16] | ARMv7E-M[16] | ARMv8-M Baseline[31] | ARMv8-M Mainline[31] | ARMv8-M Mainline[31] | Armv8.1-M Mainline[31] | Armv8.1-M Mainline[31] | Armv8.1-M Mainline[31] |
| Computer architecture | Von Neumann | Von Neumann | Von Neumann | Harvard | Harvard | Harvard | Von Neumann | Harvard | Harvard | Harvard | Harvard | Harvard |
| Instruction pipeline | 3 stages | 2 stages | 3 stages | 3 stages | 3 stages | 6 stages | 2 stages | 3 stages | 3 stages | 4 stages | 4-5 stages | 7 stages |
| Interrupt latency (zero wait state memory) | 16 cycles | 15 cycles | 23 for NMI, 26 for IRQ | 12 cycles | 12 cycles | 12 cycles, 14 worst case | 15 cycles, 24 secure to NS IRQ | 12 cycles, 21 secure to NS IRQ | TBD | TBD | TBD | TBD |
| Thumb-1 instructions | Most | Most | Most | Entire | Entire | Entire | Most | Entire | Entire | Entire | Entire | Entire |
| Thumb-2 instructions | Some | Some | Some | Entire | Entire | Entire | Some | Entire | Entire | Entire | Entire | Entire |
| Multiply instructions, 32×32 = 32-bit result | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Multiply instructions, 32×32 = 64-bit result | No | No | No | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes |
| Divide instructions, 32/32 = 32-bit quotient | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Saturated math instructions | No | No | No | Some | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes |
| DSP instructions | No | No | No | No | Yes | Yes | No | Optional | Optional | Yes | Yes | Yes |
| Half-Precision (HP) floating-point instructions | No | No | No | No | No | No | No | No | No | Optional | Optional | Optional |
| Single-Precision (SP) floating-point instructions | No | No | No | No | Optional | Optional | No | Optional | Optional | Optional | Optional | Optional |
| Double-Precision (DP) floating-point instructions | No | No | No | No | No | Optional | No | No | No | Optional | Optional | Optional |
| Helium vector instructions | No | No | No | No | No | No | No | No | No | Optional | Optional | Optional |
| TrustZone security instructions | No | No | No | No | No | No | Optional | Optional | Optional | Optional | Optional | Yes |
| Co-processor instructions | No | No | No | No | No | No | No | Optional | Optional | Optional | Optional | Optional |
| ARM Custom Instructions (ACI) | No | No | No | No | No | No | No | Optional | No | Optional | Optional | Optional |
| Pointer Authentication and Branch Target Identification (PACBTI) instructions | No | No | No | No | No | No | No | No | No | Optional | No | Optional |
  • Note: Interrupt latency cycle count assumes: 1) stack located in zero-wait state RAM, 2) another interrupt function not currently executing, 3) Security Extension option doesn't exist, because it adds additional cycles. The Cortex-M cores with a Harvard computer architecture have a shorter interrupt latency than Cortex-M cores with a Von Neumann computer architecture.
  • Note: The Cortex-M series includes three new 16-bit Thumb-1 instructions for sleep mode: SEV, WFE, WFI.
  • Note: The Cortex-M0 / M0+ / M1 doesn't include these 16-bit Thumb-1 instructions: CBZ, CBNZ, IT.[15][16]
  • Note: The Cortex-M0 / M0+ / M1 only include these 32-bit Thumb-2 instructions: BL, DMB, DSB, ISB, MRS, MSR.[15][16]
  • Note: The Cortex-M0 / M0+ / M1 / M23 only have 32-bit multiply instructions with a lower-32-bit result (32 bit × 32 bit = lower 32 bit), whereas the Cortex-M3 / M4 / M7 / M33 / M35P include additional 32-bit multiply instructions with 64-bit results (32 bit × 32 bit = 64 bit). The Cortex-M4 / M7 (optionally M33 / M35P) include DSP instructions for (16 bit × 16 bit = 32 bit), (32 bit × 16 bit = upper 32 bit), (32 bit × 32 bit = upper 32 bit) multiplications.[15][16]
  • Note: The number of cycles to complete multiply and divide instructions varies across ARM Cortex-M core designs. Some cores have a silicon option for the choice of fast speed or small size (slow speed), trading a higher cycle count for less silicon. An interrupt occurring during the execution of a divide instruction or slow-iterative multiply instruction will cause the processor to abandon the instruction, then restart it after the interrupt returns.
    • Multiply instructions "32-bit result" – Cortex-M0/M0+/M23 is 1 or 32 cycle silicon option, Cortex-M1 is 3 or 33 cycle silicon option, Cortex-M3/M4/M7/M33/M35P is 1 cycle.
    • Multiply instructions "64-bit result" – Cortex-M3 is 3–5 cycles (depending on values), Cortex-M4/M7/M33/M35P is 1 cycle.
    • Divide instructions – Cortex-M3/M4 is 2–12 cycles (depending on values), Cortex-M7 is 3–20 cycles (depending on values), Cortex-M23 is 17 or 34 cycle option, Cortex-M33 is 2–11 cycles (depending on values), Cortex-M35P is TBD.
  • Note: Some Cortex-M cores have silicon options for various types of floating-point units (FPU). The Cortex-M52 / M55 / M85 have an option for half-precision (HP), the Cortex-M4 / M7 / M33 / M35P / M52 / M55 / M85 have an option for single-precision (SP), and the Cortex-M7 / M52 / M55 / M85 have an option for double-precision (DP). When an FPU is included, the core is sometimes referred to as "Cortex-MxF", where 'x' is the core variant, such as Cortex-M4F.[15][16]
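
The difference between the two multiply result widths in the notes above is easy to see in portable C. In this sketch, which uses illustrative function names, the first function keeps only the lower 32 bits of the product, which is all that a 32-bit-result multiply can deliver. The second asks for the full 64-bit product, which a compiler can typically map to a single long-multiply instruction on cores that have one and to a helper routine elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* 32 x 32 -> lower 32 bits: the only multiply result on M0 / M0+ / M1 / M23. */
static uint32_t mul_lo32(uint32_t a, uint32_t b)
{
    return a * b;  /* unsigned arithmetic wraps modulo 2^32 */
}

/* 32 x 32 -> full 64 bits: widen one operand before multiplying. */
static uint64_t mul_full64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Small demonstration: 2^16 * 2^16 = 2^32 does not fit in 32 bits. */
static void multiply_demo(void)
{
    uint32_t a = 0x10000u, b = 0x10000u;

    assert(mul_lo32(a, b) == 0u);                 /* upper bits are lost */
    assert(mul_full64(a, b) == 0x100000000ull);   /* full product kept   */
}
```

Forgetting the cast in mul_full64 is a common mistake: without it the multiplication is carried out in 32 bits and only then widened, so the upper half is already gone.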
ARM Cortex-M instruction groups
| Group | Instr bits | Instructions | Cortex-M0, M0+, M1 | Cortex-M3 | Cortex-M4 | Cortex-M7 | Cortex-M23 | Cortex-M33 | Cortex-M35P | Cortex-M52 | Cortex-M55 | Cortex-M85 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Thumb-1 | 16 | ADC, ADD, ADR, AND, ASR, B, BIC, BKPT, BLX, BX, CMN, CMP, CPS, EOR, LDM, LDR, LDRB, LDRH, LDRSB, LDRSH, LSL, LSR, MOV, MUL, MVN, NOP, ORR, POP, PUSH, REV, REV16, REVSH, ROR, RSB, SBC, SEV, STM, STR, STRB, STRH, SUB, SVC, SXTB, SXTH, TST, UXTB, UXTH, WFE, WFI, YIELD | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Thumb-1 | 16 | CBNZ, CBZ | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Thumb-1 | 16 | IT | No | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes |
| Thumb-2 | 32 | BL, DMB, DSB, ISB, MRS, MSR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Thumb-2 | 32 | SDIV, UDIV, MOVT, MOVW, B.W, LDREX, LDREXB, LDREXH, STREX, STREXB, STREXH | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Thumb-2 | 32 | ADC, ADD, ADR, AND, ASR, B, BFC, BFI, BIC, CDP, CLREX, CLZ, CMN, CMP, DBG, EOR, LDC, LDM, LDR, LDRB, LDRBT, LDRD, LDRH, LDRHT, LDRSB, LDRSBT, LDRSH, LDRSHT, LDRT, LSL, LSR, MCR, MCRR, MLA, MLS, MRC, MRRC, MUL, MVN, NOP, ORN, ORR, PLD, PLDW, PLI, POP, PUSH, RBIT, REV, REV16, REVSH, ROR, RRX, RSB, SBC, SBFX, SEV, SMLAL, SMULL, SSAT, STC, STM, STR, STRB, STRBT, STRD, STRH, STRHT, STRT, SUB, SXTB, SXTH, TBB, TBH, TEQ, TST, UBFX, UMLAL, UMULL, USAT, UXTB, UXTH, WFE, WFI, YIELD | No | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes |
| DSP | 32 | PKH, QADD, QADD16, QADD8, QASX, QDADD, QDSUB, QSAX, QSUB, QSUB16, QSUB8, SADD16, SADD8, SASX, SEL, SHADD16, SHADD8, SHASX, SHSAX, SHSUB16, SHSUB8, SMLABB, SMLABT, SMLATB, SMLATT, SMLAD, SMLALBB, SMLALBT, SMLALTB, SMLALTT, SMLALD, SMLAWB, SMLAWT, SMLSD, SMLSLD, SMMLA, SMMLS, SMMUL, SMUAD, SMULBB, SMULBT, SMULTT, SMULTB, SMULWT, SMULWB, SMUSD, SSAT16, SSAX, SSUB16, SSUB8, SXTAB, SXTAB16, SXTAH, SXTB16, UADD16, UADD8, UASX, UHADD16, UHADD8, UHASX, UHSAX, UHSUB16, UHSUB8, UMAAL, UQADD16, UQADD8, UQASX, UQSAX, UQSUB16, UQSUB8, USAD8, USADA8, USAT16, USAX, USUB16, USUB8, UXTAB, UXTAB16, UXTAH, UXTB16 | No | No | Yes | Yes | No | Optional | Optional | Yes | Yes | Yes |
| SP Float | 32 | VABS, VADD, VCMP, VCMPE, VCVT, VCVTR, VDIV, VLDM, VLDR, VMLA, VMLS, VMOV, VMRS, VMSR, VMUL, VNEG, VNMLA, VNMLS, VNMUL, VPOP, VPUSH, VSQRT, VSTM, VSTR, VSUB | No | No | Optional | Optional | No | Optional | Optional | Optional | Optional | Optional |
| DP Float | 32 | VCVTA, VCVTM, VCVTN, VCVTP, VMAXNM, VMINNM, VRINTA, VRINTM, VRINTN, VRINTP, VRINTR, VRINTX, VRINTZ, VSEL | No | No | No | Optional | No | No | No | Optional | Optional | Optional |
| Acquire/Release | 32 | LDA, LDAB, LDAH, LDAEX, LDAEXB, LDAEXH, STL, STLB, STLH, STLEX, STLEXB, STLEXH | No | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| TrustZone | 16 | BLXNS, BXNS | No | No | No | No | Optional | Optional | Optional | Optional | Optional | Yes |
| TrustZone | 32 | SG, TT, TTT, TTA, TTAT | No | No | No | No | Optional | Optional | Optional | Optional | Optional | Yes |
| Co-processor | 16 | CDP, CDP2, MCR, MCR2, MCRR, MCRR2, MRC, MRC2, MRRC, MRRC2 | No | No | No | No | No | Optional | Optional | Optional | Optional | Optional |
| ACI | 32 | CX1, CX1A, CX2, CX2A, CX3, CX3A, CX1D, CX1DA, CX2D, CX2DA, CX3D, CX3DA, VCX1, VCX1A, VCX2, VCX2A, VCX3, VCX3A | No | No | No | No | No | Optional | No | Optional | Optional | Optional |
| PACBTI | 32 | AUT, AUTG, BTI, BXAUT, PAC, PACBTI, PACG | No | No | No | No | No | No | No | Optional | No | Optional |
  • Note: MOVW is an alias that means 32-bit "wide" MOV instruction.
  • Note: B.W is a long-distance unconditional branch (similar in encoding, operation, and range to BL, minus setting of the LR register).
  • Note: For Cortex-M1, WFE / WFI / SEV instructions exist, but execute as a NOP instruction.
  • Note: The half-precision (HP) FPU instructions are valid in the Cortex-M52 / M55 / M85 only when the HP FPU option exists in the silicon.
  • Note: The single-precision (SP) FPU instructions are valid in the Cortex-M4 / M7 / M33 / M35P / M52 / M55 / M85 only when the SP FPU option exists in the silicon.
  • Note: The double-precision (DP) FPU instructions are valid in the Cortex-M7 / M52 / M55 / M85 only when the DP FPU option exists in the silicon.
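
MOVW is commonly paired with MOVT to build an arbitrary 32-bit constant in two instructions: MOVW writes a 16-bit immediate into the lower half of a register and clears the upper half, and MOVT then replaces the upper half while leaving the lower half alone. The following host-side model of that behaviour is only a sketch, with function names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MOVW: the register becomes the zero-extended 16-bit immediate. */
static uint32_t movw(uint16_t imm16)
{
    return imm16;
}

/* Model of MOVT: replace the upper halfword, keep the lower halfword. */
static uint32_t movt(uint32_t rd, uint16_t imm16)
{
    return (rd & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}

/* Build a full 32-bit constant the way a MOVW/MOVT pair would. */
static uint32_t load_const32(uint32_t value)
{
    uint32_t rd = movw((uint16_t)(value & 0xFFFFu));
    rd = movt(rd, (uint16_t)(value >> 16));
    assert(rd == value);
    return rd;
}
```

Cores without these two instructions (the "No" column in the table above) usually load such constants from a literal pool with a PC-relative LDR instead.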

Deprecations


The ARM architecture for ARM Cortex-M series removed some features from older legacy cores:[15][16]

  • The 32-bit ARM instruction set is not included in Cortex-M cores.
  • Endianness is chosen at silicon implementation in Cortex-M cores. Legacy cores allowed "on-the-fly" changing of the data endian mode.
  • Co-processors were not supported on Cortex-M cores, until the silicon option was reintroduced in "ARMv8-M Mainline" for ARM Cortex-M33/M35P cores.

The capabilities of the 32-bit ARM instruction set are duplicated in many ways by the Thumb-1 and Thumb-2 instruction sets, but some ARM features have no equivalent:

  • The SWP and SWPB (swap) ARM instructions don't have a similar feature in Cortex-M.
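
Code that used SWP for a simple lock is usually rewritten around an atomic exchange. The sketch below uses C11 atomics rather than assembly; on cores that provide the exclusive-access instructions (LDREX / STREX in the instruction table above), compilers typically implement atomic_exchange with an exclusive load/store retry loop. That mapping is an assumption about the toolchain, the type and function names are illustrative, and ARMv6-M cores lack those instructions, so a toolchain may need another mechanism there.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative lock type: 0 means free, 1 means held. */
typedef struct {
    atomic_int locked;
} spinlock_t;

/* Try to take the lock once; returns true if this caller now owns it. */
static bool spin_try_lock(spinlock_t *lock)
{
    assert(lock != 0);
    /* Atomically store 1 and learn the previous value, as SWP once did. */
    return atomic_exchange(&lock->locked, 1) == 0;
}

/* Release the lock so that another context can take it. */
static void spin_unlock(spinlock_t *lock)
{
    assert(lock != 0);
    atomic_store(&lock->locked, 0);
}
```

A second call to spin_try_lock on a held lock returns false without blocking, which keeps the example safe to call from an interrupt handler in this simple form.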

The 16-bit Thumb-1 instruction set has evolved since it was first released in the legacy ARM7T cores with the ARMv4T architecture. New Thumb-1 instructions were added as each of the legacy ARMv5 / ARMv6 / ARMv6T2 architectures was released. Some 16-bit Thumb-1 instructions were removed from the Cortex-M cores:

  • The "BLX <immediate>" instruction doesn't exist because it was used to switch from Thumb-1 to ARM instruction set. The "BLX <register>" instruction is still available in the Cortex-M.
  • SETEND doesn't exist because on-the-fly switching of data endian mode is no longer supported.
  • Co-processor instructions were not supported on Cortex-M cores, until the silicon option was reintroduced in "ARMv8-M Mainline" for ARM Cortex-M33/M35P cores.
  • The SWI instruction was renamed to SVC, though the instruction binary coding is the same. However, the SVC handler code is different from the SWI handler code, because of changes to the exception models.

Cortex-M0

Architecture and classification
Instruction set: ARMv6-M (Thumb-1 (most), Thumb-2 (some))

The Cortex-M0 core is optimized for small silicon die size and use in the lowest price chips.[2]

Key features of the Cortex-M0 core are:[18]

  • ARMv6-M architecture[15]
  • 3-stage pipeline
  • Instruction sets:
    • Thumb-1 (most), missing CBZ, CBNZ, IT
    • Thumb-2 (some), only BL, DMB, DSB, ISB, MRS, MSR
    • 32-bit hardware integer multiply with 32-bit result
  • 1 to 32 interrupts, plus NMI

Silicon options:

  • Hardware integer multiply speed: 1 or 32 cycles.

Chips

nRF51822

The following microcontrollers are based on the Cortex-M0 core:

The following chips have a Cortex-M0 as a secondary core:

  • NXP LPC4300 (one Cortex-M4F + one Cortex-M0)
  • Texas Instruments SimpleLink Wireless MCUs CC1310 and CC2650 (one programmable Cortex-M3 + one Cortex-M0 network processor + one proprietary Sensor Controller Engine)

Cortex-M0+

Architecture and classification
Microarchitecture: ARMv6-M
Instruction set: Thumb-1 (most), Thumb-2 (some)
NXP (Freescale) FRDM-KL25Z Board with KL25Z128VLK (Kinetis L)

The Cortex-M0+ is an optimized superset of the Cortex-M0. The Cortex-M0+ has complete instruction set compatibility with the Cortex-M0 thus allowing the use of the same compiler and debug tools. The Cortex-M0+ pipeline was reduced from 3 to 2 stages, which lowers the power usage and increases performance (higher average IPC due to branches taking one fewer cycle). In addition to debug features in the existing Cortex-M0, a silicon option can be added to the Cortex-M0+ called the Micro Trace Buffer (MTB) which provides a simple instruction trace buffer. The Cortex-M0+ also received Cortex-M3 and Cortex-M4 features, which can be added as silicon options, such as the memory protection unit (MPU) and the vector table relocation.[19]

Key features of the Cortex-M0+ core are:[19]

  • ARMv6-M architecture[15]
  • 2-stage pipeline (one fewer than Cortex-M0)
  • Instruction sets: (same as Cortex-M0)
    • Thumb-1 (most), missing CBZ, CBNZ, IT
    • Thumb-2 (some), only BL, DMB, DSB, ISB, MRS, MSR
    • 32-bit hardware integer multiply with 32-bit result
  • 1 to 32 interrupts, plus NMI

Silicon options:

  • Hardware integer multiply speed: 1 or 32 cycles
  • 8-region memory protection unit (MPU) (same as M3 and M4)
  • Vector table relocation (same as M3, M4)
  • Single-cycle I/O port (available in M0+/M23)
  • Micro Trace Buffer (MTB) (available in M0+/M23/M33/M35P)

Chips


The following microcontrollers are based on the Cortex-M0+ core:

The following chips have a Cortex-M0+ as a secondary core:

  • Cypress PSoC 6200 (one Cortex-M4F + one Cortex-M0+)
  • ST WB (one Cortex-M4F + one Cortex-M0+)

The smallest ARM microcontrollers are of the Cortex-M0+ type; as of 2014, the smallest was the Kinetis KL03, at 1.6 mm by 2 mm in a chip-scale package.[33]

On 21 June 2018, University of Michigan researchers announced the "world's smallest computer", or computer device, based on the ARM Cortex-M0+ (and including RAM and wireless transmitters and receivers based on photovoltaics). It was presented at the 2018 Symposia on VLSI Technology and Circuits in the paper "A 0.04mm3 16nW Wireless and Batteryless Sensor System with Integrated Cortex-M0+ Processor and Optical Communication for Cellular Temperature Measurement." The device is one-tenth the size of IBM's previously claimed world-record-sized computer, announced in March 2018, which is itself smaller than a grain of salt.

Cortex-M1

Architecture and classification
Microarchitecture: ARMv6-M
Instruction set: Thumb-1 (most), Thumb-2 (some)

The Cortex-M1 is an optimized core especially designed to be loaded into FPGA chips.[4]

Key features of the Cortex-M1 core are:[20]

  • ARMv6-M architecture[15]
  • 3-stage pipeline.
  • Instruction sets:
    • Thumb-1 (most), missing CBZ, CBNZ, IT.
    • Thumb-2 (some), only BL, DMB, DSB, ISB, MRS, MSR.
    • 32-bit hardware integer multiply with 32-bit result.
  • 1 to 32 interrupts, plus NMI.

Silicon options:

  • Hardware integer multiply speed: 3 or 33 cycles.
  • Optional Tightly-Coupled Memory (TCM): 0 to 1 MB instruction-TCM, 0 to 1 MB data-TCM, each with optional ECC.
  • External interrupts: 0, 1, 8, 16, 32.
  • Debug: none, reduced, full.
  • Data endianness: little-endian or BE-8 big-endian.
  • OS extension: present or absent.

Chips


The following vendors support the Cortex-M1 as soft-cores on their FPGA chips:

Cortex-M3

Architecture and classification
Microarchitecture: ARMv7-M
Instruction set: Thumb-1, Thumb-2, Saturated (some), Divide
Arduino Due board with Atmel ATSAM3X8E (ARM Cortex-M3 core) microcontroller
NXP LPCXpresso Development Board with LPC1343

Key features of the Cortex-M3 core are:[21][36]

  • ARMv7-M architecture[16]
  • 3-stage pipeline with branch speculation.
  • Instruction sets:
    • Thumb-1 (entire).
    • Thumb-2 (entire).
    • 32-bit hardware integer multiply with 32-bit or 64-bit result, signed or unsigned, add or subtract after the multiply. 32-bit multiply is 1 cycle, but 64-bit multiply and MAC instructions require extra cycles.
    • 32-bit hardware integer divide (2–12 cycles).
    • Saturation arithmetic support.
  • 1 to 240 interrupts, plus NMI.
  • 12 cycle interrupt latency.
  • Integrated sleep modes.
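
Saturation arithmetic clamps a result to the limits of a chosen bit width instead of letting it wrap around, which is what the SSAT and USAT instructions listed earlier do in hardware. The following is a minimal host-side model of that clamping, with function names chosen for illustration. It does not model the Q flag that the hardware sets when clamping occurs.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp x to the signed n-bit range [-2^(n-1), 2^(n-1) - 1], n = 1..32. */
static int32_t ssat(int32_t x, unsigned n)
{
    assert(n >= 1u && n <= 32u);
    int64_t max = ((int64_t)1 << (n - 1u)) - 1;
    int64_t min = -max - 1;

    if (x > max) return (int32_t)max;
    if (x < min) return (int32_t)min;
    return x;
}

/* Clamp x to the unsigned n-bit range [0, 2^n - 1], n = 0..31. */
static uint32_t usat(int32_t x, unsigned n)
{
    assert(n <= 31u);
    int64_t max = ((int64_t)1 << n) - 1;

    if (x < 0)   return 0u;
    if (x > max) return (uint32_t)max;
    return (uint32_t)x;
}
```

A typical use is clipping a 32-bit accumulator down to a 16-bit audio sample with ssat(acc, 16), where wrapping would otherwise turn a loud peak into a sample of the opposite sign.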

Silicon options:

  • Optional Memory Protection Unit (MPU): 0 or 8 regions.

Chips


The following microcontrollers are based on the Cortex-M3 core:

The following chips have a Cortex-M3 as a secondary core:

The following FPGAs include a Cortex-M3 core:

The following vendors support the Cortex-M3 as soft-cores on their FPGA chips:

  • Altera Cyclone-II, Cyclone-III, Stratix-II, Stratix-III
  • Xilinx Spartan-3, Virtex-2, Virtex-3, Virtex-4, Artix-7[38]

Cortex-M4

Architecture and classification
Microarchitecture: ARMv7E-M
Instruction set: Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP)
Silicon Labs (Energy Micro) Wonder Gecko STK Board with EFM32WG990
TI Stellaris Launchpad Board with LM4F120

Conceptually, the Cortex-M4 is a Cortex-M3 plus DSP instructions and an optional floating-point unit (FPU). A core with an FPU is known as a Cortex-M4F.

Key features of the Cortex-M4 core are:[22]

  • ARMv7E-M architecture[16]
  • 3-stage pipeline with branch speculation.
  • Instruction sets:
    • Thumb-1 (entire).
    • Thumb-2 (entire).
    • 32-bit hardware integer multiply with 32-bit or 64-bit result, signed or unsigned, add or subtract after the multiply. 32-bit Multiply and MAC are 1 cycle.
    • 32-bit hardware integer divide (2–12 cycles).
    • Saturation arithmetic support.
    • DSP extension: Single cycle 16/32-bit MAC, single cycle dual 16-bit MAC, 8/16-bit SIMD arithmetic.
  • 1 to 240 interrupts, plus NMI.
  • 12 cycle interrupt latency.
  • Integrated sleep modes.
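
The "dual 16-bit MAC" in the feature list above refers to instructions such as SMLAD, which multiply the two signed 16-bit halves of one register by the matching halves of another and add both products to an accumulator. The sketch below models that result in portable C so the arithmetic can be checked on a host. The function names are illustrative, the code relies on the usual two's-complement narrowing performed by common compilers, and it does not model the hardware's wraparound or Q flag on accumulator overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (hi in bits 31:16). */
static uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Model of SMLAD: acc + lo(rn)*lo(rm) + hi(rn)*hi(rm), halves are signed. */
static int32_t smlad(uint32_t rn, uint32_t rm, int32_t acc)
{
    int32_t p_lo = (int32_t)(int16_t)(rn & 0xFFFFu) * (int16_t)(rm & 0xFFFFu);
    int32_t p_hi = (int32_t)(int16_t)(rn >> 16) * (int16_t)(rm >> 16);

    /* Sum in 64 bits so the sketch never relies on signed overflow. */
    int64_t sum = (int64_t)acc + p_lo + p_hi;
    assert(sum >= INT32_MIN && sum <= INT32_MAX);  /* overflow not modelled */
    return (int32_t)sum;
}
```

Two taps of a 16-bit FIR filter can be processed per call this way, which is why packed sample and coefficient buffers are common in DSP code for these cores.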

Silicon options:

  • Optional floating-point unit (FPU): single-precision only IEEE-754 compliant. It is called the FPv4-SP extension.
  • Optional memory protection unit (MPU): 0 or 8 regions.

Chips

nRF52833 on a micro bit v2
STM32F407IGH6

The following microcontrollers are based on the Cortex-M4 core:

The following microcontrollers are based on the Cortex-M4F (M4 + FPU) core:

The following chips have either a Cortex-M4 or M4F as a secondary core:

Cortex-M7

Architecture and classification
Microarchitecture: ARMv7E-M
Instruction set: Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP & DP)
Arduino GIGA R1 WiFi board with (dual core ARM Cortex-M7 + ARM Cortex-M4) STM32H747XIH6 microcontroller

The Cortex-M7 is a high-performance core with almost double the power efficiency of the older Cortex-M4.[7] It features a 6-stage superscalar pipeline with branch prediction and an optional floating-point unit capable of single-precision and optionally double-precision operations.[7][39] The instruction and data buses have been widened to 64 bits, up from the previous 32 bits. If a core contains an FPU, it is known as a Cortex-M7F; otherwise it is a Cortex-M7.

Key features of the Cortex-M7 core are:[23]

  • ARMv7E-M architecture.
  • 6-stage pipeline with branch speculation. This is the second-longest pipeline of all ARM Cortex-M cores, after that of the Cortex-M85.
  • Instruction sets:
    • Thumb-1 (entire).
    • Thumb-2 (entire).
    • 32-bit hardware integer multiply with 32-bit or 64-bit result, signed or unsigned, add or subtract after the multiply. 32-bit Multiply and MAC are 1 cycle.
    • 32-bit hardware integer divide (2–12 cycles).
    • Saturation arithmetic support.
    • DSP extension: Single cycle 16/32-bit MAC, single cycle dual 16-bit MAC, 8/16-bit SIMD arithmetic.
  • 1 to 240 interrupts, plus NMI.
  • 12 cycle interrupt latency.
  • Integrated sleep modes.

Silicon options:

  • Optional floating-point unit (FPU): (single precision) or (single and double-precision), both IEEE-754-2008 compliant. It is called the FPv5 extension.
  • Optional CPU cache: 0 to 64 KB instruction-cache, 0 to 64 KB data-cache, each with optional ECC.
  • Optional Tightly-Coupled Memory (TCM): 0 to 16 MB instruction-TCM, 0 to 16 MB data-TCM, each with optional ECC.
  • Optional Memory Protection Unit (MPU): 8 or 16 regions.
  • Optional Embedded Trace Macrocell (ETM): instruction-only, or instruction and data.
  • Optional Retention Mode (with Arm Power Management Kit) for Sleep Modes.
  • Optional dual-redundant lock-step operation.

Chips


The following microcontrollers are based on the Cortex-M7 core:

Cortex-M23

Architecture and classification
Microarchitecture: ARMv8-M Baseline
Instruction set: Thumb-1 (most), Thumb-2 (some), Divide, TrustZone

The Cortex-M23 core was announced in October 2016[40] and is based on the ARMv8-M architecture that was previously announced in November 2015.[41] Conceptually, the Cortex-M23 is similar to a Cortex-M0+ plus integer divide instructions and TrustZone security features, and it also has a 2-stage instruction pipeline.[8]

Key features of the Cortex-M23 core are:[24][40]

  • ARMv8-M Baseline architecture.[31]
  • 2-stage pipeline. (similar to Cortex-M0+)
  • TrustZone security instructions.
  • 32-bit hardware integer divide (17 or 34 cycles), slower than divide in all other cores.
  • Stack limit boundaries. (available only with SAU option)

Silicon options:

  • Hardware integer multiply speed: 1 or 32 cycles.
  • Hardware integer divide speed: 17 or 34 cycles maximum. Depending on divisor, instruction may complete in fewer cycles.
  • Optional Memory Protection Unit (MPU): 0, 4, 8, 12, 16 regions.
  • Optional Security Attribution Unit (SAU): 0, 4, 8 regions.
  • Single-cycle I/O port (available in M0+/M23).
  • Micro Trace Buffer (MTB)

Chips

The following microcontrollers are based on the Cortex-M23 core:

Cortex-M33

Architecture and classification
Microarchitecture: ARMv8-M Mainline
Instruction set: Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor

The Cortex-M33 core was announced in October 2016[40] and is based on the ARMv8-M architecture that was previously announced in November 2015.[41] Conceptually the Cortex-M33 is similar to a cross between the Cortex-M4 and the Cortex-M23, and it has a 3-stage instruction pipeline.[9]

Key features of the Cortex-M33 core are:[25][40]

  • ARMv8-M Mainline architecture.[31]
  • 3-stage pipeline.
  • TrustZone security instructions.
  • 32-bit hardware integer divide (11 cycles maximum).
  • Stack limit boundaries. (available only with SAU option)

Silicon options:

  • Optional Floating-Point Unit (FPU): single-precision only IEEE-754 compliant. It is called the FPv5 extension.
  • Optional Memory Protection Unit (MPU): 0, 4, 8, 12, 16 regions.
  • Optional Security Attribution Unit (SAU): 0, 4, 8 regions.
  • Micro Trace Buffer (MTB)

Chips

The following microcontrollers are based on the Cortex-M33 core:

The following chips have a Cortex-M33 or M33F as a secondary core:

Cortex-M35P

Architecture and classification
Microarchitecture: ARMv8-M Mainline
Instruction set: Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor

The Cortex-M35P core was announced in May 2018 and is based on the Armv8-M architecture. It is conceptually a Cortex-M33 core with a new instruction cache, plus new tamper-resistant hardware concepts borrowed from the ARM SecurCore family, and configurable parity and ECC features.[10]

Currently, information about the Cortex-M35P is limited, because its Technical Reference Manual and Generic User Guide haven't been released yet.

Chips

The following microcontrollers are based on the Cortex-M35P core:

Cortex-M52

Architecture and classification
Microarchitecture: ARMv8.1-M Mainline Helium
Instruction set: Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (VFPv5), TrustZone, Coprocessor, MVE

The Cortex-M52 core was announced in November 2023 and is based on the Armv8.1-M architecture. Conceptually, it can be seen as a cross between the Cortex-M33 and the Cortex-M55. Key differences are that its Helium co-processor is single beat (the M55 is dual beat), and that it has a 32-bit main bus similar to the M33 to ease the transition of applications. It has a 4-stage instruction pipeline.[11]
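The single-beat versus dual-beat distinction can be illustrated with a rough cycle estimate. In the M-Profile Vector Extension a 128-bit vector operation is divided into four 32-bit beats. The Python sketch below assumes an idealized core that retires a fixed number of beats per cycle and ignores instruction overlap and memory stalls, so it is a back-of-the-envelope model rather than a timing specification. The function names are hypothetical.

```python
VECTOR_BITS = 128
BEATS_PER_VECTOR_OP = 4  # each beat covers 32 bits of the 128-bit vector


def lanes(element_bits):
    """Number of elements of the given width held in one vector register."""
    return VECTOR_BITS // element_bits


def ideal_cycles(num_vector_ops, beats_per_cycle):
    """Idealized cycle count for a run of back-to-back vector operations."""
    total_beats = num_vector_ops * BEATS_PER_VECTOR_OP
    return -(-total_beats // beats_per_cycle)  # ceiling division
```

Under these assumptions, ten vector operations take 40 cycles on a single-beat implementation and 20 cycles on a dual-beat one, while a register holds 16, 8, or 4 lanes for 8-, 16-, or 32-bit elements.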

Key features of the Cortex-M52 core include:

  • ARMv8.1-M Mainline/Helium architecture.[31]
  • 4-stage pipeline.
  • Stack limit boundaries (available only with SAU option).
  • 32-bit main bus (AHB or AXI)[11]

Silicon options:

  • Helium (M-Profile Vector Extension, MVE)
  • Pointer Authentication and Branch Target Identification Extension
  • Single-Precision and Double-Precision floating-point
  • Digital Signal Processing (DSP) extension support
  • TrustZone security extension support
  • Safety and reliability (RAS) support
  • Coprocessor support
  • Secure and Non-secure MPU with 0, 4, 8, 12, or 16 regions
  • SAU with 0, 4, or 8 regions
  • Instruction cache with size of up to 64 KB
  • Data cache with size of up to 64 KB
  • ECC on caches and TCMs
  • 1–480 interrupts
  • 3–8 exception priority bits
  • Internal and external WIC options, optional CTI, ITM, and DWT
  • ARM Custom Instructions

Chips

The following microcontrollers are based on the Cortex-M52 core:

  • Geehy Semiconductor G32R5[43]

Cortex-M55

Architecture and classification
Microarchitecture: ARMv8.1-M Mainline Helium
Instruction set: Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (VFPv5), TrustZone, Coprocessor, MVE

The Cortex-M55 core was announced in February 2020 and is based on the Armv8.1-M architecture. It has a 4- or 5-stage instruction pipeline.[12]

Key features of the Cortex-M55 core include:

  • ARMv8.1-M Mainline/Helium architecture.[31]
  • 4-stage pipeline.
  • Stack limit boundaries (available only with SAU option).
  • 64-bit AXI main bus[12]

Silicon options:

  • Helium (M-Profile Vector Extension, MVE)
  • Single-Precision and Double-Precision floating-point
  • Digital Signal Processing (DSP) extension support
  • TrustZone security extension support
  • Safety and reliability (RAS) support
  • Coprocessor support
  • Secure and Non-secure MPU with 0, 4, 8, 12, or 16 regions
  • SAU with 0, 4, or 8 regions
  • Instruction cache with size of 4 KB, 8 KB, 16 KB, 32 KB, 64 KB
  • Data cache with size of 4 KB, 8 KB, 16 KB, 32 KB, 64 KB
  • ECC on caches and TCMs
  • 1–480 interrupts
  • 3–8 exception priority bits
  • Internal and external WIC options, optional CTI, ITM, and DWT
  • ARM Custom Instructions

Chips

Cortex-M85

Architecture and classification
Microarchitecture: ARMv8.1-M Mainline Helium
Instruction set: Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (VFPv5), TrustZone, Coprocessor, MVE

The Cortex-M85 core was announced in April 2022 and is based on the Armv8.1-M architecture. It has a 7-stage instruction pipeline.[13]

Silicon options:

  • Optional CPU cache: 0 to 64 KB instruction-cache, 0 to 64 KB data-cache, each with optional ECC.
  • Optional Tightly-Coupled Memory (TCM): 0 to 16 MB instruction-TCM, 0 to 16 MB data-TCM, each with optional ECC.
  • Optional Memory Protection Unit (MPU): 16 regions. Can have separate ones for secure and non-secure mode if TrustZone is implemented.
  • Up to 480 interrupts and NMI
  • 3–8 exception priority bits
  • Optional dual-redundant lock-step operation.

Chips

Development tools

Documentation

The documentation for ARM chips is extensive. In the past, 8-bit microcontroller documentation would typically fit in a single document, but as microcontrollers have evolved, so has everything required to support them. A documentation package for ARM chips typically consists of a collection of documents from the IC manufacturer as well as the CPU core vendor (ARM Limited).

A typical top-down documentation tree is:

Documentation tree (top to bottom)
  1. IC manufacturer website.
  2. IC manufacturer marketing slides.
  3. IC manufacturer datasheet for the exact physical chip.
  4. IC manufacturer reference manual that describes common peripherals and aspects of a physical chip family.
  5. ARM core website.
  6. ARM core generic user guide.
  7. ARM core technical reference manual.
  8. ARM architecture reference manual.

IC manufacturers have additional documents, such as: evaluation board user manuals, application notes, getting started guides, software library documents, errata, and more. See External links section for links to official Arm documents.

See also

References

Further reading

from Grokipedia
The ARM Cortex-M is a family of 32-bit RISC processor cores developed by Arm for microcontroller-based embedded systems, emphasizing low power consumption, compact size, and deterministic real-time operation to support applications in IoT, industrial, and automotive systems. Based on Arm's M-profile architecture, the Cortex-M series delivers low latency, high code density through the Thumb and Thumb-2 instruction sets, and features like a nested vectored interrupt controller (NVIC) for efficient handling of multiple interrupts in time-critical environments. The architecture evolves from Armv6-M for entry-level cores to Armv7-M and the more recent Armv8-M, which introduces enhanced security via TrustZone technology for protecting sensitive data and code in secure/non-secure execution states. Key members of the family span a range of performance levels and capabilities:
  • Cortex-M0 and M0+: Entry-level cores based on Armv6-M, optimized for ultra-low power and minimal area in simple control tasks, achieving up to 0.9 DMIPS/MHz with low gate count for cost-sensitive devices.
  • Cortex-M3: A balanced, general-purpose core on Armv7-M, providing 1.25 DMIPS/MHz and single-cycle multiply instructions for applications requiring moderate performance.
  • Cortex-M4: Enhances the M3 with digital signal processing (DSP) extensions and an optional single-precision floating-point unit (FPU), delivering up to 1.25 DMIPS/MHz and up to 10x faster floating-point operations than software emulation for signal control in sensors and audio processing.
  • Cortex-M7: The high-performance flagship on Armv7E-M, offering a single-precision FPU with optional double-precision support, branch prediction, and up to 2.14 DMIPS/MHz for demanding tasks in automotive and industrial systems.
  • Cortex-M23 and M33: Armv8-M implementations adding TrustZone for secure IoT, with the M23 focusing on area efficiency (0.9 DMIPS/MHz) and the M33 on balanced security and performance (1.5 DMIPS/MHz).
  • Cortex-M55 and M85: Armv8.1-M implementations, the latest additions, with Arm Helium vector processing technology for machine learning inference, providing up to 4.4 CoreMark/MHz on the M55 and the highest scalar, DSP, and ML performance in the family on the M85 for edge AI applications.
These processors are licensed as intellectual property (IP) for integration into system-on-chips (SoCs) by vendors, powering billions of devices annually due to their debug support via CoreSight and compatibility with the Arm ecosystem, including CMSIS software libraries.

Introduction

Overview

The ARM Cortex-M family consists of 32-bit RISC processor cores licensed by Arm for integration into low-cost, energy-efficient embedded systems, particularly microcontrollers used in applications such as industrial controls. These cores are designed to deliver reliable performance in resource-constrained environments, enabling developers to build scalable solutions without the overhead of more complex architectures. Optimized for deterministic and interrupt-driven operations in deeply embedded scenarios, the Cortex-M processors incorporate features such as the Nested Vectored Interrupt Controller (NVIC), which provides low-latency interrupt handling to ensure responsive real-time behavior. This focus on predictability and efficiency makes them well suited to applications requiring consistent execution, such as sensor interfaces and control systems. By 2023, over 250 billion Arm-based chips had been shipped cumulatively, with the Cortex-M series holding approximately 69% of the market by core architecture as of 2024. In contrast to the high-performance Cortex-A profile for application processors or the Cortex-R profile for real-time systems, the Cortex-M prioritizes low power consumption and minimal cost over maximum computational throughput.

History

The ARM Cortex-M series grew out of ARM's embedded cores of the 1990s, such as the 32-bit ARM7TDMI, which dominated embedded applications but faced limitations in scalability and efficiency as demand grew for more advanced 32-bit processing in cost-sensitive devices. In response to the market's shift toward higher performance without excessive power consumption, ARM announced the first Cortex-M processor, the Cortex-M3, on October 19, 2004, marking the debut of a dedicated family optimized for deeply embedded systems. Silicon implementations of the Cortex-M3 became available in 2006, enabling widespread adoption in real-time applications.

Subsequent releases expanded the family's range to address diverse embedded needs. The Cortex-M0, introduced in 2009 as the smallest 32-bit core, targeted ultra-low-power scenarios to replace legacy 8/16-bit designs. In 2010, the Cortex-M4 added digital signal processing (DSP) and floating-point unit (FPU) capabilities, enhancing support for signal processing tasks. The high-performance Cortex-M7 followed in 2014, doubling compute capabilities for demanding applications. The transition to Armv8-M began with the announcements of the Cortex-M23 and Cortex-M33 in October 2016, introducing baseline and mainline profiles respectively.

Key evolutionary drivers included the industry's move toward 32-bit dominance for better code density and performance, the integration of security features like TrustZone-M in 2016 to enable secure/non-secure execution states, and the addition of vector processing via Helium (M-Profile Vector Extension) in the Armv8.1-M architecture starting in 2019, responding to rising IoT and machine learning demands at the edge. Later advancements featured the Cortex-M35P in May 2018 for enhanced secure isolation against physical attacks, the Helium-enabled Cortex-M55 in February 2020, the top-performance Cortex-M85 in April 2022, and the compact Helium-supporting Cortex-M52 in November 2023.

By 2025, Arm continued rebranding its offerings from individual "Cortex" cores toward integrated compute subsystems to streamline development for complex AIoT platforms, though the M-series naming remained for legacy support; no new Cortex-M core announcements occurred by November 2025. Arm's licensing model has facilitated broad adoption across billions of devices, particularly fueling the IoT expansion since 2010.

Licensing and Customization

The ARM Cortex-M processor cores are licensed as intellectual property (IP) by Arm to semiconductor vendors, who integrate them into system-on-chip (SoC) designs or microcontrollers (MCUs) for embedded applications. This licensing model provides access to synthesizable register-transfer level (RTL) designs, enabling partners to customize and manufacture chips without developing the core from scratch. The business structure typically involves upfront access fees (waived in some cases through programs like Arm DesignStart for cores such as Cortex-M0 and Cortex-M3), followed by a royalty-based payment per shipped chip, aligning costs with commercial success.

Customization options allow licensees to tailor the cores to specific requirements, including configurable parameters for elements like instruction cache sizes, multiplier units, and peripheral interfaces such as AHB or APB buses. Silicon-proven implementations, including reference designs and subsystems, are available to accelerate time-to-market by reducing verification efforts. For instance, Arm's Flexible Access and Total Access programs provide scalable access to these configurable IP blocks, enabling experimentation and integration without immediate full commitment. Additionally, custom instructions introduced in Armv8-M permit vendors to add application-specific accelerations directly into the instruction set decoder, using the same registers as standard instructions while preserving compatibility with Arm's ecosystem.

Cortex-M cores are offered in variants suited to different design needs: soft macros, which are synthesizable RTL allowing area and power optimization during place-and-route, and hard macros, which are pre-implemented layouts for fixed performance and faster integration but with less flexibility. These variants support a range of process nodes, from mature 180 nm for cost-sensitive devices to advanced 7 nm and below as of 2025, facilitating deployment in high-efficiency IoT and automotive applications. Arm's collaboration with foundries ensures optimized implementations across these nodes.

Semiconductor vendors frequently extend Cortex-M cores with proprietary features while upholding Arm compatibility across the ecosystem. For example, NXP incorporates vector processing capabilities in its MCU portfolios, leveraging custom extensions in industrial and IoT devices, built atop the standard Cortex-M architecture. This approach allows differentiation in performance-critical areas without breaking binary compatibility for Armv8-M software.

Architecture

Instruction Set Architecture

The ARM Cortex-M processors implement the M-profile of the Arm architecture, utilizing the Thumb and Thumb-2 instruction sets, which consist of 16-bit and 32-bit instructions optimized for code density and efficient memory usage in embedded systems. The Armv6-M baseline, used in Cortex-M0 and Cortex-M0+ cores, supports the Armv6-M instruction set with a small subset of 32-bit Thumb-2 instructions for enhanced functionality while maintaining compactness. In contrast, the Armv7-M architecture, implemented in the Cortex-M3 and (as Armv7E-M) in the Cortex-M4 and Cortex-M7 cores, provides the full Thumb-2 instruction set, enabling more complex operations through variable-length instructions that improve performance without significantly increasing code size. The Armv8-M architecture, featured in Cortex-M23 and Cortex-M33 cores, employs a version of the T32 (Thumb-2) instruction set, ensuring compatibility with prior M-profile versions through 16-bit and 32-bit encodings.

Key extensions to the base ISA enhance capabilities in higher-end cores. The Cortex-M4 and Cortex-M7 incorporate DSP extensions under Armv7E-M, including single instruction, multiple data (SIMD) multiply-accumulate (MAC) operations and saturating arithmetic support, which accelerate common signal-processing tasks like filtering and transforms. These extensions introduce instructions such as SMLAD (signed multiply-accumulate dual) for parallel 16-bit operations, enabling efficient handling of audio and sensor data without floating-point units. Building on this, the Armv8.1-M architecture introduces the M-Profile Vector Extension (MVE), branded as Helium, which adds 128-bit vector processing for machine learning and advanced DSP workloads, supporting operations on 8-bit, 16-bit, and 32-bit data types with both integer and floating-point variants.

The Armv8-M architecture defines two conformance levels: Baseline and Mainline. The Baseline variant, a superset of Armv6-M, targets simpler implementations with basic Thumb instructions and omits advanced DSP and vector extensions for reduced complexity and power. The Mainline variant, a superset of Armv7-M, supports the DSP extension and, from Armv8.1-M onward, Helium, providing greater performance for demanding applications. Post-Armv7-M, certain legacy Thumb-1 instructions, such as those related to ThumbEE mode, are deprecated to streamline the ISA and eliminate rarely used features. Binary compatibility across Cortex-M cores is facilitated by the CMSIS software interface, allowing portable code without reliance on features like Jazelle direct bytecode execution or big.LITTLE heterogeneous processing found in A- and R-profile architectures.
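The dual 16-bit multiply-accumulate performed by SMLAD can be modelled in a few lines. The Python sketch below treats each 32-bit operand as two signed 16-bit halves, multiplies the matching halves, and adds both products to a 32-bit accumulator, reporting whether the exact sum had to wrap. It is a behavioural illustration rather than Arm's reference pseudocode, and the helper names `to_signed` and `pack_halfwords` are invented for this example.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pack_halfwords(low, high):
    """Pack two signed 16-bit values into one 32-bit word."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def smlad(rn, rm, ra):
    """Model of a dual 16-bit multiply with 32-bit accumulate.

    Returns (result, overflowed): result is the wrapped signed 32-bit sum,
    overflowed is True when the exact sum did not fit in 32 bits.
    """
    low_product = to_signed(rn, 16) * to_signed(rm, 16)
    high_product = to_signed(rn >> 16, 16) * to_signed(rm >> 16, 16)
    exact = low_product + high_product + to_signed(ra, 32)
    result = to_signed(exact, 32)
    return result, result != exact
```

For example, with halves (3, -2) and (4, 5) and an accumulator of 10, the model returns 3·4 + (-2)·5 + 10 = 12, the kind of step a filter inner loop would repeat for each pair of samples.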

Pipeline and Core Features

The ARM Cortex-M family uses pipeline architectures tailored to balance performance, power efficiency, and complexity across its cores. Entry-level designs, such as the Cortex-M0+ and Cortex-M23, employ a 2-stage pipeline consisting of fetch/decode and execute stages, emphasizing simplicity and minimal power draw for ultra-constrained applications. In contrast, mid-range cores like the Cortex-M3 and Cortex-M4 implement a 3-stage pipeline with fetch, decode, and execute phases, incorporating branch speculation in the Cortex-M4 to improve efficiency without full prediction hardware.

Higher-end cores introduce advanced pipelining for greater throughput. The Cortex-M7 features a 6-stage superscalar pipeline with branch prediction, enabling dual-issue execution of instructions and supporting out-of-order completion for loads and stores to boost performance in demanding tasks. Branch prediction is also present in subsequent cores like the Cortex-M33 and Cortex-M55, reducing pipeline stalls from conditional branches.

Performance characteristics vary by core, as quantified by the Dhrystone MIPS per MHz (DMIPS/MHz) and CoreMark per MHz benchmarks, which assess integer and mixed workload efficiency, respectively. The following table summarizes representative metrics for select cores:
Core | DMIPS/MHz | CoreMark/MHz
Cortex-M0 | 0.96 | 2.33
Cortex-M0+ | 0.99 | 2.46
Cortex-M3 | 1.25 | 3.34
Cortex-M4 | 1.25 | 3.42
Cortex-M7 | 2.14 | 5.01
Cortex-M23 | 0.88 | 2.64
These ratings reflect optimized configurations and highlight the family's scalability, with higher-end cores achieving up to 2.3 times the per-MHz throughput of entry-level ones for compute-intensive operations.

Core components shared across the family ensure deterministic real-time behavior and system integration. The Nested Vectored Interrupt Controller (NVIC) provides low-latency interrupt handling, supporting up to 240 interrupt sources with configurable priorities (typically 8 to 256 levels via 3- to 8-bit fields), tail-chaining to minimize handler overhead, and late-arrival prioritization for critical events. The SysTick timer, a 24-bit down-counter, generates periodic interrupts for RTOS scheduling and is present or optional in all cores depending on configuration. Most cores include an optional Memory Protection Unit (MPU) with 8 to 16 configurable regions, enabling access control, sub-region disabling, and background region support to isolate code, data, and peripherals.

Power management features promote energy efficiency in battery-powered and embedded systems. All cores support the Wait For Interrupt (WFI) and Wait For Event (WFE) instructions to halt execution and enter sleep states until an interrupt or event occurs, with Sleep-on-Exit extensions to skip unnecessary returns from handlers. Optional clock gating at architectural levels disables unused pipeline stages and peripherals during idle periods, reducing dynamic power. Typical active-mode power consumption falls below 1 mW/MHz on 90 nm processes, with examples including 12.5–16.6 μW/MHz for the Cortex-M0 and 8.47 μW/MHz for the Cortex-M4 on more advanced nodes.
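The relationship between priority bits and priority levels quoted for the NVIC is simple arithmetic: n implemented bits give 2^n levels. The Python sketch below also shows the common convention, used for instance by the CMSIS `NVIC_SetPriority` helper, of placing the implemented bits at the top of an 8-bit priority field. The function names are hypothetical and the range check reflects only the 3- to 8-bit span mentioned in the text.

```python
def priority_levels(prio_bits):
    """Number of distinct priority levels for a given priority field width."""
    if not 3 <= prio_bits <= 8:
        raise ValueError("this sketch covers 3 to 8 priority bits")
    return 1 << prio_bits


def encode_priority(level, prio_bits):
    """Left-align a priority level in an 8-bit priority field.

    Only the top prio_bits bits are implemented; the low bits read as zero.
    A numerically lower value means a more urgent interrupt.
    """
    if not 0 <= level < priority_levels(prio_bits):
        raise ValueError("level out of range for this many priority bits")
    return (level << (8 - prio_bits)) & 0xFF
```

With three priority bits there are 8 levels and level 5 is stored as `0xA0`; with eight bits there are 256 levels and the level is stored unchanged.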

Debug and Trace Support

The ARM Cortex-M processors incorporate the CoreSight architecture, a scalable on-chip debug and trace infrastructure developed by Arm, which enables efficient resource sharing among debug and trace components to facilitate development, testing, and runtime analysis in embedded systems. This architecture integrates various components connected via a debug bus, typically the Advanced High-performance Bus Access Port (AHB-AP) in Cortex-M implementations, allowing non-intrusive access to processor registers, memory, and trace data without halting the system entirely.

CoreSight supports standardized external interfaces for debug access, primarily through the Debug Access Port (DAP), which can be accessed via the Serial Wire Debug (SWD) protocol or the Joint Test Action Group (JTAG) interface compliant with IEEE 1149.1. SWD offers a two-wire alternative to the traditional four- or five-wire JTAG, reducing pin count while maintaining full debug functionality, and is widely used in resource-constrained Cortex-M devices. For halting and control, CoreSight includes breakpoint and watchpoint units, implemented via the Flash Patch and Breakpoint (FPB) unit for code breakpoints and the Data Watchpoint and Trace (DWT) unit for data access monitoring; the number of supported units varies by core, with entry-level cores like Cortex-M0+ offering 1-4 breakpoints and 1-2 watchpoints, while higher-end cores such as Cortex-M7 can support up to 16 breakpoints. These units enable precise halting on instruction execution or data accesses, essential for debugging complex firmware.

Trace capabilities in CoreSight enhance runtime analysis by capturing execution flows without software modifications. The Embedded Trace Macrocell (ETM) provides instruction trace by outputting compressed packet streams of program flow, allowing reconstruction of code execution paths for profiling. Complementing this, the DWT unit includes performance counters for cycle counting, exception tracing, and data value sampling, helping identify bottlenecks in real-time applications. For software instrumentation, the Instrumentation Trace Macrocell (ITM) supports printf-style debugging by routing application-generated messages, timestamps, and hardware events through a stimulus port, often funneled to an external trace port like Serial Wire Output (SWO) for low-overhead logging.

In multi-core configurations, although less common in standard Cortex-M designs due to their focus on single-core efficiency, CoreSight enables synchronized debugging via the Cross Trigger Interface (CTI) and Embedded Cross Trigger (ECT) matrix. This setup allows debug events, such as a breakpoint on one core, to propagate triggers to others, facilitating coordinated halting and trace correlation in custom system-on-chip (SoC) implementations with multiple Cortex-M instances. Tool integration is streamlined through standards like CMSIS-DAP, which provides a vendor-neutral USB-based interface to the CoreSight DAP, enabling seamless connectivity with development environments for SWD/JTAG access and trace capture.
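Cycle counting with the DWT usually means reading its 32-bit cycle counter (CYCCNT) before and after a region of code and subtracting. Because the counter wraps, the subtraction has to be done modulo 2^32. The Python sketch below models only that arithmetic on two already-captured readings; the function names and the example clock frequency are assumptions for illustration, and the result is valid only when the measured interval is shorter than 2^32 cycles.

```python
COUNTER_MASK = 0xFFFFFFFF  # the cycle counter is 32 bits wide


def elapsed_cycles(start, end):
    """Cycles between two counter readings, tolerating one wrap-around."""
    return (end - start) & COUNTER_MASK


def cycles_to_microseconds(cycles, core_clock_hz):
    """Convert a cycle count to microseconds at the given core clock."""
    return cycles * 1_000_000 / core_clock_hz
```

A reading of `0xFFFFFF00` followed by `0x00000100` therefore yields 512 elapsed cycles, even though the second value is numerically smaller than the first.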

Security Features

TrustZone-M

TrustZone-M, introduced as part of the Armv8-M architecture in 2016, provides hardware-enforced isolation between Secure and Non-Secure worlds on Cortex-M processors. This security extension partitions the system into two execution environments, where the Secure world handles trusted operations and the Non-Secure world runs untrusted code, preventing unauthorized access to sensitive resources. The isolation is achieved through address space controllers, including the Security Attribution Unit (SAU) and the Implementation Defined Attribution Unit (IDAU), which assign security attributes to memory regions and peripherals.

The SAU is a programmable component configurable only in the Secure state, allowing up to eight regions to be defined for partitioning on cores such as the Cortex-M23 and Cortex-M33, while the IDAU provides a fixed, implementation-specific memory map that the SAU can override. These units ensure that Non-Secure code cannot access Secure memory or peripherals, enforcing runtime protection against software attacks such as buffer overflows or privilege escalations. Additionally, TrustZone-M incorporates an airgap mechanism for interrupt isolation via the Nested Vectored Interrupt Controller (NVIC), which includes a secure mask register to prevent Non-Secure handlers from responding to Secure interrupts, thereby maintaining separation even during interrupt handling.

Processor operation in TrustZone-M builds on the traditional Handler and Thread modes, extended with Secure and Non-Secure states, as well as privilege levels (Privileged or Unprivileged). Secure software can execute in either mode with elevated privileges to manage resources, while Non-Secure software also runs in Thread or Handler mode but can reach Secure resources only through defined entry points. Context switching between worlds occurs via Secure Gateway (SG) instructions, which are placed at entry points to the Secure world; these instructions validate the transition and ensure secure parameter passing without exposing sensitive data.

The primary benefits of TrustZone-M include robust runtime security for microcontrollers, enabling features like secure boot to verify firmware integrity at startup and isolated cryptographic operations to protect keys and algorithms from compromise. By providing this foundation, it supports development of secure IoT devices and embedded systems without requiring separate secure elements, reducing costs while enhancing protection against common attack vectors. This technology is implemented in cores such as the Cortex-M33, where it integrates with debug features for secure tracing.
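How an SAU assigns a security attribute to an address can be sketched as a lookup over a small table of regions. In the Python model below, an address that falls inside a region is reported as Non-secure, or as Secure Non-secure-callable when the region is flagged as such, and any address outside every region is treated as Secure. The model ignores the IDAU and assumes non-overlapping regions; the region addresses and all names are hypothetical.

```python
from collections import namedtuple

# base and limit are inclusive byte addresses; nsc marks a region that is
# Secure but callable from the Non-secure world (where SG entry points live).
SauRegion = namedtuple("SauRegion", ["base", "limit", "nsc"], defaults=[False])

# Hypothetical layout, chosen only for illustration.
EXAMPLE_REGIONS = [
    SauRegion(base=0x00200000, limit=0x003FFFFF),            # Non-secure code
    SauRegion(base=0x101FFC00, limit=0x101FFFFF, nsc=True),  # entry veneers
]


def sau_attribute(address, regions):
    """Return the attribute this simplified SAU model assigns to an address."""
    for region in regions:
        if region.base <= address <= region.limit:
            return "Secure NSC" if region.nsc else "Non-secure"
    return "Secure"  # anything not covered by a region stays Secure
```

In this layout a fetch from `0x00200010` is Non-secure, `0x101FFC20` lies in the callable entry region, and `0x10000000` remains Secure because no region covers it.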

Additional Security Extensions

The Pointer Authentication and Branch Target Identification (PACBTI) extension in the Armv8.1-M architecture, implemented in the Cortex-M85 processor, enables cryptographic signing of pointers to defend against exploits like buffer overflows and return-oriented programming by appending a Pointer Authentication Code (PAC) to pointer values, along with BTI for validating indirect branches. The PAC is generated using a block cipher derived from AES-128, employing 128-bit keys and a modifier (such as the stack pointer) to ensure uniqueness and verifiability; upon use, the PAC is stripped and authenticated, with failed verification resulting in the pointer being replaced by an invalid address to trigger a fault.

In the Cortex-M35P processor, isolation is enhanced through physical security mechanisms, including a P-channel design that provides hardware-level separation of secure assets to protect against invasive tampering and side-channel attacks. This P-channel facilitates isolated execution paths and memory regions, integrated with TrustZone-M for runtime protection, and contributes to the processor's EAL6+ certification under Common Criteria for high-assurance security.

Helium technology, via the M-Profile Vector Extension (MVE), incorporates secure vector state isolation in TrustZone-M-enabled cores to safeguard DSP and machine learning workloads from side-channel leaks, by banking the eight 128-bit vector registers separately for secure and non-secure execution states. This prevents unauthorized access to sensitive vector data during context switches, maintaining confidentiality in mixed-trust environments without impacting performance.

The Armv8-M architecture deprecates legacy Memory Protection Unit (MPU) configurations from Armv7-M to streamline security and reduce vulnerabilities, eliminating support for certain outdated region setups in favor of enhanced PMSAv8 protections. Implementations without TrustZone-M are not recommended for contemporary applications demanding robust isolation.

Processor Cores

Entry-Level Cores

The entry-level cores in the ARM Cortex-M family, including the Cortex-M0, Cortex-M0+, and Cortex-M1, are optimized for ultra-low-cost, low-power embedded applications where minimal silicon area and energy efficiency are paramount. These processors implement the ARMv6-M architecture, focusing on simplicity and compatibility with the Thumb instruction set to enable 32-bit performance at an 8/16-bit price point. They target scenarios such as simple sensors, wearables, and cost-sensitive IoT devices, prioritizing gate count reduction and power optimization over advanced features like floating-point units or digital signal processing.

The Cortex-M0, introduced in 2009, serves as the foundational entry-level core with a three-stage pipeline (fetch, decode, execute) and delivers 0.9 DMIPS/MHz performance. It features an ultra-low gate count of approximately 12,000 gates, enabling integration into analog and mixed-signal devices, and lacks a memory protection unit (MPU) to minimize area. The core includes an integrated Nested Vectored Interrupt Controller (NVIC) supporting up to 32 interrupts and uses an AMBA AHB-Lite system interface for straightforward system-on-chip (SoC) integration. Suited to ultra-low-cost applications like basic control systems, the Cortex-M0 achieves active current consumption as low as 9 μA/MHz at a 0.9 V supply.

Building on the Cortex-M0, the Cortex-M0+ was released in 2012 as an enhanced variant with a two-stage pipeline for improved energy efficiency and code density. It offers slightly higher performance at 0.93-0.99 DMIPS/MHz while reducing area compared to its predecessor, with implementations showing up to 15% smaller footprint in certain benchmarks. Key additions include support for an optional MPU with eight regions and integration compatibility with micro-DMA controllers for efficient data transfers without CPU intervention. The core enables sleep-walking peripherals in low-power modes, allowing asynchronous peripheral operation during CPU sleep states to extend battery life. Targeted at sensors, wearables, and battery-operated devices, it maintains active current below 50 μA/MHz and supports three low-power modes for dynamic power management.

The Cortex-M1, which debuted in 2007, is a synthesizable soft core specifically designed for field-programmable gate arrays (FPGAs). It supports configurable tightly coupled memories (up to 1024 KB) and operates at frequencies up to 150 MHz depending on the FPGA fabric, with four interrupt priority levels via the NVIC. Unlike the M0 series, it allows up to 256 custom instructions for FPGA-specific acceleration, enhancing flexibility for hardware-software co-design in prototyping or reconfigurable systems. Suited for FPGA-based embedded prototypes and custom logic integration, it retains the ARMv6-M Thumb instruction set for low-latency interrupt handling.

These entry-level cores trade advanced capabilities for extreme efficiency, featuring minimal pipeline depths and no support for full Thumb-2 instructions beyond the basic subset to achieve sub-50 μA/MHz active currents and gate counts under 15,000. This design philosophy ensures prolonged battery life in power-constrained environments but limits them to straightforward tasks without DSP extensions or hardware floating-point, distinguishing them from mid-range siblings.
Core | Architecture | Pipeline stages | Performance (DMIPS/MHz) | Gate count (approx.) | Key features | Typical active power
Cortex-M0 | ARMv6-M | 3 | 0.9 | 12,000 | NVIC (up to 32 IRQs), no MPU | ~9 μA/MHz at 0.9 V
Cortex-M0+ | ARMv6-M | 2 | 0.93-0.99 | <12,000 | Optional MPU, micro-DMA support, sleep modes | <50 μA/MHz
Cortex-M1 | ARMv6-M | 3 | 0.88 | Configurable (~15,000) | FPGA soft core, custom instructions (up to 256), up to 150 MHz | N/A (FPGA-dependent)
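
The per-MHz figures in the table can be scaled to a concrete clock frequency. The sketch below is a rough estimate rather than a datasheet calculation: it multiplies the quoted DMIPS/MHz and μA/MHz values by a clock frequency. The 48 MHz clock and the helper names are illustrative assumptions, and real figures depend on process, supply voltage, and the memory system.

```python
def dhrystone_mips(dmips_per_mhz: float, clock_mhz: float) -> float:
    """Scale a DMIPS/MHz rating to total DMIPS at a given clock."""
    return dmips_per_mhz * clock_mhz


def active_current_ua(ua_per_mhz: float, clock_mhz: float) -> float:
    """Scale a uA/MHz rating to total active current in microamps."""
    return ua_per_mhz * clock_mhz


if __name__ == "__main__":
    clock_mhz = 48.0  # assumed clock, chosen only for illustration

    # Cortex-M0 figures quoted in the table: 0.9 DMIPS/MHz, ~9 uA/MHz.
    print(f"Cortex-M0 at {clock_mhz:.0f} MHz: "
          f"{dhrystone_mips(0.9, clock_mhz):.1f} DMIPS, "
          f"{active_current_ua(9.0, clock_mhz):.0f} uA active")
```

Under these assumptions a Cortex-M0 at 48 MHz works out to about 43 DMIPS for roughly 0.43 mA of core current, which illustrates why the per-MHz rating is the figure usually compared across cores.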

Mid-Range Cores

The mid-range cores in the ARM Cortex-M family, specifically the Cortex-M3 and Cortex-M4, provide a balance of performance and efficiency for applications requiring more computational capability than entry-level options, while maintaining low power consumption suitable for embedded systems. These cores build on the Thumb-2 instruction set and incorporate enhancements for handling moderately complex tasks, such as real-time processing in control systems. They feature a 3-stage pipeline design that supports efficient instruction execution without the complexity of the caching mechanisms found in higher-end variants.

The Cortex-M3, introduced in 2004 and based on the Armv7-M architecture, serves as the foundational mid-range core with a 3-stage pipeline that delivers 1.25 DMIPS/MHz. It implements the Thumb-2 ISA, enabling compact code density and high execution speeds for 32-bit operations. The core includes a Nested Vectored Interrupt Controller (NVIC) capable of handling up to 240 interrupts with low latency, facilitating responsive real-time applications. An optional memory protection unit (MPU) is available to support memory partitioning, and hardware divide instructions (SDIV/UDIV) speed up integer arithmetic.

The Cortex-M4, released in 2010, extends the Cortex-M3 architecture with an optional single-precision floating-point unit (FPU) compliant with FPv4-SP and dedicated digital signal processing (DSP) extensions, including single instruction, multiple data (SIMD) instructions that operate on packed 8-bit and 16-bit values. Integer performance remains at 1.25 DMIPS/MHz, with the FPU and DSP extensions enabling up to 10x faster floating-point and DSP operations compared to software emulation. The DSP features, such as single-cycle 16/32-bit multiply-accumulate (MAC) operations and saturating arithmetic, enable streamlined signal processing without external coprocessors. Implementations of the Cortex-M4 typically operate at clock frequencies between 80 MHz and 200 MHz, with a core area of around 0.05 mm².

Common features across these mid-range cores include branch speculation to reduce pipeline stalls and hardware divide for faster division, contributing to their suitability for deterministic embedded environments. In practice, the NVIC provides interrupt handling with minimal overhead, as detailed in core pipeline features. These cores excel in applications like sensor fusion, where combining data from multiple sensors requires moderate floating-point and SIMD computation, and motor control systems, which demand precise real-time adjustments using DSP-accelerated algorithms.
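
Saturating arithmetic, mentioned above as one of the Cortex-M4 DSP features, clamps a result to the representable range instead of letting it wrap around. The following is a minimal Python model of a 32-bit signed saturating add (the behaviour of the QADD instruction) next to an ordinary wrapping add, plus a Q15 multiply-accumulate step built from a 16x16 multiply followed by a saturating add. It illustrates the arithmetic only, not cycle timing or status flags, and the function names are illustrative.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def saturate32(value: int) -> int:
    """Clamp an arbitrary integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def qadd(a: int, b: int) -> int:
    """Signed 32-bit saturating add: clamps instead of wrapping."""
    return saturate32(a + b)


def wrapping_add32(a: int, b: int) -> int:
    """Ordinary two's-complement 32-bit add, for comparison."""
    total = (a + b) & 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


def mac_q15_sat(acc: int, x: int, y: int) -> int:
    """Accumulate the product of two Q15 values (signed 16-bit integers)
    into a 32-bit accumulator, saturating the accumulation step."""
    return qadd(acc, x * y)


if __name__ == "__main__":
    print(wrapping_add32(INT32_MAX, 1))  # wraps to the most negative value
    print(qadd(INT32_MAX, 1))            # stays at the most positive value
```

In a filter or control loop, the wrapped result would flip the sign of the output, whereas the saturated result merely clips, which is why DSP code generally prefers the saturating form.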

High-Performance Cores

The ARM Cortex-M7 processor, released in 2014 and based on the Armv7E-M architecture, represents the high-performance scalar core in the Cortex-M family prior to the introduction of vector extensions. It features a 6-stage superscalar pipeline with branch prediction, enabling in-order dual-issue execution of instructions, including load/store pairs, to achieve up to 2.14 Dhrystone MIPS per MHz (DMIPS/MHz) in scalar configurations. An optional floating-point unit (FPU) supports both single- and double-precision operations, enhancing computational efficiency for signal processing tasks, while optional instruction and data caches (each configurable up to 64 KB), along with a branch target buffer, reduce memory access latencies and improve branch prediction accuracy.

Key enhancements in the Cortex-M7 focus on deterministic performance for real-time systems, including tightly coupled memory (TCM) interfaces for instruction (ITCM) and data (DTCM) regions, each supporting up to 16 MB of low-latency memory so that critical code paths avoid cache misses. The AHB peripheral port (AHBP), implemented as a dedicated AHB-Lite interface, provides low-latency access to peripherals for time-sensitive operations. These features build on the base Armv7-M architecture, integrating with existing debug and trace mechanisms for enhanced system observability.

Performance scales to roughly 640 DMIPS at a 300 MHz clock (2.14 DMIPS/MHz × 300 MHz), with 5.01 CoreMark/MHz efficiency, making the core suitable for demanding embedded workloads. Power consumption is approximately 2 mW/MHz when implemented in a 28 nm process, balancing high throughput with energy efficiency for battery-constrained designs. The core targets real-time control applications in automotive and industrial sectors, such as motor drives, where high clock speeds and low interrupt latency (typically 12 cycles) are essential for responsive operation.
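
To put the cycle and per-MHz figures above into absolute terms, the short sketch below converts an interrupt latency in cycles to nanoseconds and scales DMIPS/MHz to a clock frequency. It is only an idealised estimate: it assumes no wait states or stalls, and the helper names are illustrative rather than part of any Arm tool.

```python
def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1000.0 / clock_mhz


def peak_dmips(dmips_per_mhz: float, clock_mhz: float) -> float:
    """Scale a DMIPS/MHz rating to total DMIPS at the given clock."""
    return dmips_per_mhz * clock_mhz


if __name__ == "__main__":
    # Figures quoted in the text: 12-cycle latency, 2.14 DMIPS/MHz, 300 MHz.
    print(f"{cycles_to_ns(12, 300.0):.1f} ns interrupt latency")
    print(f"{peak_dmips(2.14, 300.0):.0f} DMIPS")
```

At 300 MHz a 12-cycle latency corresponds to 40 ns; the same 12 cycles at 100 MHz would take 120 ns, so clock frequency matters as much as the cycle count for real-time response.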

Armv8-M Baseline Cores

The Armv8-M Baseline cores represent the foundational implementations of the Armv8-M profile, emphasizing security through TrustZone integration while prioritizing low power and minimal area for constrained embedded systems. These cores, such as the Cortex-M23, implement the Baseline sub-profile, which provides a superset of the Armv6-M instruction set without the advanced extensions of the Mainline sub-profile, enabling efficient operation in energy-harvesting IoT devices and deeply embedded applications.

The Cortex-M23, introduced in 2016, is the smallest processor core supporting TrustZone technology, featuring a compact two-stage pipeline optimized for ultra-low power consumption. It delivers 0.99 Dhrystone MIPS per MHz (DMIPS/MHz) and occupies approximately 0.01 mm² in a minimal configuration at 40 nm process technology, making it suitable for the most area-constrained designs. The core supports the Thumb instruction set of the Armv8-M Baseline profile, includes a memory protection unit (MPU) and a Security Attribution Unit (SAU) for partitioning memory between secure and non-secure states, and operates at frequencies up to around 120 MHz depending on the process node.

These Baseline cores trade computational throughput for simplified pipelines and hardware-enforced security partitioning, facilitating compliance with the Platform Security Architecture (PSA) for certified IoT security without requiring full DSP or vector processing capabilities.
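
The partitioning performed by the Security Attribution Unit can be pictured as a lookup from address to security attribute. The sketch below is a simplified software model, not hardware behaviour in full: it assumes the SAU is enabled, ignores any implementation-defined attribution unit and overlapping regions, and uses made-up addresses. In this model, memory not covered by any region is Secure, and each region marks its range as either Non-secure or Secure Non-secure-callable.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SauRegion:
    """One attribution region (hypothetical addresses, inclusive limit)."""
    base: int
    limit: int
    nsc: bool = False  # True: Secure, but callable from Non-secure code


def attribute(address: int, regions: Iterable[SauRegion]) -> str:
    """Return the security attribute of an address in this simplified model."""
    for region in regions:
        if region.base <= address <= region.limit:
            return "secure-nsc" if region.nsc else "non-secure"
    return "secure"  # default when no region matches


# Hypothetical memory layout, for illustration only.
REGIONS = [
    SauRegion(0x0020_0000, 0x003F_FFFF),            # Non-secure flash
    SauRegion(0x001F_F000, 0x001F_FFFF, nsc=True),  # Secure entry veneers
    SauRegion(0x2001_0000, 0x2001_FFFF),            # Non-secure RAM
]

if __name__ == "__main__":
    for addr in (0x0000_1000, 0x001F_F800, 0x0030_0000, 0x2001_0040):
        print(f"{addr:#010x}: {attribute(addr, REGIONS)}")
```

The Non-secure-callable region is where Secure firmware would place its entry points, so that Non-secure code can call into Secure services only through those defined gateways.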

Advanced Secure and Vector Cores

The advanced secure and vector cores in the ARM Cortex-M family represent the evolution toward integrating robust security mechanisms with vector processing capabilities, enabling efficient machine learning (ML) and digital signal processing (DSP) in resource-constrained embedded systems such as IoT devices and wearables. These cores build on the Armv8-M and Armv8.1-M Mainline architectures, which support isolation through TrustZone-M and, on the vector-capable cores, protect vector state in secure environments, allowing developers to partition applications between secure and non-secure worlds while using vector extensions for acceleration. Operating frequencies typically range from 100 MHz for low-power applications up to 800 MHz in high-performance implementations, balancing efficiency and throughput.

The Cortex-M33, announced in 2016, enhances security-focused capabilities within the Armv8-M Mainline framework. It achieves up to 1.54 DMIPS/MHz in its base configuration, with optional single-cycle multiply-accumulate (MAC) and digital signal processing (DSP) extensions for enhanced signal processing efficiency, and includes an optional floating-point unit (FPU). The core features a three-stage in-order pipeline and a full Nested Vectored Interrupt Controller (NVIC), supports frequencies up to 200 MHz, and provides secure/non-secure state isolation via TrustZone-M without the overhead of vector extensions.

The Cortex-M35P, introduced in 2018, serves as a tamper-resistant variant of the Cortex-M33, implementing the Armv8-M architecture with built-in features to counter tampering attacks. It incorporates P-cell isolation technology, which provides hardware-level protection against physical probes and side-channel attacks by isolating critical cells in the design, achieving certification up to EAL6+. Performance reaches 1.5 DMIPS/MHz, with an optional single-precision floating-point unit (FPv5) for enhanced numerical processing in secure contexts. This core is particularly suited for applications requiring tamper resistance without compromising the deterministic behavior of Cortex-M processors.

Released in 2023, the Cortex-M52 is the smallest core to incorporate Arm Helium technology, targeting area- and cost-sensitive devices like wearables and sensors. Based on Armv8.1-M, it delivers 1.6 DMIPS/MHz in scalar mode, with Helium, Arm's M-Profile Vector Extension (MVE), providing up to a 4x performance boost for vector operations through support for 32-bit, 16-bit, and 8-bit multiply-accumulate operations. Optional TrustZone integration is complemented by Pointer Authentication (PAC) and Branch Target Identification (BTI) for PSA Certified Level 2 compliance, ensuring secure vector state handling. Its compact design minimizes silicon area while enabling compact ML inference, such as keyword spotting.

The Cortex-M55, announced in 2020, emphasizes efficient ML and DSP with Armv8.1-M Mainline and the first implementation of MVE in the Cortex-M series. It achieves 1.6 DMIPS/MHz scalar performance, augmented by branch prediction and dual-issue execution for improved efficiency, and enables low-power vector processing with significant speedups over prior cores: Arm cites up to 5x faster DSP and up to 15x faster ML performance compared to scalar equivalents. Security features include optional TrustZone-M for isolating vector registers, making it well suited to always-on edge AI in battery-powered devices like smart sensors.

As the highest-performance entry in this category, the Cortex-M85, launched in 2022, combines Armv8.1-M Mainline with advanced vector extensions and large vector register files for demanding edge AI workloads. It offers 3.13 DMIPS/MHz scalar performance, more than double that of mid-range cores, with up to 5x vector acceleration via enhanced MVE supporting wider data types and more parallel operations, reaching over 6 CoreMark/MHz overall. Integrated Pointer Authentication (PAC) and TrustZone-M secure the vector state against software exploits, while implementations can scale to 800 MHz for real-time processing in industrial IoT and automotive applications.
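
The 32-bit, 16-bit, and 8-bit element sizes mentioned for MVE determine how many values fit in one vector register. Helium vector registers are 128 bits wide, so the lane count follows directly from the element width. The sketch below computes lanes per register and the number of vector loop iterations needed for a buffer; the helper names are illustrative, and the lane count is only an upper bound on per-instruction parallelism, not a measured speedup, since real throughput depends on how many bits a given core processes per cycle.

```python
VECTOR_BITS = 128  # width of an MVE (Helium) vector register


def lanes(element_bits: int) -> int:
    """Number of elements of the given width in one vector register."""
    if element_bits not in (8, 16, 32):
        raise ValueError("this sketch covers 8-, 16-, and 32-bit elements only")
    return VECTOR_BITS // element_bits


def vector_iterations(n_elements: int, element_bits: int) -> int:
    """Loop iterations needed to cover n_elements, rounding up; the final
    partial vector is what tail predication is designed to handle."""
    per_vector = lanes(element_bits)
    return -(-n_elements // per_vector)  # ceiling division


if __name__ == "__main__":
    for bits in (32, 16, 8):
        print(f"{bits}-bit: {lanes(bits)} lanes, "
              f"{vector_iterations(100, bits)} iterations for 100 elements")
```

This is one reason quantized ML models favour 8-bit data on these cores: sixteen 8-bit values fit in a register that holds only four 32-bit values.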

Implementations and Applications

Notable Microcontroller Implementations

The ARM Cortex-M cores have been widely integrated into commercial microcontrollers (MCUs) by various semiconductor vendors, enabling diverse applications through the addition of peripherals, memory, and power optimization features.

STMicroelectronics' STM32 family exemplifies this. The STM32F1 series, introduced in 2007, uses the Cortex-M3 core and incorporates USB and CAN peripherals for industrial control and consumer electronics. The STM32F4 series, launched in 2011 with the Cortex-M4 core, added DSP instructions and Ethernet support, enhancing real-time processing for networking and multimedia devices. The STM32H7 series, launched in 2017 with a single Cortex-M7 core running at up to 480 MHz, later added dual-core configurations (M7 + M4) in 2019, supporting high-speed interfaces like DDR and PCIe for demanding embedded systems. For low-power needs, the STM32L0 series, based on the Cortex-M0+ core since around 2014, achieves sub-1 μA standby current with integrated LCD drivers and RF capabilities.

NXP Semiconductors' LPC series provides another prominent implementation lineage. The LPC11xx family, released in 2010 with the Cortex-M0 core, offers basic I/O and ADC peripherals in a compact package for cost-sensitive applications like sensors and appliances. The LPC43xx series from around 2011 combines Cortex-M4 and Cortex-M0 cores in a heterogeneous setup, with the M4 handling real-time tasks and the M0 managing connectivity, integrated with Ethernet and high-speed USB for industrial gateways. NXP's RT series, starting with the 2018 RT1050 using the Cortex-M7 at 600 MHz, blurs MCU and MPU boundaries by including high-speed peripherals like MIPI CSI and LCD controllers, targeting crossover applications in wearables and IoT.

Texas Instruments' MSP432 series, introduced in 2015 with the Cortex-M4F core, emphasizes ultra-low power consumption (down to 850 nA in standby) alongside integrated gauges for battery monitoring, making it suitable for portable medical and metering devices. Nordic Semiconductor's nRF52 series, based on the Cortex-M4 since 2015, integrates Bluetooth Low Energy (BLE) transceivers and up to 1 MB of flash, powering wireless sensor nodes and fitness trackers with concurrent multiprotocol support.

Other notable implementations include Silicon Labs' EFM32 series, which spans Cortex-M0+, M3, and M4 cores across its Gecko families and features autonomous energy modes that reduce active current to 15 μA/MHz for always-on sensing in smart home devices. Apple's M9 motion coprocessor, embedded in the A9 SoC since 2015, uses a Cortex-M3 core for low-power sensor fusion in iPhones, handling accelerometer and gyroscope data independently.

As of 2025, Cortex-M-based MCUs increasingly incorporate AI accelerators; for instance, NXP's MCX N series pairs the Cortex-M33 core with a neural processing unit (NPU) delivering up to 300 GOPS for edge AI in automotive and industrial IoT, while maintaining TrustZone security. Similarly, Renesas' RA8 series, introduced in 2024, implements the Cortex-M85 core at up to 1 GHz for high-performance AI edge applications.

Target Markets and Use Cases

The ARM Cortex-M processor family finds extensive application in the Internet of Things (IoT) and related sectors, where low power consumption and efficient processing are paramount. Cortex-M0+ cores are particularly suited for battery-constrained devices such as sensors and wearables, enabling always-on functionality in smart home appliances and fitness trackers. For more advanced edge tasks, the Cortex-M55 supports machine learning workloads in industrial sensors, using Arm Helium technology for enhanced vector processing. Cortex-M processors dominate the low-power microcontroller market, holding an estimated 70% share in 2024, with projections indicating continued leadership in powering over half of IoT devices by 2025 due to their scalability across billions of connected endpoints.

In automotive and industrial applications, Cortex-M cores provide real-time control and safety-critical processing. The Cortex-M4 and Cortex-M7 are commonly deployed in motor drives for precise signal control, supporting tasks like inverter control in electric vehicles. For electronic control units (ECUs), the Cortex-M33 enables secure operations with TrustZone for isolation of critical functions, achieving compliance with ISO 26262 ASIL D through certified safety mechanisms.

Consumer electronics use Cortex-M's DSP capabilities for multimedia processing. The Cortex-M4's dedicated digital signal processing extensions support efficient audio encoding and filtering, such as real-time noise cancellation in wireless headphones and smart speakers. In imaging devices like digital cameras, the Cortex-M7's instruction and data caches improve performance for high-throughput tasks, including caching of frame buffers.

In medical and enterprise domains, Cortex-M cores address stringent power and security needs. The ultra-low-power Cortex-M0+ is well suited to implantable devices like pacemakers, where it manages sensing and pacing with minimal energy draw to extend battery life over years. For point-of-sale (POS) terminals, the Cortex-M85 enables on-device AI for fraud detection, processing transaction patterns in real time while maintaining secure isolation via TrustZone.

Real-world deployments highlight Cortex-M's versatility in specialized scenarios. STMicroelectronics' STM32 series, based on Cortex-M cores, powers drone flight controllers for real-time attitude stabilization, as demonstrated in quadrotor UAV systems that achieve stable hovering and navigation. Similarly, Nordic Semiconductor's nRF52 series with Cortex-M4 supports mesh networks in smart lighting, enabling scalable, low-latency communication across hundreds of nodes in environments like office complexes.

Development Ecosystem

Software Development Tools

Software development for ARM Cortex-M processors relies on a suite of specialized tools that facilitate compilation, integration, debugging, and deployment of embedded applications. These tools are designed around the Cortex-M architecture's features, such as its Thumb instruction set and low-power operation, enabling efficient development for resource-constrained devices. Key components include compilers optimized for ARM's instruction sets, integrated development environments (IDEs) with simulation capabilities, standardized frameworks for hardware abstraction, and debugging interfaces supporting protocols like Serial Wire Debug (SWD) and CoreSight.

Compilers form the foundation of Cortex-M development by translating high-level code into efficient machine instructions. The GNU Compiler Collection (GCC) for ARM, known as Arm GCC, is a free, open-source toolchain that supports all Cortex-M profiles, including Armv8-M, and is widely used for its compatibility with various IDEs and operating systems. Arm Compiler, a commercial tool, offers advanced optimizations tailored for Cortex-M, particularly for the Helium vector extension in Armv8.1-M processors, enabling up to 5x performance gains in DSP tasks and up to 15x in ML tasks compared to scalar code. Additionally, LLVM/Clang provides robust support for Cortex-M through its Arm backend, allowing developers to compile C/C++ code with features like link-time optimization and sanitizer tools for embedded debugging.

Integrated development environments streamline the workflow by combining editing, building, and debugging in a single interface. Keil MDK (Microcontroller Development Kit) provides a comprehensive ecosystem for Cortex-M devices, including the µVision IDE with simulation features that emulate peripherals and real-time behavior without hardware. IAR Embedded Workbench stands out for its fast compilation speeds and static analysis tools, supporting over 280 Cortex-M devices with features like MISRA C compliance checking to ensure code reliability. For STM32-based Cortex-M microcontrollers, STM32CubeIDE offers a vendor-specific, Eclipse-based environment with integrated code generation from graphical peripheral configurators, accelerating setup for STMicroelectronics hardware.

Frameworks abstract hardware complexities, promoting portability across Cortex-M implementations. The Cortex Microcontroller Software Interface Standard (CMSIS) delivers standardized APIs for accessing core peripherals like the Nested Vectored Interrupt Controller (NVIC) and system tick timer, enabling consistent software reuse without vendor-specific code. Mbed OS, an open-source real-time operating system (RTOS) from Arm, targets IoT applications on Cortex-M devices, providing built-in support for connectivity protocols, security, and multithreading with low memory overhead. Zephyr, another open-source RTOS, supports Cortex-M with Arm TrustZone integration for secure execution environments, allowing isolated processing of sensitive tasks while maintaining scalability for tiny embedded systems.

Debugging tools are essential for verifying and optimizing Cortex-M software. OpenOCD, paired with the GNU Debugger (GDB), offers a free, open-source solution for on-chip debugging via SWD and JTAG interfaces, compatible with CoreSight debug components for trace and breakpoint management. Hardware probes like Segger J-Link provide high-speed debugging and flashing for Cortex-M targets, supporting unlimited flash breakpoints and real-time variable monitoring through GDB server integration. These tools collectively support the full development cycle, from initial prototyping to production deployment.
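
As an illustration of how the Arm GCC toolchain is pointed at a particular core, the sketch below assembles a compile command from a small lookup table. The `-mcpu`, `-mthumb`, and `-mfloat-abi` options are standard GCC Arm options, but the table covers only a few cores, the optimisation level and file names are arbitrary, and the helper itself is hypothetical rather than part of any toolchain.

```python
from typing import List

# Subset of GCC -mcpu names for Cortex-M cores (not exhaustive).
GCC_CPU_NAMES = {
    "Cortex-M0": "cortex-m0",
    "Cortex-M0+": "cortex-m0plus",
    "Cortex-M3": "cortex-m3",
    "Cortex-M4": "cortex-m4",
    "Cortex-M7": "cortex-m7",
    "Cortex-M23": "cortex-m23",
    "Cortex-M33": "cortex-m33",
    "Cortex-M55": "cortex-m55",
}


def gcc_command(core: str, source: str, output: str,
                hard_float: bool = False) -> List[str]:
    """Build an arm-none-eabi-gcc command line for one source file.

    hard_float should only be set for parts whose silicon includes an FPU.
    """
    if core not in GCC_CPU_NAMES:
        raise KeyError(f"no -mcpu mapping in this sketch for {core!r}")
    float_abi = "hard" if hard_float else "soft"
    return [
        "arm-none-eabi-gcc",
        f"-mcpu={GCC_CPU_NAMES[core]}",
        "-mthumb",
        f"-mfloat-abi={float_abi}",
        "-Os",  # size optimisation, a common but arbitrary choice here
        "-c", source,
        "-o", output,
    ]


if __name__ == "__main__":
    print(" ".join(gcc_command("Cortex-M4", "main.c", "main.o", hard_float=True)))
```

The returned list can be passed to `subprocess.run` on a machine with the toolchain installed; keeping the mapping in one table makes it easy to rebuild the same source for several cores.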

Documentation and Resources

Arm provides comprehensive official documentation for the Cortex-M processor family, including Technical Reference Manuals (TRMs) tailored to individual cores that detail their functional description, programmer's model, instruction sets, registers, and integration guidelines. For instance, the Cortex-M85 TRM covers advanced features such as Pointer Authentication (PAC) and the Helium vector extension, enabling developers to build secure, high-performance embedded systems. Similarly, TRMs for other cores like the Cortex-M4 and Cortex-M33 describe core-specific behaviors, such as floating-point units and TrustZone security extensions.

The Armv8-M Architecture Reference Manual serves as the foundational document for the microcontroller profile of the Arm architecture, specifying the instruction set, exception handling, memory model, and security features applicable to modern Cortex-M cores like the M23, M33, M55, and M85. This manual is essential for understanding Baseline and Mainline behaviors, including the Armv8.1-M enhancements for pointer authentication and branch target identification.

Arm's developer ecosystem includes access to evaluation resources through the DesignStart program, which offers free, downloadable RTL designs and simulation kits for cores such as the Cortex-M0 and Cortex-M3 to facilitate prototyping and SoC integration without initial licensing costs. Additionally, the Arm KnowledgeBase provides articles on core migrations, such as transitioning from Cortex-M4 to Cortex-M33 designs, addressing changes in instruction sets, security implementations, and toolchain compatibility to minimize redesign effort.

Community-driven resources complement official documentation, with the Arm Community forums offering a platform for developers to discuss Cortex-M implementation challenges, share code snippets, and seek guidance on topics ranging from interrupt handling to power optimization. The CMSIS (Cortex Microcontroller Software Interface Standard) repositories on GitHub, maintained by Arm, provide open-source libraries for peripheral abstraction, DSP functions, and RTOS APIs, supporting consistent software development across Cortex-M vendors. Vendor-specific documentation, such as STMicroelectronics' Application Note (AN) series for STM32 devices, delivers practical implementation details for Cortex-M cores in real-world scenarios, including peripheral configuration examples.

As of 2025, Arm has released updated resources reflecting ongoing evolution, including the Helium Programmer's Guide, which details vector intrinsics, auto-vectorization techniques, and optimization strategies for DSP and ML workloads on Helium-enabled cores like the Cortex-M85. Deprecation notices for legacy ISAs, such as the phase-out of certain Armv6-M and Armv7-M features in favor of Armv8-M baselines, are outlined in migration guides and updates to encourage adoption of secure, efficient modern profiles.
