First, let's take a look at the M7 and M4 architecture comparison on ARM's official website:
Compared with the M4 core, the M7 core adds instruction and data caches and a new bus matrix, and the resulting performance improvement is significant, as shown in the figure below:
The STM32F7 series devices are the first 32-bit microcontrollers based on the ARM Cortex-M7 core. Taking advantage of ST's ART Accelerator and the L1 cache, the STM32F7 series devices deliver the maximum theoretical performance of the Cortex-M7. The benchmark scores consistently reach 1082 CoreMark and 462 DMIPS, whether the code is executed from the embedded Flash memory, from the internal RAM or from external memory (SRAM, SDRAM or Quad-SPI Flash memory).
The high performance of the STM32F7 series devices comes from a powerful superscalar pipeline and DSP capabilities that provide fast real-time response with low interrupt latency, efficient access to large external memories, and high-performance floating-point capability for complex calculations.
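To compare these scores with other cores, it helps to normalize them per MHz. The sketch below does that arithmetic. It assumes both scores were measured at the 216 MHz maximum core frequency of the STM32F7; the macro and function names are made up for this example.

```c
#include <stdio.h>

/* Benchmark figures quoted in the text above. */
#define STM32F7_COREMARK  1082.0
#define STM32F7_DMIPS      462.0
/* Assumption: both scores were taken at the 216 MHz maximum core clock. */
#define STM32F7_MAX_MHZ    216.0

/* Normalize an absolute benchmark score to a per-MHz figure.
 * Returns 0.0 for a non-positive clock to avoid dividing by zero. */
double score_per_mhz(double score, double mhz)
{
    if (mhz <= 0.0) {
        return 0.0;
    }
    return score / mhz;
}

/* Print both normalized figures with two decimal places. */
void print_normalized_scores(void)
{
    printf("CoreMark/MHz: %.2f\n",
           score_per_mhz(STM32F7_COREMARK, STM32F7_MAX_MHZ));
    printf("DMIPS/MHz:    %.2f\n",
           score_per_mhz(STM32F7_DMIPS, STM32F7_MAX_MHZ));
}
```

Under that assumption the figures work out to roughly 5.01 CoreMark/MHz and 2.14 DMIPS/MHz.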
Cortex-M7 core
The STM32F7 series devices are based on the high-performance ARM Cortex-M7 32-bit RISC core, operating at frequencies up to 216 MHz. The Cortex-M7 core has a high-performance single/double-precision floating-point unit (FPU) that supports single/double-precision data-processing instructions and data types. It also has a full set of DSP instructions and a memory protection unit (MPU) to improve application security. Upward compatibility from the Cortex-M4 to the Cortex-M7 allows binaries compiled for the Cortex-M4 to run directly on the Cortex-M7.
The Cortex-M7 features a 6/7-stage superscalar pipeline with branch prediction and dual instruction issue. Branch prediction lets the core anticipate the outcome of the next branch before it is resolved, reducing the loop overhead from 4 or 3 cycles to 1 cycle per loop. Dual issue allows the core to execute two instructions simultaneously, usually regardless of their order, which increases instruction throughput.
Cortex-M7 system cache
The STM32F7 integrates the Cortex-M7 Level 1 cache (L1-cache), which is split into a data cache (D-cache) and an instruction cache (I-cache), giving a Harvard architecture with optimal performance. These caches make it possible to reach zero-wait-state performance even at high frequencies. By default, the instruction and data caches are disabled. The ARM CMSIS library provides two functions to enable them:
SCB_EnableICache() enables the instruction cache.
SCB_EnableDCache() enables the data cache.
Cortex-M7 bus interfaces
The Cortex-M7 has five interfaces: AXIM, ITCM, DTCM, AHBS and AHBP.
AXI bus interface
AXI stands for Advanced eXtensible Interface. The Cortex-M7 implements the AXIM interface as an AMBA4 AXI 64-bit wide interface, which provides greater instruction-fetch and data-load bandwidth. If the caches are enabled, any access that does not target the TCM or AHBP interface is handled by the appropriate cache controller. Note that not all memory regions can be cached; it depends on their memory type. Regions of Shared, Device or Strongly-ordered type cannot be cached. Only Normal, non-shared memory can be cached.
TCM bus interface
The TCM (tightly coupled memory) interface connects the core to internal RAM. It has a Harvard architecture, so there are separate ITCM (instruction TCM) and DTCM (data TCM) interfaces. The ITCM is a 64-bit memory interface, while the DTCM is split into two 32-bit ports, D0TCM and D1TCM.
AHBS bus interface
The Cortex-M7 AHBS (AHB slave) is a 32-bit wide interface that gives the system access to the ITCM, D1TCM and D0TCM. In the STM32F7 architecture, however, the AHBS only allows data transfers to and from the DTCM-RAM. The ITCM bus is not accessible through the AHBS, so DMA transfers to and from the ITCM-RAM are not supported. DMA transfers to and from the Flash memory on the ITCM interface are all forced through the AHB bus. The AHBS interface can be used while the core is in sleep state, so DMA transfers can be performed in low-power mode.
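The cacheability rule from the AXI section and the DMA restriction from the AHBS section can be written down as a small host-side model. This is only an illustrative sketch: the enums, the struct and both helper functions are hypothetical names, not part of CMSIS or the ST HAL. On real hardware, the memory type and shareability of a region come from the default memory map or the MPU configuration.

```c
#include <stdbool.h>

/* Memory types named in the text (hypothetical enum, illustration only). */
typedef enum {
    MEM_NORMAL,
    MEM_DEVICE,
    MEM_STRONGLY_ORDERED
} mem_type_t;

/* Core interface that serves an access (hypothetical enum). */
typedef enum {
    IF_AXIM,
    IF_ITCM,
    IF_DTCM,
    IF_AHBP
} core_if_t;

/* Hypothetical description of one memory region. */
typedef struct {
    core_if_t  bus;        /* interface the core uses to reach the region */
    mem_type_t type;       /* Normal, Device or Strongly-ordered */
    bool       shareable;  /* true if the region is marked Shared */
} mem_region_t;

/* A region is cacheable only if the access goes through AXIM
 * (TCM and AHBP accesses bypass the caches) and the region is
 * Normal, non-shared memory. */
bool region_is_cacheable(const mem_region_t *r)
{
    if (r->bus != IF_AXIM) {
        return false;
    }
    return (r->type == MEM_NORMAL) && !r->shareable;
}

/* Only meaningful for IF_ITCM and IF_DTCM: on the STM32F7 the AHBS
 * exposes DTCM-RAM but not ITCM-RAM, so DMA can reach only the former. */
bool dma_can_access_tcm(core_if_t tcm)
{
    return tcm == IF_DTCM;
}
```

In this model, a DMA buffer placed in DTCM-RAM is reachable by the DMA and is never cached, while a Normal non-shared buffer behind the AXIM may be cached.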
AHBP bus interface
The AHBP (AHB peripheral) interface is a single 32-bit wide interface dedicated to the connection between the CPU and the peripherals. It is used only for data accesses; instruction fetches are never performed on it. In the STM32F7 architecture, this bus connects the AHBP peripheral bus of the Cortex-M7 core to the AHB bus matrix, and its targets are the AHB1, AHB2, APB1 and APB2 peripherals.
For a detailed introduction to the M7 core, please refer to:
AN4667 STM32F7 系列系统架构和性能.pdf (AN4667: STM32F7 Series system architecture and performance)
Click here to view the official resources of the STM32F769I development board.
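As a practical note on the cache section: since both caches are disabled after reset, the two CMSIS calls are typically made once during start-up. Below is a minimal sketch of that idea. SCB_EnableICache() and SCB_EnableDCache() are the CMSIS functions named earlier; the wrapper cpu_cache_enable(), the host-side stand-ins and the choice of stm32f7xx.h as the device header are assumptions made for this example, so adjust them to your own project.

```c
#include <stdbool.h>

#if defined(__ARM_ARCH_7EM__)
/* Target build (assumed): the device header pulls in CMSIS core_cm7.h,
 * which provides SCB_EnableICache() and SCB_EnableDCache(). */
#include "stm32f7xx.h"
#else
/* Host build: hypothetical stand-ins so the sketch compiles and can be
 * exercised off-target. They only record that they were called. */
static bool icache_enabled;
static bool dcache_enabled;
static void SCB_EnableICache(void) { icache_enabled = true; }
static void SCB_EnableDCache(void) { dcache_enabled = true; }
#endif

/* Hypothetical start-up helper: enable the instruction cache first,
 * then the data cache, before performance-critical code runs. */
void cpu_cache_enable(void)
{
    SCB_EnableICache();
    SCB_EnableDCache();
}
```

On the target, cpu_cache_enable() would be called near the top of main(). Keep in mind that once the D-cache is on, buffers shared with DMA need extra care, which is one reason the DTCM-RAM is a convenient place for them.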