Advance Information
MPC7457EC
Rev. 4, 11/2003
MPC7457
RISC Microprocessor
Hardware Specifications
This hardware specification is primarily concerned with the PowerPC™ MPC7457; however,
unless otherwise noted, all information here also applies to the MPC7447. The MPC7457 and
MPC7447 are implementations of the PowerPC microprocessor family of reduced instruction
set computer (RISC) microprocessors. This hardware specification describes pertinent
electrical and physical characteristics of the MPC7457. For functional characteristics of the
processor, refer to the MPC7450 RISC Microprocessor Family User’s Manual.
This hardware specification contains the following topics:
Section 1.1, “Overview”
Section 1.2, “Features”
Section 1.3, “Comparison with the MPC7455, MPC7445, MPC7450, MPC7451, and MPC7441”
Section 1.4, “General Parameters”
Section 1.5, “Electrical and Thermal Characteristics”
Section 1.6, “Pin Assignments”
Section 1.7, “Pinout Listings”
Section 1.8, “Package Description”
Section 1.9, “System Design Information”
Section 1.10, “Document Revision History”
Section 1.11, “Ordering Information”
To locate any published updates for this hardware specification, refer to the website at
http://www.motorola.com/semiconductors.
1.1 Overview
The MPC7457 is the fourth implementation of the fourth generation (G4) microprocessors
from Motorola. The MPC7457 implements the full PowerPC 32-bit architecture and is
targeted at networking and computing systems applications. The MPC7457 consists of a
processor core, a 512-Kbyte L2, and an internal L3 tag and controller that support a glueless
backside L3 cache through a dedicated high-bandwidth interface. The MPC7447 is identical
to the MPC7457 except that it does not support the L3 cache interface.
Figure 1 shows a block diagram of the MPC7457. The core is a high-performance superscalar design
supporting a double-precision floating-point unit and a SIMD multimedia unit. The memory storage
subsystem supports the MPX bus protocol and a subset of the 60x bus protocol to main memory and other
system resources. The L3 interface supports 1, 2, or 4 Mbytes of external SRAM for L3 cache and/or private
memory data. For systems implementing 4 Mbytes of SRAM, a maximum of 2 Mbytes may be used as
cache; the remaining 2 Mbytes must be private memory.
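The SRAM sizing rule above can be expressed as a small check. The following is a minimal sketch, not part of any Motorola tool: the function name and the choice to allow 0 Mbytes of cache (all SRAM used as private memory) are assumptions made for illustration. It checks only the limits stated in the text and does not claim that every passing combination is a supported hardware configuration.

```python
# Hypothetical helper illustrating the L3 SRAM sizing limits described above.
VALID_L3_SRAM_MBYTES = (1, 2, 4)   # total external SRAM sizes named in the text
MAX_L3_CACHE_MBYTES = 2            # at most 2 Mbytes may be used as cache


def l3_private_memory_mbytes(sram_mbytes, cache_mbytes):
    """Return how many Mbytes of SRAM remain as private memory."""
    if sram_mbytes not in VALID_L3_SRAM_MBYTES:
        raise ValueError("SRAM size must be 1, 2, or 4 Mbytes")
    # Assumption: 0 means the SRAM is used purely as private memory.
    if cache_mbytes not in (0, 1, 2):
        raise ValueError("cache space must be 0, 1, or 2 Mbytes")
    if cache_mbytes > min(sram_mbytes, MAX_L3_CACHE_MBYTES):
        raise ValueError("cache space exceeds the available SRAM")
    return sram_mbytes - cache_mbytes


# A 4-Mbyte SRAM with the maximum 2 Mbytes of cache leaves 2 Mbytes private.
print(l3_private_memory_mbytes(4, 2))  # prints 2
```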
Note that the MPC7457 is a footprint-compatible, drop-in replacement in an MPC7455 application if the core
power supply is 1.3 V.
1.2 Features
This section summarizes features of the MPC7457 implementation of the PowerPC architecture.
Major features of the MPC7457 are as follows:
• High-performance, superscalar microprocessor
— As many as four instructions can be fetched from the instruction cache at a time.
— As many as three instructions can be dispatched to the issue queues at a time.
— As many as 12 instructions can be in the instruction queue (IQ).
— As many as 16 instructions can be at some stage of execution simultaneously.
— Single-cycle execution for most instructions
— One instruction per clock cycle throughput for most instructions
— Seven-stage pipeline control
• Eleven independent execution units and three register files
— Branch processing unit (BPU) features static and dynamic branch prediction
– 128-entry (32-set, four-way set-associative) branch target instruction cache (BTIC), a cache
of branch instructions that have been encountered in branch/loop code sequences. If a target
instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it can
be made available from the instruction cache. Typically, a fetch that hits the BTIC provides
the first four instructions in the target stream.
– 2048-entry branch history table (BHT) with two bits per entry for four levels of
prediction: not-taken, strongly not-taken, taken, and strongly taken
– Up to three outstanding speculative branches
– Branch instructions that do not update the count register (CTR) or link register (LR) are
often removed from the instruction stream.
– Eight-entry link register stack to predict the target address of Branch Conditional to Link
Register (bclr) instructions
— Four integer units (IUs) that share 32 GPRs for integer operands
– Three identical IUs (IU1a, IU1b, and IU1c) can execute all integer instructions except
multiply, divide, and move to/from special-purpose register instructions
– IU2 executes miscellaneous instructions including the CR logical operations, integer
multiplication and division instructions, and move to/from special-purpose register
instructions
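The two-bit, four-level prediction scheme listed for the BHT can be illustrated with a generic saturating-counter model. This is a textbook sketch, not the MPC7457's documented state machine: the class name, the initial state, and the way an instruction address selects an entry are all assumptions made for illustration.

```python
BHT_ENTRIES = 2048  # entry count from the feature list

# Counter values 0..3 mapped to the four prediction levels named in the text.
STATE_NAMES = ("strongly not-taken", "not-taken", "taken", "strongly taken")


class BranchHistoryTable:
    """Generic two-bit saturating-counter predictor (illustrative only)."""

    def __init__(self, entries=BHT_ENTRIES, initial_state=1):
        self.counters = [initial_state] * entries

    def _index(self, address):
        # Assumption: word-aligned instructions, low-order address bits pick the entry.
        return (address >> 2) % len(self.counters)

    def state(self, address):
        return STATE_NAMES[self.counters[self._index(address)]]

    def predict_taken(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        # Move one level toward the observed outcome, saturating at 0 and 3.
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable()
bht.update(0x1000, taken=True)
bht.update(0x1000, taken=True)
print(bht.state(0x1000))  # prints "strongly taken"
```

With two bits per entry, a single mispredicted iteration (for example, a loop exit) moves the counter only one level, so the next encounter of the same branch is still predicted taken.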
Figure 1. MPC7457 Block Diagram
[Block diagram not reproduced. Labeled blocks, as far as they can be recovered from the figure text:
Instruction unit: fetcher; branch processing unit with BTIC (128-entry), BHT (2048-entry), LR, and CTR;
instruction queue (12-word); dispatch unit; 128-bit (4 instructions) fetch path.
Instruction MMU and data MMU: SRs (shadow and original), IBAT and DBAT arrays, 128-entry ITLB and DTLB.
L1 caches: 32-Kbyte I cache and 32-Kbyte D cache, each with tags.
Issue queues: VR issue (4-entry/2-issue), GPR issue (6-entry/3-issue), FPR issue (2-entry/1-issue).
Execution units with reservation stations: integer unit 1 (3), integer unit 2, floating-point unit,
vector permute unit, vector integer unit 1, vector integer unit 2, vector FPU, and the load/store unit
(EA calculation, vector touch engine, vector touch queue).
Register files: GPR, FPR, and VR files, each with 16 rename buffers.
Completion unit: completion queue (16-entry); completes up to three instructions per clock.
Memory subsystem: L1 service queues (L1 store queue (LSQ), L1 load queue (LLQ)); 512-Kbyte unified L2 cache
controller; L3 cache controller (see Note 1) with L3CR, 19-bit address and 64-bit data (8-bit parity) to
external SRAM (1, 2, or 4 Mbytes); system bus interface with load queue (11), castout queue (9)/push
queue (10) (see Note 2), bus store queue, 36-bit address bus, and 64-bit data bus.
Additional features: time base counter/decrementer, clock multiplier, JTAG/COP interface, thermal/power
management, performance monitor.]
Notes:
1. The L3 cache interface is not implemented on the MPC7447.
2. The Castout Queue and Push Queue share resources for a combined total of 10 entries.
The Castout Queue itself is limited to 9 entries, ensuring 1 entry will be available for a push.
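The shared-entry rule in Note 2 can be modeled in a few lines. This is a minimal sketch of the accounting only. The class and method names are hypothetical, and the model ignores when entries drain.

```python
CASTOUT_LIMIT = 9    # castout queue alone may hold at most 9 entries
COMBINED_LIMIT = 10  # castout and push queues together hold at most 10


class CastoutPushQueues:
    """Illustrative occupancy model for the shared castout/push entries."""

    def __init__(self):
        self.castouts = 0
        self.pushes = 0

    def can_accept_castout(self):
        return (self.castouts < CASTOUT_LIMIT
                and self.castouts + self.pushes < COMBINED_LIMIT)

    def can_accept_push(self):
        return self.castouts + self.pushes < COMBINED_LIMIT

    def add_castout(self):
        if not self.can_accept_castout():
            raise RuntimeError("no entry available for a castout")
        self.castouts += 1

    def add_push(self):
        if not self.can_accept_push():
            raise RuntimeError("no entry available for a push")
        self.pushes += 1


q = CastoutPushQueues()
for _ in range(CASTOUT_LIMIT):
    q.add_castout()
# Castouts alone cannot fill all 10 entries, so a push still fits.
print(q.can_accept_castout(), q.can_accept_push())  # prints False True
```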
— Five-stage FPU and a 32-entry FPR file
– Fully IEEE 754-1985-compliant FPU for both single- and double-precision operations
– Supports non-IEEE mode for time-critical operations
– Hardware support for denormalized numbers
– Thirty-two 64-bit FPRs for single- or double-precision operands
— Four vector units and 32-entry vector register file (VRs)
– Vector permute unit (VPU)
– Vector integer unit 1 (VIU1) handles short-latency AltiVec™ integer instructions, such as
vector add instructions (for example, vaddsbs, vaddshs, and vaddsws)
– Vector integer unit 2 (VIU2) handles longer-latency AltiVec integer instructions, such as
vector multiply add instructions (for example, vmhaddshs, vmhraddshs, and vmladduhm)
– Vector floating-point unit (VFPU)
— Three-stage load/store unit (LSU)
– Supports integer, floating-point, and vector instruction load/store traffic
– Four-entry vector touch queue (VTQ) supports all four architected AltiVec data stream
operations
– Three-cycle GPR and AltiVec load latency (byte, half-word, word, vector) with one-cycle
throughput
– Four-cycle FPR load latency (single, double) with one-cycle throughput
– No additional delay for misaligned access within double-word boundary
– Dedicated adder calculates effective addresses (EAs)
– Supports store gathering
– Performs alignment, normalization, and precision conversion for floating-point data
– Executes cache control and TLB instructions
– Performs alignment, zero padding, and sign extension for integer data
– Supports hits under misses (multiple outstanding misses)
– Supports both big- and little-endian modes, including misaligned little-endian accesses
• Three issue queues (FIQ, VIQ, and GIQ) can accept as many as one, two, and three instructions,
respectively, in a cycle. Instruction dispatch requires the following:
— Instructions can be dispatched only from the three lowest IQ entries—IQ0, IQ1, and IQ2
— A maximum of three instructions can be dispatched to the issue queues per clock cycle
— Space must be available in the CQ for an instruction to dispatch (this includes instructions that
are assigned a space in the CQ but not in an issue queue)
• Rename buffers
— 16 GPR rename buffers
— 16 FPR rename buffers
— 16 VR rename buffers
• Dispatch unit
— Decode/dispatch stage fully decodes each instruction
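The dispatch requirements listed above (three lowest IQ entries, at most three instructions per cycle, per-queue acceptance limits, and available CQ space) can be combined into a small per-cycle model. This is a sketch under stated assumptions: the function name and inputs are hypothetical, and it assumes dispatch is in order and stops at the first instruction that cannot dispatch, which the list above does not state explicitly.

```python
# Per-cycle acceptance limits for the three issue queues, from the text above.
ISSUE_QUEUE_ACCEPT_LIMIT = {"FIQ": 1, "VIQ": 2, "GIQ": 3}
MAX_DISPATCH_PER_CYCLE = 3  # also the number of eligible IQ entries (IQ0-IQ2)


def count_dispatched(iq, cq_free, issue_free):
    """Return how many instructions dispatch this cycle.

    iq         -- target issue queue per IQ entry, oldest (IQ0) first;
                  None marks an instruction that needs only a CQ entry
    cq_free    -- free completion queue entries
    issue_free -- free entries per issue queue, e.g. {"GIQ": 6}
    """
    accepted = {name: 0 for name in ISSUE_QUEUE_ACCEPT_LIMIT}
    dispatched = 0
    for target in iq[:MAX_DISPATCH_PER_CYCLE]:
        if cq_free - dispatched <= 0:
            break  # every dispatched instruction needs CQ space
        if target is not None:
            if accepted[target] >= ISSUE_QUEUE_ACCEPT_LIMIT[target]:
                break  # queue already took its per-cycle maximum
            if accepted[target] >= issue_free.get(target, 0):
                break  # queue has no free entry left
            accepted[target] += 1
        dispatched += 1  # assumption: in-order dispatch, stop at first blocker
    return dispatched


free = {"FIQ": 2, "VIQ": 4, "GIQ": 6}
print(count_dispatched(["GIQ", "GIQ", "GIQ", "GIQ"], 16, free))  # prints 3
print(count_dispatched(["FIQ", "FIQ", "GIQ"], 16, free))         # prints 1
```

In the second call, the FIQ accepts only one instruction per cycle, so under the in-order assumption the second floating-point instruction blocks the integer instruction behind it.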
• Completion unit
— The completion unit retires an instruction from the 16-entry completion queue (CQ) when all
instructions ahead of it have been completed, the instruction has finished execution, and no
exceptions are pending.
— Guarantees sequential programming model (precise exception model)
— Monitors all dispatched instructions and retires them in order
— Tracks unresolved branches and flushes instructions after a mispredicted branch
— Retires as many as three instructions per clock cycle
• Separate on-chip L1 instruction and data caches (Harvard architecture)
— 32-Kbyte, eight-way set-associative instruction and data caches
— Pseudo least-recently-used (PLRU) replacement algorithm
— 32-byte (eight-word) L1 cache block
— Physically indexed/physical tags
— Cache write-back or write-through operation programmable on a per-page or per-block basis
— Instruction cache can provide four instructions per clock cycle; data cache can provide four
words per clock cycle
— Caches can be disabled in software.
— Caches can be locked in software.
— MESI data cache coherency maintained in hardware
— Separate copy of data cache tags for efficient snooping
— Parity support on cache and tags
— No snooping of instruction cache except for icbi instruction
— Data cache supports AltiVec LRU and transient instructions
— Critical double- and/or quad-word forwarding is performed as needed. Critical quad-word
forwarding is used for AltiVec loads and instruction fetches. Other accesses use critical
double-word forwarding.
• Level 2 (L2) cache interface
— On-chip, 512-Kbyte, eight-way set-associative unified instruction and data cache
— Fully pipelined to provide 32 bytes per clock cycle to the L1 caches
— A total nine-cycle load latency for an L1 data cache miss that hits in L2
— PLRU replacement algorithm
— Cache write-back or write-through operation programmable on a per-page or per-block basis
— 64-byte, two-sectored line size
— Parity support on cache
• Level 3 (L3) cache interface (not implemented on MPC7447)
— Provides critical double-word forwarding to the requesting unit
— Internal L3 cache controller and tags
— External data SRAMs
— Support for 1-, 2-, and 4-Mbyte (MB) total SRAM space
— Support for 1- or 2-MB of cache space
— Cache write-back or write-through operation programmable on a per-page or per-block basis
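The L1 and L2 parameters listed above (size, associativity, and line size) determine the number of sets in each cache. The following sketch works through that arithmetic. The function name is hypothetical, and the offset and index bit counts are plain derivations from the listed geometry, not a statement of which address bits the hardware actually uses.

```python
def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets = total size / (associativity * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a multiple of ways * line size")
    return sets


def log2_exact(value):
    """Base-2 logarithm of a power of two, as an integer."""
    if value <= 0 or value & (value - 1):
        raise ValueError("value must be a power of two")
    return value.bit_length() - 1


# L1: 32 Kbytes, eight-way, 32-byte blocks.
l1_sets = cache_sets(32 * 1024, 8, 32)
# L2: 512 Kbytes, eight-way, 64-byte (two-sectored) lines.
l2_sets = cache_sets(512 * 1024, 8, 64)

print(l1_sets, log2_exact(32), log2_exact(l1_sets))  # prints 128 5 7
print(l2_sets, log2_exact(64), log2_exact(l2_sets))  # prints 1024 6 10
```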