RM5271™ Superscalar
Microprocessor
with External Cache Interface
FEATURES
• Dual issue superscalar microprocessor
— 200, 225, 250, 266, 300, 350 MHz operating frequencies
— 420 Dhrystone 2.1 MIPS maximum
— SPECInt95 7.3, SPECfp95 8.3 maximum
• High-performance system interface
— 64-bit multiplexed system address/data bus for optimum price/performance with up to 125 MHz operating frequency
— High-performance write protocols to maximize uncached write bandwidth
— Processor clock multipliers 2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9
— IEEE 1149.1 JTAG boundary scan
• Integrated on-chip caches
— 32KB instruction and 32KB data - 2-way set associative
— Virtually indexed, physically tagged
— Write-back and write-through on per page basis
— Pipeline restart on first double for data cache misses
• Integrated secondary cache controller (R5000 compatible)
— Supports 512K or 2MByte block write-through secondary
• Integrated memory management unit
— Fully associative joint TLB (shared by I and D translations)
— 48 dual entries map 96 pages
— Variable page size (4KB to 16MB in 4x increments)
• High-performance floating point unit - up to 700 MFLOPS
— Single cycle repeat rate for common single precision operations and some double precision operations
— Two cycle repeat rate for double precision multiply and double precision combined multiply-add operations
— Single cycle repeat rate for single precision combined multiply-add operation
• MIPS IV instruction set
— Floating point multiply-add instruction increases performance in signal processing and graphics applications
— Conditional moves to reduce branch frequency
— Index address modes (register + register)
• Embedded application enhancements
— Specialized DSP integer Multiply-Accumulate instruction and 3 operand multiply instruction
— I and D cache locking by set
— Optional dedicated exception vector for interrupts
• Fully static CMOS design with power down logic
— Standby reduced power mode with WAIT instruction
— 3.5 Watts typical power @ 200MHz
— 2.5V core with 3.3V I/Os
• 304-pin SBGA package (31x31mm)
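The clock multipliers and the 125 MHz bus limit in the feature list together constrain which multiplier can be used at a given core frequency. The following is a minimal sketch of that arithmetic. It assumes the core clock equals the system bus clock times the multiplier, which this document does not state explicitly; the function and constant names are hypothetical.

```python
# Multipliers and the 125 MHz bus limit are taken from the feature list.
# ASSUMPTION: core clock = bus clock * multiplier.
CLOCK_MULTIPLIERS = (2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9)
MAX_BUS_MHZ = 125.0


def bus_clock_options(core_mhz):
    """Return (multiplier, bus MHz) pairs whose bus clock is within the limit."""
    options = []
    for multiplier in CLOCK_MULTIPLIERS:
        bus_mhz = core_mhz / multiplier
        if bus_mhz <= MAX_BUS_MHZ:
            options.append((multiplier, bus_mhz))
    return options


if __name__ == "__main__":
    for core in (200, 250, 350):
        multiplier, bus = bus_clock_options(core)[0]
        print(f"{core} MHz core: smallest usable multiplier {multiplier}, "
              f"bus {bus:.1f} MHz")
```

Under this assumption, a 350 MHz part cannot use the 2 or 2.5 multipliers, because the resulting bus clock would exceed 125 MHz.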
BLOCK DIAGRAM
[Figure: block diagram. Labeled blocks: External Cache Controller; Primary Data Cache (2-way set associative) with DTag and DTLB; Primary Instruction Cache (2-way set associative) with ITag and ITLB; Store, Write, Read, Pad, and Address Buffers; Instruction Dispatch Unit with FP and Integer Instruction Registers; Floating-Point Control, Load/Align, Register File, Packer/Unpacker, and MultAdd/Add/Sub/Cvt/Div/Sqrt unit; Joint TLB; Coprocessor 0; System/Memory Control; Integer Control, Integer Register File, Integer Address/Adder, Load Aligner, Shifter/Store Aligner, Logic Unit, and Int Mult/Div/Madd; PC Incrementer, Branch PC Adder, and Program Counter; PLL/Clocks. Labeled buses: A/D, Pad, FP, Integer, D, FA, DVA, IVA.]
QUANTUM EFFECT DEVICES, INC., 3255-3 SCOTT BLVD., SUITE 200, SANTA CLARA, CA 95054
PHONE 408.565.0300 FAX 408.565.0335 WEB www.qedinc.com
DESCRIPTION
The QED RM5271 is a highly integrated superscalar microprocessor
that implements a superset of the MIPS IV Instruction Set
Architecture (ISA). It has a high-performance 64-bit integer unit,
a high-throughput, fully pipelined 64-bit floating point unit, an
operating system friendly memory management unit with a 48-entry
fully associative TLB, a 32 KByte 2-way set associative instruction
cache, a 32 KByte 2-way set associative data cache, and a
high-performance 64-bit system interface with support for an
optional external secondary cache. The RM5271 can issue both an
integer and a floating point instruction in the same cycle.
The RM5271 is ideally suited for high-end embedded control
applications such as internetworking, high-performance image
manipulation, high-speed printing, and 3-D visualization. The
RM5271 is also applicable to the low end workstation market where
its balanced integer and floating-point performance and direct
support for a large secondary cache (up to 2MB) provide outstanding
price/performance.
HARDWARE OVERVIEW
The RM5271 offers a high level of integration targeted at
high-performance embedded applications. The key elements of the
RM5271 are briefly described below.
Superscalar Dispatch
The RM5271 has an efficient asymmetric superscalar dispatch unit
which allows it to issue an integer instruction and a floating-point
computation instruction simultaneously. With respect to superscalar
issue, integer instructions include ALU, branch, load/store, and
floating-point load/store, while floating-point computation
instructions include floating-point add, subtract, combined
multiply-add, converts, etc. In combination with its high-throughput
fully pipelined floating-point execution unit, the superscalar
capability of the RM5271 provides unparalleled price/performance in
computationally intensive embedded applications.
Pipeline
For integer operations, loads, stores, and other non-floating-point
operations, the RM5271 uses the simple 5-stage pipeline also found in
the rest of the RM5200 Family, R4600, R4700, and R5000. In addition
to this standard pipeline, the RM5271 uses an extended 7-stage
pipeline for floating-point operations. Like the RM5270 and R5000,
the RM5271 does virtual to physical translation in parallel with
cache access.
Figure 2 shows the RM5271 integer pipeline. As illustrated in the
figure, up to five integer instructions can be executing
simultaneously.
CPU Registers
Like all MIPS ISA processors, the RM5271 CPU has a simple, clean
user visible state consisting of 32 general purpose registers, two
special purpose registers for integer multiplication and division, a
program counter, and no condition code bits. Figure 1 shows the user
visible state.
[Figure: CPU registers. General Purpose Registers: 0 and r1 through r31. Multiply/Divide Registers: HI and LO. Program Counter: PC. Each register is drawn 64 bits wide (bits 63 to 0).]
Figure 1 CPU Registers
[Figure: pipeline timing diagram. Five instructions, I0 through I4, each start one cycle after the previous one and pass through the I, R, A, D, and W stages; each stage is split into two phases (for example 1I and 2I).]
1I-1R: Instruction cache access
2I: Instruction virtual to physical address translation
2R: Register file read, Bypass calculation, Instruction decode, Branch address calculation
1A: Issue or slip decision, Branch decision
1A: Data virtual address calculation
1A-2A: Integer add, logical, shift
2A: Store Align
2A-2D: Data cache access and load align
1D: Data virtual to physical address translation
2W: Register file write
Figure 2 Pipeline
Integer Unit
As part of the RM5200 Family, the RM5271 implements the MIPS IV
Instruction Set Architecture, and is therefore fully upward
compatible with applications that run on processors implementing the
earlier generation MIPS I-III instruction sets. Additionally, the
RM5271 includes two implementation-specific instructions not found
in the baseline MIPS IV ISA but that are useful in the embedded
marketplace. Described in detail in a later section, these
instructions are integer multiply-accumulate and 3-operand integer
multiply.
The RM5271 integer unit includes thirty-two general purpose 64-bit
registers, a load/store architecture with single cycle ALU
operations (add, sub, logical, shift) and an autonomous
multiply/divide unit. Additional register resources include the
HI/LO result registers for the two-operand integer multiply/divide
operations, and the program counter (PC).
Register File
The RM5271 has thirty-two general purpose registers with register
location 0 (r0) hard wired to a zero value. These registers are used
for scalar integer operations and address calculation. The register
file has two read ports and one write port and is fully bypassed to
minimize operation latency in the pipeline.
ALU
The RM5271 ALU consists of the integer adder/subtractor, the logic
unit, and the shifter. The adder performs address calculations in
addition to arithmetic operations, the logic unit performs all
logical and zero shift data moves, and the shifter performs shifts
and store alignment operations. Each of these units is optimized to
perform all operations in a single processor cycle.
Integer Multiply/Divide
The RM5271 has a dedicated integer multiply/divide unit optimized
for high-speed multiply and multiply-accumulate operations. Table 1
shows the performance of the multiply/divide unit on each operation.
Table 1: Integer Multiply/Divide Operations
Opcode           Operand Size   Latency   Repeat Rate   Stall Cycles
MULT/U, MAD/U    16 bit         3         2             0
MULT/U, MAD/U    32 bit         4         3             0
MUL              16 bit         3         2             1
MUL              32 bit         4         3             2
DMULT, DMULTU    any            7         6             0
DIV, DIVU        any            36        36            0
DDIV, DDIVU      any            68        68            0
The baseline MIPS IV ISA specifies that the results of a multiply or
divide operation be placed in the HI and LO registers. These values
can then be transferred to the general purpose register file using
the Move-from-Hi and Move-from-Lo (MFHI/MFLO) instructions.
In addition to the baseline MIPS IV integer multiply instructions,
the RM5271 also implements the multiply instruction, MUL. This
instruction specifies that the multiply result go directly to the
integer register file rather than the LO register. The portion of
the multiply that would have normally gone into the HI register is
discarded. For applications where it is known that the high half of
the multiply result is not required, using the MUL instruction
eliminates the necessity of executing an explicit MFLO instruction.
Also included in the RM5271 is the multiply-add instruction, MAD.
This instruction multiplies two operands and adds the resulting
product to the current contents of the HI and LO registers. The
multiply-accumulate operation is the core primitive of almost all
signal processing algorithms, allowing the RM5271 to eliminate the
need for a separate DSP engine in many embedded applications.
By pipelining the multiply-accumulate function and dynamically
determining the size of the input operands, the RM5271 is able to
maximize throughput while still using an area efficient
implementation.
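The difference between MULT, MUL, and MAD described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: operands are treated as 32-bit values, HI:LO is treated as one 64-bit accumulator, and details such as sign extension into the 64-bit registers are left out. The class and method names are hypothetical and are not part of any QED software.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class MulDivModel:
    """Toy model of the HI/LO pair, treated as a 64-bit accumulator."""

    def __init__(self):
        self.hi = 0
        self.lo = 0

    def acc(self):
        """Return HI:LO as one unsigned 64-bit value."""
        return (self.hi << 32) | self.lo

    def _set_acc(self, value):
        value &= MASK64          # wrap to 64 bits (two's complement)
        self.hi = value >> 32
        self.lo = value & MASK32

    def mult(self, a, b):
        """MULT: the product replaces the contents of HI:LO."""
        self._set_acc(a * b)

    def mad(self, a, b):
        """MAD: the product is added to the current contents of HI:LO."""
        self._set_acc(self.acc() + a * b)

    def mul(self, a, b):
        """MUL: return the low half of the product for a general register.

        The high half is discarded. SIMPLIFICATION: this model leaves
        HI/LO untouched.
        """
        return (a * b) & MASK32


def dot_product(xs, ys):
    """Accumulate sum(x * y) using one MAD per element pair."""
    unit = MulDivModel()
    for x, y in zip(xs, ys):
        unit.mad(x, y)
    return unit.acc()


if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))
```

A dot product is the typical signal processing use: each loop iteration needs only one MAD, and the result is read from HI/LO once at the end.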
Table 2: Floating-Point Instruction Cycles
Operation   Latency   Repeat Rate
fadd        4         1
fsub        4         1
fmult       4/5       1/2
fmadd       4/5       1/2
fmsub       4/5       1/2
fdiv        21/36     19/34
fsqrt       21/36     19/34
frecip      21/36     19/34
frsqrt      38/68     36/66
fcvt.s.d    4         1
fcvt.s.w    6         3
fcvt.s.l    6         3
fcvt.d.s    4         1
fcvt.d.w    4         1
fcvt.d.l    4         1
fcvt.w.s    4         1
fcvt.w.d    4         1
fcvt.l.s    4         1
fcvt.l.d    4         1
fcmp        1         1
fmov        1         1
fmovc       1         1
fabs        1         1
fneg        1         1
Floating-Point Co-Processor
The RM5271 incorporates a high-performance fully pipelined
floating-point co-processor which includes a floating-point register
file and autonomous execution units for multiply/add/convert and
divide/square root. The floating-point co-processor is a tightly
coupled co-execution unit, decoding and executing instructions in
parallel with, and in the case of floating-point loads and stores,
in cooperation with the integer unit. As described earlier, the
superscalar capabilities of the RM5271 allow floating-point
computation instructions to issue concurrently with integer
instructions.
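The pairing rule for concurrent issue can be sketched as a simple classifier. This is an illustrative model only: it assumes the pairing does not depend on the order of the two instructions, and it ignores data dependencies and resource stalls, none of which this document specifies. The operation labels such as `fp_load` are hypothetical names, not assembler mnemonics.

```python
# For issue purposes the text groups ALU, branch, load/store and
# floating-point load/store together as "integer" instructions.
INTEGER_CLASS = {"alu", "branch", "load", "store", "fp_load", "fp_store"}
# Floating-point computation instructions named in the text.
FP_COMPUTE_CLASS = {"fadd", "fsub", "fmadd", "fcvt"}


def classify(op):
    """Map a hypothetical operation label to its issue class."""
    if op in INTEGER_CLASS:
        return "integer"
    if op in FP_COMPUTE_CLASS:
        return "fp_compute"
    raise ValueError(f"unknown operation label: {op}")


def can_dual_issue(first, second):
    """True if the pair is one integer-class and one FP-computation op.

    ASSUMPTION: order within the pair does not matter, and hazards
    between the two instructions are ignored.
    """
    return {classify(first), classify(second)} == {"integer", "fp_compute"}


if __name__ == "__main__":
    print(can_dual_issue("fp_load", "fmadd"))
    print(can_dual_issue("alu", "alu"))
```

Note that a floating-point load counts as an integer-class instruction here, so a load of the next operand can pair with a multiply-add on the current one.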
Floating-Point Unit
The RM5271 floating-point execution unit supports single and double
precision arithmetic, as specified in the IEEE Standard 754. The
execution unit is broken into a separate divide/square root unit and
a pipelined multiply/add unit. Overlap of divide/square root and
multiply/add is supported.
The RM5271 maintains fully precise floating-point exceptions while
allowing both overlapped and pipelined operations. Precise
exceptions are extremely important in object-oriented programming
environments and highly desirable for debugging in any environment.
The floating-point unit’s operation set includes floating-point add,
subtract, multiply, divide, square root, reciprocal, reciprocal
square root, conditional moves, conversion between fixed-point and
floating-point format, conversion between floating-point formats,
and floating-point compare.
Table 2 gives the latencies of the floating-point instructions
in internal processor cycles.
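The repeat rates in Table 2 determine peak floating-point throughput. The sketch below works through that arithmetic under two assumptions that are not stated in this document: paired table values such as 1/2 are read as single precision/double precision (consistent with the repeat rates quoted in the feature list), and a combined multiply-add is counted as two floating-point operations. The names are hypothetical.

```python
# Repeat rates in cycles, copied from Table 2.
# ASSUMPTION: paired values are (single precision, double precision).
REPEAT_RATE = {
    "fadd": (1, 1),
    "fmult": (1, 2),
    "fmadd": (1, 2),
}
# ASSUMPTION: a combined multiply-add counts as two operations.
FLOPS_PER_INSTRUCTION = {"fadd": 1, "fmult": 1, "fmadd": 2}


def peak_mflops(op, clock_mhz, double_precision=False):
    """Peak MFLOPS when `op` is issued back to back at its repeat rate."""
    repeat = REPEAT_RATE[op][1 if double_precision else 0]
    return FLOPS_PER_INSTRUCTION[op] * clock_mhz / repeat


if __name__ == "__main__":
    print(peak_mflops("fmadd", 350))
    print(peak_mflops("fmadd", 350, double_precision=True))
```

With these assumptions, single precision multiply-add at 350 MHz gives 700 MFLOPS, which matches the peak figure in the feature list.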
Floating-Point General Register File
The floating-point general register file, FGR, is made up of
thirty-two 64-bit registers. With the floating-point load and store
double instructions, LDC1 and SDC1, the floating-point unit can take
advantage of the 64-bit wide data cache and issue a floating-point
co-processor load or store doubleword instruction in every cycle.
The floating-point control register space contains two registers;
one for determining configuration and revision information for the
coprocessor and one for control and status information. These are
primarily used for diagnostic software, exception handling, state
saving and restoring, and control of rounding modes. To support
superscalar operation, the FGR has four read ports and two write
ports, and is fully bypassed to minimize operation latency in the
pipeline. Three of the read ports and one write port are used to
support the combined multiply-add instruction while the fourth read
and second write port allows a concurrent floating-point load or
store.
[Figure: CP0 register map. The figure distinguishes registers used for memory management from registers used for exception processing, and shows the TLB with entries 0 through 47; Wired marks the entries protected from TLBWR. Registers and register numbers: Index 0, Random 1, EntryLo0 2, EntryLo1 3, Context 4, PageMask 5, Wired 6, BadVAddr 8, Count 9, EntryHi 10, Compare 11, Status 12, Cause 13, EPC 14, PRId 15, Config 16, LLAddr 17, XContext 20, ECC 26, CacheErr 27, TagLo 28, TagHi 29, ErrorEPC 30.]
Figure 3 CP0 Registers
System Control Co-processor (CP0)
The system control co-processor, co-processor 0 or CP0, in the MIPS
architecture is responsible for the virtual memory sub-system, the
exception control system, and the diagnostics capability of the
processor. In the MIPS architecture, the system control co-processor
(and thus the kernel software) is implementation dependent. The
RM5271 CP0 is logically identical to that of the other members of
the RM5200 Family and R5000.
The memory management unit controls the virtual memory system page
mapping. It consists of an instruction address translation buffer,
ITLB, a data address translation buffer, DTLB, a Joint instruction
and data address translation buffer, JTLB, and co-processor
registers used by the virtual memory mapping sub-system.
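The feature list gives the JTLB as 48 dual entries mapping 96 pages, with page sizes from 4KB to 16MB in 4x increments. The sketch below enumerates those page sizes and the amount of memory the JTLB can map at once. It assumes, for the reach figure only, that every entry uses the same page size; the function names are hypothetical.

```python
KB = 1024
MB = 1024 * KB

DUAL_ENTRIES = 48      # from the feature list
PAGES_PER_ENTRY = 2    # each dual entry maps two pages (96 pages total)


def page_sizes(smallest=4 * KB, largest=16 * MB, step=4):
    """List the supported page sizes: 4KB up to 16MB in 4x increments."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= step
    return sizes


def tlb_reach(page_size):
    """Bytes mapped at once, ASSUMING every entry uses `page_size`."""
    return DUAL_ENTRIES * PAGES_PER_ENTRY * page_size


if __name__ == "__main__":
    for size in page_sizes():
        print(f"page {size // KB:>6} KB -> reach {tlb_reach(size) // KB} KB")
```

The 4x increments yield seven page sizes, and the mapped range runs from 384KB with 4KB pages to 1536MB with 16MB pages.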
System Control Co-Processor Registers
The RM5271 incorporates all system control co-processor (CP0)
registers on-chip. These registers provide the path through which
the virtual memory system’s page mapping is examined and modified,
exceptions are handled, and operating modes are controlled (kernel
vs. user mode, interrupts enabled or disabled, cache features). In
addition, the RM5271 includes registers to implement a real-time
cycle counting facility, to aid in cache diagnostic testing, and to
assist in data error detection.
Figure 3 shows the CP0 registers.
Virtual to Physical Address Mapping
The RM5271 provides three modes of virtual addressing:
• user mode
• supervisor mode
• kernel mode
This mechanism is available to system software to provide a secure
environment for user processes. Bits in the CP0 Status register
determine which virtual addressing mode is used. In the user mode,
the RM5271 provides a single, uniform virtual address space of 256GB
(2GB in 32-bit mode).
When operating in the kernel mode, four distinct virtual address
spaces, totalling 1024GB (4GB in 32-bit mode), are simultaneously
available and are differentiated by the high-order bits of the
virtual address.
The RM5271 also supports a supervisor mode in which the virtual
address space is 256.5GB (2.5GB in 32-bit mode), divided into three
regions based on the high-order bits of the virtual address.
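The 32-bit-mode address space sizes quoted under Virtual to Physical Address Mapping (2GB user, 2.5GB supervisor, 4GB kernel) can be reproduced from the conventional MIPS 32-bit segment layout. That layout is an assumption here: the segment names and boundaries below come from general MIPS architecture convention, not from this document, and the sketch does not model the 64-bit mode spaces.

```python
GB = 1 << 30

# ASSUMPTION: conventional MIPS 32-bit segments, selected by the
# high-order address bits. Fields: name, first address, last address,
# least privileged mode that may access the segment.
SEGMENTS_32BIT = [
    ("useg", 0x00000000, 0x7FFFFFFF, "user"),
    ("kseg0", 0x80000000, 0x9FFFFFFF, "kernel"),
    ("kseg1", 0xA0000000, 0xBFFFFFFF, "kernel"),
    ("sseg", 0xC0000000, 0xDFFFFFFF, "supervisor"),
    ("kseg3", 0xE0000000, 0xFFFFFFFF, "kernel"),
]
PRIVILEGE = {"user": 0, "supervisor": 1, "kernel": 2}


def segment_of(vaddr):
    """Return the segment name containing a 32-bit virtual address."""
    for name, first, last, _ in SEGMENTS_32BIT:
        if first <= vaddr <= last:
            return name
    raise ValueError("address is outside the 32-bit range")


def can_access(mode, vaddr):
    """True if `mode` is privileged enough for the address's segment."""
    for _, first, last, min_mode in SEGMENTS_32BIT:
        if first <= vaddr <= last:
            return PRIVILEGE[mode] >= PRIVILEGE[min_mode]
    raise ValueError("address is outside the 32-bit range")


def address_space_gb(mode):
    """Total size, in GB, of the segments accessible in `mode`."""
    total = sum(last - first + 1
                for _, first, last, min_mode in SEGMENTS_32BIT
                if PRIVILEGE[mode] >= PRIVILEGE[min_mode])
    return total / GB


if __name__ == "__main__":
    for mode in ("user", "supervisor", "kernel"):
        print(mode, address_space_gb(mode), "GB")
```

Under this layout the supervisor total is the 2GB user region plus one 0.5GB segment, which is where the 2.5GB figure comes from.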