SUMMARY
High performance 32-bit/40-bit floating-point processor
optimized for professional audio processing
At 333 MHz/2 GFLOPS, with unique audiocentric peripherals
such as the digital audio interface that includes a
high-precision 8-channel asynchronous sample rate converter,
among others, the ADSP-21364 SHARC processor is ideal
for applications that require industry-leading equalization,
reverberation, and other effects processing
Single-instruction, multiple-data (SIMD) computational
architecture
Two 32-bit IEEE floating-point/32-bit fixed-point/40-bit
extended precision floating-point computational units,
each with a multiplier, ALU, shifter, and register file
SHARC® Processor ADSP-21364
On-chip memory—3M bit of on-chip SRAM and a dedicated
4M bit of on-chip mask-programmable ROM
Code compatible with all other members of the SHARC family
The ADSP-21364 is available with a 333 MHz core instruction
rate and unique audiocentric peripherals such as the digital
audio interface, S/PDIF transceiver, serial ports, 8-channel
asynchronous sample rate converter, precision clock
generators, and more. For complete ordering information, see
Ordering Guide on Page 53.
[Block diagram. Core processor: timer, instruction cache (32 × 48-bit), program sequencer, DAG1 and DAG2 (8 × 4 × 32), processing elements PEX and PEY, PX register, PM and DM address buses (32 bits) and data buses (64 bits). Four blocks of on-chip memory: blocks 0 and 1 with 1M bit SRAM and 2M bit ROM each, blocks 2 and 3 with 0.5M bit SRAM each. I/O processor and peripherals: memory-mapped IOP registers, SPI, SPORTs, IDP, PCG, timers, SRC, S/PDIF, signal routing unit. JTAG test and emulation.]
Figure 1. Functional Block Diagram—Processor Core
SHARC and the SHARC logo are registered trademarks of Analog Devices, Inc.
See the ADSP-21364 Memory and I/O Interface Features section for details.
Rev. 0
Information furnished by Analog Devices is believed to be accurate and reliable.
However, no responsibility is assumed by Analog Devices for its use, nor for any
infringements of patents or other rights of third parties that may result from its use.
Specifications subject to change without notice. No license is granted by implication
or otherwise under any patent or patent rights of Analog Devices. Trademarks and
registered trademarks are the property of their respective owners.
One Technology Way, P.O. Box 9106, Norwood, MA 02062-9106 U.S.A.
Tel: 781.329.4700
www.analog.com
Fax: 781.326.3113
© 2005 Analog Devices, Inc. All rights reserved.
KEY FEATURES—PROCESSOR CORE
At 333 MHz (3.0 ns) core instruction rate, the ADSP-21364
performs 2 GFLOPS/666 MMACS
3M bit on-chip SRAM (1M bit each in blocks 0 and 1, and 0.5M
bit each in blocks 2 and 3) for simultaneous access by core
processor and DMA
Dual data address generators (DAGs) with modulo and
bit-reverse addressing
4M bit on-chip, single-ported mask-programmable ROM (2M
bit in block 0 and 2M bit in block 1)
Zero-overhead looping with single-cycle loop setup, providing
efficient program sequencing
Single-instruction multiple-data (SIMD) architecture
provides:
Two computational processing elements
Concurrent execution
Code compatibility with other SHARC family members at
the assembly level
Parallelism in buses and computational units allows
single-cycle execution (with or without SIMD) of a multiply or
ALU operation, a dual memory read or write, and an
instruction fetch
Transfers between memory and core at a sustained 5.4
Gbytes/s bandwidth at 333 MHz core instruction rate
Up to 12 TDM stream support, each with 128 channels per
frame
Companding selection on a per channel basis in TDM mode
Input data port provides an additional input path to the
SHARC core, configurable as eight channels of serial data
or seven channels of serial data and up to a 20-bit wide
parallel data channel
Signal routing unit provides configurable and flexible
connections between all DAI components: six serial ports, two
precision clock generators, an input data port with a data
acquisition port, one SPI port, eight channels of asynchronous
sample rate converters, three timers, 10 interrupts,
six flag inputs, six flag outputs, and 20 SRU I/O pins
(DAI_Px)
Two serial peripheral interfaces (SPI), primary on dedicated
pins and secondary on DAI pins, provide:
Master or slave serial boot through primary SPI
Full-duplex operation
Master/slave mode and multimaster support
Open drain outputs
Programmable baud rates, clock polarities and phases
3 Muxed Flag/IRQ lines
1 Muxed Flag/Timer expired line
DEDICATED AUDIO COMPONENTS
S/PDIF-compatible digital audio receiver/transmitter
supports:
EIAJ CP-340 (CP-1201), IEC-958, AES/EBU standards
Left-justified, I²S, or right-justified serial data input with
16-, 18-, 20-, or 24-bit word widths (transmitter)
Two channel mode and single channel double frequency
(SCDF) mode
Four independent asynchronous sample rate converters
(SRC). Each converter has separate serial input and output
ports, a deemphasis filter providing up to –140 dB SNR
performance, and a stereo sample rate converter (SRC), and
supports left-justified, I²S, TDM, and right-justified modes
and 24-, 20-, 18-, and 16-bit audio data word lengths
Pulse-width modulation provides:
16 PWM outputs configured as four groups of four outputs
Supports center-aligned or edge-aligned PWM waveforms
Can generate complementary signals on two outputs in
paired mode or independent signals in nonpaired mode
PLL has a wide variety of software and hardware
multiplier/divider ratios
Dual voltage: 3.3 V I/O, 1.2 V or 1.0 V core
Available in 136-ball BGA and 144-lead LQFP packages
INPUT/OUTPUT FEATURES
DMA controller supports:
25 DMA channels for transfers between ADSP-21364 internal
memory and a variety of peripherals
32-bit DMA transfers at peripheral clock speed, in parallel
with full-speed processor execution
Asynchronous parallel port provides access to asynchronous
external memory
16 multiplexed address/data lines support a 24-bit external
address range with 8-bit data or a 16-bit external address
range with 16-bit data
55 Mbytes/s transfer rate
External memory access in a dedicated DMA channel
8-bit to 32-bit and 16-bit to 32-bit packing options
Programmable data cycle duration: 2 to 31 CCLK
Digital audio interface (DAI) includes six serial ports, two
precision clock generators, an input data port, three timers,
eight-channel asynchronous sample rate converter, and a
signal routing unit
Six dual data line serial ports that operate at up to 50M bit/s
on each data line—each has a clock, frame sync, and two
data lines that can be configured as either a receiver or
transmitter pair
Left-justified sample pair and I²S support, programmable
direction for up to 24 simultaneous receive or transmit
channels using two I²S-compatible stereo devices per
serial port
TDM support for telecommunications interfaces including
128 TDM channel support for newer telephony interfaces
such as H.100/H.110
Rev. 0 | Page 2 of 56 | October 2005
ADSP-21364
CONTENTS
Summary ................................................................1
Key Features—Processor Core ..................................2
Input/Output Features ............................................2
Dedicated Audio Components ..................................2
General Description ..................................................4
ADSP-21364 Family Core Architecture .......................4
ADSP-21364 Memory and I/O Interface Features ..........6
Development Tools ................................................9
Additional Information ......................................... 10
Pin Function Descriptions ........................................ 11
Address Data Pins as Flags ..................................... 14
Address Data Modes ............................................. 14
Boot Modes ........................................................ 14
Core Instruction Rate to CLKIN Ratio Modes ............. 14
ADSP-21364 Specifications ....................................... 15
Recommended Operating Conditions ....................... 15
Electrical Characteristics ........................................ 15
Maximum Power Dissipation ................................. 16
Absolute Maximum Ratings ................................... 16
ESD Sensitivity .................................................... 16
Timing Specifications ........................................... 17
Output Drive Currents .......................................... 45
Test Conditions ................................................... 45
Capacitive Loading ............................................... 45
Thermal Characteristics ........................................ 46
136-Ball BGA Pin Configurations ............................... 47
144-Lead LQFP Pin Configurations ............................. 50
Outline Dimensions ................................................ 51
Ordering Guide ...................................................... 53
REVISION HISTORY
10/05—Revision 0: Initial Version
GENERAL DESCRIPTION
The ADSP-21364 SHARC processor is a member of the SIMD
SHARC family of DSPs that feature Analog Devices’ Super
Harvard Architecture. The ADSP-21364 is source code compatible
with the ADSP-2126x and ADSP-2116x DSPs as well as with
first generation ADSP-2106x SHARC processors in SISD
(single-instruction, single-data) mode. The ADSP-21364 is a
32-bit/40-bit floating-point processor optimized for professional
audio applications with a large on-chip SRAM, multiple internal
buses to eliminate I/O bottlenecks, and an innovative digital
audio interface (DAI).
As shown in the functional block diagram on Page 1, the
ADSP-21364 uses two computational units to deliver a
significant performance increase over previous SHARC processors
on a range of signal processing algorithms. Fabricated in a
state-of-the-art, high speed, CMOS process, the ADSP-21364
processor achieves an instruction cycle time of 3.0 ns at
333 MHz. With its SIMD computational hardware, the ADSP-21364
can perform 2 GFLOPS running at 333 MHz.
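The headline rates can be reproduced with simple arithmetic. The sketch below assumes that each processing element completes one multiply plus an ALU add and subtract per cycle (three floating-point operations), and one multiply-accumulate per cycle; that breakdown is an assumption for illustration and is not stated in this section. The function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helpers. The per-cycle operation counts passed in are
 * assumptions used to reproduce the headline figures. */

/* Peak floating-point rate in MFLOPS:
 * clock (MHz) x processing elements x floating-point ops per element per cycle. */
double peak_mflops(double core_mhz, int elements, int flops_per_element)
{
    assert(core_mhz > 0.0 && elements > 0 && flops_per_element > 0);
    return core_mhz * elements * flops_per_element;
}

/* Peak multiply-accumulate rate in MMACS, assuming one MAC per element per cycle. */
double peak_mmacs(double core_mhz, int elements)
{
    assert(core_mhz > 0.0 && elements > 0);
    return core_mhz * elements;
}
```

With a 333 MHz clock, two processing elements, and three operations each, `peak_mflops` gives 1998 MFLOPS (about 2 GFLOPS), and `peak_mmacs` gives 666 MMACS, matching the figures quoted in the key features.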
Table 1 shows performance benchmarks for the ADSP-21364
running at 333 MHz.
Table 1. Benchmarks at 333 MHz

Benchmark Algorithm                                Speed (at 333 MHz)
1024 Point Complex FFT (Radix 4, with reversal)    27.9 μs
FIR Filter (per tap)¹                              1.5 ns
IIR Filter (per biquad)¹                           6.0 ns
Matrix Multiply (pipelined)
  [3×3] × [3×1]                                    13.5 ns
  [4×4] × [4×1]                                    23.9 ns
Divide (y/x)                                       10.5 ns
Inverse Square Root                                16.3 ns
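The Table 1 timings can be related to the 3.0 ns instruction cycle. The sketch below converts a benchmark time to core cycles and estimates an upper bound on FIR taps per sample period. The function names are hypothetical, and the estimate ignores loop setup, I/O overhead, and the condition attached to the per-tap figures.

```c
#include <assert.h>

#define CYCLE_NS 3.0  /* instruction cycle time at 333 MHz, from the text */

/* Convert a benchmark time in nanoseconds to core instruction cycles. */
double ns_to_cycles(double ns)
{
    assert(ns >= 0.0);
    return ns / CYCLE_NS;
}

/* Upper bound on FIR taps computable within one sample period, given a
 * per-tap time. Ignores loop setup and I/O overhead. */
long max_fir_taps(double sample_rate_hz, double ns_per_tap)
{
    assert(sample_rate_hz > 0.0 && ns_per_tap > 0.0);
    double period_ns = 1e9 / sample_rate_hz;
    return (long)(period_ns / ns_per_tap);
}
```

For example, 1.5 ns per FIR tap corresponds to 0.5 cycle per tap, which is consistent with two processing elements each completing one tap per cycle, and the 27.9 μs FFT corresponds to 9300 cycles. At a 48 kHz sample rate, the bound works out to 13888 taps per sample period.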
• On-chip mask-programmable ROM (4M bit)
• 8-bit or 16-bit parallel port that supports interfaces to
off-chip memory peripherals
• JTAG test access port
The block diagram of the ADSP-21364 on Page 7 illustrates the
following architectural features:
• DMA controller
• Six full duplex serial ports
• Two SPI-compatible interface ports: primary on dedicated
pins, secondary on DAI pins
• Digital audio interface that includes two precision clock
generators (PCG), an input data port (IDP), an S/PDIF
receiver/transmitter, eight channels of asynchronous sample
rate converters, six serial ports, eight serial interfaces, a
20-bit parallel input port, 10 interrupts, six flag outputs, six
flag inputs, three timers, and a flexible signal routing unit
(SRU)
Figure 2 on Page 5 shows one sample configuration of a SPORT
using the precision clock generators to interface with an I²S
ADC and an I²S DAC with a much lower jitter clock than the
serial port would generate itself. Many other SRU
configurations are possible.
ADSP-21364 FAMILY CORE ARCHITECTURE
The ADSP-21364 is code compatible at the assembly level with
the ADSP-2126x, ADSP-21160, and ADSP-21161, and with the
first generation ADSP-2106x SHARC DSPs. The ADSP-21364
shares architectural features with the ADSP-2126x and
ADSP-2116x SIMD SHARC processors, as detailed in the
following sections.
SIMD Computational Engine
The ADSP-21364 contains two computational processing
elements that operate as a single-instruction multiple-data (SIMD)
engine. The processing elements are referred to as PEX and PEY
and each contains an ALU, multiplier, shifter, and register file.
PEX is always active, and PEY may be enabled by setting the
PEYEN mode bit in the MODE1 register. When this mode is
enabled, the same instruction is executed in both processing
elements, but each processing element operates on different data.
This architecture is efficient at executing math intensive signal
processing algorithms.
Entering SIMD mode also has an effect on the way data is
transferred between memory and the processing elements. When in
SIMD mode, twice the data bandwidth is required to sustain
computational operation in the processing elements. Because of
this requirement, entering SIMD mode also doubles the
bandwidth between memory and the processing elements. When
using the DAGs to transfer data in SIMD mode, two data values
are transferred with each access of memory or the register file.
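The SIMD behavior described above can be modeled in portable C. This is an illustrative emulation only, not SHARC code: the type and function names are hypothetical, and the interleaved data layout (even elements for PEX, odd elements for PEY) is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only: one "instruction" applied by both processing
 * elements to different data. Names are hypothetical, not a SHARC API. */
typedef struct {
    float pex;  /* value held by processing element X */
    float pey;  /* value held by processing element Y */
} simd_pair;

/* The same multiply-accumulate executed in both elements on separate data. */
simd_pair simd_mac(simd_pair acc, simd_pair a, simd_pair b)
{
    acc.pex += a.pex * b.pex;
    acc.pey += a.pey * b.pey;
    return acc;
}

/* Dot products of two independent channels stored interleaved
 * (index 2*i for PEX, 2*i+1 for PEY), so each step moves two values
 * per operand, mirroring the doubled memory bandwidth in SIMD mode. */
simd_pair simd_dot(const float *x, const float *h, size_t pairs)
{
    assert(x != NULL && h != NULL);
    simd_pair acc = {0.0f, 0.0f};
    for (size_t i = 0; i < pairs; i++) {
        simd_pair a = { x[2 * i], x[2 * i + 1] };
        simd_pair b = { h[2 * i], h[2 * i + 1] };
        acc = simd_mac(acc, a, b);
    }
    return acc;
}
```

Each loop iteration stands in for one instruction issued to both elements; on the processor itself, PEY takes part only when the PEYEN bit in MODE1 is set.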
¹ Assumes two files in multichannel SIMD mode (footnote to Table 1).
The ADSP-21364 continues SHARC’s industry-leading
standards of integration for DSPs, combining a high performance
32-bit DSP core with integrated, on-chip system features.
The block diagram of the ADSP-21364 on Page 1 illustrates the
following architectural features:
• Two processing elements, each of which comprises an
ALU, multiplier, shifter and data register file
• Data address generators (DAG1, DAG2)
• Program sequencer with instruction cache
• PM and DM buses capable of supporting four 32-bit data
transfers between memory and the core at every core
processor cycle
• Three programmable interval timers with PWM generation,
PWM capture/pulse width measurement, and
external event counter capabilities
• On-chip SRAM (3M bit)
[System diagram. ADSP-21364 with clock and configuration pins (CLKIN, XTAL, CLKOUT, CLK_CFG1-0, BOOTCFG1-0, RESET), FLAG3-1 and FLAG0, and JTAG. Parallel port (ALE, AD15-0, RD, WR) connected through a latch to RAM, ROM, boot ROM, or an I/O device (ADDR, DATA, OE, WE, CS). DAI: the SRU routes pins DAI_P1 to DAI_P20 among SPORT0-5 (SCLK0, SFS0, SD0A, SD0B), timers, S/PDIF, SRC, IDP, SPI, and PCGA/PCGB, connecting an optional ADC and an optional DAC (CLK, FS, SDAT).]
Figure 2. ADSP-21364 System Sample Configuration
Independent, Parallel Computation Units
Within each processing element is a set of computational units.
The computational units consist of an arithmetic/logic unit
(ALU), multiplier, and shifter. These units perform all
operations in a single cycle. The three units within each processing
element are arranged in parallel, maximizing computational
throughput. Single multifunction instructions execute parallel
ALU and multiplier operations. In SIMD mode, the parallel
ALU and multiplier operations occur in both processing
elements. These computation units support IEEE 32-bit
single-precision floating-point, 40-bit extended-precision
floating-point, and 32-bit fixed-point data formats.
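To make the relationship between the 32-bit and 40-bit floating-point formats concrete, the sketch below widens an IEEE single-precision value to a 40-bit word. The layout used here (1 sign bit, 8 exponent bits, and 31 fraction bits, that is, the single-precision fields followed by 8 extra low-order fraction bits) is an assumption not given in this section, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the 40-bit extended-precision word (not stated in this
 * section): IEEE single-precision sign, exponent, and fraction fields,
 * followed by 8 additional low-order fraction bits. */

/* Widen an IEEE 32-bit float to the assumed 40-bit layout.
 * The result occupies the low 40 bits of the returned value. */
uint64_t float_to_ext40(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the float's bit pattern */
    return (uint64_t)bits << 8;      /* extra fraction bits start at zero */
}

/* Narrow back to 32 bits by truncating the 8 extra fraction bits. */
float ext40_to_float(uint64_t w)
{
    assert((w >> 40) == 0);          /* only the low 40 bits may be set */
    uint32_t bits = (uint32_t)(w >> 8);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under this assumed layout, widening is lossless and narrowing discards only the extra fraction bits; the sketch truncates rather than rounds, which may differ from the hardware's behavior.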
Single-Cycle Fetch of Instruction and Four Operands
The ADSP-21364 features an enhanced Harvard architecture in
which the data memory (DM) bus transfers data and the
program memory (PM) bus transfers both instructions and data
(see Figure 1 on Page 1). With the ADSP-21364’s separate
program and data memory buses and on-chip instruction cache,
the processor can simultaneously fetch four operands (two over
each data bus) and one instruction (from the cache), all in a
single cycle.
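The operand bandwidth implied by four 32-bit fetches per cycle can be estimated as follows. The function name is hypothetical, and the calculation assumes every cycle moves the full four operands.

```c
#include <assert.h>

/* Bytes moved per second when `operands` values of `bits` width are
 * transferred on every core cycle (hypothetical helper). */
double core_bandwidth_bytes_per_s(double core_hz, int operands, int bits)
{
    assert(core_hz > 0.0 && operands > 0 && bits > 0 && bits % 8 == 0);
    return core_hz * operands * (bits / 8);
}
```

For 333 MHz, four operands, and 32 bits each, this gives 5.328 × 10⁹ bytes/s, in the same range as the 5.4 Gbytes/s sustained figure quoted in the key features; the exact clock value and rounding behind the quoted figure are not stated here.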
Instruction Cache
The ADSP-21364 includes an on-chip instruction cache that
enables three-bus operation for fetching an instruction and four
data values. The cache is selective—only the instructions whose
fetches conflict with PM bus data accesses are cached. This
cache allows full-speed execution of core, looped operations
such as digital filter multiply-accumulates, and FFT butterfly
processing.
Data Register File
A general-purpose data register file is contained in each
processing element. The register files transfer data between the
computation units and the data buses, and store intermediate
results. These 10-port, 32-register (16 primary, 16 secondary)
register files, combined with the ADSP-2136x enhanced
Harvard architecture, allow unconstrained data flow between
computation units and internal memory. The registers in PEX
are referred to as R0–R15 and in PEY as S0–S15.
Data Address Generators with Zero-Overhead Hardware
Circular Buffer Support
The ADSP-21364’s two data address generators (DAGs) are
used for indirect addressing and implementing circular data
buffers in hardware. Circular buffers allow efficient
programming of delay lines and other data structures required in digital