PDSP16488A MA
Single Chip 2D Convolver with Integral Line Delays
DS3742 - 5.0 November 2000
Supersedes January 1997 version, DS3742 - 3.1
The PDSP16488A is a fully integrated, application-specific image processing device. It performs a two dimensional convolution between the pixels within a video window and a set of stored coefficients. An internal multiplier accumulator array can be multi-cycled at double or quadruple the pixel clock rate, giving the window size options listed in Table 1.
An internal 32k bit RAM can be configured to provide either four or eight line delays. The length of each delay can be programmed to the user's requirement, up to a maximum of 1024 pixels per line. The line delays are arranged in two groups, which may be internally connected in series or configured to accept separate pixel inputs. This allows interlaced video or frame to frame operations to be supported.
The 8 bit coefficients are also stored internally and can
be downloaded from a host computer or from an EPROM. No
additional logic is required to support the EPROM and a single
device can support up to 16 convolvers.
The PDSP16488A contains an expansion adder and
delay network which allows several devices to be cascaded.
Convolvers with larger windows can then be fabricated as
shown in Table 2.
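As a rough illustration of how cascading scales, the following Python sketch estimates the device count by tiling the requested square window with the single device windows of Table 1 (for example 8 x 4 for 8 bit pixels at 20 MHz). The tiling rule and the name `devices_needed` are assumptions for illustration, not the manufacturer's sizing method. It matches the numeric entries of Table 2 except the 16 bit, 40 MHz row (Table 1 lists no single device 16 bit mode at 40 MHz), and it says nothing about which larger windows are practical, nor about line store expansion delays.

```python
import math

# Single device window (width, depth) for each (max pixel rate in MHz,
# pixel bits), copied from Table 1.
SINGLE_DEVICE_WINDOW = {
    (40, 8): (4, 4),
    (20, 8): (8, 4),
    (10, 8): (8, 8),
    (20, 16): (4, 4),
    (10, 16): (8, 4),
}


def devices_needed(window, rate_mhz, pixel_bits):
    """Estimate the devices needed for a window x window convolution.

    Hypothetical sizing rule: tile the square window with rectangular
    single device windows. Line store expansion delays are not modelled.
    """
    width, depth = SINGLE_DEVICE_WINDOW[(rate_mhz, pixel_bits)]
    return math.ceil(window / width) * math.ceil(window / depth)


if __name__ == "__main__":
    print(devices_needed(23, 10, 8))  # prints 9: 3 x 3 tiles of 8 x 8 devices
```

For example, a 15 x 15 window of 8 bit pixels at 20 MHz needs 2 tiles across and 4 down, giving the 8 devices listed in Table 2.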
Intermediate 32 bit precision is provided to avoid any
danger of overflow, but the final result will not normally occupy
all bits. The PDSP16488A thus provides a multiplier in the
output path, which allows the user to align the result to the
most significant end of the 32 bit word.
FEATURES
• The PDSP16488A is a fully compatible replacement for the PDSP16488
• 8 or 16 bit pixels with rates up to 40 MHz
• Window sizes up to 8 x 8 with a single device
• Eight internal line delays
• Supports interlace and frame to frame operations
• Coefficients supplied from an EPROM or remote host
• Expandable in both X and Y for larger windows
• Gain control and pixel output manipulation
• 132 pin QFP
Rev   Date
A     MAR 1993
B     JUL 1996
C     JAN 1997
D

NOTE
Polyimide is used as an inter-layer dielectric and as glassivation. Polymeric material is also used for die attach, which, according to the requirement in paragraph 1.2.1.b. (2), precludes categorising this device as fully compliant. In every other respect this device has been manufactured and screened in full accordance with the requirements of Mil-Std 883 (latest revision).
CHANGE NOTIFICATION
The change notification requirements of MIL-PRF-38535 will be implemented on this device type. If significant changes have been made since the last buy, known customers will be notified when ordering further parts.

Table 1 Single Device Configurations

Data Size   Window Size (Width x Depth)   Max Pixel Rate   Line Delays
8 bit       4 x 4                         40 MHz           4 x 1024
8 bit       8 x 4                         20 MHz           4 x 1024
8 bit       8 x 8                         10 MHz           8 x 512
16 bit      4 x 4                         20 MHz           4 x 512
16 bit      8 x 4                         10 MHz           4 x 512
Table 2 Devices needed to implement typical window sizes

                              Window size
Max Pixel Rate   Pixel Size   3x3   5x5   7x7   9x9   11x11   15x15   23x23
10 MHz           8 bit        1     1     1     4     4       4       9
10 MHz           16 bit       1     2     2     -     -       -       -
20 MHz           8 bit        1     2     2     6     6       8       -
20 MHz           16 bit       1     4     4     -     -       -       -
40 MHz           8 bit        1     4*    4*    -     -       -       -
40 MHz           16 bit       2     -     -     -     -       -       -

* Maximum rate is limited to 30 MHz by line store expansion delays.

[Fig. 1 Typical, Stand Alone, Real Time System. Labelled blocks: composite input, sync extract, A/D converter, pixel clock generator, power on reset, EPROM, optional field store and PDSP16488A convolver. Signal labels: DATA IN, AUX DATA, ADDR, DATA, CLK, SYNC, BYPASS, RES, DELAYED SYNC, OUTPUT DATA.]
NAME      TYPE            DESCRIPTION
IP7:0     INPUT           Pixel data input to the first line delay [most significant byte in 16 bit mode].
L7:0      I/O             Pixel data input to the second group of line delays [least significant byte in 16 bit mode]. Alternatively, an output from the last line delay when the appropriate mode bit is set.
BYPASS    INPUT           The first line delay in the first group is bypassed when this input is active (high). No internal pull up.
HRES      INPUT           Resets the line delay address pointers when high. Normally the composite sync signal in real time applications. In non real time systems it defines a frame store update period, when low.
X15:0     DUAL FUNCTION   Address/data connections from a MASTER or SINGLE device to the external coefficient source, with X15 defining EPROM or Host support. Otherwise they provide the expansion data input.
D15:0     OUTPUT          Signed 16 bit scaled data or multiplexed 32 bit intermediate data. During intermediate transfers the most significant half is valid when the clock is low, and the least significant half when the clock is high.
PC1       OUTPUT          During programming a MASTER device outputs a timing strobe on this pin. This is passed down the chain in a multiple device system, using the PC0 input on the next device.
PC0       INPUT           Used in conjunction with PC1 in multiple device systems. It terminates the write strobe from a MASTER device which is EPROM supported.
DELOP     OUTPUT          A version of the HRES input which has been delayed by an amount defined by the user.
DS        I/O             The data strobe from a host computer. Active low. This pin is an output from an EPROM supported MASTER device, which provides strobes to the remaining devices.
CE        INPUT           An active low enable which is internally gated with R/W and DS to perform reads or writes to the internal registers. In a SINGLE or MASTER device supported from an EPROM, the bottom 72 addresses are always used and CE is not needed; CE can then be used to initiate a new register load sequence after the power on load sequence.
R/W       INPUT           Read / not write line from the host CPU. When an EPROM is used this pin should be tied low.
PROG      I/O             Normally an input which signifies that registers are to be changed or examined. It is, however, an output from an EPROM supported SINGLE or MASTER device, indicating to the rest of the system that registers are being updated.
CLK       INPUT           Clock. All events are triggered on the rising edge of the clock, except the latching of the least significant expansion inputs. Internally the clock can be multiplied by two or four in order to increase the effective number of multipliers.
BIN       OUTPUT          Result of the internal comparison. A high value indicates that the pixel was greater than the internal threshold. The output is only valid from the last device in a chain.
OV        OUTPUT          When high, indicates that there has been a gain control overflow.
RES       INPUT           Active low power on reset signal.
SINGLE    INPUT           Tied to ground to indicate a SINGLE device system. Internal pull up resistor.
MASTER    INPUT           Tied to ground to indicate the MASTER device in a multiple device system. Must be left open circuit in a SINGLE device system. Internal pull up.
OEN       INPUT           Output enable signal. Active low.
CS3:0     OUTPUTS         Four address bits from a MASTER specifying one of sixteen devices in a multiple device system. Must be externally decoded to provide chip enables for the additional devices.
F1:0      OUTPUTS         Indicate the field selection given by the auto select logic, using the same coding as Control Register bits C5:4.
VCC/GND   SUPPLY          Four power and ground pairs. All must be connected.
BASIC OPERATION
MULTIPLIER ARRAY
The PDSP16488A convolver performs a weighted sum of all the pixels within an N x N two dimensional window. Each pixel value is multiplied by a signed coefficient, or weight, and the products are summed together. In practice, positive weights are used to produce averaging effects with various distribution laws, and negative weights are used for edge enhancement. The window is moved continuously over the video frame, and for real time operation a new result must be obtained for every pixel clock. In most applications odd sized windows are used, giving a centre pixel whose value is modified by the surrounding pixels.
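The weighted sum described above can be written out directly. The Python sketch below computes one output per window position for an N x N list of signed coefficients; the function names are illustrative. It applies the coefficients as a plain weighted sum without flipping the kernel, and it only produces results where the window fits inside the image, so edge handling is left out.

```python
def convolve_at(image, coeffs, row, col):
    """Weighted sum of the N x N window whose top-left pixel is (row, col)."""
    n = len(coeffs)
    total = 0
    for i in range(n):
        for j in range(n):
            total += coeffs[i][j] * image[row + i][col + j]
    return total


def convolve(image, coeffs):
    """Move the window over every position where it fits inside the image."""
    n = len(coeffs)
    rows, cols = len(image), len(image[0])
    return [[convolve_at(image, coeffs, r, c) for c in range(cols - n + 1)]
            for r in range(rows - n + 1)]


# Negative surrounding weights: a Laplacian style edge enhancement kernel.
EDGE = [[0, -1, 0],
        [-1, 4, -1],
        [0, -1, 0]]

if __name__ == "__main__":
    step = [[0, 0, 10, 10, 10] for _ in range(5)]  # vertical edge
    print(convolve(step, EDGE))  # prints [[-10, 10, 0], [-10, 10, 0], [-10, 10, 0]]
```

On a flat region the edge kernel gives zero, while on either side of the step it produces a negative and a positive response, which is the edge enhancement effect the text refers to.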
The PDSP16488A contains sixteen 8x8 multipliers, each producing a 16 bit result. Internally, the pixel clock supplied by the user can be multiplied by two or four; together with the proprietary architecture, this allows each multiplier to be used several times within a pixel clock period. This increases the effective number of multipliers available to the user from 16 to 32 or 64 respectively. This architecture makes very efficient use of chip area and allows the line delays to be accommodated on the same device.
The sixteen multipliers are arranged in a 4 deep by 4 wide array, resulting in effective arrays of 4 deep by 8 wide, or 8 by 8, with the multi-cycling options. The multiplier array can also be configured to handle 16 bit signed pixels; the effective number of available multipliers is then halved.
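The link between multi-cycling, pixel size and the figures in Table 1 can be checked with a little arithmetic. This Python sketch assumes, as Table 1 suggests, that the multiplier array tops out at an internal rate of 40 MHz, so the maximum pixel rate is 40 MHz divided by the clock multiple; that rate and the name `effective_config` are assumptions for illustration. It deliberately only accepts the combinations listed in Table 1.

```python
BASE_MULTIPLIERS = 16          # sixteen physical 8x8 multipliers
MAX_INTERNAL_RATE_MHZ = 40     # assumption: inferred from Table 1

# (clock multiple, pixel bits) combinations that appear in Table 1.
LISTED = {(1, 8), (2, 8), (4, 8), (2, 16), (4, 16)}


def effective_config(clock_multiple, pixel_bits):
    """Return (effective multipliers, max pixel rate in MHz)."""
    if (clock_multiple, pixel_bits) not in LISTED:
        raise ValueError("combination not listed in Table 1")
    multipliers = BASE_MULTIPLIERS * clock_multiple
    if pixel_bits == 16:
        multipliers //= 2      # 16 bit pixels halve the available multipliers
    return multipliers, MAX_INTERNAL_RATE_MHZ // clock_multiple


if __name__ == "__main__":
    for combo in sorted(LISTED):
        print(combo, effective_config(*combo))
```

For example, `effective_config(2, 8)` gives 32 multipliers at 20 MHz, which matches the 8 x 4 (32 tap) window in Table 1; in every listed case the effective multiplier count equals width x depth of the window.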
OUTPUT ACCURACY
With 8 bit pixels and an 8 x 8 window, the accumulated sum can grow to 22 bits within a single device. With 16 bit pixels and an 8 x 4 window (the maximum possible), the sum can grow to 29 bits. The PDSP16488A actually allows for word growth up to 32 bits, and thus allows several devices to be cascaded without any danger of overflow. Since coefficients can be negative, the final result is a 32 bit signed two's complement number.
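The 22 and 29 bit figures follow from a simple worst-case bound: each product needs the pixel width plus the 8 bit coefficient width, and summing T products can add up to ceil(log2 T) further bits. A Python sketch of that bound (the name `accumulator_bits` is illustrative):

```python
def accumulator_bits(pixel_bits, coeff_bits, taps):
    """Worst-case width of a sum of `taps` products.

    Each product needs pixel_bits + coeff_bits bits; adding `taps` of them
    can grow the result by ceil(log2(taps)) bits, which for taps >= 1 equals
    (taps - 1).bit_length().
    """
    if taps < 1:
        raise ValueError("taps must be at least 1")
    return pixel_bits + coeff_bits + (taps - 1).bit_length()


if __name__ == "__main__":
    print(accumulator_bits(8, 8, 8 * 8))    # prints 22: 8 x 8 window, 8 bit pixels
    print(accumulator_bits(16, 8, 8 * 4))   # prints 29: 8 x 4 window, 16 bit pixels
```

By the same bound, a 23 x 23 window of 8 bit pixels built from nine devices needs at most 26 bits, comfortably inside the 32 bit intermediate word.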
In a particular application the desired output will lie somewhere within these 32 bits, the actual position depending on the coefficient values used. This makes it difficult to choose which output pins to connect to the rest of the system. To overcome this problem the PDSP16488A contains an output multiplier, or gain control, which allows the final result to be aligned to the most significant end of the 32 bit internal result. The provision of a multiplier, rather than a simple shifter, allows the gain to be defined more accurately.
The sixteen most significant bits of the adjusted result, including the sign bit, are available on the output pins.
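To show why a multiplier aligns the result better than a shifter, the Python sketch below models the gain control under stated assumptions: the gain is a plain integer, the scaled value wraps as a 32 bit two's complement number, and the output pins carry bits 31:16. The gain register format and the exact overflow behaviour are not described in this section, so `apply_gain` and `wrap32` are hypothetical. A 3 x 3 window of all-ones coefficients on 8 bit pixels peaks at 9 x 255 = 2295.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def wrap32(value):
    """Reduce an integer to 32 bit two's complement (assumed hardware wrap)."""
    return ((value + (1 << 31)) % (1 << 32)) - (1 << 31)


def apply_gain(result, gain):
    """Scale a 32 bit result; return (top 16 output bits, overflow flag).

    Hypothetical model: integer gain, output pins = bits 31:16.
    """
    scaled = result * gain
    overflow = not (INT32_MIN <= scaled <= INT32_MAX)
    return wrap32(scaled) >> 16, overflow


if __name__ == "__main__":
    peak = 9 * 255                               # 3 x 3 box filter, 8 bit pixels
    print(apply_gain(peak, 1 << 19))             # prints (18360, False)
    print(apply_gain(peak, INT32_MAX // peak))   # prints (32767, False)
```

With a pure shift the best gain is 2^19 (2^20 already overflows), so the peak only reaches 18360 of a possible 32767; a multiplier gain of 935722 brings it to 32767.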
LINE DELAY OPERATION
Internal RAM is arranged in two separate groups and can be configured to provide line delays to match the chosen size of the convolver. When a four deep arrangement is used with 8 bit pixels, four line delays are available, and each can be programmed to contain up to 1024 pixels. In an eight deep array, or if 16 bit pixels are needed, each line can contain up to 512 pixels. Figure 4 illustrates the options available.
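The line lengths are consistent with dividing the 32k bit RAM evenly between the delays: 4 x 1024 x 8, 8 x 512 x 8 and 4 x 512 x 16 bits each come to 32768. The Python sketch below checks that arithmetic and models one line delay as a fixed-length FIFO, so each pixel emerges exactly one programmed line length later. The even split and the class name `LineDelay` are assumptions for illustration, not a description of the internal RAM organisation.

```python
from collections import deque

RAM_BITS = 32 * 1024          # internal line delay RAM


def max_line_length(num_delays, pixel_bits):
    """Longest line per delay if the RAM is split evenly (1024 pixel cap)."""
    return min(1024, RAM_BITS // (num_delays * pixel_bits))


class LineDelay:
    """Programmable-length line delay modelled as a FIFO."""

    def __init__(self, length):
        if length < 1:
            raise ValueError("length must be at least 1")
        self.buffer = deque([0] * length, maxlen=length)

    def push(self, pixel):
        """Insert a pixel; return the one pushed `length` calls earlier (0 at start)."""
        oldest = self.buffer[0]
        self.buffer.append(pixel)   # maxlen makes the deque drop the oldest entry
        return oldest


if __name__ == "__main__":
    print(max_line_length(4, 8), max_line_length(8, 8), max_line_length(4, 16))
    delay = LineDelay(4)
    print([delay.push(p) for p in range(1, 11)])  # prints [0, 0, 0, 0, 1, 2, 3, 4, 5, 6]
```

Chaining several `LineDelay` objects, each fed from the previous one's output, supplies the vertically adjacent pixels that a window of that depth needs.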
The first line delay in one of the groups can optionally be switched in or out under the control of the BYPASS input pin. It is used either to delay the pixel input when data is obtained from another convolver in a multiple device system, or to support interlaced video.
Signals L7:0 may be used as pixel inputs or outputs. They are configured as inputs at power-on to avoid possible bus conflicts, but can become outputs by setting a mode control bit. They can then be used to drive another device when multiple PDSP16488As are required.
OUTPUT SATURATION
If the output from the convolver is driving a display, negative pixels will give erroneous results. An option is thus provided which forces all negative results to zero, which the display then interprets as black. At the same time, positive results which overflow the gain control are forced to saturate at the most positive number, i.e. peak white. In this mode the output sign bit is always zero and should not be connected to the D/A converter.
A separate option forces both negative and positive overflows to saturate at their respective maximum values, while in-scale negative results remain valid. A gain control overflow warning flag (OV) is also available, which can be used in a host CPU supported system to change the gain parameters if overflows are not acceptable.
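The two saturation options can be sketched as clamps on the gain-scaled result, here assumed to be expressed in 16 bit output units before it is cut down to the output pins. Where exactly the device applies the clamp is not stated, so that assumption and the function names are illustrative only.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate_display(value):
    """Display option: negatives become 0 (black), overflows become peak white."""
    return max(0, min(value, INT16_MAX))


def saturate_signed(value):
    """Signed option: clamp both overflow directions; in-scale values pass through."""
    return max(INT16_MIN, min(value, INT16_MAX))


if __name__ == "__main__":
    for v in (-40000, -5, 1234, 40000):
        print(v, saturate_display(v), saturate_signed(v))
```

In the display option the result can never be negative, which is why the output sign bit is always zero in that mode.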
INTERLACED VIDEO
When using real time interlaced video, a picture or
frame is composed from two fields, with odd lines in one field
and even lines in the other. An external field delay is thus
required to gather information from adjacent lines, and the
convolver needs two input busses. The bus providing the
delayed pixels has an extra internal line delay. This is only
used in the field containing the upper line in any pair of lines,
and must be bypassed in the other field. It ensures that data
from the previous field always corresponds to the line above
the present active line, and avoids the need to change the
position of the coefficients from one field to the next.
Figure 3 shows the translation from physical to internal
line positions, for single device interlaced systems. Line N is
the line presently being convolved, which is either one or two
lines previous to the line presently being produced.
When windows requiring four or more lines are to be implemented, the first line delay in the group supplied from the L7:0 pins must always be bypassed. This bypass option is controlled by Register B, bit 7, and is not affected by the BYPASS input pin. The coefficients must be loaded into the locations shown, which match the translated line positions, with unused coefficients (shown shaded) loaded with zeros.
BINARY OUTPUT
The PDSP16488A contains a 16 bit arithmetic comparator which allows the output from the gain control to be compared with a previously programmed value. An output flag (BIN) allows the user to determine whether the result was above or below the value held in an internal register.
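Combined with the BIN pin description (a high output means the result was greater than the programmed threshold), the comparator acts as a per-pixel threshold. A minimal Python sketch, with an illustrative function name, applied to a row of 16 bit gain control outputs:

```python
def binary_output(results, threshold):
    """BIN flag per pixel: 1 if the result is greater than the threshold, else 0."""
    for r in (*results, threshold):
        if not -(1 << 15) <= r < (1 << 15):
            raise ValueError("values must fit the 16 bit comparator")
    return [1 if r > threshold else 0 for r in results]


if __name__ == "__main__":
    print(binary_output([10, 100, 50, -3], 50))  # prints [0, 1, 0, 0]
```

A result exactly equal to the threshold gives 0 here, since the text only specifies that a high output means "greater than".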