Solving the Pain Points of ML Image Processing with Efinix FPGA Development Boards
Article Summary
A new FPGA architecture approach brings finer control and greater flexibility to meet the needs of machine learning (ML) and artificial intelligence (AI). This two-part series discusses how to connect the development board to external devices and peripherals (such as cameras), and how to use FPGAs to eliminate bottlenecks in image processing.
From industrial control and security to robotics, aerospace, and automotive, FPGAs play a vital role in numerous applications. With the flexibility of their programmable logic cores and their extensive interface capabilities, FPGAs are increasingly used in image processing where machine learning (ML) can be deployed. Due to their parallel logic architecture, FPGAs are well-suited for implementing solutions with multiple high-speed camera interfaces. Furthermore, FPGAs can utilize dedicated processing pipelines within the logic, eliminating the shared resource bottlenecks associated with CPU- or GPU-based solutions.
This article introduces Efinix's Titanium FPGA and explores the reference image processing application that comes with the Ti180 M484 development board for this FPGA. The aim is to understand the components of the design and to identify which bottlenecks FPGA technology can eliminate and what other benefits it can bring to developers.
Reference design based on Ti180 M484
Conceptually, this reference design (Figure 1) receives images from several Mobile Industry Processor Interface (MIPI) cameras, performs frame buffering in LPDDR4x, and then outputs the images to an HDMI display. Camera input and HDMI output are provided using an FPGA mezzanine card (FMC) and four Samtec QSE interfaces on the development board.
Figure 1: Conceptually, the Ti180 M484 reference design receives images from several MIPI cameras, performs frame buffering in LPDDR4x, and then outputs the images to an HDMI display. (Image source: Efinix)
The FMC-to-QSE expansion card works in conjunction with the HDMI daughter card to provide the output video path, while three QSE connectors are used to connect DFRobot SEN0494 MIPI cameras. If multiple MIPI cameras are not available, a single camera can be used, with the other cameras simulated by looping back the single camera channel.
From a high-level perspective, this application may seem simple. However, receiving multiple high-definition (HD) MIPI streams at high frame rates is quite challenging. This is precisely where FPGA technology excels, as it allows designers to process multiple MIPI streams in parallel.
The architecture of this reference design leverages both the parallel and the sequential processing structures of the FPGA. The parallel architecture is used to implement the image processing pipeline, while a RISC-V processor implemented in the FPGA's lookup tables (LUTs) provides sequential processing.
In many FPGA-based image processing systems, the image processing pipeline can be divided into two parts: the input stream and the output stream. The input stream connects to the camera/sensor interface, and various processing functions are applied to the sensor output. These processing functions include Bayer conversion, automatic white balance, and other enhancements. In the output stream, the image is prepared for display. This includes changing the color space (e.g., from RGB to YUV) and post-processing to the desired output format, such as HDMI.
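As an illustration of the color space change mentioned above, the following is a minimal sketch of an RGB to YUV conversion for a single 8-bit pixel. It assumes the full-range BT.601 YCbCr coefficients; the article does not say which variant the reference design uses, and the function names are hypothetical. In an FPGA this arithmetic would typically be done in fixed point inside the pipeline rather than in software.

```python
def _clamp8(value):
    """Round to nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(value + 0.5)))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 YCbCr.

    Assumption: full-range BT.601 coefficients. The reference design
    may use a different matrix or a limited (video) range.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp8(y), _clamp8(cb), _clamp8(cr)


if __name__ == "__main__":
    for name, pixel in [("black", (0, 0, 0)), ("white", (255, 255, 255))]:
        print(name, rgb_to_ycbcr(*pixel))
```

Neutral colors map to chroma values of 128, which is a quick sanity check for any implementation of this matrix.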
Typically, the input image processing chain operates at the sensor's pixel clock frequency. This differs from the timing of the output chain, which processes data at the output display frequency.
A frame buffer is used to connect the input and output processing pipelines, and it is typically stored in external high-performance memory such as LPDDR4x. This frame buffer decouples the input and output pipelines, allowing each pipeline to access it via direct memory access at its own clock frequency.
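One common way to achieve this kind of decoupling is to rotate between several buffers so the writer never touches the frame the reader is currently displaying. The sketch below models only the index bookkeeping. It assumes three buffers and a hypothetical `FrameBufferRing` class; the article does not describe how many buffers the reference design uses or how it arbitrates between them.

```python
class FrameBufferRing:
    """Index bookkeeping for a ring of frame buffers (assumed: 3 or more).

    The input pipeline calls finish_write() at the end of each frame.
    The output pipeline calls start_read() at the start of each frame.
    """

    def __init__(self, count=3):
        if count < 3:
            raise ValueError("this scheme needs at least three buffers")
        self.count = count
        self.write_idx = 0    # buffer the input pipeline is filling
        self.latest = None    # most recently completed buffer
        self.read_idx = None  # buffer the output pipeline is reading

    def finish_write(self):
        """Mark the current buffer complete and pick the next one."""
        self.latest = self.write_idx
        nxt = (self.write_idx + 1) % self.count
        if nxt == self.read_idx:
            # Skip the buffer that is being displayed right now.
            nxt = (nxt + 1) % self.count
        self.write_idx = nxt

    def start_read(self):
        """Begin reading the newest complete frame (None if no frame yet)."""
        self.read_idx = self.latest
        return self.read_idx


if __name__ == "__main__":
    ring = FrameBufferRing()
    ring.finish_write()
    print("reading buffer", ring.start_read())
    ring.finish_write()
    ring.finish_write()
    print("writer now fills buffer", ring.write_idx)
```

Because the two sides only exchange buffer indices, the camera and the display can run at unrelated frame rates without tearing.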
The Ti180 reference design employs a similar approach to the concepts described above. The input image processing pipeline implements a MIPI Camera Serial Interface 2 (CSI-2) receiver intellectual property (IP) core, built upon Titanium FPGA input/output (I/O) supporting the MIPI physical layer (MIPI D-PHY).
The MIPI interface is quite complex because, in addition to supporting both low-speed and high-speed communication, it uses both single-ended and differential signaling on the same differential pair. Integrating the MIPI D-PHY into the FPGA I/O reduces board design complexity and simplifies the bill of materials (BOM).
Upon receiving the image stream from the camera, the reference design converts the MIPI CSI-2 RX output into an Advanced Extensible Interface (AXI) stream. The AXI stream is a unidirectional high-speed interface that provides data flow from the master device to the slave device. In addition to handshake signals (tvalid and tready) transmitted between the master and slave devices, sideband signals are also provided. These sideband signals can be used to transmit image timing information, such as the start of a frame and the end of a line.
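The handshake described above can be modeled in a few lines. This is a behavioral sketch only, not the Efinix IP: a beat transfers on a clock cycle only when tvalid and tready are both high. It assumes the common video convention of tuser marking the start of a frame and tlast marking the end of a line, which the article does not confirm for this design, and all function names are hypothetical.

```python
from itertools import cycle


def frame_to_beats(frame):
    """Turn a 2D list of pixels into AXI-Stream beats with sidebands."""
    beats = []
    for y, line in enumerate(frame):
        for x, pixel in enumerate(line):
            beats.append({
                "tdata": pixel,
                "tuser": y == 0 and x == 0,   # assumed: start of frame
                "tlast": x == len(line) - 1,  # assumed: end of line
            })
    return beats


def transfer(beats, tready_pattern):
    """Simulate the link; tready_pattern must contain at least one True."""
    received, cycles, index = [], 0, 0
    ready = cycle(tready_pattern)
    while index < len(beats):
        tvalid = True          # the master always has data in this sketch
        tready = next(ready)   # the slave may apply back-pressure
        if tvalid and tready:  # a beat moves only when both are high
            received.append(beats[index])
            index += 1
        cycles += 1
    return received, cycles


def beats_to_frame(beats):
    """Rebuild the frame on the slave side using only the sidebands."""
    frame, line = [], []
    for beat in beats:
        if beat["tuser"]:
            frame, line = [], []
        line.append(beat["tdata"])
        if beat["tlast"]:
            frame.append(line)
            line = []
    return frame


if __name__ == "__main__":
    source = [[1, 2, 3], [4, 5, 6]]
    received, cycles = transfer(frame_to_beats(source), [True, False])
    print(beats_to_frame(received), "in", cycles, "cycles")
```

The receiver needs no separate sync signals: the sidebands travelling with the data are enough to recover the frame geometry, even when back-pressure stalls the stream.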
AXI streaming is ideal for image processing applications, enabling Efinix to offer a range of image processing IP cores that can be easily integrated into the processing chain as needed.
Upon reception, the MIPI CSI-2 image data and timing signals are converted into an AXI stream and input to the direct memory access (DMA) module, which writes the image frames to the LPDDR4x that serves as the frame buffer.
This DMA module operates under the control of a RISC-V core in the Sapphire system-on-chip (SoC) implemented within the FPGA. The SoC provides control functions such as stopping and starting DMA writes, and also provides the information the DMA write channel needs to correctly write image data to LPDDR4x. This includes information about the memory location and the image width and height (in bytes).
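To make the role of these parameters concrete, the sketch below shows how a base address, a line width in bytes, and a line count determine where each image line lands in memory. The `FrameDescriptor` class, its field names, and the example address and pixel size are hypothetical; they are not the register layout of the Efinix DMA IP.

```python
from dataclasses import dataclass


@dataclass
class FrameDescriptor:
    """Hypothetical description of one frame buffer in external memory."""

    base_addr: int         # start address of the frame buffer
    width_bytes: int       # bytes of pixel data per image line
    height_lines: int      # number of image lines
    stride_bytes: int = 0  # bytes between line starts (0 = tightly packed)

    def __post_init__(self):
        if self.stride_bytes == 0:
            self.stride_bytes = self.width_bytes
        if self.stride_bytes < self.width_bytes:
            raise ValueError("stride must be at least the line width")

    def line_addr(self, line):
        """Memory address of the first byte of the given line."""
        if not 0 <= line < self.height_lines:
            raise IndexError("line outside the frame")
        return self.base_addr + line * self.stride_bytes

    def size_bytes(self):
        """Total memory the frame occupies."""
        return self.stride_bytes * self.height_lines


if __name__ == "__main__":
    # Assumed example: 1920 x 1080 frame at 2 bytes per pixel.
    desc = FrameDescriptor(base_addr=0x1000_0000,
                           width_bytes=1920 * 2,
                           height_lines=1080)
    print(hex(desc.line_addr(1)), desc.size_bytes())
```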
In this reference design, the output channel reads image information from the LPDDR4x frame buffer under the control of the RISC-V SoC. The data is output as an AXI stream from the DMA IP, then converted from the RAW format provided by the sensor to the RGB format (Figure 2), and is ready for output via the onboard Analog Devices ADV7511 HDMI transmitter.
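The RAW to RGB step can be illustrated with the simplest possible demosaic: each 2 x 2 Bayer cell becomes one RGB pixel, with the two green samples averaged. This sketch assumes an RGGB pattern and halves the resolution. It is not the algorithm used by the reference design, which the article does not describe, and a production pipeline would normally interpolate to full resolution.

```python
def demosaic_rggb_2x2(raw):
    """Convert an RGGB Bayer image (2D list) to half-resolution RGB.

    Assumed cell layout:   R G
                           G B
    """
    height, width = len(raw), len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even")
    rgb = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            red = raw[y][x]
            green = (raw[y][x + 1] + raw[y + 1][x]) // 2  # mean of two greens
            blue = raw[y + 1][x + 1]
            row.append((red, green, blue))
        rgb.append(row)
    return rgb


if __name__ == "__main__":
    raw = [
        [10, 20, 50, 60],
        [30, 40, 70, 80],
    ]
    print(demosaic_rggb_2x2(raw))
```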
Figure 2: Sample image of the reference design output. (Image source: Adam Taylor)
With the help of DMA, the Sapphire SoC RISC-V processor can also access images stored in the frame buffer, as well as image statistics and summary information. The Sapphire SoC can also write overlays to LPDDR4x for merging with the output video stream.
Modern CMOS image sensors (CIS) have several operating modes that can be configured to provide on-chip processing, as well as several different output formats and clock schemes. This configuration is typically provided via an I²C interface. In this Efinix reference design, I²C communication with the MIPI camera is provided by the Sapphire SoC RISC-V processor.
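A sensor configuration write over I²C usually boils down to a short byte sequence. The sketch below builds one, assuming a 7-bit device address, a 16-bit register address sent high byte first, and an 8-bit value, which is a layout many image sensors use. The device address, register address, and value shown are placeholders, not values taken from the SEN0494 or its sensor datasheet.

```python
def build_i2c_write(device_addr, register, value):
    """Return the bytes sent on the bus for one register write.

    Assumptions: 7-bit device address, 16-bit register address
    (high byte first), 8-bit register value.
    """
    if not 0 <= device_addr <= 0x7F:
        raise ValueError("device address must fit in 7 bits")
    if not 0 <= register <= 0xFFFF:
        raise ValueError("register address must fit in 16 bits")
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    address_byte = (device_addr << 1) | 0  # R/W bit = 0 selects a write
    return bytes([address_byte, register >> 8, register & 0xFF, value])


if __name__ == "__main__":
    # Placeholder values for illustration only.
    frame = build_i2c_write(device_addr=0x1A, register=0x0100, value=0x01)
    print(frame.hex(" "))
```

On the SoC, a driver would hand these bytes to the I²C controller, which adds the start condition, the acknowledge bits, and the stop condition.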
Integrating a RISC-V processor into a Titanium FPGA reduces the overall size of the final solution because it eliminates the need to deploy complex FPGA state machines that increase design risk, as well as external processors that add to the BOM. Integrating this processor also allows for communication with additional IP cores and microSD cards. This enables real-world applications that may require storing images for later analysis.
In summary, the Ti180 reference design features an optimized architecture that enables a compact, low-cost, yet high-performance solution, allowing developers to reduce BOM costs through system integration.
One of the key advantages of the reference design is its ability to initiate application development on custom hardware, enabling developers to leverage key elements of the design and build upon them for necessary customization. This includes the ability to implement vision-based TinyML applications running on FPGAs using Efinix's TinyML flow. This utilizes both the parallel nature of FPGA logic and the ease of adding custom instructions to the RISC-V processor, allowing for the creation of accelerators within the FPGA logic.
Implementation
As described in Part 1, the unique feature of the Efinix architecture is that it uses Switchable Logic and Routing (XLR) units to provide both routing and logic functions. Video systems like the reference design described above are hybrid systems with complex logic and routing: a significant amount of logic is required to implement image processing functions, and extensive routing is also needed to connect IP units at the required frequencies.
This reference design utilizes approximately 42% of the device's XLR cells, leaving ample room to add content, including custom applications such as edge ML. The use of block RAM and digital signal processing (DSP) blocks is also very efficient, using only 4 out of 640 DSP blocks and 40% of the memory blocks (Figure 3).
Figure 3: Resource allocation on the Efinix architecture shows that only 42% of the XLR units are used, leaving ample space for other processes. (Image source: Adam Taylor)
On the device I/O, the LPDDR4x DDR interface is used to provide application memory for the Sapphire SoC and to provide image frame buffers. All of the device's dedicated MIPI resources are used, along with 50% of the phase-locked loops (PLLs) (Figure 4).
General-purpose I/O (GPIO) provides I²C communication and several interfaces connected to the Sapphire SoC, including NOR flash, USB UART, and SD card.
HSIO is used to provide high-speed video output to the ADV7511 HDMI transmitter.
A key factor in designing with FPGAs is not only implementing the design and fitting it within the FPGA, but also placing the logic within the device and achieving the required timing performance during routing.
The era of single-clock-domain FPGA design is over. In the Ti180 reference design, several different clocks operate at high frequencies. The final timing table shows the maximum frequency achieved by each clock in the system. The timing performance required by the clock constraints can also be seen (Figure 5), where the highest clock frequency is 148.5 MHz for the HDMI output clock.
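The 148.5 MHz figure matches the standard pixel clock for a 1920 x 1080 display at 60 frames per second, whose total raster including blanking is 2200 x 1125 pixels. The article does not state the output resolution, so treating the HDMI output as 1080p60 is an assumption here, and the function name is hypothetical. The relationship itself is simple arithmetic:

```python
def pixel_clock_hz(h_total, v_total, frames_per_second):
    """Pixel clock = total pixels per frame (including blanking) x frame rate."""
    return h_total * v_total * frames_per_second


if __name__ == "__main__":
    # Assumed mode: 1080p60, with a 2200 x 1125 total raster.
    clock = pixel_clock_hz(h_total=2200, v_total=1125, frames_per_second=60)
    print(clock / 1e6, "MHz")
```

The same calculation shows how quickly the output clock constraint grows with resolution or frame rate, which is why it tends to be one of the tightest constraints in a video design.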
Figure 5: Clock constraints of the reference design. (Image source: Adam Taylor)
The timing results achieved against the clock constraints demonstrate the potential of the Titanium FPGA XLR architecture, which reduces potential routing delays and thereby improves design performance (Figure 6).
Figure 6: The timing implementation for clock constraints demonstrates the potential of the Titanium FPGA XLR architecture to reduce potential routing latency and thus improve design performance. (Image source: Adam Taylor)
Conclusion
The Ti180 M484 reference design clearly demonstrates the capabilities of Efinix FPGAs, especially the Ti180. This design utilizes several unique I/O structures to implement complex image processing paths, supporting multiple incoming MIPI streams. This image processing system operates under the control of a soft-core Sapphire SoC, implementing the sequential processing elements necessary for this application.