Unlocking the Pain Points of ML Image Processing with Efinix FPGA Development Boards

Last updated: 2026-03-25

Article Summary

A new FPGA architecture approach brings finer control and greater flexibility to meet the needs of machine learning (ML) and artificial intelligence (AI). This two-part series discusses how to connect the development board to external devices and peripherals (such as cameras) and how to use FPGAs to eliminate bottlenecks in image processing.

From industrial control and security to robotics, aerospace, and automotive, FPGAs play a vital role in numerous applications. With the flexibility of their programmable logic cores and their extensive interface capabilities, FPGAs are increasingly used in image processing applications where ML can be deployed. Their parallel logic architecture makes FPGAs well suited to solutions with multiple high-speed camera interfaces. Furthermore, FPGAs can implement dedicated processing pipelines in logic, eliminating the shared-resource bottlenecks associated with CPU- or GPU-based solutions.

This article introduces Efinix's Titanium FPGA and explores the reference image processing application that comes with the Ti180 M484 development board for this FPGA. The aim is to understand the components of the design and to clarify which bottlenecks FPGA technology can eliminate and what other benefits it can bring to developers.

Reference design based on Ti180 M484

Conceptually, this reference design (Figure 1) receives images from several Mobile Industry Processor Interface (MIPI) cameras, performs frame buffering in LPDDR4x, and then outputs the images to an HDMI display. Camera input and HDMI output are provided using an FPGA mezzanine card (FMC) and four Samtec QSE interfaces on the development board.

Figure 1: Conceptually, the Ti180 M484 reference design receives images from several MIPI cameras, performs frame buffering in LPDDR4x, and then outputs the images to an HDMI display. (Image source: Efinix)

The FMC-to-QSE expansion card works in conjunction with the HDMI daughter card to provide the output video path, while three of the QSE connectors are used to connect DFRobot SEN0494 MIPI cameras. If multiple MIPI cameras are not available, a single camera can be used, with the other cameras simulated by looping back the single camera channel.

From a high-level perspective, this application may seem simple. However, receiving multiple high-definition (HD) MIPI streams at high frame rates is quite challenging. This is precisely where FPGA technology excels, as it allows designers to process multiple MIPI streams in parallel.

The architecture of this reference design leverages both the parallel and sequential processing capabilities of the FPGA. The parallel structure is used to implement the image processing pipeline, while a RISC-V processor implemented in the FPGA's lookup tables (LUTs) provides sequential processing.

In many FPGA-based image processing systems, the image processing pipeline can be divided into two parts: the input stream and the output stream. The input stream connects to the camera/sensor interface, and various processing functions are applied to the sensor output. These processing functions include Bayer conversion, automatic white balance, and other enhancements. In the output stream, the image is prepared for display. This includes changing the color space (e.g., from RGB to YUV) and post-processing to the desired output format, such as HDMI.
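As a rough illustration of the color space change mentioned above, the sketch below converts a single 8-bit RGB pixel to YCbCr. It assumes the widely used full-range BT.601 coefficients; the article does not say which matrix the reference design uses, and the function name `rgb_to_ycbcr` is hypothetical. In an FPGA the same arithmetic would typically run as a fixed-point pipeline, one pixel per clock.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 YCbCr.

    Assumption: full-range BT.601 coefficients (as used in JPEG/JFIF).
    The reference design's actual conversion matrix is not stated.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(value):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(value))))

    return clamp(y), clamp(cb), clamp(cr)


white = rgb_to_ycbcr(255, 255, 255)  # full luma, neutral chroma
black = rgb_to_ycbcr(0, 0, 0)        # zero luma, neutral chroma
```

Neutral colors (black, gray, white) map to chroma values of 128, which is a quick sanity check for any implementation of this conversion.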

Typically, the input image processing chain operates at the sensor's pixel clock frequency. This differs from the timing of the output chain, which processes data at the output display frequency.

A frame buffer is used to connect the input and output processing pipelines, and it is typically stored in external high-performance memory such as LPDDR4x. This frame buffer decouples the input and output pipelines, allowing each side to access it via direct memory access (DMA) at its own clock frequency.
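The decoupling role of the frame buffer can be sketched with a toy model. The class below is an assumption made for illustration (the name `FrameBufferRing` and the three-slot ring are hypothetical, not taken from the reference design): the input side writes frames at its own pace, and the output side always reads the newest complete frame, repeating it if no new frame has arrived.

```python
class FrameBufferRing:
    """Toy model of a frame buffer shared by a writer and a reader."""

    def __init__(self, num_buffers=3):
        self.buffers = [None] * num_buffers
        self.write_index = 0
        self.latest = None  # slot holding the newest complete frame

    def write_frame(self, frame):
        """Input side: store a frame, then advance to the next slot."""
        self.buffers[self.write_index] = frame
        self.latest = self.write_index
        self.write_index = (self.write_index + 1) % len(self.buffers)

    def read_frame(self):
        """Output side: return the newest complete frame, or None if empty."""
        if self.latest is None:
            return None
        return self.buffers[self.latest]


ring = FrameBufferRing()
ring.write_frame("frame 0")
ring.write_frame("frame 1")   # the input side runs ahead of the display
first = ring.read_frame()
second = ring.read_frame()    # the display refreshes again before a new frame
```

Because neither side waits for the other, the sensor and the display can run at unrelated frame rates; the cost is that frames may be dropped or repeated.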

The Ti180 reference design employs a similar approach to the concepts described above. The input image processing pipeline implements a MIPI Camera Serial Interface 2 (CSI-2) receiver intellectual property (IP) core, built upon Titanium FPGA input/output (I/O) that supports the MIPI physical layer (MIPI D-PHY). The MIPI interface is quite complex because, in addition to supporting both low-speed and high-speed communication, it uses single-ended and differential signaling on the same differential pair. Integrating the MIPI D-PHY into the FPGA I/O reduces board design complexity and simplifies the bill of materials (BOM).

Upon receiving the image stream from the camera, the reference design converts the MIPI CSI-2 RX output into an Advanced Extensible Interface (AXI) stream. The AXI stream is a unidirectional high-speed interface that carries data from the master device to the slave device. In addition to the handshake signals (tvalid and tready) exchanged between the master and slave devices, sideband signals are also provided. These sideband signals can be used to transmit image timing information, such as the start of a frame and the end of a line.
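The handshake described above can be modeled in a few lines. In this sketch a beat is transferred only in a cycle where both tvalid and tready are high. The mapping of tuser to start of frame and tlast to end of line follows a common video-over-AXI-Stream convention and is an assumption here, as are the helper names `frame_to_beats` and `axi_stream_transfer`.

```python
def frame_to_beats(frame):
    """Turn a 2-D frame (list of rows) into (tdata, tuser, tlast) beats.

    Assumed convention: tuser marks the first pixel of the frame and
    tlast marks the last pixel of each line.
    """
    beats = []
    for row_index, row in enumerate(frame):
        for col_index, pixel in enumerate(row):
            tuser = 1 if (row_index == 0 and col_index == 0) else 0
            tlast = 1 if col_index == len(row) - 1 else 0
            beats.append((pixel, tuser, tlast))
    return beats


def axi_stream_transfer(beats, ready_pattern):
    """Simulate one clock cycle per entry of ready_pattern.

    The master holds tvalid high while it still has beats to send.
    A beat moves only when tvalid and tready are both high.
    """
    received = []
    next_beat = 0
    for tready in ready_pattern:
        tvalid = next_beat < len(beats)
        if tvalid and tready:
            received.append(beats[next_beat])
            next_beat += 1
    return received


beats = frame_to_beats([[1, 2], [3, 4]])
# The slave stalls in cycles 2 and 5, so four beats take six cycles.
received = axi_stream_transfer(beats, [True, False, True, True, False, True])
```

When the slave deasserts tready, the master simply holds the current beat, which is how back-pressure propagates upstream through the pipeline without data loss.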

AXI streaming is ideal for image processing applications, enabling Efinix to offer a range of image processing IPs that can then be easily integrated into the processing chain as needed.

Upon reception, the MIPI CSI-2 image data and timing signals are converted into an AXI stream and input to the direct memory access (DMA) module. The DMA module writes the image frames to LPDDR4x, which acts as the frame buffer.

This DMA module operates under the control of a RISC-V core in the Sapphire system-on-chip (SoC) within the FPGA. The SoC provides control functions such as stopping and starting DMA writes, and also provides the information the DMA write channel needs to correctly write image data to LPDDR4x. This includes the memory location and the image width and height (in bytes).
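To make the role of these parameters concrete, the sketch below shows how a base address and frame dimensions determine where each image line lands in memory. Everything specific here is an assumption for illustration: the class name `DmaWriteConfig`, the base address, the 1920 x 1080 frame size, two bytes per pixel, and the treatment of height as a line count. The article does not describe the actual register layout of the Efinix DMA IP.

```python
from dataclasses import dataclass


@dataclass
class DmaWriteConfig:
    """Hypothetical description of one DMA write channel."""

    base_address: int   # start of the frame buffer in LPDDR4x
    width_bytes: int    # bytes per image line
    height_lines: int   # number of lines per frame (assumed unit)

    def line_address(self, line):
        """Byte address of the first pixel of the given line."""
        if not 0 <= line < self.height_lines:
            raise ValueError("line outside the frame")
        return self.base_address + line * self.width_bytes

    def frame_size_bytes(self):
        """Total bytes one frame occupies, assuming no line padding."""
        return self.width_bytes * self.height_lines


# Assumed example: 1920 x 1080 frame, 2 bytes per pixel, arbitrary base.
config = DmaWriteConfig(
    base_address=0x0100_0000,
    width_bytes=1920 * 2,
    height_lines=1080,
)
frame_bytes = config.frame_size_bytes()
second_line = config.line_address(1)
```

With these assumed values, one frame occupies about 4.1 MB, so several buffered frames per camera fit comfortably in external memory.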

In this reference design, the output channel reads image information from the LPDDR4x frame buffer under the control of the RISC-V SoC. The data is output as an AXI stream from the DMA IP, then converted from the RAW format provided by the sensor to the RGB format (Figure 2), ready for output via the onboard Analog Devices ADV7511 HDMI transmitter.
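The RAW-to-RGB step can be illustrated with the simplest possible demosaic: each 2 x 2 Bayer cell is collapsed into one RGB pixel. This sketch assumes an RGGB pattern and produces a half-resolution image; the sensor's actual pattern and the interpolation method used in the reference design are not stated in the article, and the function name `demosaic_rggb_2x2` is hypothetical.

```python
def demosaic_rggb_2x2(raw):
    """Collapse each 2x2 RGGB cell of a RAW frame into one RGB pixel.

    Assumed cell layout:   R  G
                           G  B
    The two green samples are averaged. Output is half resolution.
    """
    height = len(raw)
    width = len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("frame dimensions must be even")

    rgb = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            red = raw[y][x]
            green = (raw[y][x + 1] + raw[y + 1][x]) // 2
            blue = raw[y + 1][x + 1]
            row.append((red, green, blue))
        rgb.append(row)
    return rgb


# One Bayer cell: R=10, G=20 and 30, B=40.
pixels = demosaic_rggb_2x2([[10, 20], [30, 40]])
```

Production pipelines usually interpolate the missing color samples at every pixel position to keep full resolution, but the data dependency is the same: each output pixel needs samples from neighboring lines, which is why line buffers appear in FPGA implementations.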

Figure 2: Sample image of the reference design output. (Image source: Adam Taylor)

With the help of the DMA, the RISC-V core in the Sapphire SoC can also access images stored in the frame buffer, as well as image statistics and summary information. The Sapphire SoC can also write overlays to LPDDR4x for merging with the output video stream.

Modern CMOS image sensors (CIS) have several operating modes that can be configured to provide on-chip processing, as well as several different output formats and clock schemes. This configuration is typically provided via an I²C interface. In this Efinix reference design, I²C communication with the MIPI cameras is provided by the Sapphire SoC's RISC-V processor.

Integrating a RISC-V processor into a Titanium FPGA reduces the overall size of the final solution because it eliminates the need for complex FPGA state machines that increase design risk, as well as external processors that add to the BOM.

Integrating this processor also makes it possible to support additional IP for communicating with a microSD card. This enables real-world applications that may require storing images for later analysis.

In summary, the Ti180 reference design features an optimized architecture that enables a compact, low-cost, yet high-performance solution, allowing developers to reduce BOM costs through system integration.

One of the key advantages of the reference design is that it can be used to kick-start application development on custom hardware, enabling developers to take key elements of the design and build upon them with the necessary customization. This includes the ability to implement vision-based TinyML applications running on FPGAs using Efinix's TinyML flow. This flow utilizes both the parallel nature of FPGA logic and the ease of adding custom instructions to the RISC-V processor, allowing for the creation of accelerators within the FPGA logic.

Implementation

As described in Part 1, the unique feature of the Efinix architecture is that it uses eXchangeable Logic and Routing (XLR) cells to provide both routing and logic functions. Video systems like the reference design described above are mixed systems that are heavy in both logic and routing: a significant amount of logic is required to implement the image processing functions, and extensive routing is needed to connect the IP blocks at the required frequencies.

This reference design utilizes approximately 42% of the device's XLR cells, leaving ample room to add content, including custom applications such as edge ML.

The use of block RAM and digital signal processing (DSP) blocks is also very efficient, using only 4 out of 640 DSP blocks and 40% of the memory blocks (Figure 3).

Figure 3: Resource allocation on the Efinix architecture shows that only 42% of the XLR cells are used, leaving ample space for other processes. (Image source: Adam Taylor)

On the device I/O, the LPDDR4x DDR interface is used to provide application memory for the Sapphire SoC and to hold the image frame buffers. All of the device's dedicated MIPI resources are used, along with 50% of the phase-locked loops (PLLs) (Figure 4).

General-purpose I/O (GPIO) provides I²C communication and several interfaces connected to the Sapphire SoC, including NOR flash, USB UART, and the SD card. High-speed I/O (HSIO) is used to provide high-speed video output to the ADV7511 HDMI transmitter.

A key factor in designing with FPGAs is not only implementing the design and fitting it within the FPGA, but also placing the logic within the device and achieving the required timing performance once it is routed.

The era of single-clock-domain FPGA design is over. In the Ti180 reference design, several different clocks operate at high frequencies. The final timing table shows the maximum frequency achieved by each clock in the system. The timing performance requested in the clock constraints can also be seen (Figure 5); the highest constrained frequency is 148.5 MHz, for the HDMI output clock.

Figure 5: Clock constraints of the reference design. (Image source: Adam Taylor)
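The 148.5 MHz figure for the HDMI output clock is consistent with standard 1080p video at 60 Hz. The article does not state the output resolution, so the sketch below is an assumption: it uses the standard 1080p60 timing of 2200 total pixels per line and 1125 total lines per frame (active video plus blanking), and the function name `pixel_clock_hz` is hypothetical.

```python
def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixel clock = total pixels per line x total lines x frames per second."""
    return h_total * v_total * refresh_hz


# Assumed 1080p60 timing: 1920 active + 280 blanking pixels per line,
# 1080 active + 45 blanking lines per frame.
H_TOTAL = 1920 + 280
V_TOTAL = 1080 + 45

clock_hz = pixel_clock_hz(H_TOTAL, V_TOTAL, 60)
clock_mhz = clock_hz / 1e6
```

Under this assumption the output pipeline must deliver one pixel every clock cycle at 148.5 MHz, which is the constraint the place-and-route tools have to meet.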

The timing results achieved against the clock constraints demonstrate the potential of the Titanium FPGA XLR architecture, as it reduces potential routing delays, thereby improving design performance (Figure 6).

Figure 6: The timing results achieved against the clock constraints demonstrate the potential of the Titanium FPGA XLR architecture to reduce potential routing delays and thus improve design performance. (Image source: Adam Taylor)

Conclusion

The Ti180 M484 reference design clearly demonstrates the capabilities of Efinix FPGAs, especially the Ti180. This design utilizes several unique I/O structures to implement complex image processing paths, supporting multiple incoming MIPI streams. This image processing system operates under the control of the soft-core Sapphire SoC, which implements the sequential processing elements necessary for this application.
