Artificial intelligence (AI) based on deep learning (DL) is finding ever wider use, from consumer applications such as smart homes and smart driving to public-management domains such as video surveillance and smart cities.
Every complete AI application involves two processes: training and inference. "Training" feeds large amounts of data into a neural network model and performs repeated calculations to "teach" the algorithm how to function correctly, ultimately producing a trained DL model. The trained model can then serve user requests online, making accurate and timely decisions on new input data; this process is called "inference."
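As a minimal, generic illustration of these two phases (a TensorFlow/Keras sketch, unrelated to any specific product discussed in this article), the snippet below trains a tiny model once and then reuses it to make a decision on new data:

```python
# Minimal train-once / infer-many sketch using TensorFlow Keras.
import numpy as np
import tensorflow as tf

# --- Training phase: feed labeled data through the model repeatedly.
x_train = np.random.rand(1000, 16).astype(np.float32)
y_train = (x_train.sum(axis=1) > 8.0).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_train, y_train, epochs=5, verbose=0)  # the "teaching" step

# --- Inference phase: respond to new, unseen input with the trained model.
x_new = np.random.rand(1, 16).astype(np.float32)
print("Prediction for new input:", float(model.predict(x_new)[0, 0]))
```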
Typically, training is a one-time effort, sometimes outsourced to a third-party team with ample computing resources. Application developers then deploy the trained model on a specific hardware platform to meet the inference requirements of the target application scenario. Because inference is what end users directly experience, its accuracy and speed determine the quality of the user experience. Effectively accelerating AI inference has therefore become a hot topic among developers.
FPGA wins in AI inference acceleration!
From a hardware architecture perspective, there are four options for accelerating AI inference: CPU, GPU, FPGA, and ASIC. Comparing these device types shows that, moving from CPU toward ASIC, flexibility and adaptability decrease while raw processing power and performance-per-watt increase.
The CPU is based on the von Neumann architecture. Although highly flexible, it must fetch instructions and data from memory, so even a simple task can take many clock cycles, resulting in long latency. For computationally intensive workloads such as neural networks (NN), its power consumption is also relatively high, making it the least suitable option for AI inference.
GPUs have powerful data-parallel processing capabilities and a clear advantage in training on massive datasets. Inference, however, usually processes only one input at a time, so the GPU's parallelism goes largely unused; combined with its relatively high power consumption, this makes it less than ideal for AI inference.
From the standpoint of high performance and low power consumption, a custom ASIC looks like the ideal solution, but its development cycle is long and its cost is high. For DL and NN algorithms that evolve and iterate rapidly, its flexibility is severely limited and the risk is too great, so ASICs are usually ruled out for AI inference.
That leaves the FPGA. Over the years, developers have become increasingly aware of the speed, flexibility, and efficiency of FPGAs: their hardware programmability enables targeted optimization for DL and NN processing, providing ample computing power while retaining sufficient flexibility. Today's heterogeneous FPGA-based computing platforms integrate, alongside the programmable logic, multiple Arm processor cores, DSP blocks, on-chip memory, and other resources. The processing required for DL maps well onto these resources, all of which can operate in parallel, performing millions of simultaneous operations every clock cycle, which makes FPGAs ideal for AI inference.
Compared with CPUs and GPUs, FPGAs offer further advantages in AI inference applications:
- No restriction to standard data types: FPGAs can process non-standard, low-precision data, improving processing throughput (see the sketch after this list).
- Lower power consumption: for the same NN computation, an FPGA consumes on average 5 to 10 times less power than a CPU or GPU.
- Reprogrammability: an FPGA can be reconfigured to suit different tasks, a flexibility that is critical for keeping pace with the rapid evolution of DL and NN algorithms.
- Broad applicability: FPGAs can handle AI inference tasks from the cloud to the edge.
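To make the first point concrete, here is a generic NumPy sketch of int8 quantization (an illustration of low-precision arithmetic in general; it is not Zebra-specific and does not represent any particular FPGA toolflow). Narrower integer datapaths are exactly what FPGA fabric can exploit for higher throughput than fixed 32-bit floating point:

```python
# Generic int8 quantization sketch (illustrative only): low-precision
# arithmetic shrinks data width, which FPGA logic can exploit for
# higher throughput than fixed 32-bit float datapaths.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float32 values to int8 with a per-tensor scale factor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

weights = np.random.randn(64, 64).astype(np.float32)
inputs = np.random.randn(64).astype(np.float32)

qw, w_scale = quantize_int8(weights)
qi, i_scale = quantize_int8(inputs)

# Integer matrix-vector product, accumulated in int32, then rescaled.
acc = qw.astype(np.int32) @ qi.astype(np.int32)
approx = acc.astype(np.float32) * (w_scale * i_scale)

exact = weights @ inputs
print("max abs error vs float32:", float(np.abs(approx - exact).max()))
```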
In a word, in the competition for AI inference computing, the FPGA's victory is beyond doubt.
Seamless GPU hand-off, plug-and-play FPGA
Yet however attractive the FPGA looks, many AI application developers still keep their distance, and the biggest reason is that FPGAs are simply too hard to use.
The difficulty lies mainly in two areas:
- First, programming FPGAs requires specific skills and knowledge: one must know specialized hardware description languages and be proficient with FPGA tools that compile a design through complex steps such as synthesis, placement, and routing. For many embedded engineers, this is a completely unfamiliar "language."
- Second, because many DL models are trained on GPU-based architectures, porting a trained model to an FPGA is likely to raise issues such as retraining and parameter tuning, which demand specialized AI knowledge and skills.
How, then, can the barrier to using FPGAs for AI inference be lowered? Mipsology has a surprising answer: it has developed an FPGA-based deep learning inference engine called Zebra. Zebra lets developers run GPU-trained models on FPGAs with zero effort, with no code rewriting and no retraining.
This also means that adjusting NN parameters, or even swapping in a different neural network, does not trigger an FPGA recompilation, which could otherwise take hours, days, or longer. Zebra makes the FPGA "transparent" to developers: once an NN model is trained, inference can move seamlessly from CPU or GPU to FPGA with no extra time spent.
Currently, Zebra supports mainstream NN frameworks such as Caffe, Caffe2, MXNet, and TensorFlow. On the hardware side, Zebra fully supports Xilinx's accelerator cards, including the Alveo U200, Alveo U250, and Alveo U50. For developers, "once the FPGA board is plugged into the PC, with a single Linux command" the FPGA can instantly and seamlessly replace the CPU or GPU for inference, increasing computing speed by an order of magnitude while drawing less power: a truly plug-and-play experience.
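To illustrate what "zero code change" means in practice, the sketch below is an ordinary TensorFlow inference script. According to the article, such application code would stay exactly as written when inference moves from CPU/GPU to an FPGA; the single Linux command that activates Zebra is not named in the source, so it is not shown here, and the model file name is a hypothetical placeholder:

```python
# Plain TensorFlow inference, a minimal sketch: nothing here is
# Zebra-specific. Per the article, the same unchanged script keeps
# working after Zebra redirects execution to an Alveo FPGA card.
import numpy as np
import tensorflow as tf

# Load a previously trained model (hypothetical file name).
model = tf.keras.models.load_model("resnet50_trained.h5")

# A new input arrives (e.g., one 224x224 RGB image, batch size 1,
# the batch-of-one case where a GPU's parallelism goes unused).
image = np.random.rand(1, 224, 224, 3).astype(np.float32)

# Run inference; with Zebra active, this call would execute on the
# FPGA instead of the CPU/GPU, with no change to the code above.
prediction = model.predict(image)
print("Predicted class:", int(np.argmax(prediction, axis=-1)[0]))
```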
Figure 1. Zebra can adapt NNs trained on GPU accelerators and seamlessly deploy them on FPGAs.
A powerful partnership with full ecosystem support
Better still, to accelerate the deployment of more AI applications, Avnet Asia and Mipsology have signed a cooperation agreement under which Avnet will promote and sell Mipsology's unique FPGA deep learning inference acceleration software, Zebra, to its Asia-Pacific customers.
This is a win-win for both parties. For Mipsology, it brings the innovative Zebra tool to more developers, faster; for Avnet, it further expands an already powerful IoT ecosystem, delivering greater value to customers and providing those who want to deploy DL with a complete set of services spanning hardware, software, system integration, application development, the design chain, and professional technical support.
An Avnet inference-acceleration success story: AI Bluebox, an intelligent network monitoring platform
Want to learn more about the magic of Zebra? See how to install the right CNN inference accelerator quickly and effectively using Zebra software, Avnet servers, and Xilinx Alveo accelerator cards, and experience first-hand how a Zebra-based solution can seamlessly replace GPU boards for AI inference. Register now for the upcoming webinar, "[Event Preview] Avnet and Mipsology Accelerate AI Solution Deployment," on Thursday, September 3rd, from 2:00 PM to 3:30 PM. Technology experts will be there to answer all your questions!