ESL synthesis solutions improve DSP design efficiency and promote the development of ASIC and FPGA devices

The use of digital signal processing (DSP) in electronic products is increasing dramatically. Field-programmable gate arrays (FPGAs) can now hold millions of gates and include DSP-oriented resources, which gives them a significant performance advantage over standard DSP chips. FPGAs are also economical in small and medium production volumes, and they support powerful prototyping and verification techniques, including real-time simulation of DSP algorithms. However, creating portable algorithm IP for both FPGAs and ASICs presents many challenges.

This article will describe how ESL synthesis techniques can dramatically reduce the time required to implement an algorithm on an FPGA or ASIC and simplify the work involved.

Challenges of RTL porting between FPGA and ASIC

Although RTL is portable at the logic level, it is not portable at the architectural level. The same RTL synthesized for different target devices may be functionally correct on each of them, yet far from optimal.

The choice of algorithm architecture comes down to one fundamental question: how much pipelining, parallelism, and serialization is needed to meet the algorithm's sample rate and throughput requirements? In addition, basic DSP functions such as FIR filters, FFTs, sine, cosine, and division call for different optimized implementations depending on the target technology. The direct and transposed forms of an FIR filter are a good example: one form may suit a specific FPGA device, while the other is better suited to ASIC technology.
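The two FIR forms mentioned above compute the same output and differ only in where the registers and adders sit. The following is a minimal behavioral sketch in Python, not RTL and not taken from any tool; the function names `fir_direct` and `fir_transposed` are illustrative assumptions, and integer arithmetic stands in for fixed-point hardware.

```python
def fir_direct(coeffs, samples):
    """Direct form: a delay line of past inputs feeds one wide sum of products."""
    delay = [0] * len(coeffs)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]  # shift the new sample into the delay line
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


def fir_transposed(coeffs, samples):
    """Transposed form: the input is broadcast to every multiplier and the
    partial sums are registered between the adders."""
    n = len(coeffs)
    state = [0] * (n - 1)  # registers holding partial sums
    out = []
    for x in samples:
        products = [c * x for c in coeffs]
        out.append(products[0] + (state[0] if state else 0))
        # Each register takes its product plus the old value of the next register.
        state = [
            products[i + 1] + (state[i + 1] if i + 1 < n - 1 else 0)
            for i in range(n - 1)
        ]
    return out


coeffs = [1, 2, 3, 4]
samples = [5, -1, 0, 7, 2, 0, 0, 0]
print(fir_direct(coeffs, samples) == fir_transposed(coeffs, samples))
```

Because both forms are arithmetically identical, the choice between them is purely an implementation decision about adder chains and register placement, which is why it can differ by target.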

FPGAs and ASICs often call for different architectures. FPGA devices are register-rich, and many ASIC-to-FPGA porting guides recommend adding pipeline stages, registering all ports, and breaking combinational logic into smaller pieces. Taking a design that was structured this way to meet FPGA timing and implementing it on an ASIC would increase the area unnecessarily.

When targeting ASICs, we often need to do the exact opposite: minimize registers to reduce area and power consumption. We can raise the clock speed and use time-division multiplexing and resource sharing to minimize the number of multipliers and other area-hungry operators. Recent designs in the consumer and wireless markets are the result of carefully balancing these factors.

One unavoidable difference between ASIC RTL and FPGA RTL is the use of memory. In FPGAs, standard memory blocks are built into the device. Depending on the FPGA tool flow and vendor, a specific coding style is needed to describe storage arrays and memories; high-quality FPGA synthesis tools then map that RTL onto the built-in memory automatically. In the ASIC world, however, IP and foundry library vendors offer a wide variety of memory options, and users must select and configure memories for their specific needs and instantiate them in the RTL design.

Numerous articles and sources describe coding styles and migration techniques for transferring IP between FPGAs and ASICs. Migration of implementations between different device types requires extensive coding and verification work, as well as considerable expertise.

Prototyping on an FPGA first and then porting to an ASIC design raises further challenges. This flow is common when real-time stimulus and at-speed verification are required. In that case the simulation models must stay bit- and sample-accurate with one another; in particular, the FPGA implementation and the ASIC model must not diverge. This takes a lot of work, especially if the implementations differ or change frequently. In addition, the test harness has to be modified, compared, and debugged by hand.
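The bit- and sample-accuracy requirement can be made concrete with a small comparison harness. The sketch below is an illustrative assumption, not part of any vendor flow: `first_mismatch` is a hypothetical helper that compares two integer output streams (for example, one dumped from an FPGA simulation and one from an ASIC model) and reports the first sample where they differ.

```python
def first_mismatch(reference, candidate):
    """Return (index, reference_value, candidate_value) for the first sample
    that differs, or None when the two streams match bit for bit.

    A missing sample is reported with None in place of the absent value.
    """
    for i, (r, c) in enumerate(zip(reference, candidate)):
        if r != c:
            return (i, r, c)
    if len(reference) != len(candidate):
        # One stream ended early: that is a sample-accuracy failure.
        i = min(len(reference), len(candidate))
        r = reference[i] if i < len(reference) else None
        c = candidate[i] if i < len(candidate) else None
        return (i, r, c)
    return None


# Hypothetical output streams from two implementations of the same algorithm.
model_out = [3, 7, -2, 10]
ported_out = [3, 7, -1, 10]
print(first_mismatch(model_out, ported_out))  # (2, -2, -1)
```

Comparing raw integer codes rather than scaled floating-point values keeps the check exact, so a single least-significant-bit difference is caught.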

ESL Synthesis Solutions

A robust ESL synthesis solution can help address many of these issues through the following features:

  • An Electronic System Level (ESL) model that supports high-level architecture and hardware abstraction;
  • Automatic optimization based on a user-defined sample rate;
  • User-selectable target technology;
  • Native support for multi-rate designs.


Figure 1 Quickly implement the design scheme from the unified ESL model

Using these features, the DSP synthesis engine can optimize the entire system with knowledge of the target and synthesize different RTL according to user-defined constraints. This RTL, with an architecture and coding style optimized for the target, can then enter a standard logic synthesis flow.

With ESL synthesis technology, design work is done at a high level of abstraction, which improves portability, shortens development time, and raises engineering productivity. In addition to maintaining IP at the RTL level, we can maintain IP at the algorithm model level, which improves both portability and the productivity of algorithm developers.

As shown in Figure 1, DSP synthesis technology lets users quickly generate a variety of implementations from a single algorithm model. An FPGA implementation can use a fully parallel, pipelined architecture or, like an ASIC, a smaller serial architecture. The different implementations automatically remain bit- and sample-accurate, and standard RTL simulation tools provide a complete verification path. In contrast, parameterized schematic entry and hand-written RTL require the user to commit to a specific architecture before the area and delay characteristics are known, so they often need extensive modification when ported to a new implementation target.

Table 1 Effects of automatic folding optimized synthesis for Virtex-4 FPGAs on filter throughput and hardware sharing


Algorithm Implementation Using DSP Synthesis Technology

Tools that support DSP synthesis and automatically optimize the architecture, such as Synplicity's Synplify DSP tool, make it easier to implement a design on both FPGAs and ASICs. The user does not have to choose the target device or make architectural optimization decisions until the DSP synthesis step. The DSP synthesis engine then synthesizes an optimized RTL implementation from the algorithm model.

We pay particular attention to the retiming and folding options. Retiming lets the tool modify the architecture, using pipelining and other techniques, to reach the desired performance, at the cost of added output latency. Folding lets the design share hardware at the expense of throughput (i.e., it trades resource utilization against the maximum sample rate).
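The folding trade-off can be illustrated with a small behavioral model. This is only a sketch of the general idea of time-multiplexing multipliers, not Synplify DSP's algorithm; the function `fir_folded` and its `fold` parameter are hypothetical names, and retiming is not modeled. With a folding factor of `fold`, each multiplier is reused `fold` times per input sample, so the filter needs roughly `taps / fold` multipliers but `fold` clock cycles per sample.

```python
import math


def fir_folded(coeffs, samples, fold):
    """Behavioral FIR model with time-multiplexed multipliers.

    Returns (outputs, multipliers, clock_cycles). Assumes integer data and
    one multiply per multiplier per clock cycle.
    """
    n = len(coeffs)
    multipliers = math.ceil(n / fold)
    delay = [0] * n
    outputs, cycles = [], 0
    for x in samples:
        delay = [x] + delay[:-1]
        acc = 0
        for step in range(fold):          # one clock cycle per folding step
            for m in range(multipliers):  # these multipliers run in parallel
                tap = step * multipliers + m
                if tap < n:
                    acc += coeffs[tap] * delay[tap]
            cycles += 1
        outputs.append(acc)
    return outputs, multipliers, cycles


coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
impulse = [1] + [0] * 8
parallel = fir_folded(coeffs, impulse, fold=1)
folded = fir_folded(coeffs, impulse, fold=4)
print(parallel[0] == folded[0])  # same results, sample for sample
print(parallel[1:], folded[1:])  # (multipliers, cycles) for each variant
```

In this model the folded variant uses a quarter of the multipliers and four times the clock cycles, while producing identical outputs.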

Table 2 Serialization and hardware sharing halves the implementation footprint of a 65-tap FIR filter

  • The unit of area is 2.8 square micrometers, roughly the size of a two-input NAND gate.
  • Multipliers are implemented with logic (gates).
  • The extracted memory is dual ported.

Architecture implementation

The advantage of an automatic DSP synthesis engine is that it can quickly produce implementations for multiple architectures and target technologies. This design space exploration helps to optimize the solution significantly, especially when a DSP algorithm must be implemented on several FPGA and ASIC technologies.

The following example uses retiming and folding to show how these two options trade speed against area. First, we generate four 10 MHz, 64-tap FIR filters for a Virtex-4 FPGA: one serves as the baseline, and the other three use different folding factors to reduce the area to varying degrees. Running logic synthesis on the RTL generated by Synplify DSP produces the results shown in Table 1.

Table 2 gives similar data for an ASIC implementation of the same design. It shows the difference in area between the two extreme implementations, fully parallel and fully serial, in a 90 nm technology.

Table 2 shows clearly that when the sample rate is low enough to allow hardware sharing, DSP synthesis can reduce the area automatically. ESL synthesis also makes it easier to exploit higher clock frequencies across different technologies. At the same time, because all implementations come from a single algorithm model, there is no need to change or re-validate the model.
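The relationship between clock frequency, sample rate, and hardware sharing can be estimated with simple arithmetic. The sketch below is an idealized calculation under stated assumptions, not a reproduction of the figures in Table 1 or Table 2: it assumes each multiplier performs one multiply per clock cycle, ignores control and memory overhead, and uses hypothetical clock frequencies. The function name `max_folding` is illustrative.

```python
import math


def max_folding(taps, sample_rate_hz, clock_hz):
    """Largest folding factor a given clock allows, and the multipliers
    still needed, assuming one multiply per multiplier per clock cycle."""
    if clock_hz < sample_rate_hz:
        raise ValueError("clock is slower than the sample rate")
    fold = min(taps, clock_hz // sample_rate_hz)
    multipliers = math.ceil(taps / fold)
    return fold, multipliers


# 64-tap filter at a 10 MHz sample rate; the clock values are hypothetical.
for clock_mhz in (10, 40, 160, 640):
    fold, mults = max_folding(64, 10_000_000, clock_mhz * 1_000_000)
    print(f"{clock_mhz:>4} MHz clock: folding factor {fold:>2}, {mults:>2} multipliers")
```

Under these assumptions, every doubling of the available clock rate relative to the sample rate roughly halves the number of multipliers, until a single multiplier handles all taps.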

Conclusion

The simple FIR example above demonstrates that DSP synthesis lets us make architectural trade-offs quickly and efficiently, based on accurate estimates of relative performance and area. Users can explore multiple architectures, including important implementation details such as fixed-point design considerations, while efficiently obtaining useful price/performance data. This makes it possible to reach optimized FPGA and ASIC implementations from a high-level algorithm while minimizing design time.

The EDA industry appears to be moving toward realizing the long-promised benefits of ESL design, with an integrated design flow that serves both hardware prototypes and shipping systems.
