Wringing the Power Consumption Out of That FPGA

by Jamie Murphy, Product Development Manager, Atmel

Computationally intensive DSP functions often require hardware acceleration. Increasingly, designers are implementing their DSP algorithms in FPGAs because they offer better performance than DSP processors. Benchmarks show that FPGAs execute turbo coding, GPS correlation, H.264 and other DSP functions much more quickly than DSPs.

Although FPGAs offer excellent design flexibility and performance, they are notoriously power hungry, especially when compared to microcontrollers. For instance, static power consumption for a highly integrated ARM7-based microcontroller can be as low as 60 uA, whereas a 500K gate Xilinx Spartan 3-E FPGA consumes 13.46 mW in standby, 44 times more than an ARM7. Active power is also higher. In active mode, an FPGA draws 3 to 4 W, while an entire ARM7-based SoC consumes around 90 mW running at 80 MHz with all its peripherals turned on, about 97% less power than an FPGA. Clearly, it would be valuable if the functionality implemented in power-hungry FPGAs could somehow be integrated into the microcontroller.
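The active-power comparison above is simple arithmetic, and a short Python sketch makes it easy to check. The helper name `percent_reduction` is hypothetical and not part of any vendor tool; the inputs are the figures quoted in this article (3 to 4 W for the FPGA, about 90 mW for the ARM7-based SoC).

```python
def percent_reduction(baseline_w: float, new_w: float) -> float:
    """Return how much lower new_w is than baseline_w, as a percentage."""
    if baseline_w <= 0:
        raise ValueError("baseline power must be positive")
    return (baseline_w - new_w) / baseline_w * 100.0


# Figures quoted in the article (active mode).
ARM7_SOC_ACTIVE_W = 0.090          # about 90 mW at 80 MHz, peripherals on
FPGA_ACTIVE_RANGE_W = (3.0, 4.0)   # 3 to 4 W

for fpga_w in FPGA_ACTIVE_RANGE_W:
    saving = percent_reduction(fpga_w, ARM7_SOC_ACTIVE_W)
    print(f"FPGA at {fpga_w:.1f} W: the SoC draws {saving:.1f}% less power")
```

Across the quoted 3 to 4 W range the reduction works out to roughly 97% to 98%, consistent with the "about 97%" figure.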

Customizable MCUs Present Flexibility, Power Savings

A new breed of microcontroller may provide just such a solution. Customizable microcontrollers consist of a standard-product ARM7- or ARM9-based MCU with a variety of peripherals and a block of metal-programmable (MP) logic with the equivalent of about 60,000 FPGA logic elements, or 400,000 to 500,000 ASIC gates.

The MP block is implemented in a metal-programmable cell fabric (MPCF) developed from a library of over 400 custom-designed cells. Its 8-transistor core cell is only 3.2 um high and 2.0 um wide, with two layers of metal for interconnect. ICs fabricated with the MPCF technology achieve 80% to 90% placement utilization, allowing gate densities on silicon of between 170K and 210K gates/mm2. In a 130 nm process, MPCF’s utilization and routing density are essentially the same as those of standard-cell ASICs and much better than those of FPGAs. For example, a D flip-flop (DFF) implemented in an MPCF cell using a 130 nm process consumes slightly less silicon area than one implemented in a 120 nm standard cell.
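To get a feel for what those densities mean, the sketch below estimates the silicon area of the MP block's logic from the gate counts and gate densities quoted in this article. It is a rough illustration only: the helper `logic_area_mm2` is a hypothetical name, and the calculation assumes the quoted densities already account for placement utilization and ignores everything outside the MP block (I/O, memories, the MCU itself).

```python
def logic_area_mm2(gate_count: int, gates_per_mm2: int) -> float:
    """Estimate logic area as gate count divided by gate density."""
    if gates_per_mm2 <= 0:
        raise ValueError("gate density must be positive")
    return gate_count / gates_per_mm2


# Figures quoted in the article.
GATE_COUNTS = (400_000, 500_000)   # ASIC-gate capacity of the MP block
DENSITIES = (170_000, 210_000)     # gates per mm2 achieved with MPCF

# Best case: fewest gates at the highest density; worst case: the reverse.
smallest = logic_area_mm2(min(GATE_COUNTS), max(DENSITIES))
largest = logic_area_mm2(max(GATE_COUNTS), min(DENSITIES))

print(f"Estimated MP block logic area: {smallest:.2f} to {largest:.2f} mm2")
```

Under these assumptions the logic occupies on the order of 2 to 3 mm2.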

MPCF’s higher gate density and utilization yield a much smaller silicon area, lower cost and lower power consumption than an FPGA. Integrating the FPGA logic on the customizable MCU also produces a smaller-footprint single-chip solution and eliminates on- and off-chip delays, saving even more power. Any functionality that can be implemented in an FPGA can be ported to the mask-programmable MP block, including DSP algorithms, small memories, unique peripheral sets (e.g. 25 PWMs) or even multiple, deeply embedded MCU cores.

Implement Functionality Without Bandwidth Obstacles

Implementing functionality in logic that is external to the microcontroller can be tricky. The microcontroller must be connected to the FPGA either through serial I/O or an external bus interface, and the connection must be managed by the CPU. This situation can create a huge bandwidth obstacle, so that moving data between the FPGA and microcontroller can actually consume most of the CPU’s cycles.

Customizable MCUs resolve this issue with a multi-layer Advanced High-performance Bus (AHB) matrix. ARM7-based devices have six masters and six slaves; ARM9 devices have 12 masters and 12 slaves. The bus matrix completely eliminates bus contention. In addition, data transfers between all masters, slaves, memories, and on-chip peripherals are managed by a peripheral DMA controller (PDC) independently of the CPU. Off-loading data transfers frees up the CPU so it can do more processing or spend substantially more time in a powered-down sleep mode.

The six masters on an ARM7-based device include the CPU data and instruction interfaces, the peripheral DMA controller, Ethernet and the USB host. The slaves include the memories, the USB device, and the peripheral bus bridge. Any master can take control of any available bus when needed.

The MCU Vendor Does Most of the Work

Most of the work involved in migrating an existing ARM-plus-FPGA design to a customizable MCU is done by the silicon vendor. For a nominal one-time NRE fee of $150,000, the MCU vendor ports the verified FPGA RTL to the MP block in the customizable microcontroller, resolves any timing issues and provides prototypes.

The only task for the designer is the emulation of both the hardware and software using a vendor-supplied emulation board that includes the same peripherals, memories, bus structure, standard interfaces, network and configurable connections as the customizable MCU. Although the RTL for the DSP or other functionality in the FPGA should not vary much between hardware architectures, the fact is that ARM MCUs from different vendors can vary greatly. The CPU cores and instruction set architectures (ISAs) are the same, but the devices can have different peripheral sets, memory maps and bus architectures, meaning software is very rarely portable between ARM MCUs from different vendors. For this reason, the entire design (both software and RTL) needs to be emulated.

Experience indicates that this emulation step almost always highlights errors in the hardware and/or software, or the hardware/software interface of the device. The ability to correct and re-test the complete design of the device at this stage is a major factor in reducing the design time and cost.  An additional benefit is that the emulated version of the final design can be used as the starting point for future design iterations, at a substantial savings in design effort.

The silicon vendor does everything else, providing prototypes within 12 weeks of receiving the verified RTL design. The vendor does a post-layout simulation, ensuring that no timing constraints have been violated.

Significantly Improved Power Consumption

The improvement in power consumption that can be realized by migrating an FPGA design to a customizable MCU is substantial. Stanford University researcher Koji Gardiner compared the power characteristics of a variety of peripherals implemented in the MP block of a customizable ARM7-based MCU with those of a Xilinx Spartan 3-E plus ARM7 MCU implementation.

The WebPack synthesis tool used to design the Spartan helps to keep FPGA power consumption low by reusing the same logic between multiple peripheral instantiations. The mask-programmable customizable MCU does not have this luxury. Each peripheral must be instantiated as a distinct block with no shared logic. As a result, the power advantage of the customizable MCU shrinks somewhat as a larger number of peripherals is added. For example, a single timer/counter consumes 81% less power in a customizable MCU than in an FPGA, while four timer/counters consume 50% less power in a customizable MCU.

Overall, however, the aggregate dynamic power consumption of functions implemented in the MP block of a customizable MCU is 70% lower than in the Spartan 3-E FPGA-plus-ARM7 design. Static power consumption is 97% lower.

A second, equally significant reduction in power drain accrues from the fact that a two-chip design is shrunk to a single-chip implementation. The single-chip customizable microcontroller allows a much smaller footprint design and offers power consumption that is basically on par with the 80 mA to 90 mA active current drawn by a stand-alone microcontroller. When the single-chip advantage is considered, the customizable MCU cuts power consumption by 99%.

Additional Benefits

There are some additional benefits offered by customizable microcontrollers: lower unit costs and faster performance.  Unit prices for customizable MCUs are about 20% to 50% lower than the cost of an MCU-plus-DSP combination. The lower unit costs fully offset the small NRE charge at around 25,000 units.
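The break-even claim can be sanity-checked with a few lines of Python. The sketch below takes the $150,000 one-time NRE fee quoted earlier in this article and the roughly 25,000-unit break-even volume, and derives the per-unit saving those two figures imply. The function names are hypothetical, and the model assumes the NRE is the only up-front cost and that the per-unit saving is constant across volume.

```python
import math


def implied_saving_per_unit(nre_usd: float, break_even_volume: int) -> float:
    """Per-unit saving needed to recover the NRE at the given volume."""
    if break_even_volume <= 0:
        raise ValueError("volume must be positive")
    return nre_usd / break_even_volume


def break_even_units(nre_usd: float, saving_per_unit_usd: float) -> int:
    """Smallest whole number of units at which savings cover the NRE."""
    if saving_per_unit_usd <= 0:
        raise ValueError("per-unit saving must be positive")
    return math.ceil(nre_usd / saving_per_unit_usd)


# Figures quoted in the article.
NRE_USD = 150_000
BREAK_EVEN_VOLUME = 25_000

saving = implied_saving_per_unit(NRE_USD, BREAK_EVEN_VOLUME)
print(f"Implied saving per unit: ${saving:.2f}")
print(f"Break-even check: {break_even_units(NRE_USD, saving)} units")
```

In other words, the quoted break-even point corresponds to a saving of about $6 per unit; a larger per-unit saving would recover the NRE at a proportionally lower volume.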

On the performance side, the logic in the metal-programmable block can toggle at 400 MHz, eight times the 50 MHz maximum rate of an FPGA. Designs implemented in customizable MCUs benefit from cost and performance improvements, as well as large reductions in power consumption.

Jamie Murphy has over 19 years of experience in the design of SoCs and wireless chipsets supporting ZigBee, TDMA, CDMA, GSM, avionics and satellite phone technologies. He received his BSEE from North Carolina State University in 1989.