Intel Showcases Intelligent Edge and Energy-Efficient Performance Research at 2020 VLSI Symposium

At this week’s 2020 VLSI Technology and Circuits Symposium, Intel will present a series of research findings and technical perspectives on the computing transformation caused by the growing amount of data distributed across the core, edge and endpoint. CTO Mike Mayberry will deliver a keynote speech titled “The Future of Computing: How Data Transformation is Reshaping VLSI,” highlighting the importance of transitioning from hardware/program-centric computing to data/information-centric computing.

“The massive flow of data across distributed edge, network, and cloud infrastructure requires energy-efficient and robust processing close to where the data is generated, often under tight bandwidth, memory, and power constraints. Intel’s research at the VLSI Symposium highlights several new approaches to improving computational efficiency that show promise across a variety of application areas, including robotics, augmented reality, machine vision, and video analytics. Our focus is on addressing the barriers to data movement and computation that represent the biggest data challenges of the future.”

– Vivek K. De, Intel Fellow and Director of Circuit Technology Research, Intel Labs

What will be shown: At the symposium, Intel will present research papers on how higher levels of intelligence and greater energy efficiency can be achieved in future edge-network-cloud systems to support a growing number of edge applications. Topics covered in the papers (the full list appears at the end of this release) include:

Using ray casting hardware accelerators to improve the efficiency and accuracy of 3D scene reconstruction for edge robots

Paper: Efficient 3D Scene Reconstruction via a Ray-Casting Accelerator in 10nm CMOS for Edge Robotics and Augmented Reality Applications

Why it matters: Applications such as edge robotics and augmented reality require accurate, fast, and energy-efficient reconstruction of complex 3D scenes from the large volume of ray-casting operations performed for dense simultaneous localization and mapping (SLAM). In this paper, Intel highlights a ray-casting hardware accelerator that maintains scene reconstruction accuracy while delivering exceptionally energy-efficient performance. Its techniques, including voxel overlap search and hardware-assisted approximate voxel computation, reduce local memory requirements and improve power efficiency for future edge robotics and augmented reality applications.
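
The release doesn't detail the accelerator's internals, but the core operation it offloads, marching rays through a voxel volume until they cross a surface, is easy to sketch in software. Below is a minimal, illustrative Python sketch of ray casting against a truncated signed distance field (TSDF), a volume representation commonly used in dense SLAM; the grid, step size, and function names are assumptions for illustration, not Intel's design.

```python
import numpy as np

def cast_ray(tsdf, origin, direction, step=0.5, max_steps=400):
    """March a ray through a truncated signed distance field (TSDF)
    voxel grid and return the first zero crossing (the surface).

    tsdf      -- 3D array of signed distances, in voxel units
    origin    -- ray start point in voxel coordinates
    direction -- unit-length ray direction
    """
    pos = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    prev_val = None
    for _ in range(max_steps):
        idx = tuple(np.floor(pos).astype(int))
        # Stop if the ray leaves the volume.
        if any(i < 0 or i >= n for i, n in zip(idx, tsdf.shape)):
            return None
        val = tsdf[idx]
        # A sign change between consecutive samples marks the surface;
        # interpolate between them for a sub-voxel hit point.
        if prev_val is not None and prev_val > 0 >= val:
            t = prev_val / (prev_val - val)
            return pos - d * step * (1.0 - t)
        prev_val = val
        pos = pos + d * step
    return None

# Toy volume: signed distance to a sphere of radius 10 centered at (32, 32, 32).
grid = np.fromfunction(
    lambda x, y, z: np.sqrt((x - 32) ** 2 + (y - 32) ** 2 + (z - 32) ** 2) - 10,
    (64, 64, 64),
)
print(cast_ray(grid, origin=(32, 32, 0), direction=(0, 0, 1)))  # hits near z = 22
```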

Using an event-driven visual data processing unit (EPU) to reduce power consumption in deep learning-based video stream analytics

Paper: A 0.05pJ/pixel 70fps FHD 1Meps Event-Driven Visual Data Processing Unit

Why it matters: Real-time deep learning-based visual analytics, used mainly in fields such as safety and security, demands fast object detection across multiple video streams, which drives up compute time and memory bandwidth. Input frames from these cameras are often downsampled to reduce the load, which hurts detection accuracy. In this paper, Intel demonstrates an event-driven visual data processing unit (EPU) that, combined with novel algorithms, instructs deep learning accelerators to process visual input using only motion-based “regions of interest.” This approach alleviates the heavy compute and memory demands of edge vision analytics.
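
Intel's EPU and its algorithms aren't public in this release, but gating a detector with motion-derived regions has a simple software analogy. The following illustrative Python sketch (using OpenCV; the thresholds, file name, and run_detector stub are assumptions) finds moving regions by frame differencing and hands only those crops to a downstream detector, instead of running it on every full frame:

```python
import cv2

MOTION_THRESH = 25   # per-pixel intensity difference treated as motion (illustrative)
MIN_AREA = 500       # ignore tiny motion blobs (illustrative)

def run_detector(crop):
    """Stand-in for a real deep learning detector call."""
    print("detector sees region of shape", crop.shape)

def motion_rois(prev_gray, gray):
    """Bounding boxes of regions that moved between two grayscale frames."""
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, MOTION_THRESH, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)  # merge nearby motion pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= MIN_AREA]

cap = cv2.VideoCapture("stream.mp4")  # hypothetical input stream
ok, prev = cap.read()
if not ok:
    raise SystemExit("could not read input video")
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for x, y, w, h in motion_rois(prev_gray, gray):
        # Only the moving crops go to the expensive detector,
        # rather than running it over every full frame.
        run_detector(frame[y:y + h, x:x + w])
    prev_gray = gray
cap.release()
```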

Extending local memory bandwidth to meet the demands of AI, machine learning and deep learning applications

Paper: 2x Bandwidth Burst 6T-SRAM for Memory Bandwidth-Limited Workloads

Why it matters: Many AI chips, especially those for natural language processing (such as voice assistants), are increasingly constrained by local memory bandwidth. Conventional fixes, doubling the clock frequency or doubling the number of memory banks, come at the cost of power and area efficiency, which matters most for area-constrained edge devices. In this paper, Intel demonstrates a 6T-SRAM array that provides 2x read bandwidth on demand in a burst mode, with 51% higher energy efficiency than doubling the frequency and 30% better area efficiency than doubling the number of banks.
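
As a reading aid for those relative numbers (the release gives no absolute units), a back-of-the-envelope Python sketch normalizing the burst-mode design to 1.0 on both axes:

```python
# Place the two conventional bandwidth-doubling alternatives relative to the
# burst-mode 6T-SRAM, using only the percentages quoted in this release.

burst = {"energy_eff": 1.00, "area_eff": 1.00}  # reference point

# Burst mode is 51% more energy efficient than doubling the clock frequency.
freq_doubling_energy_eff = burst["energy_eff"] / 1.51

# Burst mode is 30% more area efficient than doubling the number of banks.
bank_doubling_area_eff = burst["area_eff"] / 1.30

print(f"Frequency doubling: {freq_doubling_energy_eff:.2f}x burst-mode energy efficiency")
print(f"Bank doubling:      {bank_doubling_area_eff:.2f}x burst-mode area efficiency")
```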

All-Digital Binary Neural Network Accelerator

Paper: A 617TOPS/W All-Digital Binary Neural Network Accelerator in 10nm FinFET CMOS

Why it matters: On power- and resource-constrained edge devices, some applications can tolerate low-precision outputs, making binary neural networks (BNNs) an attractive alternative to higher-precision networks, which are far more compute- and memory-intensive. Analog BNN implementations, however, suffer lower prediction accuracy because they tolerate process variation and noise poorly. In this paper, Intel demonstrates an all-digital BNN with energy efficiency comparable to analog in-memory computing approaches, while providing better robustness and scalability to advanced process nodes.
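
The reason BNNs are so cheap is that with weights and activations restricted to {-1, +1}, a multiply-accumulate collapses into XNOR plus popcount. A minimal NumPy sketch of one binarized layer, purely illustrative and not Intel's accelerator:

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1}, the only states a BNN stores."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def bnn_dense(x_bin, w_bin):
    """One binarized fully connected layer.

    With x, w in {-1, +1}, the dot product x . w equals
    (#matching elements) - (#mismatching elements), which hardware
    computes with XNOR gates and a popcount tree instead of multipliers.
    """
    matches = np.sum(x_bin[None, :] == w_bin, axis=1)  # XNOR = equality on {-1, +1}
    n = x_bin.shape[0]
    return 2 * matches - n                             # matches - (n - matches)

rng = np.random.default_rng(0)
x = binarize(rng.standard_normal(256))          # binarized activations
W = binarize(rng.standard_normal((10, 256)))    # binarized weights, 10 outputs

# The XNOR/popcount result matches the ordinary integer dot product.
assert np.array_equal(bnn_dense(x, W), W.astype(np.int32) @ x)
print(bnn_dense(x, W))
```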

Other Intel research presented at the 2020 VLSI Symposium includes the following papers:

The future of computing: How data transformation is reshaping VLSI

Low clock power digital standard cell IP for high performance graphics/AI processors in 10nm CMOS

An autonomously reconfigurable power delivery network (RPDN) for multicore SoCs with dynamic current control

3D monolithic heterogeneous integration enabling GaN and Si transistors on 300mm Si(111) wafers

Low-swing and column-multiplexed bitline technology for low-Vmin, noise-tolerant, high-density 1R1W 8T-bit cell SRAM in 10nm FinFET CMOS

A dual-rail hybrid analog/digital LDO with dynamic current control for tunable high PSRR and high efficiency

A 435MHz, 600Kops/J side-channel attack-resistant encryption processor for secure RSA-4K public key encryption in 14nm CMOS

A 0.26% BER, 10^28 challenge-response machine learning-resistant strong PUF in 14nm CMOS with stability-aware adversarial challenge selection

An anti-SCA AES engine with 6000x time-/frequency-domain leakage suppression using nonlinear digital low-dropout regulators cascaded with computational countermeasures in 14nm CMOS

CMOS-compatible SOT-MRAM process integration with a heavy-metal bilayer bottom electrode and 10ns field-free SOT switching with STT assist

Self-folding write-assist 10nm SRAM design with gate modulation, reducing Vmin by 175mV with negligible power overhead
