Imagine yourself sitting in an autonomous car, enjoying the convenience of modern technology, when a rabbit suddenly hops onto the road directly in front of you. In that critical moment, the car’s sensors capture detailed images of the rabbit and transmit them to an onboard computer, which processes and analyzes them in real time. The computer’s algorithms and artificial intelligence capabilities then come into play, enabling it to make split-second decisions based on that analysis.
Within moments, the computer reaches a conclusion and sends its decision to the car’s controls. With precise adjustments, the car maneuvers deftly, avoiding harm to the rabbit and passengers alike. This feat is made possible by computer vision, a branch of artificial intelligence that empowers computers to perceive and interpret digital images: machines acquire visual data, process it effectively, and derive meaningful insights or make informed decisions based on that analysis.
The computer vision market is growing rapidly across a wide range of applications. From the cutting-edge drone surveillance systems employed by defense organizations to commercially available smart glasses that augment human perception, the potential of computer vision is immense. Even everyday systems such as autonomous vehicles benefit from the technology, which equips them to detect and react to unexpected obstacles such as our furry friend, the rabbit.
Given this growing demand and potential, there is a significant drive to advance computer vision technology further. To address that need, researchers at USC Viterbi’s Information Sciences Institute (ISI) and the Ming Hsieh Department of Electrical and Computer Engineering (ECE) have undertaken a groundbreaking project in collaboration with DARPA (the Defense Advanced Research Projects Agency). The project, now completing Phases 1 and 2, aims to push the boundaries of computer vision and advance the state of the art.
Two jobs spread over two separate platforms
In the rabbit-on-the-road scenario, the computer vision process can be divided into two key components: the front end and the back end. The front end handles vision sensing, where the car’s sensors capture the image of the rabbit, while the back end handles vision processing, where the data from the sensors is analyzed and interpreted. Traditionally, these two components are physically separated, which can create bottlenecks in throughput and bandwidth and waste energy, particularly when large amounts of data are involved.
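To see why that separation becomes a bottleneck, consider a back-of-the-envelope estimate of how much raw data a single camera produces. The figures below (a 1080p sensor, roughly 12- to 16-bit readout, 60 frames per second) are illustrative assumptions, not specifications from the ISI/ECE project:

```python
# Rough estimate of the raw data that must cross the sensor-to-processor link.
# All numbers are illustrative assumptions, not project figures.
width, height = 1920, 1080      # assumed 1080p sensor
bytes_per_pixel = 2             # assumed 12-16 bit readout stored in 2 bytes
frames_per_second = 60          # assumed frame rate

bytes_per_frame = width * height * bytes_per_pixel
bytes_per_second = bytes_per_frame * frames_per_second

print(f"per frame:  {bytes_per_frame / 1e6:.1f} MB")   # about 4.1 MB
print(f"per second: {bytes_per_second / 1e9:.2f} GB")  # about 0.25 GB/s, per camera
```

Every one of those bytes has to be moved off the sensor, across a link, and through a processor before the car can react, which is where throughput, bandwidth, and energy are lost.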
Ajey Jacob, Director of Advanced Electronics at ISI, recognizes this challenge and proposes a solution: bring the back-end processing closer to the front-end image collection by placing a CPU near the sensor to handle the processing tasks. In a car, for instance, an onboard CPU can readily perform the necessary computations. But this approach is not feasible in every scenario; drones, for example, have limited space, connectivity, and battery capacity, making it impractical to carry a large CPU for processing.
To overcome these limitations, the ISI/ECE team adopted a different strategy by exploring the possibility of reducing or eliminating the need for backend processing altogether. Their innovative approach involves performing computations directly on the pixel itself, thereby obviating the requirement for an additional processing unit or a separate computer. This localized processing on the chip offers a promising alternative, allowing for efficient and real-time analysis of the captured data without the need for extensive computational resources.
By shifting the focus from traditional backend processing to on-chip computation, the researchers aim to streamline the computer vision pipeline, improving performance, reducing energy consumption, and eliminating potential bottlenecks. This novel approach opens up exciting possibilities for advancing computer vision technology and expanding its applications across various domains.
Front-end processing inside a pixel
The revolutionary concept of in-pixel intelligent processing (IP2) involves conducting processing for AI applications directly on the image sensor chip. With IP2, processing takes place right beneath the data on each pixel, allowing relevant information to be extracted highly efficiently. This capability has been made possible by advances in computer microchips, particularly CMOS (complementary metal–oxide–semiconductor) technology, which is used extensively in image sensors.
Building upon this foundation, the ISI/ECE team has introduced an innovative paradigm known as processing-in-pixel-in-memory (P2M). This novel approach leverages state-of-the-art CMOS technologies to empower the pixel array with the ability to perform an extensive range of complex operations, including image processing tasks. By integrating sensing, memory, and computing functionalities within the camera chip itself, P2M achieves a remarkable fusion of capabilities.
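As a loose software analogy (not the actual mixed-signal circuit the team designed), P2M can be pictured as each region of the pixel array holding its own filter weights and performing the first multiply-and-accumulate steps of a neural network before any data leaves the chip. The NumPy sketch below is purely illustrative; the array size, weights, and stride are made-up values:

```python
import numpy as np

# Purely illustrative software analogy of processing-in-pixel-in-memory (P2M):
# filter weights sit "next to" the pixels, and the first convolution layer is
# evaluated before any data leaves the sensor. All shapes and values are made up.
rng = np.random.default_rng(0)

pixels = rng.integers(0, 256, size=(128, 128)).astype(np.float32)  # captured frame
weights = rng.standard_normal((3, 3)).astype(np.float32)           # weights stored alongside the array
stride = 2                                                         # striding shrinks the output

out_h = (pixels.shape[0] - 3) // stride + 1
out_w = (pixels.shape[1] - 3) // stride + 1
features = np.empty((out_h, out_w), dtype=np.float32)

for i in range(out_h):
    for j in range(out_w):
        patch = pixels[i * stride:i * stride + 3, j * stride:j * stride + 3]
        features[i, j] = max((patch * weights).sum(), 0.0)          # multiply-accumulate + ReLU

print(pixels.size, "pixel values in ->", features.size, "feature values out")
```

In the actual P2M design, these operations are carried out in circuitry beneath the pixels rather than in software.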
Akhilesh Jaiswal, a computer scientist at ISI and assistant professor at ECE, has played a significant role in leading the front-end circuit design. Jaiswal explains that the team has effectively combined advances in mixed-signal analog computing with the strides being made in 3D integration of semiconductor chips. This integration has opened up new possibilities for seamlessly merging sensing, memory, and computing functions within the compact confines of the camera chip.
One of the key advantages of the P2M approach is that processing happens in-pixel, leading to a significant reduction in power consumption and bandwidth. Instead of transmitting large amounts of raw data downstream to the AI processor, only compressed, meaningful data is sent, greatly improving overall system efficiency. The team dedicated considerable effort to striking the right balance between compression and computing on the pixel sensor, ensuring that the final framework achieves the desired outcomes.
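To make the bandwidth savings concrete, here is a hypothetical comparison of a raw frame against the compressed feature maps an in-pixel front end might transmit. The layer choices and bit widths are assumptions for illustration only; they are not the configuration behind the DARPA result reported below:

```python
# Hypothetical comparison: raw sensor readout vs. in-pixel-processed output.
# Layer choices and bit widths are illustrative assumptions, not the P2M/RPIXELS design.
raw_w, raw_h = 1920, 1080
raw_bits_per_pixel = 12                  # assumed sensor readout depth

downsample = 4                           # assumed stride-2 convolution plus 2x2 pooling
out_channels = 4                         # assumed number of in-pixel filters
feature_bits = 8                         # assumed quantization of transmitted features

raw_bits = raw_w * raw_h * raw_bits_per_pixel
feature_bits_total = (raw_w // downsample) * (raw_h // downsample) * out_channels * feature_bits

print(f"raw frame:     {raw_bits / 8e6:.2f} MB")               # ~3.11 MB
print(f"features sent: {feature_bits_total / 8e6:.2f} MB")     # ~0.52 MB
print(f"reduction:     {raw_bits / feature_bits_total:.1f}x")  # ~6x under these assumptions
```

The reduction actually achieved depends on how much computing is pushed into the pixel versus how much is left for the back end, which is exactly the balance the team spent its effort tuning.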
As a result of this careful analysis and optimization, the team developed a framework whose chip is comparable in size to a conventional sensor. Moreover, the data transferred from the sensor to the computer is minimized thanks to the initial pruning and on-pixel computation, streamlining the data flow and improving the efficiency of the overall system. This breakthrough not only enables efficient in-pixel processing but also reduces the computational burden on downstream AI processors, opening up exciting prospects for the future of computer vision.
From the front to the back and into the future
The culmination of the DARPA challenge has led to the development of RPIXELS (Recurrent Neural Network Processing In-Pixel for Efficient Low-energy Heterogeneous Systems), a groundbreaking solution proposed by the ISI team. RPIXELS combines the front-end in-pixel processing with an optimized back end, carefully designed to support the front-end operations seamlessly.
The initial testing of the RPIXELS framework has yielded highly promising results. Notably, it has achieved a remarkable reduction of 13.5 times in both data size and bandwidth, surpassing the DARPA goal of a 10 times reduction in these metrics. This significant reduction in data requirements and bandwidth usage indicates the remarkable efficiency and effectiveness of RPIXELS.
Andrew Schmidt, an esteemed senior computer scientist at ISI, emphasizes the impact of RPIXELS on latency reduction and bandwidth optimization. By tightly integrating the initial layers of a neural network directly into the pixel for computing, RPIXELS enables faster decision-making based on the sensor’s captured information. This approach also paves the way for the development of novel back-end algorithms for object detection and tracking, fostering continuous innovation toward more accurate and high-performance systems.
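A minimal sketch of that partitioning idea, under assumed shapes and stand-in operations (the real RPIXELS front end implements its early neural-network layers in the pixel circuitry, and its back end performs recurrent detection and tracking):

```python
import numpy as np

# Conceptual partitioning of a vision pipeline in the spirit of RPIXELS:
# the earliest layers run at the pixel array, and only their compact output
# crosses to the back-end processor. Shapes and operations are stand-ins.

def in_pixel_front_end(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the in-pixel layers: 4x4 average pooling plus a ReLU,
    in place of the strided convolutions computed under the pixels."""
    h, w = frame.shape
    pooled = frame[:h - h % 4, :w - w % 4].reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))
    return np.maximum(pooled - pooled.mean(), 0.0)

def back_end(features: np.ndarray) -> str:
    """Stand-in for the downstream processor, where detection and tracking
    (e.g. the recurrent part of RPIXELS) would actually run."""
    return "obstacle detected" if features.max() > 50 else "path clear"

frame = np.random.default_rng(1).integers(0, 256, size=(1080, 1920)).astype(np.float32)
features = in_pixel_front_end(frame)   # computed at the sensor
decision = back_end(features)          # only `features`, not `frame`, leaves the chip
print(f"{frame.size} raw values -> {features.size} transmitted values -> {decision}")
```

Because the interface between the two halves is just a small feature map, back-end researchers can iterate on new detection and tracking algorithms without touching the sensor design.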
The success of this project stands as a testament to the collaborative efforts between USC’s ECE department and ISI. The expertise at the intersection of hardware and machine learning algorithms from the ECE department combines seamlessly with ISI’s proficiency in device technology, circuit design, and machine learning applications. This collaboration has resulted in a truly synergistic effort, leveraging the strengths of both entities to drive transformative advancements in computer vision technology.
Moving forward, the next crucial step entails the physical implementation of the RPIXELS solution by integrating the circuit design onto silicon chips. Real-world testing will provide invaluable insights and validation of the framework’s performance and capabilities. This exciting endeavor holds the potential to not only save rabbits but also revolutionize various applications by enabling efficient, intelligent, and high-performance computer vision systems.