Patents by Inventor Allen Rush
Allen Rush has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12067401
Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
Type: Grant
Filed: December 27, 2017
Date of Patent: August 20, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush
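The abstract above describes building a matrix product phase by phase, with each phase's partial results fed back from a second register file and accumulated with the current outer product. The following is a minimal NumPy sketch of that accumulate-over-phases idea; the function and variable names are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch: matrix multiply expressed as a sum of per-phase outer products,
# with previous results read back from an accumulator and added to the current one.
import numpy as np

def phased_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"

    acc = np.zeros((m, n), dtype=a.dtype)  # stands in for the second vector register file
    for phase in range(k):
        # Operands for this phase come from the "first vector register file":
        # one column of A and one row of B.
        outer = np.outer(a[:, phase], b[phase, :])
        # Results from previous phases are fed back and accumulated with the
        # current outer product.
        acc = acc + outer
    return acc

a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.arange(12, dtype=np.float32).reshape(3, 4)
assert np.allclose(phased_matmul(a, b), a @ b)
```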
-
Patent number: 11948073
Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core.
Type: Grant
Filed: August 30, 2018
Date of Patent: April 2, 2024
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush
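The mapping scheme described above reduces external memory traffic by having one core fetch shared data once and broadcast it on-chip, while each core fetches only its own unique data. Below is a small Python sketch of that idea under stated assumptions; the class and function names (Core, fetch_from_memory, run_broadcast_mapping) are illustrative, not from the patent.

```python
# Sketch: one core fetches shared data from external memory and broadcasts it;
# each core fetches only the data unique to it. A counter tracks bandwidth used.
from dataclasses import dataclass

@dataclass
class Core:
    core_id: int
    shared: bytes = b""
    unique: bytes = b""

memory_bytes_fetched = 0

def fetch_from_memory(data: bytes) -> bytes:
    """Stand-in for an external-memory read; tracks bytes transferred."""
    global memory_bytes_fetched
    memory_bytes_fetched += len(data)
    return data

def run_broadcast_mapping(cores, shared_data: bytes, unique_data: list) -> None:
    # One designated core fetches the shared data from external memory ...
    fetched = fetch_from_memory(shared_data)
    # ... and broadcasts it to every core on-chip (no extra external bandwidth).
    for core in cores:
        core.shared = fetched
    # Each core still fetches its own unique data from external memory.
    for core, data in zip(cores, unique_data):
        core.unique = fetch_from_memory(data)

cores = [Core(i) for i in range(4)]
run_broadcast_mapping(cores, shared_data=b"w" * 1000, unique_data=[b"x" * 100] * 4)
print(memory_bytes_fetched)  # 1400 bytes, versus 4400 if every core fetched the shared data
```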
-
Publication number: 20220129752
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
Type: Application
Filed: January 7, 2022
Publication date: April 28, 2022
Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush
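The abstract above (shared by the related grant and publications listed below) describes processing the input one 3D block of channels at a time and summing channel contributions on-chip before a single write-back per feature. The following NumPy sketch illustrates that flow; the block size, filter shape, and names are illustrative assumptions, not values from the patent.

```python
# Sketch: tile the input channels into 3D blocks, convolve each block on-chip,
# and sum across channels before writing each feature's output to "external" memory.
import numpy as np

def conv_block_sum(inp: np.ndarray, weights: np.ndarray, block_c: int = 4) -> np.ndarray:
    """inp: (C, H, W) input, weights: (F, C, 3, 3) filters, 'valid' convolution."""
    c, h, w = inp.shape
    f = weights.shape[0]
    out = np.zeros((f, h - 2, w - 2), dtype=inp.dtype)  # lives in external memory
    for c0 in range(0, c, block_c):
        block = inp[c0:c0 + block_c]  # load one 3D block into internal memory
        for feat in range(f):
            partial = np.zeros_like(out[feat])
            for ci, channel in enumerate(block):
                k = weights[feat, c0 + ci]
                for dy in range(3):
                    for dx in range(3):
                        partial += k[dy, dx] * channel[dy:dy + h - 2, dx:dx + w - 2]
            # Channel contributions are summed on-chip before a single write-back,
            # instead of writing one partial result per channel to external memory.
            out[feat] += partial
    return out

x = np.random.rand(8, 16, 16).astype(np.float32)
w = np.random.rand(2, 8, 3, 3).astype(np.float32)
print(conv_block_sum(x, w).shape)  # (2, 14, 14)
```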
-
Patent number: 11227214
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
Type: Grant
Filed: November 14, 2017
Date of Patent: January 18, 2022
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush
-
Patent number: 10582250
Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the portion of processing elements which are shared include a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.
Type: Grant
Filed: July 24, 2017
Date of Patent: March 3, 2020
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush, Razvan Dan-Dobre
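The abstract above describes one pool of MAC units and internal memory that is reprogrammed between codec and inference duties via a context switch at frame or sub-frame boundaries. The short Python sketch below illustrates that control flow; all class, method, and constant names are illustrative assumptions, not from the patent.

```python
# Sketch: a single pool of shared MAC units switched between "video codec" and
# "inference" operating modes, with the switch allowed only at frame boundaries.
from enum import Enum, auto

class Mode(Enum):
    VIDEO_CODEC = auto()
    INFERENCE = auto()

class SharedMacEngine:
    def __init__(self, num_macs: int = 256):
        self.num_macs = num_macs                # shared multiplier-accumulator units
        self.internal_memory = bytearray(64 * 1024)  # shared on-chip memory
        self.mode = Mode.VIDEO_CODEC

    def context_switch(self, new_mode: Mode) -> None:
        """Reprogram the shared processing elements; invoked only at a frame
        (or sub-frame) boundary so in-flight work is not corrupted."""
        if new_mode is not self.mode:
            self.mode = new_mode

    def process_frame(self, frame_id: int) -> str:
        if self.mode is Mode.VIDEO_CODEC:
            return f"frame {frame_id}: motion estimation on {self.num_macs} MACs"
        return f"frame {frame_id}: convolution layers on {self.num_macs} MACs"

engine = SharedMacEngine()
print(engine.process_frame(0))           # codec mode
engine.context_switch(Mode.INFERENCE)    # switch at the frame boundary
print(engine.process_frame(1))           # inference mode on the same silicon
```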
-
Publication number: 20190325305
Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core.
Type: Application
Filed: August 30, 2018
Publication date: October 24, 2019
Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush
-
Publication number: 20190171448
Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
Type: Application
Filed: December 27, 2017
Publication date: June 6, 2019
Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush
-
Publication number: 20190147332
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
Type: Application
Filed: November 14, 2017
Publication date: May 16, 2019
Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush
-
Publication number: 20190028752
Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the portion of processing elements which are shared include a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.
Type: Application
Filed: July 24, 2017
Publication date: January 24, 2019
Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush, Razvan Dan-Dobre
-
Publication number: 20030082333
Abstract: Sheets (3) have a narrow margin (3a) which is the first part of the sheet to enter nip rollers (5 and 7). This concentrates the pressure of the nip rollers on the narrow margin to assure a firm grip without slippage. Once the sheet is moving, continuing movement without slippage is assured by the momentum. This is particularly useful when the nip rollers are fixing rollers which are oiled to minimize toner transfer and when the sheet is a smooth transparency.
Type: Application
Filed: February 24, 2000
Publication date: May 1, 2003
Inventors: Wayne Edward Evans, James Allen Lokovich, John Stephen Mullins, Edward Allen Rush, William Reed Summers