Patents by Inventor Allen Rush

Allen Rush has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11948073
    Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core.
    Type: Grant
    Filed: August 30, 2018
    Date of Patent: April 2, 2024
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush
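    The bandwidth argument behind the fetch-and-broadcast mapping described in this abstract can be illustrated with a toy cost model (not taken from the patent text): fetching shared data once and broadcasting it on-chip scales the shared traffic by one instead of by the number of cores. All function names, sizes, and the cost formula below are assumptions made for the sketch.

    ```python
    # Toy external-bandwidth model comparing two mapping schemes; illustrative only.

    def bandwidth_per_core_fetch(num_cores, shared_bytes, unique_bytes):
        """Every core fetches both the shared data and its own unique data."""
        return num_cores * (shared_bytes + unique_bytes)

    def bandwidth_fetch_and_broadcast(num_cores, shared_bytes, unique_bytes):
        """One core fetches the shared data once and broadcasts it on-chip;
        each core still fetches the data unique to it."""
        return shared_bytes + num_cores * unique_bytes

    def pick_mapping(num_cores, shared_bytes, unique_bytes):
        """Select whichever scheme needs less external memory bandwidth."""
        schemes = {
            "per-core fetch": bandwidth_per_core_fetch(num_cores, shared_bytes, unique_bytes),
            "fetch-and-broadcast": bandwidth_fetch_and_broadcast(num_cores, shared_bytes, unique_bytes),
        }
        return min(schemes.items(), key=lambda kv: kv[1])

    if __name__ == "__main__":
        # 4 inference cores sharing 8 MB of weights, each also reading 1 MB of its own data.
        scheme, traffic = pick_mapping(num_cores=4, shared_bytes=8_000_000, unique_bytes=1_000_000)
        print(f"chosen scheme: {scheme}, external traffic: {traffic} bytes")
    ```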
  • Publication number: 20220129752
    Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
    Type: Application
    Filed: January 7, 2022
    Publication date: April 28, 2022
    Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush
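    A minimal sketch of the channel-accumulation idea in this abstract, assuming a naive valid convolution over NumPy arrays: channels are processed in blocks while resident in a local accumulator, and per-channel partial sums are added together before anything is written back out. Shapes, the tile size, and the function names are illustrative assumptions, and the arrays merely stand in for internal versus external buffers.

    ```python
    import numpy as np

    def conv2d_single_channel(x, k):
        """Naive valid 2-D convolution (cross-correlation) of one channel."""
        kh, kw = k.shape
        oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
        return out

    def conv_channel_blocked(inputs, kernels, channels_per_block=4):
        """inputs: (C, H, W); kernels: (F, C, KH, KW).
        Loads one block of channels at a time and accumulates partial sums
        across channels per feature before the result is written out."""
        C, H, W = inputs.shape
        F, _, KH, KW = kernels.shape
        acc = np.zeros((F, H - KH + 1, W - KW + 1))     # per-feature running sums ("internal memory")
        for c0 in range(0, C, channels_per_block):
            block = inputs[c0:c0 + channels_per_block]   # load one block of channels
            for f in range(F):
                for c, channel in enumerate(block):
                    acc[f] += conv2d_single_channel(channel, kernels[f, c0 + c])
        return acc                                       # written back only after accumulation

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.standard_normal((8, 16, 16))    # 8 input channels
        w = rng.standard_normal((2, 8, 3, 3))   # 2 output features
        print(conv_channel_blocked(x, w).shape)  # (2, 14, 14)
    ```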
  • Patent number: 11227214
    Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
    Type: Grant
    Filed: November 14, 2017
    Date of Patent: January 18, 2022
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush
  • Patent number: 10582250
    Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the portion of processing elements which are shared include a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.
    Type: Grant
    Filed: July 24, 2017
    Date of Patent: March 3, 2020
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush, Razvan Dan-Dobre
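    A purely illustrative model of the sharing described in this abstract, in which one multiply-accumulate array serves both a codec task and an inference task and is reprogrammed by a context switch at a frame boundary. The class, the mode names, and the choice of sum-of-absolute-differences versus dot product as stand-ins for the two workloads are assumptions for the sketch, not the patented design.

    ```python
    import numpy as np

    class SharedMacArray:
        """Stands in for the shared multiplier-accumulator processing elements."""
        def __init__(self):
            self.mode = None

        def context_switch(self, mode):
            # Reprogram the shared processing elements for a new operating mode.
            self.mode = mode

        def run(self, a, b):
            if self.mode == "codec":
                # Motion-estimation-style use: sum of absolute differences.
                return int(np.abs(a - b).sum())
            if self.mode == "inference":
                # Inference-style use: multiply-accumulate (dot product).
                return float(np.dot(a.ravel(), b.ravel()))
            raise RuntimeError("MAC array not configured")

    if __name__ == "__main__":
        macs = SharedMacArray()
        rng = np.random.default_rng(1)
        frame, ref = rng.integers(0, 255, (8, 8)), rng.integers(0, 255, (8, 8))
        acts, wts = rng.standard_normal(64), rng.standard_normal(64)

        macs.context_switch("codec")        # e.g. at a frame boundary
        print("SAD:", macs.run(frame, ref))
        macs.context_switch("inference")    # switch modes at the next boundary
        print("MAC:", macs.run(acts, wts))
    ```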
  • Publication number: 20190325305
    Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core.
    Type: Application
    Filed: August 30, 2018
    Publication date: October 24, 2019
    Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush
  • Publication number: 20190171448
    Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
    Type: Application
    Filed: December 27, 2017
    Publication date: June 6, 2019
    Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush
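    The phase-wise accumulation in this abstract can be mirrored arithmetically in a few lines: a matrix multiply expressed as a sequence of rank-1 (outer-product) updates into an accumulator that stands in for the second vector register file. This sketches only the arithmetic, not the pipeline or register-file hardware, and the names are hypothetical.

    ```python
    import numpy as np

    def matmul_outer_product_phases(A, B):
        """Compute A @ B as K rank-1 updates, accumulating each phase's outer
        product with the result of the previous phases."""
        M, K = A.shape
        K2, N = B.shape
        assert K == K2
        acc = np.zeros((M, N))                    # accumulator register file
        for k in range(K):                        # one "phase" per index of the shared dimension
            acc += np.outer(A[:, k], B[k, :])     # previous result + current outer product
        return acc

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        A, B = rng.standard_normal((4, 6)), rng.standard_normal((6, 5))
        assert np.allclose(matmul_outer_product_phases(A, B), A @ B)
        print("phase-accumulated result matches A @ B")
    ```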
  • Publication number: 20190147332
    Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
    Type: Application
    Filed: November 14, 2017
    Publication date: May 16, 2019
    Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush
  • Publication number: 20190028752
    Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the portion of processing elements which are shared include a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.
    Type: Application
    Filed: July 24, 2017
    Publication date: January 24, 2019
    Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush, Razvan Dan-Dobre
  • Publication number: 20030082333
    Abstract: Sheets (3) have a narrow margin (3a) which is the first part of the sheet to enter nip rollers (5 and 7). This concentrates the pressure of the nip rollers on the narrow margin to assure a firm grip without slippage. Once the sheet is moving, continued movement without slippage is assured by its momentum. This is particularly useful when the nip rollers are fixing rollers which are oiled to minimize toner transfer and when the sheet is a smooth transparency.
    Type: Application
    Filed: February 24, 2000
    Publication date: May 1, 2003
    Inventors: Wayne Edward Evans, James Allen Lokovich, John Stephen Mullins, Edward Allen Rush, William Reed Summers