Patents by Inventor Allen Rush
Allen Rush has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12067401
Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
Type: Grant
Filed: December 27, 2017
Date of Patent: August 20, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush
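The abstract above describes building a matrix product phase by phase, with each phase's partial results fed back from a second register file and accumulated with the current outer product. The following is a minimal NumPy sketch of that accumulate-over-phases idea; the function and variable names are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch: matrix multiply expressed as a sum of per-phase outer products,
# with previous results read back from an accumulator and added to the current one.
import numpy as np

def phased_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"

    acc = np.zeros((m, n), dtype=a.dtype)  # stands in for the second vector register file
    for phase in range(k):
        # Operands for this phase come from the "first vector register file":
        # one column of A and one row of B.
        outer = np.outer(a[:, phase], b[phase, :])
        # Results from previous phases are fed back and accumulated with the
        # current outer product.
        acc = acc + outer
    return acc

a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.arange(12, dtype=np.float32).reshape(3, 4)
assert np.allclose(phased_matmul(a, b), a @ b)
```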
-
Patent number: 11948073
Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core.
Type: Grant
Filed: August 30, 2018
Date of Patent: April 2, 2024
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush
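The mapping scheme described above reduces external memory traffic by having one core fetch shared data once and broadcast it on-chip, while each core fetches only its own unique data. Below is a small Python sketch of that idea under stated assumptions; the class and function names (Core, fetch_from_memory, run_broadcast_mapping) are illustrative, not from the patent.

```python
# Sketch: one core fetches shared data from external memory and broadcasts it;
# each core fetches only the data unique to it. A counter tracks bandwidth used.
from dataclasses import dataclass

@dataclass
class Core:
    core_id: int
    shared: bytes = b""
    unique: bytes = b""

memory_bytes_fetched = 0

def fetch_from_memory(data: bytes) -> bytes:
    """Stand-in for an external-memory read; tracks bytes transferred."""
    global memory_bytes_fetched
    memory_bytes_fetched += len(data)
    return data

def run_broadcast_mapping(cores, shared_data: bytes, unique_data: list) -> None:
    # One designated core fetches the shared data from external memory ...
    fetched = fetch_from_memory(shared_data)
    # ... and broadcasts it to every core on-chip (no extra external bandwidth).
    for core in cores:
        core.shared = fetched
    # Each core still fetches its own unique data from external memory.
    for core, data in zip(cores, unique_data):
        core.unique = fetch_from_memory(data)

cores = [Core(i) for i in range(4)]
run_broadcast_mapping(cores, shared_data=b"w" * 1000, unique_data=[b"x" * 100] * 4)
print(memory_bytes_fetched)  # 1400 bytes, versus 4400 if every core fetched the shared data
```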
-
Publication number: 20220129752
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
Type: Application
Filed: January 7, 2022
Publication date: April 28, 2022
Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush
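The abstract above (shared by the related grant and publications listed below) describes processing the input one 3D block of channels at a time and summing channel contributions on-chip before a single write-back per feature. The following NumPy sketch illustrates that flow; the block size, filter shape, and names are illustrative assumptions, not values from the patent.

```python
# Sketch: tile the input channels into 3D blocks, convolve each block on-chip,
# and sum across channels before writing each feature's output to "external" memory.
import numpy as np

def conv_block_sum(inp: np.ndarray, weights: np.ndarray, block_c: int = 4) -> np.ndarray:
    """inp: (C, H, W) input, weights: (F, C, 3, 3) filters, 'valid' convolution."""
    c, h, w = inp.shape
    f = weights.shape[0]
    out = np.zeros((f, h - 2, w - 2), dtype=inp.dtype)  # lives in external memory
    for c0 in range(0, c, block_c):
        block = inp[c0:c0 + block_c]  # load one 3D block into internal memory
        for feat in range(f):
            partial = np.zeros_like(out[feat])
            for ci, channel in enumerate(block):
                k = weights[feat, c0 + ci]
                for dy in range(3):
                    for dx in range(3):
                        partial += k[dy, dx] * channel[dy:dy + h - 2, dx:dx + w - 2]
            # Channel contributions are summed on-chip before a single write-back,
            # instead of writing one partial result per channel to external memory.
            out[feat] += partial
    return out

x = np.random.rand(8, 16, 16).astype(np.float32)
w = np.random.rand(2, 8, 3, 3).astype(np.float32)
print(conv_block_sum(x, w).shape)  # (2, 14, 14)
```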
-
Patent number: 11227214
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
Type: Grant
Filed: November 14, 2017
Date of Patent: January 18, 2022
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush
-
Patent number: 10582250
Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the portion of processing elements which are shared include a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.
Type: Grant
Filed: July 24, 2017
Date of Patent: March 3, 2020
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush, Razvan Dan-Dobre
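The abstract above describes one pool of MAC units and internal memory that is reprogrammed between codec and inference duties via a context switch at frame or sub-frame boundaries. The short Python sketch below illustrates that control flow; all class, method, and constant names are illustrative assumptions, not from the patent.

```python
# Sketch: a single pool of shared MAC units switched between "video codec" and
# "inference" operating modes, with the switch allowed only at frame boundaries.
from enum import Enum, auto

class Mode(Enum):
    VIDEO_CODEC = auto()
    INFERENCE = auto()

class SharedMacEngine:
    def __init__(self, num_macs: int = 256):
        self.num_macs = num_macs                # shared multiplier-accumulator units
        self.internal_memory = bytearray(64 * 1024)  # shared on-chip memory
        self.mode = Mode.VIDEO_CODEC

    def context_switch(self, new_mode: Mode) -> None:
        """Reprogram the shared processing elements; invoked only at a frame
        (or sub-frame) boundary so in-flight work is not corrupted."""
        if new_mode is not self.mode:
            self.mode = new_mode

    def process_frame(self, frame_id: int) -> str:
        if self.mode is Mode.VIDEO_CODEC:
            return f"frame {frame_id}: motion estimation on {self.num_macs} MACs"
        return f"frame {frame_id}: convolution layers on {self.num_macs} MACs"

engine = SharedMacEngine()
print(engine.process_frame(0))           # codec mode
engine.context_switch(Mode.INFERENCE)    # switch at the frame boundary
print(engine.process_frame(1))           # inference mode on the same silicon
```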
-
Publication number: 20190325305
Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core.
Type: Application
Filed: August 30, 2018
Publication date: October 24, 2019
Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush
-
Publication number: 20190171448
Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
Type: Application
Filed: December 27, 2017
Publication date: June 6, 2019
Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush
-
Publication number: 20190147332
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
Type: Application
Filed: November 14, 2017
Publication date: May 16, 2019
Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush
-
Publication number: 20190028752
Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the portion of processing elements which are shared include a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.
Type: Application
Filed: July 24, 2017
Publication date: January 24, 2019
Inventors: Lei Zhang, Sateesh Lagudu, Allen Rush, Razvan Dan-Dobre
-
Publication number: 20030082333
Abstract: Sheets (3) have a narrow margin (3a) which is the first part of the sheet to enter nip rollers (5 and 7). This concentrates the pressure of the nip rollers on the narrow margin to assure a firm grip without slippage. Once the sheet is moving, continuing movement without slippage is assured by the momentum. This is particularly useful when the nip rollers are fixing rollers which are oiled to minimize toner transfer and when the sheet is a smooth transparency.
Type: Application
Filed: February 24, 2000
Publication date: May 1, 2003
Inventors: Wayne Edward Evans, James Allen Lokovich, John Stephen Mullins, Edward Allen Rush, William Reed Summers