Patents by Inventor Deborah Marr

Deborah Marr has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230359695
    Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. External memory bandwidth may be reduced by a reduction factor based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
    Type: Application
    Filed: July 17, 2023
    Publication date: November 9, 2023
    Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
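
The abstract above turns on one idea: the row and column feeders fetch a block of matrix data from external memory once, then replay it into the PE array several times according to the feeding pattern. A minimal software sketch of that reuse, assuming an invented `Feeder` class and an interleave factor of 4 (neither is from the patent), shows the bandwidth-reduction factor directly:

```python
# Illustrative sketch only: a feeder that buffers one vector fetched from
# external memory and replays it, so several feeds share one external read.
INTERLEAVE = 4  # assumed reuse factor of the feeding pattern

class Feeder:
    def __init__(self):
        self.buffer = None
        self.external_reads = 0

    def feed(self, fetch_vector, step):
        # Hit external memory only once per INTERLEAVE feeds; otherwise
        # replay the locally buffered data into the PE array.
        if step % INTERLEAVE == 0:
            self.buffer = fetch_vector()
            self.external_reads += 1
        return self.buffer

row_feeder, col_feeder = Feeder(), Feeder()
A_row, B_col = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]

steps, acc = 8, 0.0
for step in range(steps):
    a = row_feeder.feed(lambda: A_row, step)
    b = col_feeder.feed(lambda: B_col, step)
    acc += sum(x * y for x, y in zip(a, b))  # PE multiply-accumulate

# Feeds served per external read: the factor of bandwidth reduction.
print(steps / row_feeder.external_reads)  # -> 4.0
```
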
  • Publication number: 20230316058
    Abstract: An apparatus to facilitate processing of a sparse matrix for arbitrary graph data is disclosed. The apparatus includes a graphics processing unit having a data management unit (DMU) that includes a scheduler for scheduling matrix operations, an active logic for tracking active input operands, and a skip logic for tracking unimportant input operands to be skipped by the scheduler. Processing circuitry is coupled to the DMU. The processing circuitry comprises a plurality of processing elements including logic to read operands and a multiplication unit to multiply two or more operands for the arbitrary graph data and customizable circuitry to provide custom functions.
    Type: Application
    Filed: April 19, 2023
    Publication date: October 5, 2023
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Amit Bleiweiss, Deborah Marr, Eugene Wang, Saritha Dwarakapuram, Sabareesh Ganapathy
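
The mechanism in this abstract is easiest to see as a scheduling filter: skip logic tags operands that contribute nothing (zeros, in a sparse matrix), and the scheduler dispatches only the remaining active pairs to the multipliers. A hedged Python sketch of that filtering, with hypothetical names and list-based operands standing in for hardware operand tags:

```python
# Sketch of skip/active tracking for sparse operands (names invented).
def schedule_multiplies(a_operands, b_operands):
    # Skip logic: mark unimportant (zero) operand positions.
    skip = {i for i, (a, b) in enumerate(zip(a_operands, b_operands))
            if a == 0 or b == 0}
    # Active logic: everything not skipped stays schedulable.
    active = [i for i in range(len(a_operands)) if i not in skip]
    # Scheduler: dispatch only active pairs to the multiplication units.
    return sum(a_operands[i] * b_operands[i] for i in active)

# Sparse row times dense column: only 2 of 6 multiplies are dispatched.
print(schedule_multiplies([0, 3, 0, 0, 5, 0], [7, 2, 9, 1, 4, 8]))  # -> 26
```
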
  • Patent number: 11636327
    Abstract: An apparatus to facilitate processing of a sparse matrix for arbitrary graph data is disclosed. The apparatus includes a graphics processing unit having a data management unit (DMU) that includes a scheduler for scheduling matrix operations, an active logic for tracking active input operands, and a skip logic for tracking unimportant input operands to be skipped by the scheduler. Processing circuitry is coupled to the DMU. The processing circuitry comprises a plurality of processing elements including logic to read operands and a multiplication unit to multiply two or more operands for the arbitrary graph data.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: April 25, 2023
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Amit Bleiweiss, Deborah Marr, Eugene Wang, Saritha Dwarakapuram, Sabareesh Ganapathy
  • Publication number: 20230064381
    Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. External memory bandwidth may be reduced by a reduction factor based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
    Type: Application
    Filed: May 9, 2022
    Publication date: March 2, 2023
    Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
  • Publication number: 20230053289
    Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
    Type: Application
    Filed: June 21, 2022
    Publication date: February 16, 2023
    Applicant: Intel Corporation
    Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
  • Patent number: 11416248
    Abstract: An apparatus and method for compressing floating-point values.
    Type: Grant
    Filed: March 28, 2020
    Date of Patent: August 16, 2022
    Assignee: Intel Corporation
    Inventors: Jaewoong Sim, Alaa Alameldeen, Eriko Nurvitadhi, Deborah Marr
  • Patent number: 11373088
    Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
    Type: Grant
    Filed: December 30, 2017
    Date of Patent: June 28, 2022
    Assignee: Intel Corporation
    Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
  • Patent number: 11328037
    Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. External memory bandwidth may be reduced by a reduction factor based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
    Type: Grant
    Filed: July 7, 2017
    Date of Patent: May 10, 2022
    Assignee: Intel Corporation
    Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
  • Publication number: 20220129763
    Abstract: An embodiment of an integrated circuit may comprise a front end unit, and circuitry coupled to the front end unit, the circuitry to provide a high-confidence, multiple-branch offset predictor. For example, the circuitry may be configured to identify an entry in a multiple-taken-branch prediction table that corresponds to a conditional branch instruction, determine whether a confidence level of the entry exceeds a threshold confidence level, and, if so, provide multiple taken-branch predictions that stem from the conditional branch instruction from the entry in the multiple-taken-branch prediction table. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 28, 2022
    Applicant: Intel Corporation
    Inventors: Sumeet Bandishte, Jayesh Gaur, Polychronis Xekalakis, Ariel Sabba, Deborah Marr, Sreenivas Subramoney
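
The lookup the abstract describes has three steps: find the table entry for a conditional branch, compare its confidence against a threshold, and, if it clears the bar, emit the whole stored run of taken-branch predictions at once. A minimal sketch under assumed structures (the table layout, threshold value, and fallback policy are all invented here):

```python
# Hypothetical multiple-taken-branch prediction table: IP -> (confidence, targets).
CONFIDENCE_THRESHOLD = 3  # assumed saturating-counter threshold

mtb_table = {
    0x401000: (4, [0x401040, 0x401080, 0x4010C0]),  # high confidence
    0x402000: (1, [0x402020]),                      # low confidence
}

def predict(branch_ip):
    entry = mtb_table.get(branch_ip)
    if entry is None:
        return None              # no entry: defer to a baseline predictor
    confidence, targets = entry
    if confidence > CONFIDENCE_THRESHOLD:
        return targets           # multiple taken-branch predictions in one shot
    return targets[:1]           # not confident enough: predict one branch only

print(predict(0x401000))  # all three targets from the entry
print(predict(0x402000))  # just the first
```
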
  • Publication number: 20220121917
    Abstract: Hardware accelerator templates and design frameworks for implementing recurrent neural networks (RNNs) and variants thereof are described. A design framework module obtains a flow graph for an RNN algorithm. The flow graph identifies operations to be performed to implement the RNN algorithm and further identifies data dependencies between those operations. The operations include matrix operations and vector operations. The design framework module maps the operations of the flow graph to an accelerator hardware template, yielding an accelerator instance comprising register transfer language code that describes how one or more matrix processing units (MPUs) and one or more vector processing units (VPUs) are to be arranged to perform the RNN algorithm. At least one of the MPUs, as part of implementing the RNN algorithm, is to directly provide a value to or directly receive a value from one of the VPUs.
    Type: Application
    Filed: December 28, 2021
    Publication date: April 21, 2022
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Deborah Marr
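
The core of this framework is a mapping pass: walk the flow graph, assign matrix operations to matrix processing units and vector operations to vector processing units, and let a matrix unit hand its result straight to a vector unit where the graph says so. A toy sketch of that pass, with an invented graph encoding (the real framework emits register transfer language, not tuples):

```python
# LSTM-like flow graph: (op_name, kind, inputs). Encoding is illustrative.
flow_graph = [
    ("Wx",   "matrix", ["W", "x_t"]),
    ("Uh",   "matrix", ["U", "h_prev"]),
    ("gate", "vector", ["Wx", "Uh"]),   # vector op fed directly by matrix ops
    ("h_t",  "vector", ["gate"]),
]

def map_to_template(graph):
    # Assign each operation to an MPU or VPU per its kind; a matrix op that
    # feeds a vector op models an MPU directly providing a value to a VPU.
    return [(name, "MPU" if kind == "matrix" else "VPU", inputs)
            for name, kind, inputs in graph]

for name, unit, inputs in map_to_template(flow_graph):
    print(f"{unit}: {name} <- {inputs}")
```
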
  • Patent number: 11216722
    Abstract: Hardware accelerator templates and design frameworks for implementing recurrent neural networks (RNNs) and variants thereof are described. A design framework module obtains a flow graph for an RNN algorithm. The flow graph identifies operations to be performed to implement the RNN algorithm and further identifies data dependencies between those operations. The operations include matrix operations and vector operations. The design framework module maps the operations of the flow graph to an accelerator hardware template, yielding an accelerator instance comprising register transfer language code that describes how one or more matrix processing units (MPUs) and one or more vector processing units (VPUs) are to be arranged to perform the RNN algorithm. At least one of the MPUs, as part of implementing the RNN algorithm, is to directly provide a value to or directly receive a value from one of the VPUs.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: January 4, 2022
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Deborah Marr
  • Patent number: 10915328
    Abstract: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high-speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the host processor's cores executing a parallel processing call instruction from the primary thread.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: February 9, 2021
    Assignee: Intel Corporation
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr
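
The flow here is: a primary thread runs on the host until it executes a parallel processing call instruction, at which point the cluster controller fans the secondary work out across the execution lanes. As a software analogy only (the patent describes hardware; the lane count and function names below are invented), Python threads can stand in for the lanes:

```python
# Software stand-in for offloading secondary threads to a data parallel cluster.
from concurrent.futures import ThreadPoolExecutor

NUM_LANES = 4  # assumed number of execution lanes

def secondary_thread(lane_id, data):
    # Every lane runs the same secondary work on its slice of the data.
    return sum(x * x for x in data[lane_id::NUM_LANES])

def parallel_processing_call(data):
    # Stand-in for the cluster controller dispatching the lanes.
    with ThreadPoolExecutor(max_workers=NUM_LANES) as cluster:
        partials = cluster.map(lambda lane: secondary_thread(lane, data),
                               range(NUM_LANES))
        return sum(partials)

# Primary thread: sequential setup, then the offloaded parallel region.
data = list(range(1000))
print(parallel_processing_call(data) == sum(x * x for x in data))  # -> True
```
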
  • Patent number: 10831505
    Abstract: An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.
    Type: Grant
    Filed: September 29, 2018
    Date of Patent: November 10, 2020
    Assignee: Intel Corporation
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Andrey Ayupov
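
The interesting step in this abstract is ganging: after divergence, microthreads whose instruction pointers coincide are regrouped into fragments so each fragment can issue one instruction across all lanes in lockstep. A small sketch of that regrouping, with an assumed (thread id, IP) representation:

```python
# Gang microthreads by instruction pointer into lockstep fragments.
from collections import defaultdict

# Microthreads as (thread_id, instruction_pointer) after a divergent branch.
microthreads = [(0, 0x40), (1, 0x40), (2, 0x80), (3, 0x40), (4, 0x80)]

def gang_into_fragments(threads):
    fragments = defaultdict(list)
    for tid, ip in threads:
        fragments[ip].append(tid)  # same IP -> same fragment
    # The scheduler would then issue each fragment across the lanes in turn.
    return dict(fragments)

print(gang_into_fragments(microthreads))  # {0x40: [0, 1, 3], 0x80: [2, 4]}
```
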
  • Patent number: 10776110
    Abstract: An apparatus and method for performing efficient, adaptable tensor operations.
    Type: Grant
    Filed: September 29, 2018
    Date of Patent: September 15, 2020
    Assignee: Intel Corporation
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Asit Mishra, Steven Burns, Desmond Kirkpatrick, Andrey Ayupov, Anton Alexandrovich Sorokin, Eriko Nurvitadhi
  • Publication number: 20200225948
    Abstract: An apparatus and method for compressing floating-point values.
    Type: Application
    Filed: March 28, 2020
    Publication date: July 16, 2020
    Inventors: Jaewoong Sim, Alaa Alameldeen, Eriko Nurvitadhi, Deborah Marr
  • Publication number: 20200192676
    Abstract: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high-speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the host processor's cores executing a parallel processing call instruction from the primary thread.
    Type: Application
    Filed: December 14, 2018
    Publication date: June 18, 2020
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr
  • Publication number: 20200104139
    Abstract: An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.
    Type: Application
    Filed: September 29, 2018
    Publication date: April 2, 2020
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Andrey Ayupov
  • Publication number: 20200104126
    Abstract: An apparatus and method for performing efficient, adaptable tensor operations.
    Type: Application
    Filed: September 29, 2018
    Publication date: April 2, 2020
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Asit Mishra, Steven Burns, Desmond Kirkpatrick, Andrey Ayupov, Anton Alexandrovich Sorokin, Eriko Nurvitadhi
  • Patent number: 10387037
    Abstract: Techniques for enabling enhanced parallelism for sparse linear algebra operations having write-to-read dependencies are disclosed. A hardware processor includes a plurality of processing elements, a memory that is heavily banked into a plurality of banks, and an arbiter. The arbiter is to receive requests from threads executing at the plurality of processing elements seeking to perform operations involving the memory, and to maintain a plurality of lock buffers corresponding to the plurality of banks. Each lock buffer can track up to a plurality of memory addresses within its corresponding bank that are to be treated as locked: the values stored at those addresses cannot be updated by threads other than the ones that locked them until the addresses are removed from being tracked by the lock buffers.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: August 20, 2019
    Assignee: Intel Corporation
    Inventors: Ganesh Venkatesh, Deborah Marr
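
The arbiter's job, per the abstract, is to keep a small lock buffer per memory bank: a thread locks an address, writes from other threads to that address are refused while it is tracked, and the slot frees on unlock. An illustrative software model, with invented class and method names and an assumed buffer capacity:

```python
# Toy model of per-bank lock buffers in a heavily banked memory.
NUM_BANKS = 4
LOCK_BUFFER_SIZE = 2  # assumed addresses tracked per bank

class Arbiter:
    def __init__(self):
        # One lock buffer per bank: locked address -> owning thread.
        self.lock_buffers = [{} for _ in range(NUM_BANKS)]

    def _bank(self, addr):
        return addr % NUM_BANKS

    def lock(self, thread, addr):
        buf = self.lock_buffers[self._bank(addr)]
        if addr in buf or len(buf) >= LOCK_BUFFER_SIZE:
            return False  # already locked or buffer full: caller retries
        buf[addr] = thread
        return True

    def write_allowed(self, thread, addr):
        owner = self.lock_buffers[self._bank(addr)].get(addr)
        return owner is None or owner == thread

    def unlock(self, thread, addr):
        self.lock_buffers[self._bank(addr)].pop(addr, None)

arb = Arbiter()
arb.lock(thread=0, addr=8)      # thread 0 locks address 8 (bank 0)
print(arb.write_allowed(1, 8))  # False: thread 1 must wait
arb.unlock(0, 8)
print(arb.write_allowed(1, 8))  # True: lock released
```
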
  • Patent number: 10372507
    Abstract: Techniques involving a compute engine architecture to support data-parallel loops with reduction operations are described. In some embodiments, a hardware processor includes a memory unit and a plurality of processing elements (PEs). Each PE is directly coupled via one or more neighbor-to-neighbor links with one or more neighboring PEs, so that each PE can receive a value from a neighboring PE, provide a value to a neighboring PE, or both receive a value from one neighboring PE and provide a value to another. The hardware processor also includes a control engine coupled with the plurality of PEs that is to cause the PEs to collectively perform a task to generate one or more output values, each PE performing one or more iterations of the same subtask of the task.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: August 6, 2019
    Assignee: Intel Corporation
    Inventors: Ganesh Venkatesh, Deborah Marr
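
The topology in this last abstract makes reductions cheap: each PE runs the same subtask on its share of the loop iterations, then adds the running value arriving over the link from its neighbor and passes the result on. A minimal sketch of a sum reduction over an assumed four-PE chain (slice assignment and topology are illustrative):

```python
# Data-parallel loop with a reduction over a chain of processing elements.
NUM_PES = 4
data = list(range(16))

def pe_subtask(pe_id, incoming):
    # Same subtask on every PE: reduce this PE's slice of the iterations,
    # then add the value received over the neighbor-to-neighbor link.
    return incoming + sum(data[pe_id::NUM_PES])

value = 0
for pe_id in range(NUM_PES):  # control engine stepping the PE chain
    value = pe_subtask(pe_id, value)

print(value == sum(data))  # True: matches the serial reduction
```
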