Patents by Inventor Deborah Marr

Deborah Marr has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230359695
    Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. External memory bandwidth may be reduced by a reduction factor based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
    Type: Application
    Filed: July 17, 2023
    Publication date: November 9, 2023
    Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
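
The abstract above turns on one idea: the row and column feeders fetch a block of matrix data from external memory once, then replay it into the PE array several times according to the feeding pattern. A minimal software sketch of that reuse, assuming an invented `Feeder` class and an interleave factor of 4 (neither is from the patent), shows the bandwidth-reduction factor directly:

```python
# Illustrative sketch only: a feeder that buffers one vector fetched from
# external memory and replays it, so several feeds share one external read.
INTERLEAVE = 4  # assumed reuse factor of the feeding pattern

class Feeder:
    def __init__(self):
        self.buffer = None
        self.external_reads = 0

    def feed(self, fetch_vector, step):
        # Hit external memory only once per INTERLEAVE feeds; otherwise
        # replay the locally buffered data into the PE array.
        if step % INTERLEAVE == 0:
            self.buffer = fetch_vector()
            self.external_reads += 1
        return self.buffer

row_feeder, col_feeder = Feeder(), Feeder()
A_row, B_col = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]

steps, acc = 8, 0.0
for step in range(steps):
    a = row_feeder.feed(lambda: A_row, step)
    b = col_feeder.feed(lambda: B_col, step)
    acc += sum(x * y for x, y in zip(a, b))  # PE multiply-accumulate

# Feeds served per external read: the factor of bandwidth reduction.
print(steps / row_feeder.external_reads)  # -> 4.0
```
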
  • Publication number: 20230316058
    Abstract: An apparatus to facilitate processing of a sparse matrix for arbitrary graph data is disclosed. The apparatus includes a graphics processing unit having a data management unit (DMU) that includes a scheduler for scheduling matrix operations, an active logic for tracking active input operands, and a skip logic for tracking unimportant input operands to be skipped by the scheduler. Processing circuitry is coupled to the DMU. The processing circuitry comprises a plurality of processing elements including logic to read operands and a multiplication unit to multiply two or more operands for the arbitrary graph data and customizable circuitry to provide custom functions.
    Type: Application
    Filed: April 19, 2023
    Publication date: October 5, 2023
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Amit Bleiweiss, Deborah Marr, Eugene Wang, Saritha Dwarakapuram, Sabareesh Ganapathy
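
The mechanism in this abstract is easiest to see as a scheduling filter: skip logic tags operands that contribute nothing (zeros, in a sparse matrix), and the scheduler dispatches only the remaining active pairs to the multipliers. A hedged Python sketch of that filtering, with hypothetical names and list-based operands standing in for hardware operand tags:

```python
# Sketch of skip/active tracking for sparse operands (names invented).
def schedule_multiplies(a_operands, b_operands):
    # Skip logic: mark unimportant (zero) operand positions.
    skip = {i for i, (a, b) in enumerate(zip(a_operands, b_operands))
            if a == 0 or b == 0}
    # Active logic: everything not skipped stays schedulable.
    active = [i for i in range(len(a_operands)) if i not in skip]
    # Scheduler: dispatch only active pairs to the multiplication units.
    return sum(a_operands[i] * b_operands[i] for i in active)

# Sparse row times dense column: only 2 of 6 multiplies are dispatched.
print(schedule_multiplies([0, 3, 0, 0, 5, 0], [7, 2, 9, 1, 4, 8]))  # -> 26
```
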
  • Patent number: 11636327
    Abstract: An apparatus to facilitate processing of a sparse matrix for arbitrary graph data is disclosed. The apparatus includes a graphics processing unit having a data management unit (DMU) that includes a scheduler for scheduling matrix operations, an active logic for tracking active input operands, and a skip logic for tracking unimportant input operands to be skipped by the scheduler. Processing circuitry is coupled to the DMU. The processing circuitry comprises a plurality of processing elements including logic to read operands and a multiplication unit to multiply two or more operands for the arbitrary graph data.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: April 25, 2023
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Amit Bleiweiss, Deborah Marr, Eugene Wang, Saritha Dwarakapuram, Sabareesh Ganapathy
  • Publication number: 20230064381
    Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. External memory bandwidth may be reduced by a reduction factor based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
    Type: Application
    Filed: May 9, 2022
    Publication date: March 2, 2023
    Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
  • Publication number: 20230053289
    Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
    Type: Application
    Filed: June 21, 2022
    Publication date: February 16, 2023
    Applicant: Intel Corporation
    Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
  • Patent number: 11416248
    Abstract: An apparatus and method for compressing floating-point values.
    Type: Grant
    Filed: March 28, 2020
    Date of Patent: August 16, 2022
    Assignee: Intel Corporation
    Inventors: Jaewoong Sim, Alaa Alameldeen, Eriko Nurvitadhi, Deborah Marr
  • Patent number: 11373088
    Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
    Type: Grant
    Filed: December 30, 2017
    Date of Patent: June 28, 2022
    Assignee: Intel Corporation
    Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
  • Patent number: 11328037
    Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. External memory bandwidth may be reduced by a reduction factor based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
    Type: Grant
    Filed: July 7, 2017
    Date of Patent: May 10, 2022
    Assignee: Intel Corporation
    Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
  • Publication number: 20220129763
    Abstract: An embodiment of an integrated circuit may comprise a front end unit, and circuitry coupled to the front end unit, the circuitry to provide a high-confidence, multiple-branch offset predictor. For example, the circuitry may be configured to identify an entry in a multiple-taken-branch prediction table that corresponds to a conditional branch instruction, determine whether a confidence level of the entry exceeds a threshold confidence level, and, if so, provide multiple taken-branch predictions that stem from the conditional branch instruction from the entry in the multiple-taken-branch prediction table. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 28, 2022
    Applicant: Intel Corporation
    Inventors: Sumeet Bandishte, Jayesh Gaur, Polychronis Xekalakis, Ariel Sabba, Deborah Marr, Sreenivas Subramoney
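
The lookup the abstract describes has three steps: find the table entry for a conditional branch, compare its confidence against a threshold, and, if it clears the bar, emit the whole stored run of taken-branch predictions at once. A minimal sketch under assumed structures (the table layout, threshold value, and fallback policy are all invented here):

```python
# Hypothetical multiple-taken-branch prediction table: IP -> (confidence, targets).
CONFIDENCE_THRESHOLD = 3  # assumed saturating-counter threshold

mtb_table = {
    0x401000: (4, [0x401040, 0x401080, 0x4010C0]),  # high confidence
    0x402000: (1, [0x402020]),                      # low confidence
}

def predict(branch_ip):
    entry = mtb_table.get(branch_ip)
    if entry is None:
        return None              # no entry: defer to a baseline predictor
    confidence, targets = entry
    if confidence > CONFIDENCE_THRESHOLD:
        return targets           # multiple taken-branch predictions in one shot
    return targets[:1]           # not confident enough: predict one branch only

print(predict(0x401000))  # all three targets from the entry
print(predict(0x402000))  # just the first
```
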
  • Publication number: 20220121917
    Abstract: Hardware accelerator templates and design frameworks for implementing recurrent neural networks (RNNs) and variants thereof are described. A design framework module obtains a flow graph for an RNN algorithm. The flow graph identifies operations to be performed to implement the RNN algorithm and further identifies data dependencies between those operations. The operations include matrix operations and vector operations. The design framework module maps the operations of the flow graph to an accelerator hardware template, yielding an accelerator instance comprising register transfer language code that describes how one or more matrix processing units (MPUs) and one or more vector processing units (VPUs) are to be arranged to perform the RNN algorithm. At least one of the MPUs, as part of implementing the RNN algorithm, is to directly provide a value to or directly receive a value from one of the VPUs.
    Type: Application
    Filed: December 28, 2021
    Publication date: April 21, 2022
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Deborah Marr
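
The core of this framework is a mapping pass: walk the flow graph, assign matrix operations to matrix processing units and vector operations to vector processing units, and let a matrix unit hand its result straight to a vector unit where the graph says so. A toy sketch of that pass, with an invented graph encoding (the real framework emits register transfer language, not tuples):

```python
# LSTM-like flow graph: (op_name, kind, inputs). Encoding is illustrative.
flow_graph = [
    ("Wx",   "matrix", ["W", "x_t"]),
    ("Uh",   "matrix", ["U", "h_prev"]),
    ("gate", "vector", ["Wx", "Uh"]),   # vector op fed directly by matrix ops
    ("h_t",  "vector", ["gate"]),
]

def map_to_template(graph):
    # Assign each operation to an MPU or VPU per its kind; a matrix op that
    # feeds a vector op models an MPU directly providing a value to a VPU.
    return [(name, "MPU" if kind == "matrix" else "VPU", inputs)
            for name, kind, inputs in graph]

for name, unit, inputs in map_to_template(flow_graph):
    print(f"{unit}: {name} <- {inputs}")
```
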
  • Patent number: 11216722
    Abstract: Hardware accelerator templates and design frameworks for implementing recurrent neural networks (RNNs) and variants thereof are described. A design framework module obtains a flow graph for an RNN algorithm. The flow graph identifies operations to be performed to implement the RNN algorithm and further identifies data dependencies between those operations. The operations include matrix operations and vector operations. The design framework module maps the operations of the flow graph to an accelerator hardware template, yielding an accelerator instance comprising register transfer language code that describes how one or more matrix processing units (MPUs) and one or more vector processing units (VPUs) are to be arranged to perform the RNN algorithm. At least one of the MPUs, as part of implementing the RNN algorithm, is to directly provide a value to or directly receive a value from one of the VPUs.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: January 4, 2022
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Deborah Marr
  • Patent number: 10915328
    Abstract: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high-speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the host processor's cores executing a parallel processing call instruction from the primary thread.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: February 9, 2021
    Assignee: Intel Corporation
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr
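
The flow here is: a primary thread runs on the host until it executes a parallel processing call instruction, at which point the cluster controller fans the secondary work out across the execution lanes. As a software analogy only (the patent describes hardware; the lane count and function names below are invented), Python threads can stand in for the lanes:

```python
# Software stand-in for offloading secondary threads to a data parallel cluster.
from concurrent.futures import ThreadPoolExecutor

NUM_LANES = 4  # assumed number of execution lanes

def secondary_thread(lane_id, data):
    # Every lane runs the same secondary work on its slice of the data.
    return sum(x * x for x in data[lane_id::NUM_LANES])

def parallel_processing_call(data):
    # Stand-in for the cluster controller dispatching the lanes.
    with ThreadPoolExecutor(max_workers=NUM_LANES) as cluster:
        partials = cluster.map(lambda lane: secondary_thread(lane, data),
                               range(NUM_LANES))
        return sum(partials)

# Primary thread: sequential setup, then the offloaded parallel region.
data = list(range(1000))
print(parallel_processing_call(data) == sum(x * x for x in data))  # -> True
```
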
  • Patent number: 10831505
    Abstract: An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.
    Type: Grant
    Filed: September 29, 2018
    Date of Patent: November 10, 2020
    Assignee: Intel Corporation
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Andrey Ayupov
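
The interesting step in this abstract is ganging: after divergence, microthreads whose instruction pointers coincide are regrouped into fragments so each fragment can issue one instruction across all lanes in lockstep. A small sketch of that regrouping, with an assumed (thread id, IP) representation:

```python
# Gang microthreads by instruction pointer into lockstep fragments.
from collections import defaultdict

# Microthreads as (thread_id, instruction_pointer) after a divergent branch.
microthreads = [(0, 0x40), (1, 0x40), (2, 0x80), (3, 0x40), (4, 0x80)]

def gang_into_fragments(threads):
    fragments = defaultdict(list)
    for tid, ip in threads:
        fragments[ip].append(tid)  # same IP -> same fragment
    # The scheduler would then issue each fragment across the lanes in turn.
    return dict(fragments)

print(gang_into_fragments(microthreads))  # {0x40: [0, 1, 3], 0x80: [2, 4]}
```
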
  • Patent number: 10776110
    Abstract: An apparatus and method for performing efficient, adaptable tensor operations.
    Type: Grant
    Filed: September 29, 2018
    Date of Patent: September 15, 2020
    Assignee: Intel Corporation
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Asit Mishra, Steven Burns, Desmond Kirkpatrick, Andrey Ayupov, Anton Alexandrovich Sorokin, Eriko Nurvitadhi
  • Publication number: 20200225948
    Abstract: An apparatus and method for compressing floating-point values.
    Type: Application
    Filed: March 28, 2020
    Publication date: July 16, 2020
    Inventors: Jaewoong Sim, Alaa Alameldeen, Eriko Nurvitadhi, Deborah Marr
  • Publication number: 20200192676
    Abstract: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high-speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the host processor's cores executing a parallel processing call instruction from the primary thread.
    Type: Application
    Filed: December 14, 2018
    Publication date: June 18, 2020
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr
  • Publication number: 20200104139
    Abstract: An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.
    Type: Application
    Filed: September 29, 2018
    Publication date: April 2, 2020
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Andrey Ayupov
  • Publication number: 20200104126
    Abstract: An apparatus and method for performing efficient, adaptable tensor operations.
    Type: Application
    Filed: September 29, 2018
    Publication date: April 2, 2020
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Asit Mishra, Steven Burns, Desmond Kirkpatrick, Andrey Ayupov, Anton Alexandrovich Sorokin, Eriko Nurvitadhi
  • Patent number: 10387037
    Abstract: Techniques for enabling enhanced parallelism for sparse linear algebra operations having write-to-read dependencies are disclosed. A hardware processor includes a plurality of processing elements, a memory that is heavily banked into a plurality of banks, and an arbiter. The arbiter is to receive requests from threads executing at the plurality of processing elements seeking to perform operations involving the memory, and to maintain a plurality of lock buffers corresponding to the plurality of banks. Each lock buffer can track up to a plurality of memory addresses within its corresponding bank that are to be treated as locked: the values stored at those addresses cannot be updated by threads other than the ones that locked them until the addresses are removed from being tracked by the lock buffers.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: August 20, 2019
    Assignee: Intel Corporation
    Inventors: Ganesh Venkatesh, Deborah Marr
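
The arbiter's job, per the abstract, is to keep a small lock buffer per memory bank: a thread locks an address, writes from other threads to that address are refused while it is tracked, and the slot frees on unlock. An illustrative software model, with invented class and method names and an assumed buffer capacity:

```python
# Toy model of per-bank lock buffers in a heavily banked memory.
NUM_BANKS = 4
LOCK_BUFFER_SIZE = 2  # assumed addresses tracked per bank

class Arbiter:
    def __init__(self):
        # One lock buffer per bank: locked address -> owning thread.
        self.lock_buffers = [{} for _ in range(NUM_BANKS)]

    def _bank(self, addr):
        return addr % NUM_BANKS

    def lock(self, thread, addr):
        buf = self.lock_buffers[self._bank(addr)]
        if addr in buf or len(buf) >= LOCK_BUFFER_SIZE:
            return False  # already locked or buffer full: caller retries
        buf[addr] = thread
        return True

    def write_allowed(self, thread, addr):
        owner = self.lock_buffers[self._bank(addr)].get(addr)
        return owner is None or owner == thread

    def unlock(self, thread, addr):
        self.lock_buffers[self._bank(addr)].pop(addr, None)

arb = Arbiter()
arb.lock(thread=0, addr=8)      # thread 0 locks address 8 (bank 0)
print(arb.write_allowed(1, 8))  # False: thread 1 must wait
arb.unlock(0, 8)
print(arb.write_allowed(1, 8))  # True: lock released
```
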
  • Patent number: 10372507
    Abstract: Techniques involving a compute engine architecture to support data-parallel loops with reduction operations are described. In some embodiments, a hardware processor includes a memory unit and a plurality of processing elements (PEs). Each PE is directly coupled via one or more neighbor-to-neighbor links with one or more neighboring PEs, so that each PE can receive a value from a neighboring PE, provide a value to a neighboring PE, or both receive a value from one neighboring PE and provide a value to another. The hardware processor also includes a control engine coupled with the plurality of PEs that is to cause the PEs to collectively perform a task to generate one or more output values, each PE performing one or more iterations of the same subtask of the task.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: August 6, 2019
    Assignee: Intel Corporation
    Inventors: Ganesh Venkatesh, Deborah Marr
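
The topology in this last abstract makes reductions cheap: each PE runs the same subtask on its share of the loop iterations, then adds the running value arriving over the link from its neighbor and passes the result on. A minimal sketch of a sum reduction over an assumed four-PE chain (slice assignment and topology are illustrative):

```python
# Data-parallel loop with a reduction over a chain of processing elements.
NUM_PES = 4
data = list(range(16))

def pe_subtask(pe_id, incoming):
    # Same subtask on every PE: reduce this PE's slice of the iterations,
    # then add the value received over the neighbor-to-neighbor link.
    return incoming + sum(data[pe_id::NUM_PES])

value = 0
for pe_id in range(NUM_PES):  # control engine stepping the PE chain
    value = pe_subtask(pe_id, value)

print(value == sum(data))  # True: matches the serial reduction
```
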