Patents by Inventor Deborah Marr

Deborah Marr has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190205746
    Abstract: An apparatus to facilitate processing of a sparse matrix for arbitrary graph data is disclosed. The apparatus includes a graphics processing unit having a data management unit (DMU) that includes a scheduler for scheduling matrix operations, an active logic for tracking active input operands, and a skip logic for tracking unimportant input operands to be skipped by the scheduler. Processing circuitry is coupled to the DMU. The processing circuitry comprises a plurality of processing elements including logic to read operands and a multiplication unit to multiply two or more operands for the arbitrary graph data.
    Type: Application
    Filed: December 29, 2017
    Publication date: July 4, 2019
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Amit Bleiweiss, Deborah Marr, Eugene Wang, Saritha Dwarakapuram, Sabareesh Ganapathy
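
The skip-logic idea in the abstract above translates naturally to software. Below is a minimal sketch, assuming zero-valued operands are the "unimportant" ones; the function name and threshold are illustrative, not taken from the patent:

```python
def sparse_dot(values_a, values_b, zero_threshold=0.0):
    """Multiply-accumulate over two operand streams, skipping pairs
    where either operand is unimportant (here: zero)."""
    acc = 0.0
    skipped = 0
    for a, b in zip(values_a, values_b):
        # "skip logic": unimportant operands never reach the multiplier
        if abs(a) <= zero_threshold or abs(b) <= zero_threshold:
            skipped += 1
            continue
        acc += a * b   # the "multiplication unit" only sees active operands
    return acc, skipped

row = [0.0, 3.0, 0.0, 0.0, 2.0]
vec = [1.0, 4.0, 5.0, 0.0, 6.0]
result, skipped = sparse_dot(row, vec)
print(result, skipped)   # 24.0 3  (three multiplies avoided)
```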
  • Publication number: 20190205737
    Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic, communicatively coupled to the processor, to perform compute operations for the neural network.
    Type: Application
    Filed: December 30, 2017
    Publication date: July 4, 2019
    Applicant: Intel Corporation
    Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
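
The abstract describes a generic processor/accelerator split. A minimal sketch of that division of labor follows; all class and method names are invented for illustration, since the abstract leaves the interface unspecified:

```python
import numpy as np

class AcceleratorLogic:
    """Stand-in for hardware that performs compute ops for the network."""
    def matmul(self, a, b):
        return a @ b              # hardware would use fixed-function units
    def relu(self, x):
        return np.maximum(x, 0.0)

class Processor:
    """Implements the neural network, offloading compute to the accelerator."""
    def __init__(self, accel, weights):
        self.accel, self.weights = accel, weights
    def forward(self, x):
        for w in self.weights:
            x = self.accel.relu(self.accel.matmul(x, w))
        return x

accel = AcceleratorLogic()
net = Processor(accel, [np.random.randn(4, 8), np.random.randn(8, 2)])
print(net.forward(np.random.randn(1, 4)).shape)   # (1, 2)
```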
  • Patent number: 10180928
    Abstract: Heterogeneous hardware accelerator architectures for processing sparse matrix data having skewed non-zero distributions are described. An accelerator includes sparse tiles to access data from a first memory over a high bandwidth interface and very/hyper sparse tiles to randomly access data from a second memory over a low-latency interface. The accelerator determines that one or more computational tasks involving a matrix are to be performed, partitions the matrix into a first plurality of blocks that includes one or more sparse sections of the matrix, and a second plurality of blocks that includes sections of the matrix that are very- or hyper-sparse. The accelerator causes the sparse tile(s) to perform one or more matrix operations for the computational task(s) using the first plurality of blocks and further causes the very/hyper sparse tile(s) to perform the one or more matrix operations for the computational task(s) using the second plurality of blocks.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: January 15, 2019
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Deborah Marr
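
A toy model of the partitioning step this patent describes, assuming a simple per-row density test; the 1% cutoff is an arbitrary stand-in, as the patent does not publish a threshold:

```python
import numpy as np

def partition_rows(matrix, density_cutoff=0.01):
    """Route moderately dense rows to the streaming "sparse" tiles and
    rows with very few non-zeros to the random-access "very/hyper
    sparse" tiles."""
    sparse_blocks, hyper_sparse_blocks = [], []
    for i, row in enumerate(matrix):
        density = np.count_nonzero(row) / row.size
        (sparse_blocks if density >= density_cutoff
         else hyper_sparse_blocks).append(i)
    return sparse_blocks, hyper_sparse_blocks

m = np.zeros((4, 1000))
m[0, :200] = 1.0      # skewed distribution: one row carries most non-zeros
m[1, 5] = 1.0         # hyper-sparse row
m[2, :50] = 1.0
sparse, hyper = partition_rows(m)
print(sparse, hyper)  # [0, 2] [1, 3]
```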
  • Publication number: 20190012295
    Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. The external memory bandwidth requirement may be reduced by a factor determined by the interleaving of the matrix data via the feeding pattern of the column feeder array and the row feeder array.
    Type: Application
    Filed: July 7, 2017
    Publication date: January 10, 2019
    Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
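
To make the systolic data flow concrete, here is a cycle-level software sketch of an output-stationary systolic SGEMM: row and column feeders inject skewed operand streams at the array edges, and each value read once from external memory is reused across a whole row or column of PEs, which is where the bandwidth reduction comes from. The skew and register layout are one common choice, not necessarily the patent's:

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an n-by-m grid of PEs computing C = A @ B.
    A values flow left-to-right, B values top-to-bottom; PE(i, j)
    accumulates C[i, j] in place (output-stationary)."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    a_reg = np.zeros((n, m))   # per-PE register holding the A operand
    b_reg = np.zeros((n, m))   # per-PE register holding the B operand
    for t in range(n + m + k - 2):            # enough cycles to drain the array
        a_reg[:, 1:] = a_reg[:, :-1].copy()   # shift A operands right
        b_reg[1:, :] = b_reg[:-1, :].copy()   # shift B operands down
        for i in range(n):   # row feeders: row i is skewed by i cycles
            a_reg[i, 0] = A[i, t - i] if 0 <= t - i < k else 0.0
        for j in range(m):   # column feeders: column j is skewed by j cycles
            b_reg[0, j] = B[t - j, j] if 0 <= t - j < k else 0.0
        C += a_reg * b_reg   # every PE multiplies and accumulates
    return C

A, B = np.random.randn(3, 5), np.random.randn(5, 4)
print(np.allclose(systolic_matmul(A, B), A @ B))   # True
```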
  • Patent number: 10146738
    Abstract: An accelerator architecture for processing very-sparse and hyper-sparse matrix data is disclosed. A hardware accelerator comprises one or more tiles, each including a plurality of processing elements (PEs) and a data management unit (DMU). The PEs are to perform matrix operations involving very- or hyper-sparse matrices that are stored by a memory. The DMU is to provide the plurality of PEs access to the memory via an interface that is optimized to provide low-latency, parallel, random accesses to the memory. The PEs, via the DMU, perform the matrix operations by issuing random access read requests for values of the one or more matrices, issuing random access read requests for values of one or more vectors serving as a second operand, and issuing random access write requests for values of one or more vectors serving as a result.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Deborah Marr
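
The access pattern this patent claims (random reads for operands, random writes for results) is exactly what a CSR sparse matrix-vector multiply does in software. A sketch follows, with the DMU modeled as a counter-wrapped memory; the class and variable names are illustrative:

```python
class DMU:
    """Models the low-latency random-access memory interface."""
    def __init__(self, backing):
        self.mem = backing
        self.random_reads = 0
        self.random_writes = 0
    def read(self, addr):
        self.random_reads += 1
        return self.mem[addr]
    def write(self, addr, value):
        self.random_writes += 1
        self.mem[addr] = value

def spmv_csr(row_ptr, col_idx, vals, dmu_x, dmu_y):
    """y = M @ x for a CSR matrix M; vector accesses go through the DMU."""
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for nz in range(row_ptr[row], row_ptr[row + 1]):
            # random-access read of the second-operand vector
            acc += vals[nz] * dmu_x.read(col_idx[nz])
        dmu_y.write(row, acc)   # random-access write of the result vector

# 3x4 hyper-sparse matrix with 3 non-zeros, in CSR form
row_ptr, col_idx, vals = [0, 1, 1, 3], [2, 0, 3], [5.0, 1.0, 2.0]
x = DMU([1.0, 2.0, 3.0, 4.0])
y = DMU([0.0, 0.0, 0.0])
spmv_csr(row_ptr, col_idx, vals, x, y)
print(y.mem, x.random_reads)   # [15.0, 0.0, 9.0] 3
```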
  • Publication number: 20180189234
    Abstract: An accelerator architecture for processing very-sparse and hyper-sparse matrix data is disclosed. A hardware accelerator comprises one or more tiles, each including a plurality of processing elements (PEs) and a data management unit (DMU). The PEs are to perform matrix operations involving very- or hyper-sparse matrices that are stored by a memory. The DMU is to provide the plurality of PEs access to the memory via an interface that is optimized to provide low-latency, parallel, random accesses to the memory. The PEs, via the DMU, perform the matrix operations by issuing random access read requests for values of the one or more matrices, issuing random access read requests for values of one or more vectors serving as a second operand, and issuing random access write requests for values of one or more vectors serving as a result.
    Type: Application
    Filed: December 31, 2016
    Publication date: July 5, 2018
    Inventors: Eriko Nurvitadhi, Deborah Marr
  • Publication number: 20180189675
    Abstract: Hardware accelerator architectures for clustering are described. A hardware accelerator includes sparse tiles and very/hyper sparse tiles. The sparse tile(s) execute operations for a clustering task involving a matrix. Each sparse tile includes a first plurality of processing units to operate upon a first plurality of blocks of the matrix that have been streamed to one or more random access memories of the sparse tiles over a high bandwidth interface from a first memory unit. Each of the very/hyper sparse tiles is to execute operations for the clustering task involving the matrix. Each of the very/hyper sparse tiles includes a second plurality of processing units to operate upon a second plurality of blocks of the matrix that have been randomly accessed over a low-latency interface from a second memory unit.
    Type: Application
    Filed: December 31, 2016
    Publication date: July 5, 2018
    Inventors: Eriko Nurvitadhi, Ganesh Venkatesh, Srivatsan Krishnan, Suchit Subhaschandra, Deborah Marr
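
The clustering math itself is conventional; what is distinctive is the two access modes. The sketch below computes point-to-centroid distances both ways the abstract suggests: dense-ish blocks are streamed whole, while very sparse blocks touch only their non-zeros via random accesses, using the identity ||x - c||^2 = ||c||^2 + sum over non-zero j of ((x_j - c_j)^2 - c_j^2). The 25% density cutoff and block layout are invented for the example:

```python
import numpy as np

def distances_to_centroid(blocks, centroid):
    """blocks: list of (row_indices, dense 2-D block). Returns {row: distance}."""
    out = {}
    base = float(centroid @ centroid)                  # ||c||^2, computed once
    for rows, sub in blocks:
        if np.count_nonzero(sub) / sub.size >= 0.25:
            # "sparse tile" path: stream the whole block (high bandwidth)
            d = np.linalg.norm(sub - centroid, axis=1)
        else:
            # "very/hyper sparse tile" path: gather only the non-zeros
            d = np.empty(sub.shape[0])
            for r in range(sub.shape[0]):
                cols = np.nonzero(sub[r])[0]           # random-access gathers
                adj = sum((sub[r, j] - centroid[j]) ** 2 - centroid[j] ** 2
                          for j in cols)
                d[r] = np.sqrt(max(base + adj, 0.0))
        out.update(zip(rows, d))
    return out

c = np.ones(12)
dense = np.zeros((2, 12)); dense[:, :6] = 2.0          # density 0.5: streamed
sparse = np.zeros((1, 12)); sparse[0, 3] = 5.0         # density ~0.08: gathered
print(distances_to_centroid([(np.array([0, 1]), dense),
                             (np.array([2]), sparse)], c))
```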
  • Publication number: 20180189638
    Abstract: Hardware accelerator templates and design frameworks for implementing recurrent neural networks (RNNs) and variants thereof are described. A design framework module obtains a flow graph for an RNN algorithm. The flow graph identifies operations to be performed to implement the RNN algorithm and further identifies data dependencies between ones of the operations. The operations include matrix operations and vector operations. The design framework module maps the operations of the flow graph to an accelerator hardware template, yielding an accelerator instance comprising register transfer language code that describes how one or more matrix processing units (MPUs) and one or more vector processing units (VPUs) are to be arranged to perform the RNN algorithm. At least one of the one or more MPUs, as part of implementing the RNN algorithm, is to directly provide a value to, or directly receive a value from, one of the one or more VPUs.
    Type: Application
    Filed: December 31, 2016
    Publication date: July 5, 2018
    Inventors: Eriko Nurvitadhi, Deborah Marr
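
A toy version of the design-framework flow described above: an RNN cell h_t = tanh(Wx @ x_t + Wh @ h_prev + b) written as a flow graph of matrix and vector operations, then mapped onto MPUs and VPUs. The real framework emits register transfer language; this sketch only prints the assignment, and every node and unit name in it is invented:

```python
# h_t = tanh(Wx @ x_t + Wh @ h_prev + b) as a dependency graph
flow_graph = {
    "mm1":  {"op": "matmul", "inputs": ["Wx", "x_t"]},
    "mm2":  {"op": "matmul", "inputs": ["Wh", "h_prev"]},
    "add1": {"op": "vadd",   "inputs": ["mm1", "mm2"]},
    "add2": {"op": "vadd",   "inputs": ["add1", "b"]},
    "h_t":  {"op": "vtanh",  "inputs": ["add2"]},
}

MATRIX_OPS = {"matmul"}

def map_to_template(graph):
    """Assign each op to an MPU or VPU; a matrix-op output consumed by a
    vector op becomes a direct MPU->VPU link, as in the abstract."""
    mapping = []
    for name, node in graph.items():
        unit = "MPU" if node["op"] in MATRIX_OPS else "VPU"
        direct = [i for i in node["inputs"]
                  if i in graph and graph[i]["op"] in MATRIX_OPS]
        mapping.append((name, node["op"], unit, direct))
    return mapping

for name, op, unit, direct_links in map_to_template(flow_graph):
    links = f" (direct from MPU: {direct_links})" if direct_links else ""
    print(f"{name}: {op} -> {unit}{links}")
```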
  • Publication number: 20180189110
    Abstract: Techniques involving a compute engine architecture to support data-parallel loops with reduction operations are described. In some embodiments, a hardware processor includes a memory unit and a plurality of processing elements (PEs). Each of the PEs is directly coupled via one or more neighbor-to-neighbor links with one or more neighboring PEs so that each PE can receive a value from a neighboring PE, provide a value to a neighboring PE, or both receive a value from one neighboring PE and also provide a value to another neighboring PE. The hardware processor also includes a control engine coupled with the plurality of PEs that is to cause the plurality of PEs to collectively perform a task to generate one or more output values by each performing one or more iterations of a same subtask of the task.
    Type: Application
    Filed: December 31, 2016
    Publication date: July 5, 2018
    Inventors: Ganesh Venkatesh, Deborah Marr
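
A software sketch of the pattern this publication describes: every PE runs iterations of the same subtask on its share of the data, and the partial results then flow over neighbor-to-neighbor links, so each PE only ever exchanges values with an adjacent PE. The chunking scheme is an assumption made for the example:

```python
def parallel_sum(data, num_pes=4):
    """Reduce a list to one value using a chain of PEs."""
    # each PE performs its iterations of the same subtask on its slice
    chunk = (len(data) + num_pes - 1) // num_pes
    partials = [sum(data[pe * chunk:(pe + 1) * chunk]) for pe in range(num_pes)]
    # reduction over neighbor-to-neighbor links: PE i hands its running
    # total to PE i+1, which adds its own partial and passes it on
    running = 0
    for pe in range(num_pes):
        running = running + partials[pe]   # value received from left neighbor
    return running                         # final PE produces the output value

print(parallel_sum(list(range(100))))      # 4950
```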
  • Publication number: 20180188961
    Abstract: Techniques for enabling enhanced parallelism for sparse linear algebra operations having write-to-read dependencies are disclosed. A hardware processor includes a plurality of processing elements, a memory that is heavily banked into a plurality of banks, and an arbiter. The arbiter is to receive requests from threads executing at the plurality of processing elements seeking to perform operations involving the memory, and to maintain a plurality of lock buffers corresponding to the plurality of banks. Each of the lock buffers can track up to a plurality of memory addresses within the corresponding bank that are to be treated as locked, in that the values stored at those memory addresses cannot be updated by threads that did not cause the addresses to be locked until those addresses have been removed from being tracked by the lock buffers.
    Type: Application
    Filed: December 31, 2016
    Publication date: July 5, 2018
    Inventors: Ganesh Venkatesh, Deborah Marr
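
A minimal software model of the arbiter and per-bank lock buffers, assuming address-mod-N banking and a small fixed lock-buffer capacity (both are illustrative choices the abstract leaves open):

```python
NUM_BANKS = 8
LOCK_BUFFER_SIZE = 4          # max tracked addresses per bank (assumed)

class Arbiter:
    def __init__(self):
        # one lock buffer per bank: addr -> owning thread id
        self.lock_buffers = [{} for _ in range(NUM_BANKS)]
    def bank_of(self, addr):
        return addr % NUM_BANKS
    def try_lock(self, thread_id, addr):
        buf = self.lock_buffers[self.bank_of(addr)]
        if addr in buf and buf[addr] != thread_id:
            return False              # locked by another thread: stall
        if len(buf) >= LOCK_BUFFER_SIZE and addr not in buf:
            return False              # lock buffer full: retry later
        buf[addr] = thread_id
        return True
    def unlock(self, thread_id, addr):
        buf = self.lock_buffers[self.bank_of(addr)]
        if buf.get(addr) == thread_id:
            del buf[addr]

arb = Arbiter()
print(arb.try_lock(0, 42))   # True: thread 0 locks address 42
print(arb.try_lock(1, 42))   # False: thread 1 must wait (write-to-read dep)
arb.unlock(0, 42)
print(arb.try_lock(1, 42))   # True: lock released, thread 1 proceeds
```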
  • Publication number: 20180189239
    Abstract: Heterogeneous hardware accelerator architectures for processing sparse matrix data having skewed non-zero distributions are described. An accelerator includes sparse tiles to access data from a first memory over a high bandwidth interface and very/hyper sparse tiles to randomly access data from a second memory over a low-latency interface. The accelerator determines that one or more computational tasks involving a matrix are to be performed, partitions the matrix into a first plurality of blocks that includes one or more sparse sections of the matrix, and a second plurality of blocks that includes sections of the matrix that are very- or hyper-sparse. The accelerator causes the sparse tile(s) to perform one or more matrix operations for the computational task(s) using the first plurality of blocks and further causes the very/hyper sparse tile(s) to perform the one or more matrix operations for the computational task(s) using the second plurality of blocks.
    Type: Application
    Filed: December 31, 2016
    Publication date: July 5, 2018
    Inventors: Eriko Nurvitadhi, Deborah Marr
  • Publication number: 20080034190
    Abstract: Techniques for suspending execution of a thread until a specified memory access occurs. In one embodiment, a processor includes multiple execution units capable of executing multiple threads. A first thread includes an instruction that specifies a monitor address. Suspend logic suspends execution of the first thread, and a monitor causes resumption of the first thread in response to an access to the specified monitor address.
    Type: Application
    Filed: August 8, 2007
    Publication date: February 7, 2008
    Inventors: Dion Rodgers, Deborah Marr, David Hill, Shiv Kaushik, James Crossland, David Koufaty
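
The behavior described here matches what Intel processors expose as the MONITOR/MWAIT instruction pair. As a user-level analogy (not the hardware implementation), a condition variable can play the role of the monitor on a single "address":

```python
import threading

class MonitoredCell:
    """One memory location with monitor-style wait/wake semantics."""
    def __init__(self, value=0):
        self.value = value
        self._cond = threading.Condition()
    def mwait(self, old):
        # suspend this thread until the monitored location is written
        with self._cond:
            while self.value == old:
                self._cond.wait()
            return self.value
    def store(self, value):
        # a write to the monitored address resumes any waiting thread
        with self._cond:
            self.value = value
            self._cond.notify_all()

cell = MonitoredCell()
t = threading.Thread(target=lambda: print("woke, saw", cell.mwait(0)))
t.start()
cell.store(7)    # the write to the monitored address wakes the waiter
t.join()
```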