Patents by Inventor Joel Emer
Joel Emer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11966835
Abstract: A sparse convolutional neural network accelerator system that dynamically and efficiently identifies fine-grained parallelism in sparse convolution operations. The system determines matching pairs of non-zero input activations and weights from the compacted input activation and weight arrays utilizing a scalable, dynamic parallelism discovery unit (PDU) that performs a parallel search on the input activation array and the weight array to identify reducible input activation and weight pairs.
Type: Grant
Filed: January 23, 2019
Date of Patent: April 23, 2024
Assignee: NVIDIA Corp.
Inventors: Ching-En Lee, Yakun Shao, Angshuman Parashar, Joel Emer, Stephen W. Keckler
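The matching step the abstract describes can be illustrated with a short sketch (not the patented hardware, and all names here are hypothetical): pair up non-zero activations and weights that share an input-channel coordinate, the kind of "reducible pair" a parallelism discovery unit would search for in the compacted arrays.

```python
# Illustrative sketch: find multiplyable (activation, weight) pairs from
# compacted sparse arrays by matching on a shared channel coordinate.

def find_reducible_pairs(activations, weights):
    """activations/weights: lists of (channel, value) for non-zero entries."""
    by_channel = {}
    for c, v in weights:
        by_channel.setdefault(c, []).append(v)
    pairs = []
    for c, a in activations:
        for w in by_channel.get(c, []):
            pairs.append((a, w))  # each pair feeds one multiplier
    return pairs

acts = [(0, 1.5), (2, -2.0)]           # compacted: channels 0 and 2 non-zero
wts  = [(0, 0.5), (1, 3.0), (2, 4.0)]  # channels 0, 1, 2 non-zero
print(find_reducible_pairs(acts, wts))  # → [(1.5, 0.5), (-2.0, 4.0)]
```

A hardware PDU would perform this search in parallel across many entries at once; the sequential loop above only shows which pairs qualify.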
-
Publication number: 20220076110
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
Type: Application
Filed: November 19, 2021
Publication date: March 10, 2022
Applicant: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
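The stationary data flow the abstract mentions can be sketched as follows (a hedged illustration; the function and variable names are invented, not from the patent): each processing element keeps a weight vector resident in its weight buffer while activation vectors stream past, accumulating into a running sum.

```python
# Sketch of a weight-stationary vector multiply-accumulate: the weights
# are loaded once and reused against every activation vector that arrives.

def pe_vector_mac(stationary_weights, activation_stream):
    acc = 0.0
    for activations in activation_stream:      # activations move; weights stay
        acc += sum(w * a for w, a in zip(stationary_weights, activations))
    return acc

weights = [1.0, 2.0, 3.0]              # loaded once into the PE's weight buffer
stream = [[1, 1, 1], [0, 1, 0]]        # two activation vectors streamed in
print(pe_vector_mac(weights, stream))  # → 8.0
```

Keeping the weights stationary is what lets the hardware amortize one weight-buffer read across many multiply-accumulates, which is the point of the data flow described above.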
-
Patent number: 11270197
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
Type: Grant
Filed: November 4, 2019
Date of Patent: March 8, 2022
Assignee: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
-
Publication number: 20200293867
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
Type: Application
Filed: November 4, 2019
Publication date: September 17, 2020
Applicant: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
-
Publication number: 20190370645
Abstract: A sparse convolutional neural network accelerator system that dynamically and efficiently identifies fine-grained parallelism in sparse convolution operations. The system determines matching pairs of non-zero input activations and weights from the compacted input activation and weight arrays utilizing a scalable, dynamic parallelism discovery unit (PDU) that performs a parallel search on the input activation array and the weight array to identify reducible input activation and weight pairs.
Type: Application
Filed: January 23, 2019
Publication date: December 5, 2019
Inventors: Ching-En Lee, Yakun Shao, Angshuman Parashar, Joel Emer, Stephen W. Keckler
-
Patent number: 9740617
Abstract: Methods and apparatuses to control cache line coherence are described. A hardware processor may include a first processor core with a cache to store a cache line, a second set of processor cores that each include a cache to store a copy of the cache line, and cache coherence logic to aggregate in a tag directory an acknowledgment message from each of the second set of processor cores in response to a request from the first processor core to modify the copy of the cache line in each of the second set of processor cores and send a consolidated acknowledgment message to the first processor core.
Type: Grant
Filed: December 23, 2014
Date of Patent: August 22, 2017
Assignee: Intel Corporation
Inventors: Samantika Sury, Simon Steely, Jr., William Hasenplaugh, Joel Emer, David Webb
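A small model can illustrate the aggregation described above (the class and method names are hypothetical, purely for illustration): a tag directory collects one invalidation acknowledgment per sharing core and forwards a single consolidated acknowledgment to the requesting core, rather than having every sharer message the requester directly.

```python
# Illustrative model: the directory tracks which sharers still owe an ack;
# the requester hears nothing until the last ack arrives.

class TagDirectory:
    def __init__(self, sharers):
        self.pending = set(sharers)    # cores that must still acknowledge

    def receive_ack(self, core_id):
        self.pending.discard(core_id)
        # True → all sharers acked, send one consolidated ack to the requester
        return len(self.pending) == 0

directory = TagDirectory(sharers={1, 2, 3})
print(directory.receive_ack(1))  # → False
print(directory.receive_ack(2))  # → False
print(directory.receive_ack(3))  # → True (consolidated ack goes out)
```

Consolidating at the directory reduces the acknowledgment traffic the requesting core must absorb from O(sharers) messages to one.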
-
Publication number: 20160179674
Abstract: Methods and apparatuses to control cache line coherence are described. A hardware processor may include a first processor core with a cache to store a cache line, a second set of processor cores that each include a cache to store a copy of the cache line, and cache coherence logic to aggregate in a tag directory an acknowledgment message from each of the second set of processor cores in response to a request from the first processor core to modify the copy of the cache line in each of the second set of processor cores and send a consolidated acknowledgment message to the first processor core.
Type: Application
Filed: December 23, 2014
Publication date: June 23, 2016
Inventors: Samantika Sury, Simon Steely, Jr., William Hasenplaugh, Joel Emer, David Webb
-
Patent number: 9286128
Abstract: A processor is described having an out-of-order core to execute a first thread and a non-out-of-order core to execute a second thread. The processor also includes statistics collection circuitry to support calculation of the following: the first thread's performance on the out-of-order core; an estimate of the first thread's performance on the non-out-of-order core; the second thread's performance on the non-out-of-order core; an estimate of the second thread's performance on the out-of-order core.Type: Grant
Filed: March 15, 2013
Date of Patent: March 15, 2016
Assignee: Intel Corporation
Inventors: Aamer Jaleel, Kenzo Van Craeynest, Paolo Narvaez, Joel Emer
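One use such statistics enable is a thread-swap decision, sketched below under stated assumptions: the IPC figures and the swap policy are invented for illustration and are not claimed by the patent. Compare each thread's measured performance on its current core with its estimated performance on the other core type, and swap when the estimated total improves.

```python
# Hypothetical scheduling decision built on measured + estimated IPC.

def should_swap(ipc_a_on_ooo, est_ipc_a_on_inorder,
                ipc_b_on_inorder, est_ipc_b_on_ooo):
    current = ipc_a_on_ooo + ipc_b_on_inorder      # today's placement
    swapped = est_ipc_a_on_inorder + est_ipc_b_on_ooo  # estimated alternative
    return swapped > current

# Thread A barely benefits from the out-of-order core; thread B would.
print(should_swap(1.1, 1.0, 0.6, 1.5))  # → True
```

The cross-core estimates are what make this comparison possible without actually migrating the threads to find out.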
-
Publication number: 20140282565
Abstract: A processor is described having an out-of-order core to execute a first thread and a non-out-of-order core to execute a second thread. The processor also includes statistics collection circuitry to support calculation of the following: the first thread's performance on the out-of-order core; an estimate of the first thread's performance on the non-out-of-order core; the second thread's performance on the non-out-of-order core; an estimate of the second thread's performance on the out-of-order core.
Type: Application
Filed: March 15, 2013
Publication date: September 18, 2014
Inventors: Aamer Jaleel, Kenzo Van Craeynest, Paolo Narvaez, Joel Emer
-
Publication number: 20140201506
Abstract: A processing engine includes separate hardware components for control processing and data processing. The instruction execution order in such a processing engine may be efficiently determined in a control processing engine based on inputs received by the control processing engine. For each instruction of a data processing engine: a status of the instruction may be set to "ready" based on a trigger for the instruction and the input received in the control processing engine; and execution of the instruction in the data processing engine may be enabled if the status of the instruction is set to "ready" and at least one processing element of the data processing engine is available. The trigger for each instruction may be a function of one or more predicate registers of the control processing engine, FIFO status signals, or information regarding tags.
Type: Application
Filed: December 30, 2011
Publication date: July 17, 2014
Inventors: Angshuman Parashar, Michael Pellauer, Michael Adler, Joel Emer
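The trigger-based dispatch described above can be sketched minimally (the instruction set and state fields here are invented for illustration): each data-processing instruction carries a trigger predicate; the control engine marks an instruction "ready" when its trigger holds over the current predicate/FIFO state, and dispatches it only while a processing element is free.

```python
# Illustrative triggered-instruction scheduler: a trigger is a function of
# control state (predicate registers, FIFO status); ready instructions are
# dispatched up to the number of available processing elements.

def schedule(instructions, state, free_pes):
    """instructions: list of (name, trigger); trigger: state dict -> bool."""
    ready = [name for name, trigger in instructions if trigger(state)]
    return ready[:free_pes]

state = {"p0": True, "in_fifo_nonempty": True}
insts = [
    ("load",  lambda s: s["in_fifo_nonempty"]),
    ("store", lambda s: s["p0"] and not s["in_fifo_nonempty"]),
]
print(schedule(insts, state, free_pes=1))  # → ['load']
```

Because readiness is recomputed from control state rather than from a program counter, execution order emerges from the triggers instead of being fixed in advance.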
-
Patent number: 8769201
Abstract: A technique to enable resource allocation optimization within a computer system. In one embodiment, a gradient partition algorithm (GPA) module is used to continually measure performance and adjust allocation to shared resources among a plurality of data classes in order to achieve optimal performance.
Type: Grant
Filed: December 2, 2008
Date of Patent: July 1, 2014
Assignee: Intel Corporation
Inventors: William Hasenplaugh, Joel Emer, Tryggve Fossum, Aamer Jaleel, Simon Steely
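A gradient-style allocator in the spirit of the GPA module above can be sketched as follows; this is a hedged illustration, and the toy performance model stands in for the real measurement hardware. Nudge the share given to one data class, keep the change if measured performance improves, and otherwise revert.

```python
# Illustrative gradient step over a two-class resource partition: trial a
# perturbed allocation, keep it only if the measured score improves.

def gpa_step(shares, cls, delta, measure):
    trial = dict(shares)
    trial[cls] += delta
    other = next(k for k in trial if k != cls)   # assumes exactly two classes
    trial[other] -= delta                        # shares trade off directly
    return trial if measure(trial) > measure(shares) else shares

# toy "performance" that is best when class "a" holds ~70% of the resource
measure = lambda s: -abs(s["a"] - 0.7)
shares = {"a": 0.5, "b": 0.5}
shares = gpa_step(shares, "a", 0.1, measure)
print(shares)
```

Iterating such steps walks the allocation along the measured performance gradient, which is the "continually measure and adjust" loop the abstract describes.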
-
Patent number: 8707012
Abstract: In one embodiment, the present invention includes an apparatus having a register file to store vector data, an address generator coupled to the register file to generate addresses for a vector memory operation, and a controller to generate an output slice from one or more slices each including multiple addresses, where the output slice includes addresses each corresponding to a separately addressable portion of a memory. Other embodiments are described and claimed.
Type: Grant
Filed: October 12, 2012
Date of Patent: April 22, 2014
Assignee: Intel Corporation
Inventors: Roger Espasa, Joel Emer, Geoff Lowney, Roger Gramunt, Santiago Galan, Toni Juan, Jesus Corbal, Federico Ardanaz, Isaac Hernandez
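The slicing idea above can be illustrated with a short sketch (the bank-mapping convention via low-order address bits is an assumption, not taken from the patent): from a vector of element addresses, build an output slice containing at most one address per separately addressable memory bank, so every address in the slice can be issued in the same cycle; conflicting addresses wait for a later slice.

```python
# Illustrative conflict-free slice builder for a banked memory.

def make_output_slice(addresses, num_banks):
    slice_out, leftover, used = [], [], set()
    for addr in addresses:
        bank = addr % num_banks          # assumed bank-interleaving rule
        if bank in used:
            leftover.append(addr)        # bank conflict → defer to next slice
        else:
            used.add(bank)
            slice_out.append(addr)
    return slice_out, leftover

addrs = [0, 4, 5, 9]                          # banks 0, 0, 1, 1 with 4 banks
print(make_output_slice(addrs, num_banks=4))  # → ([0, 5], [4, 9])
```

Repeating the call on the leftover list yields the sequence of slices a controller would emit for one gather or scatter operation.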
-
Publication number: 20130036268
Abstract: In one embodiment, the present invention includes an apparatus having a register file to store vector data, an address generator coupled to the register file to generate addresses for a vector memory operation, and a controller to generate an output slice from one or more slices each including multiple addresses, where the output slice includes addresses each corresponding to a separately addressable portion of a memory. Other embodiments are described and claimed.
Type: Application
Filed: October 12, 2012
Publication date: February 7, 2013
Inventors: Roger Espasa, Joel Emer, Geoff Lowney, Roger Gramunt, Santiago Galan, Toni Juan, Jesus Corbal, Federico Ardanaz, Isaac Hernandez
-
Patent number: 8316216
Abstract: In one embodiment, the present invention includes an apparatus having a register file to store vector data, an address generator coupled to the register file to generate addresses for a vector memory operation, and a controller to generate an output slice from one or more slices each including multiple addresses, where the output slice includes addresses each corresponding to a separately addressable portion of a memory. Other embodiments are described and claimed.
Type: Grant
Filed: October 21, 2009
Date of Patent: November 20, 2012
Assignee: Intel Corporation
Inventors: Roger Espasa, Joel Emer, Geoff Lowney, Roger Gramunt, Santiago Galan, Toni Juan, Jesus Corbal, Federico Ardanaz, Isaac Hernandez
-
Publication number: 20100138609
Abstract: A technique to enable resource allocation optimization within a computer system. In one embodiment, a gradient partition algorithm (GPA) module is used to continually measure performance and adjust allocation to shared resources among a plurality of data classes in order to achieve optimal performance.
Type: Application
Filed: December 2, 2008
Publication date: June 3, 2010
Inventors: William Hasenplaugh, Joel Emer, Tryggve Fossum, Aamer Jaleel, Simon Steely
-
Publication number: 20100042779
Abstract: In one embodiment, the present invention includes an apparatus having a register file to store vector data, an address generator coupled to the register file to generate addresses for a vector memory operation, and a controller to generate an output slice from one or more slices each including multiple addresses, where the output slice includes addresses each corresponding to a separately addressable portion of a memory. Other embodiments are described and claimed.
Type: Application
Filed: October 21, 2009
Publication date: February 18, 2010
Inventors: Roger Espasa, Joel Emer, Geoff Lowney, Roger Gramunt, Santiago Galan, Toni Juan, Jesus Corbal, Federico Ardanaz, Isaac Hernandez
-
Patent number: 7627735
Abstract: In one embodiment, the present invention includes an apparatus having a register file to store vector data, an address generator coupled to the register file to generate addresses for a vector memory operation, and a controller to generate an output slice from one or more slices each including multiple addresses, where the output slice includes addresses each corresponding to a separately addressable portion of a memory. Other embodiments are described and claimed.
Type: Grant
Filed: October 21, 2005
Date of Patent: December 1, 2009
Assignee: Intel Corporation
Inventors: Roger Espasa, Joel Emer, Geoff Lowney, Roger Gramunt, Santiago Galan, Toni Juan, Jesus Corbal, Federico Ardanaz, Isaac Hernandez
-
Patent number: 7558920
Abstract: A method and apparatus for partitioning a shared cache of a chip multi-processor are described. In one embodiment, the method includes a request of a cache block from system memory if a cache miss within a shared cache is detected according to a received request from a processor. Once the cache block is requested, a victim block within the shared cache is selected according to a processor identifier and a request type of the received request. In one embodiment, selection of the victim block according to a processor identifier and request type is based on a partition of a set-associative, shared cache to limit the selection of the victim block from a subset of available cache ways according to the cache partition. Other embodiments are described and claimed.
Type: Grant
Filed: June 30, 2004
Date of Patent: July 7, 2009
Assignee: Intel Corporation
Inventors: Matthew Mattina, Antonio Juan-Hormigo, Joel Emer, Ramon Matas-Navarro
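The way-partitioned victim selection described above can be sketched as follows; the partition table and the LRU stand-in are illustrative assumptions. Each (processor identifier, request type) pair maps to a subset of the ways in a set, and the victim is chosen only from that subset.

```python
# Illustrative victim selection restricted to a requester's cache partition.

def select_victim(partition, proc_id, req_type, lru_order):
    allowed = partition[(proc_id, req_type)]   # ways this requester may evict
    # evict the least-recently-used way among the allowed subset
    return next(way for way in lru_order if way in allowed)

partition = {(0, "read"): {0, 1}, (1, "read"): {2, 3}}  # 4-way set, split 2/2
lru_order = [3, 0, 2, 1]                       # least-recently-used first
print(select_victim(partition, 0, "read", lru_order))  # → 0
print(select_victim(partition, 1, "read", lru_order))  # → 3
```

Restricting eviction to a per-processor subset of ways is what keeps one core's misses from flushing another core's working set out of the shared cache.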
-
Publication number: 20070094477
Abstract: In one embodiment, the present invention includes an apparatus having a register file to store vector data, an address generator coupled to the register file to generate addresses for a vector memory operation, and a controller to generate an output slice from one or more slices each including multiple addresses, where the output slice includes addresses each corresponding to a separately addressable portion of a memory. Other embodiments are described and claimed.
Type: Application
Filed: October 21, 2005
Publication date: April 26, 2007
Inventors: Roger Espasa, Joel Emer, Geoff Lowney, Roger Gramunt, Santiago Galan, Toni Juan, Jesus Corbal, Federico Ardanaz, Isaac Hernandez
-
Publication number: 20070022348
Abstract: Embodiments of apparatuses and methods for reducing the uncorrectable error rate in a lockstepped dual-modular redundancy system are disclosed. In one embodiment, an apparatus includes two processor cores, a micro-checker, a global checker, and fault logic. The micro-checker is to detect whether a value from a structure in one core matches a value from the corresponding structure in the other core. The global checker is to detect lockstep failures between the two cores. The fault logic is to cause the two cores to be resynchronized if there is a lockstep error but the micro-checker has detected a mismatch.
Type: Application
Filed: June 30, 2005
Publication date: January 25, 2007
Inventors: Paul Racunas, Joel Emer, Arijit Biswas, Shubhendu Mukherjee, Steven Raasch
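The recovery decision described above reduces to a small rule, sketched here with illustrative names: on a lockstep failure, consult the micro-checker; if it already flagged a structure-level mismatch, the divergence has an identified cause and the cores can be resynchronized instead of the system declaring an uncorrectable error.

```python
# Illustrative fault-logic decision for a lockstepped dual-modular
# redundancy pair, following the rule in the abstract.

def handle_lockstep_failure(micro_checker_mismatch):
    if micro_checker_mismatch:
        return "resynchronize"        # fault was localized by the micro-checker
    return "uncorrectable_error"      # divergence with no identified cause

print(handle_lockstep_failure(True))   # → resynchronize
print(handle_lockstep_failure(False))  # → uncorrectable_error
```

The micro-checker thus converts a class of would-be uncorrectable lockstep errors into recoverable ones, which is the error-rate reduction the abstract claims.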