Patents by Inventor Joel S. Emer

Joel S. Emer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11769040
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
    Type: Grant
    Filed: July 19, 2019
    Date of Patent: September 26, 2023
    Assignee: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
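    Example sketch: the claims above cover hardware, but the parallel multiply-accumulate step in the abstract of patent 11769040 can be illustrated in software. A minimal sketch of one processing element, assuming a hypothetical MAC count and using NumPy for the arithmetic; the class and method names are invented for illustration.
    ```python
    import numpy as np

    class ProcessingElement:
        """Toy model of one tile PE: a weight buffer, an activation buffer,
        and a bank of multiply-accumulate (MAC) units."""

        def __init__(self, num_macs=8):          # hypothetical MAC count
            self.weight_buffer = np.zeros(num_macs)
            self.activation_buffer = np.zeros(num_macs)

        def load(self, weights, activations):
            self.weight_buffer[:] = weights
            self.activation_buffer[:] = activations

        def mac(self):
            # All MAC units combine weight and activation values in parallel;
            # the elementwise product models that lockstep behavior.
            return float(np.dot(self.weight_buffer, self.activation_buffer))

    pe = ProcessingElement()
    pe.load(np.arange(8), np.ones(8))
    print(pe.mac())   # 28.0
    ```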
  • Patent number: 10853276
    Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.
    Type: Grant
    Filed: June 17, 2019
    Date of Patent: December 1, 2020
    Assignee: Intel Corporation
    Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
  • Publication number: 20200082246
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
    Type: Application
    Filed: July 19, 2019
    Publication date: March 12, 2020
    Applicant: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
  • Publication number: 20190303312
    Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.
    Type: Application
    Filed: June 17, 2019
    Publication date: October 3, 2019
    Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
  • Patent number: 10331583
    Abstract: A processing device for executing distributed memory operations using spatial processing units (SPU) connected by distributed channels is disclosed. A distributed channel may or may not be associated with memory operations, such as load operations or store operations. Distributed channel information is obtained for an algorithm to be executed by a group of spatially distributed processing elements. The group of spatially distributed processing elements can be connected to a shared memory controller. For each distributed channel in the distributed channel information, one or more of the group of spatially distributed processing elements may be associated with the distributed channel based on the algorithm. By associating the spatially distributed processing elements to a distributed channel, the functionality of the processing element can vary depending on the algorithm mapped onto the SPU.
    Type: Grant
    Filed: September 26, 2013
    Date of Patent: June 25, 2019
    Assignee: Intel Corporation
    Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
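    Example sketch: a rough software analogy of the channel-association step described in the abstracts of patents 10853276 and 10331583, assuming a trivial affinity predicate as a stand-in for "based on the algorithm"; all identifiers are invented for illustration.
    ```python
    class SpatialProcessingUnit:
        """Toy model: spatially distributed processing elements, each of which
        can be associated with one or more distributed channels."""

        def __init__(self, num_elements):
            self.elements = list(range(num_elements))
            self.channel_map = {}            # channel id -> list of element ids

        def associate(self, distributed_channels, algorithm_affinity):
            # algorithm_affinity(channel, element) -> bool stands in for the
            # algorithm-dependent mapping described in the abstract.
            for ch in distributed_channels:
                self.channel_map[ch] = [e for e in self.elements
                                        if algorithm_affinity(ch, e)]
            return self.channel_map

    spu = SpatialProcessingUnit(num_elements=4)
    # Hypothetical affinity: the load channel uses even elements, the store
    # channel uses odd elements.
    mapping = spu.associate(["load", "store"],
                            lambda ch, e: (ch == "load") == (e % 2 == 0))
    print(mapping)   # {'load': [0, 2], 'store': [1, 3]}
    ```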
  • Patent number: 10102124
    Abstract: A micro-architecture may provide hardware and software support for a high bandwidth write command. The micro-architecture may invoke a method to perform the high bandwidth write command. The method may comprise sending a write request from a requester to a record keeping structure. The write request may have a memory address of a memory that stores requested data. The method may further comprise determining that copies of the requested data are present in a distributed cache system outside the memory, sending invalidation requests to elements holding copies of the requested data in the distributed cache system, sending a notification to the requester to indicate the presence of copies of the requested data, and sending a write response message after the latest value of the requested data and all invalidation acknowledgements have been received.
    Type: Grant
    Filed: December 28, 2011
    Date of Patent: October 16, 2018
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer, Samantika Subramaniam
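    Example sketch: the write flow the abstract of patent 10102124 walks through, reduced to a toy directory model (request to a record keeping structure, invalidations to sharers, response once the latest value and all acknowledgements are in); all identifiers are hypothetical.
    ```python
    class RecordKeepingStructure:
        """Toy directory: tracks which cache elements hold copies of each address."""

        def __init__(self, sharers_by_address):
            self.sharers = sharers_by_address        # address -> set of cache ids

        def high_bandwidth_write(self, requester, address, read_latest):
            holders = self.sharers.get(address, set())
            acks = set(holders)                      # assume every invalidation is acknowledged
            latest_value = read_latest(address)      # latest value of the requested data
            # Notify the requester whether copies existed, and send the write
            # response only after the latest value and all acks have arrived.
            if acks == holders and latest_value is not None:
                self.sharers[address] = {requester}
                return {"copies_existed": bool(holders), "value": latest_value}

    rks = RecordKeepingStructure({0x40: {"core1", "core2"}})
    print(rks.high_bandwidth_write("core0", 0x40, lambda addr: 123))
    ```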
  • Patent number: 9588889
    Abstract: Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories. Each tag entry may be associated with a cache line from a cache belonging to a first domain. The first domain may contain multiple caches. The value of the domain state field may indicate whether its associated cache line can be read or changed.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: March 7, 2017
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
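    Example sketch: reading and writing a domain state field on a tag entry, as described in the abstract of patent 9588889; the three-value state encoding below is an assumption made only for illustration.
    ```python
    from enum import Enum

    class DomainState(Enum):
        # Hypothetical encoding; the abstract only says the field indicates
        # whether the associated cache line can be read or changed.
        NO_ACCESS = 0
        READ_ONLY = 1
        READ_WRITE = 2

    class TagEntry:
        def __init__(self, tag, domain_state=DomainState.NO_ACCESS):
            self.tag = tag
            self.domain_state = domain_state

    class CacheTagDirectory:
        """One directory in a hierarchy, covering the caches of a single domain."""

        def __init__(self):
            self.entries = {}                        # tag -> TagEntry

        def set_state(self, tag, state):
            self.entries.setdefault(tag, TagEntry(tag)).domain_state = state

        def can_write(self, tag):
            entry = self.entries.get(tag)
            return entry is not None and entry.domain_state is DomainState.READ_WRITE

    directory = CacheTagDirectory()
    directory.set_state(0x1A, DomainState.READ_ONLY)
    print(directory.can_write(0x1A))   # False
    ```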
  • Patent number: 9418016
    Abstract: A method and apparatus to reduce unnecessary write backs of cached data to a main memory and to optimize the usage of a cache memory tag directory. In one embodiment of the invention, the power consumption of a processor can be reduced by eliminating write backs of cache memory lines that hold information that has reached its end-of-life. In one embodiment of the invention, when a processing unit is required to clear one or more cache memory lines, it uses a write-zero command to clear the one or more cache memory lines. The processing unit does not perform a write operation to move or pass data values of zero to the one or more cache memory lines. By doing so, it reduces the power consumption of the processing unit.
    Type: Grant
    Filed: December 21, 2010
    Date of Patent: August 16, 2016
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., Joel S. Emer, William C. Hasenplaugh
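    Example sketch: the write-zero idea from the abstract of patent 9418016, where a line cleared by command is marked so it is never written back and no zero data is moved through the datapath; the dirty/zeroed bookkeeping below is an illustrative assumption.
    ```python
    class CacheLine:
        def __init__(self, data=None):
            self.data = data
            self.dirty = False
            self.zeroed = False                  # set by the write-zero command

    class Cache:
        def __init__(self):
            self.lines = {}

        def write_zero(self, address):
            # Clear the line by command: no data values of zero are moved or
            # passed, and the line is marked so eviction skips the write back.
            line = self.lines.setdefault(address, CacheLine())
            line.data = None
            line.dirty = False
            line.zeroed = True

        def evict(self, address, main_memory):
            line = self.lines.pop(address)
            if line.dirty and not line.zeroed:   # skip end-of-life lines
                main_memory[address] = line.data

    cache = Cache()
    cache.write_zero(0x100)
    cache.evict(0x100, main_memory={})           # no write back is issued
    ```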
  • Patent number: 9262327
    Abstract: An apparatus may comprise a cache file having a plurality of cache lines and a hit predictor. The hit predictor may contain a table of counter values indexed with signatures that are associated with the plurality of cache lines. The apparatus may fill cache lines into the cache file with either low or high priority. Low priority lines may be chosen to be replaced by a replacement algorithm before high priority lines. In this way, the cache naturally may contain more high priority lines than low priority ones. This priority filling process may improve the performance of most replacement schemes, including the best known schemes, which already outperform LRU.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: February 16, 2016
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Aamer Jaleel, Joel S. Emer, Carole-Jean Wu
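    Example sketch: a signature-indexed counter table driving low/high priority fills, as described in the abstract of patent 9262327; the table size, threshold, and counter range are assumptions.
    ```python
    class HitPredictor:
        """Counter table indexed by signatures associated with cache lines."""

        def __init__(self, table_size=256, threshold=2):
            self.counters = [0] * table_size
            self.threshold = threshold           # hypothetical reuse threshold

        def signature(self, pc):
            return pc % len(self.counters)

        def predict_high_priority(self, pc):
            return self.counters[self.signature(pc)] >= self.threshold

        def train(self, pc, was_hit):
            idx = self.signature(pc)
            delta = 1 if was_hit else -1
            self.counters[idx] = min(max(self.counters[idx] + delta, 0), 3)

    def choose_victim(lines):
        # Replacement prefers evicting low-priority fills before high-priority
        # ones, so the cache naturally retains more high-priority lines.
        low = [line for line in lines if not line["high_priority"]]
        return (low or lines)[0]

    hp = HitPredictor()
    hp.train(pc=0x400, was_hit=True)
    print(hp.predict_high_priority(0x400))      # False until the counter reaches 2
    ```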
  • Patent number: 9201792
    Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining that a local last accessor of the memory address may have a copy of the requested data that is up to date with the memory. The local last accessor may be within a local domain that the requester belongs to. The method may further comprise sending a cache probe to the local last accessor and retrieving a latest value of the requested data from the local last accessor to the requester.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: December 1, 2015
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., Samantika Subramaniam, William C. Hasenplaugh, Joel S. Emer
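    Example sketch: probing the local last accessor within the requester's domain, as described in the abstract of patent 9201792, instead of going to memory; the record keeping shown here is an assumption made for the example.
    ```python
    class RecordKeeper:
        """Tracks, per (domain, address), the last core that accessed the line."""

        def __init__(self):
            self.last_accessor = {}              # (domain, address) -> core id

        def note_access(self, domain, address, core):
            self.last_accessor[(domain, address)] = core

        def read(self, domain, address, caches, memory):
            local = self.last_accessor.get((domain, address))
            if local is not None and address in caches[local]:
                # Cache probe to the local last accessor returns the latest value.
                return caches[local][address]
            return memory[address]

    rk = RecordKeeper()
    caches = {"core0": {}, "core1": {0x80: 7}}
    rk.note_access(domain="socket0", address=0x80, core="core1")
    print(rk.read("socket0", 0x80, caches, memory={0x80: 0}))   # 7 (from core1, not memory)
    ```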
  • Patent number: 9146871
    Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining a last accessor of the memory address, sending a cache probe to the last accessor, determining that the last accessor no longer has a copy of the line, and sending a request for the previously accessed version of the line. The request may bypass the tag directories and obtain the requested data from memory.
    Type: Grant
    Filed: December 28, 2011
    Date of Patent: September 29, 2015
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
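    Example sketch: the fallback path in the abstract of patent 9146871, where a probed last accessor no longer holds the line and the request bypasses the tag directories to read the previously accessed version from memory; all names are illustrative.
    ```python
    def read_with_last_accessor_probe(address, last_accessor, caches, memory):
        """Probe the last accessor first; fall back to memory if the copy is gone."""
        if last_accessor is not None:
            line = caches.get(last_accessor, {}).get(address)
            if line is not None:
                return line
        # The last accessor no longer has a copy: bypass the tag directories
        # and return the previously accessed version held in memory.
        return memory[address]

    caches = {"core2": {}}                       # core2 has already evicted the line
    memory = {0xC0: 42}
    print(read_with_last_accessor_probe(0xC0, "core2", caches, memory))   # 42
    ```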
  • Patent number: 9037804
    Abstract: Method and apparatus to efficiently organize data in caches by storing/accessing data of varying sizes in cache lines. A value may be assigned to a field indicating the size of usable data stored in a cache line. If the field indicating the size of the usable data in the cache line indicates a size less than the maximum storage size, a value may be assigned to a field in the cache line indicating which subset of the data stored in the cache line is usable data. A cache request may determine whether the size of the usable data in a cache line is equal to the maximum data storage size. If the size of the usable data in the cache line is equal to the maximum data storage size, the entire stored data in the cache line may be returned.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: May 19, 2015
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
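    Example sketch: the size and subset fields from the abstract of patent 9037804, with the line width and field layout chosen only for illustration.
    ```python
    MAX_LINE_BYTES = 64                          # hypothetical cache line width

    class VariableSizeLine:
        def __init__(self, data, offset=0):
            assert offset + len(data) <= MAX_LINE_BYTES
            self.storage = bytearray(MAX_LINE_BYTES)
            self.storage[offset:offset + len(data)] = data
            self.size = len(data)                # field: size of the usable data
            self.offset = offset                 # field: which subset is usable

        def read(self):
            # If the usable size equals the maximum storage size, return the
            # whole line; otherwise return only the usable subset.
            if self.size == MAX_LINE_BYTES:
                return bytes(self.storage)
            return bytes(self.storage[self.offset:self.offset + self.size])

    line = VariableSizeLine(b"\x01\x02\x03\x04", offset=8)
    print(line.read())   # b'\x01\x02\x03\x04'
    ```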
  • Publication number: 20150089162
    Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.
    Type: Application
    Filed: September 26, 2013
    Publication date: March 26, 2015
    Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
  • Publication number: 20140215162
    Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining a last accessor of the memory address, sending a cache probe to the last accessor, determining that the last accessor no longer has a copy of the line, and sending a request for the previously accessed version of the line. The request may bypass the tag directories and obtain the requested data from memory.
    Type: Application
    Filed: December 28, 2011
    Publication date: July 31, 2014
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
  • Publication number: 20140201446
    Abstract: A micro-architecture may provide hardware and software support for a high bandwidth write command. The micro-architecture may invoke a method to perform the high bandwidth write command. The method may comprise sending a write request from a requester to a record keeping structure. The write request may have a memory address of a memory that stores requested data. The method may further comprise determining that copies of the requested data are present in a distributed cache system outside the memory, sending invalidation requests to elements holding copies of the requested data in the distributed cache system, sending a notification to the requester to indicate the presence of copies of the requested data, and sending a write response message after the latest value of the requested data and all invalidation acknowledgements have been received.
    Type: Application
    Filed: December 28, 2011
    Publication date: July 17, 2014
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer, Samantika Subramaniam
  • Patent number: 8769209
    Abstract: An apparatus and method for improving cache performance in a computer system having a multi-level cache hierarchy. For example, one embodiment of a method comprises: selecting a first line in a cache at level N for potential eviction; querying a cache at level M in the hierarchy to determine whether the first cache line is resident in the cache at level M, wherein M<N; in response to receiving an indication that the first cache line is not resident at level M, then evicting the first cache line from the cache at level N; in response to receiving an indication that the first cache line is resident at level M, then retaining the first cache line and choosing a second cache line for potential eviction.
    Type: Grant
    Filed: December 20, 2010
    Date of Patent: July 1, 2014
    Assignee: Intel Corporation
    Inventors: Aamer Jaleel, Simon C. Steely, Jr., Eric R. Borch, Malini K. Bhandaru, Joel S. Emer
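    Example sketch: the eviction check from the abstract of patent 8769209, where a level-N victim is retained if it is still resident in a smaller level-M cache (M < N); the candidate ordering stands in for whatever replacement policy the cache uses.
    ```python
    def evict_one(level_n_cache, level_m_cache, candidates):
        """Pick a victim from level N, skipping lines still resident at level M."""
        for line in candidates:                  # candidates ordered by the policy
            if line not in level_m_cache:        # query the inner (level M) cache
                level_n_cache.discard(line)      # not resident at M: safe to evict
                return line
        # Every candidate is still resident at level M; fall back to the first.
        victim = candidates[0]
        level_n_cache.discard(victim)
        return victim

    l3 = {0xA, 0xB, 0xC}
    l1 = {0xA}                                   # 0xA is still live in the inner cache
    print(hex(evict_one(l3, l1, [0xA, 0xB])))    # 0xb -> 0xB was evicted instead
    ```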
  • Publication number: 20140052920
    Abstract: Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories. Each tag entry may be associated with a cache line from a cache belonging to a first domain. The first domain may contain multiple caches. The value of the domain state field may indicate whether its associated cache line can be read or changed.
    Type: Application
    Filed: December 29, 2011
    Publication date: February 20, 2014
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
  • Publication number: 20140006717
    Abstract: An apparatus may comprise a cache file having a plurality of cache lines and a hit predictor. The hit predictor may contain a table of counter values indexed with signatures that are associated with the plurality of cache lines. The apparatus may fill cache lines into the cache file with either low or high priority. Low priority lines may be chosen to be replaced by a replacement algorithm before high priority lines. In this way, the cache naturally may contain more high priority lines than low priority ones. This priority filling process may improve the performance of most replacement schemes, including the best known schemes, which already outperform LRU.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Applicant: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Aamer Jaleel, Joel S. Emer, Carole-Jean Wu
  • Publication number: 20130326147
    Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining that a local last accessor of the memory address may have a copy of the requested data that is up to date with the memory. The local last accessor may be within a local domain that the requester belongs to. The method may further comprise sending a cache probe to the local last accessor and retrieving a latest value of the requested data from the local last accessor to the requester.
    Type: Application
    Filed: December 29, 2011
    Publication date: December 5, 2013
    Inventors: Simon C. Steely, Jr., Samantika Subramaniam, William C. Hasenplaugh, Joel S. Emer
  • Publication number: 20130297883
    Abstract: Method and apparatus to efficiently organize data in caches by storing/accessing data of varying sizes in cache lines. A value may be assigned to a field indicating the size of usable data stored in a cache line. If the field indicating the size of the usable data in the cache line indicates a size less than the maximum storage size, a value may be assigned to a field in the cache line indicating which subset of the data stored in the cache line is usable data. A cache request may determine whether the size of the usable data in a cache line is equal to the maximum data storage size. If the size of the usable data in the cache line is equal to the maximum data storage size, the entire stored data in the cache line may be returned.
    Type: Application
    Filed: December 29, 2011
    Publication date: November 7, 2013
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer