Patents by Inventor Joel S. Emer
Joel S. Emer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11769040Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.Type: GrantFiled: July 19, 2019Date of Patent: September 26, 2023Assignee: NVIDIA CORP.Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
-
Patent number: 10853276Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.Type: GrantFiled: June 17, 2019Date of Patent: December 1, 2020Assignee: Intel CorporationInventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
-
Publication number: 20200082246Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.Type: ApplicationFiled: July 19, 2019Publication date: March 12, 2020Applicant: NVIDIA Corp.Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
-
Publication number: 20190303312Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.Type: ApplicationFiled: June 17, 2019Publication date: October 3, 2019Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
-
Patent number: 10331583Abstract: A processing device for executing distributed memory operations using spatial processing units (SPU) connected by distributed channels is disclosed. A distributed channel may or may not be associated with memory operations, such as load operations or store operations. Distributed channel information is obtained for an algorithm to be executed by a group of spatially distributed processing elements. The group of spatially distributed processing elements can be connected to a shared memory controller. For each distributed channel in the distributed channel information, one or more of the group of spatially distributed processing elements may be associated with the distributed channel based on the algorithm. By associating the spatially distributed processing elements to a distributed channel, the functionality of the processing element can vary depending on the algorithm mapped onto the SPU.Type: GrantFiled: September 26, 2013Date of Patent: June 25, 2019Assignee: Intel CorporationInventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
-
Patent number: 10102124Abstract: A micro-architecture may provide a hardware and software of a high bandwidth write command. The micro-architecture may invoke a method to perform the high bandwidth write command. The method may comprise sending a write request from a requester to a record keeping structure. The write request may have a memory address of a memory that stores requested data. The method may further determine copies of the requested data being present in a distributed cache system outside the memory, sending invalidation requests to elements holding copies of the requested data in the distributed cache system, sending a notification to the requester to inform presence of copies of the requested data and sending a write response message after a latest value of the requested data and all invalidation acknowledgements have been received.Type: GrantFiled: December 28, 2011Date of Patent: October 16, 2018Assignee: Intel CorporationInventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer, Samantika Subramaniam
-
Patent number: 9588889Abstract: Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories. Each tag entry may be associated with a cache line from a cache belonging to a first domain. The first domain may contain multiple caches. The value of the domain state field may indicate whether its associated cache line can be read or changed.Type: GrantFiled: December 29, 2011Date of Patent: March 7, 2017Assignee: INTEL CORPORATIONInventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
-
Patent number: 9418016Abstract: A method and apparatus to reduce unnecessary write backs of cached data to a main memory and to optimize the usage of a cache memory tag directory. In one embodiment of the invention, the power consumption of a processor can be saved by eliminating write backs of cache memory lines that has information that has reached its end-of-life. In one embodiment of the invention, when a processing unit is required to clear one or more cache memory lines, it uses a write-zero command to clear the one or more cache memory lines. The processing unit does not perform a write operation to move or pass data values of zero to the one or more cache memory lines. By doing so, it reduces the power consumption of the processing unit.Type: GrantFiled: December 21, 2010Date of Patent: August 16, 2016Assignee: Intel CorporationInventors: Simon C. Steely, Jr., Joel S. Emer, William C. Hasenplaugh
-
Patent number: 9262327Abstract: An apparatus may comprise a cache file having a plurality of cache lines and a hit predictor. The hit predictor may contain a table of counter values indexed with signatures that are associated with the plurality of cache lines. The apparatus may fill cache lines into the cache file with either low or high priority. Low priority lines may be chosen to be replaced by a replacement algorithm before high priority lines. In this way, the cache naturally may contain more high priority lines than low priority ones. This priority filling process may improve the performance of most replacement schemes including the best known schemes which are already doing better than LRU.Type: GrantFiled: June 29, 2012Date of Patent: February 16, 2016Assignee: Intel CorporationInventors: Simon C. Steely, Jr., William C. Hasenplaugh, Aamer Jaleel, Joel S. Emer, Carole-Jean Wu
-
Patent number: 9201792Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining that a local last accessor of the memory address may have a copy of the requested data up to date with the memory. The local last accessor may be within a local domain that the requester belongs to. The method may further comprise sending a cache probe to the local last accessor and retrieving a latest value of the requested data from the local last accessor to the requester.Type: GrantFiled: December 29, 2011Date of Patent: December 1, 2015Assignee: Intel CorporationInventors: Simon C. Steely, Jr., Samantika Subramaniam, William C. Hasenplaugh, Joel S. Emer
-
Patent number: 9146871Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining a last accessor of the memory address, sending a cache probe to the last accessor, determining the last accessor no longer has a copy of the line; and sending a request for the previously accessed version of the line. The request may bypass the tag-directories and obtain the requested data from memory.Type: GrantFiled: December 28, 2011Date of Patent: September 29, 2015Assignee: Intel CorporationInventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
-
Patent number: 9037804Abstract: Method and apparatus to efficiently organize data in caches by storing/accessing data of varying sizes in cache lines. A value may be assigned to a field indicating the size of usable data stored in a cache line. If the field indicating the size of the usable data in the cache line indicates a size less than the maximum storage size, a value may be assigned to a field in the cache line indicating which subset of the data in the field to store data is usable data. A cache request may determine whether the size of the usable data in a cache line is equal to the maximum data storage size. If the size of the usable data in the cache line is equal to the maximum data storage size the entire stored data in the cache line may be returned.Type: GrantFiled: December 29, 2011Date of Patent: May 19, 2015Assignee: Intel CorporationInventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
-
Publication number: 20150089162Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.Type: ApplicationFiled: September 26, 2013Publication date: March 26, 2015Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
-
Publication number: 20140215162Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining a last accessor of the memory address, sending a cache probe to the last accessor, determining the last accessor no longer has a copy of the line; and sending a request for the previously accessed version of the line. The request may bypass the tag-directories and obtain the requested data from memory.Type: ApplicationFiled: December 28, 2011Publication date: July 31, 2014Inventors: Simon C. Steeley, JR., William C. Hasenplaugh, Joel S. Emer
-
Publication number: 20140201446Abstract: A micro-architecture may provide a hardware and software of a high bandwidth write command. The micro-architecture may invoke a method to perform the high bandwidth write command. The method may comprise sending a write request from a requester to a record keeping structure. The write request may have a memory address of a memory that stores requested data. The method may further determine copies of the requested data being present in a distributed cache system outside the memory, sending invalidation requests to elements holding copies of the requested data in the distributed cache system, sending a notification to the requester to inform presence of copies of the requested data and sending a write response message after a latest value of the requested data and all invalidation acknowledgements have been received.Type: ApplicationFiled: December 28, 2011Publication date: July 17, 2014Inventors: Simon C. Steeley, JR., William C. Hasenplaugh, Joel S. Emer, Samantika Subramaniam
-
Patent number: 8769209Abstract: An apparatus and method for improving cache performance in a computer system having a multi-level cache hierarchy. For example, one embodiment of a method comprises: selecting a first line in a cache at level N for potential eviction; querying a cache at level M in the hierarchy to determine whether the first cache line is resident in the cache at level M, wherein M<N; in response to receiving an indication that the first cache line is not resident at level M, then evicting the first cache line from the cache at level N; in response to receiving an indication that the first cache line is resident at level M, then retaining the first cache line and choosing a second cache line for potential eviction.Type: GrantFiled: December 20, 2010Date of Patent: July 1, 2014Assignee: Intel CorporationInventors: Aamer Jaleel, Simon C. Steely, Jr., Eric R. Borch, Malini K. Bhandaru, Joel S. Emer
-
Publication number: 20140052920Abstract: Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories. Each tag entry may be associated with a cache line from a cache belonging to a first domain. The first domain may contain multiple caches. The value of the domain state field may indicate whether its associated cache line can be read or changed.Type: ApplicationFiled: December 29, 2011Publication date: February 20, 2014Inventors: Simon C. Steely, JR., William C. Hasenplaugh, Joel S. Emer
-
Publication number: 20140006717Abstract: An apparatus may comprise a cache file having a plurality of cache lines and a hit predictor. The hit predictor may contain a table of counter values indexed with signatures that are associated with the plurality of cache lines. The apparatus may fill cache lines into the cache file with either low or high priority. Low priority lines may be chosen to be replaced by a replacement algorithm before high priority lines. In this way, the cache naturally may contain more high priority lines than low priority ones. This priority filling process may improve the performance of most replacement schemes including the best known schemes which are already doing better than LRU.Type: ApplicationFiled: June 29, 2012Publication date: January 2, 2014Applicant: INTEL CORPORATIONInventors: Simon C. STEELY, JR., William C. HASENPLAUGH, Aamer JALEEL, Joel S. EMER, Carole-Jean WU
-
Publication number: 20130326147Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining that a local last accessor of the memory address may have a copy of the requested data up to date with the memory. The local last accessor may be within a local domain that the requester belongs to. The method may further comprise sending a cache probe to the local last accessor and retrieving a latest value of the requested data from the local last accessor to the requester.Type: ApplicationFiled: December 29, 2011Publication date: December 5, 2013Inventors: Simon C. Steely, JR., Samantika Subramaniam, William C. Hasenplaugh, Joel S. Emer
-
Publication number: 20130297883Abstract: Method and apparatus to efficiently organize data in caches by storing/accessing data of varying sizes in cache lines. A value may be assigned to a field indicating the size of usable data stored in a cache line. If the field indicating the size of the usable data in the cache line indicates a size less than the maximum storage size, a value may be assigned to a field in the cache line indicating which subset of the data in the field to store data is usable data. A cache request may determine whether the size of the usable data in a cache line is equal to the maximum data storage size. If the size of the usable data in the cache line is equal to the maximum data storage size the entire stored data in the cache line may be returned.Type: ApplicationFiled: December 29, 2011Publication date: November 7, 2013Inventors: Simon C. Steely, JR., William C. Hasenplaugh, Joel S. Emer