Patents by Inventor Simon C. Steely

Simon C. Steely has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9934146
    Abstract: Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory of the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core to cause the cache line to be sent from the tag directory to the second processor core. (A brief, hedged code sketch of this chained-request idea follows this entry.)
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: April 3, 2018
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., Samantika S. Sury, William C. Hasenplaugh
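A minimal software model of the "chain home" idea above, assuming a directory entry that records the current owner and a FIFO chain of waiting cores. The names (TagDirEntry, request_line, release_line) are illustrative, not taken from the patent, and the sketch ignores the memory-move and tag-update machinery.

```cpp
// Hedged sketch: a tag-directory entry that chains pending requesters so a
// cache line can be handed core-to-core instead of bouncing through memory.
#include <cstdio>
#include <deque>
#include <optional>

struct TagDirEntry {
    std::optional<int> owner;   // core currently holding the line
    std::deque<int>    chain;   // "chain home": cores waiting for the line
};

// A core asks the directory for the line. If another core owns it, the
// requester is appended to the chain rather than re-requesting from memory.
void request_line(TagDirEntry& e, int core) {
    if (!e.owner) {
        e.owner = core;
        std::printf("core %d gets the line from memory\n", core);
    } else {
        e.chain.push_back(core);
        std::printf("core %d chained behind core %d\n", core, *e.owner);
    }
}

// When the owner releases the line, the directory forwards it to the next
// chained requester instead of writing it back first.
void release_line(TagDirEntry& e) {
    if (e.chain.empty()) { e.owner.reset(); return; }
    e.owner = e.chain.front();
    e.chain.pop_front();
    std::printf("line forwarded to core %d\n", *e.owner);
}

int main() {
    TagDirEntry e;
    request_line(e, 0);   // core 0 owns the line
    request_line(e, 1);   // core 1 is chained
    release_line(e);      // line forwarded to core 1
}
```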
  • Patent number: 9898408
    Abstract: An apparatus and method are described for a sharing aware snoop filter. For example, one embodiment of a processor comprises: a plurality of caches, each of the caches comprising a plurality of cache lines, at least some of which are to be shared by two or more of the caches; a snoop filter to monitor accesses to the plurality of cache lines shared by the two or more caches, the snoop filter comprising: a primary snoop filter comprising a first plurality of entries, each entry associated with one of the plurality of cache lines and comprising N unique identifiers to uniquely identify up to N of the plurality of caches currently storing the cache line; an auxiliary snoop filter comprising a second plurality of entries, each entry associated with one of the plurality of cache lines, wherein once a particular cache line has been shared by more than N caches, an entry for that cache line is allocated in the auxiliary snoop filter to uniquely identify one or more additional caches storing the cache line. (A short illustrative sketch of the primary/auxiliary bookkeeping follows this entry.)
    Type: Grant
    Filed: April 1, 2016
    Date of Patent: February 20, 2018
    Assignee: Intel Corporation
    Inventors: Samantika S. Sury, Robert G. Blankenship, Simon C. Steely, Jr.
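A rough sketch of the two-level bookkeeping described above: a primary snoop-filter entry tracks up to N exact sharers, and once a line is shared more widely the extra sharers spill into an auxiliary entry. The value of N, the container choices, and all names are assumptions made for illustration.

```cpp
// Hedged sketch of a sharing-aware snoop filter with a primary and an
// auxiliary structure for widely shared lines.
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

constexpr int N = 4;  // sharers tracked per primary entry (assumed value)

struct PrimaryEntry { std::vector<int> sharers; };          // up to N cache ids
std::unordered_map<uint64_t, PrimaryEntry>     primary;     // line -> entry
std::unordered_map<uint64_t, std::vector<int>> auxiliary;   // overflow sharers

void record_sharer(uint64_t line, int cache_id) {
    auto& p = primary[line];
    if ((int)p.sharers.size() < N) p.sharers.push_back(cache_id);
    else auxiliary[line].push_back(cache_id);   // spill beyond N sharers
}

// On a write, snoop exactly the caches known to hold the line.
void snoop_sharers(uint64_t line) {
    for (int id : primary[line].sharers)  std::printf("snoop cache %d\n", id);
    if (auxiliary.count(line))
        for (int id : auxiliary[line])    std::printf("snoop cache %d (aux)\n", id);
}

int main() {
    for (int id = 0; id < 6; ++id) record_sharer(0x40, id);  // 6 sharers, 2 spill
    snoop_sharers(0x40);
}
```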
  • Publication number: 20180004510
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption. (A brief sketch of a resumable multiply in this spirit follows this entry.)
    Type: Application
    Filed: July 2, 2016
    Publication date: January 4, 2018
    Applicant: Intel Corporation
    Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
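A software analogue of the restartable multiplication described above: work is checkpointed by a progress counter so it can resume after an interruption, loosely mirroring the completion progress indicator. The slice size, the row-granularity checkpoint, and all names are illustrative assumptions, not the instruction's actual semantics.

```cpp
// Hedged sketch of an interruptible, resumable matrix multiply.
#include <algorithm>
#include <cstdio>
#include <vector>

// Multiplies rows [*progress, end) of A (MxK) by B (KxN) into C (MxN).
// Returns true when finished; otherwise *progress records the next row.
bool matmul_resumable(const std::vector<double>& A, const std::vector<double>& B,
                      std::vector<double>& C, int M, int K, int N,
                      int* progress, int rows_per_slice) {
    int end = std::min(M, *progress + rows_per_slice);
    for (int i = *progress; i < end; ++i)
        for (int j = 0; j < N; ++j) {
            double acc = 0.0;
            for (int k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    *progress = end;          // saved indicator: rows completed so far
    return *progress == M;
}

int main() {
    const int M = 8, K = 8, N = 8;
    std::vector<double> A(M * K, 1.0), B(K * N, 2.0), C(M * N, 0.0);
    int progress = 0;
    while (!matmul_resumable(A, B, C, M, K, N, &progress, 3))
        std::printf("interrupted after row %d, resuming\n", progress);
    std::printf("C[0] = %g\n", C[0]);   // 16 for these inputs
}
```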
  • Publication number: 20170357586
    Abstract: In an embodiment, a processor includes a plurality of cores and synchronization logic. The synchronization logic includes circuitry to: receive a first memory request and a second memory request; determine whether the second memory request is in contention with the first memory request; and in response to a determination that the second memory request is in contention with the first memory request, process the second memory request using a non-blocking cache coherence protocol. Other embodiments are described and claimed. (A short illustrative sketch of the contention check follows this entry.)
    Type: Application
    Filed: June 13, 2016
    Publication date: December 14, 2017
    Inventors: Samantika S. Sury, Robert G. Blankenship, Simon C. Steely, Jr.
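A minimal sketch of the decision described above: synchronization logic detects when a second memory request contends with an in-flight one and, instead of blocking it, routes it down a separate non-blocking path. The handlers and names here are placeholders for illustration only.

```cpp
// Hedged sketch: route contending requests to a non-blocking handler.
#include <cstdint>
#include <cstdio>
#include <unordered_set>

std::unordered_set<uint64_t> in_flight;   // addresses with outstanding requests

void handle_blocking(uint64_t a)     { std::printf("normal path   %#llx\n", (unsigned long long)a); }
void handle_non_blocking(uint64_t a) { std::printf("non-blocking  %#llx\n", (unsigned long long)a); }

void receive_request(uint64_t addr) {
    if (in_flight.count(addr))      // contention with an earlier request
        handle_non_blocking(addr);  // process without stalling the requester
    else {
        in_flight.insert(addr);
        handle_blocking(addr);
    }
}

int main() {
    receive_request(0x1000);   // first request: normal path
    receive_request(0x1000);   // contending request: non-blocking path
}
```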
  • Publication number: 20170351430
    Abstract: Systems, methods, and apparatuses are directed to requesting access to a memory address; storing an identification of the memory address in a data structure; receiving a first request for access to the memory address, the request comprising a reference to a second processor core; storing the reference to the second processor core in the data structure; receiving a second request for access to the memory address, the second request comprising a reference to a third processor core; determining, based on the data structure, that the third processor core is different from the second processor core; and responding to the second request without buffering the second request. (A brief sketch of this request-tracking idea follows this entry.)
    Type: Application
    Filed: June 1, 2016
    Publication date: December 7, 2017
    Inventors: Robert G. Blankenship, Simon C. Steely, Jr., Samantika S. Sury
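A rough model of the bookkeeping above: the core that accessed an address remembers the last requester it saw; a later request from a different core is answered immediately instead of being buffered. The data structure and names are assumptions for illustration.

```cpp
// Hedged sketch: remember the last requester per address and respond
// without buffering when a different core asks.
#include <cstdint>
#include <cstdio>
#include <optional>
#include <unordered_map>

struct Tracked { std::optional<int> last_requester; };
std::unordered_map<uint64_t, Tracked> tracked;   // addresses this core accessed

void access(uint64_t addr) { tracked.try_emplace(addr); }   // record the address

void on_request(uint64_t addr, int requester) {
    auto& t = tracked[addr];
    if (!t.last_requester) {                 // first request: remember the core
        t.last_requester = requester;
        std::printf("buffering request from core %d\n", requester);
    } else if (*t.last_requester != requester) {
        std::printf("different core %d: respond without buffering\n", requester);
    }
}

int main() {
    access(0x80);
    on_request(0x80, 2);   // second core: remembered and buffered
    on_request(0x80, 3);   // third, different core: answered immediately
}
```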
  • Publication number: 20170286299
    Abstract: An apparatus and method are described for a sharing aware snoop filter. For example, one embodiment of a processor comprises: a plurality of caches, each of the caches comprising a plurality of cache lines, at least some of which are to be shared by two or more of the caches; a snoop filter to monitor accesses to the plurality of cache lines shared by the two or more caches, the snoop filter comprising: a primary snoop filter comprising a first plurality of entries, each entry associated with one of the plurality of cache lines and comprising N unique identifiers to uniquely identify up to N of the plurality of caches currently storing the cache line; an auxiliary snoop filter comprising a second plurality of entries, each entry associated with one of the plurality of cache lines, wherein once a particular cache line has been shared by more than N caches, an entry for that cache line is allocated in the auxiliary snoop filter to uniquely identify one or more additional caches storing the cache line.
    Type: Application
    Filed: April 1, 2016
    Publication date: October 5, 2017
    Inventors: Samantika S. Sury, Robert G. Blankenship, Simon C. Steely, Jr.
  • Patent number: 9734069
    Abstract: Systems and methods for multicast tree-based data distribution in a distributed shared cache. An example processing system comprises: a plurality of processing cores, each processing core communicatively coupled to a cache; a tag directory associated with caches of the plurality of processing cores; a shared cache associated with the tag directory; a processing logic configured, responsive to receiving an invalidate request with respect to a certain cache entry, to: allocate, within the shared cache, a shared cache entry corresponding to the certain cache entry; transmit, to at least one of: a tag directory or a processing core that last accessed the certain entry, an update read request with respect to the certain cache entry; and responsive to receiving an update of the certain cache entry, broadcast the update to at least one of: one or more tag directories or one or more processing cores identified by a tag corresponding to the certain cache entry. (A short sketch of tree-based update multicast follows this entry.)
    Type: Grant
    Filed: December 11, 2014
    Date of Patent: August 15, 2017
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Samantika S. Sury
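A simplified picture of the multicast distribution above: an updated value is pushed down a tree of tag directories to every core whose tag indicates it holds the entry, rather than invalidating each copy. The tree layout, the sharer flag, and the names are assumptions made for illustration.

```cpp
// Hedged sketch: multicast an updated cache value down a directory tree.
#include <cstdio>
#include <vector>

struct Node {                       // a tag directory or a leaf core
    int id;
    bool holds_line = false;        // derived from the tag's sharer information
    std::vector<Node*> children;
};

// Recursively deliver the updated value to every node that holds the line.
void broadcast_update(Node& n, int value) {
    if (n.holds_line) std::printf("node %d receives updated value %d\n", n.id, value);
    for (Node* c : n.children) broadcast_update(*c, value);
}

int main() {
    Node core0{0, true}, core1{1, false}, core2{2, true};
    Node dir{10, false, {&core0, &core1, &core2}};
    Node root{100, false, {&dir}};
    broadcast_update(root, 42);     // cores 0 and 2 get the update; core 1 is skipped
}
```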
  • Patent number: 9588889
    Abstract: Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories. Each tag entry may be associated with a cache line from a cache belonging to a first domain. The first domain may contain multiple caches. The value of the domain state field may indicate whether its associated cache line can be read or changed. (A brief sketch of a domain-state lookup follows this entry.)
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: March 7, 2017
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
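A minimal sketch of the idea above: each tag entry carries a small domain-state field saying whether the line may be read or modified within that domain, so many coherence checks reduce to reading that field. The state names and encoding are illustrative, not the patent's.

```cpp
// Hedged sketch: a domain state field attached to each tag-directory entry.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

enum class DomainState : uint8_t { Invalid, ReadOnly, ReadWrite };

struct TagEntry { DomainState state = DomainState::Invalid; };
std::unordered_map<uint64_t, TagEntry> tag_directory;   // line address -> entry

bool can_read(uint64_t line)  { return tag_directory[line].state != DomainState::Invalid; }
bool can_write(uint64_t line) { return tag_directory[line].state == DomainState::ReadWrite; }

int main() {
    tag_directory[0x40].state = DomainState::ReadOnly;
    std::printf("read ok: %d, write ok: %d\n", can_read(0x40), can_write(0x40));
}
```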
  • Patent number: 9418016
    Abstract: A method and apparatus to reduce unnecessary write backs of cached data to a main memory and to optimize the usage of a cache memory tag directory. In one embodiment of the invention, the power consumption of a processor can be saved by eliminating write backs of cache memory lines that have information that has reached its end-of-life. In one embodiment of the invention, when a processing unit is required to clear one or more cache memory lines, it uses a write-zero command to clear the one or more cache memory lines. The processing unit does not perform a write operation to move or pass data values of zero to the one or more cache memory lines. By doing so, it reduces the power consumption of the processing unit. (A short sketch of the write-zero idea follows this entry.)
    Type: Grant
    Filed: December 21, 2010
    Date of Patent: August 16, 2016
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., Joel S. Emer, William C. Hasenplaugh
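A toy version of the write-zero idea above: a line whose contents have reached end-of-life is marked zero-and-clean instead of having zeros actually written into it, so it never needs to be written back. The flag names and line size are illustrative assumptions.

```cpp
// Hedged sketch: a write-zero command that avoids data movement and write backs.
#include <array>
#include <cstdint>
#include <cstdio>

struct CacheLine {
    std::array<uint8_t, 64> data{};
    bool dirty = false;
    bool zeroed = false;        // set by the write-zero command
};

// "Write-zero": no data movement, clears the dirty bit, so no write back later.
void write_zero(CacheLine& line) { line.zeroed = true; line.dirty = false; }

void evict(const CacheLine& line) {
    if (line.dirty) std::printf("write back to memory\n");
    else            std::printf("drop line%s, no write back needed\n",
                                line.zeroed ? " (reads as zero)" : "");
}

int main() {
    CacheLine line;
    line.dirty = true;          // line was modified earlier
    write_zero(line);           // its contents are now dead, clear cheaply
    evict(line);                // prints: drop line (reads as zero), no write back needed
}
```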
  • Publication number: 20160170880
    Abstract: Systems and methods for multicast tree-based data distribution in a distributed shared cache. An example processing system comprises: a plurality of processing cores, each processing core communicatively coupled to a cache; a tag directory associated with caches of the plurality of processing cores; a shared cache associated with the tag directory; a processing logic configured, responsive to receiving an invalidate request with respect to a certain cache entry, to: allocate, within the shared cache, a shared cache entry corresponding to the certain cache entry; transmit, to at least one of: a tag directory or a processing core that last accessed the certain entry, an update read request with respect to the certain cache entry; and responsive to receiving an update of the certain cache entry, broadcast the update to at least one of: one or more tag directories or one or more processing cores identified by a tag corresponding to the certain cache entry.
    Type: Application
    Filed: December 11, 2014
    Publication date: June 16, 2016
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Samantika S. Sury
  • Publication number: 20160092354
    Abstract: Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory of the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core to cause the cache line to be sent from the tag directory to the second processor core.
    Type: Application
    Filed: September 26, 2014
    Publication date: March 31, 2016
    Inventors: Simon C. Steely, Jr., Samantika S. Sury, William C. Hasenplaugh
  • Patent number: 9262327
    Abstract: An apparatus may comprise a cache file having a plurality of cache lines and a hit predictor. The hit predictor may contain a table of counter values indexed with signatures that are associated with the plurality of cache lines. The apparatus may fill cache lines into the cache file with either low or high priority. Low priority lines may be chosen to be replaced by a replacement algorithm before high priority lines. In this way, the cache naturally may contain more high priority lines than low priority ones. This priority filling process may improve the performance of most replacement schemes, including the best-known schemes, which already perform better than LRU. (A brief sketch of signature-based priority filling follows this entry.)
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: February 16, 2016
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Aamer Jaleel, Joel S. Emer, Carole-Jean Wu
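A small model of the predictor above: a table of counters indexed by a signature (here, a hashed PC) estimates whether a fill will be reused; fills predicted useless are inserted at low priority and become preferred eviction victims. The table size, threshold, signature function, and names are assumptions.

```cpp
// Hedged sketch: signature-indexed hit counters driving fill priority.
#include <array>
#include <cstdint>
#include <cstdio>

std::array<int8_t, 256> hit_counters{};                  // signature -> confidence

uint8_t signature(uint64_t pc) { return (pc ^ (pc >> 8)) & 0xFF; }

enum class FillPriority { Low, High };

// Train the predictor: hits raise the counter, dead evictions lower it.
void on_hit(uint64_t pc)        { auto& c = hit_counters[signature(pc)]; if (c < 3)  ++c; }
void on_dead_evict(uint64_t pc) { auto& c = hit_counters[signature(pc)]; if (c > -4) --c; }

// Choose the insertion priority for a new fill.
FillPriority priority_for(uint64_t pc) {
    return hit_counters[signature(pc)] >= 0 ? FillPriority::High : FillPriority::Low;
}

int main() {
    uint64_t pc = 0x401a30;
    on_dead_evict(pc);          // this access stream has not been reusing lines
    std::printf("fill priority: %s\n",
                priority_for(pc) == FillPriority::High ? "high" : "low");
}
```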
  • Patent number: 9251073
    Abstract: A multi-core processor implements a cache coherency protocol in which probe messages are address-ordered on a probe channel while responses are un-ordered on a response channel. When a first core generates a read of an address that misses in the first core's cache, a line fill is initiated. If a second core is writing the same address, the second core generates an update on the address-ordered probe channel. The second core's update may arrive before or after the first core's line fill returns. If the update arrives before the fill returns, a mask is maintained to indicate which portions of the line were modified by the update so that the late arriving line fill only modifies portions of the line that were unaffected by the earlier-arriving update. (A short sketch of the byte-mask merge follows this entry.)
    Type: Grant
    Filed: December 31, 2012
    Date of Patent: February 2, 2016
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, William C. Hasenplaugh
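A toy version of the byte-mask merge above: an update that arrives before the line fill records which bytes it wrote, and when the fill finally returns, only the untouched bytes are taken from it. The line size and names are illustrative.

```cpp
// Hedged sketch: merge a late-arriving line fill around earlier byte updates.
#include <array>
#include <cstdint>
#include <cstdio>

constexpr int LINE = 8;                      // toy 8-byte line for brevity

struct PendingLine {
    std::array<uint8_t, LINE> data{};
    std::array<bool, LINE>    written{};     // mask of bytes set by early updates
};

// An update from another core arrives before our fill completes.
void apply_update(PendingLine& p, int offset, uint8_t value) {
    p.data[offset] = value;
    p.written[offset] = true;
}

// The late fill only overwrites bytes the update did not touch.
void apply_fill(PendingLine& p, const std::array<uint8_t, LINE>& fill) {
    for (int i = 0; i < LINE; ++i)
        if (!p.written[i]) p.data[i] = fill[i];
}

int main() {
    PendingLine p;
    apply_update(p, 2, 0xAA);                        // early update
    apply_fill(p, {1, 2, 3, 4, 5, 6, 7, 8});         // late fill
    std::printf("byte 2 = %#x, byte 3 = %#x\n", p.data[2], p.data[3]);  // 0xaa, 0x4
}
```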
  • Patent number: 9201792
    Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining that a local last accessor of the memory address may have a copy of the requested data up to date with the memory. The local last accessor may be within a local domain that the requester belongs to. The method may further comprise sending a cache probe to the local last accessor and retrieving a latest value of the requested data from the local last accessor to the requester. (A brief sketch of the local last-accessor probe follows this entry.)
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: December 1, 2015
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., Samantika Subramaniam, William C. Hasenplaugh, Joel S. Emer
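A small model of the idea above: a record-keeping structure remembers, per address, the last accessor inside the requester's local domain, and if that accessor still holds an up-to-date copy, the probe stays local instead of going off-domain. The domain mapping and names are assumptions.

```cpp
// Hedged sketch: probe the local last accessor when its copy is up to date.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct Record { int last_accessor; bool up_to_date; };
std::unordered_map<uint64_t, Record> records;    // per-address bookkeeping

int domain_of(int core) { return core / 4; }     // assumed: 4 cores per domain

void request(uint64_t addr, int requester) {
    auto it = records.find(addr);
    if (it != records.end() && it->second.up_to_date &&
        domain_of(it->second.last_accessor) == domain_of(requester)) {
        std::printf("probe local core %d for %#llx\n",
                    it->second.last_accessor, (unsigned long long)addr);
    } else {
        std::printf("fall back to memory/off-domain lookup\n");
    }
    records[addr] = {requester, true};           // requester becomes last accessor
}

int main() {
    request(0x100, 1);    // no record yet: goes to memory
    request(0x100, 2);    // core 1 is in the same domain: local probe
}
```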
  • Patent number: 9146871
    Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining a last accessor of the memory address, sending a cache probe to the last accessor, determining that the last accessor no longer has a copy of the line, and sending a request for the previously accessed version of the line. The request may bypass the tag directories and obtain the requested data from memory. (A short sketch of the memory-bypass fallback follows this entry.)
    Type: Grant
    Filed: December 28, 2011
    Date of Patent: September 29, 2015
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
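A rough sketch of the fallback path above: the last accessor is probed first, and if it no longer holds the line, the request skips the tag-directory hierarchy and fetches the previously written-back version straight from memory. The structures and names are illustrative.

```cpp
// Hedged sketch: bypass the tag directories when the last accessor has
// already evicted the line.
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <unordered_set>

std::unordered_map<uint64_t, int> last_accessor;                    // addr -> core
std::unordered_map<int, std::unordered_set<uint64_t>> core_holds;   // core -> lines

void request(uint64_t addr, int requester) {
    auto it = last_accessor.find(addr);
    if (it != last_accessor.end() && core_holds[it->second].count(addr)) {
        std::printf("probe core %d, forward its copy\n", it->second);
    } else {
        // Last accessor evicted the line: skip the directories, read memory.
        std::printf("bypass tag directories, fetch %#llx from memory\n",
                    (unsigned long long)addr);
    }
    last_accessor[addr] = requester;
    core_holds[requester].insert(addr);
}

int main() {
    last_accessor[0x200] = 5;   // core 5 touched the line but later evicted it
    request(0x200, 1);          // prints the memory-bypass path
}
```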
  • Patent number: 9037804
    Abstract: Method and apparatus to efficiently organize data in caches by storing/accessing data of varying sizes in cache lines. A value may be assigned to a field indicating the size of usable data stored in a cache line. If the field indicating the size of the usable data in the cache line indicates a size less than the maximum storage size, a value may be assigned to a field in the cache line indicating which subset of the stored data is usable data. A cache request may determine whether the size of the usable data in a cache line is equal to the maximum data storage size. If the size of the usable data in the cache line is equal to the maximum data storage size, the entire stored data in the cache line may be returned. (A brief sketch of a size-annotated cache line follows this entry.)
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: May 19, 2015
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer
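A toy version of the size-annotated line above: the line records how much of it is usable data and which slice that is, so reads of small values do not return the whole line. The 64-byte line size and field names are assumptions.

```cpp
// Hedged sketch: a cache line annotated with the size and offset of its
// usable data.
#include <array>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr int LINE_BYTES = 64;

struct SizedLine {
    std::array<uint8_t, LINE_BYTES> data{};
    uint8_t size;     // bytes of usable data (LINE_BYTES means "full line")
    uint8_t offset;   // which slice of the line holds the usable data
};

// Return only the usable portion; a full line is returned whole.
std::vector<uint8_t> read_line(const SizedLine& l) {
    if (l.size == LINE_BYTES)
        return {l.data.begin(), l.data.end()};
    return {l.data.begin() + l.offset, l.data.begin() + l.offset + l.size};
}

int main() {
    SizedLine l{};
    l.size = 8; l.offset = 16;                    // 8 usable bytes at offset 16
    std::printf("returned %zu bytes\n", read_line(l).size());
}
```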
  • Patent number: 8806147
    Abstract: A system comprises a first node operative to provide a source broadcast requesting data. The first node associates an F-state with a copy of the data in response to receiving the copy of the data from memory and receiving non-data responses from other nodes in the system. The non-data responses include an indication that at least a second node includes a shared copy of the data. The F-state enables the first node to serve as an ordering point in the system, capable of responding to requests from other nodes in the system with a shared copy of the data. (A short sketch of the F-state response path follows this entry.)
    Type: Grant
    Filed: December 28, 2011
    Date of Patent: August 12, 2014
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Gregory Edward Tierney, Stephen R. Van Doren, Simon C. Steely, Jr.
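A simplified picture of the F-state idea above, reminiscent of a Forward state: among several sharers, exactly one node holds the line in the F-state and is the one that answers source broadcasts with a shared copy, while the others stay silent. The states, the hand-off of the ordering point, and the names are illustrative, not the protocol's full definition.

```cpp
// Hedged sketch: one F-state sharer acts as the ordering point and answers
// source broadcasts with a shared copy.
#include <cstdio>
#include <vector>

enum class State { I, S, F };            // Invalid, Shared, Forward (ordering point)

struct Node { int id; State state = State::I; };

// A requester broadcasts; only the F-state node supplies the data.
void source_broadcast(std::vector<Node>& nodes, int requester) {
    for (Node& n : nodes) {
        if (n.id == requester) continue;
        if (n.state == State::F) {
            std::printf("node %d (F) forwards a shared copy to node %d\n", n.id, requester);
            n.state = State::S;          // assumed hand-off of the ordering point
        }
    }
    nodes[requester].state = State::F;   // requester becomes the new ordering point
}

int main() {
    std::vector<Node> nodes{{0, State::F}, {1, State::S}, {2, State::I}};
    source_broadcast(nodes, 2);          // node 0 forwards; node 2 now holds F
}
```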
  • Publication number: 20140189251
    Abstract: A multi-core processor implements a cache coherency protocol in which probe messages are address-ordered on a probe channel while responses are un-ordered on a response channel. When a first core generates a read of an address that misses in the first core's cache, a line fill is initiated. If a second core is writing the same address, the second core generates an update on the address-ordered probe channel. The second core's update may arrive before or after the first core's line fill returns. If the update arrives before the fill returns, a mask is maintained to indicate which portions of the line were modified by the update so that the late arriving line fill only modifies portions of the line that were unaffected by the earlier-arriving update.
    Type: Application
    Filed: December 31, 2012
    Publication date: July 3, 2014
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh
  • Patent number: 8769209
    Abstract: An apparatus and method for improving cache performance in a computer system having a multi-level cache hierarchy. For example, one embodiment of a method comprises: selecting a first line in a cache at level N for potential eviction; querying a cache at level M in the hierarchy to determine whether the first cache line is resident in the cache at level M, wherein M < N; in response to receiving an indication that the first cache line is not resident at level M, then evicting the first cache line from the cache at level N; in response to receiving an indication that the first cache line is resident at level M, then retaining the first cache line and choosing a second cache line for potential eviction. (A brief sketch of this cross-level eviction check follows this entry.)
    Type: Grant
    Filed: December 20, 2010
    Date of Patent: July 1, 2014
    Assignee: Intel Corporation
    Inventors: Aamer Jaleel, Simon C. Steely, Jr., Eric R. Borch, Malini K. Bhandaru, Joel S. Emer
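    A minimal sketch of the eviction filter above: before the larger level-N cache evicts a victim, it asks the smaller level-M cache whether the line is still resident there; if so, it keeps the line and tries the next candidate. The containers and names are illustrative assumptions.

```cpp
// Hedged sketch: skip level-N eviction candidates that are still resident
// in the smaller level-M cache.
#include <cstdint>
#include <cstdio>
#include <unordered_set>
#include <vector>

std::unordered_set<uint64_t> level_m;     // lines resident in the inner cache (M < N)

// Pick a victim from the level-N candidate list, preferring lines the inner
// cache no longer holds.
uint64_t choose_victim(const std::vector<uint64_t>& candidates) {
    for (uint64_t line : candidates)
        if (!level_m.count(line))         // not resident closer to the core: safe to evict
            return line;
    return candidates.front();            // all are resident: fall back to the first choice
}

int main() {
    level_m = {0x100, 0x140};
    std::vector<uint64_t> candidates = {0x100, 0x180, 0x1c0};
    std::printf("evict %#llx\n", (unsigned long long)choose_victim(candidates));  // 0x180
}
```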
  • Publication number: 20140052920
    Abstract: Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories. Each tag entry may be associated with a cache line from a cache belonging to a first domain. The first domain may contain multiple caches. The value of the domain state field may indicate whether its associated cache line can be read or changed.
    Type: Application
    Filed: December 29, 2011
    Publication date: February 20, 2014
    Inventors: Simon C. Steely, Jr., William C. Hasenplaugh, Joel S. Emer