Patents by Inventor Steven R. Kunkel

Steven R. Kunkel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7716424
    Abstract: We present a “directory extension” (hereinafter “DX”) to aid in prefetching between proximate levels in a cache hierarchy. The DX may maintain (1) a list of pages which contains recently ejected lines from a given level in the cache hierarchy, and (2) for each page in this list, the identity of a set of ejected lines, provided these lines are prefetchable from, for example, the next level of the cache hierarchy. Given a cache fault to a line within a page in this list, other lines from this page may then be prefetched without the substantial overhead to directory lookup which would otherwise be required.
    Type: Grant
    Filed: November 16, 2004
    Date of Patent: May 11, 2010
    Assignee: International Business Machines Corporation
    Inventors: Peter A. Franaszek, Steven R. Kunkel, Luis Alfonso Lastras Montaño, Aaron C. Sawdey
  • Publication number: 20090319726
    Abstract: A system and method of a region coherence protocol for use in Region Coherence Arrays (RCAs) deployed in clustered shared-memory multiprocessor systems which optimize cache-to-cache transfers by allowing broadcast memory requests to be provided to only a portion of a clustered shared-memory multiprocessor system. Interconnect hierarchy levels can be devised for logical groups of processors, processors on the same chip, processors on chips aggregated into a multichip module, multichip modules on the same printed circuit board, and for processors on other printed circuit boards or in other cabinets.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 24, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jason F. Cantin, Steven R. Kunkel
  • Patent number: 7620776
    Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast remote requests to reduce the latency to access data from local nodes and to reduce global traffic in an SMP computer system. A modified invalid cache coherency protocol state is defined that predicts whether a memory access request to read or write data in a cache line can be satisfied within a local node. When a cache line is in the modified invalid state, the only valid copies of the data are predicted to be located in the local node. When a cache line is in the invalid state and not in the modified invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory access requests to read exclusive or write data in a cache line that is not currently in the modified invalid state are broadcast first to all nodes.
    Type: Grant
    Filed: December 12, 2007
    Date of Patent: November 17, 2009
    Assignee: International Business Machines Corporation
    Inventors: Jason Frederick Cantin, Steven R. Kunkel
  • Publication number: 20080215821
    Abstract: A cache coherent data processing system includes at least first and second coherency domains each including at least one processing unit. The first coherency domain includes a first cache memory, and the second coherency domain includes a coherent second cache memory. The first cache memory within the first coherency domain of the data processing system holds a memory block in a storage location associated with an address tag and a coherency state field. The coherency state field is set to a state that indicates that the address tag is valid, that the storage location does not contain valid data, and that the memory block is likely cached only within the first coherency domain.
    Type: Application
    Filed: April 15, 2008
    Publication date: September 4, 2008
    Inventors: Jason F. Cantin, James S. Fields, Steven R. Kunkel, William J. Starke
  • Publication number: 20080215819
    Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast local requests to reduce the latency to access data from remote nodes in an SMP computer system. A shared invalid cache coherency protocol state is defined that predicts whether a memory read request to read data in a shared cache line can be satisfied within a local node. When a cache line is in the shared invalid state, a valid copy of the data is predicted to be located in the local node. When a cache line is in the invalid state and not in the shared invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory read requests to read data in a cache line that is not currently in the shared invalid state are broadcast first to remote nodes.
    Type: Application
    Filed: April 22, 2008
    Publication date: September 4, 2008
    Inventors: Jason Frederick Cantin, Steven R. Kunkel
  • Patent number: 7395376
    Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast local requests to reduce the latency to access data from remote nodes in an SMP computer system. A shared invalid cache coherency protocol state is declined that predicts whether a memory read request to read data in a shared cache line can be satisfied within a local node. When a cache line is in the shared invalid state, a valid copy of the data is predicted to be located in the local node. When a cache line is in the invalid state and not in the shared invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory read requests to read data in a cache line that is not currently in tile shared invalid state are broadcast first to remote nodes.
    Type: Grant
    Filed: July 19, 2005
    Date of Patent: July 1, 2008
    Assignee: International Business Machines Corporation
    Inventors: Jason Frederick Cantin, Steven R. Kunkel
  • Publication number: 20080147987
    Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast remote requests to reduce the latency to access data from local nodes and to reduce global traffic in an SMP computer system. A modified invalid cache coherency protocol state is defined that predicts whether a memory access request to read or write data in a cache line can be satisfied within a local node. When a cache line is in the modified invalid state, the only valid copies of the data are predicted to be located in the local node. When a cache line is in the invalid state and not in the modified invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory access requests to read exclusive or write data in a cache line that is not currently in the modified invalid state are broadcast first to all nodes.
    Type: Application
    Filed: December 12, 2007
    Publication date: June 19, 2008
    Inventors: JASON FREDERICK CANTIN, Steven R. Kunkel
  • Patent number: 7389388
    Abstract: A cache coherent data processing system includes at least first and second coherency domains each including at least one processing unit. The first coherency domain includes a first cache memory, and the second coherency domain includes a coherent second cache memory. The first cache memory within the first coherency domain of the data processing system holds a memory block in a storage location associated with an address tag and a coherency state field. The coherency state field is set to a state that indicates that the address tag is valid, that the storage location does not contain valid data, and that the memory block is likely cached only within the first coherency domain.
    Type: Grant
    Filed: February 10, 2005
    Date of Patent: June 17, 2008
    Assignee: International Business Machines Corporation
    Inventors: Jason F. Cantin, James S. Fields, Jr., Steven R. Kunkel, William J. Starke
  • Patent number: 7360032
    Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast remote requests to reduce the latency to access data from local nodes and to reduce global traffic in an SMP computer system. A modified invalid cache coherency protocol state is defined that predicts whether a memory access request to read or write data in a cache line can be satisfied within a local node. When a cache line is in the modified invalid state, the only valid copies of the data are predicted to be located in the local node. When a cache line is in the invalid state and not in the modified invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory access requests to read exclusive or write data in a cache line that is not currently in the modified invalid state are broadcast first to all nodes.
    Type: Grant
    Filed: July 19, 2005
    Date of Patent: April 15, 2008
    Assignee: International Business Machines Corporation
    Inventors: Jason Frederick Cantin, Steven R. Kunkel
  • Patent number: 7194586
    Abstract: A method and apparatus are provided for implementing a cache state as history of read/write shared data for a cache in a shared memory multiple processor computer system. An invalid temporary state for a cache line is provided in addition to modified, exclusive, shared, and invalid states. The invalid temporary state is entered when a cache releases a modified cache line to another processor. The invalid temporary state is used to enable effective optimizations within cache coherent symmetric multiprocessor (SMP) systems of an SMP caching hierarchy with distributed caches with different caching coherency traffic profiles for both commercial and technical workloads.
    Type: Grant
    Filed: September 20, 2002
    Date of Patent: March 20, 2007
    Assignee: International Business Machines Corporation
    Inventors: Jeffrey Douglas Brown, John David Irish, Steven R. Kunkel
  • Patent number: 7047365
    Abstract: A method and apparatus for purging a cache line from an issuing processor and sending the cache line to the cache of one or more processors in a multi-processor shared memory computer system. The method and apparatus enables cache line data to be moved from one processor to another before the receiving processor needs the data thus preventing the receiving processor from incurring a cache miss event.
    Type: Grant
    Filed: January 22, 2002
    Date of Patent: May 16, 2006
    Assignee: International Business Machines Corporation
    Inventors: Steven R. Kunkel, David Arnold Luick
  • Patent number: 6988186
    Abstract: A queue, such as a first-in first-out queue, is incorporated into a processing device, such as a multithreaded pipeline processor. The queue may store the resources of more than one thread in the processing device such that the entries of one thread may be interspersed among the entries of another thread. The entries of each thread may be identified by a thread identification, a valid marker to indicate if the resources within the entry are valid, and a bank number. For a particular thread, the bank number tracks the number of times a head pointer pertaining to the first entry has passed a tail pointer. In this fashion, empty entries may be used and the resources may be efficiently allocated. In a preferred embodiment, the shared resource queue may be implemented into an in-order multithreaded pipelined processor as a queue storing resources to be dispatched for execution of instructions.
    Type: Grant
    Filed: June 28, 2001
    Date of Patent: January 17, 2006
    Assignee: International Business Machines Corporation
    Inventors: Richard James Eickemeyer, Steven R. Kunkel, Hung Q Le
  • Patent number: 6922753
    Abstract: Method and apparatus for prefetching cache with requested data are described. A processor initiates a read access to main memory for data which is not in the main memory. After the requested data is brought into the main memory, but before the read access is reinitiated, the requested data is prefetched from main memory into the cache subsystem of the processor which will later reinitiate the read access.
    Type: Grant
    Filed: September 26, 2002
    Date of Patent: July 26, 2005
    Assignee: International Business Machines Corporation
    Inventors: Jeffrey D. Brown, John D. Irish, Steven R. Kunkel
  • Patent number: 6839816
    Abstract: Embodiments are provided in which cache updating is described for a computer system having at least a first processor and a second processor having a first cache and a second cache, respectively. When the second processor obtains from the first processor a lock to a shared memory region, the first cache pushes to the second cache cache lines for the addresses in the shared memory region accessed by the first processor while the first processor had the lock.
    Type: Grant
    Filed: February 26, 2002
    Date of Patent: January 4, 2005
    Assignee: International Business Machines Corporation
    Inventors: John Michael Borkenhagen, Steven R. Kunkel
  • Patent number: 6728842
    Abstract: Embodiments are provided in which cache update is implemented by using a counter table having a plurality of entries to keep track of different modified cache lines of a cache of a processor. If a cache line of the cache is modified by the processor and the original content of the cache line came from a cache of another processor, a counter in the counter table restarts and reaches a predetermined value (e.g., overflows) triggering the broadcast of the modified cache line so that the cache of the other processor can snarf a copy of the modified cache line. As a result, when the other processor reads from a memory address matching that of the cache line, the cache of the other processor already has the most current copy for the matching memory address to feed the processor. Therefore, a cache read miss is avoided and system performance is improved.
    Type: Grant
    Filed: February 1, 2002
    Date of Patent: April 27, 2004
    Assignee: International Business Machines Corporation
    Inventors: Jeffrey D. Brown, Steven R. Kunkel, David A. Luick
  • Publication number: 20040064648
    Abstract: Method and apparatus for prefetching cache with requested data are described. A processor initiates a read access to main memory for data which is not in the main memory. After the requested data is brought into the main memory, but before the read access is reinitiated, the requested data is prefetched from main memory into the cache subsystem of the processor which will later reinitiate the read access.
    Type: Application
    Filed: September 26, 2002
    Publication date: April 1, 2004
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeffrey D. Brown, John D. Irish, Steven R. Kunkel
  • Publication number: 20040059877
    Abstract: A method and apparatus are provided for implementing a cache state as history of read/write shared data for a cache in a shared memory multiple processor computer system. An invalid temporary state for a cache line is provided in addition to modified, exclusive, shared, and invalid states. The invalid temporary state is entered when a cache releases a modified cache line to another processor. The invalid temporary state is used to enable effective optimizations within cache coherent symmetric multiprocessor (SMP) systems of an SMP caching hierarchy with distributed caches with different caching coherency traffic profiles for both commercial and technical workloads.
    Type: Application
    Filed: September 20, 2002
    Publication date: March 25, 2004
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeffrey Douglas Brown, John David Irish, Steven R. Kunkel
  • Publication number: 20030182539
    Abstract: It has been determined that, in a superscalar computer processor, executing load instructions issued along an incorrectly predicted path of a conditional branch instruction eventually reduces the number of cache misses observed on the correct branch path. Executing these wrong-path loads provides an indirect prefetching effect. If the processor has a small L1 data cache, however, this prefetching pollutes the cache causing an overall slowdown in performance. By storing the execution results of mispredicted paths in memory, such as in a wrong path cache, the pollution is eliminated. A wrong path cache can improve processor performance up to 17% in simulations using a 32 KB data cache. A fully-associative eight-entry wrong path cache in parallel with a 4 KB direct-mapped data cache allows the execution of wrong path loads to produce an average processor speedup of 46%. The wrong path cache also results in 16% better speedup compared to the baseline processor equipped with a victim cache of the same size.
    Type: Application
    Filed: March 20, 2002
    Publication date: September 25, 2003
    Applicant: International Business Machines Corporation
    Inventors: Steven R. Kunkel, David J. Lilja, Resit Sendag
  • Publication number: 20030163642
    Abstract: Embodiments are provided in which cache updating is described for a computer system having at least a first processor and a second processor having a first cache and a second cache, respectively. When the second processor obtains from the first processor a lock to a shared memory region, the first cache pushes to the second cache cache lines for the addresses in the shared memory region accessed by the first processor while the first processor had the lock.
    Type: Application
    Filed: February 26, 2002
    Publication date: August 28, 2003
    Applicant: International Business Machines Corporation
    Inventors: John Michael Borkenhagen, Steven R. Kunkel
  • Publication number: 20030149846
    Abstract: Embodiments are provided in which cache update is implemented by using a counter table having a plurality of entries to keep track of different modified cache lines of a cache of a processor. If a cache line of the cache is modified by the processor and the original content of the cache line came from a cache of another processor, a counter in the counter table restarts and reaches a predetermined value (e.g., overflows) triggering the broadcast of the modified cache line so that the cache of the other processor can snarf a copy of the modified cache line. As a result, when the other processor reads from a memory address matching that of the cache line, the cache of the other processor already has the most current copy for the matching memory address to feed the processor. Therefore, a cache read miss is avoided and system performance is improved.
    Type: Application
    Filed: February 1, 2002
    Publication date: August 7, 2003
    Applicant: International Business Machines Corporation
    Inventors: Jeffrey D. Brown, Steven R. Kunkel, David A. Luick