Patents by Inventor Steven R. Kunkel
Steven R. Kunkel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7716424
Abstract: We present a “directory extension” (hereinafter “DX”) to aid in prefetching between proximate levels in a cache hierarchy. The DX may maintain (1) a list of pages which contains recently ejected lines from a given level in the cache hierarchy, and (2) for each page in this list, the identity of a set of ejected lines, provided these lines are prefetchable from, for example, the next level of the cache hierarchy. Given a cache fault to a line within a page in this list, other lines from this page may then be prefetched without the substantial overhead to directory lookup which would otherwise be required.
Type: Grant
Filed: November 16, 2004
Date of Patent: May 11, 2010
Assignee: International Business Machines Corporation
Inventors: Peter A. Franaszek, Steven R. Kunkel, Luis Alfonso Lastras Montaño, Aaron C. Sawdey
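The bookkeeping this abstract describes can be illustrated with a small sketch. It is not the patented design: the class name `DirectoryExtension`, the `max_pages` capacity, the least-recently-updated replacement of pages, and the choice to drop a page entry once its lines have been handed out are all assumptions made for illustration.

```python
from collections import OrderedDict


class DirectoryExtension:
    """Toy DX: remembers which lines of which pages were recently ejected."""

    def __init__(self, max_pages=4):
        self.max_pages = max_pages   # assumed capacity of the page list
        self.pages = OrderedDict()   # page -> set of ejected line indices

    def record_eviction(self, page, line, prefetchable=True):
        # Only lines that could be prefetched from the next level are tracked.
        if not prefetchable:
            return
        lines = self.pages.pop(page, set())
        lines.add(line)
        self.pages[page] = lines     # re-insert as the most recent page
        if len(self.pages) > self.max_pages:
            self.pages.popitem(last=False)   # drop the oldest page (assumption)

    def on_miss(self, page, line):
        # On a fault to a tracked page, return the other ejected lines
        # of that page as prefetch candidates, with no directory lookup.
        lines = self.pages.pop(page, None)
        if lines is None:
            return []
        return sorted(lines - {line})
```

For example, after lines 1, 3 and 5 of one page have been ejected, a miss on line 3 yields lines 1 and 5 as prefetch candidates.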
-
Publication number: 20090319726
Abstract: A system and method of a region coherence protocol for use in Region Coherence Arrays (RCAs) deployed in clustered shared-memory multiprocessor systems which optimize cache-to-cache transfers by allowing broadcast memory requests to be provided to only a portion of a clustered shared-memory multiprocessor system. Interconnect hierarchy levels can be devised for logical groups of processors, processors on the same chip, processors on chips aggregated into a multichip module, multichip modules on the same printed circuit board, and for processors on other printed circuit boards or in other cabinets.
Type: Application
Filed: June 24, 2008
Publication date: December 24, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jason F. Cantin, Steven R. Kunkel
-
Patent number: 7620776
Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast remote requests to reduce the latency to access data from local nodes and to reduce global traffic in an SMP computer system. A modified invalid cache coherency protocol state is defined that predicts whether a memory access request to read or write data in a cache line can be satisfied within a local node. When a cache line is in the modified invalid state, the only valid copies of the data are predicted to be located in the local node. When a cache line is in the invalid state and not in the modified invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory access requests to read exclusive or write data in a cache line that is not currently in the modified invalid state are broadcast first to all nodes.
Type: Grant
Filed: December 12, 2007
Date of Patent: November 17, 2009
Assignee: International Business Machines Corporation
Inventors: Jason Frederick Cantin, Steven R. Kunkel
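The prediction rule in this abstract amounts to a small routing decision, sketched below. The enum and function names are hypothetical, and the behavior for a line that is in the modified invalid state (try the local node first) is inferred from the stated prediction rather than spelled out in the abstract.

```python
from enum import Enum


class LineState(Enum):
    INVALID = "invalid"
    MODIFIED_INVALID = "modified invalid"


class Scope(Enum):
    LOCAL_NODE = "local node"
    ALL_NODES = "all nodes"


def first_request_scope(state):
    """Where a read-exclusive or write request is sent first."""
    if state is LineState.MODIFIED_INVALID:
        # Valid copies are predicted to exist only in the local node,
        # so a global broadcast is assumed to be skipped at first.
        return Scope.LOCAL_NODE
    # Otherwise a valid copy is predicted to be in a remote node.
    return Scope.ALL_NODES
```

The point of the extra state is that a single check of the line's state chooses the narrower scope, which is what cuts the global traffic the abstract mentions.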
-
Publication number: 20080215821
Abstract: A cache coherent data processing system includes at least first and second coherency domains each including at least one processing unit. The first coherency domain includes a first cache memory, and the second coherency domain includes a coherent second cache memory. The first cache memory within the first coherency domain of the data processing system holds a memory block in a storage location associated with an address tag and a coherency state field. The coherency state field is set to a state that indicates that the address tag is valid, that the storage location does not contain valid data, and that the memory block is likely cached only within the first coherency domain.
Type: Application
Filed: April 15, 2008
Publication date: September 4, 2008
Inventors: Jason F. Cantin, James S. Fields, Steven R. Kunkel, William J. Starke
-
Publication number: 20080215819
Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast local requests to reduce the latency to access data from remote nodes in an SMP computer system. A shared invalid cache coherency protocol state is defined that predicts whether a memory read request to read data in a shared cache line can be satisfied within a local node. When a cache line is in the shared invalid state, a valid copy of the data is predicted to be located in the local node. When a cache line is in the invalid state and not in the shared invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory read requests to read data in a cache line that is not currently in the shared invalid state are broadcast first to remote nodes.
Type: Application
Filed: April 22, 2008
Publication date: September 4, 2008
Inventors: Jason Frederick Cantin, Steven R. Kunkel
-
Patent number: 7395376
Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast local requests to reduce the latency to access data from remote nodes in an SMP computer system. A shared invalid cache coherency protocol state is defined that predicts whether a memory read request to read data in a shared cache line can be satisfied within a local node. When a cache line is in the shared invalid state, a valid copy of the data is predicted to be located in the local node. When a cache line is in the invalid state and not in the shared invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory read requests to read data in a cache line that is not currently in the shared invalid state are broadcast first to remote nodes.
Type: Grant
Filed: July 19, 2005
Date of Patent: July 1, 2008
Assignee: International Business Machines Corporation
Inventors: Jason Frederick Cantin, Steven R. Kunkel
-
Publication number: 20080147987
Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast remote requests to reduce the latency to access data from local nodes and to reduce global traffic in an SMP computer system. A modified invalid cache coherency protocol state is defined that predicts whether a memory access request to read or write data in a cache line can be satisfied within a local node. When a cache line is in the modified invalid state, the only valid copies of the data are predicted to be located in the local node. When a cache line is in the invalid state and not in the modified invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory access requests to read exclusive or write data in a cache line that is not currently in the modified invalid state are broadcast first to all nodes.
Type: Application
Filed: December 12, 2007
Publication date: June 19, 2008
Inventors: JASON FREDERICK CANTIN, Steven R. Kunkel
-
Patent number: 7389388
Abstract: A cache coherent data processing system includes at least first and second coherency domains each including at least one processing unit. The first coherency domain includes a first cache memory, and the second coherency domain includes a coherent second cache memory. The first cache memory within the first coherency domain of the data processing system holds a memory block in a storage location associated with an address tag and a coherency state field. The coherency state field is set to a state that indicates that the address tag is valid, that the storage location does not contain valid data, and that the memory block is likely cached only within the first coherency domain.
Type: Grant
Filed: February 10, 2005
Date of Patent: June 17, 2008
Assignee: International Business Machines Corporation
Inventors: Jason F. Cantin, James S. Fields, Jr., Steven R. Kunkel, William J. Starke
-
Patent number: 7360032
Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast remote requests to reduce the latency to access data from local nodes and to reduce global traffic in an SMP computer system. A modified invalid cache coherency protocol state is defined that predicts whether a memory access request to read or write data in a cache line can be satisfied within a local node. When a cache line is in the modified invalid state, the only valid copies of the data are predicted to be located in the local node. When a cache line is in the invalid state and not in the modified invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory access requests to read exclusive or write data in a cache line that is not currently in the modified invalid state are broadcast first to all nodes.
Type: Grant
Filed: July 19, 2005
Date of Patent: April 15, 2008
Assignee: International Business Machines Corporation
Inventors: Jason Frederick Cantin, Steven R. Kunkel
-
Patent number: 7194586
Abstract: A method and apparatus are provided for implementing a cache state as history of read/write shared data for a cache in a shared memory multiple processor computer system. An invalid temporary state for a cache line is provided in addition to modified, exclusive, shared, and invalid states. The invalid temporary state is entered when a cache releases a modified cache line to another processor. The invalid temporary state is used to enable effective optimizations within cache coherent symmetric multiprocessor (SMP) systems of an SMP caching hierarchy with distributed caches with different caching coherency traffic profiles for both commercial and technical workloads.
Type: Grant
Filed: September 20, 2002
Date of Patent: March 20, 2007
Assignee: International Business Machines Corporation
Inventors: Jeffrey Douglas Brown, John David Irish, Steven R. Kunkel
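The extra state can be pictured as one added transition on top of the usual four states. The sketch below is only an illustration: the names are hypothetical, and the handling of exclusive and shared lines (falling to plain invalid when another processor takes ownership) follows the conventional MESI protocol, an assumption the abstract does not state.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"
    INVALID_TEMPORARY = "It"


def after_release_to_other_processor(state):
    """State of a line in this cache after another processor takes ownership."""
    if state is State.MODIFIED:
        # From the abstract: releasing a modified line enters the
        # invalid temporary state, recording read/write sharing history.
        return State.INVALID_TEMPORARY
    if state in (State.EXCLUSIVE, State.SHARED):
        # Conventional MESI behavior (assumption): no history is kept.
        return State.INVALID
    return state
```

A line left in the invalid temporary state marks data that this cache recently wrote and then gave up, which is the history the optimizations named in the abstract rely on.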
-
Patent number: 7047365
Abstract: A method and apparatus for purging a cache line from an issuing processor and sending the cache line to the cache of one or more processors in a multi-processor shared memory computer system. The method and apparatus enables cache line data to be moved from one processor to another before the receiving processor needs the data thus preventing the receiving processor from incurring a cache miss event.
Type: Grant
Filed: January 22, 2002
Date of Patent: May 16, 2006
Assignee: International Business Machines Corporation
Inventors: Steven R. Kunkel, David Arnold Luick
-
Patent number: 6988186
Abstract: A queue, such as a first-in first-out queue, is incorporated into a processing device, such as a multithreaded pipeline processor. The queue may store the resources of more than one thread in the processing device such that the entries of one thread may be interspersed among the entries of another thread. The entries of each thread may be identified by a thread identification, a valid marker to indicate if the resources within the entry are valid, and a bank number. For a particular thread, the bank number tracks the number of times a head pointer pertaining to the first entry has passed a tail pointer. In this fashion, empty entries may be used and the resources may be efficiently allocated. In a preferred embodiment, the shared resource queue may be implemented into an in-order multithreaded pipelined processor as a queue storing resources to be dispatched for execution of instructions.
Type: Grant
Filed: June 28, 2001
Date of Patent: January 17, 2006
Assignee: International Business Machines Corporation
Inventors: Richard James Eickemeyer, Steven R. Kunkel, Hung Q Le
-
Patent number: 6922753
Abstract: Method and apparatus for prefetching cache with requested data are described. A processor initiates a read access to main memory for data which is not in the main memory. After the requested data is brought into the main memory, but before the read access is reinitiated, the requested data is prefetched from main memory into the cache subsystem of the processor which will later reinitiate the read access.
Type: Grant
Filed: September 26, 2002
Date of Patent: July 26, 2005
Assignee: International Business Machines Corporation
Inventors: Jeffrey D. Brown, John D. Irish, Steven R. Kunkel
-
Patent number: 6839816
Abstract: Embodiments are provided in which cache updating is described for a computer system having at least a first processor and a second processor having a first cache and a second cache, respectively. When the second processor obtains from the first processor a lock to a shared memory region, the first cache pushes to the second cache cache lines for the addresses in the shared memory region accessed by the first processor while the first processor had the lock.
Type: Grant
Filed: February 26, 2002
Date of Patent: January 4, 2005
Assignee: International Business Machines Corporation
Inventors: John Michael Borkenhagen, Steven R. Kunkel
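A minimal software model of the lock-handoff push might look like the following. The `Cache` class, its method names, and the use of a Python `range` for the shared memory region are hypothetical; real hardware would track and move cache lines, not dictionary entries.

```python
class Cache:
    """Toy cache that remembers which shared-region lines it touched under the lock."""

    def __init__(self, shared_region):
        self.shared_region = shared_region   # addresses guarded by the lock
        self.lines = {}                      # address -> cached value
        self.touched_under_lock = set()

    def access(self, addr, value, holding_lock=False):
        # Model an access that leaves `value` in the line for `addr`.
        self.lines[addr] = value
        if holding_lock and addr in self.shared_region:
            self.touched_under_lock.add(addr)

    def hand_off_lock(self, receiver):
        # When the lock moves to another processor, push the lines that
        # were accessed under the lock into the receiver's cache.
        pushed = sorted(self.touched_under_lock)
        for addr in pushed:
            receiver.lines[addr] = self.lines[addr]
        self.touched_under_lock.clear()
        return pushed
```

In this model the second processor finds the protected data already cached when it enters its critical section, so it avoids the misses it would otherwise take on each line.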
-
Patent number: 6728842
Abstract: Embodiments are provided in which cache update is implemented by using a counter table having a plurality of entries to keep track of different modified cache lines of a cache of a processor. If a cache line of the cache is modified by the processor and the original content of the cache line came from a cache of another processor, a counter in the counter table restarts and reaches a predetermined value (e.g., overflows) triggering the broadcast of the modified cache line so that the cache of the other processor can snarf a copy of the modified cache line. As a result, when the other processor reads from a memory address matching that of the cache line, the cache of the other processor already has the most current copy for the matching memory address to feed the processor. Therefore, a cache read miss is avoided and system performance is improved.
Type: Grant
Filed: February 1, 2002
Date of Patent: April 27, 2004
Assignee: International Business Machines Corporation
Inventors: Jeffrey D. Brown, Steven R. Kunkel, David A. Luick
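The counter-table idea can be sketched in a few lines. This is an illustration under assumptions: the class name, the `threshold` value, and the notion that every counter advances once per `tick()` call are all hypothetical, since the abstract does not say what drives the counters.

```python
class CounterTable:
    """Toy counter table: one counter per tracked modified cache line."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # assumed "predetermined value"
        self.counters = {}           # line address -> current count

    def on_modify(self, addr, from_other_cache):
        # Only lines whose original content came from another
        # processor's cache are tracked; each modification restarts
        # the counter for that line.
        if from_other_cache:
            self.counters[addr] = 0

    def tick(self):
        # Advance every counter; lines whose counter reaches the
        # threshold are returned so they can be broadcast.
        due = []
        for addr in list(self.counters):
            self.counters[addr] += 1
            if self.counters[addr] >= self.threshold:
                due.append(addr)
                del self.counters[addr]
        return due
```

Restarting the counter on each modification means a line is broadcast only after the writer has left it alone for a while, so the other cache receives a settled copy.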
-
Publication number: 20040064648
Abstract: Method and apparatus for prefetching cache with requested data are described. A processor initiates a read access to main memory for data which is not in the main memory. After the requested data is brought into the main memory, but before the read access is reinitiated, the requested data is prefetched from main memory into the cache subsystem of the processor which will later reinitiate the read access.
Type: Application
Filed: September 26, 2002
Publication date: April 1, 2004
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jeffrey D. Brown, John D. Irish, Steven R. Kunkel
-
Publication number: 20040059877
Abstract: A method and apparatus are provided for implementing a cache state as history of read/write shared data for a cache in a shared memory multiple processor computer system. An invalid temporary state for a cache line is provided in addition to modified, exclusive, shared, and invalid states. The invalid temporary state is entered when a cache releases a modified cache line to another processor. The invalid temporary state is used to enable effective optimizations within cache coherent symmetric multiprocessor (SMP) systems of an SMP caching hierarchy with distributed caches with different caching coherency traffic profiles for both commercial and technical workloads.
Type: Application
Filed: September 20, 2002
Publication date: March 25, 2004
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jeffrey Douglas Brown, John David Irish, Steven R. Kunkel
-
Publication number: 20030182539
Abstract: It has been determined that, in a superscalar computer processor, executing load instructions issued along an incorrectly predicted path of a conditional branch instruction eventually reduces the number of cache misses observed on the correct branch path. Executing these wrong-path loads provides an indirect prefetching effect. If the processor has a small L1 data cache, however, this prefetching pollutes the cache causing an overall slowdown in performance. By storing the execution results of mispredicted paths in memory, such as in a wrong path cache, the pollution is eliminated. A wrong path cache can improve processor performance up to 17% in simulations using a 32 KB data cache. A fully-associative eight-entry wrong path cache in parallel with a 4 KB direct-mapped data cache allows the execution of wrong path loads to produce an average processor speedup of 46%. The wrong path cache also results in 16% better speedup compared to the baseline processor equipped with a victim cache of the same size.
Type: Application
Filed: March 20, 2002
Publication date: September 25, 2003
Applicant: International Business Machines Corporation
Inventors: Steven R. Kunkel, David J. Lilja, Resit Sendag
-
Publication number: 20030163642
Abstract: Embodiments are provided in which cache updating is described for a computer system having at least a first processor and a second processor having a first cache and a second cache, respectively. When the second processor obtains from the first processor a lock to a shared memory region, the first cache pushes to the second cache cache lines for the addresses in the shared memory region accessed by the first processor while the first processor had the lock.
Type: Application
Filed: February 26, 2002
Publication date: August 28, 2003
Applicant: International Business Machines Corporation
Inventors: John Michael Borkenhagen, Steven R. Kunkel
-
Publication number: 20030149846
Abstract: Embodiments are provided in which cache update is implemented by using a counter table having a plurality of entries to keep track of different modified cache lines of a cache of a processor. If a cache line of the cache is modified by the processor and the original content of the cache line came from a cache of another processor, a counter in the counter table restarts and reaches a predetermined value (e.g., overflows) triggering the broadcast of the modified cache line so that the cache of the other processor can snarf a copy of the modified cache line. As a result, when the other processor reads from a memory address matching that of the cache line, the cache of the other processor already has the most current copy for the matching memory address to feed the processor. Therefore, a cache read miss is avoided and system performance is improved.
Type: Application
Filed: February 1, 2002
Publication date: August 7, 2003
Applicant: International Business Machines Corporation
Inventors: Jeffrey D. Brown, Steven R. Kunkel, David A. Luick