Patents by Inventor Jeffrey A. Stuecheli

Jeffrey A. Stuecheli has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Dynamic write priority based on virtual write queue high water mark for set associative cache using cache cleaner when modified sets exceed threshold

Patent number: 9355035

Abstract: A set associative cache is managed by a memory controller which places writeback instructions for modified (dirty) cache lines into a virtual write queue, determines when the number of the sets containing a modified cache line is greater than a high water mark, and elevates a priority of the writeback instructions over read operations. The controller can return the priority to normal when the number of modified sets is less than a low water mark. In an embodiment wherein the system memory device includes rank groups, the congruence classes can be mapped based on the rank groups. The number of writes pending in a rank group exceeding a different threshold can additionally be a requirement to trigger elevation of writeback priority. A dirty vector can be used to provide an indication that corresponding sets contain a modified cache line, particularly in least-recently used segments of the corresponding sets.

Type: Grant

Filed: November 18, 2013

Date of Patent: May 31, 2016

Assignee: GLOBALFOUNDRIES Inc.

Inventors: Benjiman L. Goodman, Jody B. Joyner, Stephen J. Powell, William J. Starke, Jeffrey A. Stuecheli
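
The water-mark behavior described in the abstract above amounts to hysteresis: priority is raised once the count of modified sets passes the high water mark and stays raised until the count drops below the low water mark. The following is a minimal sketch of that idea only, not the patented controller. The class name, method name, and threshold values are hypothetical, and rank groups and the dirty vector are left out.

```python
class WritePriorityController:
    """Toggle writeback priority with hysteresis between two water marks.

    Hypothetical illustration; names and thresholds are assumptions.
    """

    def __init__(self, high_water_mark, low_water_mark):
        if low_water_mark >= high_water_mark:
            raise ValueError("low water mark must be below high water mark")
        self.high = high_water_mark
        self.low = low_water_mark
        self.elevated = False  # True while writebacks outrank reads

    def update(self, modified_sets):
        """Record the current number of modified sets and return the priority state."""
        if modified_sets > self.high:
            self.elevated = True
        elif modified_sets < self.low:
            self.elevated = False
        # Between the two marks the previous state is kept unchanged.
        return self.elevated
```

With marks of 8 and 3, a count of 9 raises the priority, a later count of 5 leaves it raised, and a count of 2 returns it to normal. Keeping the two thresholds apart avoids rapid toggling when the count hovers near a single limit.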

COMPARISON-BASED SORT IN AN ARRAY PROCESSOR

Publication number: 20160124900

Abstract: A method for sorting data in an array processor. Each of a first tier of processing elements in the array processor receives data inputs from a load streaming unit. Each of the first tier processing elements compares input data portions received from the load streaming unit, wherein the input data portions are stored for processing in respective queues. The first tier processing elements select one of the input data portions to be an output data portion based on the comparison, and in response to the selection, remove a corresponding queue entry and request next input data from the load streaming unit. Each of the first tier processing elements further provides the output data portion as an input data portion to a second tier processing element that generates output data based on a comparison of output data received from at least two first tier processing elements.

Type: Application

Filed: June 3, 2015

Publication date: May 5, 2016

Inventors: Ganesh Balakrishnan, Bartholomew Blaner, John J. Reilly, Jeffrey A. Stuecheli
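
The two-tier arrangement in the abstract above resembles a merge tree. The sketch below assumes that each input stream is already sorted and that each processing element selects the smaller of its two queue heads; the abstract states neither point, so both are assumptions. The function names are hypothetical, and Python generators stand in for the hardware queues and the load streaming unit.

```python
def stream(values):
    """Stand-in for the load streaming unit: yields one sorted input stream."""
    yield from values


def compare_element(left, right):
    """Hypothetical processing element.

    Compares the heads of two input queues, emits the smaller one, and
    requests the next item from the source whose head was consumed.
    """
    sentinel = object()
    a = next(left, sentinel)
    b = next(right, sentinel)
    while a is not sentinel and b is not sentinel:
        if a <= b:
            yield a
            a = next(left, sentinel)
        else:
            yield b
            b = next(right, sentinel)
    # One side is exhausted; drain whichever side still has data.
    while a is not sentinel:
        yield a
        a = next(left, sentinel)
    while b is not sentinel:
        yield b
        b = next(right, sentinel)


def two_tier_sort(s0, s1, s2, s3):
    """Two first-tier elements feed one second-tier element."""
    first_a = compare_element(stream(s0), stream(s1))
    first_b = compare_element(stream(s2), stream(s3))
    return list(compare_element(first_a, first_b))
```

Calling `two_tier_sort([1, 5, 9], [2, 6], [0, 7], [3, 4, 8])` merges the four sorted streams into one sorted list.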

COMPARISON-BASED SORT IN AN ARRAY PROCESSOR

Publication number: 20160124755

Abstract: An array processor includes a managing element having a load streaming unit coupled to multiple processing elements. The load streaming unit provides input data portions to each of a first subset of processing elements and receives output data from each of a second subset of the processing elements based on a comparatively sorted combination of the input data portions. Each processing element is configurable by the managing element to compare input data portions received from the load streaming unit or two or more of the other processing elements. Each processing unit can further select an input data portion to be output data based on the comparison, and in response to selecting the input data portion, remove a queue entry corresponding to the selected input data portion. Each processing element can provide the selected output data portion to the managing element or as an input to one of the processing elements.

Type: Application

Filed: October 31, 2014

Publication date: May 5, 2016

Inventors: Ganesh Balakrishnan, Bartholomew Blaner, John J. Reilly, Jeffrey A. Stuecheli

RECONFIGURABLE ARRAY PROCESSOR FOR PATTERN MATCHING

Publication number: 20160085721

Abstract: Various implementations of a method, system, and computer program product for pattern matching using a reconfigurable array processor are disclosed. In one embodiment, a processor array manager of the reconfigurable array processor receives an input data stream for pattern matching and generates a tokenized input data stream from the input data stream. A different portion of the tokenized input data stream is provided to each of a plurality of processing elements of the reconfigurable array processor. Each processing element can compare the received portion of the tokenized input data stream against one or more reference patterns to generate an intermediate result that indicates whether the portion of the tokenized input data stream matches a reference pattern. The processor array manager can combine the intermediate results received from each processing element to yield a final result that indicates whether the input data stream includes a reference pattern.

Type: Application

Filed: January 21, 2015

Publication date: March 24, 2016

Inventors: Bulent Abali, Ganesh Balakrishnan, Bartholomew Blaner, Peter A. Sandon, Jeffrey A. Stuecheli
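
The flow in the abstract above (tokenize, distribute portions, match per element, combine) can be sketched in software as follows. This is an illustration under assumptions, not the disclosed hardware: the tokenizer simply lowercases and splits on whitespace, each reference pattern is a single token, and patterns that would span a portion boundary are not handled. All function names are hypothetical.

```python
def tokenize(data):
    """Assumed tokenizer: lowercase the input and split on whitespace."""
    return data.lower().split()


def split_portions(tokens, n):
    """Give each of up to n processing elements a different contiguous portion."""
    size = max(1, -(-len(tokens) // n))  # ceiling division, at least 1
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def processing_element(portion, patterns):
    """Intermediate result: does this portion contain any reference pattern?"""
    return any(token in patterns for token in portion)


def match_stream(data, patterns, n_elements=4):
    """Manager role: tokenize, distribute, then combine intermediate results."""
    tokens = tokenize(data)
    portions = split_portions(tokens, n_elements)
    intermediate = [processing_element(p, patterns) for p in portions]
    return any(intermediate)
```

Here `patterns` is expected to be a set of lowercase tokens, for example `match_stream("the quick brown fox jumps", {"fox"})`.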

RECONFIGURABLE ARRAY PROCESSOR FOR PATTERN MATCHING

Publication number: 20160085720

Abstract: Various implementations of a method, system, and computer program product for pattern matching using a reconfigurable array processor are disclosed. In one embodiment, a processor array manager of the reconfigurable array processor receives an input data stream for pattern matching and generates a tokenized input data stream from the input data stream. A different portion of the tokenized input data stream is provided to each of a plurality of processing elements of the reconfigurable array processor. Each processing element can compare the received portion of the tokenized input data stream against one or more reference patterns to generate an intermediate result that indicates whether the portion of the tokenized input data stream matches a reference pattern. The processor array manager can combine the intermediate results received from each processing element to yield a final result that indicates whether the input data stream includes a reference pattern.

Type: Application

Filed: September 22, 2014

Publication date: March 24, 2016

Inventors: Bulent Abali, Ganesh Balakrishnan, Bartholomew Blaner, Peter A. Sandon, Jeffrey A. Stuecheli

Provision of early data from a lower level cache memory

Patent number: 9286220

Abstract: In response to snooping a read-type memory access request of a requestor on a system fabric of a data processing system, a memory channel interface forwards the request to a memory buffer and starts a timer. In response to the forwarded request, the memory buffer performs a lookup of a target address of the request in a memory controller cache. In response to the target address hitting in a coherence state permitting provision of early data, the memory buffer provides a response indicating early data and provides a copy of a target memory block of the request to the memory channel interface. The memory channel interface, responsive to receipt prior to expiration of the timer of the response indicating early data, transmits the copy of the target memory block to the requestor via the system fabric prior to receiving a combined response of the data processing system to the request.

Type: Grant

Filed: September 25, 2013

Date of Patent: March 15, 2016

Assignee: International Business Machines Corporation

Inventors: John T. Hollaway, Jr., Charles F. Marino, Eric E. Retter, Jeffrey A. Stuecheli

Coherent attached processor proxy supporting coherence state update in presence of dispatched master

Patent number: 9256537

Abstract: A coherent attached processor proxy (CAPP) of a primary coherent system receives a memory access request specifying a target address in the primary coherent system from an attached processor (AP) external to the primary coherent system. The CAPP includes a CAPP directory of contents of a cache memory in the AP that holds copies of memory blocks belonging to a coherent address space of the primary coherent system. In response to the memory access request, the CAPP performs a first determination of a coherence state for the target address and allocates a master machine to service the memory access request in accordance with the first determination. Thereafter, during allocation of the master machine, the CAPP updates the coherence state and performs a second determination of the coherence state. The master machine services the memory access request in accordance with the second determination.

Type: Grant

Filed: February 14, 2013

Date of Patent: February 9, 2016

Assignee: International Business Machines Corporation

Inventors: Bartholomew Blaner, David W. Cummings, Michael S. Siegel, Jeffrey A. Stuecheli

DATA PREFETCH RAMP IMPLEMENATION BASED ON MEMORY UTILIZATION

Publication number: 20160034400

Abstract: A technique for data prefetching for a multi-core chip includes determining memory utilization of the multi-core chip. In response to the memory utilization of the multi-core chip exceeding a first level, data prefetching for the multi-core chip is modified from a first data prefetching arrangement to a second data prefetching arrangement to minimize unused prefetched cache lines. In response to the memory utilization of the multi-core chip not exceeding the first level, the first data prefetching arrangement is maintained. The first and second data prefetching arrangements are different.

Type: Application

Filed: July 29, 2014

Publication date: February 4, 2016

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: JASON NATHANIEL DALE, MILES R. DOOLEY, RICHARD J. EICKEMEYER, JR., JOHN BARRY GRISWELL, JR., FRANCIS PATRICK O'CONNELL, JEFFREY A. STUECHELI
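
The selection rule in the abstract above reduces to a single threshold test on memory utilization. The sketch below is illustrative only. The two arrangements, their `lines_ahead` values, and the 0.75 threshold are invented for the example; the abstract says only that the two arrangements differ and that the second aims to reduce unused prefetched cache lines.

```python
# Hypothetical prefetch arrangements; the field values are assumptions.
FIRST_ARRANGEMENT = {"name": "first", "lines_ahead": 8}
SECOND_ARRANGEMENT = {"name": "second", "lines_ahead": 2}


def select_prefetch_arrangement(memory_utilization, first_level=0.75):
    """Return the arrangement to use for a utilization in the range 0.0 to 1.0.

    The first arrangement is kept unless utilization exceeds first_level.
    """
    if memory_utilization > first_level:
        return SECOND_ARRANGEMENT
    return FIRST_ARRANGEMENT
```

A utilization exactly equal to the threshold does not exceed it, so the first arrangement is maintained in that case.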

Early data tag to allow data CRC bypass via a speculative memory data return protocol

Patent number: 9231618

Abstract: A bypass mechanism allows a memory controller to transmit requested data to an interconnect before the data's error code has been decoded, e.g., a cyclical redundancy check (CRC). The tag, tag CRC, data, and data CRC are pipelined from DRAM in four frames, each having multiple clock cycles. The tag includes a bypass bit indicating whether data transmission to the interconnect should begin before CRC decoding. After receiving the tag CRC, the controller decodes it and reserves a request machine which sends a transmit request signal to inform the interconnect that data is available. Once the transmit request is granted by the interconnect, the controller can immediately start sending the data, before decoding the data CRC. So long as no error is found, the controller completes transmission of the data to the interconnect, including providing an indication that the data as transmitted is error-free.

Type: Grant

Filed: December 6, 2013

Date of Patent: January 5, 2016

Assignee: International Business Machines Corporation

Inventors: Benjiman L. Goodman, Harrison M. McCreary, Stephen J. Powell, William J. Starke, Jeffrey A. Stuecheli

Least-recently-used (LRU) to first-dirty-member distance-maintaining cache cleaning scheduler

Patent number: 9218292

Abstract: A technique for scheduling cache cleaning operations maintains a clean distance between a set of least-recently-used (LRU) clean lines and the LRU dirty (modified) line for each congruence class in the cache. The technique is generally employed at a victim cache at the highest-order level of the cache memory hierarchy, so that write-backs to system memory are scheduled to avoid having to generate a write-back in response to a cache miss in the next lower-order level of the cache memory hierarchy. The clean distance can be determined by counting all of the LRU clean lines in each congruence class that have a reference count that is less than or equal to the reference count of the LRU dirty line.

Type: Grant

Filed: June 18, 2013

Date of Patent: December 22, 2015

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Benjiman L. Goodman, Jody B. Joyner, Stephen J. Powell, Aaron C. Sawdey, Jeffrey A. Stuecheli
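
The clean-distance count in the last sentence of the abstract above can be sketched for a single congruence class. The sketch rests on two assumptions that the abstract does not spell out: a smaller reference count means a line is closer to the LRU position, and the LRU dirty line is the dirty line with the smallest reference count. The `Line` type, the function names, and the target distance are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    ref_count: int  # assumed: smaller means closer to LRU
    dirty: bool     # True if the line is modified


def clean_distance(congruence_class):
    """Count clean lines whose ref_count is <= that of the LRU dirty line.

    Returns None when the class holds no dirty line.
    """
    dirty_lines = [line for line in congruence_class if line.dirty]
    if not dirty_lines:
        return None
    lru_dirty = min(dirty_lines, key=lambda line: line.ref_count)
    return sum(
        1
        for line in congruence_class
        if not line.dirty and line.ref_count <= lru_dirty.ref_count
    )


def needs_cleaning(congruence_class, target_distance):
    """True when the clean distance has fallen below the assumed target."""
    distance = clean_distance(congruence_class)
    return distance is not None and distance < target_distance
```

A scheduler built on this idea would write back the LRU dirty line of any class for which `needs_cleaning` returns True, so that a later miss can find a clean victim without waiting on a write-back.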

TECHNIQUES FOR PRESERVING AN INVALID GLOBAL DOMAIN INDICATION WHEN INSTALLING A SHARED CACHE LINE IN A CACHE

Publication number: 20150363317

Abstract: A technique for operating a memory system for a node includes interrogating, by a cache, an associated cache directory to determine whether a shared cache line to be installed in the cache is associated with an invalid global state in the cache. The invalid global state specifies that a version of the shared cache line has been intervened off-node. In response to the shared cache line being in the invalid global state the cache spawns a castout invalid global command for the shared cache line. The shared cache line is installed in the cache. A coherence state for the shared cache line is updated in the associated cache directory to indicate the shared cache line is shared.

Type: Application

Filed: June 17, 2014

Publication date: December 17, 2015

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: GUY L. GUTHRIE, HIEN MINH LE, JEFFREY A. STUECHELI, PHILLIP G. WILLIAMS

PRESERVING AN INVALID GLOBAL DOMAIN INDICATION WHEN INSTALLING A SHARED CACHE LINE IN A CACHE

Publication number: 20150363316

Abstract: A technique for operating a memory system for a node includes interrogating, by a cache, an associated cache directory to determine whether a shared cache line to be installed in the cache is associated with an invalid global state in the cache. The invalid global state specifies that a version of the shared cache line has been intervened off-node. In response to the shared cache line being in the invalid global state the cache spawns a castout invalid global command for the shared cache line. The shared cache line is installed in the cache. A coherence state for the shared cache line is updated in the associated cache directory to indicate the shared cache line is shared.

Type: Application

Filed: June 10, 2015

Publication date: December 17, 2015

Inventors: GUY L. GUTHRIE, HIEN MINH LE, JEFFREY A. STUECHELI, PHILLIP G. WILLIAMS

Least-recently-used (LRU) to first-dirty-member distance-maintaining cache cleaning scheduler

Patent number: 9213647

Abstract: A technique for scheduling cache cleaning operations maintains a clean distance between a set of least-recently-used (LRU) clean lines and the LRU dirty (modified) line for each congruence class in the cache. The technique is generally employed at a victim cache at the highest-order level of the cache memory hierarchy, so that write-backs to system memory are scheduled to avoid having to generate a write-back in response to a cache miss in the next lower-order level of the cache memory hierarchy. The clean distance can be determined by counting all of the LRU clean lines in each congruence class that have a reference count that is less than or equal to the reference count of the LRU dirty line.

Type: Grant

Filed: September 23, 2013

Date of Patent: December 15, 2015

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Benjiman L. Goodman, Jody B. Joyner, Stephen J. Powell, Aaron C. Sawdey, Jeffrey A. Stuecheli

Coherent attached processor proxy having hybrid directory

Patent number: 9208091

Abstract: A coherent attached processor proxy (CAPP) includes transport logic having a first interface configured to support communication with a system fabric of a primary coherent system and a second interface configured to support communication with an attached processor (AP) that is external to the primary coherent system and that includes a cache memory that holds copies of memory blocks belonging to a coherent address space of the primary coherent system. The CAPP further includes one or more master machines that initiate memory access requests on the system fabric of the primary coherent system on behalf of the AP, one or more snoop machines that service requests snooped on the system fabric, and a CAPP directory having a precise directory having a plurality of entries each associated with a smaller data granule and a coarse directory having a plurality of entries each associated with a larger data granule.

Type: Grant

Filed: June 19, 2013

Date of Patent: December 8, 2015

Assignee: GLOBALFOUNDRIES INC.

Inventors: Bartholomew Blaner, Michael S. Siegel, Jeffrey A. Stuecheli, Charles F. Marino

Coherent attached processor proxy having hybrid directory

Patent number: 9208092

Abstract: A coherent attached processor proxy (CAPP) includes transport logic having a first interface configured to support communication with a system fabric of a primary coherent system and a second interface configured to support communication with an attached processor (AP) that is external to the primary coherent system and that includes a cache memory that holds copies of memory blocks belonging to a coherent address space of the primary coherent system. The CAPP further includes one or more master machines that initiate memory access requests on the system fabric of the primary coherent system on behalf of the AP, one or more snoop machines that service requests snooped on the system fabric, and a CAPP directory having a precise directory having a plurality of entries each associated with a smaller data granule and a coarse directory having a plurality of entries each associated with a larger data granule.

Type: Grant

Filed: September 24, 2013

Date of Patent: December 8, 2015

Assignee: GLOBALFOUNDRIES INC.

Inventors: Bartholomew Blaner, Michael S. Siegel, Jeffrey A. Stuecheli, Charles F. Marino

INTERCOMPONENT DATA COMMUNICATION

Publication number: 20150347334

Abstract: A request to send a first message from a first component to a second component is received at an arbiter. The first component is located in a first time zone and the second component is located in a second time zone. The arbiter determines that the second component is located in the second time zone. It is determined that the second time zone can be communicated with via one or more communications channels in a first direction. It is determined whether bandwidth is available on the one or more communications channels in the first direction. If bandwidth is available on the one or more communications channels in the first direction, a data path between the first component and the one or more communications channels in the first direction is created and the request is granted. Otherwise, the grant of the request is delayed.

Type: Application

Filed: May 30, 2014

Publication date: December 3, 2015

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Robert C. Dixon, Lonny J. Lambrecht, Charles F. Marino, Jeffrey A. Stuecheli
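
The arbiter decision in the abstract above can be sketched as a lookup followed by a bandwidth check. The sketch assumes a fixed map from destination time zone to channel direction and models bandwidth as a count of free channel slots per direction; both are simplifications, and the class and attribute names are hypothetical.

```python
class Arbiter:
    """Grant or delay send requests based on free bandwidth per direction."""

    def __init__(self, zone_direction, free_slots):
        self.zone_direction = dict(zone_direction)  # destination zone -> direction
        self.free_slots = dict(free_slots)          # direction -> free channel slots
        self.delayed = []                           # requests waiting for bandwidth

    def request(self, source, destination_zone):
        """Return True if the request is granted, False if it is delayed."""
        direction = self.zone_direction[destination_zone]
        if self.free_slots.get(direction, 0) > 0:
            # Reserving a slot stands in for creating the data path.
            self.free_slots[direction] -= 1
            return True
        self.delayed.append((source, destination_zone))
        return False
```

With one free slot toward `"zone2"`, the first request is granted and a second request to the same zone is queued in `delayed` until bandwidth is released.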

INTERCOMPONENT DATA COMMUNICATION

Publication number: 20150347343

Abstract: A request to send a first message from a first component to a second component is received at an arbiter. The first component is located in a first time zone and the second component is located in a second time zone. The arbiter determines that the second component is located in the second time zone. It is determined that the second time zone can be communicated with via one or more communications channels in a first direction. It is determined whether bandwidth is available on the one or more communications channels in the first direction. If bandwidth is available on the one or more communications channels in the first direction, a data path between the first component and the one or more communications channels in the first direction is created and the request is granted. Otherwise, the grant of the request is delayed.

Type: Application

Filed: September 12, 2014

Publication date: December 3, 2015

Inventors: Robert C. Dixon, Lonny J. Lambrecht, Charles F. Marino, Jeffrey A. Stuecheli

INTERCOMPONENT DATA COMMUNICATION

Publication number: 20150347333

Abstract: A request to send a message from a first component, located on a first processor, to a second component, located on a second processor, is received. It is determined that the second processor can be communicated with via a first bidirectional communication path. It is determined that bandwidth is available on the first bidirectional communication path. It is determined that bandwidth is available on a second bidirectional communication path. In response to a determination that bandwidth is available on the second bidirectional communication path, a data path is created between the first component and the second bidirectional communication path and the request to send the message to the second component is granted. In response to a determination that bandwidth is not available on the first bidirectional communication path or on the second bidirectional communication path, the grant of the request to send the message to the second component is delayed.

Type: Application

Filed: May 30, 2014

Publication date: December 3, 2015

Applicant: International Business Machines Corporation

Inventors: Robert C. Dixon, Lonny J. Lambrecht, Charles F. Marino, Jeffrey A. Stuecheli

INTERCOMPONENT DATA COMMUNICATION

Publication number: 20150347340

Abstract: A request to send a message from a first component, located on a first processor, to a second component, located on a second processor, is received. It is determined that the second processor can be communicated with via a first bidirectional communication path. It is determined that bandwidth is available on the first bidirectional communication path. It is determined that bandwidth is available on a second bidirectional communication path. In response to a determination that bandwidth is available on the second bidirectional communication path, a data path is created between the first component and the second bidirectional communication path and the request to send the message to the second component is granted. In response to a determination that bandwidth is not available on the first bidirectional communication path or on the second bidirectional communication path, the grant of the request to send the message to the second component is delayed.

Type: Application

Filed: September 12, 2014

Publication date: December 3, 2015

Inventors: Robert C. Dixon, Lonny J. Lambrecht, Charles F. Marino, Jeffrey A. Stuecheli

Selective cache-to-cache lateral castouts

Patent number: 9189403

Abstract: A data processing system includes first and second processing units and a system memory. The first processing unit has first upper and first lower level caches, and the second processing unit has second upper and lower level caches. In response to a data request, a victim cache line to be castout from the first lower level cache is selected, and the first lower level cache selects between performing a lateral castout (LCO) of the victim cache line to the second lower level cache and a castout of the victim cache line to the system memory based upon a confidence indicator associated with the victim cache line. In response to selecting an LCO, the first processing unit issues an LCO command on the interconnect fabric and removes the victim cache line from the first lower level cache, and the second lower level cache holds the victim cache line.

Type: Grant

Filed: December 30, 2009

Date of Patent: November 17, 2015

Assignee: International Business Machines Corporation

Inventors: Guy L. Guthrie, William J. Starke, Jeffrey Stuecheli, Derek E. Williams, Thomas R. Puzak