Patents by Inventor Jeffrey A. Stuecheli
Jeffrey A. Stuecheli has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9355035Abstract: A set associative cache is managed by a memory controller which places writeback instructions for modified (dirty) cache lines into a virtual write queue, determines when the number of the sets containing a modified cache line is greater than a high water mark, and elevates a priority of the writeback instructions over read operations. The controller can return the priority to normal when the number of modified sets is less than a low water mark. In an embodiment wherein the system memory device includes rank groups, the congruence classes can be mapped based on the rank groups. The number of writes pending in a rank group exceeding a different threshold can additionally be a requirement to trigger elevation of writeback priority. A dirty vector can be used to provide an indication that corresponding sets contain a modified cache line, particularly in least-recently used segments of the corresponding sets.Type: GrantFiled: November 18, 2013Date of Patent: May 31, 2016Assignee: GLOBALFOUNDRIES Inc.Inventors: Benjiman L. Goodman, Jody B. Joyner, Stephen J. Powell, William J. Starke, Jeffrey A. Stuecheli
-
Publication number: 20160124900Abstract: A method for sorting data in an array processor. Each of a first tier of processing elements in the array processor receives data inputs from a load streaming unit. Each of the first tier processing elements compares input data portions received from the load streaming unit, wherein the input data portions are stored for processing in respective queues. The first tier processing elements select one of the input data portions to be an output data portion based on the comparison, and in response to the selection, remove a corresponding queue entry and request next input data from the load streaming unit. Each of the first tier processing elements further provides the output data portion as an input data portion to a second tier processing element that generates output data based on a comparison of output data received from at least two first tier processing elements.Type: ApplicationFiled: June 3, 2015Publication date: May 5, 2016Inventors: Ganesh Balakrishnan, Bartholomew Blaner, John J. Reilly, Jeffrey A. Stuecheli
-
Publication number: 20160124755Abstract: An array processor includes a managing element having a load streaming unit coupled to multiple processing elements. The load streaming unit provides input data portions to each of a first subset of processing elements and receives output data from each of a second subset of the processing elements based on a comparatively sorted combination of the input data portions. Each processing element is configurable by the managing element to compare input data portions received from the load streaming unit or two or more of the other processing elements. Each processing unit can further select an input data portion to be output data based on the comparison, and in response to selecting the input data portion, remove a queue entry corresponding to the selected input data portion. Each processing element can provide the selected output data portion to the managing element or as an input to one of the processing elements.Type: ApplicationFiled: October 31, 2014Publication date: May 5, 2016Inventors: Ganesh Balakrishnan, Bartholomew Blaner, John J. Reilly, Jeffrey A. Stuecheli
-
Publication number: 20160085721Abstract: Various implementations of a method, system, and computer program product for pattern matching using a reconfigurable array processor are disclosed. In one embodiment, a processor array manager of the reconfigurable array processor receives an input data stream for pattern matching and generates a tokenized input data stream from the input data stream. A different portion of the tokenized input data stream is provided to each of a plurality of processing elements of the reconfigurable array processor. Each processing element can compare the received portion of the tokenized input data stream against one or more reference patterns to generate an intermediate result that indicates whether the portion of the tokenized input data stream matches a reference pattern. The processor array manager can combine the intermediate results received from each processing element to yield a final result that indicates whether the input data stream includes a reference pattern.Type: ApplicationFiled: January 21, 2015Publication date: March 24, 2016Inventors: Bulent Abali, Ganesh Balakrishnan, Bartholomew Blaner, Peter A. Sandon, Jeffrey A. Stuecheli
-
Publication number: 20160085720Abstract: Various implementations of a method, system, and computer program product for pattern matching using a reconfigurable array processor are disclosed. In one embodiment, a processor array manager of the reconfigurable array processor receives an input data stream for pattern matching and generates a tokenized input data stream from the input data stream. A different portion of the tokenized input data stream is provided to each of a plurality of processing elements of the reconfigurable array processor. Each processing element can compare the received portion of the tokenized input data stream against one or more reference patterns to generate an intermediate result that indicates whether the portion of the tokenized input data stream matches a reference pattern. The processor array manager can combine the intermediate results received from each processing element to yield a final result that indicates whether the input data stream includes a reference pattern.Type: ApplicationFiled: September 22, 2014Publication date: March 24, 2016Inventors: Bulent Abali, Ganesh Balakrishnan, Bartholomew Blaner, Peter A. Sandon, Jeffrey A. Stuecheli
-
Patent number: 9286220Abstract: In response to snooping a read-type memory access request of a requestor on a system fabric of a data processing system, a memory channel interface forwards the request to a memory buffer and starts a timer. In response to the forwarded request, the memory buffer performs a lookup of a target address of the request in a memory controller cache. In response to the target address hitting in a coherence state permitting provision of early data, the memory buffer provides a response indicating early data and provides a copy of a target memory block of the request to the memory channel interface. The memory channel interface, responsive to receipt prior to expiration of the timer of the response indicating early data, transmits the copy of the target memory block to the requestor via the system fabric prior to receiving a combined response of the data processing system to the request.Type: GrantFiled: September 25, 2013Date of Patent: March 15, 2016Assignee: International Business Machines CorporationInventors: John T. Hollaway, Jr., Charles F. Marino, Eric E. Retter, Jeffrey A. Stuecheli
-
Coherent attached processor proxy supporting coherence state update in presence of dispatched master
Patent number: 9256537Abstract: A coherent attached processor proxy (CAPP) of a primary coherent system receives a memory access request specifying a target address in the primary coherent system from an attached processor (AP) external to the primary coherent system. The CAPP includes a CAPP directory of contents of a cache memory in the AP that holds copies of memory blocks belonging to a coherent address space of the primary coherent system. In response to the memory access request, the CAPP performs a first determination of a coherence state for the target address and allocates a master machine to service the memory access request in accordance with the first determination. Thereafter, during allocation of the master machine, the CAPP updates the coherence state and performs a second determination of the coherence state. The master machine services the memory access request in accordance with the second determination.Type: GrantFiled: February 14, 2013Date of Patent: February 9, 2016Assignee: International Business Machines CorporationInventors: Bartholomew Blaner, David W. Cummings, Michael S. Siegel, Jeffrey A. Stuecheli -
Publication number: 20160034400Abstract: A technique for data prefetching for a multi-core chip includes determining memory utilization of the multi-core chip. In response to the memory utilization of the multi-core chip exceeding a first level, data prefetching for the multi-core chip is modified from a first data prefetching arrangement to a second data prefetching arrangement to minimize unused prefetched cache lines. In response to the memory utilization of the multi-core chip not exceeding the first level, the first data prefetching arrangement is maintained. The first and second data prefetching arrangements are different.Type: ApplicationFiled: July 29, 2014Publication date: February 4, 2016Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: JASON NATHANIEL DALE, MILES R. DOOLEY, RICHARD J. EICKEMEYER, JR., JOHN BARRY GRISWELL, JR., FRANCIS PATRICK O'CONNELL, JEFFREY A. STUECHELI
-
Patent number: 9231618Abstract: A bypass mechanism allows a memory controller to transmit requested data to an interconnect before the data's error code has been decoded, e.g., a cyclical redundancy check (CRC). The tag, tag CRC, data, and data CRC are pipelined from DRAM in four frames, each having multiple clock cycles. The tag includes a bypass bit indicating whether data transmission to the interconnect should begin before CRC decoding. After receiving the tag CRC, the controller decodes it and reserves a request machine which sends a transmit request signal to inform the interconnect that data is available. Once the transmit request is granted by the interconnect, the controller can immediately start sending the data, before decoding the data CRC. So long as no error is found, the controller completes transmission of the data to the interconnect, including providing an indication that the data as transmitted is error-free.Type: GrantFiled: December 6, 2013Date of Patent: January 5, 2016Assignee: International Business Machines CorporationInventors: Benjiman L. Goodman, Harrison M. McCreary, Stephen J. Powell, William J. Starke, Jeffrey A. Stuecheli
-
Patent number: 9218292Abstract: A technique for scheduling cache cleaning operations maintains a clean distance between a set of least-recently-used (LRU) clean lines and the LRU dirty (modified) line for each congruence class in the cache. The technique is generally employed at a victim cache at the highest-order level of the cache memory hierarchy, so that write-backs to system memory are scheduled to avoid having to generate a write-back in response to a cache miss in the next lower-order level of the cache memory hierarchy. The clean distance can be determined by counting all of the LRU clean lines in each congruence class that have a reference count that is less than or equal to the reference count of the LRU dirty line.Type: GrantFiled: June 18, 2013Date of Patent: December 22, 2015Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Benjiman L. Goodman, Jody B. Joyner, Stephen J. Powell, Aaron C. Sawdey, Jeffrey A. Stuecheli
-
Publication number: 20150363317Abstract: A technique for operating a memory system for a node includes interrogating, by a cache, an associated cache directory to determine whether a shared cache line to be installed in the cache is associated with an invalid global state in the cache. The invalid global state specifies that a version of the shared cache line has been intervened off-node. In response to the shared cache line being in the invalid global state the cache spawns a castout invalid global command for the shared cache line. The shared cache line is installed in the cache. A coherence state for the shared cache line is updated in the associated cache directory to indicate the shared cache line is shared.Type: ApplicationFiled: June 17, 2014Publication date: December 17, 2015Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: GUY L. GUTHRIE, HIEN MINH LE, JEFFREY A. STUECHELI, PHILLIP G. WILLIAMS
-
Publication number: 20150363316Abstract: A technique for operating a memory system for a node includes interrogating, by a cache, an associated cache directory to determine whether a shared cache line to be installed in the cache is associated with an invalid global state in the cache. The invalid global state specifies that a version of the shared cache line has been intervened off-node. In response to the shared cache line being in the invalid global state the cache spawns a castout invalid global command for the shared cache line. The shared cache line is installed in the cache. A coherence state for the shared cache line is updated in the associated cache directory to indicate the shared cache line is shared.Type: ApplicationFiled: June 10, 2015Publication date: December 17, 2015Inventors: GUY L. GUTHRIE, HIEN MINH LE, JEFFREY A. STUECHELI, PHILLIP G. WILLIAMS
-
Patent number: 9213647Abstract: A technique for scheduling cache cleaning operations maintains a clean distance between a set of least-recently-used (LRU) clean lines and the LRU dirty (modified) line for each congruence class in the cache. The technique is generally employed at a victim cache at the highest-order level of the cache memory hierarchy, so that write-backs to system memory are scheduled to avoid having to generate a write-back in response to a cache miss in the next lower-order level of the cache memory hierarchy. The clean distance can be determined by counting all of the LRU clean lines in each congruence class that have a reference count that is less than or equal to the reference count of the LRU dirty line.Type: GrantFiled: September 23, 2013Date of Patent: December 15, 2015Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Benjiman L. Goodman, Jody B. Joyner, Stephen J. Powell, Aaron C. Sawdey, Jeffrey A. Stuecheli
-
Patent number: 9208091Abstract: A coherent attached processor proxy (CAPP) includes transport logic having a first interface configured to support communication with a system fabric of a primary coherent system and a second interface configured to support communication with an attached processor (AP) that is external to the primary coherent system and that includes a cache memory that holds copies of memory blocks belonging to a coherent address space of the primary coherent system. The CAPP further includes one or more master machines that initiate memory access requests on the system fabric of the primary coherent system on behalf of the AP, one or more snoop machines that service requests snooped on the system fabric, and a CAPP directory having a precise directory having a plurality of entries each associated with a smaller data granule and a coarse directory having a plurality of entries each associated with a larger data granule.Type: GrantFiled: June 19, 2013Date of Patent: December 8, 2015Assignee: GLOBALFOUNDRIES INC.Inventors: Bartholomew Blaner, Michael S. Siegel, Jeffrey A. Stuecheli, Charles F. Marino
-
Patent number: 9208092Abstract: A coherent attached processor proxy (CAPP) includes transport logic having a first interface configured to support communication with a system fabric of a primary coherent system and a second interface configured to support communication with an attached processor (AP) that is external to the primary coherent system and that includes a cache memory that holds copies of memory blocks belonging to a coherent address space of the primary coherent system. The CAPP further includes one or more master machines that initiate memory access requests on the system fabric of the primary coherent system on behalf of the AP, one or more snoop machines that service requests snooped on the system fabric, and a CAPP directory having a precise directory having a plurality of entries each associated with a smaller data granule and a coarse directory having a plurality of entries each associated with a larger data granule.Type: GrantFiled: September 24, 2013Date of Patent: December 8, 2015Assignee: GLOBALFOUNDRIES INC.Inventors: Bartholomew Blaner, Michael S. Siegel, Jeffrey A. Stuecheli, Charles F. Marino
-
Publication number: 20150347334Abstract: A request to send a first message from a first component to a second component is received at an arbiter. The first component is located in a first time zone and the second component is located in a second time zone. The arbiter determines that the second component is located in the second time zone. It is determined that the second time zone can be communicated with via one or more communications channels in a first direction. It is determined whether bandwidth is available on the one or more communications channels in the first direction. If bandwidth is available on the one or more communications channels in the first direction, a data path between the first component and the one or more communications channels in the first direction is created and the request is granted. Otherwise, the grant of the request is delayed.Type: ApplicationFiled: May 30, 2014Publication date: December 3, 2015Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Robert C. Dixon, Lonny J. Lambrecht, Charles F. Marino, Jeffrey A. Stuecheli
-
Publication number: 20150347343Abstract: A request to send a first message from a first component to a second component is received at an arbiter. The first component is located in a first time zone and the second component is located in a second time zone. The arbiter determines that the second component is located in the second time zone. It is determined that the second time zone can be communicated with via one or more communications channels in a first direction. It is determined whether bandwidth is available on the one or more communications channels in the first direction. If bandwidth is available on the one or more communications channels in the first direction, a data path between the first component and the one or more communications channels in the first direction is created and the request is granted. Otherwise, the grant of the request is delayed.Type: ApplicationFiled: September 12, 2014Publication date: December 3, 2015Inventors: Robert C. Dixon, Lonny J. Lambrecht, Charles F. Marino, Jeffrey A. Stuecheli
-
Publication number: 20150347333Abstract: A request to send a message from a first component, located on a first processor, to a second component, located on a second processor, is received. It is determined that the second processor can be communicated with via a first bidirectional communication path. It is determined that bandwidth is available on the first bidirectional communication path. It is determined that bandwidth is available on a second bidirectional communication path. In response to a determination that bandwidth is available on the second bidirectional communication path, a data path is created between the first component and the second bidirectional communication path and the request to send the message to the second component is granted. In response to a determination that bandwidth is not available on the first bidirectional communication path or on the second bidirectional communication path, the grant of the request to send the message to the second component is delayed.Type: ApplicationFiled: May 30, 2014Publication date: December 3, 2015Applicant: International Business Machines CorporationInventors: Robert C. Dixon, Lonny J. Lambrecht, Charles F. Marino, Jeffrey A. Stuecheli
-
Publication number: 20150347340Abstract: A request to send a message from a first component, located on a first processor, to a second component, located on a second processor, is received. It is determined that the second processor can be communicated with via a first bidirectional communication path. It is determined that bandwidth is available on the first bidirectional communication path. It is determined that bandwidth is available on a second bidirectional communication path. In response to a determination that bandwidth is available on the second bidirectional communication path, a data path is created between the first component and the second bidirectional communication path and the request to send the message to the second component is granted. In response to a determination that bandwidth is not available on the first bidirectional communication path or on the second bidirectional communication path, the grant of the request to send the message to the second component is delayed.Type: ApplicationFiled: September 12, 2014Publication date: December 3, 2015Inventors: Robert C. Dixon, Lonny J. Lambrecht, Charles F. Marino, Jeffrey A. Stuecheli
-
Patent number: 9189403Abstract: A data processing system includes first and second processing units and a system memory. The first processing unit has first upper and first lower level caches, and the second processing unit has second upper and lower level caches. In response to a data request, a victim cache line to be castout from the first lower level cache is selected, and the first lower level cache selects between performing a lateral castout (LCO) of the victim cache line to the second lower level cache and a castout of the victim cache line to the system memory based upon a confidence indicator associated with the victim cache line. In response to selecting an LCO, the first processing unit issues an LCO command on the interconnect fabric and removes the victim cache line from the first lower level cache, and the second lower level cache holds the victim cache line.Type: GrantFiled: December 30, 2009Date of Patent: November 17, 2015Assignee: International Business Machines CorporationInventors: Guy L. Guthrie, William J. Starke, Jeffrey Stuecheli, Derek E. Williams, Thomas R. Puzak