Patents by Inventor William E. Speight

William E. Speight has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20090064166
    Abstract: A system and method for providing hardware-based dynamic load balancing of message passing interface (MPI) tasks are provided. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize the time spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware-implemented MPI load-balancing controller. The MPI load-balancing controller maintains a history that profiles the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly degrading the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors (see the sketch after this entry).
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
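
    To make the idea concrete, here is a minimal software sketch of the mechanism, assuming a barrier as the synchronization operation; the patent describes a hardware controller, and the identifiers (HISTORY_LEN, work_units) and the 50-unit transfer are invented for illustration:

    ```c
    /* Minimal software sketch of the load-balancing idea above: each
     * rank profiles its waits at synchronization operations, the profile
     * is shared, and work shifts away from the slowest rank. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define HISTORY_LEN 4   /* sync operations profiled per balancing step */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int work_units = 1000;               /* this rank's share of the job */
        double *all_waits = malloc((size_t)size * sizeof(double));

        for (int step = 0; step < 3; step++) {
            /* Profile the waits seen at HISTORY_LEN synchronization points. */
            double mean_wait = 0.0;
            for (int i = 0; i < HISTORY_LEN; i++) {
                /* ... perform work_units units of computation here ... */
                double t0 = MPI_Wtime();
                MPI_Barrier(MPI_COMM_WORLD); /* the synchronization operation */
                mean_wait += (MPI_Wtime() - t0) / HISTORY_LEN;
            }

            /* Share the profile so every rank sees the same picture. */
            MPI_Allgather(&mean_wait, 1, MPI_DOUBLE,
                          all_waits, 1, MPI_DOUBLE, MPI_COMM_WORLD);

            /* The slowest rank arrives at the barrier last, so it shows
             * the SHORTEST wait; the rank waiting longest has headroom. */
            int slowest = 0, fastest = 0;
            for (int r = 1; r < size; r++) {
                if (all_waits[r] < all_waits[slowest]) slowest = r;
                if (all_waits[r] > all_waits[fastest]) fastest = r;
            }
            if (rank == slowest) work_units -= 50;   /* shed some work   */
            if (rank == fastest) work_units += 50;   /* absorb some work */
        }

        printf("rank %d ends with %d work units\n", rank, work_units);
        free(all_waits);
        MPI_Finalize();
        return 0;
    }
    ```
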
  • Publication number: 20090063814
    Abstract: A method, computer program product, and system are provided for routing information through the data processing system. Data that is to be transmitted to a destination processor is received at a source processor within a set of processors; the data includes address information. A first determination is performed as to whether the destination processor is within the same processor book as the source processor, based on the address information. If it is not, a second determination is performed as to whether the destination processor is within the same supernode as the source processor. A routing path is identified for the data based on the results of the first determination, the second determination, and one or more routing table data structures. The data is then transmitted from the source processor along the identified routing path toward the destination processor (see the sketch after this entry).
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
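
    A compact sketch of the two determinations, assuming an address layout with supernode, book, and chip fields (the field names and widths are invented, not the patent's encoding):

    ```c
    /* Sketch of the routing decision above: compare the destination's
     * address fields with the source's to pick a routing tier; a real
     * system would then consult the routing tables for that tier. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint16_t supernode;   /* which supernode */
        uint8_t  book;        /* which processor book within the supernode */
        uint8_t  chip;        /* which processor within the book */
    } NodeAddr;

    typedef enum { ROUTE_IN_BOOK, ROUTE_IN_SUPERNODE, ROUTE_GLOBAL } RouteTier;

    static RouteTier pick_tier(NodeAddr src, NodeAddr dst) {
        /* First determination: same processor book as the source? */
        if (src.supernode == dst.supernode && src.book == dst.book)
            return ROUTE_IN_BOOK;
        /* Second determination (only if the first failed): same supernode? */
        if (src.supernode == dst.supernode)
            return ROUTE_IN_SUPERNODE;
        return ROUTE_GLOBAL;  /* must cross a supernode-to-supernode link */
    }

    int main(void) {
        NodeAddr src = { .supernode = 3, .book = 1, .chip = 0 };
        NodeAddr dst = { .supernode = 3, .book = 5, .chip = 2 };
        static const char *names[] = { "in-book", "in-supernode", "global" };
        printf("route tier: %s\n", names[pick_tier(src, dst)]);
        return 0;
    }
    ```
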
  • Publication number: 20090064168
    Abstract: A system and method are provided for hardware-based dynamic load balancing of message passing interface (MPI) tasks by modifying the tasks. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize the time spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware-implemented MPI load-balancing controller. The MPI load-balancing controller maintains a history that profiles the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly degrading the overall operation of the parallel execution system. Thus, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090064139
    Abstract: A method is provided for implementing a multi-tiered full-graph interconnect architecture. To implement the architecture, a plurality of processors are coupled to one another to create a plurality of processor books. The plurality of processor books are coupled together to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the multi-tiered full-graph interconnect architecture. Data is then transmitted from one processor to another within the architecture based on an addressing scheme that specifies at least a supernode and a processor book associated with the target processor to which the data is to be transmitted (see the sketch after this entry).
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, Edward J. Seminaro, William E. Speight
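
    The addressing scheme can be illustrated by packing the supernode, book, and processor fields into one ID; the bit widths below are assumptions made for the example:

    ```c
    /* Sketch of the supernode/book/processor addressing the abstract
     * describes: a flat ID packs the three fields, and routing logic
     * can unpack them to locate the target. Bit widths are assumed. */
    #include <stdint.h>
    #include <stdio.h>

    #define PROC_BITS 3                 /* assumed: 8 processors per book  */
    #define BOOK_BITS 4                 /* assumed: 16 books per supernode */
    #define PROC_MASK ((1u << PROC_BITS) - 1)
    #define BOOK_MASK ((1u << BOOK_BITS) - 1)

    static uint32_t encode(uint32_t sn, uint32_t book, uint32_t proc) {
        return (sn << (BOOK_BITS + PROC_BITS)) | (book << PROC_BITS) | proc;
    }

    static void decode(uint32_t id, uint32_t *sn, uint32_t *book,
                       uint32_t *proc) {
        *proc = id & PROC_MASK;
        *book = (id >> PROC_BITS) & BOOK_MASK;
        *sn   = id >> (BOOK_BITS + PROC_BITS);
    }

    int main(void) {
        uint32_t id = encode(12, 7, 5);  /* supernode 12, book 7, proc 5 */
        uint32_t sn, book, proc;
        decode(id, &sn, &book, &proc);
        printf("id=0x%x -> supernode=%u book=%u proc=%u\n", id, sn, book, proc);
        return 0;
    }
    ```
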
  • Publication number: 20090064165
    Abstract: A method for providing hardware-based dynamic load balancing of message passing interface (MPI) tasks is provided. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize the time spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware-implemented MPI load-balancing controller. The MPI load-balancing controller maintains a history that profiles the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly degrading the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063816
    Abstract: A method, computer program product, and system are provided for performing collective operations. Software executing on a parent processor in a first processor book determines the number of other processors, in the same or a different processor book of the data processing system, that are needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. Software executing on the parent processor logically arranges the plurality of processors as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. Hardware of the parent processor receives results of the collective operation from the other processors, generates a final result based on the received results, and outputs the final result (see the sketch after this entry).
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
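
    The hierarchical arrangement can be sketched as a classic binary-tree reduction in MPI; the tree shape and the sum operation are illustrative choices, and the patent's book-aware selection of processors is not modeled:

    ```c
    /* Sketch of the tree-structured collective above: the participating
     * ranks are logically arranged as a binary tree, each node combines
     * its own value with its children's partial results, and the root
     * outputs the final result. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        long value = rank + 1;        /* each node's local contribution */
        long child_val;

        /* Children of rank r in the binary tree: 2r+1 and 2r+2. */
        for (int c = 1; c <= 2; c++) {
            int child = 2 * rank + c;
            if (child < size) {
                MPI_Recv(&child_val, 1, MPI_LONG, child, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                value += child_val;   /* combine the partial results */
            }
        }
        if (rank == 0) {
            printf("final result: %ld\n", value);  /* root outputs result */
        } else {
            int parent = (rank - 1) / 2;
            MPI_Send(&value, 1, MPI_LONG, parent, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }
    ```
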
  • Publication number: 20090063885
    Abstract: A system and computer program product for modifying an operation of one or more processors executing message passing interface (MPI) tasks are provided. Mechanisms for adjusting the balance of processing workloads of the processors are provided so as to minimize the time spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware-implemented MPI load-balancing controller. The MPI load-balancing controller maintains a history that profiles the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly degrading the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063815
    Abstract: A method, computer program product, and system are provided for performing collective operations. Hardware of a parent processor in a first processor book determines the number of other processors, in the same or a different processor book of the data processing system, that are needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. Hardware of the parent processor logically arranges the plurality of processors as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. Hardware of the parent processor receives results of the collective operation from the other processors, generates a final result based on the received results, and outputs the final result.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063811
    Abstract: A system is provided for implementing a multi-tiered full-graph interconnect architecture. To implement the architecture, a plurality of processors are coupled to one another to create a plurality of processor books. The plurality of processor books are coupled together to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the multi-tiered full-graph interconnect architecture. Data is then transmitted from one processor to another within the architecture based on an addressing scheme that specifies at least a supernode and a processor book associated with the target processor to which the data is to be transmitted.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, Edward J. Seminaro, William E. Speight
  • Publication number: 20090049286
    Abstract: A processor includes an execution unit and instruction sequencing logic that fetches instructions from a memory system for execution. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a level one branch target address cache (BTAC) and a level two BTAC, each having a respective plurality of entries that each associate at least a tag with a predicted branch target address. The branch logic accesses the level one and level two BTACs in parallel with a tag portion of a first instruction fetch address, obtaining a first predicted branch target address from the level one BTAC for use as a second instruction fetch address in a first processor clock cycle, and a second predicted branch target address from the level two BTAC for use as a third instruction fetch address in a later, second processor clock cycle (see the sketch after this entry).
    Type: Application
    Filed: August 13, 2007
    Publication date: February 19, 2009
    Inventors: David S. Levitan, William E. Speight, Lixin Zhang
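
    A toy model of the parallel two-level lookup, assuming small direct-mapped tables (the sizes and organization are invented for the example):

    ```c
    /* Sketch of the two-level BTAC lookup described above: both levels
     * are probed in parallel with the fetch address; an L1 hit supplies
     * the next fetch address one cycle out, an L2 hit a cycle later. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define L1_ENTRIES 64
    #define L2_ENTRIES 1024

    typedef struct { uint32_t tag; uint32_t target; bool valid; } BtacEntry;

    static BtacEntry l1[L1_ENTRIES], l2[L2_ENTRIES];

    /* Probe one BTAC level; the full address serves as the tag here for
     * simplicity (real designs store only the non-index tag bits). */
    static bool probe(BtacEntry *t, unsigned n, uint32_t addr, uint32_t *tgt) {
        BtacEntry *e = &t[(addr >> 2) % n];        /* word-aligned index */
        if (e->valid && e->tag == addr) { *tgt = e->target; return true; }
        return false;
    }

    int main(void) {
        uint32_t fa = 0x1000;                      /* first fetch address */
        l2[(fa >> 2) % L2_ENTRIES] = (BtacEntry){ fa, 0x2000, true };

        uint32_t t1, t2;
        bool hit1 = probe(l1, L1_ENTRIES, fa, &t1); /* redirect next cycle  */
        bool hit2 = probe(l2, L2_ENTRIES, fa, &t2); /* redirect cycle after */

        if (hit1)      printf("cycle 1: fetch from L1 target 0x%x\n", t1);
        else if (hit2) printf("cycle 2: fetch from L2 target 0x%x\n", t2);
        else           printf("no BTAC hit: fetch falls through\n");
        return 0;
    }
    ```
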
  • Patent number: 7487297
    Abstract: A method and an apparatus are provided for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine that issues data prefetch requests for prefetching a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic for dynamically adjusting the prefetch distance between issuance by the prefetch engine of a data prefetch request and issuance by the processor of a demand (load request) targeting the data/cache line returned by that prefetch request, so that the next data prefetch request for a subsequent cache line completes the return of the data/cache line at effectively the same time that the demand for that subsequent data/cache line is issued by the processor (see the sketch after this entry).
    Type: Grant
    Filed: June 6, 2006
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
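
    A sketch of the distance-adjustment feedback loop; the late/early signals, the bounds, and the step size of one line are assumptions for illustration:

    ```c
    /* Sketch of the prefetch-distance adjustment above: if demand loads
     * keep arriving before the prefetched line returns, the engine
     * issues prefetches further ahead; if lines sit idle long before
     * use, it pulls the distance back in. */
    #include <stdio.h>

    typedef struct {
        int distance;        /* cache lines of lead over the demand stream */
        int min_d, max_d;
    } PrefetchEngine;

    /* late:  the demand load issued before the prefetch completed
     * early: the line returned long before the demand load needed it */
    static void adjust_distance(PrefetchEngine *pe, int late, int early) {
        if (late && pe->distance < pe->max_d)
            pe->distance++;            /* start the next prefetch sooner */
        else if (early && pe->distance > pe->min_d)
            pe->distance--;            /* stop fetching so far ahead */
        /* neither late nor early: the prefetch was just in time */
    }

    int main(void) {
        PrefetchEngine pe = { .distance = 2, .min_d = 1, .max_d = 16 };
        int late_samples[]  = { 1, 1, 0, 0, 1 };   /* toy feedback stream */
        int early_samples[] = { 0, 0, 1, 0, 0 };
        for (int i = 0; i < 5; i++) {
            adjust_distance(&pe, late_samples[i], early_samples[i]);
            printf("sample %d: distance now %d\n", i, pe.distance);
        }
        return 0;
    }
    ```
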
  • Patent number: 7380068
    Abstract: A data processing unit, method, and computer-usable medium are provided for contention-based cache performance optimization. Two or more processing cores are coupled by an interconnect. Also coupled to the interconnect is a memory hierarchy that includes a collection of caches. Utilization of the interconnect is measured over a time interval. Responsive to detecting that interconnect utilization has crossed a threshold, a functional mode of a cache from the collection of caches is selectively enabled (see the sketch after this entry).
    Type: Grant
    Filed: October 27, 2005
    Date of Patent: May 27, 2008
    Assignee: International Business Machines Corporation
    Inventors: Hazim Shafi, William E. Speight
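
    A sketch of the interval-based trigger; the 70% threshold and the particular functional mode being switched are assumptions, since the abstract does not name them:

    ```c
    /* Sketch of the contention trigger above: interconnect utilization
     * is sampled over a fixed interval and, once it crosses a
     * threshold, an optional cache mode is switched on. */
    #include <stdbool.h>
    #include <stdio.h>

    #define UTIL_THRESHOLD 0.70   /* assumed trigger level */

    typedef struct {
        long busy_cycles;     /* cycles the interconnect carried traffic */
        long total_cycles;    /* length of the observation interval */
    } IntervalStats;

    static bool extra_mode_enabled = false;

    static void end_of_interval(IntervalStats *s) {
        double util = (double)s->busy_cycles / (double)s->total_cycles;
        /* Selectively enable the cache's functional mode under contention. */
        extra_mode_enabled = (util >= UTIL_THRESHOLD);
        printf("utilization %.0f%% -> mode %s\n", util * 100.0,
               extra_mode_enabled ? "enabled" : "disabled");
        s->busy_cycles = s->total_cycles = 0;   /* start a new interval */
    }

    int main(void) {
        IntervalStats s = { .busy_cycles = 820, .total_cycles = 1000 };
        end_of_interval(&s);                    /* 82% busy: mode enabled */
        s = (IntervalStats){ .busy_cycles = 310, .total_cycles = 1000 };
        end_of_interval(&s);                    /* 31% busy: mode disabled */
        return 0;
    }
    ```
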
  • Publication number: 20080120496
    Abstract: A processor includes an execution unit and instruction sequencing logic that fetches instructions for execution. The instruction sequencing logic includes a branch target address cache having a branch target buffer containing a plurality of entries, each associating at least a portion of a branch instruction address with a predicted branch target address. The branch target address cache accesses the branch target buffer using a branch instruction address to obtain a predicted branch target address for use as an instruction fetch address. The branch target address cache also includes a filter buffer that buffers one or more candidate branch target address predictions. The filter buffer associates with each candidate branch target address prediction a confidence indication indicative of its predictive accuracy. The branch target address cache promotes candidate branch target address predictions from the filter buffer to the branch target buffer based upon their respective confidence indications (see the sketch after this entry).
    Type: Application
    Filed: November 17, 2006
    Publication date: May 22, 2008
    Inventors: Jeffrey P. Bradford, Richard W. Doing, Richard J. Eickemeyer, Wael R. El-Essawy, Douglas R. Logan, Balaram Sinharoy, William E. Speight, Lixin Zhang
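
    A sketch of confidence-based promotion from the filter buffer; the buffer size, the promotion threshold of 3, and the direct-mapped indexing are invented for the example:

    ```c
    /* Sketch of the filter-buffer idea above: a candidate branch target
     * prediction waits in a small filter buffer and carries a confidence
     * counter; correct predictions raise it, and only on reaching the
     * promotion threshold does the entry move into the main branch
     * target buffer. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FILTER_SLOTS 8
    #define PROMOTE_AT   3      /* assumed confidence needed to promote */

    typedef struct { uint32_t pc, target; int confidence; bool valid; } Cand;

    static Cand filter[FILTER_SLOTS];

    /* Called when the branch at 'pc' resolves to 'actual'; returns true
     * when the candidate earns promotion to the branch target buffer. */
    static bool train(uint32_t pc, uint32_t actual) {
        Cand *c = &filter[pc % FILTER_SLOTS];
        if (!c->valid || c->pc != pc) {              /* new candidate */
            *c = (Cand){ pc, actual, 0, true };
            return false;
        }
        if (c->target == actual) {
            if (++c->confidence >= PROMOTE_AT) {     /* promote to BTB */
                c->valid = false;                    /* free the slot */
                return true;
            }
        } else {
            c->target = actual;                      /* retrain candidate */
            c->confidence = 0;
        }
        return false;
    }

    int main(void) {
        for (int i = 0; i < 5; i++)
            if (train(0x400, 0x800))
                printf("resolution %d: promoted 0x400 -> 0x800\n", i);
        return 0;
    }
    ```
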
  • Publication number: 20080016330
    Abstract: A method and an apparatus are provided for enabling a prefetch engine to detect and support hardware prefetching of different streams in received accesses. Multiple (simple) history tables are provided within (or associated with) the prefetch engine. Each of the multiple tables is utilized to detect a different access pattern. The tables are indexed by different parts of the address and are accessed in a preset order to reduce the interference between different patterns. When an address does not fit the patterns of a first table, the address is passed to the next table to be checked for a match against different patterns. In this manner, different patterns may be detected by different tables within a single prefetch engine (see the sketch after this entry).
    Type: Application
    Filed: July 13, 2006
    Publication date: January 17, 2008
    Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
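
    A sketch of the in-order, multi-table pattern check, assuming a fine (per-page) and a coarse (per-region) table; the sizes, index bits, and two-table depth are illustrative:

    ```c
    /* Sketch of the multi-table stream detection above: a first table,
     * indexed by page, looks for strides within a page; an address that
     * does not match is passed to a second table, indexed by a larger
     * region, that looks for page-crossing strides. */
    #include <stdint.h>
    #include <stdio.h>

    #define TABLE_SLOTS  16
    #define PAGE_SHIFT   12   /* 4 KiB pages (assumed index for table 1)   */
    #define REGION_SHIFT 21   /* 2 MiB regions (assumed index for table 2) */

    typedef struct { uint64_t last_addr; int64_t stride; } Entry;

    static Entry fine[TABLE_SLOTS];    /* strides within a page */
    static Entry coarse[TABLE_SLOTS];  /* strides across pages  */

    /* Returns 1 when this access repeats the stride recorded in 'e'. */
    static int check(Entry *e, uint64_t addr) {
        int64_t stride = (int64_t)(addr - e->last_addr);
        int match = (e->stride != 0 && stride == e->stride);
        e->stride = stride;
        e->last_addr = addr;
        return match;
    }

    static void observe(uint64_t addr) {
        /* Tables are consulted in a preset order; an address that does
         * not fit the first table's pattern is passed to the next. */
        if (check(&fine[(addr >> PAGE_SHIFT) % TABLE_SLOTS], addr))
            printf("0x%llx: fine stream\n", (unsigned long long)addr);
        else if (check(&coarse[(addr >> REGION_SHIFT) % TABLE_SLOTS], addr))
            printf("0x%llx: coarse stream\n", (unsigned long long)addr);
    }

    int main(void) {
        for (int i = 0; i < 8; i++)
            observe(0x10000 + (uint64_t)i * 64);  /* 64-byte-stride stream */
        return 0;
    }
    ```
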
  • Publication number: 20070283101
    Abstract: A method and an apparatus are provided for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine that issues data prefetch requests for prefetching a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic for dynamically adjusting the prefetch distance between issuance by the prefetch engine of a data prefetch request and issuance by the processor of a demand (load request) targeting the data/cache line returned by that prefetch request, so that the next data prefetch request for a subsequent cache line completes the return of the data/cache line at effectively the same time that the demand for that subsequent data/cache line is issued by the processor.
    Type: Application
    Filed: June 6, 2006
    Publication date: December 6, 2007
    Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
  • Patent number: 7281092
    Abstract: A system and method are provided for managing cache hierarchies with adaptive mechanisms. In a preferred embodiment, in response to selecting a data block for eviction from a memory cache (the source cache) out of a collection of memory caches, a data structure is examined to determine whether an entry exists indicating that the data block has previously been evicted from the source cache, or another peer cache, to a slower cache or memory and subsequently retrieved back into the source cache or a peer cache. In response to determining that such an entry exists, a peer memory cache at the same level in the hierarchy is selected to receive the data block from the source cache upon eviction (see the sketch after this entry).
    Type: Grant
    Filed: June 2, 2005
    Date of Patent: October 9, 2007
    Assignee: International Business Machines Corporation
    Inventors: Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
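
    A sketch of the history-guided eviction decision; the table size, tag-only matching, and round-robin peer selection are assumptions for illustration:

    ```c
    /* Sketch of the adaptive eviction policy above: a small history
     * table remembers blocks that were evicted and later re-fetched;
     * when such a block is chosen for eviction again, it is pushed
     * laterally to a peer cache at the same level instead of down the
     * hierarchy. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define HIST_SLOTS 32
    #define NUM_PEERS  4

    static uint64_t evicted_then_refetched[HIST_SLOTS];   /* block tags */
    static int next_peer;

    /* Record that a previously evicted block was fetched again. */
    static void note_refetch(uint64_t tag) {
        evicted_then_refetched[tag % HIST_SLOTS] = tag;
    }

    static bool was_refetched(uint64_t tag) {
        return evicted_then_refetched[tag % HIST_SLOTS] == tag;
    }

    /* Decide where a victim block goes on eviction from the source cache. */
    static void evict(uint64_t tag) {
        if (was_refetched(tag)) {
            /* History says we will want it back: keep it at this level. */
            printf("block 0x%llx -> peer cache %d\n",
                   (unsigned long long)tag, next_peer);
            next_peer = (next_peer + 1) % NUM_PEERS;
        } else {
            printf("block 0x%llx -> lower level\n", (unsigned long long)tag);
        }
    }

    int main(void) {
        evict(0xabc);           /* no history: goes down the hierarchy */
        note_refetch(0xabc);    /* it came back: remember that */
        evict(0xabc);           /* now it moves laterally to a peer cache */
        return 0;
    }
    ```
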