Patents by Inventor William E. Speight

William E. Speight has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20090064166
    Abstract: A system and method for providing hardware-based dynamic load balancing of message passing interface (MPI) tasks are provided. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize the time spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware-implemented MPI load-balancing controller. The MPI load-balancing controller maintains a history that profiles the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly degrading the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors (see the sketch after this entry).
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
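
    To make the idea concrete, here is a minimal software sketch of the mechanism, assuming a barrier as the synchronization operation; the patent describes a hardware controller, and the identifiers (HISTORY_LEN, work_units) and the 50-unit transfer are invented for illustration:

    ```c
    /* Minimal software sketch of the load-balancing idea above: each
     * rank profiles its waits at synchronization operations, the profile
     * is shared, and work shifts away from the slowest rank. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define HISTORY_LEN 4   /* sync operations profiled per balancing step */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int work_units = 1000;               /* this rank's share of the job */
        double *all_waits = malloc((size_t)size * sizeof(double));

        for (int step = 0; step < 3; step++) {
            /* Profile the waits seen at HISTORY_LEN synchronization points. */
            double mean_wait = 0.0;
            for (int i = 0; i < HISTORY_LEN; i++) {
                /* ... perform work_units units of computation here ... */
                double t0 = MPI_Wtime();
                MPI_Barrier(MPI_COMM_WORLD); /* the synchronization operation */
                mean_wait += (MPI_Wtime() - t0) / HISTORY_LEN;
            }

            /* Share the profile so every rank sees the same picture. */
            MPI_Allgather(&mean_wait, 1, MPI_DOUBLE,
                          all_waits, 1, MPI_DOUBLE, MPI_COMM_WORLD);

            /* The slowest rank arrives at the barrier last, so it shows
             * the SHORTEST wait; the rank waiting longest has headroom. */
            int slowest = 0, fastest = 0;
            for (int r = 1; r < size; r++) {
                if (all_waits[r] < all_waits[slowest]) slowest = r;
                if (all_waits[r] > all_waits[fastest]) fastest = r;
            }
            if (rank == slowest) work_units -= 50;   /* shed some work   */
            if (rank == fastest) work_units += 50;   /* absorb some work */
        }

        printf("rank %d ends with %d work units\n", rank, work_units);
        free(all_waits);
        MPI_Finalize();
        return 0;
    }
    ```
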
  • Publication number: 20090063814
    Abstract: A method, computer program product, and system are provided for routing information through the data processing system. Data that is to be transmitted to a destination processor is received at a source processor within a set of processors; the data includes address information. A first determination is performed as to whether the destination processor is within the same processor book as the source processor, based on the address information. If it is not, a second determination is performed as to whether the destination processor is within the same supernode as the source processor. A routing path is identified for the data based on the results of the first determination, the second determination, and one or more routing table data structures. The data is then transmitted from the source processor along the identified routing path toward the destination processor (see the sketch after this entry).
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
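
    A compact sketch of the two determinations, assuming an address layout with supernode, book, and chip fields (the field names and widths are invented, not the patent's encoding):

    ```c
    /* Sketch of the routing decision above: compare the destination's
     * address fields with the source's to pick a routing tier; a real
     * system would then consult the routing tables for that tier. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint16_t supernode;   /* which supernode */
        uint8_t  book;        /* which processor book within the supernode */
        uint8_t  chip;        /* which processor within the book */
    } NodeAddr;

    typedef enum { ROUTE_IN_BOOK, ROUTE_IN_SUPERNODE, ROUTE_GLOBAL } RouteTier;

    static RouteTier pick_tier(NodeAddr src, NodeAddr dst) {
        /* First determination: same processor book as the source? */
        if (src.supernode == dst.supernode && src.book == dst.book)
            return ROUTE_IN_BOOK;
        /* Second determination (only if the first failed): same supernode? */
        if (src.supernode == dst.supernode)
            return ROUTE_IN_SUPERNODE;
        return ROUTE_GLOBAL;  /* must cross a supernode-to-supernode link */
    }

    int main(void) {
        NodeAddr src = { .supernode = 3, .book = 1, .chip = 0 };
        NodeAddr dst = { .supernode = 3, .book = 5, .chip = 2 };
        static const char *names[] = { "in-book", "in-supernode", "global" };
        printf("route tier: %s\n", names[pick_tier(src, dst)]);
        return 0;
    }
    ```
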
  • Publication number: 20090064168
    Abstract: A system and method are provided for hardware-based dynamic load balancing of message passing interface (MPI) tasks by modifying the tasks. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize the time spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware-implemented MPI load-balancing controller. The MPI load-balancing controller maintains a history that profiles the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly degrading the overall operation of the parallel execution system. Thus, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090064139
    Abstract: A method is provided for implementing a multi-tiered full-graph interconnect architecture. To implement the architecture, a plurality of processors are coupled to one another to create a plurality of processor books. The plurality of processor books are coupled together to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the multi-tiered full-graph interconnect architecture. Data is then transmitted from one processor to another within the architecture based on an addressing scheme that specifies at least a supernode and a processor book associated with the target processor to which the data is to be transmitted (see the sketch after this entry).
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, Edward J. Seminaro, William E. Speight
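
    The addressing scheme can be illustrated by packing the supernode, book, and processor fields into one ID; the bit widths below are assumptions made for the example:

    ```c
    /* Sketch of the supernode/book/processor addressing the abstract
     * describes: a flat ID packs the three fields, and routing logic
     * can unpack them to locate the target. Bit widths are assumed. */
    #include <stdint.h>
    #include <stdio.h>

    #define PROC_BITS 3                 /* assumed: 8 processors per book  */
    #define BOOK_BITS 4                 /* assumed: 16 books per supernode */
    #define PROC_MASK ((1u << PROC_BITS) - 1)
    #define BOOK_MASK ((1u << BOOK_BITS) - 1)

    static uint32_t encode(uint32_t sn, uint32_t book, uint32_t proc) {
        return (sn << (BOOK_BITS + PROC_BITS)) | (book << PROC_BITS) | proc;
    }

    static void decode(uint32_t id, uint32_t *sn, uint32_t *book,
                       uint32_t *proc) {
        *proc = id & PROC_MASK;
        *book = (id >> PROC_BITS) & BOOK_MASK;
        *sn   = id >> (BOOK_BITS + PROC_BITS);
    }

    int main(void) {
        uint32_t id = encode(12, 7, 5);  /* supernode 12, book 7, proc 5 */
        uint32_t sn, book, proc;
        decode(id, &sn, &book, &proc);
        printf("id=0x%x -> supernode=%u book=%u proc=%u\n", id, sn, book, proc);
        return 0;
    }
    ```
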
  • Publication number: 20090064165
    Abstract: A method for providing hardware-based dynamic load balancing of message passing interface (MPI) tasks is provided. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize the time spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware-implemented MPI load-balancing controller. The MPI load-balancing controller maintains a history that profiles the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly degrading the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063816
    Abstract: A method, computer program product, and system are provided for performing collective operations. Software executing on a parent processor in a first processor book determines the number of other processors, in the same or a different processor book of the data processing system, that are needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. Software executing on the parent processor logically arranges the plurality of processors as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. Hardware of the parent processor receives results of the collective operation from the other processors, generates a final result based on the received results, and outputs the final result (see the sketch after this entry).
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
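
    The hierarchical arrangement can be sketched as a classic binary-tree reduction in MPI; the tree shape and the sum operation are illustrative choices, and the patent's book-aware selection of processors is not modeled:

    ```c
    /* Sketch of the tree-structured collective above: the participating
     * ranks are logically arranged as a binary tree, each node combines
     * its own value with its children's partial results, and the root
     * outputs the final result. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        long value = rank + 1;        /* each node's local contribution */
        long child_val;

        /* Children of rank r in the binary tree: 2r+1 and 2r+2. */
        for (int c = 1; c <= 2; c++) {
            int child = 2 * rank + c;
            if (child < size) {
                MPI_Recv(&child_val, 1, MPI_LONG, child, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                value += child_val;   /* combine the partial results */
            }
        }
        if (rank == 0) {
            printf("final result: %ld\n", value);  /* root outputs result */
        } else {
            int parent = (rank - 1) / 2;
            MPI_Send(&value, 1, MPI_LONG, parent, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }
    ```
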
  • Publication number: 20090063885
    Abstract: A system and computer program product for modifying an operation of one or more processors executing message passing interface (MPI) tasks are provided. Mechanisms for adjusting the balance of processing workloads of the processors are provided so as to minimize the time spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware-implemented MPI load-balancing controller. The MPI load-balancing controller maintains a history that profiles the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly degrading the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063815
    Abstract: A method, computer program product, and system are provided for performing collective operations. Hardware of a parent processor in a first processor book determines the number of other processors, in the same or a different processor book of the data processing system, that are needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. Hardware of the parent processor logically arranges the plurality of processors as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. Hardware of the parent processor receives results of the collective operation from the other processors, generates a final result based on the received results, and outputs the final result.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063811
    Abstract: A system is provided for implementing a multi-tiered full-graph interconnect architecture. To implement the architecture, a plurality of processors are coupled to one another to create a plurality of processor books. The plurality of processor books are coupled together to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the multi-tiered full-graph interconnect architecture. Data is then transmitted from one processor to another within the architecture based on an addressing scheme that specifies at least a supernode and a processor book associated with the target processor to which the data is to be transmitted.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, Edward J. Seminaro, William E. Speight
  • Publication number: 20090049286
    Abstract: A processor includes an execution unit and instruction sequencing logic that fetches instructions from a memory system for execution. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a level one branch target address cache (BTAC) and a level two BTAC, each having a respective plurality of entries that each associate at least a tag with a predicted branch target address. The branch logic accesses the level one and level two BTACs in parallel with a tag portion of a first instruction fetch address, obtaining a first predicted branch target address from the level one BTAC for use as a second instruction fetch address in a first processor clock cycle, and a second predicted branch target address from the level two BTAC for use as a third instruction fetch address in a later, second processor clock cycle (see the sketch after this entry).
    Type: Application
    Filed: August 13, 2007
    Publication date: February 19, 2009
    Inventors: David S. Levitan, William E. Speight, Lixin Zhang
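
    A toy model of the parallel two-level lookup, assuming small direct-mapped tables (the sizes and organization are invented for the example):

    ```c
    /* Sketch of the two-level BTAC lookup described above: both levels
     * are probed in parallel with the fetch address; an L1 hit supplies
     * the next fetch address one cycle out, an L2 hit a cycle later. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define L1_ENTRIES 64
    #define L2_ENTRIES 1024

    typedef struct { uint32_t tag; uint32_t target; bool valid; } BtacEntry;

    static BtacEntry l1[L1_ENTRIES], l2[L2_ENTRIES];

    /* Probe one BTAC level; the full address serves as the tag here for
     * simplicity (real designs store only the non-index tag bits). */
    static bool probe(BtacEntry *t, unsigned n, uint32_t addr, uint32_t *tgt) {
        BtacEntry *e = &t[(addr >> 2) % n];        /* word-aligned index */
        if (e->valid && e->tag == addr) { *tgt = e->target; return true; }
        return false;
    }

    int main(void) {
        uint32_t fa = 0x1000;                      /* first fetch address */
        l2[(fa >> 2) % L2_ENTRIES] = (BtacEntry){ fa, 0x2000, true };

        uint32_t t1, t2;
        bool hit1 = probe(l1, L1_ENTRIES, fa, &t1); /* redirect next cycle  */
        bool hit2 = probe(l2, L2_ENTRIES, fa, &t2); /* redirect cycle after */

        if (hit1)      printf("cycle 1: fetch from L1 target 0x%x\n", t1);
        else if (hit2) printf("cycle 2: fetch from L2 target 0x%x\n", t2);
        else           printf("no BTAC hit: fetch falls through\n");
        return 0;
    }
    ```
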
  • Patent number: 7487297
    Abstract: A method and an apparatus are provided for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine that issues data prefetch requests for prefetching a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic for dynamically adjusting the prefetch distance between issuance by the prefetch engine of a data prefetch request and issuance by the processor of a demand (load request) targeting the data/cache line returned by that prefetch request, so that the next data prefetch request for a subsequent cache line completes the return of the data/cache line at effectively the same time that the demand for that subsequent data/cache line is issued by the processor (see the sketch after this entry).
    Type: Grant
    Filed: June 6, 2006
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
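
    A sketch of the distance-adjustment feedback loop; the late/early signals, the bounds, and the step size of one line are assumptions for illustration:

    ```c
    /* Sketch of the prefetch-distance adjustment above: if demand loads
     * keep arriving before the prefetched line returns, the engine
     * issues prefetches further ahead; if lines sit idle long before
     * use, it pulls the distance back in. */
    #include <stdio.h>

    typedef struct {
        int distance;        /* cache lines of lead over the demand stream */
        int min_d, max_d;
    } PrefetchEngine;

    /* late:  the demand load issued before the prefetch completed
     * early: the line returned long before the demand load needed it */
    static void adjust_distance(PrefetchEngine *pe, int late, int early) {
        if (late && pe->distance < pe->max_d)
            pe->distance++;            /* start the next prefetch sooner */
        else if (early && pe->distance > pe->min_d)
            pe->distance--;            /* stop fetching so far ahead */
        /* neither late nor early: the prefetch was just in time */
    }

    int main(void) {
        PrefetchEngine pe = { .distance = 2, .min_d = 1, .max_d = 16 };
        int late_samples[]  = { 1, 1, 0, 0, 1 };   /* toy feedback stream */
        int early_samples[] = { 0, 0, 1, 0, 0 };
        for (int i = 0; i < 5; i++) {
            adjust_distance(&pe, late_samples[i], early_samples[i]);
            printf("sample %d: distance now %d\n", i, pe.distance);
        }
        return 0;
    }
    ```
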
  • Patent number: 7380068
    Abstract: A data processing unit, method, and computer-usable medium are provided for contention-based cache performance optimization. Two or more processing cores are coupled by an interconnect. Also coupled to the interconnect is a memory hierarchy that includes a collection of caches. Utilization of the interconnect is measured over a time interval. Responsive to detecting that interconnect utilization has crossed a threshold, a functional mode of a cache from the collection of caches is selectively enabled (see the sketch after this entry).
    Type: Grant
    Filed: October 27, 2005
    Date of Patent: May 27, 2008
    Assignee: International Business Machines Corporation
    Inventors: Hazim Shafi, William E. Speight
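
    A sketch of the interval-based trigger; the 70% threshold and the particular functional mode being switched are assumptions, since the abstract does not name them:

    ```c
    /* Sketch of the contention trigger above: interconnect utilization
     * is sampled over a fixed interval and, once it crosses a
     * threshold, an optional cache mode is switched on. */
    #include <stdbool.h>
    #include <stdio.h>

    #define UTIL_THRESHOLD 0.70   /* assumed trigger level */

    typedef struct {
        long busy_cycles;     /* cycles the interconnect carried traffic */
        long total_cycles;    /* length of the observation interval */
    } IntervalStats;

    static bool extra_mode_enabled = false;

    static void end_of_interval(IntervalStats *s) {
        double util = (double)s->busy_cycles / (double)s->total_cycles;
        /* Selectively enable the cache's functional mode under contention. */
        extra_mode_enabled = (util >= UTIL_THRESHOLD);
        printf("utilization %.0f%% -> mode %s\n", util * 100.0,
               extra_mode_enabled ? "enabled" : "disabled");
        s->busy_cycles = s->total_cycles = 0;   /* start a new interval */
    }

    int main(void) {
        IntervalStats s = { .busy_cycles = 820, .total_cycles = 1000 };
        end_of_interval(&s);                    /* 82% busy: mode enabled */
        s = (IntervalStats){ .busy_cycles = 310, .total_cycles = 1000 };
        end_of_interval(&s);                    /* 31% busy: mode disabled */
        return 0;
    }
    ```
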
  • Publication number: 20080120496
    Abstract: A processor includes an execution unit and instruction sequencing logic that fetches instructions for execution. The instruction sequencing logic includes a branch target address cache having a branch target buffer containing a plurality of entries, each associating at least a portion of a branch instruction address with a predicted branch target address. The branch target address cache accesses the branch target buffer using a branch instruction address to obtain a predicted branch target address for use as an instruction fetch address. The branch target address cache also includes a filter buffer that buffers one or more candidate branch target address predictions. The filter buffer associates with each candidate branch target address prediction a confidence indication indicative of its predictive accuracy. The branch target address cache promotes candidate branch target address predictions from the filter buffer to the branch target buffer based upon their respective confidence indications (see the sketch after this entry).
    Type: Application
    Filed: November 17, 2006
    Publication date: May 22, 2008
    Inventors: Jeffrey P. Bradford, Richard W. Doing, Richard J. Eickemeyer, Wael R. El-Essawy, Douglas R. Logan, Balaram Sinharoy, William E. Speight, Lixin Zhang
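
    A sketch of confidence-based promotion from the filter buffer; the buffer size, the promotion threshold of 3, and the direct-mapped indexing are invented for the example:

    ```c
    /* Sketch of the filter-buffer idea above: a candidate branch target
     * prediction waits in a small filter buffer and carries a confidence
     * counter; correct predictions raise it, and only on reaching the
     * promotion threshold does the entry move into the main branch
     * target buffer. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FILTER_SLOTS 8
    #define PROMOTE_AT   3      /* assumed confidence needed to promote */

    typedef struct { uint32_t pc, target; int confidence; bool valid; } Cand;

    static Cand filter[FILTER_SLOTS];

    /* Called when the branch at 'pc' resolves to 'actual'; returns true
     * when the candidate earns promotion to the branch target buffer. */
    static bool train(uint32_t pc, uint32_t actual) {
        Cand *c = &filter[pc % FILTER_SLOTS];
        if (!c->valid || c->pc != pc) {              /* new candidate */
            *c = (Cand){ pc, actual, 0, true };
            return false;
        }
        if (c->target == actual) {
            if (++c->confidence >= PROMOTE_AT) {     /* promote to BTB */
                c->valid = false;                    /* free the slot */
                return true;
            }
        } else {
            c->target = actual;                      /* retrain candidate */
            c->confidence = 0;
        }
        return false;
    }

    int main(void) {
        for (int i = 0; i < 5; i++)
            if (train(0x400, 0x800))
                printf("resolution %d: promoted 0x400 -> 0x800\n", i);
        return 0;
    }
    ```
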
  • Publication number: 20080016330
    Abstract: A method and an apparatus are provided for enabling a prefetch engine to detect and support hardware prefetching of different streams in received accesses. Multiple (simple) history tables are provided within (or associated with) the prefetch engine. Each of the multiple tables is utilized to detect a different access pattern. The tables are indexed by different parts of the address and are accessed in a preset order to reduce the interference between different patterns. When an address does not fit the patterns of a first table, the address is passed to the next table to be checked for a match against different patterns. In this manner, different patterns may be detected by different tables within a single prefetch engine (see the sketch after this entry).
    Type: Application
    Filed: July 13, 2006
    Publication date: January 17, 2008
    Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
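
    A sketch of the in-order, multi-table pattern check, assuming a fine (per-page) and a coarse (per-region) table; the sizes, index bits, and two-table depth are illustrative:

    ```c
    /* Sketch of the multi-table stream detection above: a first table,
     * indexed by page, looks for strides within a page; an address that
     * does not match is passed to a second table, indexed by a larger
     * region, that looks for page-crossing strides. */
    #include <stdint.h>
    #include <stdio.h>

    #define TABLE_SLOTS  16
    #define PAGE_SHIFT   12   /* 4 KiB pages (assumed index for table 1)   */
    #define REGION_SHIFT 21   /* 2 MiB regions (assumed index for table 2) */

    typedef struct { uint64_t last_addr; int64_t stride; } Entry;

    static Entry fine[TABLE_SLOTS];    /* strides within a page */
    static Entry coarse[TABLE_SLOTS];  /* strides across pages  */

    /* Returns 1 when this access repeats the stride recorded in 'e'. */
    static int check(Entry *e, uint64_t addr) {
        int64_t stride = (int64_t)(addr - e->last_addr);
        int match = (e->stride != 0 && stride == e->stride);
        e->stride = stride;
        e->last_addr = addr;
        return match;
    }

    static void observe(uint64_t addr) {
        /* Tables are consulted in a preset order; an address that does
         * not fit the first table's pattern is passed to the next. */
        if (check(&fine[(addr >> PAGE_SHIFT) % TABLE_SLOTS], addr))
            printf("0x%llx: fine stream\n", (unsigned long long)addr);
        else if (check(&coarse[(addr >> REGION_SHIFT) % TABLE_SLOTS], addr))
            printf("0x%llx: coarse stream\n", (unsigned long long)addr);
    }

    int main(void) {
        for (int i = 0; i < 8; i++)
            observe(0x10000 + (uint64_t)i * 64);  /* 64-byte-stride stream */
        return 0;
    }
    ```
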
  • Publication number: 20070283101
    Abstract: A method and an apparatus are provided for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine that issues data prefetch requests for prefetching a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic for dynamically adjusting the prefetch distance between issuance by the prefetch engine of a data prefetch request and issuance by the processor of a demand (load request) targeting the data/cache line returned by that prefetch request, so that the next data prefetch request for a subsequent cache line completes the return of the data/cache line at effectively the same time that the demand for that subsequent data/cache line is issued by the processor.
    Type: Application
    Filed: June 6, 2006
    Publication date: December 6, 2007
    Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
  • Patent number: 7281092
    Abstract: A system and method are provided for managing cache hierarchies with adaptive mechanisms. In a preferred embodiment, in response to selecting a data block for eviction from a memory cache (the source cache) out of a collection of memory caches, a data structure is examined to determine whether an entry exists indicating that the data block has previously been evicted from the source cache, or another peer cache, to a slower cache or memory and subsequently retrieved back into the source cache or a peer cache. In response to determining that such an entry exists, a peer memory cache at the same level in the hierarchy is selected to receive the data block from the source cache upon eviction (see the sketch after this entry).
    Type: Grant
    Filed: June 2, 2005
    Date of Patent: October 9, 2007
    Assignee: International Business Machines Corporation
    Inventors: Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
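
    A sketch of the history-guided eviction decision; the table size, tag-only matching, and round-robin peer selection are assumptions for illustration:

    ```c
    /* Sketch of the adaptive eviction policy above: a small history
     * table remembers blocks that were evicted and later re-fetched;
     * when such a block is chosen for eviction again, it is pushed
     * laterally to a peer cache at the same level instead of down the
     * hierarchy. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define HIST_SLOTS 32
    #define NUM_PEERS  4

    static uint64_t evicted_then_refetched[HIST_SLOTS];   /* block tags */
    static int next_peer;

    /* Record that a previously evicted block was fetched again. */
    static void note_refetch(uint64_t tag) {
        evicted_then_refetched[tag % HIST_SLOTS] = tag;
    }

    static bool was_refetched(uint64_t tag) {
        return evicted_then_refetched[tag % HIST_SLOTS] == tag;
    }

    /* Decide where a victim block goes on eviction from the source cache. */
    static void evict(uint64_t tag) {
        if (was_refetched(tag)) {
            /* History says we will want it back: keep it at this level. */
            printf("block 0x%llx -> peer cache %d\n",
                   (unsigned long long)tag, next_peer);
            next_peer = (next_peer + 1) % NUM_PEERS;
        } else {
            printf("block 0x%llx -> lower level\n", (unsigned long long)tag);
        }
    }

    int main(void) {
        evict(0xabc);           /* no history: goes down the hierarchy */
        note_refetch(0xabc);    /* it came back: remember that */
        evict(0xabc);           /* now it moves laterally to a peer cache */
        return 0;
    }
    ```
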