Patents by Inventor Lakshminarayana B. Arimilli

Lakshminarayana B. Arimilli has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20090064165
    Abstract: A method for providing hardware based dynamic load balancing of message passing interface (MPI) tasks is provided. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize wait periods for waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063444
    Abstract: A method, computer program product, and system are provided for selecting, from a plurality of routes through the data processing system, a direct route for transmitting data. Data that includes address information and is to be transmitted to a destination processor is received at a first processor. Using routing table data structures, direct route entries are identified that correspond to direct routes for transmitting data. An accessed priority table data structure comprises a priority entry for each entry in the routing table data structures. The priority entry specifies a priority of a corresponding entry in the routing table data structures. A direct route entry is selected that corresponds to a direct route from the routing table data structures, based on specified priorities. Then the data is transmitted from the first processor to the destination processor using a path corresponding to the selected direct route entry.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony
  • Publication number: 20090063816
    Abstract: A method, computer program product, and system are provided for performing collective operations. In software executing on a parent processor in a first processor book, a number of other processors are determined in a same or different processor book of the data processing system that is needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. In software executing on the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results are received from the execution of the collective operation from the other processors, a final result is generated of the collective operation based on the received results, and the final result is output.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063814
    Abstract: A method, computer program product, and system are provided for routing information through the data processing system. Data that is to be transmitted to a destination processor is received at a source processor within a set of processors, where the data includes address information. A first determination is performed as to whether the destination processor is within a same processor book as the source processor based on the address information. A second determination is performed as to whether the destination processor is within a same supernode as the source processor based on the address information if the destination processor is not within the same processor book. A routing path is identified for the data based on results of the first determination, the second determination, and one or more routing table data structures. The data is then transmitted from the source processor along the identified routing path toward the destination processor.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063445
    Abstract: A method, computer program product, and system are provided for selecting, from a plurality of routes through the data processing system, an indirect route for transmitting data. Data that includes address information and is to be transmitted to a destination processor is received at a first processor. Using routing table data structures, indirect route entries are identified that correspond to indirect routes for transmitting data. An accessed priority table data structure comprises a priority entry for each entry in the routing table data structures. The priority entry specifies a priority of a corresponding entry in the routing table data structures. An indirect route entry is selected that corresponds to an indirect route from the routing table data structures, based on specified priorities. Then the data is transmitted from the first processor to the destination processor using a path corresponding to the selected indirect route entry.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony
  • Publication number: 20090063728
    Abstract: A method, computer program product, and system are provided for transmitting data in a data network. A first processor of the data network receives data to be transmitted to a second processor within the data network. A determination is made if the data has previously been routed through an indirect communication link from a source processor, the indirect communication link being a communication link that does not directly couple the source processor to a final destination processor which is to receive the data. A communication link is selected over which to transmit the data from the first processor to the second processor based on results of determining if the data has previously been routed through an indirect communication link. Finally, the data is transmitted from the first processor to the second processor using the selected communication link.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony
  • Publication number: 20090063443
    Abstract: A method, computer program product, and system are provided for dynamically routing data through the data processing system. Data that is to be transmitted to a destination processor is received at a first processor. The data that is received includes address information. A lookup is performed in routing table data structures based on the address information to identify candidate paths through which the data is routed to the destination processor. A determination is made as to whether any of the candidate paths are not able to be used to route the data to the destination processor based on a setting of at least one identifier. A path is selected from the identified candidate paths for routing of the data based on a setting of the at least one identifier. Then, the data is transmitted from the first processor along the selected path toward the destination processor.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony
  • Publication number: 20090064166
    Abstract: A system and method for providing hardware based dynamic load balancing of message passing interface (MPI) tasks are provided. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize wait periods for waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090064167
    Abstract: A system and method are provided for performing setup operations for receiving a different amount of data while processors are performing message passing interface (MPI) tasks. Mechanisms for adjusting the balance of processing workloads of the processors are provided so as to minimize wait periods for waiting for all of the processors to call a synchronization operation. An MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, setup operations may be performed while processors are performing MPI tasks to prepare for receiving different sized portions of data in a subsequent computation cycle based on the history.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090064168
    Abstract: A system and method are provided for providing hardware based dynamic load balancing of message passing interface (MPI) tasks by modifying tasks. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize wait periods for waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. Thus, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063817
    Abstract: A method, computer program product, and system are provided for packet coalescing in virtual channels of a data processing system. A first processor bundles original data to be transmitted to a destination processor, the original data provided by a first source processor. The first processor transmits the bundle of data to a second processor along a path to the destination processor. The second processor determines if the second processor has additional data destined for the same destination processor, the additional data being provided by a second source processor that is different from the first source processor. Responsive to the second processor having additional data, the second processor unbundles the original data, adds the additional data to the original data, and rebundles the data along with the additional data. Then the second processor transmits the rebundled data to at least one other processor along the path to the destination processor.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony
  • Publication number: 20090063891
    Abstract: A method, computer program product, and system are provided for providing reliability of communication. A first processor of the data processing system determines a current state of links coupled to ports of the first processor. Each port of the first processor comprises a plurality of links to a corresponding port on a second processor of the data processing system. The current state of the links indicates a level of error associated with each link. The first processor determines, for each link, if a level of error associated with the link exceeds a threshold. For each link whose level of error exceeds the threshold, the first processor tags the link with an error identifier in a switch associated with the ports of the first processor. The first processor reduces a level of usage for transmitting data on ports associated with links tagged with the error identifier.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony
  • Publication number: 20090064139
    Abstract: A method is provided for implementing a multi-tiered full-graph interconnect architecture. In order to implement a multi-tiered full-graph interconnect architecture, a plurality of processors are coupled to one another to create a plurality of processor books. The plurality of processor books are coupled together to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the multi-tiered full-graph interconnect architecture. Data is then transmitted from one processor to another within the multi-tiered full-graph interconnect architecture based on an addressing scheme that specifies at least a supernode and a processor book associated with a target processor to which the data is to be transmitted.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, Edward J. Seminaro, William E. Speight
  • Publication number: 20090063886
    Abstract: A system for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture is provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, which are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.
    Type: Application
    Filed: August 31, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Bernard C. Drerup, Jody B. Joyner, Jerry D. Lewis
  • Publication number: 20090064140
    Abstract: A method, computer program product, and system are provided for transmitting data from a first processor of a data processing system to a second processor of the data processing system. In one or more switches, a set of virtual channels is created, the one or more switches comprising, for each processor, a corresponding switch in the one or more switches. The data is transmitted from the first processor to the second processor through a path comprising a subset of processors of a set of processors in the data processing system. In each processor of the subset of processors, the data is stored in a virtual channel of a corresponding switch before transmitting the data to a next processor. The virtual channel of the corresponding switch in which the data is stored corresponds to a position of the processor in the path through which the data is transmitted.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony
  • Publication number: 20090063885
    Abstract: A system and computer program product for modifying an operation of one or more processors executing message passing interface (MPI) tasks are provided. Mechanisms for adjusting the balance of processing workloads of the processors are provided so as to minimize wait periods for waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
    Type: Application
    Filed: August 28, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063815
    Abstract: A method, computer program product, and system are provided for performing collective operations. In hardware of a parent processor in a first processor book, a number of other processors are determined in a same or different processor book of the data processing system that is needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. In hardware of the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results are received from the execution of the collective operation from the other processors, a final result is generated of the collective operation based on the received results, and the final result is output.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, William E. Speight
  • Publication number: 20090063811
    Abstract: A system is provided for implementing a multi-tiered full-graph interconnect architecture. In order to implement a multi-tiered full-graph interconnect architecture, a plurality of processors are coupled to one another to create a plurality of processor books. The plurality of processor books are coupled together to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the multi-tiered full-graph interconnect architecture. Data is then transmitted from one processor to another within the multi-tiered full-graph interconnect architecture based on an addressing scheme that specifies at least a supernode and a processor book associated with a target processor to which the data is to be transmitted.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, Edward J. Seminaro, William E. Speight
  • Publication number: 20090063880
    Abstract: A method, computer program product, and system are provided for performing a Message Passing Interface (MPI) job. A first processor chip receives a set of arrival signals from a set of processor chips executing tasks of the MPI job in the data processing system. The arrival signals identify when a processor chip executes a synchronization operation for synchronizing the tasks for the MPI job. Responsive to receiving the set of arrival signals from the set of processor chips, the first processor chip identifies a fastest processor chip of the set of processor chips whose arrival signal arrived first. An operation of the fastest processor chip is modified based on the identification of the fastest processor chip. The set of processor chips comprises processor chips that are in one of a same processor book or a different processor book of the data processing system.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony
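
As a rough illustration of the arrival-signal step described in the abstract of publication 20090063880 above, the following Python sketch picks the processor chip whose arrival signal came first. The `ArrivalSignal` record, its field names, and the use of a numeric timestamp are assumptions made for illustration only; the abstract does not say how arrival order is recorded or how the operation of the fastest chip is then modified.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ArrivalSignal:
    """Hypothetical record of one chip reaching the synchronization operation."""
    chip_id: str         # assumed identifier for the processor chip
    book_id: str         # assumed identifier for the chip's processor book
    arrival_time: float  # assumed timestamp at which the signal was received


def fastest_chip(signals: Iterable[ArrivalSignal]) -> ArrivalSignal:
    """Return the signal of the chip that reached synchronization first.

    Ties are broken in favor of the signal listed earliest.
    """
    signals = list(signals)
    if not signals:
        raise ValueError("no arrival signals received")
    return min(signals, key=lambda s: s.arrival_time)


# Chips may sit in the same processor book or in different ones.
received = [
    ArrivalSignal("chip-A", "book-0", 12.5),
    ArrivalSignal("chip-B", "book-0", 9.0),
    ArrivalSignal("chip-C", "book-1", 14.2),
]
first = fastest_chip(received)
print(first.chip_id, first.book_id)
```

A fuller model would feed the identified chip into whatever adjustment the system applies to it; that part is left out here because the abstract does not describe it.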