Patents by Inventor Lakshminarayana B. Arimilli

Lakshminarayana B. Arimilli has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20090199201
    Abstract: In a global shared memory (GSM) environment, an initiating task at a first node with a host fabric interface (HFI) uses epochs to provide reliability of transmission of packets via a network fabric to a target task. The HFI generates a packet for the initiating task addressed to the target task, and automatically inserts a current epoch of the initiating task into the packet. A copy of the current epoch is maintained by the target task, which accepts for processing only packets having the correct epoch, unless the packet is tagged for guaranteed-once delivery. When a packet delivery is accepted, the target task sends a notification to the initiating task. If the initiating task does not receive the notification of delivery for the issued packet, the initiating task updates the epoch at both the target node and the initiating node and re-transmits the packet.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Chulho Kim, Hanhong Xue
  • Publication number: 20090198912
    Abstract: A method of data processing in a cache memory includes caching a plurality of cache lines of data in a corresponding plurality of entries in a cache array, where each of the plurality of cache lines includes multiple data granules. For each of the plurality of cache entries, a plurality of line coherency state fields indicates an associated coherency state applicable to two or more data granules. For at least a particular cache line among the plurality of cache lines, a granule coherency state field indicates a coherency state for a particular granule of the multiple data granules in the particular cache line, where the coherency state field indicated by the granule coherency state field differs from that indicated for the particular cache line by its line coherency state field.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: LAKSHMINARAYANA B. ARIMILLI, Ravi K. Arimilli, Jerry D. Lewis, Warren E. Maule
  • Publication number: 20090198918
    Abstract: A data processing system enables global shared memory (GSM) operations across multiple nodes with a distributed EA-to-RA mapping of physical memory. Each node has a host fabric interface (HFI), which includes HFI windows that are assigned to at most one locally-executing task of a parallel job. The tasks perform parallel job execution, but map only a portion of the effective addresses (EAs) of the global address space to the local, real memory of the task's respective node. The HFI window tags all outgoing GSM operations (of the local task) with the job ID, and embeds the target node and HFI window IDs of the node at which the EA is memory mapped. The HFI window also enables processing of received GSM operations with valid EAs that are homed to the local real memory of the receiving node, while preventing processing of other received operations without a valid EA-to-RA local mapping.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, William J. Starke, Hanhong Xue
  • Publication number: 20090199182
    Abstract: A method for providing global notification of completion of a global shared memory (GSM) operation during processing by a target task executing at a target node of a distributed system. The distributed system has at least one other node on which an initiating task that generated the GSM operation is homed. The target task receives the GSM operation from the initiating task, via a host fabric interface (HFI) window assigned to the target task. The task initiates execution of the GSM operation on the target node. The task detects completion of the execution of the GSM operation on the target node, and issues a global notification to at least the initiating task. The global notification indicates the completion of the execution of the GSM operation to one or more tasks of a single job distributed across multiple processing nodes.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Gheorghe C. Cascaval, Ramakrishnan Rajamony
  • Publication number: 20090198960
    Abstract: According to a method of data processing in a multiprocessor data processing system, in response to a processor request to read a target granule of a target cache line of data containing multiple granules, a processing unit originates on an interconnect of the multiprocessor data processing system a partial read request that requests permission to read only the target granule of the target cache line. In response to a combined response to the partial read request indicating success, the combined response representing a system-wide response to the partial read request, the processing unit receives the target granule of the target cache line, supplies the target granule to a requesting processor core, and updates a coherency state of the target granule while retaining a coherency state of at least one other granule of the target cache line.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. ARIMILLI, Ravi K. ARIMILLI, Jerry D. Lewis, Warren E. MAULE
  • Publication number: 20090199194
    Abstract: A method and data processing system for tracking global shared memory (GSM) operations to and from a local node configured with a host fabric interface (HFI) coupled to a network fabric. During task/job initialization, the system OS assigns HFI window(s) to handle the GSM packet generation and GSM packet receipt and processing for each local task. HFI processing logic automatically tags each GSM packet generated by the HFI window with a global job identifier (ID) of the job to which the local task is affiliated. The job ID is embedded within each GSM packet placed on the network fabric. On receipt of a GSM packet from the network fabric, the HFI logic retrieves the embedded job ID and compares the embedded job ID with the ID within the HFI window(s). GSM packets are forwarded to an HFI window only when the embedded job ID matches the HFI window's job ID.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, Hanhong Xue
  • Publication number: 20090198849
    Abstract: A memory lock mechanism within a multi-processor system is disclosed. A lock control section is initially assigned to a data block within a system memory of the multiprocessor system. In response to a request for accessing the data block by a processing unit within the multiprocessor system, a determination is made by a memory controller whether or not the lock control section of the data block has been set. If the lock control section of the data block has been set, the request for accessing the data block is denied. Otherwise, if the lock control section of the data block has not been set, the lock control section of the data block is set, and the request for accessing the data block is allowed.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Guy L. Guthrie, William J. Starke
  • Publication number: 20090198916
    Abstract: A method for supporting low-overhead memory locks within a multi-processor system is disclosed. A lock control section is initially assigned to a data block within a system memory of the multiprocessor system. In response to a request for accessing the data block by a processing unit within the multiprocessor system, a determination is made by a memory controller whether or not the lock control section of the data block has been set. If the lock control section of the data block has been set, the request for accessing the data block is ignored. Otherwise, if the lock control section of the data block has not been set, the lock control section of the data block is set, and the request for accessing the data block is allowed.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Guy L. Guthrie, Edward J. Seminaro, William J. Starke
  • Publication number: 20090198914
    Abstract: According to at least one embodiment, a method of data processing in a multiprocessor data processing system includes a requesting processing unit initiating an interconnect operation including a memory access request that indicates an acceptability of a variable amount of data to service the interconnect request for data. In response to snooping the memory access request on an interconnect, a snooper selects an amount of data to supply to the requesting processing unit and transmits the selected amount of data to the requesting processing unit. The requesting processing unit receives the selected amount of data and utilizes at least some of the selected amount of data to service a processor request.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: LAKSHMINARAYANA B. ARIMILLI, Ravi K. Arimilli, Jerry D. Lewis, Warren E. Maule
  • Publication number: 20090198958
    Abstract: A system and method for performing dynamic request routing based on broadcast source request information are provided. Each processor chip in the system may use a synchronized heartbeat signal it generates to provide source request information to each of the other processor chips in the system. The source request information identifies the number of active source requests sent by the processor chip that originated the heartbeat signal. The source request information from each of the processor chips in the system may be used by the processor chips in determining optimal routing paths for data from a source processor chip to a destination processor chip. As a result, the congestion of data for processing at each of the processor chips along each possible routing path may be taken into account when selecting to which processor chip to forward data.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Bernard C. Drerup, Jody B. Joyner, Jerry D. Lewis
  • Publication number: 20090199046
    Abstract: A host fabric interface (HFI) enables debugging of global shared memory (GSM) operations received at a local node from a network fabric. The local node has a memory management unit (MMU), which provides an effective address to real address (EA-to-RA) translation table that is utilized by the HFI to evaluate when EAs of GSM operations/data from a received GSM packet is memory-mapped to RAs of the local memory. The HFI retrieves the EA associated with a GSM operation/data within a received GSM packet. The HFI forwards the EA to the MMU, which determines when the EA is mapped to RAs within the local memory for the local task. The HFI processing logic enables processing of the GSM packet only when the EA of the GSM operation/data within the GSM packet is an EA that has a local RA translation. Non-matching EAs result in an error condition that requires debugging.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, Hanhong Xue
  • Publication number: 20090199209
    Abstract: A target task ensures complete delivery of a global shared memory (GSM) message from an originating task to the target task. The target task's HFI receives a first of multiple GSM packets generated from a single GSM message sent from the originating task. The HFI logic assigns a sequence number and corresponding tuple to track receipt of the complete GSM message. The sequence number is unique relative to other sequence numbers assigned to GSM messages that have not been completely received from the initiating task. The HFI updates a count value within the tuple, which comprises the sequence number and the count value for the first GSM packet and for each subsequent GSM packet received for the GSM message. The HFI determines when receipt of the GSM message is complete by comparing the count value with a count total retrieved from the packet header.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, Hanhong Xue
  • Publication number: 20090199191
    Abstract: In a global shared memory (GSM) environment, a method provides local notification of completion of a global shared memory (GSM) operation processed by a first task executing at a local node of the distributed system. The system includes multiple nodes on which different tasks of a single job execute and perform GSM operations that are received from a second task via a via host fabric interface (HFI) and associated HFR window assigned to the first tasks. The local task initiates execution of a GSM operation on the local node. The task then monitors for and detects a completion of the execution of the GSM operation on the local node. When the task detects completion of the execution of the GSM operation, the task issues an internal notification to inform the locally-executing tasks of the completion of the GSM operation.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Gheorghe C. Cascaval, Ramakrishnan Rajamony
  • Publication number: 20090198695
    Abstract: A locking mechanism for supporting distributed computing within a multiprocessor system is disclosed. A lock control section and a stage control section are assigned to a data block within a system memory. In response to a request for accessing the data block by a processing unit, a determination is made by a memory controller whether or not the lock control section of the data block has been set. If the lock control section of the data block has been set, the access request is denied. Otherwise, if the lock control section of the data block has not been set, another determination is made whether or not a current processing stage of the requesting processing unit matches a processing stage indicated by the stage control section. If the current processing stage of the requesting processing unit does not match the processing stage indicated by the stage control section, the access request is denied; otherwise, the access request is allowed.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Guy L. Guthrie, William J. Starke
  • Publication number: 20090198956
    Abstract: A system and method are provided for implementing a two-tier full-graph interconnect architecture. In order to implement a two-tier full-graph interconnect architecture, a plurality of processors are coupled to one another to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the two-tier full-graph interconnect architecture. Data is then transmitted from one processor to another within the two-tier full-graph interconnect architecture based on an addressing scheme that specifies at least a supernode and a processor chip identifier associated with a target processor to which the data is to be transmitted.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Ramakrishnan Rajamony, Edward J. Seminaro, William E. Speight
  • Publication number: 20090198915
    Abstract: A method of data processing in a processing unit supported by a memory hierarchy includes the processing unit performing a plurality of memory accesses to the memory hierarchy. The plurality of memory accesses includes one or more memory accesses targeting a full cache line of data. The processing unit monitors utilization of data accessed by the plurality of memory accesses, and based upon the utilization of the data, dynamically alters a memory access mode of operation so that a subsequent storage-modifying memory access targets less than a full cache line of data.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: LAKSHMINARAYANA B. ARIMILLI, Ravi K. Arimilli, Jerry D. Lewis, Warren E. Maule
  • Publication number: 20090198920
    Abstract: A processing unit within a multiprocessor system adapted to support memory locks is disclosed. In response to a request for accessing a data block being denied when a lock control section of the data block has been set by a memory controller, a timer countdown is started within a processing unit within a multiprocessor system. The requesting processing unit can relinquish the access request once the timer countdown has reached a timeout period.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Guy L. Guthrie, William J. Starke
  • Publication number: 20090199195
    Abstract: A method for issuing global shared memory (GSM) operations from an originating task on a first node coupled to a network fabric of a distributed network via a host fabric interface (HFI). The originating task generates a GSM command within an effective address (EA) space. The task then places the GSM command within a send FIFO. The send FIFO is a portion of real memory having real addresses (RA) that are memory mapped to EAs of a globally executing job. The originating task maintains a local EA-to-RA mapping of only a portion of the real address space of the globally executing job. The task enables the HFI to retrieve the GSM command from the send FIFO into an HFI window allocated to the originating task. The HFI window generates a corresponding GSM packet containing GSM operations and/or data, and the HFI window issues the GSM packet to the network fabric.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, Hanhong Xue
  • Publication number: 20090198762
    Abstract: A method and a data processing system for completing checkpoint processing of a distributed job with local tasks communicating with other remote tasks via a host fabric interface (HFI) and assigned HFI window. Each HFI window has a send count and a receive count, which tracks GSM messages that are sent from and received at the HFI window. When a checkpoint is initiated by a master task, each local task forwards the send count and the receive count to the master task. The master task sums the respective counts and then compares the totals to each other. When the send count total is equal to the receive count total, the tasks are permitted to continue processing. However, when the send count total is not equal to the receive count total, the master task notifies each task of the job to rollback to a previous checkpoint or kill the job execution.
    Type: Application
    Filed: February 1, 2008
    Publication date: August 6, 2009
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, Hanhong Xue
  • Publication number: 20090070617
    Abstract: A method for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture are provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, that are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.
    Type: Application
    Filed: September 11, 2007
    Publication date: March 12, 2009
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Bernard C. Drerup, Jody B. Joyner, Jerry D. Lewis