Patents by Inventor Hanhong Xue

Hanhong Xue has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8756270
    Abstract: A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field, and the collective tree comprises a plurality of sub-trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub-tree within the collective tree. For each neighbor node, the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.
    Type: Grant
    Filed: April 24, 2012
    Date of Patent: June 17, 2014
    Assignee: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Paul F. Lecocq, Hanhong Xue
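
The abstract above (which also appears under patent 8751655 and publication 20120296915 later in this listing) turns on mapping a packet's tree identifier to an index that selects a slice of CAU resources and a per-neighbor destination table. A minimal C sketch of that lookup-and-combine flow follows; the structures, the direct-mapped lookup, and the sum operation are illustrative assumptions, not the patented hardware design.

```c
/* Illustrative sketch: map a tree ID to a CAU resource index, then combine an
 * incoming packet's data field using that index's resources.
 * All names and the simple modulo mapping are assumptions for illustration. */
#include <stdio.h>
#include <stdint.h>

#define NUM_INDICES   4      /* resource slices supported by this CAU    */
#define MAX_NEIGHBORS 3      /* neighbors per sub-tree tracked per index */

struct neighbor_dest { uint16_t node_id; uint16_t cau_port; };

struct cau_index {
    uint32_t tree_id;                              /* tree mapped to this slice     */
    struct neighbor_dest neighbors[MAX_NEIGHBORS]; /* destination info per neighbor */
    int n_neighbors;
    uint64_t accum;                                /* working buffer for the collective */
};

static struct cau_index cau[NUM_INDICES];

/* Map a tree identifier to an index within the CAU (direct-mapped lookup). */
static int map_tree_to_index(uint32_t tree_id) {
    int idx = (int)(tree_id % NUM_INDICES);
    return (cau[idx].tree_id == tree_id) ? idx : -1;
}

/* Apply the collective operation (here a sum) to the packet's data field. */
static void cau_reduce(uint32_t tree_id, uint64_t input_data) {
    int idx = map_tree_to_index(tree_id);
    if (idx < 0) { printf("tree %u not mapped\n", (unsigned)tree_id); return; }
    cau[idx].accum += input_data;
    printf("tree %u -> index %d, accumulated %llu, %d neighbor destinations\n",
           (unsigned)tree_id, idx, (unsigned long long)cau[idx].accum,
           cau[idx].n_neighbors);
}

int main(void) {
    cau[2].tree_id = 6;                               /* tree 6 maps to index 2 */
    cau[2].neighbors[0] = (struct neighbor_dest){ .node_id = 11, .cau_port = 0 };
    cau[2].n_neighbors = 1;
    cau_reduce(6, 40);
    cau_reduce(6, 2);
    return 0;
}
```
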
  • Patent number: 8751655
    Abstract: A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field, and the collective tree comprises a plurality of sub-trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub-tree within the collective tree. For each neighbor node, the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.
    Type: Grant
    Filed: March 29, 2010
    Date of Patent: June 10, 2014
    Assignee: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Paul F. Lecocq, Hanhong Xue
  • Patent number: 8607199
    Abstract: A technique for debugging code during runtime includes providing, from an outside process, a trigger to a daemon. In this case, the trigger is associated with a registered callback function. The trigger is then provided, from the daemon, to one or more designated tasks of a job. The registered callback function (that is associated with the trigger) is then executed by the one or more designated tasks. Execution results of the executed registered callback function are then returned (from the one or more designated tasks) to the daemon.
    Type: Grant
    Filed: December 16, 2009
    Date of Patent: December 10, 2013
    Assignee: International Business Machines Corporation
    Inventors: Chulho Kim, Hanhong Xue, Tsai-Yang Jea, Hung Q. Thai
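
A minimal sketch of the trigger-to-callback path described above: an outside process supplies a trigger, a daemon relays it to a designated task, the task runs the callback registered for that trigger, and the result is returned to the daemon. The registry layout and function names are illustrative assumptions, not the patented implementation.

```c
/* Illustrative sketch: a daemon dispatches a named debug trigger to a task,
 * which runs the callback registered for it and returns a result. */
#include <stdio.h>
#include <string.h>

typedef int (*debug_callback)(void);

struct registration { const char *trigger; debug_callback cb; };

static int dump_task_state(void) { printf("  task: dumping state\n"); return 0; }

/* Callbacks previously registered by the tasks, keyed by trigger name. */
static struct registration registry[] = {
    { "dump-state", dump_task_state },
};

/* Daemon side: forward a trigger received from an outside process to a task
 * and collect the callback's return value. */
static int daemon_dispatch(const char *trigger, int task_id) {
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (strcmp(registry[i].trigger, trigger) == 0) {
            printf("daemon: trigger '%s' -> task %d\n", trigger, task_id);
            return registry[i].cb();   /* execution result returned to the daemon */
        }
    }
    return -1;                         /* no callback registered for this trigger */
}

int main(void) {
    int result = daemon_dispatch("dump-state", 3);  /* outside process supplies trigger */
    printf("daemon: task returned %d\n", result);
    return 0;
}
```
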
  • Publication number: 20130247069
    Abstract: In a parallel computer executing a parallel application, where the parallel computer includes a number of compute nodes, with each compute node including one or more computer processors, the parallel application including a number of processes, and one or more of the processes executing a barrier operation, creating a checkpoint of the parallel application includes: maintaining, by each computer processor, global barrier operation state information, where the global barrier operation state information includes an aggregation of each process's barrier operation state information; invoking, for each process of the parallel application, a checkpoint handler; saving, by each process's checkpoint handler as part of a checkpoint for the parallel application, the process's barrier operation state information; and exiting, by each process, the checkpoint handler.
    Type: Application
    Filed: March 15, 2012
    Publication date: September 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Wen Chen, Tsai-Yang Jea, William P. LePera, Serban C. Maerean, Hung Q. Thai, Hanhong Xue, Zhi Zhang
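
A minimal sketch of the checkpoint step described above: a process holds its barrier operation state, a checkpoint handler is invoked, the state is written into the checkpoint, and the handler exits. The state fields, file format, and handler signature are illustrative assumptions.

```c
/* Illustrative sketch: each process's checkpoint handler saves its barrier
 * operation state as part of an application checkpoint. */
#include <stdio.h>

struct barrier_state {
    int epoch;        /* which barrier the process is participating in */
    int arrived;      /* whether this process has reached the barrier  */
};

/* Invoked for each process when a checkpoint of the parallel application is taken. */
static int checkpoint_handler(int rank, const struct barrier_state *st) {
    char path[64];
    snprintf(path, sizeof path, "checkpoint_rank%d.dat", rank);
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    fwrite(st, sizeof *st, 1, f);      /* save barrier state into the checkpoint */
    fclose(f);
    printf("rank %d: saved barrier epoch %d (arrived=%d)\n", rank, st->epoch, st->arrived);
    return 0;                          /* handler exits; process resumes */
}

int main(void) {
    struct barrier_state st = { .epoch = 7, .arrived = 1 };
    return checkpoint_handler(0, &st);
}
```
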
  • Publication number: 20130232262
    Abstract: A message flow controller limits a process from passing a new message in a reliable message passing layer from a source node to at least one destination node while a total number of in-flight messages for the process meets a first level limit. The message flow controller limits the new message from passing from the source node to a particular destination node from among a plurality of destination nodes while a total number of in-flight messages to the particular destination node meets a second level limit. Responsive to the total number of in-flight messages to the particular destination node not meeting the second level limit, the message flow controller only sends a new packet from among at least one packet for the new message to the particular destination node while a total number of in-flight packets for the new message is less than a third level limit.
    Type: Application
    Filed: April 5, 2013
    Publication date: September 5, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Uman Chan, Deryck X. Hong, Tsai-Yang Jea, Chulho Kim, Zenon J. Piatek, Hung Q. Thai, Abhinav Vishnu, Hanhong Xue
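
The three-level check described above (in-flight messages per process, in-flight messages per destination, in-flight packets per message) can be sketched as follows; the same abstract appears under granted patent 8452888 later in this listing. The limit values and counter names are illustrative assumptions.

```c
/* Illustrative sketch of three-level flow control for a reliable message layer:
 * level 1 caps in-flight messages per process, level 2 caps in-flight messages
 * per destination, level 3 caps in-flight packets per message. */
#include <stdbool.h>
#include <stdio.h>

#define L1_PROC_MSG_LIMIT  8   /* in-flight messages allowed for the process    */
#define L2_DEST_MSG_LIMIT  4   /* in-flight messages allowed to one destination */
#define L3_MSG_PKT_LIMIT   2   /* in-flight packets allowed for one message     */

struct flow_state {
    int proc_inflight_msgs;
    int dest_inflight_msgs;    /* for the particular destination of interest */
    int msg_inflight_pkts;     /* for the new message being sent             */
};

/* Decide whether the next packet of a new message may be sent right now. */
static bool may_send_packet(const struct flow_state *fs) {
    if (fs->proc_inflight_msgs >= L1_PROC_MSG_LIMIT) return false;  /* level 1 */
    if (fs->dest_inflight_msgs >= L2_DEST_MSG_LIMIT) return false;  /* level 2 */
    return fs->msg_inflight_pkts < L3_MSG_PKT_LIMIT;                /* level 3 */
}

int main(void) {
    struct flow_state fs = { .proc_inflight_msgs = 3,
                             .dest_inflight_msgs = 4,   /* level-2 limit reached */
                             .msg_inflight_pkts  = 0 };
    printf("send allowed: %s\n", may_send_packet(&fs) ? "yes" : "no");
    fs.dest_inflight_msgs = 1;
    printf("send allowed: %s\n", may_send_packet(&fs) ? "yes" : "no");
    return 0;
}
```
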
  • Patent number: 8484307
    Abstract: A data processing system enables global shared memory (GSM) operations across multiple nodes with a distributed EA-to-RA mapping of physical memory. Each node has a host fabric interface (HFI), which includes HFI windows that are assigned to at most one locally-executing task of a parallel job. The tasks perform parallel job execution, but map only a portion of the effective addresses (EAs) of the global address space to the local, real memory of the task's respective node. The HFI window tags all outgoing GSM operations (of the local task) with the job ID, and embeds the target node and HFI window IDs of the node at which the EA is memory mapped. The HFI window also enables processing of received GSM operations with valid EAs that are homed to the local real memory of the receiving node, while preventing processing of other received operations without a valid EA-to-RA local mapping.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: July 9, 2013
    Assignee: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, William J. Starke, Hanhong Xue
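
A minimal sketch of the addressing behavior described above: outgoing GSM operations are tagged with the job ID and the target node and window, and an incoming operation is processed only when its effective address has a valid EA-to-RA mapping homed to the receiving node. The table layout and linear lookup are illustrative assumptions, not the HFI hardware design.

```c
/* Illustrative sketch: an HFI window tags outgoing GSM operations with a job ID
 * and target node/window, and accepts incoming operations only when the
 * effective address has a valid local EA-to-RA mapping. */
#include <stdint.h>
#include <stdio.h>

struct ea_ra_entry { uint64_t ea; uint64_t ra; };   /* one locally homed mapping */

static const struct ea_ra_entry local_map[] = {
    { 0x1000, 0x7f0000 },
    { 0x2000, 0x7f8000 },
};

struct gsm_packet {
    uint32_t job_id;
    uint16_t target_node, target_window;   /* embedded by the sending HFI window */
    uint64_t ea;
};

/* Sender side: tag an outgoing operation. */
static struct gsm_packet tag_outgoing(uint32_t job_id, uint16_t node,
                                      uint16_t window, uint64_t ea) {
    return (struct gsm_packet){ job_id, node, window, ea };
}

/* Receiver side: process only operations whose EA is homed locally. */
static int process_incoming(const struct gsm_packet *p) {
    for (size_t i = 0; i < sizeof local_map / sizeof local_map[0]; i++)
        if (local_map[i].ea == p->ea) {
            printf("job %u: EA 0x%llx -> local RA 0x%llx\n", (unsigned)p->job_id,
                   (unsigned long long)p->ea, (unsigned long long)local_map[i].ra);
            return 0;
        }
    printf("job %u: EA 0x%llx has no local mapping, dropped\n",
           (unsigned)p->job_id, (unsigned long long)p->ea);
    return -1;
}

int main(void) {
    struct gsm_packet p = tag_outgoing(42, 5, 1, 0x2000);
    process_incoming(&p);
    p.ea = 0x3000;
    process_incoming(&p);
    return 0;
}
```
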
  • Publication number: 20130152101
    Abstract: A job may be divided into multiple tasks that may execute in parallel on one or more compute nodes. The tasks executing on the same compute node may be coordinated using barrier synchronization. However, to perform barrier synchronization, the tasks use (or attach to) a barrier synchronization register (BSR), which establishes a common checkpoint for each of the tasks. A leader task may use a shared memory region to publish to follower tasks the location of the barrier synchronization register, i.e., a barrier synchronization register ID. The follower tasks may then monitor the shared memory to determine the barrier synchronization register ID. The leader task may also use a count to ensure all the tasks attach to the BSR. This advantageously avoids task-to-task communication, which may reduce overhead and improve performance.
    Type: Application
    Filed: December 8, 2011
    Publication date: June 13, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tsai-Yang Jea, William P. LePera, Hanhong Xue, Zhi Zhang
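
A minimal sketch of the publication scheme described above: the leader task writes the barrier synchronization register (BSR) ID into a shared region, follower tasks read it from there, and the leader uses an attach count to confirm every task has attached, all without task-to-task messages. Shared memory is modeled with a plain struct here, and the field names are illustrative assumptions.

```c
/* Illustrative sketch: a leader task publishes a BSR ID in shared memory and
 * waits for an attach count, so follower tasks attach without messaging. */
#include <stdio.h>

struct shared_region {
    int bsr_id;          /* published by the leader                  */
    int bsr_id_valid;    /* set once bsr_id may be read by followers */
    int attached_count;  /* incremented by each task after attaching */
};

static void leader_publish(struct shared_region *shm, int bsr_id) {
    shm->bsr_id = bsr_id;
    shm->bsr_id_valid = 1;
    shm->attached_count = 1;          /* leader itself attaches first */
}

static void follower_attach(struct shared_region *shm, int rank) {
    if (!shm->bsr_id_valid) return;   /* in practice: poll until published */
    printf("task %d attaches to BSR %d\n", rank, shm->bsr_id);
    shm->attached_count++;
}

int main(void) {
    struct shared_region shm = {0};
    int ntasks = 4;
    leader_publish(&shm, 17);
    for (int r = 1; r < ntasks; r++)
        follower_attach(&shm, r);
    if (shm.attached_count == ntasks)           /* leader's count check */
        printf("all %d tasks attached to BSR %d\n", ntasks, shm.bsr_id);
    return 0;
}
```
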
  • Patent number: 8452888
    Abstract: A message flow controller limits a process from passing a new message in a reliable message passing layer from a source node to at least one destination node while a total number of in-flight messages for the process meets a first level limit. The message flow controller limits the new message from passing from the source node to a particular destination node from among a plurality of destination nodes while a total number of in-flight messages to the particular destination node meets a second level limit. Responsive to the total number of in-flight messages to the particular destination node not meeting the second level limit, the message flow controller only sends a new packet from among at least one packet for the new message to the particular destination node while a total number of in-flight packets for the new message is less than a third level limit.
    Type: Grant
    Filed: July 22, 2010
    Date of Patent: May 28, 2013
    Assignee: International Business Machines Corporation
    Inventors: Uman Chan, Deryck X. Hong, Tsai-Yang Jea, Chulho Kim, Zenon J. Piatek, Hung Q. Thai, Abhinav Vishnu, Hanhong Xue
  • Patent number: 8417778
    Abstract: A mechanism is provided for collective acceleration unit (CAU) tree flow control, which forms a logical tree (sub-network) among processors and transfers "collective" packets on this tree. The system supports many collective trees, and each CAU includes resources to support a subset of the trees. Each CAU has limited buffer space, and the connection between two CAUs is not completely reliable. Therefore, to address the challenge of collective packets traversing the tree without colliding with each other for buffer space, and to guarantee end-to-end packet delivery, each CAU in the system flow controls the packets, detects packet loss, and retransmits lost packets.
    Type: Grant
    Filed: December 17, 2009
    Date of Patent: April 9, 2013
    Assignee: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Jody B. Joyner, Paul F. Lecocq, Hanhong Xue
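
A minimal sketch of the per-link behavior described above: a CAU sends a packet only when buffer space is free, holds it until acknowledged, treats a missing acknowledgment as a loss, and retransmits. The single-slot buffer and timeout-driven retry are illustrative assumptions, not the patented protocol.

```c
/* Illustrative sketch: a CAU holds an unacknowledged packet in its limited
 * buffer, detects loss when no acknowledgment arrives, and retransmits. */
#include <stdbool.h>
#include <stdio.h>

struct cau_link {
    int next_seq;        /* sequence number for the next packet          */
    int unacked_seq;     /* packet held in the buffer, -1 if buffer free */
    int retries;
};

/* Send only when buffer space is available (flow control). */
static bool cau_send(struct cau_link *l) {
    if (l->unacked_seq != -1) return false;   /* would collide for buffer space */
    l->unacked_seq = l->next_seq++;
    printf("sent packet %d, awaiting ack\n", l->unacked_seq);
    return true;
}

static void cau_ack(struct cau_link *l, int seq) {
    if (l->unacked_seq == seq) { l->unacked_seq = -1; l->retries = 0; }
}

/* Called when the ack timer expires: treat the packet as lost and resend. */
static void cau_timeout(struct cau_link *l) {
    if (l->unacked_seq == -1) return;
    l->retries++;
    printf("packet %d presumed lost, retransmitting (retry %d)\n",
           l->unacked_seq, l->retries);
}

int main(void) {
    struct cau_link l = { .next_seq = 0, .unacked_seq = -1 };
    cau_send(&l);          /* packet 0 goes out                    */
    cau_send(&l);          /* blocked: buffer still holds packet 0 */
    cau_timeout(&l);       /* no ack arrived: retransmit packet 0  */
    cau_ack(&l, 0);        /* ack received: buffer freed           */
    cau_send(&l);          /* packet 1 can now be sent             */
    return 0;
}
```
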
  • Publication number: 20130067479
    Abstract: A parallel computer executes a number of tasks; each task includes a number of endpoints, and the endpoints are configured to support collective operations. In such a parallel computer, establishing a group of endpoints includes: receiving a user specification of a set of endpoints included in a global collection of endpoints, where the user specification defines the set in accordance with a predefined virtual representation of the endpoints, the predefined virtual representation is a data structure setting forth an organization of the tasks and endpoints included in the global collection of endpoints, and the user specification defines the set of endpoints without specifying any particular endpoint; and defining a group of endpoints in dependence upon the predefined virtual representation of the endpoints and the user specification.
    Type: Application
    Filed: September 13, 2011
    Publication date: March 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith, Hanhong Xue
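
A minimal sketch of the group definition described above: the user specifies a set against a virtual (task by endpoint) layout, here "endpoint slot K of every task," and the group is derived from that specification without naming any individual endpoint. The grid layout and the slot-based specification are illustrative assumptions.

```c
/* Illustrative sketch: expand a user specification stated against a virtual
 * (task x endpoint) representation into a concrete group of endpoints. */
#include <stdio.h>

#define NUM_TASKS          4
#define ENDPOINTS_PER_TASK 3

struct endpoint_id { int task; int endpoint; };

/* Derive the group "endpoint slot `slot` on every task" from the virtual layout. */
static int define_group_by_slot(int slot, struct endpoint_id *group, int max) {
    int n = 0;
    for (int t = 0; t < NUM_TASKS && n < max; t++)
        group[n++] = (struct endpoint_id){ .task = t, .endpoint = slot };
    return n;
}

int main(void) {
    struct endpoint_id group[NUM_TASKS];
    int n = define_group_by_slot(1, group, NUM_TASKS);  /* "slot 1 on every task" */
    for (int i = 0; i < n; i++)
        printf("group member: task %d, endpoint %d\n", group[i].task, group[i].endpoint);
    return 0;
}
```
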
  • Patent number: 8356151
    Abstract: A method performed in a data processing system initiates an asynchronous memory move (AMM) operation, whereby a processor performs a move of data in virtual address space from a first effective address to a second effective address and forwards parameters of the AMM operation to asynchronous memory mover logic for completion of the physical movement of data from a first memory location to a second memory location. The processor executes a second operation, which checks a status of the completion of the data move and returns a notification indicating the status. The notification indicates a status, which includes one of: data move in progress; data move totally done; data move partially done; data move cannot be performed; and occurrence of a translation look-aside buffer invalidate entry (TLBIE) operation. The processor initiates one or more actions in response to the notification received.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: January 15, 2013
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Robert S. Blackmore, Ronald N. Kalla, Chulho Kim, Balaram Sinharoy, Hanhong Xue
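
A minimal sketch of the status-check operation described above: the processor polls the asynchronous memory mover and acts on the returned notification. The status enum mirrors the cases listed in the abstract; the check and handler functions are illustrative assumptions.

```c
/* Illustrative sketch: after initiating an asynchronous memory move, the
 * processor issues a status-check operation and reacts to the notification. */
#include <stdio.h>

enum amm_status {
    AMM_IN_PROGRESS,
    AMM_DONE,
    AMM_PARTIALLY_DONE,
    AMM_CANNOT_PERFORM,
    AMM_TLBIE_OCCURRED    /* a translation look-aside buffer invalidate intervened */
};

/* Stand-in for the status returned by the asynchronous memory mover. */
static enum amm_status amm_check_status(void) { return AMM_PARTIALLY_DONE; }

static void handle_notification(enum amm_status s) {
    switch (s) {
    case AMM_DONE:           printf("move complete\n");                   break;
    case AMM_IN_PROGRESS:    printf("still moving; poll again later\n");  break;
    case AMM_PARTIALLY_DONE: printf("partial move; resume remainder\n");  break;
    case AMM_CANNOT_PERFORM: printf("fall back to a synchronous copy\n"); break;
    case AMM_TLBIE_OCCURRED: printf("translation changed; restart\n");    break;
    }
}

int main(void) {
    handle_notification(amm_check_status());
    return 0;
}
```
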
  • Patent number: 8327101
    Abstract: A data processing system includes a mechanism for completing an asynchronous memory move (AMM) operation in which the processor receives an AMM ST instruction and processes a processor-level move of data in virtual address space and an asynchronous memory mover then completes a physical move of the data within the real address space (memory). A status/control field of the AMM ST instruction includes an indication of a requested treatment of the lower level cache(s) on completion of the AMM operation. When the status/control field indicates an update to at least one cache should be performed, the asynchronous memory mover automatically forwards a copy of the data from the data move to the lower level cache, and triggers an update of a coherency state for a cache line in which the copy of the data is placed.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: December 4, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Robert S. Blackmore, Chulho Kim, Balaram Sinharoy, Hanhong Xue
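
A minimal sketch of the completion path described above: the AMM store's status/control field carries a request for cache treatment, and on completion the mover forwards a copy of the data to the lower-level cache and updates its coherency state. The flag name and the simplified two-state coherency model are illustrative assumptions.

```c
/* Illustrative sketch: the AMM store carries a control flag asking that a
 * lower-level cache be updated when the move completes; completion then
 * injects the line and marks a coherency state. */
#include <stdio.h>

#define AMM_CTRL_UPDATE_CACHE 0x1   /* requested cache treatment on completion */

enum coherency { INVALID, SHARED };

struct cache_line { unsigned long long addr; enum coherency state; };

/* Runs when the asynchronous memory mover finishes the physical move. */
static void amm_complete(unsigned ctrl, unsigned long long dest_ra,
                         struct cache_line *l2_line) {
    printf("physical move to RA 0x%llx complete\n", dest_ra);
    if (ctrl & AMM_CTRL_UPDATE_CACHE) {
        l2_line->addr  = dest_ra;     /* forward a copy of the data to the cache */
        l2_line->state = SHARED;      /* trigger a coherency-state update        */
        printf("cache updated: line 0x%llx now SHARED\n", l2_line->addr);
    }
}

int main(void) {
    struct cache_line line = { 0, INVALID };
    amm_complete(AMM_CTRL_UPDATE_CACHE, 0x7f8000ULL, &line);
    return 0;
}
```
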
  • Publication number: 20120296915
    Abstract: A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field, and the collective tree comprises a plurality of sub-trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub-tree within the collective tree. For each neighbor node, the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.
    Type: Application
    Filed: April 24, 2012
    Publication date: November 22, 2012
    Applicant: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Bernard C. Drerup, Paul F. Lecocq, Hanhong Xue
  • Patent number: 8275963
    Abstract: A distributed data processing system includes: (1) a first node with a processor, a first memory, and asynchronous memory mover logic; and (2) a second node having a second memory, coupled to the first node by a connection mechanism. The processor includes processing logic for completing a cross-node asynchronous memory move (AMM) operation, wherein the processor performs a move of data in virtual address space from a first effective address to a second effective address, and the asynchronous memory mover logic completes a physical move of the data from a first memory location in the first memory having a first real address to a second memory location in the second memory having a second real address. The data is transmitted via the connection mechanism connecting the two nodes independently of the processor.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: September 25, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Robert S. Blackmore, Chulho Kim, Balaram Sinharoy, Hanhong Xue
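
A minimal sketch of the cross-node case described above: the processor records the move in effective-address space and hands a descriptor to the asynchronous memory mover, which carries out the physical transfer over the connection between the nodes without further processor involvement. The descriptor fields are illustrative assumptions.

```c
/* Illustrative sketch: for a cross-node AMM, the processor hands a descriptor
 * to the asynchronous memory mover, which drives the transfer over the node
 * interconnect on its own. */
#include <stdint.h>
#include <stdio.h>

struct amm_descriptor {
    uint64_t src_ea, dst_ea;     /* the processor-visible, virtual-space move  */
    uint64_t src_ra;             /* local real address resolved for the source */
    int      dst_node;           /* remote node holding the destination memory */
    uint64_t dst_ra;             /* real address on the remote node            */
    size_t   len;
};

/* Asynchronous memory mover: completes the physical move without the processor. */
static void mover_run(const struct amm_descriptor *d) {
    printf("mover: %zu bytes, local RA 0x%llx -> node %d RA 0x%llx (EA 0x%llx -> 0x%llx)\n",
           d->len, (unsigned long long)d->src_ra, d->dst_node,
           (unsigned long long)d->dst_ra,
           (unsigned long long)d->src_ea, (unsigned long long)d->dst_ea);
}

int main(void) {
    struct amm_descriptor d = { .src_ea = 0x1000, .dst_ea = 0x9000,
                                .src_ra = 0x7f0000, .dst_node = 2,
                                .dst_ra = 0x3f0000, .len = 4096 };
    mover_run(&d);   /* the processor is free to continue executing meanwhile */
    return 0;
}
```
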
  • Patent number: 8275947
    Abstract: A method and data processing system for tracking global shared memory (GSM) operations to and from a local node configured with a host fabric interface (HFI) coupled to a network fabric. During task/job initialization, the system OS assigns HFI window(s) to handle the GSM packet generation and GSM packet receipt and processing for each local task. HFI processing logic automatically tags each GSM packet generated by the HFI window with a global job identifier (ID) of the job to which the local task is affiliated. The job ID is embedded within each GSM packet placed on the network fabric. On receipt of a GSM packet from the network fabric, the HFI logic retrieves the embedded job ID and compares the embedded job ID with the ID within the HFI window(s). GSM packets are forwarded to an HFI window only when the embedded job ID matches the HFI window's job ID.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: September 25, 2012
    Assignee: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, Hanhong Xue
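
A minimal sketch of the receive-side filtering described above: the HFI retrieves the job ID embedded in an arriving GSM packet and forwards the packet to an HFI window only when the window's job ID matches. The window table and linear search are illustrative assumptions.

```c
/* Illustrative sketch: on receipt, the HFI compares a packet's embedded job ID
 * against the job IDs of its windows and forwards the packet only on a match. */
#include <stdint.h>
#include <stdio.h>

struct hfi_window { int window_id; uint32_t job_id; };

static const struct hfi_window windows[] = {
    { 0, 42 }, { 1, 42 }, { 2, 77 },
};

/* Returns the matching window ID, or -1 if the packet must be rejected. */
static int route_gsm_packet(uint32_t embedded_job_id, int target_window) {
    for (size_t i = 0; i < sizeof windows / sizeof windows[0]; i++)
        if (windows[i].window_id == target_window &&
            windows[i].job_id == embedded_job_id)
            return windows[i].window_id;
    return -1;
}

int main(void) {
    printf("packet(job 42, window 1) -> window %d\n", route_gsm_packet(42, 1));
    printf("packet(job 42, window 2) -> window %d\n", route_gsm_packet(42, 2));
    return 0;
}
```
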
  • Patent number: 8245004
    Abstract: A data processing system includes a set of architected registers within which the processor places state and other information to communicate with the asynchronous memory mover in order to initiate and control an asynchronous memory move (AMM) operation. The asynchronous memory mover performs an AMM operation in response to receiving a set of parameters within the architected registers, which parameters are associated with an AMM store instruction executed by the processor to initiate a move of data in virtual space before placing the information in the architected registers. The architected registers are processor architected registers, defined on a per-thread basis by a compiler, or memory-mapped architected registers allocated for communicating with the asynchronous memory mover during a bind and subsequent execution of an application. The architected registers are also utilized to store state information to enable a restore to a point before execution of the AMM operation.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: August 14, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Robert S. Blackmore, Chulho Kim, Balaram Sinharoy, Hanhong Xue
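
A minimal sketch of the register-based handoff described above: the AMM store instruction fills a small block of architected registers with the move parameters and enough state for a later restore, and the asynchronous memory mover consumes them. The register layout shown is an illustrative assumption, not the architected definition.

```c
/* Illustrative sketch: the processor communicates an AMM request to the mover
 * through a small set of architected registers, which also hold enough state
 * to restore to the point before the operation. */
#include <stdint.h>
#include <stdio.h>

struct amm_regs {
    uint64_t src_ea, dst_ea, len;   /* parameters from the AMM store instruction */
    uint64_t saved_state;           /* state needed to restore pre-AMM context   */
    int      valid;                 /* set by the processor, consumed by mover   */
};

/* Processor side: the AMM store instruction populates the registers. */
static void amm_store(struct amm_regs *r, uint64_t src, uint64_t dst, uint64_t len) {
    r->saved_state = 0xBEEF;        /* placeholder snapshot for a later restore */
    r->src_ea = src; r->dst_ea = dst; r->len = len;
    r->valid = 1;
}

/* Mover side: pick up the parameters and start the asynchronous move. */
static void amm_start(struct amm_regs *r) {
    if (!r->valid) return;
    printf("AMM: move %llu bytes from EA 0x%llx to EA 0x%llx\n",
           (unsigned long long)r->len, (unsigned long long)r->src_ea,
           (unsigned long long)r->dst_ea);
    r->valid = 0;
}

int main(void) {
    struct amm_regs regs = {0};
    amm_store(&regs, 0x1000, 0x2000, 4096);
    amm_start(&regs);
    return 0;
}
```
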
  • Patent number: 8214604
    Abstract: A method and data processing system for performing fence operations within a global shared memory (GSM) environment having a local task executing on a processor and providing GSM commands for processing by a host fabric interface (HFI) window that is allocated to the task. The HFI window has one or more registers for use during local fence operations. A first register tracks a first count of task-issued GSM commands, and a second register tracks a second count of GSM operations being processed by the HFI. The processing logic detects a locally-issued fence operation, and responds by performing a series of operations, including: automatically stopping the task from issuing additional GSM commands; monitoring for completion of all the task-issued GSM commands at the HFI; and triggering a resumption of issuance of GSM commands by the task when the completion of all previous task-issued GSM commands is registered by the HFI.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: July 3, 2012
    Assignee: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, Hanhong Xue
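
A minimal sketch of the fence described above: one counter tracks task-issued GSM commands, another tracks commands the HFI has completed, issuance stops when a fence is raised, and it resumes once the two counts match. The counter and flag names are illustrative assumptions.

```c
/* Illustrative sketch: a local fence stops the task from issuing new GSM
 * commands until the count of commands completed by the HFI catches up with
 * the count of commands the task has issued. */
#include <stdbool.h>
#include <stdio.h>

struct hfi_fence_regs {
    unsigned long issued;      /* first register: task-issued GSM commands     */
    unsigned long completed;   /* second register: GSM operations HFI finished */
    bool fenced;               /* issuance stopped while a fence is pending    */
};

static bool issue_command(struct hfi_fence_regs *r) {
    if (r->fenced) return false;        /* fence in progress: hold new commands */
    r->issued++;
    return true;
}

static void fence_begin(struct hfi_fence_regs *r) { r->fenced = true; }

/* Called as the HFI retires commands; resumes issuance when all are done. */
static void command_completed(struct hfi_fence_regs *r) {
    r->completed++;
    if (r->fenced && r->completed == r->issued) {
        r->fenced = false;
        printf("fence satisfied after %lu commands; issuance resumes\n", r->issued);
    }
}

int main(void) {
    struct hfi_fence_regs r = {0};
    issue_command(&r);
    issue_command(&r);
    fence_begin(&r);
    printf("issue during fence allowed: %s\n", issue_command(&r) ? "yes" : "no");
    command_completed(&r);
    command_completed(&r);
    printf("issue after fence allowed: %s\n", issue_command(&r) ? "yes" : "no");
    return 0;
}
```
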
  • Patent number: 8200910
    Abstract: A method for issuing global shared memory (GSM) operations from an originating task on a first node coupled to a network fabric of a distributed network via a host fabric interface (HFI). The originating task generates a GSM command within an effective address (EA) space. The task then places the GSM command within a send FIFO. The send FIFO is a portion of real memory having real addresses (RA) that are memory mapped to EAs of a globally executing job. The originating task maintains a local EA-to-RA mapping of only a portion of the real address space of the globally executing job. The task enables the HFI to retrieve the GSM command from the send FIFO into an HFI window allocated to the originating task. The HFI window generates a corresponding GSM packet containing GSM operations and/or data, and the HFI window issues the GSM packet to the network fabric.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: June 12, 2012
    Assignee: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, Hanhong Xue
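
A minimal sketch of the issue path described above: the originating task places a GSM command in a send FIFO, and the HFI window allocated to the task retrieves the command and issues a corresponding GSM packet to the fabric. The FIFO layout and packet fields are illustrative assumptions; the EA-to-RA memory mapping of the FIFO itself is not modeled.

```c
/* Illustrative sketch: an originating task enqueues GSM commands in a send
 * FIFO, and the task's HFI window drains the FIFO, turning each command into a
 * GSM packet for the network fabric. */
#include <stdint.h>
#include <stdio.h>

#define FIFO_DEPTH 8

struct gsm_command { uint64_t ea; uint64_t payload; };

struct send_fifo {
    struct gsm_command slots[FIFO_DEPTH];
    int head, tail;
};

/* Task side: enqueue a command (returns 0 if the FIFO is full). */
static int fifo_put(struct send_fifo *f, struct gsm_command c) {
    if ((f->tail + 1) % FIFO_DEPTH == f->head) return 0;
    f->slots[f->tail] = c;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    return 1;
}

/* HFI window side: retrieve the next command and issue it as a packet. */
static int hfi_issue_next(struct send_fifo *f, int window_id) {
    if (f->head == f->tail) return 0;                 /* FIFO empty */
    struct gsm_command c = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    printf("window %d: GSM packet for EA 0x%llx (payload 0x%llx) -> fabric\n",
           window_id, (unsigned long long)c.ea, (unsigned long long)c.payload);
    return 1;
}

int main(void) {
    struct send_fifo fifo = { .head = 0, .tail = 0 };
    fifo_put(&fifo, (struct gsm_command){ .ea = 0x4000, .payload = 0xAB });
    hfi_issue_next(&fifo, 3);
    return 0;
}
```
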
  • Patent number: 8146094
    Abstract: A target task ensures complete delivery of a global shared memory (GSM) message from an originating task to the target task. The target task's HFI receives a first of multiple GSM packets generated from a single GSM message sent from the originating task. The HFI logic assigns a sequence number and corresponding tuple to track receipt of the complete GSM message. The sequence number is unique relative to other sequence numbers assigned to GSM messages that have not been completely received from the initiating task. The HFI updates a count value within the tuple, which comprises the sequence number and the count value for the first GSM packet and for each subsequent GSM packet received for the GSM message. The HFI determines when receipt of the GSM message is complete by comparing the count value with a count total retrieved from the packet header.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Robert S. Blackmore, Chulho Kim, Ramakrishnan Rajamony, Hanhong Xue
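
A minimal sketch of the completion tracking described above: the receiving HFI keeps a (sequence number, count) tuple for each in-progress GSM message, bumps the count for every arriving packet, and compares it against the count total carried in the packet header. The tuple structure shown is an illustrative assumption.

```c
/* Illustrative sketch: the receiving HFI tracks a (sequence number, count)
 * tuple per in-progress GSM message and declares the message complete when the
 * running count reaches the total carried in the packet headers. */
#include <stdbool.h>
#include <stdio.h>

struct msg_tuple { int seq; int received; };

/* Account for one arriving packet; returns true when the message is complete. */
static bool gsm_packet_arrived(struct msg_tuple *t, int header_count_total) {
    t->received++;
    if (t->received == header_count_total) {
        printf("message (seq %d) complete: %d packets received\n", t->seq, t->received);
        return true;
    }
    printf("message (seq %d): %d of %d packets\n", t->seq, t->received, header_count_total);
    return false;
}

int main(void) {
    struct msg_tuple t = { .seq = 5, .received = 0 };  /* seq assigned on first packet */
    int total_from_header = 3;
    gsm_packet_arrived(&t, total_from_header);
    gsm_packet_arrived(&t, total_from_header);
    gsm_packet_arrived(&t, total_from_header);
    return 0;
}
```
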
  • Patent number: 8141084
    Abstract: The present invention provides a portable, user-space release and reacquire of adapter resources for a given job on a node, using information in a network resource table. The information in the network resource table is obtained when a user space application is loaded by a resource manager. The present invention provides a portable solution that works for any interconnect where adapter resources need to be freed and reacquired, without having to write a specific function in the device driver. The preemption request is made on a per-job basis using a key, or "job key," that was previously loaded when the user space application or job originally requested the adapter resources. This is done for each OS instance where the job runs.
    Type: Grant
    Filed: April 7, 2008
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Richard J. Coppinger, Deryck X. Hong, Chulho Kim, Hanhong Xue
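
A minimal sketch of the job-level preemption described above: adapter resources recorded in a network resource table are released and later reacquired using the job key loaded when the job first obtained them, with no interconnect-specific driver function. The table fields and function names are illustrative assumptions, not the product interface.

```c
/* Illustrative sketch: adapter resources recorded in a network resource table
 * are released and reacquired per job using the saved job key. */
#include <stdbool.h>
#include <stdio.h>

struct nrt_entry {
    unsigned job_key;       /* key loaded when the job acquired the resources */
    int      adapter_id;
    bool     held;          /* whether the job currently holds the resources  */
};

static struct nrt_entry table[] = {
    { .job_key = 0xA1, .adapter_id = 0, .held = true },
    { .job_key = 0xA1, .adapter_id = 1, .held = true },
};

/* Preemption request: release every adapter resource tied to the job key. */
static void release_by_job_key(unsigned key) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].job_key == key && table[i].held) {
            table[i].held = false;
            printf("released adapter %d for job key 0x%X\n", table[i].adapter_id, key);
        }
}

/* Later, the same key reacquires the resources recorded in the table. */
static void reacquire_by_job_key(unsigned key) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].job_key == key && !table[i].held) {
            table[i].held = true;
            printf("reacquired adapter %d for job key 0x%X\n", table[i].adapter_id, key);
        }
}

int main(void) {
    release_by_job_key(0xA1);
    reacquire_by_job_key(0xA1);
    return 0;
}
```
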