Patents by Inventor David L. Darrington

David L. Darrington has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20100122199
    Abstract: A hybrid node of a High Performance Computing (HPC) cluster uses accelerator nodes for checkpointing to increase overall efficiency of the multi-node computing system. The host node or processor node reads/writes checkpoint data to the accelerators. After offloading the checkpoint data to the accelerators, the host processor can continue processing while the accelerators communicate the checkpoint data with the host or wait for the next checkpoint. The accelerators may also perform dynamic compression and decompression of the checkpoint data to reduce the checkpoint size and reduce network loading. The accelerators may also communicate with other node accelerators to compare checkpoint data to reduce the amount of checkpoint data stored to the host.
    Type: Application
    Filed: November 13, 2008
    Publication date: May 13, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David L. Darrington, Matthew W. Markland, Philip James Sanders, Richard Michael Shok
  • Publication number: 20100095100
    Abstract: A method, apparatus, and program product checkpoint an application in a parallel computing system of the type that includes a plurality of hybrid nodes. Each hybrid node includes a host element and a plurality of accelerator elements. Each host element may include at least one multithreaded processor, and each accelerator element may include at least one multi-element processor. In a first hybrid node from among the plurality of hybrid nodes, checkpointing the application includes executing at least a portion of the application in the host element and at least one accelerator element and, in response to receiving a command to checkpoint the application, checkpointing the host element separately from the at least one accelerator element.
    Type: Application
    Filed: October 9, 2008
    Publication date: April 15, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David L. Darrington, Matthew W. Markland, Philip James Sanders, Richard Michael Shok
  • Publication number: 20100095152
    Abstract: A method, apparatus, and program product checkpoint an application in a parallel computing system of the type that includes a plurality of hybrid nodes. Each hybrid node includes a host element and a plurality of accelerator elements. Each host element may include at least one multithreaded processor, and each accelerator element may include at least one multi-element processor. In a first hybrid node from among the plurality of hybrid nodes, checkpointing the application includes executing at least a portion of the application in the host element, configuring and executing at least one computation kernel in at least one accelerator element, and, in response to receiving a command to checkpoint the application, checkpointing the host element separately from the at least one accelerator element upon which the at least one computation kernel is executing.
    Type: Application
    Filed: October 9, 2008
    Publication date: April 15, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David L. Darrington, Matthew W. Markland, Philip James Sanders, Richard Michael Shok
  • Publication number: 20100085870
    Abstract: A process is disclosed for identifying and recovering from resource leaks on compute nodes of a parallel computing system. A resource monitor stores information about system resources available on a compute node in a clean state. After the compute node runs a job, the resource monitor compares the current resource availability to the clean state. If a resource leak is found, the resource monitor contacts a global resource manager to remove the resource leak.
    Type: Application
    Filed: October 2, 2008
    Publication date: April 8, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Eric L. Barsness, David L. Darrington, Amanda E. Peters, John M. Santosuosso
  • Publication number: 20100085871
    Abstract: A process is disclosed for identifying and recovering from resource leaks on compute nodes of a parallel computing system. A resource monitor stores information about system resources available on a compute node in a clean state. After the compute node runs a job, the resource monitor compares the current resource availability to the clean state. If a resource leak is found, the resource monitor contacts a global resource manager to remove the resource leak.
    Type: Application
    Filed: October 2, 2008
    Publication date: April 8, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Eric L. Barsness, David L. Darrington, Amanda E. Peters, John M. Santosuosso
  • Publication number: 20100063971
    Abstract: Systems and articles of manufacture for managing annotations made for a variety of different type data objects manipulated (e.g., created, edited, and viewed) by a variety of different type applications are provided. Some embodiments allow users collaborating on a project to create, view, and edit annotations from within the applications used to manipulate the annotated data objects, which may facilitate and encourage the capturing and sharing of tacit knowledge through annotations. Further, annotations may be stored separate from the application data they describe, decoupling the tacit knowledge captured in the annotations from the applications used to manipulate the annotated data.
    Type: Application
    Filed: November 16, 2009
    Publication date: March 11, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian J. Cragun, David L. Darrington, Lonnie A. McCullough
  • Patent number: 7647484
    Abstract: An apparatus, program product and method sample at different times nodes that are performing similar work. Performance data associated with first and second node subsets performing the similar work are sampled at different times, e.g., in a round-robin fashion, and in accordance with a given sampling rate. The performance data is analyzed. Nodes whose performance suffers as a result of a sampling operation may be identified and removed from a subsequent operation.
    Type: Grant
    Filed: February 23, 2007
    Date of Patent: January 12, 2010
    Assignee: International Business Machines Corporation
    Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda E. Peters, John Matthew Santosuosso
  • Patent number: 7644254
    Abstract: A method and apparatus for dynamically rerouting node processes on the compute nodes of a massively parallel computer system using hint bits to route around failed nodes or congested networks without restarting applications executing on the system. When a node has a failure or there are indications that it may fail, the application software on the system is suspended while the data on the failed node is moved to a backup node. The torus network traffic is routed around the failed node and traffic for the failed node is rerouted to the backup node. The application can then resume operation without restarting from the beginning.
    Type: Grant
    Filed: April 18, 2007
    Date of Patent: January 5, 2010
    Assignee: International Business Machines Corporation
    Inventors: David L. Darrington, Patrick Joseph McCarthy, Amanda Peters, Albert Sidelnik, Brian Edward Smith, Brent Allen Swartz
  • Publication number: 20090320023
    Abstract: A process on a highly distributed parallel computing system is disclosed. When a first compute node in a first pool is ready to hand off a task to a second pool for further processing, the first compute node may first determine whether a node is available in the second pool. If no node is available from the second pool, then the first compute node may begin performing a primary task assigned to the second pool of nodes, up to the point where a service available exclusively to the nodes of the second pool is required. In the interim, however, one of the nodes of the second pool may become available. Alternatively, an application program running on a compute node may be configured with an exception handling routine that catches exceptions and migrates the application to a compute node where a necessary service is available, as such exceptions occur.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 24, 2009
    Inventors: Eric L. Barsness, David L. Darrington, Amanda E. Peters, John M. Santosuosso
  • Publication number: 20090319662
    Abstract: A process on a highly distributed parallel computing system is disclosed. When a first compute node in a first pool is ready to hand off a task to a second pool for further processing, the first compute node may first determine whether a node is available in the second pool. If no node is available from the second pool, then the first compute node may begin performing a primary task assigned to the second pool of nodes, up to the point where a service available exclusively to the nodes of the second pool is required. In the interim, however, one of the nodes of the second pool may become available. Alternatively, an application program running on a compute node may be configured with an exception handling routine that catches exceptions and migrates the application to a compute node where a necessary service is available, as such exceptions occur.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 24, 2009
    Inventors: Eric L. Barsness, David L. Darrington, Amanda E. Peters, John M. Santosuosso
  • Publication number: 20090320003
    Abstract: Embodiments of the invention enable application programs running across multiple compute nodes of a highly-parallel system to compile source code into native instructions, and subsequently share the optimizations used to compile the source code with other nodes. For example, determining what optimizations to use may consume significant processing power and memory on a node. In cases where multiple nodes exhibit similar characteristics, it is possible that these nodes may use the same set of optimizations when compiling similar pieces of code. Therefore, when one node compiles source code into native instructions, it may share the optimizations used with other similar nodes, thereby removing the burden for the other nodes to figure out which optimizations to use. Thus, while one node may suffer a performance hit for determining the necessary optimizations, other nodes may be saved from this burden by simply using the optimizations provided to them.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 24, 2009
    Inventors: Eric L. Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
  • Publication number: 20090320008
    Abstract: Embodiments of the invention enable application programs running across multiple compute nodes of a highly-parallel system to compile source code into native instructions, and subsequently share the optimizations used to compile the source code with other nodes. For example, determining what optimizations to use may consume significant processing power and memory on a node. In cases where multiple nodes exhibit similar characteristics, it is possible that these nodes may use the same set of optimizations when compiling similar pieces of code. Therefore, when one node compiles source code into native instructions, it may share the optimizations used with other similar nodes, thereby removing the burden for the other nodes to figure out which optimizations to use. Thus, while one node may suffer a performance hit for determining the necessary optimizations, other nodes may be saved from this burden by simply using the optimizations provided to them.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 24, 2009
    Inventors: Eric L. Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
  • Publication number: 20090319621
    Abstract: Embodiments of the invention provide for controlling message flow across a parallel computer system having multiple compute nodes by selectively grouping compute nodes of such a system into node pools and assigning message flow control policies to nodes in the node pools. The message flow control policies specify logging and/or tracing activities to be performed by instances of applications running on nodes assigned to the node pools. As the application is executed, logging and/or tracing messages are generated on the compute nodes according to message flow control policies assigned to the nodes. Optionally, the message flow is analyzed, the message flow control policies are adjusted, and duplicate messages are eliminated.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 24, 2009
    Inventors: Eric L. Barsness, David L. Darrington, Amanda Peters, John M. Santosuosso
  • Publication number: 20090313636
    Abstract: Methods, apparatus, and products are disclosed for executing an application on a parallel computer that include: executing, by a current compute node, a current task of the application, including producing results; determining, by the current compute node in dependence upon current network characteristics and application characteristics, whether to transfer the results to a next compute node for further processing by a next task on the next compute node or to execute the next task for further processing of the results on the current compute node; transferring, by the current compute node, the results to the next compute node for further processing by the next task on the next compute node if the determination specifies transferring the results to the next node; and executing, by the current compute node, the next task for further processing of the results if the determination specifies executing the next task on the current compute node.
    Type: Application
    Filed: June 16, 2008
    Publication date: December 17, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Eric L. Barsness, Michael B. Brutman, David L. Darrington, Amanda E. Peters, John M. Santosuosso
  • Publication number: 20090313452
    Abstract: A method and apparatus creates and manages persistent memory (PM) in a multi-node computing system. A PM Manager in the service node creates and manages pools of nodes with various sizes of PM. A node manager uses the pools of nodes to load applications to the nodes according to the size of the available PM. The PM Manager can dynamically adjust the size of the PM according to the needs of the applications based on historical use or as determined by a system administrator. The PM Manager works with an operating system kernel on the nodes to provide persistent memory for application data and system metadata. The PM Manager uses the persistent memory to load applications to preserve data from one application to the next. Also, the data preserved in persistent memory may be system metadata such as file system data that will be available to subsequent applications.
    Type: Application
    Filed: October 29, 2007
    Publication date: December 17, 2009
    Inventors: Eric Lawrence Barsness, David L. Darrington, Patrick Joseph McCarthy, Amanda Peters, John Matthew Santosuosso
  • Publication number: 20090307290
    Abstract: A database spread over multiple nodes allows each node to store a journal recording changes made to the database and also allows a journaling component to manage the memory space available for journaling. Two threshold size values may be specified for the journal. The first threshold value specifies a journal size at which to begin pruning the journal on a given node. A journal pruning algorithm may be used to identify journal entries that may be removed. For example, once a given transaction completes (i.e., commits) the journal entries related to that transaction may be pruned from the journal. The second threshold value specifies the maximum size of the journal. After reaching this size, journal entries may be written to disk instead of the in-memory journal.
    Type: Application
    Filed: June 10, 2008
    Publication date: December 10, 2009
    Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
  • Publication number: 20090307466
    Abstract: A method, apparatus, and program product share a resource in a computing system that includes a plurality of computing cores. A request from a second execution context (“EC”) to lock the resource currently locked by a first EC on a first core causes replication of the second EC as a third EC on a third core. The first and third ECs are executed substantially concurrently. When the first EC modifies the resource, the third EC is restarted after the resource has been modified. Alternately, a first EC is configured in a first core and shadowed as a second EC in a second core. In response to a blocked lock request, the first EC is halted and the second EC continues. After granting a lock, it is determined whether a conflict has occurred and the first and second EC are particularly synchronized to each other in response to that determination.
    Type: Application
    Filed: June 10, 2008
    Publication date: December 10, 2009
    Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
  • Publication number: 20090307287
    Abstract: A database spread over multiple nodes allows each node to store a journal recording changes made to the database and also allows a journaling component to manage the memory space available for journaling. Two threshold size values may be specified for the journal. The first threshold value specifies a journal size at which to begin pruning the journal on a given node. A journal pruning algorithm may be used to identify journal entries that may be removed. For example, once a given transaction completes (i.e., commits) the journal entries related to that transaction may be pruned from the journal. The second threshold value specifies the maximum size of the journal. After reaching this size, journal entries may be written to disk instead of the in-memory journal.
    Type: Application
    Filed: June 10, 2008
    Publication date: December 10, 2009
    Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
  • Patent number: 7631169
    Abstract: A method and apparatus for fault recovery on a parallel computer system from a soft failure without ending an executing job on a partition of nodes. In preferred embodiments a failed hardware recovery mechanism on a service node uses a heartbeat monitor to determine when a node failure occurs. Where possible, the failed node is reset and re-loaded with software without ending the software job being executed by the partition containing the failed node.
    Type: Grant
    Filed: February 2, 2007
    Date of Patent: December 8, 2009
    Assignee: International Business Machines Corporation
    Inventors: David L. Darrington, Patrick Joseph McCarthy, Amanda Peters, Albert Sidelnik
  • Publication number: 20090300752
    Abstract: The disclosure herein provides data security on a parallel computer system using virtual private networks connecting the nodes of the system. A mechanism sets up access control data in the nodes that describes a number of security classes. Each security class is associated with a virtual network. Each user on the system is associated with one of the security classes. Each database object to be protected is given an attribute of a security class. Database objects are loaded into the system nodes that match the security class of the database object. When a query executes on the system, the query is sent to a particular class or set of classes such that the query is only seen by those nodes that are authorized by the equivalent security class. In this way, the network is used to isolate data from users that do not have proper authorization to access the data.
    Type: Application
    Filed: May 27, 2008
    Publication date: December 3, 2009
    Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
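
The security-class scheme summarized in the last abstract above (publication number 20090300752) can be illustrated with a small sketch. This is not the method claimed in the application; it is a minimal illustration that assumes each node belongs to exactly one security class and models the virtual network as a simple filter over nodes. The names Node, Cluster, load, and query are hypothetical and do not come from the listing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A compute node tagged with one security class (hypothetical model)."""
    name: str
    security_class: str
    objects: dict = field(default_factory=dict)


class Cluster:
    """Toy model: a 'virtual network' is the set of nodes sharing a class."""

    def __init__(self, nodes, user_classes):
        self.nodes = list(nodes)
        # Assumption: each user is associated with exactly one security class.
        self.user_classes = dict(user_classes)

    def _network(self, security_class):
        # Nodes reachable on the virtual network for this security class.
        return [n for n in self.nodes if n.security_class == security_class]

    def load(self, key, value, security_class):
        """Place an object on a node whose class matches the object's class."""
        targets = self._network(security_class)
        if not targets:
            raise ValueError(f"no node for security class {security_class!r}")
        # Pick the least-loaded matching node (an arbitrary placement choice).
        node = min(targets, key=lambda n: len(n.objects))
        node.objects[key] = value
        return node.name

    def query(self, user, key):
        """Send the query only to nodes in the user's security class."""
        visited = self._network(self.user_classes[user])
        results = [n.objects[key] for n in visited if key in n.objects]
        return [n.name for n in visited], results
```

In this sketch a query from a user in one class never reaches nodes of another class, so objects stored there are not visible to that user, which mirrors the isolation idea in the abstract.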