Patents by Inventor David L. Darrington

David L. Darrington has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Using Accelerators in a Hybrid Architecture for System Checkpointing

Publication number: 20100122199

Abstract: A hybrid node of a High Performance Computing (HPC) cluster uses accelerator nodes for checkpointing to increase overall efficiency of the multi-node computing system. The host node or processor node reads/writes checkpoint data to the accelerators. After offloading the checkpoint data to the accelerators, the host processor can continue processing while the accelerators communicate the checkpoint data with the host or wait for the next checkpoint. The accelerators may also perform dynamic compression and decompression of the checkpoint data to reduce the checkpoint size and reduce network loading. The accelerators may also communicate with other node accelerators to compare checkpoint data to reduce the amount of checkpoint data stored to the host.

Type: Application

Filed: November 13, 2008

Publication date: May 13, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: David L. Darrington, Matthew W. Markland, Philip James Sanders, Richard Michael Shok

Checkpointing A Hybrid Architecture Computing System

Publication number: 20100095100

Abstract: A method, apparatus, and program product checkpoint an application in a parallel computing system of the type that includes a plurality of hybrid nodes. Each hybrid node includes a host element and a plurality of accelerator elements. Each host element may include at least one multithreaded processor, and each accelerator element may include at least one multi-element processor. In a first hybrid node from among the plurality of hybrid nodes, checkpointing the application includes executing at least a portion of the application in the host element and at least one accelerator element and, in response to receiving a command to checkpoint the application, checkpointing the host element separately from the at least one accelerator element.

Type: Application

Filed: October 9, 2008

Publication date: April 15, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: David L. Darrington, Matthew W. Markland, Philip James Sanders, Richard Michael Shok

Checkpointing A Hybrid Architecture Computing System

Publication number: 20100095152

Abstract: A method, apparatus, and program product checkpoint an application in a parallel computing system of the type that includes a plurality of hybrid nodes. Each hybrid node includes a host element and a plurality of accelerator elements. Each host element may include at least one multithreaded processor, and each accelerator element may include at least one multi-element processor. In a first hybrid node from among the plurality of hybrid nodes, checkpointing the application includes executing at least a portion of the application in the host element, configuring and executing at least one computation kernel in at least one accelerator element, and, in response to receiving a command to checkpoint the application, checkpointing the host element separately from the at least one accelerator element upon which the at least one computation kernel is executing.

Type: Application

Filed: October 9, 2008

Publication date: April 15, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: David L. Darrington, Matthew W. Markland, Philip James Sanders, Richard Michael Shok

GLOBAL DETECTION OF RESOURCE LEAKS IN A MULTI-NODE COMPUTER SYSTEM

Publication number: 20100085870

Abstract: A process is disclosed for identifying and recovering from resource leaks on compute nodes of a parallel computing system. A resource monitor stores information about system resources available on a compute node in a clean state. After the compute node runs a job, the resource monitor compares the current resource availability to the clean state. If a resource leak is found, the resource monitor contacts a global resource manager to remove the resource leak.

Type: Application

Filed: October 2, 2008

Publication date: April 8, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Eric L. Barsness, David L. Darrington, Amanda E. Peters, John M. Santosuosso

RESOURCE LEAK RECOVERY IN A MULTI-NODE COMPUTER SYSTEM

Publication number: 20100085871

Abstract: A process is disclosed for identifying and recovering from resource leaks on compute nodes of a parallel computing system. A resource monitor stores information about system resources available on a compute node in a clean state. After the compute node runs a job, the resource monitor compares the current resource availability to the clean state. If a resource leak is found, the resource monitor contacts a global resource manager to remove the resource leak.

Type: Application

Filed: October 2, 2008

Publication date: April 8, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Eric L. Barsness, David L. Darrington, Amanda E. Peters, John M. Santosuosso

UNIVERSAL ANNOTATION CONFIGURATION AND DEPLOYMENT

Publication number: 20100063971

Abstract: Systems and articles of manufacture for managing annotations made for a variety of different type data objects manipulated (e.g., created, edited, and viewed) by a variety of different type applications are provided. Some embodiments allow users collaborating on a project to create, view, and edit annotations from within the applications used to manipulate the annotated data objects, which may facilitate and encourage the capturing and sharing of tacit knowledge through annotations. Further, annotations may be stored separate from the application data they describe, decoupling the tacit knowledge captured in the annotations from the applications used to manipulate the annotated data.

Type: Application

Filed: November 16, 2009

Publication date: March 11, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Brian J. Cragun, David L. Darrington, Lonnie A. McCullough

Low-impact performance sampling within a massively parallel computer

Patent number: 7647484

Abstract: An apparatus, program product and method sample at different times nodes that are performing similar work. Performance data associated with first and second node subsets performing the similar work are sampled at different times, e.g., in a round-robin fashion, and in accordance with a given sampling rate. The performance data is analyzed. Nodes whose performance suffers as a result of a sampling operation may be identified and removed from a subsequent operation.

Type: Grant

Filed: February 23, 2007

Date of Patent: January 12, 2010

Assignee: International Business Machines Corporation

Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda E. Peters, John Matthew Santosuosso

Routing data packets with hint bit for each six orthogonal directions in three dimensional torus computer system set to avoid nodes in problem list

Patent number: 7644254

Abstract: A method and apparatus for dynamically rerouting node processes on the compute nodes of a massively parallel computer system using hint bits to route around failed nodes or congested networks without restarting applications executing on the system. When a node has a failure or there are indications that it may fail, the application software on the system is suspended while the data on the failed node is moved to a backup node. The torus network traffic is routed around the failed node and traffic for the failed node is rerouted to the backup node. The application can then resume operation without restarting from the beginning.

Type: Grant

Filed: April 18, 2007

Date of Patent: January 5, 2010

Assignee: International Business Machines Corporation

Inventors: David L. Darrington, Patrick Joseph McCarthy, Amanda Peters, Albert Sidelnik, Brian Edward Smith, Brent Allen Swartz

Process Migration Based on Service Availability in a Multi-Node Environment

Publication number: 20090320023

Abstract: A process on a highly distributed parallel computing system is disclosed. When a first compute node in a first pool is ready to hand off a task to a second pool for further processing, the first compute node may first determine whether a node is available in the second pool. If no node is available from the second pool, then the first compute node may begin performing a primary task assigned to the second pool of nodes, up to the point where a service available exclusively to the nodes of the second pool is required. In the interim, however, one of the nodes of the second pool may become available. Alternatively, an application program running on a compute node may be configured with an exception handling routine that catches exceptions and migrates the application to a compute node where a necessary service is available, as such exceptions occur.

Type: Application

Filed: June 24, 2008

Publication date: December 24, 2009

Inventors: Eric L. Barsness, David L. Darrington, Amanda E. Peters, John M. Santosuosso

Process Migration Based on Exception Handling in a Multi-Node Environment

Publication number: 20090319662

Abstract: A process on a highly distributed parallel computing system is disclosed. When a first compute node in a first pool is ready to hand off a task to a second pool for further processing, the first compute node may first determine whether a node is available in the second pool. If no node is available from the second pool, then the first compute node may begin performing a primary task assigned to the second pool of nodes, up to the point where a service available exclusively to the nodes of the second pool is required. In the interim, however, one of the nodes of the second pool may become available. Alternatively, an application program running on a compute node may be configured with an exception handling routine that catches exceptions and migrates the application to a compute node where a necessary service is available, as such exceptions occur.

Type: Application

Filed: June 24, 2008

Publication date: December 24, 2009

Inventors: Eric L. Barsness, David L. Darrington, Amanda E. Peters, John M. Santosuosso

Sharing Compiler Optimizations in a Multi-Node System

Publication number: 20090320003

Abstract: Embodiments of the invention enable application programs running across multiple compute nodes of a highly-parallel system to compile source code into native instructions, and subsequently share the optimizations used to compile the source code with other nodes. For example, determining what optimizations to use may consume significant processing power and memory on a node. In cases where multiple nodes exhibit similar characteristics, it is possible that these nodes may use the same set of optimizations when compiling similar pieces of code. Therefore, when one node compiles source code into native instructions, it may share the optimizations used with other similar nodes, thereby removing the burden for the other nodes to figure out which optimizations to use. Thus, while one node may suffer a performance hit for determining the necessary optimizations, other nodes may be saved from this burden by simply using the optimizations provided to them.

Type: Application

Filed: June 24, 2008

Publication date: December 24, 2009

Inventors: Eric L. Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso

Sharing Compiler Optimizations in a Multi-Node System

Publication number: 20090320008

Abstract: Embodiments of the invention enable application programs running across multiple compute nodes of a highly-parallel system to compile source code into native instructions, and subsequently share the optimizations used to compile the source code with other nodes. For example, determining what optimizations to use may consume significant processing power and memory on a node. In cases where multiple nodes exhibit similar characteristics, it is possible that these nodes may use the same set of optimizations when compiling similar pieces of code. Therefore, when one node compiles source code into native instructions, it may share the optimizations used with other similar nodes, thereby removing the burden for the other nodes to figure out which optimizations to use. Thus, while one node may suffer a performance hit for determining the necessary optimizations, other nodes may be saved from this burden by simply using the optimizations provided to them.

Type: Application

Filed: June 24, 2008

Publication date: December 24, 2009

Inventors: Eric L. Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso

Message Flow Control in a Multi-Node Computer System

Publication number: 20090319621

Abstract: Embodiments of the invention provide for controlling message flow across a parallel computer system having multiple compute nodes by selectively grouping compute nodes of such a system into node pools and assigning message flow control policies to nodes in the node pools. The message flow control policies specify logging and/or tracing activities to be performed by instances of applications running on nodes assigned to the node pools. As the application is executed, logging and/or tracing messages are generated on the compute nodes according to message flow control policies assigned to the nodes. Optionally, the message flow is analyzed, the message flow control policies are adjusted, and duplicate messages are eliminated.

Type: Application

Filed: June 24, 2008

Publication date: December 24, 2009

Inventors: Eric L. Barsness, David L. Darrington, Amanda Peters, John M. Santosuosso

Executing An Application On A Parallel Computer

Publication number: 20090313636

Abstract: Methods, apparatus, and products are disclosed for executing an application on a parallel computer that include: executing, by a current compute node, a current task of the application, including producing results; determining, by the current compute node in dependence upon current network characteristics and application characteristics, whether to transfer the results to a next compute node for further processing by a next task on the next compute node or to execute the next task for further processing of the results on the current compute node; transferring, by the current compute node, the results to the next compute node for further processing by the next task on the next compute node if the determination specifies transferring the results to the next node; and executing, by the current compute node, the next task for further processing of the results if the determination specifies executing the next task on the current compute node.

Type: Application

Filed: June 16, 2008

Publication date: December 17, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Eric L. Barsness, Michael B. Brutman, David L. Darrington, Amanda E. Peters, John M. Santosuosso

MANAGEMENT OF PERSISTENT MEMORY IN A MULTI-NODE COMPUTER SYSTEM

Publication number: 20090313452

Abstract: A method and apparatus creates and manages persistent memory (PM) in a multi-node computing system. A PM Manager in the service node creates and manages pools of nodes with various sizes of PM. A node manager uses the pools of nodes to load applications to the nodes according to the size of the available PM. The PM Manager can dynamically adjust the size of the PM according to the needs of the applications based on historical use or as determined by a system administrator. The PM Manager works with an operating system kernel on the nodes to provide persistent memory for application data and system metadata. The PM Manager uses the persistent memory to load applications to preserve data from one application to the next. Also, the data preserved in persistent memory may be system metadata such as file system data that will be available to subsequent applications.

Type: Application

Filed: October 29, 2007

Publication date: December 17, 2009

Inventors: Eric Lawrence Barsness, David L. Darrington, Patrick Joseph McCarthy, Amanda Peters, John Matthew Santosuosso

Database Journaling in a Multi-Node Environment

Publication number: 20090307290

Abstract: A database spread over multiple nodes allows each node to store a journal recording changes made to the database and also allows a journaling component to manage the memory space available for journaling. Two threshold size values may be specified for the journal. The first threshold value specifies a journal size at which to begin pruning the journal on a given node. A journal pruning algorithm may be used to identify journal entries that may be removed. For example, once a given transaction completes (i.e., commits), the journal entries related to that transaction may be pruned from the journal. The second threshold value specifies the maximum size of the journal. After reaching this size, journal entries may be written to disk instead of the in-memory journal.

Type: Application

Filed: June 10, 2008

Publication date: December 10, 2009

Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso

Resource Sharing Techniques in a Parallel Processing Computing System

Publication number: 20090307466

Abstract: A method, apparatus, and program product share a resource in a computing system that includes a plurality of computing cores. A request from a second execution context (“EC”) to lock the resource currently locked by a first EC on a first core causes replication of the second EC as a third EC on a third core. The first and third ECs are executed substantially concurrently. When the first EC modifies the resource, the third EC is restarted after the resource has been modified. Alternately, a first EC is configured in a first core and shadowed as a second EC in a second core. In response to a blocked lock request, the first EC is halted and the second EC continues. After granting a lock, it is determined whether a conflict has occurred and the first and second EC are particularly synchronized to each other in response to that determination.

Type: Application

Filed: June 10, 2008

Publication date: December 10, 2009

Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso

Database Journaling in a Multi-Node Environment

Publication number: 20090307287

Abstract: A database spread over multiple nodes allows each node to store a journal recording changes made to the database and also allows a journaling component to manage the memory space available for journaling. Two threshold size values may be specified for the journal. The first threshold value specifies a journal size at which to begin pruning the journal on a given node. A journal pruning algorithm may be used to identify journal entries that may be removed. For example, once a given transaction completes (i.e., commits), the journal entries related to that transaction may be pruned from the journal. The second threshold value specifies the maximum size of the journal. After reaching this size, journal entries may be written to disk instead of the in-memory journal.

Type: Application

Filed: June 10, 2008

Publication date: December 10, 2009

Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso

Fault recovery on a massively parallel computer system to handle node failures without ending an executing job

Patent number: 7631169

Abstract: A method and apparatus for fault recovery on a parallel computer system from a soft failure without ending an executing job on a partition of nodes. In preferred embodiments, a failed hardware recovery mechanism on a service node uses a heartbeat monitor to determine when a node failure occurs. Where possible, the failed node is reset and re-loaded with software without ending the software job being executed by the partition containing the failed node.

Type: Grant

Filed: February 2, 2007

Date of Patent: December 8, 2009

Assignee: International Business Machines Corporation

Inventors: David L. Darrington, Patrick Joseph McCarthy, Amanda Peters, Albert Sidelnik

UTILIZING VIRTUAL PRIVATE NETWORKS TO PROVIDE OBJECT LEVEL SECURITY ON A MULTI-NODE COMPUTER SYSTEM

Publication number: 20090300752

Abstract: The disclosure herein provides data security on a parallel computer system using virtual private networks connecting the nodes of the system. A mechanism sets up access control data in the nodes that describes a number of security classes. Each security class is associated with a virtual network. Each user on the system is associated with one of the security classes. Each database object to be protected is given an attribute of a security class. Database objects are loaded into the system nodes that match the security class of the database object. When a query executes on the system, the query is sent to a particular class or set of classes such that the query is only seen by those nodes that are authorized by the equivalent security class. In this way, the network is used to isolate data from users that do not have proper authorization to access the data.

Type: Application

Filed: May 27, 2008

Publication date: December 3, 2009

Inventors: Eric Lawrence Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
