Patents by Inventor Michael Blocksome

Michael Blocksome has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20110296137
    Abstract: A parallel computer that includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, deterministic reduction operations include: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20110296139
    Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20110289177
    Abstract: Compute nodes of a parallel computer organized for collective operations via a network, each compute node having a receive buffer and establishing a topology for the network; selecting a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing broadcast data length, a length of the broadcast data, including performing a DMA operation with a well-known memory location of the broadcast data length memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node.
    Type: Application
    Filed: May 19, 2010
    Publication date: November 24, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20110288848
    Abstract: Embodiments of the invention provide a method of calculating performance counter data for a computer simulator, while minimizing the performance costs associated with cycle-accurate simulation. A callback may be associated with the instructions of a user program and, when the instructions are executed, the associated callbacks may be executed as well. Upon execution, the callbacks may calculate performance counter data related to the associated instruction.
    Type: Application
    Filed: May 21, 2010
    Publication date: November 24, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20110271263
    Abstract: Compiling software for a hierarchical distributed processing system including providing to one or more compiling nodes software to be compiled, wherein at least a portion of the software to be compiled is to be executed by one or more other nodes; compiling, by the compiling node, the software; maintaining, by the compiling node, any compiled software to be executed on the compiling node; selecting, by the compiling node, one or more nodes in a next tier of the hierarchy of the distributed processing system in dependence upon whether any compiled software is for the selected node or the selected node's descendants; sending to the selected node only the compiled software to be executed by the selected node or selected node's descendant.
    Type: Application
    Filed: April 29, 2010
    Publication date: November 3, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20110271006
    Abstract: Systems, methods and articles of manufacture are disclosed for effecting a desired collective operation on a parallel computing system that includes multiple compute nodes. The compute nodes may pipeline multiple collective operations to effect the desired collective operation. To select protocols suitable for the multiple collective operations, the compute nodes may also perform additional collective operations. The compute nodes may pipeline the multiple collective operations and/or the additional collective operations to effect the desired collective operation more efficiently.
    Type: Application
    Filed: April 29, 2010
    Publication date: November 3, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael Blocksome, Bob R. Cernohous, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20110270942
    Abstract: Systems, methods and articles of manufacture are disclosed for performing a collective operation on a parallel computing system that includes multiple compute nodes and multiple networks connecting the compute nodes. Each of the networks may have different characteristics. A source node may broadcast a DMA descriptor over a first network to a target node, to initialize the collective operation. The target node may perform the collective operation over a second network and using the broadcast DMA descriptor.
    Type: Application
    Filed: April 28, 2010
    Publication date: November 3, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20110258245
    Abstract: A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
    Type: Application
    Filed: April 14, 2010
    Publication date: October 20, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael A. Blocksome, Daniel A. Faraj
  • Publication number: 20110258281
    Abstract: Embodiments of the invention provide a method for querying performance counter data on a massively parallel computing system, while minimizing the costs associated with interrupting computer processors and limited memory resources. DMA descriptors may be inserted into an injection FIFO of a remote compute node in the massively parallel computing system. Upon executing the DMA operations described by the DMA descriptors, performance counter data may be transferred from the remote compute node to a destination node.
    Type: Application
    Filed: April 15, 2010
    Publication date: October 20, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Patent number: 8041969
    Abstract: Methods, apparatus, and products are disclosed for reducing power consumption while performing collective operations on a plurality of compute nodes that include: receiving, by each compute node, instructions to perform a type of collective operation; selecting, by each compute node from a plurality of collective operations for the collective operation type, a particular collective operation in dependence upon power consumption characteristics for each of the plurality of collective operations; and executing, by each compute node, the selected collective operation.
    Type: Grant
    Filed: May 27, 2008
    Date of Patent: October 18, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Michael A. Blocksome, Amanda E. Peters, Joseph D. Ratterman, Brian E. Smith
  • Patent number: 8037213
    Abstract: Methods, apparatus, and products are disclosed for replenishing data descriptors in a Direct Memory Access (‘DMA’) injection first-in-first-out (‘FIFO’) buffer that include: determining, by a messaging module on an origin compute node, whether a number of data descriptors in a DMA injection FIFO buffer exceeds a predetermined threshold, each data descriptor specifying an application message for transmission to a target compute node; queuing, by the messaging module, a plurality of new data descriptors in a pending descriptor queue if the number of the data descriptors in the DMA injection FIFO buffer exceeds the predetermined threshold; establishing, by the messaging module, interrupt criteria that specify when to replenish the injection FIFO buffer with the plurality of new data descriptors in the pending descriptor queue; and injecting, by the messaging module, the plurality of new data descriptors into the injection FIFO buffer in dependence upon the interrupt criteria.
    Type: Grant
    Filed: May 30, 2007
    Date of Patent: October 11, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Michael A. Blocksome, Bob R. Cernohous, Philip Heidelberger, Sameer Kumar, Jeffrey J. Parker
  • Patent number: 8032899
    Abstract: Methods, apparatus, and products are disclosed for providing policy-based operating system services in a hypervisor on a computing system. The computing system includes at least one compute node. The compute node includes an operating system and a hypervisor. The operating system includes a kernel. The hypervisor comprises a kernel proxy and a plurality of operating system services of a service type. Providing policy-based operating system services in a hypervisor on a computing system includes establishing, on the compute node, a kernel policy specifying one of the operating system services of the service type for use by the kernel proxy, and accessing, by the kernel proxy, the specified operating system service. The computing system may also be implemented as a distributed computing system that includes one or more operating system service nodes. One or more of the operating system services may be distributed among the operating system service nodes.
    Type: Grant
    Filed: October 26, 2006
    Date of Patent: October 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Albert Sidelnik, Brian E. Smith
  • Patent number: 8032892
    Abstract: A method for passing messages in a parallel computer system constructed as a plurality of compute nodes interconnected as a network, where each compute node includes a DMA engine but only a limited number of byte counters for tracking the number of bytes sent or received by the DMA engine, and where the byte counters may be used in shared counter or exclusive counter modes of operation. The method includes using a rendezvous protocol in which a source compute node deterministically sends a request to send (RTS) message with a single RTS descriptor to a destination compute node, using an exclusive injection counter to track both the RTS message and the message data to be sent in association with the RTS message, such that the RTS descriptor indicates to the destination compute node that the message data will be adaptively routed to the destination node.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: October 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: Michael Blocksome, Dong Chen, Mark E. Giampapa, Philip Heidelberger, Sameer Kumar, Jeffrey J. Parker
  • Publication number: 20110239003
    Abstract: Direct injection of data to be transferred in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module. Each accelerator includes a Power Processing Element (‘PPE’) and a plurality of Synergistic Processing Elements (‘SPEs’). Direct injection includes reserving, by each SPE, a slot in a shared memory region accessible by the host computer; loading, by each SPE into local memory of the SPE, a portion of data to be transferred to the host computer; executing, by each SPE in parallel, a data processing operation on the portion of the data loaded in local memory of each SPE; and writing, by each SPE, the processed data to the SPE's reserved slot in the shared memory region accessible by the host computer.
    Type: Application
    Filed: March 29, 2010
    Publication date: September 29, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Gary R. Ricard, Brian E. Smith
  • Publication number: 20110238949
    Abstract: Distributed administration of a lock for an operational group of compute nodes in a hierarchical tree structured network including assigning the root node of the operational group to send acknowledgments for lock requests, the root lock administration module comprising a module of automated computing machinery; receiving a lock request assigned to a particular node from a child node; determining whether another request from another child is directly ahead in an acknowledgement queue; if a request from another child is directly ahead in the acknowledgement queue, putting the lock request for the particular node in the acknowledgement queue until the lock request directly ahead in the acknowledgement queue is satisfied and when the lock request ahead in the queue is satisfied, sending the particular node for whom the lock request is assigned a message acknowledging the particular node has the lock; and if a request from another child is not directly ahead in a queue, sending to the particular node for whom the
    Type: Application
    Filed: March 29, 2010
    Publication date: September 29, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20110238950
    Abstract: Performing a scatterv operation on a hierarchical tree network optimized for collective operations including receiving, by the scatterv module installed on the node, from a nearest neighbor parent above the node a chunk of data having at least a portion of data for the node; maintaining, by the scatterv module installed on the node, the portion of the data for the node; determining, by the scatterv module installed on the node, whether any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child; and sending, by the scatterv module installed on the node, those portions of data to the nearest neighbor child if any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child.
    Type: Application
    Filed: March 29, 2010
    Publication date: September 29, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Patent number: 8018951
    Abstract: Methods, systems, and products are disclosed for pacing a data transfer between compute nodes on a parallel computer that include: transferring, by an origin compute node, a chunk of an application message to a target compute node; sending, by the origin compute node, a pacing request to a target direct memory access (‘DMA’) engine on the target compute node using a remote get DMA operation; determining, by the origin compute node, whether a pacing response to the pacing request has been received from the target DMA engine; and transferring, by the origin compute node, a next chunk of the application message if the pacing response to the pacing request has been received from the target DMA engine.
    Type: Grant
    Filed: July 12, 2007
    Date of Patent: September 13, 2011
    Assignee: International Business Machines Corporation
    Inventor: Michael A. Blocksome
  • Publication number: 20110219208
    Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC).
    Type: Application
    Filed: January 10, 2011
    Publication date: September 8, 2011
    Applicant: International Business Machines Corporation
    Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
  • Publication number: 20110173287
    Abstract: Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
    Type: Application
    Filed: September 30, 2008
    Publication date: July 14, 2011
    Inventors: Michael A. Blocksome, Dong Chen, Thomas Gooding, Philip Heidelberger, Jeff Parker
  • Patent number: 7966618
    Abstract: Methods, apparatus, and products are disclosed for controlling data transfers from an origin compute node to a target compute node that include: receiving, by an application messaging module on the target compute node, an indication of a data transfer from an origin compute node to the target compute node; and administering, by the application messaging module on the target compute node, the data transfer using one or more messaging primitives of a system messaging module in dependence upon the indication.
    Type: Grant
    Filed: May 29, 2007
    Date of Patent: June 21, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
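The deterministic reduction described in publication 20110296137 (contributions arrive in any order, but are reduced in a predefined order only after all have arrived) can be illustrated with a minimal single-process sketch. The class name `DeterministicReducer`, the one-slot-per-child receive buffer, and the choice of ascending child index as the "predefined order" are illustrative assumptions, not details taken from the application. The point of fixing the order is that floating-point addition is not associative, so reducing in arrival order could give run-to-run differences.

```python
import operator
from typing import Callable, List, Optional


class DeterministicReducer:
    """Hypothetical model of a root CAU with one receive element per child."""

    def __init__(self, num_children: int, op: Callable[[float, float], float]):
        if num_children < 1:
            raise ValueError("need at least one child processor")
        self.slots: List[Optional[float]] = [None] * num_children  # receive buffer
        self.received = [False] * num_children  # tracks receipt per child
        self.op = op

    def receive(self, child: int, contribution: float) -> Optional[float]:
        """Store a contribution (any arrival order); reduce once all are present."""
        if self.received[child]:
            raise ValueError(f"duplicate contribution from child {child}")
        self.slots[child] = contribution
        self.received[child] = True
        if all(self.received):
            return self._reduce()
        return None  # still waiting for other children

    def _reduce(self) -> float:
        # Assumed predefined order: ascending child index, independent of arrival.
        result = self.slots[0]
        for value in self.slots[1:]:
            result = self.op(result, value)
        return result


if __name__ == "__main__":
    reducer = DeterministicReducer(3, operator.add)
    reducer.receive(2, 0.3)  # arrives first, but is reduced last
    reducer.receive(0, 0.1)
    print(reducer.receive(1, 0.2))
```

Because `_reduce` always walks the slots in index order, any permutation of arrival order yields a bit-identical result.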
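The per-node steps in publication 20110238950 (keep the node's own portion, then forward to each nearest-neighbor child only the portions destined for that child or nodes below it) can be sketched as a recursive walk over a tree. This is a hypothetical, sequential model: the `children` adjacency dict, the function names `subtree` and `scatterv_down_tree`, and the use of a recursive call in place of a network send are all assumptions for illustration. Portions are keyed by destination rank and may differ in length, as in an MPI-style scatterv.

```python
from typing import Dict, List, Set


def subtree(node: int, children: Dict[int, List[int]]) -> Set[int]:
    """All ranks in the subtree rooted at node, including node itself."""
    ranks = {node}
    for child in children.get(node, []):
        ranks |= subtree(child, children)
    return ranks


def scatterv_down_tree(node: int, chunk: Dict[int, bytes],
                       children: Dict[int, List[int]],
                       delivered: Dict[int, bytes]) -> None:
    """Handle a chunk received from the parent: keep own portion, forward the rest.

    chunk maps destination rank -> that rank's variable-length portion.
    delivered collects what each node kept (stands in for local memory).
    """
    delivered[node] = chunk.get(node, b"")  # maintain the node's own portion
    for child in children.get(node, []):
        below = subtree(child, children)
        # Only the portions for this child or its descendants.
        forwarded = {dst: data for dst, data in chunk.items() if dst in below}
        if forwarded:  # send to the child only if something is destined there
            scatterv_down_tree(child, forwarded, children, delivered)


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3, 4]}
    portions = {0: b"a", 1: b"bb", 2: b"ccc", 3: b"", 4: b"dddd"}
    result: Dict[int, bytes] = {}
    scatterv_down_tree(0, portions, tree, result)
    print(result)
```

Recomputing `subtree` at every hop keeps the sketch short; a real implementation would more likely precompute each child's rank range once.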