Patents by Inventor Martin Ohmacht

Martin Ohmacht has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20120210073
    Abstract: An apparatus, method and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.
    Type: Application
    Filed: February 11, 2011
    Publication date: August 16, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Alan Gara, Martin Ohmacht, Vijayalakshmi Srinivasan
  • Publication number: 20120185672
    Abstract: Performing a series of successive synchronizing operations by a core on data shared by a plurality of cores may include a first core indicating an upcoming synchronizing operation on shared data. A second memory layer stores the shared data and tracks the first core's ownership of the shared data. The second memory layer is shared via coherency operations among the first core and one or more second cores. The first core may perform one or more synchronization operations on the shared data without requiring interaction from the second memory layer.
    Type: Application
    Filed: January 18, 2011
    Publication date: July 19, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alan Gara, Martin Ohmacht, Burkhard Steinmacher-Burow, Robert W. Wisniewski
  • Patent number: 8161248
    Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
    Type: Grant
    Filed: November 24, 2010
    Date of Patent: April 17, 2012
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Phillip Heidelberger, Dirk Hoenicke, Martin Ohmacht
  • Patent number: 8140925
    Abstract: An apparatus and method for evaluating a state of an electronic or integrated circuit (IC), each IC including one or more processor elements for controlling operations of IC sub-units, and each the IC supporting multiple frequency clock domains. The method comprises: generating a synchronized set of enable signals in correspondence with one or more IC sub-units for starting operation of one or more IC sub-units according to a determined timing configuration; counting, in response to one signal of the synchronized set of enable signals, a number of main processor IC clock cycles; and, upon attaining a desired clock cycle number, generating a stop signal for each unique frequency clock domain to synchronously stop a functional clock for each respective frequency clock domain; and, upon synchronously stopping all on-chip functional clocks on all frequency clock domains in a deterministic fashion, scanning out data values at a desired IC chip state.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ralph E. Bellofatto, Matthew R. Ellavsky, Alan G. Gara, Mark E. Giampapa, Thomas M. Gooding, Rudolf A. Haring, Lance G. Hehenberger, Martin Ohmacht
  • Patent number: 8122197
    Abstract: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: February 21, 2012
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht
  • Patent number: 8108738
    Abstract: An apparatus and method for providing a data eye monitor. The data eye monitor apparatus utilizes an inverter/latch string circuit and a set of latches to save the data eye for providing an infinite persistent data eye. In operation, incoming read data signals are adjusted in the first stage individually and latched to provide the read data to the requesting unit. The data is also simultaneously fed into a balanced XOR tree to combine the transitions of all incoming read data signals into a single signal. This signal is passed along a delay chain and tapped at constant intervals. The tap points are fed into latches, capturing the transitions at a delay element interval resolution. Using XORs, differences between adjacent taps and therefore transitions are detected. The eye is defined by segments that show no transitions over a series of samples. The eye size and position can be used to readjust the delay of incoming signals and/or to control environment parameters like voltage, clock speed and temperature.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: January 31, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alan G. Gara, James A. Marcella, Martin Ohmacht
  • Patent number: 8103836
    Abstract: A system and method for supporting cache coherency in a computing environment having multiple processing units, each unit having an associated cache memory system operatively coupled therewith. The system includes a plurality of interconnected snoop filter units, each snoop filter unit corresponding to and in communication with a respective processing unit, with each snoop filter unit comprising a plurality of devices for receiving asynchronous snoop requests from respective memory writing sources in the computing environment; and a point-to-point interconnect comprising communication links for directly connecting memory writing sources to corresponding receiving devices; and, a plurality of parallel operating filter devices coupled in one-to-one correspondence with each receiving device for processing snoop requests received thereat and one of forwarding requests or preventing forwarding of requests to its associated processing unit.
    Type: Grant
    Filed: May 23, 2008
    Date of Patent: January 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
  • Patent number: 8103832
    Abstract: Method and apparatus of prefetching streams of varying prefetch depth dynamically changes the depth of prefetching so that the number of multiple streams as well as the hit rate of a single stream are optimized. The method and apparatus in one aspect monitor a plurality of load requests from a processing unit for data in a prefetch buffer, determine an access pattern associated with the plurality of load requests and adjust a prefetch depth according to the access pattern.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: January 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alan Gara, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam, Dirk Hoenicke
  • Patent number: 8103910
    Abstract: A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: January 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Krishnan Sugavanam
  • Publication number: 20110219191
    Abstract: In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
    Type: Application
    Filed: January 18, 2011
    Publication date: September 8, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Daniel Ahn, Luis H. Ceze, Alan Gara, Martin Ohmacht, Zhuang Xiaotong
  • Publication number: 20110219187
    Abstract: In a multiprocessor system, with conflict checking implemented in a directory lookup of a shared cache memory, a reader set encoding permits dynamic recordation of read accesses. The reader set encoding includes an indication of a portion of a line read, for instance by indicating boundaries of read accesses. Different encodings may apply to different types of speculative execution.
    Type: Application
    Filed: January 18, 2011
    Publication date: September 8, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alan Gara, Martin Ohmacht
  • Publication number: 20110219381
    Abstract: A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
    Type: Application
    Filed: January 18, 2011
    Publication date: September 8, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Daniel Ahn, Luis H. Ceze, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht
  • Publication number: 20110219208
    Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC).
    Type: Application
    Filed: January 10, 2011
    Publication date: September 8, 2011
    Applicant: International Business Machines Corporation
    Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
  • Publication number: 20110208894
    Abstract: A multiprocessor system includes nodes. Each node includes a data path that includes a core, a TLB, and a first level cache implementing disambiguation. The system also includes at least one second level cache and a main memory. For thread memory access requests, the core uses an address associated with an instruction format of the core. The first level cache uses an address format related to the size of the main memory plus an offset corresponding to hardware thread meta data. The second level cache uses a physical main memory address plus software thread meta data to store the memory access request. The second level cache accesses the main memory using the physical address with neither the offset nor the thread meta data after resolving speculation. In short, this system includes mapping of a virtual address to a different physical addresses for value disambiguation for different threads.
    Type: Application
    Filed: January 4, 2011
    Publication date: August 25, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alan Gala, Martin Ohmacht
  • Publication number: 20110202731
    Abstract: In a cache memory, energy and other efficiencies can be realized by saving a result of a cache directory lookup for sequential accesses to a same memory address. Where the cache is a point of coherence for speculative execution in a multiprocessor system, with directory lookups serving as the point of conflict detection, such saving becomes particularly advantageous.
    Type: Application
    Filed: January 4, 2011
    Publication date: August 18, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Martin Ohmacht
  • Patent number: 8001401
    Abstract: An apparatus and method for controlling power usage in a computer includes a plurality of computers communicating with a local control device, and a power source supplying power to the local control device and the computer. A plurality of sensors communicate with the computer for ascertaining power usage of the computer, and a system control device communicates with the computer for controlling power usage of the computer.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: August 16, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ralph E. Bellofatto, Paul W. Coteus, Paul G. Crumley, Alan G. Gara, Mark E. Giampapa, Thomas M. Gooding, Rudolf A. Haring, Mark G. Megerian, Martin Ohmacht, Don D. Reed, Richard A. Swetz, Todd Takken
  • Publication number: 20110179229
    Abstract: A system, method and computer program product for performing various store-operate instructions in a parallel computing environment that includes a plurality of processors and at least one cache memory device. A queue in the system receives, from a processor, a store-operate instruction that specifies under which condition a cache coherence operation is to be invoked. A hardware unit in the system runs the received store-operate instruction. The hardware unit evaluates whether a result of the running the received store-operate instruction satisfies the condition. The hardware unit invokes a cache coherence operation on a cache memory address associated with the received store-operate instruction if the result satisfies the condition. Otherwise, the hardware unit does not invoke the cache coherence operation on the cache memory device.
    Type: Application
    Filed: January 7, 2011
    Publication date: July 21, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dong Chen, Philip Heidelberger, Sameer Kumar, Martin Ohmacht, Burkhard Steinmacher-Burow
  • Publication number: 20110173357
    Abstract: A system and method and computer program product for reducing the latency of signals communicated through a crossbar switch, the method including using at slave arbitration logic devices associated with Slave devices for which access is requested from one or more Master devices, two or more priority vector signals cycled among their use every clock cycle for selecting one of the requesting Master devices and updates the respective priority vector signal used every clock cycle. Similarly, each Master for which access is requested from one or more Slave devices, can have two or more priority vectors and can cycle among their use every clock cycle to further reduce latency and increase throughput performance via the crossbar.
    Type: Application
    Filed: January 8, 2010
    Publication date: July 14, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Martin Ohmacht, Krishnan Sugavanam
  • Publication number: 20110173394
    Abstract: A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. The third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushing only the first queue.
    Type: Application
    Filed: January 7, 2011
    Publication date: July 14, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alan Gara, Martin Ohmacht
  • Publication number: 20110173488
    Abstract: A system, method and computer program product for supporting system initiated checkpoints in high performance parallel computing systems and storing of checkpoint data to a non-volatile memory storage device. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity. In one embodiment, the non-volatile memory is a pluggable flash memory card.
    Type: Application
    Filed: January 10, 2011
    Publication date: July 14, 2011
    Applicant: International Business Machines Corporation
    Inventors: Matthias A. Blumrich, Dong Chen, Thomas M. Cipolla, Paul W. Coteus, Alan Gara, Philip Heidelberger, Mark J. Jeanson, Gerard V. Kopcsay, Martin Ohmacht, Todd E. Takken