For A Parallel Or Multiprocessor System Patents (Class 717/149)

Loop compiling (Class 717/150)

Thread debugging device, thread debugging method and information storage medium

Patent number: 8136097

Abstract: A thread debugging device is provided that can debug reliably when at least one thread among a plurality of threads executed in association with each other is being debugged. In this device, a target computer (20) executes at least some processing of at least one target thread to be debugged among the plurality of threads. During execution of the at least one target thread, it also executes the non-target threads (the threads other than the at least one target thread) while restricting their access to at least some hardware resources of the computer (20).

Type: Grant

Filed: October 25, 2006

Date of Patent: March 13, 2012

Assignee: Sony Computer Entertainment Inc.

Inventors: Yousuke Konishi, Shinichiro Mikami, Makoto Ishii, Yasuyuki Kinoshita, Atsuhiko Fujimoto, Masayuki Takahashi
Systems and methods for determining compute kernels for an application in a parallel-processing computer system

Patent number: 8136104

Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.

Type: Grant

Filed: March 5, 2007

Date of Patent: March 13, 2012

Assignee: Google Inc.

Inventors: Matthew N. Papakipos, Brian K. Grant, Morgan S. McGuire, Christopher G. Demetriou
Software pipelining using one or more vector registers

Patent number: 8136107

Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.
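
To make the slot rotation concrete, here is a minimal Python model. It is not IBM's implementation: it assumes a hypothetical three-slot register class `RotatingVectorRegister` and a `rotate()` step standing in for the rotate instruction. A value defined in stage 1 of iteration i is consumed two iterations later without any register copy, because each rotation ages every slot by one position.

```python
from collections import deque


class RotatingVectorRegister:
    """Toy model: each slot holds the value a variable had in one pipeline stage."""

    def __init__(self, num_slots):
        # Slot 0 holds the newest value; higher slots hold values from older iterations.
        self.slots = deque([None] * num_slots, maxlen=num_slots)

    def write_newest(self, value):
        # The defining stage writes its value of the variable into slot 0.
        self.slots[0] = value

    def read_stage(self, age):
        # A later stage reads the value produced `age` iterations ago.
        return self.slots[age]

    def rotate(self):
        # Models the rotate instruction: every value moves one slot older,
        # the oldest value falls off, and slot 0 is freed for the next write.
        self.slots.appendleft(None)


reg = RotatingVectorRegister(num_slots=3)
trace = []
for i in range(5):
    reg.write_newest(i * 10)              # stage 1 of iteration i defines x
    if i >= 2:
        trace.append(reg.read_stage(2))   # stage 3 uses x from iteration i - 2
    reg.rotate()
```

After the loop, `trace` holds the values defined in iterations 0, 1 and 2, each read exactly two iterations after it was written.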

Type: Grant

Filed: October 24, 2007

Date of Patent: March 13, 2012

Assignee: International Business Machines Corporation

Inventor: Ayal Zaks
Method to exploit superword-level parallelism using semi-isomorphic packing

Patent number: 8136105

Abstract: A computer program product is provided for extracting SIMD parallelism. The computer program product includes instructions for: providing a stream of input code comprising basic blocks; identifying pairs of statements within a basic block that are semi-isomorphic with respect to each other; iteratively combining such semi-isomorphic pairs of statements into packs, and combining packs into combined packs; collecting packs whose statements can be scheduled together for processing; and generating SIMD instructions for each pack, thereby extracting the SIMD parallelism.

Type: Grant

Filed: September 29, 2006

Date of Patent: March 13, 2012

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
Enabling graphical notation for parallel programming

Patent number: 8127283

Abstract: In one embodiment, the present invention includes a method for developing a parallel program by specifying graphical representations for input data objects into a parallel computation code segment, specifying graphical representations for parallel program schemes, each including at least one graphical representation of an operator to perform an operation on a data object, determining whether any of the parallel program schemes include at least one alternative computation, and unrolling the corresponding parallel program schemes and generating alternative parallel program scheme fragments therefrom. Other embodiments are described and claimed.

Type: Grant

Filed: September 5, 2007

Date of Patent: February 28, 2012

Assignee: Intel Corporation

Inventors: Yuriy E. Sheynin, Alexey Y. Syschikov
Sharing compiler optimizations in a multi-node system

Patent number: 8122441

Abstract: Embodiments of the invention enable application programs running across multiple compute nodes of a highly-parallel system to compile source code into native instructions, and subsequently share the optimizations used to compile the source code with other nodes. For example, determining what optimizations to use may consume significant processing power and memory on a node. In cases where multiple nodes exhibit similar characteristics, it is possible that these nodes may use the same set of optimizations when compiling similar pieces of code. Therefore, when one node compiles source code into native instructions, it may share the optimizations used with other similar nodes, thereby removing the burden for the other nodes to figure out which optimizations to use. Thus, while one node may suffer a performance hit for determining the necessary optimizations, other nodes may be saved from this burden by simply using the optimizations provided to them.
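
A minimal sketch of the sharing idea, assuming a hypothetical `OptimizationRegistry` that every node can reach, and assuming that nodes with identical traits compiling identical source may safely reuse one result. `_search_optimizations` and its flag list are placeholders for the expensive analysis, not the patented method.

```python
import hashlib


class OptimizationRegistry:
    """Hypothetical shared store of optimization choices, keyed by node traits and code."""

    def __init__(self):
        self._shared = {}
        self.searches = 0  # how many times the expensive search actually ran

    @staticmethod
    def _key(node_traits, source):
        # Nodes with identical traits compiling identical source share one entry.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        return (tuple(sorted(node_traits.items())), digest)

    def _search_optimizations(self, source):
        # Placeholder for the costly analysis that decides which optimizations to use.
        self.searches += 1
        flags = ["-O2"]
        if "for" in source:
            flags.append("-funroll-loops")
        return flags

    def optimizations_for(self, node_traits, source):
        key = self._key(node_traits, source)
        if key not in self._shared:
            # The first node pays for the search; later similar nodes reuse it.
            self._shared[key] = self._search_optimizations(source)
        return list(self._shared[key])  # copy so callers cannot alter the shared entry
```

Two nodes with the same traits compiling the same loop trigger only one search; a node with different traits gets its own entry.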

Type: Grant

Filed: June 24, 2008

Date of Patent: February 21, 2012

Assignee: International Business Machines Corporation

Inventors: Eric L. Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
Method and system for array optimization

Patent number: 8122442

Abstract: A method for transforming access to a structure array, that includes compiling source code, wherein compiling the source code includes identifying the structure array in the source code, performing an object safety analysis to determine whether the structure array is safe for transformation, wherein the object safety analysis includes an inter-procedural alias class analysis, performing a profitability analysis on the structure array when the structure array is safe for transformation, wherein the profitability analysis includes selecting a transformation from a plurality of transformations, wherein the plurality of transformations includes a pointer based fully splitting transformation, a pointer based partially splitting transformation, and an address based fully splitting transformation, and performing the selected transformation on the structure array, and storing the compiled code.
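
The core of such transformations is turning an array of structures into separate per-field arrays. This Python sketch shows only that data-layout change in its simplest "fully split" form, using a hypothetical `Particle` record. It does not model the safety or profitability analyses or the distinction between pointer-based and address-based splitting, and Python lists do not reproduce the cache benefits a compiler is after.

```python
from dataclasses import dataclass, fields


@dataclass
class Particle:
    x: float
    y: float
    mass: float


def split_structure_array(records, record_type):
    """Fully split an array of structures into one array per field (AoS -> SoA)."""
    return {f.name: [getattr(r, f.name) for r in records] for f in fields(record_type)}


def total_mass_aos(particles):
    # Original access pattern: strides over whole records to reach one field.
    return sum(p.mass for p in particles)


def total_mass_soa(split):
    # Transformed access pattern: walks a single field array.
    return sum(split["mass"])
```

Both access functions compute the same result; after splitting, a loop that touches only `mass` no longer has to step over the `x` and `y` fields.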

Type: Grant

Filed: January 31, 2008

Date of Patent: February 21, 2012

Assignee: Oracle America, Inc.

Inventor: Jin Lin
Pairing of spills for parallel registers

Patent number: 8108847

Abstract: A system can include an analyzer module configured to analyze spill code generated by a register allocator to determine that register spill instructions can be paired, wherein paired register spill instructions relate to corresponding register locations in each of a first register set and a second register set and that no instructions between said register spill instructions modify any of said register spill instructions; a rewriter module configured to, based on the determining, modify said register spill instructions as a parallel register spill instruction; and a storage module configured to configure storage of associated register spills in memory so said register spills can be loaded back in parallel into corresponding registers of said first and second register sets based on said modified parallel register spill instruction, wherein the configuration of storage includes allocation of space on a memory stack such that the register spills are double word aligned.
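
The pairing test and the aligned stack layout can be sketched over a toy instruction list. Everything here is an assumption for illustration: the tuple encoding, the register-set names `"A"` and `"B"`, and a 4-byte word, so that a pair of adjacent words starts on an 8-byte (double-word) boundary.

```python
def pair_spills(code):
    """Find spill pairs (set "A", reg i) / (set "B", reg i) with no intervening redefinition.

    `code` is a list of tuples ("spill", reg_set, index) or ("def", reg_set, index).
    Returns (position_a, position_b) pairs of spill instructions that can be merged.
    """
    pairs, used = [], set()
    for i, ins in enumerate(code):
        if ins[0] != "spill" or i in used:
            continue
        _, reg_set, idx = ins
        partner_set = "B" if reg_set == "A" else "A"
        for j in range(i + 1, len(code)):
            op, s, k = code[j]
            if op == "def" and k == idx and s in (reg_set, partner_set):
                break  # a write to either register between the spills blocks pairing
            if op == "spill" and s == partner_set and k == idx and j not in used:
                pairs.append((i, j))
                used.update((i, j))
                break
    return pairs


def allocate_pair_slots(num_pairs, word_size=4, base=0):
    """Give each pair two adjacent words starting on a double-word boundary."""
    dword = 2 * word_size
    start = (base + dword - 1) // dword * dword  # round base up to a double-word boundary
    return [start + n * dword for n in range(num_pairs)]
```

Each returned offset is where a parallel store (and later a parallel reload) of one register pair could be placed.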

Type: Grant

Filed: June 4, 2008

Date of Patent: January 31, 2012

Assignee: International Business Machines Corporation

Inventor: Christopher Lapkowski
Parallel programming computing system to dynamically allocate program portions

Patent number: 8108845

Abstract: A computing system receives a program created by a technical computing environment, analyzes the program, generates multiple program portions based on the analysis of the program, dynamically allocates the multiple program portions to multiple software units of execution for parallel programming, receives multiple results associated with the multiple program portions from the multiple software units of execution, and provides the multiple results or a single result to the program.

Type: Grant

Filed: May 15, 2007

Date of Patent: January 31, 2012

Assignee: The Mathworks, Inc.

Inventors: John N. Little, Joseph F. Hicklin, Jocelyn Luke Martin, Nausheen B. Moulana, Halldor N. Stefansson, Loren Dean, Roy E. Lurie, Stephen C. Johnson, Penelope L. Anderson, Michael E. Karr, Jason A. Kinchen
Compiling scalar code for a single instruction multiple data (SIMD) execution engine

Patent number: 8108846

Abstract: A mechanism is provided for performing scalar operations using a SIMD data parallel execution unit. With the mechanisms of the illustrative embodiments, scalar operations in application code are identified that may be executed using vector operations in a SIMD data parallel execution unit. The scalar operations are converted, such as by a static or dynamic compiler, into one or more vector load instructions and one or more vector computation instructions. In addition, control words may be generated to adjust the alignment of the scalar values for the scalar operation within the vector registers to which these scalar values are loaded using the vector load instructions. The alignment amounts for adjusting the scalar values within the vector registers may be statically or dynamically determined.
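
A minimal model of the alignment problem, assuming 16-byte vector registers, 32-bit scalars, and a vector load that ignores the low address bits (as aligned loads do on many SIMD instruction sets). Each scalar is rotated into lane 0 before a full vector add; the helper names are hypothetical.

```python
VECTOR_BYTES = 16  # assumed 128-bit vector registers
LANE_BYTES = 4     # assumed 32-bit scalar elements
LANES = VECTOR_BYTES // LANE_BYTES


def vector_load(memory, addr):
    """Aligned vector load: low address bits are ignored, so the whole 16-byte
    block containing `addr` is loaded. `memory` is a list of 32-bit words."""
    base_word = (addr - addr % VECTOR_BYTES) // LANE_BYTES
    return memory[base_word:base_word + LANES]


def alignment_shift(addr):
    """Lanes to rotate left so the scalar at `addr` lands in lane 0."""
    return (addr % VECTOR_BYTES) // LANE_BYTES


def rotate_left(vec, lanes):
    return vec[lanes:] + vec[:lanes]


def scalar_add_via_simd(memory, addr_a, addr_b):
    # Each shift plays the role of an alignment control word: a constant when the
    # address is known at compile time, computed at run time otherwise.
    va = rotate_left(vector_load(memory, addr_a), alignment_shift(addr_a))
    vb = rotate_left(vector_load(memory, addr_b), alignment_shift(addr_b))
    vsum = [x + y for x, y in zip(va, vb)]  # one vector add over all lanes
    return vsum[0]  # only lane 0 carries the scalar result
```

The other lanes of `vsum` hold meaningless sums; the design keeps the scalar result in a single agreed lane so later vector operations can consume it.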

Type: Grant

Filed: May 28, 2008

Date of Patent: January 31, 2012

Assignee: International Business Machines Corporation

Inventor: Michael K. Gschwind
System and method for adaptive run-time reconfiguration for a reconfigurable instruction set co-processor architecture

Patent number: 8108838

Abstract: A method for adaptive runtime reconfiguration of a co-processor instruction set, in a computer system with at least a main processor communicatively connected to at least one reconfigurable co-processor, includes the steps of configuring the co-processor to implement an instruction set comprising one or more co-processor instructions, issuing a co-processor instruction to the co-processor, and determining whether the instruction is implemented in the co-processor. For an instruction not implemented in the co-processor instruction set, the method raises a stall signal to delay the main processor and determines whether there is enough space in the co-processor for the non-implemented instruction; if there is enough space, it reconfigures the instruction set of the co-processor by adding the non-implemented instruction. The stall signal is then cleared and the instruction is executed.
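
The issue path described here can be modeled in a few lines of Python. `ReconfigurableCoprocessor`, its slot `capacity`, and the Python callables standing in for hardware logic are all hypothetical. The abstract does not say what happens when there is no space, so raising an error to request a software fallback is an assumption.

```python
class ReconfigurableCoprocessor:
    """Toy model: a fixed number of slots holding the currently implemented instructions."""

    def __init__(self, capacity, initial_set, implementations):
        self.capacity = capacity                # assumed: instructions that fit at once
        self.instruction_set = set(initial_set)
        self.implementations = implementations  # name -> callable standing in for logic
        self.stall = False
        self.stall_count = 0

    def issue(self, name, *operands):
        if name not in self.instruction_set:
            # Not implemented: raise the stall signal to delay the main processor.
            self.stall = True
            self.stall_count += 1
            if len(self.instruction_set) >= self.capacity:
                self.stall = False
                # Assumption: with no free space, hand the instruction back to software.
                raise RuntimeError(f"no space to add {name!r}")
            self.instruction_set.add(name)      # reconfigure: add the missing instruction
            self.stall = False                  # clear the stall signal
        return self.implementations[name](*operands)
```

A second issue of the same instruction finds it already implemented and runs without stalling.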

Type: Grant

Filed: May 15, 2008

Date of Patent: January 31, 2012

Assignee: International Business Machines Corporation

Inventors: Sameh W. Asaad, Richard Gerard Hofmann
Systems and methods for dynamically choosing a processing element for a compute kernel

Patent number: 8108844

Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.

Type: Grant

Filed: March 5, 2007

Date of Patent: January 31, 2012

Assignee: Google Inc.

Inventors: William Y. Crutchfield, Brian K. Grant, Matthew N. Papakipos
METHOD AND A COMPUTER PROGRAM PRODUCT FOR AUTOMATICALLY GENERATING A USER INTERFACE FOR AN APPLICATION PROGRAM

Publication number: 20120023479

Abstract: A method and a computer program product include the steps of receiving, into a computing machine, a business logic (BL) source code for an application program, the BL source code comprising at least a class, an object and a method or function. Steps transform, in the computing machine, the BL source code into an executable BL. Steps acquire schema information from the BL source code. Steps generate, in the computing machine, a user interface (UI) source code for a form using the schema information for the class. Steps transform, in the computing machine, the UI source code into at least one executable UI.

Type: Application

Filed: July 21, 2010

Publication date: January 26, 2012

Inventor: Gregory S. Moress
Mechanism to restrict parallelization of loops

Patent number: 8104030

Abstract: A computer implemented method, computer usable program code, and a system for parallelizing a loop. A parameter that will be used to limit parallelization of the loop is identified. The parameter specifies a minimum number of loop iterations that a thread should execute, and it can be adjusted based on a parallel performance factor, that is, a factor that influences the performance of parallel code. Based on the parameter, a number of threads is selected from a plurality of threads to process iterations of the loop. The number of threads is selected before the first iteration of the loop executes.
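
The thread-count decision reduces to simple arithmetic. This sketch assumes the performance factor scales the minimum-iterations parameter as a multiplier; `threads_for_loop` and `perf_factor` are hypothetical names, not the patent's or any runtime's API.

```python
def threads_for_loop(iterations, max_threads, min_iters_per_thread, perf_factor=1.0):
    """Choose a thread count before the loop starts.

    min_iters_per_thread: minimum number of iterations each thread should execute.
    perf_factor: hypothetical multiplier for the parallel performance factor
                 (e.g. > 1 when per-iteration work is small or threads are costly).
    """
    threshold = max(1, round(min_iters_per_thread * perf_factor))
    # Never use more threads than available, and always keep at least one.
    return max(1, min(max_threads, iterations // threshold))
```

For example, 1000 iterations with a minimum of 100 per thread and 8 available threads yield 8 threads, while 250 iterations yield only 2; in an OpenMP-style runtime the result would play the role of a `num_threads` value.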

Type: Grant

Filed: December 21, 2005

Date of Patent: January 24, 2012

Assignee: International Business Machines Corporation

Inventors: Raul Esteban Silvera, Priya Unnikrishnan, Guansong Zhang
Method for JIT compiler to optimize repetitive synchronization

Patent number: 8104028

Abstract: Repetitive synchronization in program code is optimized through lock coarsening that is performed subject to a number of constraints. Using a forward pass over the program code followed by a backward pass, region extent bits may be determined that identify the points in the program where object locking can be coarsened. The program code may then be modified to realize coarsened locking regions determined based on the region extent bits. Alternatively, previously determined value numbers may provide much of the information collected by the two passes. In such a case, a single pass over the program code may locate features that limit lock coarsening opportunities. A set of synchronization operations that can be removed may then be determined and used when modifying the program code to coarsen locking regions.
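
A simplified, single-pass version of lock coarsening over a straight-line sequence of operations, such as repeated `synchronized` blocks on the same object in Java. This is a peephole sketch, not the forward/backward region-extent-bit analysis the abstract describes: an `unlock(o)` followed by `lock(o)` is removed when only operations listed in `movable` (an assumed constraint) sit between them.

```python
def coarsen_locks(ops, movable=("work",)):
    """Merge lock regions: drop unlock(o) ... lock(o) when only movable ops sit between.

    ops: list of tuples such as ("lock", "o"), ("unlock", "o"), ("work", "a"),
    ("call", "io"). Anything not in `movable` (calls, other sync ops) limits coarsening.
    """
    result = list(ops)
    i = 0
    while i < len(result):
        kind, target = result[i]
        if kind == "unlock":
            j = i + 1
            while j < len(result) and result[j][0] in movable:
                j += 1
            if j < len(result) and result[j] == ("lock", target):
                del result[j]  # remove the re-acquire first so index i stays valid
                del result[i]  # the movable ops now run inside one coarsened region
                continue
        i += 1
    return result
```

The `call` in the test below acts as a barrier, so the last region is left separate.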

Type: Grant

Filed: March 31, 2009

Date of Patent: January 24, 2012

Assignee: International Business Machines Corporation

Inventors: Mark Graham Stoodley, Vijay Sundaresan
INDEXABLE TYPE TRANSFORMATIONS

Publication number: 20120005662

Abstract: A high level programming language provides an extensible set of transformations for use on indexable types in a data parallel processing environment. A compiler for the language implements each transformation as a map from indexable types to allow each transformation to be applied to other transformations. At compile time, the compiler identifies sequences of the transformations on each indexable type in data parallel source code and generates data parallel executable code to implement the sequences as a combined operation at runtime using the transformation maps. The compiler also incorporates optimizations that are based on the sequences of transformations into the data parallel executable code.
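
One way to read "each transformation as a map" is as a function from an index of the view to an index of its source, so a chain of transformations fuses into one map that is applied once per element. The Python sketch below assumes 2-D integer indices and two hypothetical transformations, `transpose` and `section`; it is not the Microsoft compiler's representation.

```python
from typing import Callable, List, Tuple

Index = Tuple[int, int]
IndexMap = Callable[[Index], Index]  # maps an index of a view to an index of its source


def transpose() -> IndexMap:
    return lambda idx: (idx[1], idx[0])


def section(row_offset: int, col_offset: int) -> IndexMap:
    return lambda idx: (idx[0] + row_offset, idx[1] + col_offset)


def compose(*maps: IndexMap) -> IndexMap:
    """Fuse a chain of views (outermost first) into a single index map."""
    def fused(idx: Index) -> Index:
        for m in maps:
            idx = m(idx)
        return idx
    return fused


def read(data: List[List[int]], view: IndexMap, idx: Index) -> int:
    r, c = view(idx)
    return data[r][c]
```

With `data[r][c] = 10*r + c`, reading index `(0, 1)` of `compose(transpose(), section(1, 2))`, the transpose of the section starting at `(1, 2)`, resolves to `data[2][2]`, that is, 22.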

Type: Application

Filed: June 30, 2010

Publication date: January 5, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Paul F. Ringseth, Weirong Zhu, Rick Molloy, Charles D. Callahan, II, Yosseff Levanoni, Lingli Zhang
Dynamically partitioning processing across a plurality of heterogeneous processors

Patent number: 8091078

Abstract: A program is compiled into at least two object files: one object file for each of the supported processor environments. During compilation, code characteristics, such as data locality, computational intensity, and data parallelism, are analyzed and recorded in the object file. During run time, the code characteristics are combined with runtime considerations, such as the current load on the processors and the size of the data being processed, to arrive at an overall value. The overall value is then used to determine which of the processors will be assigned the task. The values are assigned based on the characteristics of the various processors. For example, if one processor is better at handling intensive computations against large streams of data, programs that are highly computationally intensive and process large quantities of data are weighted in favor of that processor. The corresponding object is then loaded and executed on the assigned processor.
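
The selection step can be sketched as a weighted score that mixes compile-time code traits with run-time load and data size. The trait names, per-processor weights, the 1,000,000-element saturation point for data size, and the multiplicative load penalty are all hypothetical choices for illustration.

```python
def choose_processor(code_traits, data_size, processors):
    """Pick the processor with the best combined compile-time and run-time score.

    code_traits: compile-time values in [0, 1], e.g. {"compute_intensity": 0.9}
    processors:  name -> {"weights": {...}, "load": 0..1, "large_data_bonus": float}
    """
    def score(p):
        # Affinity: how well this processor's strengths match the recorded traits.
        affinity = sum(p["weights"].get(k, 0.0) * v for k, v in code_traits.items())
        # Larger inputs favor processors suited to streaming large data.
        size_term = p["large_data_bonus"] * min(data_size / 1_000_000, 1.0)
        return (affinity + size_term) * (1.0 - p["load"])  # busy processors score lower
    return max(processors, key=lambda name: score(processors[name]))
```

A compute-heavy kernel on a large input goes to the accelerator while it is lightly loaded, but falls back to the general-purpose core once the accelerator is nearly saturated.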

Type: Grant

Filed: May 7, 2008

Date of Patent: January 3, 2012

Assignee: International Business Machines Corporation

Inventors: Daniel Alan Brokenshire, Harm Peter Hofstee, Barry L Minor, Mark Richard Nutter
BINDING DATA PARALLEL DEVICE SOURCE CODE

Publication number: 20110314458

Abstract: A compile environment is provided in a computer system that allows programmers to program both CPUs and data parallel devices (e.g., GPUs) using a high level general purpose programming language that has data parallel (DP) extensions. A compilation process translates modular DP code written in the general purpose language into DP device source code in a high level DP device programming language using a set of binding descriptors for the DP device source code. A binder generates a single, self-contained DP device source code unit from the set of binding descriptors. A DP device compiler generates a DP device executable for execution on one or more data parallel devices from the DP device source code unit.

Type: Application

Filed: June 22, 2010

Publication date: December 22, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Weirong Zhu, Lingli Zhang, Sukhdeep S. Sodhi, Yosseff Levanoni
Task dispatch monitoring for dynamic adaptation to system conditions

Patent number: 8082545

Abstract: Monitoring system wide task dispatch allows dynamic adaptation to conditions of a system. A monitor tracks the total tasks currently dispatched to the system. In a system with multiple processing units, this monitor is centralized and collects information about tasks dispatched to each of the processing units. The monitor compares the total dispatched tasks against a threshold that has already been defined. Further dispatching of tasks to the system is regulated based on comparison of the total dispatched tasks against the threshold. If the comparison achieves a trigger condition (e.g., total dispatched tasks exceeding the threshold), then task dispatch is throttled. Throttling further task dispatching, as long as the threshold is exceeded, allows progress to continue without overwhelming the system.
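
A minimal throttling monitor, assuming tasks are dispatched by threads in one process and that the trigger condition is "dispatching one more task would exceed the threshold". `DispatchMonitor` is a hypothetical name; the sketch relies only on the standard `threading.Condition.wait_for`.

```python
import threading


class DispatchMonitor:
    """Centralized count of tasks currently dispatched across all processing units."""

    def __init__(self, threshold):
        self.threshold = threshold  # predefined limit on outstanding tasks
        self._outstanding = 0
        self._cond = threading.Condition()

    def dispatch(self, timeout=None):
        """Throttle while at the threshold; return True once the task is counted."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._outstanding < self.threshold, timeout):
                return False  # still throttled when the timeout expired
            self._outstanding += 1
            return True

    def complete(self):
        with self._cond:
            self._outstanding -= 1
            self._cond.notify_all()  # wake dispatchers waiting for room

    @property
    def outstanding(self):
        with self._cond:
            return self._outstanding
```

Because dispatchers only wait while the limit is reached, work keeps flowing as tasks complete instead of piling up on the system.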

Type: Grant

Filed: September 9, 2005

Date of Patent: December 20, 2011

Assignee: Oracle America, Inc.

Inventor: Raj Prakash
Computer program, multiprocessor system, and grouping method

Patent number: 8074211

Abstract: According to one embodiment, a grouping method for process units, each including basic modules and data, the process units being assigned to processors in a program for a multiprocessor system, the program including the basic modules and a parallel statement describing relationships between parallel processes for the basic modules, the method includes displaying a dataflow graph visually showing a process status of each process unit based on the parallel statement, and specifying a candidate for a connection of process units on the dataflow graph, wherein the dataflow graph displays data entries, nodes in the basic modules, and edges connecting the data entries and the nodes.

Type: Grant

Filed: September 30, 2009

Date of Patent: December 6, 2011

Assignee: Kabushiki Kaisha Toshiba

Inventor: Ryuji Sakai
Method for message processing on a programmable logic device

Patent number: 8065130

Abstract: Programmable architecture for implementing a message processing system using an integrated circuit is described. In an example, configurable logic of an integrated circuit is configured to have a plurality of thread circuits and a memory. Messages are received to the integrated circuit for storage in the memory. The memory is accessed with the plurality of threads to concurrently process a plurality of the messages.

Type: Grant

Filed: May 13, 2009

Date of Patent: November 22, 2011

Assignee: Xilinx, Inc.

Inventors: Gordon J. Brebner, Philip B. James-Roxby, Eric R. Keller, Chidamber R. Kulkarni
Compiling Software For A Hierarchical Distributed Processing System

Publication number: 20110271263

Abstract: Compiling software for a hierarchical distributed processing system, including: providing software to be compiled to one or more compiling nodes, wherein at least a portion of the software is to be executed by one or more other nodes; compiling, by the compiling node, the software; maintaining, by the compiling node, any compiled software to be executed on the compiling node; selecting, by the compiling node, one or more nodes in the next tier of the hierarchy of the distributed processing system depending on whether any compiled software is for the selected node or the selected node's descendants; and sending to the selected node only the compiled software to be executed by the selected node or the selected node's descendants.
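
The forwarding rule can be sketched on a small tree. This assumes a single compiling node at the root, a `{parent: [children]}` dictionary for the hierarchy, and compiled binaries keyed by the node that will run them; `distribute` and `subtree` are hypothetical helpers.

```python
def subtree(tree, node):
    """All nodes at or below `node` in a {parent: [children]} tree."""
    nodes, stack = set(), [node]
    while stack:
        n = stack.pop()
        nodes.add(n)
        stack.extend(tree.get(n, []))
    return nodes


def distribute(tree, node, compiled, delivered):
    """Keep this node's binaries; forward to each child only what its subtree needs."""
    delivered[node] = {t: b for t, b in compiled.items() if t == node}
    for child in tree.get(node, []):
        needed = {t: b for t, b in compiled.items() if t in subtree(tree, child)}
        if needed:  # children with no work anywhere in their subtree are skipped
            distribute(tree, child, needed, delivered)
    return delivered
```

In the test below, node `c0` has nothing to run, so no message is ever sent to it, and intermediate nodes only relay what their descendants need.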

Type: Application

Filed: April 29, 2010

Publication date: November 3, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
METHOD FOR THE TRANSLATION OF PROGRAMS FOR RECONFIGURABLE ARCHITECTURES

Publication number: 20110271264

Abstract: Data processing using multidimensional fields is described along with methods for advantageously using high-level language codes.

Type: Application

Filed: July 7, 2011

Publication date: November 3, 2011

Inventors: Martin VORBACH, Frank May, Armin Nückel
Global compiler for controlling heterogeneous multiprocessor

Patent number: 8051412

Abstract: Performance of a heterogeneous multiprocessor is reduced as much as possible within a short time without any awareness of parallelization matched with a configuration of the heterogeneous multiprocessor. In a heterogeneous multiprocessor system, tasks having parallelism are automatically extracted by a compiler, a portion to be efficiently processed by a dedicated processor is extracted from an input program being a processing target, and processing time is estimated. Thus, by arranging the tasks according to processing unit (PU) characteristics, scheduling for efficiently operating a plurality of PUs in parallel is carried out.

Type: Grant

Filed: March 12, 2007

Date of Patent: November 1, 2011

Assignee: Waseda University

Inventors: Hironori Kasahara, Keiji Kimura, Hiroaki Shikano
Single Thread Performance in an In-Order Multi-Threaded Processor

Publication number: 20110265068

Abstract: A mechanism is provided for improving single-thread performance for a multi-threaded, in-order processor core. In a first phase, a compiler analyzes application code to identify instructions that can be executed in parallel, with a focus on instruction-level parallelism and on removing any register interference between the threads. The compiler inserts synchronization instructions supported by the apparatus, as appropriate, to ensure that the resulting execution of the threads is equivalent to executing the application code in a single thread. In a second phase, an operating system schedules the threads produced in the first phase on the hardware threads of a single processor core so that they execute simultaneously. In a third phase, the microprocessor core executes the threads scheduled in the second phase such that one hardware thread executes each application thread.

Type: Application

Filed: April 27, 2010

Publication date: October 27, 2011

Applicant: International Business Machines Corporation

Inventors: Elmootazbellah N. Elnozahy, Ahmed Gheith
Disco: a simplified distributed computing library

Patent number: 8046750

Abstract: Core commands and aggregations of such commands are provided to programmers to enable them to generate programs that can be parallel-processed without requiring the programmer to be aware of parallel-processing techniques. The core commands and aggregations abstract mechanisms that can be executed in parallel, enabling the programmer to focus on higher-level concepts. The core commands provided include commands for applying a function in parallel and distributing and joining data in parallel. The output of each core command can implement an interface that can enable underlying mechanisms to stitch together multiple core commands in a cohesive manner to perform more complex actions.

Type: Grant

Filed: June 13, 2007

Date of Patent: October 25, 2011

Assignee: Microsoft Corporation

Inventors: William D Ramsey, Ronnie I Chaiken
IDENTIFICATION AND TRANSLATION OF PROGRAM CODE EXECUTABLE BY A GRAPHICAL PROCESSING UNIT (GPU)

Publication number: 20110252411

Abstract: A device receives program code, and receives size/type information associated with inputs to the program code. The device determines, prior to execution of the program code and based on the input size/type information, a portion of the program code that is executable by a graphical processing unit (GPU), and determines, prior to execution of the program code and based on the input size/type information, a portion of the program code that is executable by a central processing unit (CPU). The device compiles the GPU-executable portion of the program code to create a compiled GPU-executable portion of the program code, and compiles the CPU-executable portion of the program code to create a compiled CPU-executable portion of the program code. The device provides, to the GPU for execution, the compiled GPU-executable portion of the program code, and provides, to the CPU for execution, the compiled CPU-executable portion of the program code.

Type: Application

Filed: September 30, 2010

Publication date: October 13, 2011

Applicant: THE MATHWORKS, INC.

Inventors: Jocelyn Luke MARTIN, Joseph F. HICKLIN
Computer program functional partitioning system for heterogeneous multi-processing systems

Patent number: 8037463

Abstract: The present invention provides for a system for computer program functional partitioning for heterogeneous multi-processing systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A whole program representation is generated based on received computer program code. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one node-specific SESE region is identified based on identified SESE regions and the at least one system parameter. Each node-specific SESE region is grouped into a node-specific subroutine. Each node-specific subroutine is compiled based on a specified node characteristic. The computer program code is modified based on the node-specific subroutines and the modified computer program code is compiled.

Type: Grant

Filed: January 8, 2009

Date of Patent: October 11, 2011

Assignee: International Business Machines Corporation

Inventors: Kathryn M. O'Brien, John Kevin Patrick O'Brien
Framework for parallelizing general reduction

Patent number: 8037462

Abstract: A method for providing parallel processing capabilities including: performing scalar and array privatization analysis via a compiler; checking whether an assignment statement is reducible; recognizing reduction patterns through a pattern matching algorithm; classifying the reduction type of each of the reduction patterns; and performing transformations and code generation for each reduction based on its reduction type.
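
The pattern-matching and classification steps can be illustrated with Python's own `ast` module: a statement is treated as a reduction candidate if it has the form `x op= e` or `x = x op e`, where `e` does not mention `x`. The operator table and the restriction to these two shapes are simplifying assumptions; privatization analysis and code generation are not modeled.

```python
import ast

# Hypothetical table of recognized operators and the reduction type each implies.
REDUCTION_TYPES = {ast.Add: "sum", ast.Mult: "product",
                   ast.BitOr: "or", ast.BitAnd: "and", ast.BitXor: "xor"}


def _mentions(node, name):
    return any(isinstance(n, ast.Name) and n.id == name for n in ast.walk(node))


def classify_reduction(stmt_src):
    """Return (variable, reduction_type) if the statement is a reducible update, else None."""
    stmt = ast.parse(stmt_src).body[0]
    if isinstance(stmt, ast.AugAssign) and isinstance(stmt.target, ast.Name):
        var, op, rest = stmt.target.id, stmt.op, stmt.value          # x op= e
    elif (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
          and isinstance(stmt.targets[0], ast.Name)
          and isinstance(stmt.value, ast.BinOp)
          and isinstance(stmt.value.left, ast.Name)
          and stmt.value.left.id == stmt.targets[0].id):
        var, op, rest = stmt.targets[0].id, stmt.value.op, stmt.value.right  # x = x op e
    else:
        return None
    kind = REDUCTION_TYPES.get(type(op))
    if kind is None or _mentions(rest, var):
        return None  # unsupported operator, or x feeds back into its own update
    return var, kind
```

This sketch rejects forms such as `t = a[i] + t`, subtraction, and `min`/`max` updates, even though a fuller recognizer might accept them.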

Type: Grant

Filed: August 2, 2006

Date of Patent: October 11, 2011

Assignee: International Business Machines Corporation

Inventors: Roch G. Archambault, Yaoqing Gao, Zhixing Ren, Raul E. Silvera
Method and apparatus for merging critical sections

Patent number: 8037466

Abstract: Critical sections used by multiple threads in a parallel program to access a shared resource may be selected for merging with each other to reduce the number of signals/tokens used to create critical sections. Critical section merging may be based on a summarized dependence graph, which is obtained from an instruction-level dependence graph constructed from the result of critical section minimization.

Type: Grant

Filed: December 29, 2006

Date of Patent: October 11, 2011

Assignee: Intel Corporation

Inventors: Xiaofeng Guo, Jinquan Dai, Long Li
Supporting applets on a high end platform

Patent number: 8032872

Abstract: To execute legacy smart card applications in a next generation smart card environment, a mechanism converts the applications into a format executable by the next generation smart card platforms. For instance, in a Java-based environment, a normalizer tool translates a CAP file into a Java Class file. Additional mechanisms recreate, on next generation smart cards, a specialized environment that allows the legacy applications to execute without impacting legacy and non-legacy application performance. For example, mechanisms create new instances of previously shared objects so that legacy applications can continue to expect exclusive access to those objects. Moreover, mechanisms manage the communication between a legacy application and non-legacy applications by controlling how and when calls are sent to the legacy application.

Type: Grant

Filed: December 18, 2006

Date of Patent: October 4, 2011

Assignee: Oracle America, Inc.

Inventors: Thierry P. Violleau, Tanjore S. Ravishankar, Matthew R. Hill, Saqib Ahmad
Computer program code size partitioning system for multiple memory multi-processing systems

Patent number: 8032873

Abstract: The present invention provides for a system for computer program code size partitioning for multiple memory multi-processor systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A program representation based on received computer program code is generated. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one SESE region of less than a certain size (store-size-specific) is identified based on identified SESE regions and the at least one system parameter. Each store-size-specific SESE region is grouped into a node-specific subroutine. The non node-specific parts of the computer program code are modified based on the partitioning into node-specific subroutines. The modified computer program code including each node-specific subroutine is compiled based on a specified node characteristic.

Type: Grant

Filed: December 17, 2008

Date of Patent: October 4, 2011

Assignee: International Business Machines Corporation

Inventors: Kathryn M. O'Brien, John Kevin Patrick O'Brien
METHOD AND SYSTEM FOR PARALLELIZATION OF SEQUENTIAL COMPUTER PROGRAM CODES

Publication number: 20110239201

Abstract: A method and system for parallelization of sequential computer program code are described. In one embodiment, an automatic parallelization system includes: a syntactic analyser to analyze the structure of the sequential computer program code and identify the positions at which to insert SPI into it; a profiler to profile the sequential computer program code by preparing a call graph, determining the dependency of each line of the code and the time required to execute each function; an analyzer to determine the parallelizability of the sequential computer program code from the information obtained by analysing and profiling it; and a code generator to insert SPI into the sequential computer program code, upon determination of parallelizability, to obtain parallel computer program code, which is then output to a parallel computing environment for execution. A corresponding method is also described.

Type: Application

Filed: December 1, 2009

Publication date: September 29, 2011

Inventors: Vinay G. Vaidya, Ranadive Priti, Sah Sudhakar
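
The abstract combines dependency information with per-function execution times to judge parallelizability. A minimal Python sketch of that kind of analysis, assuming hypothetical `deps` and `times` inputs that a profiler has already produced, groups functions into levels of mutually independent calls. It then compares the serial time with an idealized parallel time. This is an illustration of dependency levelling, not the patented analyzer.

```python
def parallel_schedule(deps, times):
    """Group functions into levels whose members do not depend on each other.

    deps:  function -> set of functions it depends on (every function is a key)
    times: function -> profiled execution time
    Returns (levels, serial_time, ideal_parallel_time)."""
    remaining = {f: set(d) for f, d in deps.items()}
    levels = []
    while remaining:
        ready = sorted(f for f, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic dependency: cannot be levelled")
        levels.append(ready)
        for f in ready:
            del remaining[f]
        for d in remaining.values():
            d.difference_update(ready)   # these dependencies are now satisfied
    serial = sum(times.values())
    # Each level takes as long as its slowest function, if run concurrently.
    parallel = sum(max(times[f] for f in level) for level in levels)
    return levels, serial, parallel
```

For a `load` step that feeds two independent functions `fa` and `fb`, which in turn feed `save`, the middle level can run concurrently. The estimate then drops from 12 to 8 time units.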
Computer architecture and method of operation for multi-computer distributed processing with finalization of objects

Patent number: 8028299

Abstract: The present invention discloses a modified computer architecture (50, 71, 72) which enables an applications program (50) to be run simultaneously on a plurality of computers (M1, . . . Mn). Shared memory at each computer is updated with amendments and/or overwrites so that all memory read requests are satisfied locally. During initial program loading (75), or similar, instructions which result in memory being re-written or manipulated are identified (92). Additional instructions are inserted (103) to cause the equivalent memory locations at all computers to be updated. In particular, the finalization of JAVA language classes and objects is disclosed (162, 163) so that finalization occurs only when the last class or object present on all machines is no longer required.

Type: Grant

Filed: October 25, 2005

Date of Patent: September 27, 2011

Assignee: Waratek Pty, Ltd.

Inventor: John Matthew Holt
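
The finalization rule can be sketched as a small coordinator in Python. The rule is that the finalizer fires only after every machine has released its copy. This is a hypothetical illustration under stated assumptions, not Waratek's implementation. `DistributedFinalizer`, `register` and `release` are invented names, and appending to `finalized` stands in for invoking a real finalizer.

```python
class DistributedFinalizer:
    """Runs an object's finalizer only after every machine has released it."""

    def __init__(self, machines):
        self.machines = set(machines)
        self.live = {}        # object id -> machines still holding a live replica
        self.finalized = []   # object ids whose (stand-in) finalizer has run

    def register(self, obj_id):
        self.live[obj_id] = set(self.machines)

    def release(self, obj_id, machine):
        """Record that `machine` no longer needs its replica of obj_id.
        Returns True only when this release triggers finalization."""
        holders = self.live.get(obj_id)
        if holders is None:
            return False      # unknown or already finalized
        holders.discard(machine)
        if not holders:
            del self.live[obj_id]
            self.finalized.append(obj_id)   # stand-in for calling finalize()
            return True
        return False
```

With three machines, the first two releases only shrink the holder set. The third release is the one that finalizes the object.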
Run-Time parallelization of loops in computer programs using bit vectors

Patent number: 8028281

Abstract: Parallelization of loops is performed for loops having indirect loop index variables and embedded conditional statements in the loop body. Loops having any finite number of array variables in the loop body, and any finite number of indirect loop index variables can be parallelized. There are two particular limitations of the described techniques: (i) that there are no cross-iteration dependencies in the loop other than through the indirect loop index variables; and (ii) that the loop index variables (either direct or indirect) are not redefined in the loop body.

Type: Grant

Filed: January 5, 2007

Date of Patent: September 27, 2011

Assignee: International Business Machines Corporation

Inventor: Rajendra K. Bera
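
The core run-time test can be sketched in Python, assuming a loop of the form `a[index[i]] = body(b[i])` with non-negative indices. A bit vector records which elements of `a` have already been targeted. If no element is written twice, the iterations are independent and may run in parallel. This covers only write conflicts through the indirect index, which matches limitation (i) in the abstract. It does not model conditional statements in the loop body. The function names are hypothetical, and a thread pool stands in for parallel hardware.

```python
from concurrent.futures import ThreadPoolExecutor

def indirect_writes_independent(index):
    """True if no two iterations write the same element via `index`.
    A Python int serves as an arbitrarily long bit vector."""
    seen = 0
    for j in index:
        bit = 1 << j
        if seen & bit:
            return False      # two iterations target the same element
        seen |= bit
    return True

def run_loop(a, b, index, body):
    """Loop being parallelized: for i in range(len(index)): a[index[i]] = body(b[i])"""
    n = len(index)
    if indirect_writes_independent(index):
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(body, (b[i] for i in range(n))))
        for i, value in enumerate(results):
            a[index[i]] = value
        return "parallel"
    for i in range(n):        # fall back to the original sequential order
        a[index[i]] = body(b[i])
    return "sequential"
```

The index `[3, 0, 2, 1]` passes the check. The index `[1, 2, 1]` fails it, because element 1 would be written twice, so that loop runs sequentially and keeps the last write.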
Method for compiling scalar code for a single instruction multiple data (SIMD) execution engine

Patent number: 8010953

Abstract: Mechanisms are provided for performing scalar operations using a SIMD data-parallel execution unit. With the mechanisms of the illustrative embodiments, scalar operations in application code that may be executed using vector operations in a SIMD data-parallel execution unit are identified. The scalar operations are converted, such as by a static or dynamic compiler, into one or more vector load instructions and one or more vector computation instructions. In addition, control words may be generated to adjust the alignment of the scalar values for the scalar operation within the vector registers to which these scalar values are loaded using the vector load instructions. The alignment amounts for adjusting the scalar values within the vector registers may be statically or dynamically determined.

Type: Grant

Filed: April 4, 2006

Date of Patent: August 30, 2011

Assignee: International Business Machines Corporation

Inventor: Michael K. Gschwind
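
A toy Python model shows why alignment adjustment is needed when a scalar add is performed with vector instructions. It assumes a four-lane register, word-addressed memory, aligned vector loads that ignore the low address bits, and a designated "preferred slot" for scalar operands. All of these are assumptions for illustration, not a description of any particular ISA. The rotation amount, `addr % LANES - PREFERRED_SLOT`, plays the role of the alignment control word. It is known statically when the address is, and must be computed at run time otherwise.

```python
LANES = 4           # assumed: 128-bit register holding four 32-bit words
PREFERRED_SLOT = 0  # assumed: lane the scalar operation reads its result from

def vector_load(memory, addr):
    """Aligned vector load: the low bits of addr are ignored."""
    base = addr - addr % LANES
    return memory[base:base + LANES]

def rotate_left(vec, amount):
    amount %= LANES
    return vec[amount:] + vec[:amount]

def scalar_add_via_simd(memory, addr_x, addr_y):
    """memory[addr_x] + memory[addr_y] using only vector loads,
    lane rotations (the alignment adjustment) and a lane-wise add."""
    vx = rotate_left(vector_load(memory, addr_x), addr_x % LANES - PREFERRED_SLOT)
    vy = rotate_left(vector_load(memory, addr_y), addr_y % LANES - PREFERRED_SLOT)
    vsum = [p + q for p, q in zip(vx, vy)]   # SIMD add across all lanes
    return vsum[PREFERRED_SLOT]
```

With `memory = list(range(100, 116))`, adding the words at addresses 5 and 14 rotates them by 1 and 2 lanes respectively. The result in the preferred slot is 105 + 114 = 219.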
Parallel programming interface to dynamically allocate program portions

Patent number: 8010954

Abstract: A computing device-implemented method includes receiving a program created by a technical computing environment, analyzing the program, generating multiple program portions based on the analysis of the program, dynamically allocating the multiple program portions to multiple software units of execution for parallel programming, receiving multiple results associated with the multiple program portions from the multiple software units of execution, and providing the multiple results or a single result to the program.

Type: Grant

Filed: May 15, 2007

Date of Patent: August 30, 2011

Assignee: The MathWorks, Inc.

Inventors: John N. Little, Joseph F. Hicklin, Jocelyn Luke Martin, Nausheen B. Moulana, Halldor N. Stefansson, Loren Dean, Roy E. Lurie, Stephen C. Johnson, Penelope L. Anderson, Michael E. Karr, Jason A. Kinchen
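
The dispatch-and-collect part of the described method can be sketched with Python's standard `concurrent.futures`. This is an analogy, not MathWorks' parallel programming interface. The sketch assumes the program has already been analyzed into independent portions, given here as zero-argument callables. A thread pool stands in for the "software units of execution". The pool hands each portion to whichever worker is free, and results are returned either individually or combined into a single result.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_portions(portions, workers=4, combine=None):
    """Dynamically dispatch independent program portions to a worker pool.

    portions: list of zero-argument callables (assumed independent)
    combine:  optional function reducing the list of results to one value"""
    results = [None] * len(portions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(p): i for i, p in enumerate(portions)}
        for fut in as_completed(futures):      # collect in completion order
            results[futures[fut]] = fut.result()
    return combine(results) if combine else results
```

For portions that square the integers 0 through 4, the call returns `[0, 1, 4, 9, 16]`, or `30` when `combine=sum` is passed.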
PARALLELIZATION METHOD, SYSTEM AND PROGRAM

Publication number: 20110209129

Abstract: A parallelization method, system and program. A program expressed by a block diagram or the like is divided into strands, and calculation time is balanced among the strands. The functional blocks are divided into strands, and the strand with the maximum calculation time in the strand set is found. One or more movable blocks in that strand are then found. Next, the calculation time of each strand is obtained after a movable block is moved to the strand in the input or output direction according to its property, and the block is moved to the strand that most reduces the calculation time of the strand that had the maximum calculation time before the move. This process repeats until the calculation time is no longer reduced. The strands are then transformed into source code, which is compiled and assigned to separate cores or processors for execution.

Type: Application

Filed: February 22, 2011

Publication date: August 25, 2011

Applicant: International Business Machines Corporation

Inventors: Hideaki Komatsu, Takeo Yoshizawa
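
The greedy balancing loop can be sketched in Python under simplifying assumptions that are not taken from the patent. Strands form a pipeline, and each strand is a list of block times. Only the first block of a strand is "movable" upstream, and only the last block is movable downstream. Moving a block does not change its cost. Among the candidate moves out of the slowest strand, the sketch picks the one giving the lowest overall maximum. It stops when no move lowers that maximum.

```python
def balance_strands(strands):
    """Greedily move boundary blocks out of the slowest strand.

    `strands` is a pipeline (strand k feeds strand k+1); each strand is a
    list of block calculation times. The first block may move to the
    previous strand (input direction), the last block to the next strand
    (output direction)."""
    strands = [list(s) for s in strands]
    while True:
        current = max(sum(s) for s in strands)
        k = max(range(len(strands)), key=lambda i: sum(strands[i]))
        best = None
        if len(strands[k]) > 1:                    # never empty a strand
            for dest in (k - 1, k + 1):
                if not 0 <= dest < len(strands):
                    continue
                trial = [list(s) for s in strands]
                if dest < k:
                    trial[dest].append(trial[k].pop(0))    # move upstream
                else:
                    trial[dest].insert(0, trial[k].pop())  # move downstream
                new_max = max(sum(s) for s in trial)
                if best is None or new_max < best[0]:
                    best = (new_max, trial)
        if best is None or best[0] >= current:
            return strands       # no move lowers the maximum any more
        strands = best[1]
```

Starting from `[[1], [5, 4, 3], [1]]` (maximum 12), two moves give `[[1, 5], [4], [3, 1]]`, with a maximum strand time of 6.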
Workload partitioning in a parallel system with heterogeneous alignment constraints

Patent number: 8006238

Abstract: A process, compiler, computer program product and system for workload partitioning in a heterogeneous system. The process includes determining heterogeneous alignment constraints in the workload, partitioning a portion of tasks to a processing element sensitive to alignment constraints, and partitioning a remaining portion of tasks to a processing element not sensitive to alignment constraints.

Type: Grant

Filed: September 26, 2006

Date of Patent: August 23, 2011

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Kathryn M. O'Brien, Tong Chen
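
One simple way to honor heterogeneous alignment constraints is to split a loop's iteration space into three parts. Python can sketch this, assuming the alignment-sensitive element (for example, a SIMD accelerator) requires work that starts on a multiple of `vector_width` and spans whole vectors. The misaligned head and leftover tail go to an element without that constraint. The function and its parameters are hypothetical, and the split is an illustration rather than the claimed process.

```python
def partition_by_alignment(start, n, vector_width=4):
    """Split loop iterations [start, start + n) between two processing elements.

    The aligned body (starting on a multiple of vector_width, with a length
    that is a multiple of it) goes to the alignment-sensitive element; the
    misaligned head and the leftover tail go to the alignment-insensitive one."""
    end = start + n
    body_start = min(end, -(-start // vector_width) * vector_width)  # round up
    body_end = body_start + (end - body_start) // vector_width * vector_width
    return {
        "sensitive": list(range(body_start, body_end)),
        "insensitive": list(range(start, body_start)) + list(range(body_end, end)),
    }
```

For iterations 3 through 16 with a vector width of 4, iterations 4 to 15 go to the alignment-sensitive element, and iterations 3 and 16 go to the other one.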
Method for the translation of programs for reconfigurable architectures

Patent number: 7996827

Abstract: A method for advantageously translating high-level language codes for data processing using a reconfigurable architecture, memories addressable internally from within said reconfigurable architecture, and memories external to said reconfigurable architecture, may include constructing a finite automaton for computation in such a way that a complex combinatory network of individual functions is formed, assigning memories to the network for storage of operands and results, and separating external memory accesses for providing a transfer of at least one of operands and results as data from an external memory to a memory addressable internally by the reconfigurable architecture.

Type: Grant

Filed: August 16, 2002

Date of Patent: August 9, 2011

Inventors: Martin Vorbach, May Frank, Armin Nückel
Schema specific parser generation

Patent number: 7991799

Abstract: A computer-implemented method of creating a schema specific parser for processing Extensible Markup Language (XML) documents can include identifying a plurality of XML processing templates, wherein each of the plurality of XML processing templates performs a specific task of processing an XML document against an XML schema component. An XML schema including a plurality of components can be received. A hierarchy of the plurality of components of the XML schema can be determined. An execution plan specifying a hierarchy of XML processing instructions can be created, wherein each XML processing instruction is associated with an XML processing template from the plurality of XML processing templates. The hierarchy of the XML processing templates can be determined according to the hierarchy of components of the XML schema. The execution plan can be compiled to generate the schema specific parser. The schema specific parser can be output.

Type: Grant

Filed: June 5, 2007

Date of Patent: August 2, 2011

Assignee: International Business Machines Corporation

Inventors: Abraham Heifets, Margaret G. Kostoulas, Moshe Morris Emanuel Matsa, Eric Perkins
Dynamic distribution for distributed arrays and related rules

Patent number: 7987227

Abstract: The present invention provides a method and system for the dynamic distribution of an array in a parallel computing environment. The present invention obtains a criterion for distributing an array and performs flexible portioning based on the obtained criterion. In some embodiments, analysis may be performed based on the criterion; the flexible portioning is then performed based on that analysis.

Type: Grant

Filed: May 12, 2010

Date of Patent: July 26, 2011

Assignee: The MathWorks, Inc.

Inventors: Penelope Anderson, Cleve Moler, Sheung Hun Cheng, Patrick D. Quillen
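
As an illustration of criterion-driven flexible portioning, the following Python sketch assumes that the "criterion" is a list of relative per-worker weights, such as measured worker speeds. It splits an array of length `n` into contiguous pieces with sizes proportional to those weights. Largest-remainder rounding makes the sizes sum exactly to `n`. The criterion and the function are assumptions, not the MathWorks distribution rules.

```python
def distribute(n, weights):
    """Split n array elements into contiguous (start, stop) pieces whose
    sizes are proportional to per-worker weights (the assumed criterion)."""
    total = sum(weights)
    exact = [n * w / total for w in weights]
    sizes = [int(x) for x in exact]            # floor of each ideal share
    # Give leftover elements to the workers with the largest fractional parts.
    order = sorted(range(len(weights)), key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in order[: n - sum(sizes)]:
        sizes[i] += 1
    bounds, start = [], 0
    for s in sizes:
        bounds.append((start, start + s))
        start += s
    return bounds
```

Ten elements over three equal workers become `[(0, 4), (4, 7), (7, 10)]`. Eight elements with weights `[3, 1]` become `[(0, 6), (6, 8)]`.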
TICC-paradigm to build formally verified parallel software for multi-core chips

Patent number: 7979844

Abstract: This invention teaches a way of implementing formally verified, massively parallel programs that run efficiently on distributed and shared-memory multi-core chips. It allows programs to be developed from an initial abstract statement of interactions among parallel software components, called cells, and progressively refined to their final implementation. At each stage of refinement, a formal description of patterns of events in computations is derived automatically from implementations. This formal description is used for two purposes. The first is to prove correctness, timings, progress, mutual exclusion, freedom from deadlocks and livelocks, etc. The second is to automatically incorporate into each application a Self-Monitoring System (SMS) that constantly monitors the application in parallel, without interfering with its timings, to identify and report errors in performance, pending errors, and patterns of critical behavior.

Type: Grant

Filed: May 5, 2009

Date of Patent: July 12, 2011

Assignee: EDSS, Inc.

Inventor: Chitoor V. Srinivasan
SYSTEMS, APPARATUSES, AND METHODS FOR A HARDWARE AND SOFTWARE SYSTEM TO AUTOMATICALLY DECOMPOSE A PROGRAM TO MULTIPLE PARALLEL THREADS

Publication number: 20110167416

Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.

Type: Application

Filed: December 25, 2010

Publication date: July 7, 2011

Inventors: David J. Sager, Ruchira Sasanka, Ron Gabor, Shlomo Raikin, Joseph Nuzman, Leeor Peled, Jason A. Domer, Ho-Seop Kim, Youfeng Wu, Koichi Yamada, Tin-Fook Ngai, Howard H. Chen, Jayaram Bobba, Jeffery J. Cook, Omar M. Shaikh, Suresh Srinivas
PROGRAMMING SYSTEM IN MULTI-CORE, AND METHOD AND PROGRAM OF THE SAME

Publication number: 20110167417

Abstract: A first compiler generates one or more object codes from a program code for a first processor included in an arithmetic processing system to which a plurality of processors are mutually connected. A first linker links the generated one or more object codes to generate an execution file for the first processor. A parameter information generation unit generates, based on the information acquired from the first linker, parameter information used in a second processor included in the arithmetic processing system. A second compiler refers to a program code and the parameter information for the second processor to generate one or more object codes. A second linker links the generated one or more object codes to generate an execution file for the second processor.

Type: Application

Filed: July 23, 2009

Publication date: July 7, 2011

Inventor: Tomoyoshi Kobori
METHOD AND APPARATUS FOR TRANSFORMING PROGRAM CODE

Publication number: 20110161944

Abstract: Provided is a method of transforming program code written so that a plurality of work-items are allocated to, and executed concurrently on, a plurality of processing elements included in a computing unit. A program code translator may identify, in the program code, two or more code regions to be enclosed by work-item coalescing loops (WCLs), based on a synchronization barrier function contained in the program code, so that the work-items can be executed serially on fewer processing elements than the plurality for which the code was written, and may enclose each identified code region with its own WCL.

Type: Application

Filed: December 23, 2010

Publication date: June 30, 2011

Applicants: SAMSUNG ELECTRONICS CO., LTD., SNU R&DB FOUNDATION

Inventors: Seung-Mo Cho, Jong-Deok Choi, Jaejin Lee
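
The effect of work-item coalescing loops can be shown with a small Python model of an OpenCL-style kernel that contains one barrier. It assumes the kernel has already been split at the barrier into region functions, and that variables used across the barrier live in a per-work-item `private` dictionary. Each region is wrapped in a loop over all work-item IDs. Every work-item therefore finishes region k before any work-item starts region k+1, which preserves barrier semantics on a single processing element. The names `run_with_wcls`, `region0` and `region1` are hypothetical.

```python
def run_with_wcls(regions, local_size, shared):
    """Execute one work-group serially on a single processing element.

    `regions` are the kernel pieces between barriers; each is enclosed in a
    work-item coalescing loop over all local IDs."""
    private = [dict() for _ in range(local_size)]  # per-work-item state kept across barriers
    for region in regions:                 # code regions split at barrier()
        for lid in range(local_size):      # the work-item coalescing loop
            region(lid, private[lid], shared)
    return shared

# Example kernel: write to local scratch, barrier, then read another item's value.
def region0(lid, priv, shared):
    shared["tmp"][lid] = shared["in"][lid] * 2
    priv["x"] = lid

def region1(lid, priv, shared):
    n = len(shared["tmp"])
    shared["out"][priv["x"]] = shared["tmp"][n - 1 - lid]
```

With input `[1, 2, 3, 4]`, the output is `[8, 6, 4, 2]`. If each work-item instead ran both regions before the next one started, work-item 0 would read `tmp[3]` before it was written.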
METHOD TO DYNAMICALLY DISTRIBUTE A MULTI-DIMENSIONAL WORK SET ACROSS A MULTI-CORE SYSTEM

Publication number: 20110161943

Abstract: A method provides efficient dispatch/completion of an N-Dimensional (ND) Range command in a data processing system (DPS). The method comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one-dimensional (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of the 1D work items, in ordered sequence, from the command queue to one or more processing units; and generating an ND Range output by mapping each 1D work item's output result to the ND position corresponding to the original location of the operand that the 1D work item represents.

Type: Application

Filed: December 30, 2009

Publication date: June 30, 2011

Applicant: IBM CORPORATION

Inventors: Gregory H. Bellows, Brian H. Horton, Joaquin Madruga, Barry L. Minor
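
The decomposition and the mapping back can be sketched with ordinary row-major index arithmetic in Python. The sketch assumes a row-major ordering and a single sequential dispatcher. Neither is claimed to be IBM's actual scheme. An ND range of shape `shape` becomes P = prod(shape) ordered 1D work items. Each result is stored back at the ND position that its work item came from.

```python
import math

def flat_index(pos, shape):
    """Row-major linear index of an ND position."""
    idx = 0
    for p, d in zip(pos, shape):
        idx = idx * d + p
    return idx

def unflatten(i, shape):
    """Inverse of flat_index: the ND position represented by 1D work item i."""
    pos = []
    for d in reversed(shape):
        i, r = divmod(i, d)
        pos.append(r)
    return tuple(reversed(pos))

def run_nd_range(shape, kernel):
    """Dispatch P = prod(shape) 1D work items in order and map each
    result back to the ND position of the operand it represents."""
    out = {}
    for i in range(math.prod(shape)):   # sequential dispatch of 1D work items
        pos = unflatten(i, shape)
        out[pos] = kernel(pos)
    return out
```

For shape `(2, 3, 4)`, position `(1, 2, 3)` maps to work item 23, and work item 23 maps back to `(1, 2, 3)`.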
Compiler method for employing multiple autonomous synergistic processors to simultaneously operate on longer vectors of data

Patent number: 7962906

Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.

Type: Grant

Filed: March 15, 2007

Date of Patent: June 14, 2011

Assignee: International Business Machines Corporation

Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
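
The division of labor in the abstract can be mimicked in Python. The main program runs on a principal processor, while an extracted vectorizable loop runs on several worker processors, each handling a slice of a long vector. In this sketch, threads stand in for synergistic processors. The element-wise add and the function names are hypothetical. A real compiler would emit separate binaries and handle data transfer to each processor's local memory, which the sketch ignores.

```python
from concurrent.futures import ThreadPoolExecutor

def vector_add_chunk(a, b, c, lo, hi):
    """The extracted vectorizable loop, run by one worker on its slice."""
    for i in range(lo, hi):
        c[i] = a[i] + b[i]

def parallel_vector_add(a, b, num_workers=4):
    """Main program: split the long vector, launch workers, wait for all."""
    n = len(a)
    c = [0] * n
    chunk = max(1, -(-n // num_workers))   # ceiling division, never zero
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(vector_add_chunk, a, b, c, lo, min(lo + chunk, n))
                   for lo in range(0, n, chunk)]
        for f in futures:
            f.result()                     # propagate any worker exception
    return c
```

Adding `list(range(10))` and ten ones with three workers produces `[1, 2, ..., 10]`. The workers process the chunks `[0, 4)`, `[4, 8)` and `[8, 10)`.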
System and method for synchronizing test runs on separate systems

Patent number: 7962799

Abstract: A system and method provide for test automation of a process running on separate systems. The systems may be separated physically and/or logically. The system and method provide that all information required for a test run is made available on one system. In an embodiment, a central component is used to provide all required status and result information regarding the test status of every system in the test landscape. In a further embodiment, the capabilities of existing test tools are extended so that the test tool communicates with the central component via an appropriate protocol.

Type: Grant

Filed: December 30, 2007

Date of Patent: June 14, 2011

Assignee: SAP AG

Inventors: Michael Lauer, Frank Reisenhofer
LINK-TIME RESOURCE ALLOCATION FOR A MULTI-THREADED PROCESSOR ARCHITECTURE

Publication number: 20110131558

Abstract: A method comprising: independently compiling a plurality of modules of source code to generate a plurality of respective object modules comprising a plurality of respective parallel threads explicitly designated by a user to be executed in parallel on a target platform; in each of the object modules, inserting at least one symbol indicative of a usage of a resource of the target platform associated with the respective thread; executing a linker to perform a linking process for linking the object modules, wherein the linking process comprises assessing the symbols in conjunction with one another, and based on the assessment generating an indication relating to a usage of the resource required for execution of the threads in parallel.

Type: Application

Filed: May 11, 2009

Publication date: June 2, 2011

Applicant: Xmos Limited

Inventors: Martin Young, Richard Osborne, Douglas Watt
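
The link-time step can be sketched in Python. Per-thread resource symbols from independently compiled modules are assessed together and checked against what the target platform provides. As a simplifying assumption, each object module is modeled as a dictionary from thread name to `{resource: amount}`, standing in for the emitted symbols. The resource names and limits are hypothetical, not XMOS's symbol format.

```python
def link_check(object_modules, limits):
    """Combine per-thread resource symbols from separately compiled modules.

    Each module maps thread name -> {resource: amount}. Returns the combined
    totals and any resource whose total exceeds the platform limit."""
    totals = {"threads": 0}
    for module in object_modules:
        for usage in module.values():
            totals["threads"] += 1          # each explicit parallel thread
            for resource, amount in usage.items():
                totals[resource] = totals.get(resource, 0) + amount
    over = {r: (totals.get(r, 0), cap)
            for r, cap in limits.items() if totals.get(r, 0) > cap}
    return totals, over
```

Suppose one module defines a `producer` thread and another defines `consumer` and `logger`. If the three threads together need 1792 bytes of stack on a platform that allows 1536, the linker reports the shortfall, even though neither module exceeds the limit on its own.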
