For A Parallel Or Multiprocessor System Patents (Class 717/149)
  • Patent number: 8136097
    Abstract: A thread debugging device which can provide reliable debugging is provided when at least one thread is debugged among a plurality of threads which are executed in association with each other. According to the thread debugging device, a target computer (20) executes at least some processing of at least one target thread to be debugged among the plurality of threads, and further executes non-target threads, which are threads other than the at least one target thread among the plurality of threads, during execution of the at least one target thread while restricting access by the non-target threads to at least some hardware resources of the computer (20).
    Type: Grant
    Filed: October 25, 2006
    Date of Patent: March 13, 2012
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Yousuke Konishi, Shinichiro Mikami, Makoto Ishii, Yasuyuki Kinoshita, Atsuhiko Fujimoto, Masayuki Takahashi
  • Patent number: 8136104
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Grant
    Filed: March 5, 2007
    Date of Patent: March 13, 2012
    Assignee: Google Inc.
    Inventors: Matthew N. Papakipos, Brian K. Grant, Morgan S. McGuire, Christopher G. Demetriou
  • Patent number: 8136107
    Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.
    Type: Grant
    Filed: October 24, 2007
    Date of Patent: March 13, 2012
    Assignee: International Business Machines Corporation
    Inventor: Ayal Zaks
  • Patent number: 8136105
    Abstract: A computer program product is provided for extracting SIMD parallelism. The computer program product includes instructions for providing a stream of input code comprising basic blocks; identifying pairs of statements that are semi-isomorphic with respect to each other within a basic block; iteratively combining into packs, pairs of statements that are semi-isomorphic with respect to each other, and combining packs into combined packs; collecting packs whose statements can be scheduled together for processing; and generating SIMD instructions for each pack to provide for extracting the SIMD parallelism..
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: March 13, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
  • Patent number: 8127283
    Abstract: In one embodiment, the present invention includes a method for developing of a parallel program by specifying graphical representations for input data objects into a parallel computation code segment, specifying graphical representations for parallel program schemes, each including at least one graphical representation of an operator to perform an operation on an data object, determining if any of the parallel program schemes include at least one alternative computation, and unrolling the corresponding parallel program schemes and generating alternative parallel program scheme fragments therefrom. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 5, 2007
    Date of Patent: February 28, 2012
    Assignee: Intel Corporation
    Inventors: Yuriy E. Sheynin, Alexey Y. Syschikov
  • Patent number: 8122441
    Abstract: Embodiments of the invention enable application programs running across multiple compute nodes of a highly-parallel system to compile source code into native instructions, and subsequently share the optimizations used to compile the source code with other nodes. For example, determining what optimizations to use may consume significant processing power and memory on a node. In cases where multiple nodes exhibit similar characteristics, it is possible that these nodes may use the same set of optimizations when compiling similar pieces of code. Therefore, when one node compiles source code into native instructions, it may share the optimizations used with other similar nodes, thereby removing the burden for the other nodes to figure out which optimizations to use. Thus, while one node may suffer a performance hit for determining the necessary optimizations, other nodes may be saved from this burden by simply using the optimizations provided to them.
    Type: Grant
    Filed: June 24, 2008
    Date of Patent: February 21, 2012
    Assignee: International Business Machines Corporation
    Inventors: Eric L. Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
  • Patent number: 8122442
    Abstract: A method for transforming access to a structure array, that includes compiling source code, wherein compiling the source code includes identifying the structure array in the source code, performing an object safety analysis to determine whether the structure array is safe for transformation, wherein the object safety analysis includes an inter-procedural alias class analysis, performing a profitability analysis on the structure array when the structure array is safe for transformation, wherein the profitability analysis includes selecting a transformation from a plurality of transformations, wherein the plurality of transformations includes a pointer based fully splitting transformation, a pointer based partially splitting transformation, and an address based fully splitting transformation, and performing the selected transformation on the structure array, and storing the compiled code.
    Type: Grant
    Filed: January 31, 2008
    Date of Patent: February 21, 2012
    Assignee: Oracle America, Inc.
    Inventor: Jin Lin
  • Patent number: 8108847
    Abstract: A system can include an analyzer module configured to analyze spill code generated by a register allocator to determine that register spill instructions can be paired, wherein paired register spill instructions relate to corresponding register locations in each of a first register set and a second register set and that no instructions between said register spill instructions modify any of said register spill instructions; a rewriter module configured to, based on the determining, modify said register spill instructions as a parallel register spill instruction; and a storage module configured to configure storage of associated register spills in memory so said register spills can be loaded back in parallel into corresponding registers of said first and second register sets based on said modified parallel register spill instruction, wherein the configuration of storage includes allocation of space on a memory stack such that the register spills are double word aligned.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: January 31, 2012
    Assignee: International Business Machines Corporation
    Inventor: Christopher Lapkowski
  • Patent number: 8108845
    Abstract: A computing system receives a program created by a technical computing environment, analyzes the program, generates multiple program portions based on the analysis of the program, dynamically allocates the multiple program portions to multiple software units of execution for parallel programming, receives multiple results associated with the multiple program portions from the multiple software units of execution, and provides the multiple results or a single result to the program.
    Type: Grant
    Filed: May 15, 2007
    Date of Patent: January 31, 2012
    Assignee: The Mathworks, Inc.
    Inventors: John N. Little, Joseph F. Hicklin, Jocelyn Luke Martin, Nausheen B. Moulana, Halldor N. Stefansson, Loren Dean, Roy E. Lurie, Stephen C. Johnson, Penelope L. Anderson, Michael E. Karr, Jason A. Kinchen
  • Patent number: 8108846
    Abstract: A mechanism is provided for performing scalar operations using a SIMD data parallel execution unit. With the mechanisms of the illustrative embodiments, scalar operations in application code are identified that may be executed using vector operations in a SIMD data parallel execution unit. The scalar operations are converted, such as by a static or dynamic compiler, into one or more vector load instructions and one or more vector computation instructions. In addition, control words may be generated to adjust the alignment of the scalar values for the scalar operation within the vector registers to which these scalar values are loaded using the vector load instructions. The alignment amounts for adjusting the scalar values within the vector registers may be statically or dynamically determined.
    Type: Grant
    Filed: May 28, 2008
    Date of Patent: January 31, 2012
    Assignee: International Business Machines Corporation
    Inventor: Michael K. Gschwind
  • Patent number: 8108838
    Abstract: A method for adaptive runtime reconfiguration of a co-processor instruction set, in a computer system with at least a main processor communicatively connected to at least one reconfigurable co-processor, includes the steps of configuring the co-processor to implement an instruction set comprising one or more co-processor instructions, issuing a co-processor instruction to the co-processor, and determining whether the instruction is implemented in the co-processor. For an instruction not implemented in the co-processor instruction set, raising a stall signal to delay the main processor, determining whether there is enough space in the co-processor for the non-implemented instruction, and if there is enough space for said instruction, reconfiguring the instruction set of the co-processor by adding the non-implemented instruction to the co-processor instruction set. The stall signal is cleared and the instruction is executed.
    Type: Grant
    Filed: May 15, 2008
    Date of Patent: January 31, 2012
    Assignee: International Business Machines Corporation
    Inventors: Sameh W. Asaad, Richard Gerard Hofmann
  • Patent number: 8108844
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Grant
    Filed: March 5, 2007
    Date of Patent: January 31, 2012
    Assignee: Google Inc.
    Inventors: William Y. Crutchfield, Brian K. Grant, Matthew N. Papakipos
  • Publication number: 20120023479
    Abstract: A method and a computer program product include the steps of receiving, into a computing machine, a business logic (BL) source code for an application program, the BL source code comprising at least a class, an object and a method or function. Steps transform, in the computing machine, the BL source code into an executable BL. Steps acquire schema information from the BL source code. Steps generate, in the computing machine, a user interface (UI) source code for a form using the schema information for the class. Steps transform, in the computing machine, the UI source code into at least one executable UI.
    Type: Application
    Filed: July 21, 2010
    Publication date: January 26, 2012
    Inventor: Gregory S. Moress
  • Patent number: 8104030
    Abstract: A computer implemented method, computer usable program code, and a system for parallelizing a loop. A parameter that will be used to limit parallelization of the loop is identified to limit parallelization of the loop. The parameter specifies a minimum number of loop iterations that a thread should execute. The parameter can be adjusted based on a parallel performance factor. A parallel performance factor is a factor that influences the performance of parallel code. A number of threads from a plurality of threads is selected for processing iterations of the loop based on the parameter. The number of threads is selected prior to execution of the first iteration of the loop.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: January 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Raul Esteban Silvera, Priya Unnikrishnan, Guansong Zhang
  • Patent number: 8104028
    Abstract: Repetitive synchronization in program code is optimized through lock coarsening that is performed subject to a number of constraints. Using a forward pass over the program code followed by a backward pass, region extent bits may be determined that identify the points in the program where object locking can be coarsened. The program code may then be modified to realize coarsened locking regions determined based on the region extent bits. Alternatively, previously determined value numbers may provide much of the information collected by the two passes. In such a case, a single pass over the program code may locate features that limit lock coarsening opportunities. A set of synchronization operations that can be removed may then be determined and used when modifying the program code to coarsen locking regions.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: January 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Mark Graham Stoodley, Vijay Sundaresan
  • Publication number: 20120005662
    Abstract: A high level programming language provides an extensible set of transformations for use on indexable types in a data parallel processing environment. A compiler for the language implements each transformation as a map from indexable types to allow each transformation to be applied to other transformations. At compile time, the compiler identifies sequences of the transformations on each indexable type in data parallel source code and generates data parallel executable code to implement the sequences as a combined operation at runtime using the transformation maps. The compiler also incorporates optimizations that are based on the sequences of transformations into the data parallel executable code.
    Type: Application
    Filed: June 30, 2010
    Publication date: January 5, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Paul F. Ringseth, Weirong Zhu, Rick Molloy, Charles D. Callahan, II, Yosseff Levanoni, Lingli Zhang
  • Patent number: 8091078
    Abstract: A program is into at least two object files: one object file for each of the supported processor environments. During compilation, code characteristics, such as data locality, computational intensity, and data parallelism, are analyzed and recorded in the object file. During run time, the code characteristics are combined with runtime considerations, such as the current load on the processors and the size of the data being processed, to arrive at an overall value. The overall value is then used to determine which of the processors will be assigned the task. The values are assigned based on the characteristics of the various processors. For example, if one processor is better at handling intensive computations against large streams of data, programs that are highly computationally intensive and process large quantities of data are weighted in favor of that processor. The corresponding object is then loaded and executed on the assigned processor.
    Type: Grant
    Filed: May 7, 2008
    Date of Patent: January 3, 2012
    Assignee: International Business Machines Corporation
    Inventors: Daniel Alan Brokenshire, Harm Peter Hofstee, Barry L Minor, Mark Richard Nutter
  • Publication number: 20110314458
    Abstract: A compile environment is provided in a computer system that allows programmers to program both CPUs and data parallel devices (e.g., GPUs) using a high level general purpose programming language that has data parallel (DP) extensions. A compilation process translates modular DP code written in the general purpose language into DP device source code in a high level DP device programming language using a set of binding descriptors for the DP device source code. A binder generates a single, self-contained DP device source code unit from the set of binding descriptors. A DP device compiler generates a DP device executable for execution on one or more data parallel devices from the DP device source code unit.
    Type: Application
    Filed: June 22, 2010
    Publication date: December 22, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Weirong Zhu, Lingli Zhang, Sukhdeep S. Sodhi, Yosseff Levanoni
  • Patent number: 8082545
    Abstract: Monitoring system wide task dispatch allows dynamic adaptation to conditions of a system. A monitor tracks the total tasks currently dispatched to the system. In a system with multiple processing units, this monitor is centralized and collects information about tasks dispatched to each of the processing units. The monitor compares the total dispatched tasks against a threshold that has already been defined. Further dispatching of tasks to the system is regulated based on comparison of the total dispatched tasks against the threshold. If the comparison achieves a trigger condition (e.g., total dispatched tasks exceeding the threshold), then task dispatch is throttled. Throttling further task dispatching, as long as the threshold is exceeded, allows progress to continue without overwhelming the system.
    Type: Grant
    Filed: September 9, 2005
    Date of Patent: December 20, 2011
    Assignee: Oracle America, Inc.
    Inventor: Raj Prakash
  • Patent number: 8074211
    Abstract: According to one embodiment, a grouping method for process units, each including basic modules and data, the process units being assigned to processors in a program for a multiprocessor system, the program including the basic modules and a parallel statement describing relationships between parallel processes for the basic modules, the method includes displaying a dataflow graph visually showing a process status of each process unit based on the parallel statement, and specifying a candidate for a connection of process units on the dataflow graph, wherein the dataflow graph displays data entries, nodes in the basic modules, and edges connecting the data entries and the nodes.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: December 6, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Ryuji Sakai
  • Patent number: 8065130
    Abstract: Programmable architecture for implementing a message processing system using an integrated circuit is described. In an example, configurable logic of an integrated circuit is configured to have a plurality of thread circuits and a memory. Messages are received to the integrated circuit for storage in the memory. The memory is accessed with the plurality of threads to concurrently process a plurality of the messages.
    Type: Grant
    Filed: May 13, 2009
    Date of Patent: November 22, 2011
    Assignee: Xilinx, Inc.
    Inventors: Gordon J. Brebner, Philip B. James-Roxby, Eric R. Keller, Chidamber R. Kulkarni
  • Publication number: 20110271263
    Abstract: Compiling software for a hierarchical distributed processing system including providing to one or more compiling nodes software to be compiled, wherein at least a portion of the software to be compiled is to be executed by one or more other nodes; compiling, by the compiling node, the software; maintaining, by the compiling node, any compiled software to be executed on the compiling node; selecting, by the compiling node, one or more nodes in a next tier of the hierarchy of the distributed processing system in dependence upon whether any compiled software is for the selected node or the selected node's descendants; sending to the selected node only the compiled software to be executed by the selected node or selected node's descendant.
    Type: Application
    Filed: April 29, 2010
    Publication date: November 3, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20110271264
    Abstract: Data processing using multidimensional fields is described along with methods for advantageously using high-level language codes.
    Type: Application
    Filed: July 7, 2011
    Publication date: November 3, 2011
    Inventors: Martin VORBACH, Frank May, Armin Nückel
  • Patent number: 8051412
    Abstract: Performance of a heterogeneous multiprocessor is reduced as much as possible within a short time without any awareness of parallelization matched with a configuration of the heterogeneous multiprocessor. In a heterogeneous multiprocessor system, tasks having parallelism are automatically extracted by a compiler, a portion to be efficiently processed by a dedicated processor is extracted from an input program being a processing target, and processing time is estimated. Thus, by arranging the tasks according to PU characteristics, scheduling for efficiently operating a plurality of PU's in parallel is carried out.
    Type: Grant
    Filed: March 12, 2007
    Date of Patent: November 1, 2011
    Assignee: Waseda University
    Inventors: Hironori Kasahara, Keiji Kimura, Hiroaki Shikano
  • Publication number: 20110265068
    Abstract: A mechanism is provided for improving single-thread performance for a multi-threaded, in-order processor core. In a first phase, a compiler analyzes application code to identify instructions that can be executed in parallel with focus on instruction-level parallelism and removing any register interference between the threads. The compiler inserts as appropriate synchronization instructions supported by the apparatus to ensure that the resulting execution of the threads is equivalent to the execution of the application code in a single thread. In a second phase, an operating system schedules the threads produced in the first phase on the hardware threads of a single processor core such that they execute simultaneously. In a third phase, the microprocessor core executes the threads specified by the second phase such that there is one hardware thread executing an application thread.
    Type: Application
    Filed: April 27, 2010
    Publication date: October 27, 2011
    Applicant: International Business Machines Corporation
    Inventors: Elmootazbellah N. Elnozahy, Ahmed Gheith
  • Patent number: 8046750
    Abstract: Core commands and aggregations of such commands are provided to programmers to enable them to generate programs that can be parallel-processed without requiring the programmer to be aware of parallel-processing techniques. The core commands and aggregations abstract mechanisms that can be executed in parallel, enabling the programmer to focus on higher-level concepts. The core commands provided include commands for applying a function in parallel and distributing and joining data in parallel. The output of each core command can implement an interface that can enable underlying mechanisms to stitch together multiple core commands in a cohesive manner to perform more complex actions.
    Type: Grant
    Filed: June 13, 2007
    Date of Patent: October 25, 2011
    Assignee: Microsoft Corporation
    Inventors: William D Ramsey, Ronnie I Chaiken
  • Publication number: 20110252411
    Abstract: A device receives program code, and receives size/type information associated with inputs to the program code. The device determines, prior to execution of the program code and based on the input size/type information, a portion of the program code that is executable by a graphical processing unit (GPU), and determines, prior to execution of the program code and based on the input size/type information, a portion of the program code that is executable by a central processing unit (CPU). The device compiles the GPU-executable portion of the program code to create a compiled GPU-executable portion of the program code, and compiles the CPU-executable portion of the program code to create a compiled CPU-executable portion of the program code. The device provides, to the GPU for execution, the compiled GPU-executable portion of the program code, and provides, to the CPU for execution, the compiled CPU-executable portion of the program code.
    Type: Application
    Filed: September 30, 2010
    Publication date: October 13, 2011
    Applicant: THE MATHWORKS, INC.
    Inventors: Jocelyn Luke MARTIN, Joseph F. HICKLIN
  • Patent number: 8037463
    Abstract: The present invention provides for a system for computer program functional partitioning for heterogeneous multi-processing systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A whole program representation is generated based on received computer program code. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one node-specific SESE region is identified based on identified SESE regions and the at least one system parameter. Each node-specific SESE region is grouped into a node-specific subroutine. Each node-specific subroutine is compiled based on a specified node characteristic. The computer program code is modified based on the node-specific subroutines and the modified computer program code is compiled.
    Type: Grant
    Filed: January 8, 2009
    Date of Patent: October 11, 2011
    Assignee: International Business Machines Corporation
    Inventors: Kathryn M. O'Brien, John Kevin Patrick O'Brien
  • Patent number: 8037462
    Abstract: A method for providing parallel processing capabilities including: performing scalar and array privatization analysis via a compiler; checking whether an assignment statement is reducible; recognizing reduction patterns through a pattern matching algorithm; classifying a reduction type of each of the reduction patterns; and performing transformations and code generation for each reduction the reduction type of each of the reduction patterns.
    Type: Grant
    Filed: August 2, 2006
    Date of Patent: October 11, 2011
    Assignee: International Business Machines Corporation
    Inventors: Roch G. Archambault, Yaoqing Gao, Zhixing Ren, Raul E. Silvera
  • Patent number: 8037466
    Abstract: Critical sections used for multiple threads in a parallel program to access shared resource may be selected to merge with each other to reduce the number of signals/tokens used to create critical sections. Critical section merge may be based on a summarized dependence graph which is obtained from an instruction level dependence graph constructed based on a result of critical section minimization.
    Type: Grant
    Filed: December 29, 2006
    Date of Patent: October 11, 2011
    Assignee: Intel Corporation
    Inventors: Xiaofeng Guo, Jinquan Dai, Long Li
  • Patent number: 8032872
    Abstract: To execute legacy smart card applications in a next generation smart card environment, a mechanism converts the applications into a format executable by the next generation smart card platforms. For instance, in a Java-based environment, a normalizer tool translates a CAP file into a Java Class file. Additional mechanisms recreate, on next generation smart cards, a specialized environment that allows the legacy applications to execute without impacting legacy and non-legacy application performance. For example, mechanisms create new instances of previously shared objects so that legacy applications can continue to expect exclusive access to those objects. Moreover, mechanisms manage the communication between a legacy application and non-legacy applications by controlling how and when calls are sent to the legacy application.
    Type: Grant
    Filed: December 18, 2006
    Date of Patent: October 4, 2011
    Assignee: Oracle America, Inc.
    Inventors: Thierry P. Violleau, Tanjore S. Ravishankar, Matthew R. Hill, Saqib Ahmad
  • Patent number: 8032873
    Abstract: The present invention provides for a system for computer program code size partitioning for multiple memory multi-processor systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A program representation based on received computer program code is generated. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one SESE region of less than a certain size (store-size-specific) is identified based on identified SESE regions and the at least one system parameter. Each store-size-specific SESE region is grouped into a node-specific subroutine. The non node-specific parts of the computer program code are modified based on the partitioning into node-specific subroutines. The modified computer program code including each node-specific subroutine is compiled based on a specified node characteristic.
    Type: Grant
    Filed: December 17, 2008
    Date of Patent: October 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: Kathryn M. O'Brien, John Kevin Patrick O'Brien
  • Publication number: 20110239201
    Abstract: A method and system for parallelization of sequential computer program code are described. In one embodiment, an automatic parallelization system includes a syntactic analyser to analyze the structure of the sequential computer program code to identify the positions to insert SPI to the sequential computer code; a profiler for profiling the sequential computer program code by preparing call graph to determine dependency of each line of the sequential computer program code and the time required for the execution of each function of the sequential computer program code; an analyzer to determine parallelizability of the sequential computer program code from the information obtained by analysing and profiling of the sequential computer program code; and a code generator to insert SPI to the sequential computer program code upon determination of parallelizability to obtain parallel computer program code, which is further outputted to a parallel computing environment for execution and the method thereof.
    Type: Application
    Filed: December 1, 2009
    Publication date: September 29, 2011
    Inventors: Vinay G. Vaidya, Ranadive Priti, Sah Sudhakar
  • Patent number: 8028299
    Abstract: The present invention discloses a modified computer architecture (50, 71, 72) which enables an applications program (50) to be run simultaneously on a plurality of computers (M1, . . . Mn). Shared memory at each computer is updated with amendments and/or overwrites so that all memory read requests are satisfied locally. During initial program loading (75), or similar, instructions which result in memory being re-written or manipulated are identified (92). Additional instructions are inserted (103) to cause the equivalent memory locations at all computers to be updated. In particular, the finalization of JAVA language classes and objects is disclosed (162, 163) so finalization only occurs when the last class or object present on all machines is no longer required.
    Type: Grant
    Filed: October 25, 2005
    Date of Patent: September 27, 2011
    Assignee: Waratek Pty, Ltd.
    Inventor: John Matthew Holt
  • Patent number: 8028281
    Abstract: Parallelization of loops is performed for loops having indirect loop index variables and embedded conditional statements in the loop body. Loops having any finite number of array variables in the loop body, and any finite number of indirect loop index variables can be parallelized. There are two particular limitations of the described techniques: (i) that there are no cross-iteration dependencies in the loop other than through the indirect loop index variables; and (ii) that the loop index variables (either direct or indirect) are not redefined in the loop body.
    Type: Grant
    Filed: January 5, 2007
    Date of Patent: September 27, 2011
    Assignee: International Business Machines Corporation
    Inventor: Rajendra K. Bera
  • Patent number: 8010953
    Abstract: Performing scalar operations using a SIMD data parallel execution unit is provided. With the mechanisms of the illustrative embodiments, scalar operations in application code are identified that may be executed using vector operations in a SIMD data parallel execution unit. The scalar operations are converted, such as by a static or dynamic compiler, into one or more vector load instructions and one or more vector computation instructions. In addition, control words may be generated to adjust the alignment of the scalar values for the scalar operation within the vector registers to which these scalar values are loaded using the vector load instructions. The alignment amounts for adjusting the scalar values within the vector registers may be statically or dynamically determined.
    Type: Grant
    Filed: April 4, 2006
    Date of Patent: August 30, 2011
    Assignee: International Business Machines Corporation
    Inventor: Michael K. Gschwind
  • Patent number: 8010954
    Abstract: A computing device-implemented method includes receiving a program created by a technical computing environment, analyzing the program, generating multiple program portions based on the analysis of the program, dynamically allocating the multiple program portions to multiple software units of execution for parallel programming, receiving multiple results associated with the multiple program portions from the multiple software units of execution, and providing the multiple results or a single result to the program.
    Type: Grant
    Filed: May 15, 2007
    Date of Patent: August 30, 2011
    Assignee: The MathWorks, Inc.
    Inventors: John N. Little, Joseph F. Hicklin, Jocelyn Luke Martin, Nausheen B. Moulana, Halldor N. Stefansson, Loren Dean, Roy E. Lurie, Stephen C. Johnson, Penelope L. Anderson, Michael E. Karr, Jason A. Kinchen
  • Publication number: 20110209129
    Abstract: A parallelization method, system and program. A program expressed by a block diagram or the like is divided into strands and a balance in calculation time is made among the strands. The functional blocks are divided into strands and the strand involving the maximum calculation time from a strand set is found. One or more movable blocks in the strand involving the maximum calculation time is found. The next step is obtaining calculation time of each strand after the movable block is moved to the strand in the input or output direction according to its property, and moving the block to a strand most largely reducing the calculation time of the strand having the maximum calculation time before the movement. This process loops until calculation time is no longer reduced. Strands are then transformed into source codes. Source codes are compiled and assigned to separate cores or processors for execution.
    Type: Application
    Filed: February 22, 2011
    Publication date: August 25, 2011
    Applicant: International Business Machines Corporation
    Inventors: Hideaki Komatsu, Takeo Yoshizawa
  • Patent number: 8006238
    Abstract: A process, compiler, computer program product and system for workload partitioning in a heterogeneous system. The process includes determining heterogeneous alignment constraints in the workload, partitioning a portion of tasks to a processing element sensitive to alignment constraints, and partitioning a remaining portion of tasks to a processing element not sensitive to alignment constraints.
    Type: Grant
    Filed: September 26, 2006
    Date of Patent: August 23, 2011
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Kathryn M. O'Brien, Tong Chen
  • Patent number: 7996827
    Abstract: A method for advantageously translating high-level language codes for data processing using a reconfigurable architecture, memories addressable internally from within said reconfigurable architecture, and memories external to said reconfigurable architecture, may include constructing a finite automaton for computation in such a way that a complex combinatory network of individual functions is formed, assigning memories to the network for storage of operands and results, and separating external memory accesses for providing a transfer of at least one of operands and results as data from an external memory to a memory addressable internally by the reconfigurable architecture.
    Type: Grant
    Filed: August 16, 2002
    Date of Patent: August 9, 2011
    Inventors: Martin Vorbach, May Frank, Armin Nückel
  • Patent number: 7991799
    Abstract: A computer-implemented method of creating a schema specific parser for processing Extensible Markup Language (XML) documents can include identifying a plurality of XML processing templates, wherein each of the plurality of XML processing templates performs a specific task of processing an XML document against an XML schema component. An XML schema including a plurality of components can be received. A hierarchy of the plurality of components of the XML schema can be determined. An execution plan specifying a hierarchy of XML processing instructions can be created, wherein each XML processing instruction is associated with an XML processing template from the plurality of XML processing templates. The hierarchy of the XML processing templates can be determined according to the hierarchy of components of the XML schema. The execution plan can be compiled to generate the schema specific parser. The schema specific parser can be output.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: August 2, 2011
    Assignee: International Business Machines Corporation
    Inventors: Abraham Heifets, Margaret G. Kostoulas, Moshe Morris Emanuel Matsa, Eric Perkins
  • Patent number: 7987227
    Abstract: The present invention provides a method and system for the dynamic distribution of an array in a parallel computing environment. The present invention obtains a criterion for distributing an array and performs flexible portioning based on the obtained criterion. In some embodiment analysis may be performed based on the criterion. The flexible portioning is then performed based on the analysis.
    Type: Grant
    Filed: May 12, 2010
    Date of Patent: July 26, 2011
    Assignee: The MathWorks, Inc.
    Inventors: Penelope Anderson, Cleve Moler, Sheung Hun Cheng, Patrick D. Quillen
  • Patent number: 7979844
    Abstract: This invention teaches a way of implementing formally verified massively parallel programs, which run efficiently in distributed and shared-memory multi-core chips. It allows programs to be developed from an initial abstract statement of interactions among parallel software components, called cells, and progressively refine them to their final implementation. At each stage of refinement a formal description of patterns of events in computations is derived automatically from implementations. This formal description is used for two purposes: One is to prove correctness, timings, progress, mutual exclusion, and freedom from deadlocks/livelocks, etc. The second is to automatically incorporate into each application a Self-Monitoring System (SMS) that constantly monitors the application in parallel, with no interference with its timings, to identify and report errors in performance, pending errors, and patterns of critical behavior.
    Type: Grant
    Filed: May 5, 2009
    Date of Patent: July 12, 2011
    Assignee: EDSS, Inc.
    Inventor: Chitoor V. Srinivasan
  • Publication number: 20110167416
    Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
    Type: Application
    Filed: December 25, 2010
    Publication date: July 7, 2011
    Inventors: David J. Sager, Ruchira Sasanka, Ron Gabor, Shlomo Raikin, Joseph Nuzman, Leeor Peled, Jason A. Domer, Ho-Seop Kim, Youfeng Wu, Koichi Yamada, Tin-Fook Ngai, Howard H. Chen, Jayaram Bobba, Jeffery J. Cook, Omar M. Shaikh, Suresh Srinivas
  • Publication number: 20110167417
    Abstract: A first compiler generates one or more object codes from a program code for a first processor included in an arithmetic processing system to which a plurality of processors are mutually connected. A first linker links the generated one or more object codes to generate an execution file for the first processor. A parameter information generation unit generates, based on the information acquired from the first linker, parameter information used in a second processor included in the arithmetic processing system. A second compiler refers to a program code and the parameter information for the second processor to generate one or more object codes. A second linker links the generated one or more object codes to generate an execution file for the second processor.
    Type: Application
    Filed: July 23, 2009
    Publication date: July 7, 2011
    Inventor: Tomoyoshi Kobori
  • Publication number: 20110161944
    Abstract: Provided is a method of transforming program code written such that a plurality of work-items are allocated respectively to and concurrently executed on a plurality of processing elements included in a computing unit. A program code translator may identify, in the program code, two or more code regions, which are to be enclosed by work-item coalescing loops (WCLs), based on a synchronization barrier function contained in the program code, such that the work-items are serially executable on a smaller number of processing elements than a number of the processing elements, and may enclose the identified code regions with the WCLs, respectively.
    Type: Application
    Filed: December 23, 2010
    Publication date: June 30, 2011
    Applicants: SAMSUNG ELECTRONICS CO., LTD., SUN R&DB FOUNDATION
    Inventors: Seung-Mo Cho, Jong-Deok Choi, Jaejin Lee
  • Publication number: 20110161943
    Abstract: A method provides efficient dispatch/completion of an N Dimensional (ND) Range command in a data processing system (DPS). The method comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one (1) dimension (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of 1D work items in ordered sequence from to one or more processing units; and generating an ND Range output by mapping the 1D work output result to an ND position corresponding to an original location of the operand represented by the 1D work item.
    Type: Application
    Filed: December 30, 2009
    Publication date: June 30, 2011
    Applicant: IBM CORPORATION
    Inventors: Gregory H. Bellows, Brian H. Horton, Joaquin Madruga, Barry L. Minor
  • Patent number: 7962906
    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.
    Type: Grant
    Filed: March 15, 2007
    Date of Patent: June 14, 2011
    Assignee: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
  • Patent number: 7962799
    Abstract: A system and method provide for test automation of a process running on separated systems. The systems may be separated physically and/or logically separated. The system and method provide that all information required for a test run are made available on one system. In an embodiment, a central component is used to provide all required status and result information regarding the test status of every system in the test landscape. In further embodiment, an extension of the capabilities of existing test tools is made so that the test tool communicates with the central component via an appropriate protocol.
    Type: Grant
    Filed: December 30, 2007
    Date of Patent: June 14, 2011
    Assignee: SAP AG
    Inventors: Michael Lauer, Frank Reisenhofer
  • Publication number: 20110131558
    Abstract: A method comprising: independently compiling a plurality of modules of source code to generate a plurality of respective object modules comprising a plurality of respective parallel threads explicitly designated by a user to be executed in parallel on a target platform; in each of the object modules, inserting at least one symbol indicative of a usage of a resource of the target platform associated with the respective thread; executing a linker to perform a linking process for linking the object modules, wherein the linking process comprises assessing the symbols in conjunction with one another, and based on the assessment generating an indication relating to a usage of the resource required for execution of the threads in parallel.
    Type: Application
    Filed: May 11, 2009
    Publication date: June 2, 2011
    Applicant: Xmos Limited
    Inventors: Martin Young, Richard Osborne, Douglas Watt