For A Parallel Or Multiprocessor System Patents (Class 717/149)
  • Patent number: 8261252
    Abstract: A method and system for converting application code into optimized application code or into execution code suitable for execution on a computation engine with an architecture comprising at least a first and a second level of data memory units are disclosed. In one aspect, the method comprises obtaining application code, the application code comprising data transfer operations between the levels of memory units. The method further comprises converting at least a part of the application code. The converting of application code comprises scheduling of data transfer operations from a first level of memory units to a second level of memory units such that accesses of data accessed multiple times are brought closer together in time than in the original code.
    Type: Grant
    Filed: March 26, 2008
    Date of Patent: September 4, 2012
    Assignees: IMEC, Katholieke Universiteit Leuven
    Inventors: Praveen Raghavan, Murali Jayapala, Francky Catthoor, Absar Javed, Andy Lambrechts
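A minimal sketch (all names hypothetical, not the patented method) of the scheduling idea in the abstract above: given a trace of block transfers between memory levels, independent transfers are reordered so that repeated accesses to the same block land closer together in time, shortening the interval over which the block must stay resident in the smaller memory.
```python
from collections import defaultdict

def cluster_reuses(transfers):
    """Greedily reorder a list of block transfer operations so that
    accesses to the same block become adjacent, shrinking reuse distance.
    Assumes the transfers are independent and may be freely reordered."""
    first_seen = {}
    groups = defaultdict(list)
    for i, block in enumerate(transfers):
        if block not in first_seen:
            first_seen[block] = i
        groups[block].append(block)
    # Emit each block's accesses back-to-back, keeping the original
    # order of first appearance so the overall schedule stays stable.
    schedule = []
    for block in sorted(groups, key=first_seen.get):
        schedule.extend(groups[block])
    return schedule

trace = ["A", "B", "A", "C", "B", "A"]
print(cluster_reuses(trace))   # ['A', 'A', 'A', 'B', 'B', 'C']
```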
  • Patent number: 8261249
    Abstract: Embodiments of the invention provide a method for deploying and running an application on a massively parallel computer system, while minimizing the costs associated with latency, bandwidth, and limited memory resources. The executable code of a program may be divided into multiple code fragments and distributed to different compute nodes of a parallel computing system. During program execution, one compute node may fetch code fragments from other compute nodes as necessary.
    Type: Grant
    Filed: January 8, 2008
    Date of Patent: September 4, 2012
    Assignee: International Business Machines Corporation
    Inventors: Charles Jens Archer, Thomas Michael Gooding, Ruth Janine Poole, Albert Sidelnik
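A toy sketch (hypothetical names, not the patented implementation) of the fetch-on-demand idea: each compute node holds only some code fragments locally and pulls missing fragments from peer nodes the first time they are invoked, caching them afterwards.
```python
class ComputeNode:
    """Simulates a node that caches code fragments fetched from peers."""
    def __init__(self, name, local_fragments, peers=None):
        self.name = name
        self.fragments = dict(local_fragments)   # fragment name -> callable
        self.peers = peers or []

    def fetch(self, frag_name):
        """Return a fragment, pulling it from a peer on a local miss."""
        if frag_name not in self.fragments:
            for peer in self.peers:
                if frag_name in peer.fragments:
                    self.fragments[frag_name] = peer.fragments[frag_name]
                    break
            else:
                raise KeyError(f"{frag_name} not found on any node")
        return self.fragments[frag_name]

    def run(self, frag_name, *args):
        return self.fetch(frag_name)(*args)

node_b = ComputeNode("B", {"square": lambda x: x * x})
node_a = ComputeNode("A", {"double": lambda x: 2 * x}, peers=[node_b])
print(node_a.run("square", 7))   # fetched from node B on first use -> 49
```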
  • Patent number: 8255890
    Abstract: One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The media also store one or more instructions for transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The media further store one or more instructions for receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 28, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
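A rough sketch (all names hypothetical; not MathWorks code) of the flow the abstract describes: choose among candidate data distribution schemes, hand the distributed pieces to worker "labs" for parallel execution, and return the gathered results to the calling program.
```python
from concurrent.futures import ThreadPoolExecutor

def row_blocks(matrix, n):
    """Split a list-of-rows matrix into n contiguous row blocks."""
    step = (len(matrix) + n - 1) // n
    return [matrix[i:i + step] for i in range(0, len(matrix), step)]

def col_blocks(matrix, n):
    """Split a matrix into n contiguous column blocks."""
    cols = len(matrix[0])
    step = (cols + n - 1) // n
    return [[row[i:i + step] for row in matrix] for i in range(0, cols, step)]

def pick_scheme(matrix, schemes, n_labs):
    """Pick the scheme whose largest partition is smallest (a toy cost model)."""
    def cost(scheme):
        parts = scheme(matrix, n_labs)
        return max(sum(len(r) for r in part) for part in parts)
    return min(schemes, key=cost)

def run_parallel(matrix, n_labs=2):
    scheme = pick_scheme(matrix, [row_blocks, col_blocks], n_labs)
    parts = scheme(matrix, n_labs)
    with ThreadPoolExecutor(max_workers=n_labs) as labs:
        partials = list(labs.map(lambda p: sum(sum(r) for r in p), parts))
    return sum(partials)   # combine the lab results for the calling program

m = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(run_parallel(m))   # 36
```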
  • Patent number: 8255889
    Abstract: A device, for performing parallel processing, includes a processor to receive one or more portions of an inner context of a program created for a technical computing environment, and allocate one or more portions of the inner context of the program to two or more labs for parallel execution. The processor is also configured to receive one or more results associated with the parallel execution of the one or more portions from the two or more labs, and provide the one or more results to an outer context of the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 28, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N. Stefansson, Brett Baker, Edric Ellis, Joseph F. Hicklin, John N. Little, Jocelyn Luke Martin, Piotr R. Luszczek, Nausheen B. Moulana, Loren Dean, Roy E. Lurie
  • Patent number: 8255911
    Abstract: According to one embodiment, a parallel processing optimization method is provided for an apparatus that dynamically assigns basic modules to threads and executes the threads in parallel on execution modules, where the program is divided into basic modules that are executable asynchronously with respect to one another and an execution rule defines their execution order. The method includes managing the assigned basic modules together with the identifiers of the threads to which they are assigned, managing an executable set of assignable basic modules, calculating the data transfer cost of each basic module in that set, and selecting the basic module with the minimum transfer cost.
    Type: Grant
    Filed: April 27, 2010
    Date of Patent: August 28, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Ryuji Sakai
  • Patent number: 8255884
    Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
    Type: Grant
    Filed: June 6, 2008
    Date of Patent: August 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
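A simplified sketch (hypothetical toy IR, not IBM's implementation) of the splat-placement idea: walk an instruction list and insert a "splat" (broadcast of a scalar to all vector lanes) only where a scalar definition actually feeds a SIMD operation.
```python
def insert_splats(ir):
    """ir is a list of (op, dest, operands) tuples.  Ops whose name starts
    with 'v' are SIMD; everything else is scalar.  A splat is inserted
    before a SIMD op for each scalar operand it consumes."""
    scalar_defs = set()
    out = []
    for op, dest, operands in ir:
        if op.startswith("v"):
            new_operands = []
            for src in operands:
                if src in scalar_defs:
                    vec = f"{src}_splat"
                    out.append(("vsplat", vec, [src]))   # broadcast the scalar
                    new_operands.append(vec)
                else:
                    new_operands.append(src)
            out.append((op, dest, new_operands))
        else:
            scalar_defs.add(dest)
            out.append((op, dest, operands))
    return out

ir = [("add", "s1", ["a", "b"]),          # scalar add
      ("vmul", "v1", ["s1", "v0"])]       # SIMD multiply consuming scalar s1
for instr in insert_splats(ir):
    print(instr)
```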
  • Patent number: 8250548
    Abstract: A heterogeneous multiprocessor system including a plurality of processor elements having mutually different instruction sets and structures prevents a specific processor element from running short of resources, improving throughput. An executable task is extracted based on a preset dependency relationship between a plurality of tasks, and a plurality of first processors are allocated to a general-purpose processor group based on the dependency relationship among the extracted tasks. A second processor is allocated to an accelerator group, a task to be allocated is determined from the extracted tasks based on a priority value for each task, and the cost of executing the determined task on the first processor is compared with the cost of executing it on the second processor. The task is allocated to whichever of the general-purpose processor group and the accelerator group is judged to have the lower cost.
    Type: Grant
    Filed: January 23, 2007
    Date of Patent: August 21, 2012
    Assignee: Waseda University
    Inventors: Hironori Kasahara, Keiji Kimura, Jun Shirako, Yasutaka Wada, Masaki Ito, Hiroaki Shikano
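A compact sketch (hypothetical cost numbers and names, not the patented scheduler) of the cost comparison above: each ready task is costed on both the general-purpose processor group and the accelerator group, and is handed to whichever group is cheaper.
```python
def schedule(tasks, deps, cpu_cost, acc_cost):
    """tasks: iterable of task names; deps: dict task -> set of prerequisites;
    cpu_cost / acc_cost: dict task -> estimated execution cost on each group.
    Returns a list of (task, chosen_group) in a dependency-respecting order."""
    done, order = set(), []
    remaining = set(tasks)
    while remaining:
        ready = [t for t in remaining if deps.get(t, set()) <= done]
        # Highest priority first: here, priority = number of dependants (toy rule).
        ready.sort(key=lambda t: -sum(t in deps.get(u, set()) for u in remaining))
        for t in ready:
            group = "cpu" if cpu_cost[t] <= acc_cost[t] else "accelerator"
            order.append((t, group))
            done.add(t)
            remaining.remove(t)
    return order

deps = {"C": {"A", "B"}}
print(schedule(["A", "B", "C"], deps,
               cpu_cost={"A": 5, "B": 2, "C": 9},
               acc_cost={"A": 8, "B": 1, "C": 3}))
# [('A', 'cpu'), ('B', 'accelerator'), ('C', 'accelerator')]
```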
  • Patent number: 8250550
    Abstract: A computing device-implemented method includes initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The method also includes transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The method further includes receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 21, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
  • Patent number: 8250549
    Abstract: A computer implemented tool is provided for assisting in the mapping of a computer program to a data processing apparatus in which multiple physical instances of a logical variable in the computer program are required. A computer program is provided as the input to the tool, which analyses the data flow of the program and identifies multiple-physical-instance requirements for logical variables. The tool adds mapping support commands, such as instantiation commands, Direct Memory Access (DMA) move commands and the like, as necessary to support the mapping of the computer program to a data processing apparatus.
    Type: Grant
    Filed: October 23, 2007
    Date of Patent: August 21, 2012
    Assignee: ARM Limited
    Inventors: Alastair David Reid, Edmund Grimley-Evans, Simon Andrew Ford
  • Patent number: 8250547
    Abstract: The present invention provides a method and system for loading and running program images rapidly in a multi-processor system. The method comprises the steps of: starting in a synergistic processor a synergistic processing program listener, which is configured to listen to a notification from a main processor; calling in the main processor a run-synergistic-processing-program function which is configured to notify the synergistic processing program listener to run a synergistic processing program image which is part of the program image and has been transferred to the local store of the synergistic processor; and the synergistic processing program listener running the synergistic processing program image in response to receiving the notification.
    Type: Grant
    Filed: August 22, 2008
    Date of Patent: August 21, 2012
    Assignee: International Business Machines Corporation
    Inventors: Wen Jun Wang, Xiu Hua Huang, Jian Chen, Yuan Li
  • Patent number: 8245207
    Abstract: A method for executing uniprocessor (UP) coded workloads in a computer capable of concurrent thread execution is disclosed. The method identifies threads in the uniprocessor coded workloads (UP-workloads) which can execute concurrently, and identifies threads in the UP-workloads which cannot execute concurrently. First threads which cannot execute concurrently are assigned to a first concurrency group. Second threads which cannot execute concurrently are assigned to a second concurrency group. Any thread in the first concurrency group can execute concurrently with any thread in the second concurrency group. The computer capable of concurrent thread execution then executes the UP-coded workloads in the first concurrency group at substantially the same time as executing the UP-coded workloads in the second concurrency group.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: August 14, 2012
    Assignee: NetApp, Inc.
    Inventors: Robert M. English, Zdenko Kukavica, Konstantinos Roussos
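A small sketch (hypothetical, using Python threads rather than the patented kernel mechanism) of the concurrency-group idea above: threads that must not run concurrently share one lock, so work within a group serializes while work in different groups can overlap.
```python
import threading

class ConcurrencyGroup:
    """Work items in the same group are serialized by a shared lock;
    items in different groups may run at the same time."""
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()

    def run(self, work, *args):
        with self.lock:          # serialize within the group
            return work(*args)

group1, group2 = ConcurrencyGroup("g1"), ConcurrencyGroup("g2")
results = []

def workload(tag):
    results.append(tag)

threads = [threading.Thread(target=group1.run, args=(workload, "g1-a")),
           threading.Thread(target=group1.run, args=(workload, "g1-b")),
           threading.Thread(target=group2.run, args=(workload, "g2-a"))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # g1-a and g1-b never overlap; g2-a may interleave with either
```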
  • Patent number: 8245212
    Abstract: The claimed subject matter provides systems and mechanisms that create frame accurate call trees for threads. The system can include devices that determine the thread on which a break point or event halted execution, identify the location of the event that halted execution, set break points at multiple locations during a stopping event (where stopping events do not typically involve setting or unsetting break points), and construct the frame accurate call tree subset for the thread based at least in part on the break point or event that halted execution and information gleaned from an inspection of the call stack associated with that event or break point.
    Type: Grant
    Filed: February 22, 2008
    Date of Patent: August 14, 2012
    Assignee: Microsoft Corporation
    Inventor: Steven J. Steiner
  • Patent number: 8239843
    Abstract: Parallelize a computer program by scoping program variables at compile time and inserting code into the program. Identify as value-predictable variables those variables that are: defined only once in a loop of the program; not defined in any inner loop of that loop; and used in the loop. Optionally also: identify a code block in the program that contains a variable assignment, and then traverse a path backwards from the block through a control flow graph of the program. Collect in a set all blocks along the path up to a loop header block. For each block in the set, determine the program blocks that logically succeed it and are not in the set. Identify all paths between the block and the determined blocks as failure paths, and insert code into the failure paths. When executed at run time of the program, the inserted code fails the corresponding path.
    Type: Grant
    Filed: March 11, 2008
    Date of Patent: August 7, 2012
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Xiangyun Kong, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 8239846
    Abstract: A device for performing parallel processing includes a processor to initiate a single programming language, and identify, via the single programming language, one or more data distribution schemes for executing a program. The processor also transforms, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocates the parallel program to two or more labs for parallel execution. The processor further receives one or more results associated with the parallel execution of the parallel program from the two or more labs, and provides the one or more results to the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 7, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
  • Patent number: 8239844
    Abstract: A computing device-implemented method includes receiving a program, analyzing and transforming the program, determining an inner context and an outer context of the program based on the analysis of the program, and allocating one or more portions of the inner context of the program to two or more labs for parallel execution. The method also includes receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to the outer context of the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 7, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N. Stefansson, Brett Baker, Edric Ellis, Joseph F. Hicklin, John N. Little, Jocelyn Luke Martin, Piotr R. Luszczek, Nausheen B. Moulana, Loren Dean, Roy E. Lurie
  • Patent number: 8239866
    Abstract: Software rendering and fine grained parallelism are utilized to reduce/avoid memory latency in a multi-processor (MP) system. According to one embodiment, the management of the transfer of data from one processor to another in the MP environment is moved into a low overhead hardware system. The low overhead hardware system may be a FIFO (“First In First Out”) hardware control. Each FIFO may be real or virtual.
    Type: Grant
    Filed: April 24, 2009
    Date of Patent: August 7, 2012
    Assignee: Microsoft Corporation
    Inventor: Susan Carrie
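A minimal sketch (hypothetical, software-only) of the FIFO hand-off the abstract describes: one worker pushes rendered tiles into a bounded FIFO and another consumes them, so neither side waits on the other except when the queue is empty or full.
```python
import queue
import threading

fifo = queue.Queue(maxsize=4)        # bounded FIFO between two workers
SENTINEL = None

def producer():
    for tile in range(8):            # pretend these are rendered tiles
        fifo.put(tile)               # blocks only when the FIFO is full
    fifo.put(SENTINEL)

def consumer():
    while True:
        tile = fifo.get()            # blocks only when the FIFO is empty
        if tile is SENTINEL:
            break
        print("processed tile", tile)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```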
  • Patent number: 8239845
    Abstract: One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for receiving one or more portions of an inner context of a program created for a technical computing environment, allocating one or more portions of the inner context of the program to two or more labs for parallel execution, receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to an outer context of the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 7, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N. Stefansson, Brett Baker, Edric Ellis, Joseph F. Hicklin, John N. Little, Jocelyn Luke Martin, Piotr R. Luszczek, Nausheen B. Moulana, Loren Dean, Roy E. Lurie
  • Patent number: 8239847
    Abstract: General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program written in a high-level language are automatically translated into a distributed execution plan. Map and reduction computations are automatically added to the plan. Patterns in the sequential program can be automatically identified to trigger map and reduction processing. Direct invocation of map and reduction processing is also provided. One or more portions of the reduce computation are pushed to the map stage and dynamic aggregation is inserted when possible. The system automatically identifies opportunities for partial reductions and aggregation, but also provides a set of extensions in a high-level computing language for the generation and optimization of the distributed execution plan. The extensions include annotations to declare functions suitable for these optimizations.
    Type: Grant
    Filed: March 18, 2009
    Date of Patent: August 7, 2012
    Assignee: Microsoft Corporation
    Inventors: Yuan Yu, Pradeep Kumar Gunda, Michael A Isard
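A compact sketch (hypothetical, not the patented plan generator) of pushing part of the reduction into the map stage: each map partition pre-aggregates its own keys, so the final reduce only merges small partial dictionaries.
```python
from collections import Counter
from functools import reduce

def map_with_partial_reduce(partition):
    """Map each record to (word, 1) pairs and combine them locally."""
    local = Counter()
    for record in partition:
        for word in record.split():
            local[word] += 1
    return local                      # partial aggregation, done map-side

def final_reduce(partials):
    """Merge the small per-partition dictionaries."""
    return reduce(lambda a, b: a + b, partials, Counter())

partitions = [["a b a", "c"], ["b b", "a c"]]
partials = [map_with_partial_reduce(p) for p in partitions]   # could run in parallel
print(final_reduce(partials))   # Counter({'a': 3, 'b': 3, 'c': 2})
```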
  • Patent number: 8234635
    Abstract: In a multi-processor system for performing a parallel processing, each of a plurality of processors includes a communication processing unit for performing control between the processors in a data flow machine-type data-driven control method; and a program processing unit for performing control in each processor in a von Neumann-type program-driven control method. The communication processing unit performs a communication between the processors in synchronization with the program processing unit, and has a function of detecting a communication data hazard between the processors. The program processing unit performs a processing based on an execution code stored in a local memory, and has a function of executing or suspending the execution code, according to a result of detecting the data hazard.
    Type: Grant
    Filed: January 16, 2007
    Date of Patent: July 31, 2012
    Assignee: Tokyo Institute of Technology
    Inventors: Tsuyoshi Isshiki, Hiroaki Kunieda
  • Publication number: 20120192166
    Abstract: Embodiments of a system and method for generating an image configured to program a parallel machine from source code are disclosed. One such parallel machine includes a plurality of state machine elements (SMEs) grouped into pairs, such that SMEs in a pair have a common output. One such method includes converting source code into an automaton comprising a plurality of interconnected states, and converting the automaton into a netlist comprising instances corresponding to states in the automaton, wherein converting includes pairing states corresponding to pairs of SMEs based on the fact that SMEs in a pair have a common output. The netlist can be converted into the image and published.
    Type: Application
    Filed: January 24, 2012
    Publication date: July 26, 2012
    Inventors: Junjuan Xu, Paul Glendenning
  • Publication number: 20120192165
    Abstract: Apparatus, systems, and methods for a compiler are disclosed. One such compiler parses a human readable expression into a syntax tree and converts the syntax tree into an automaton having in-transitions and out-transitions. Converting can include unrolling a quantification in the expression as a function of in-degree limitations, where an in-degree limitation is a limit on the number of transitions into a state of the automaton. The compiler can also convert the automaton into an image for programming a parallel machine, and publish the image. Additional apparatus, systems, and methods are disclosed.
    Type: Application
    Filed: January 24, 2012
    Publication date: July 26, 2012
    Inventors: Junjuan Xu, Paul Glendenning
  • Publication number: 20120192164
    Abstract: Apparatus, systems, and methods for a compiler are described. One such compiler generates machine code corresponding to a set of elements including a general purpose element and a special purpose element. The compiler identifies a portion in an arrangement of relationally connected operators that corresponds to a special purpose element. The compiler also determines whether the portion meets a condition to be mapped to the special purpose element. The compiler also converts the arrangement into an automaton comprising a plurality of states, wherein the portion is converted using a special purpose state that corresponds to the special purpose element if the portion meets the condition. The compiler also converts the automaton into machine code. Additional apparatus, systems, and methods are disclosed.
    Type: Application
    Filed: January 24, 2012
    Publication date: July 26, 2012
    Inventors: Junjuan Xu, Paul Glendenning
  • Patent number: 8230410
    Abstract: An enhanced mechanism for parallel execution of computer programs utilizes a bidding model to allocate additional registers and execution units for stretches of code identified as opportunities for microparallelization. A microparallel processor architecture apparatus permits software (e.g. compiler) to implement short-term parallel execution of stretches of code identified as such before execution. In one embodiment, an additional paired unit, if available, is allocated for execution of an identified stretch of code. Each additional paired unit includes an execution unit and a half set of registers. This apparatus is available for compilers or assembler language coders to use and allows software to unlock parallel execution capabilities that are present in existing computer programs but heretofore were executed sequentially for lack of a suitable apparatus.
    Type: Grant
    Filed: October 26, 2009
    Date of Patent: July 24, 2012
    Assignee: International Business Machines Corporation
    Inventor: Larry W. Loen
  • Patent number: 8230408
    Abstract: In one embodiment, a hardware implementation of an electronic system may be realized by compiling the HDL description into an executable form and executing the processor instructions. By applying data flow separation technique, the operations of the system can be effectively mapped into the instruction set of complex processors for efficient logic evaluation, in some implementations. An array of interconnected processors may be deployed, in some embodiments, to exploit the inherent parallelism in a HDL description.
    Type: Grant
    Filed: June 28, 2005
    Date of Patent: July 24, 2012
    Assignee: Coherent Logix, Incorporated
    Inventor: Tommy Kinming Eng
  • Patent number: 8225300
    Abstract: A device receives a program that includes one of a parallel construct or a distributed construct, creates a target component from the program, and integrates the target component into a target environment to produce a client program that is executable on multiple heterogeneous server platforms.
    Type: Grant
    Filed: July 29, 2008
    Date of Patent: July 17, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Peter Hartwell Webb, Loren Dean, Anthony Paul Astolfi, Jocelyn Luke Martin, Richard John Alcock, James T. Stewart
  • Publication number: 20120180031
    Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.
    Type: Application
    Filed: March 26, 2012
    Publication date: July 12, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
  • Publication number: 20120180030
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Application
    Filed: January 12, 2012
    Publication date: July 12, 2012
    Inventors: William Y. Crutchfield, Brian K. Grant, Matthew N. Papakipos
  • Patent number: 8219981
    Abstract: Code handling, such as interpreting language instructions or performing “just-in-time” compilation, is performed using a heterogeneous processing environment that shares a common memory. In a heterogeneous processing environment that includes a plurality of processors, one of the processors is programmed to perform a dedicated code-handling task, such as perform just-in-time compilation or interpretation of interpreted language instructions, such as Java. The other processors request code handling processing that is performed by the dedicated processor. Speed is achieved using a shared memory map so that the dedicated processor can quickly retrieve data provided by one of the other processors.
    Type: Grant
    Filed: July 15, 2008
    Date of Patent: July 10, 2012
    Assignee: International Business Machines Corporation
    Inventors: Maximino Aguilar, Jr., Mark Richard Nutter, James Michael Stafford
  • Patent number: 8214808
    Abstract: A system and method for speculative assistance to a thread in a heterogeneous processing environment is provided. A first set of instructions is identified in a source code representation (e.g., a source code file) that is suitable for speculative execution. The identified set of instructions is analyzed to determine its processing requirements. Based on the analysis, a processor type is identified that will be used to execute the identified first set of instructions. The processor type is selected from the more than one processor types included in the heterogeneous processing environment. The heterogeneous processing environment includes more than one heterogeneous processing core in a single silicon substrate. The various processing cores can utilize different instruction set architectures (ISAs). An object code representation is then generated for the identified first set of instructions, with the object code representation being adapted to execute on the determined type of processor.
    Type: Grant
    Filed: May 7, 2007
    Date of Patent: July 3, 2012
    Assignee: International Business Machines Corporation
    Inventors: Michael Norman Day, Michael Karl Gschwind, John Kevin Patrick O'Brien, Kathryn O'Brien
  • Patent number: 8214814
    Abstract: Embodiments of the invention enable application programs running across multiple compute nodes of a highly-parallel system to compile source code into native instructions, and subsequently share the optimizations used to compile the source code with other nodes. For example, determining what optimizations to use may consume significant processing power and memory on a node. In cases where multiple nodes exhibit similar characteristics, it is possible that these nodes may use the same set of optimizations when compiling similar pieces of code. Therefore, when one node compiles source code into native instructions, it may share the optimizations used with other similar nodes, thereby removing the burden for the other nodes to figure out which optimizations to use. Thus, while one node may suffer a performance hit for determining the necessary optimizations, other nodes may be saved from this burden by simply using the optimizations provided to them.
    Type: Grant
    Filed: June 24, 2008
    Date of Patent: July 3, 2012
    Assignee: International Business Machines Corporation
    Inventors: Eric L. Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
  • Patent number: 8209674
    Abstract: A spectrum of tier-splitting mechanisms facilitates distributed programming. A rich application model and associated tools enable programmers to write rich distributed applications that can run anywhere. A program can be developed simply as a single tier or tier agnostic application. Subsequently or concurrently, the program can be sliced into multiple tiers in different ways to reflect, for instance, capabilities and/or constraints of a server, client and/or network.
    Type: Grant
    Filed: February 12, 2007
    Date of Patent: June 26, 2012
    Assignee: Microsoft Corporation
    Inventors: Henricus Johannes Maria Meijer, Brian C. Beckman, Christopher W. Brumme, Mark B. Shields, Wei Zhu
  • Publication number: 20120151459
    Abstract: A high level programming language provides a nested communication operator that partitions a computational space. An indexable type with a rank and element type defines the computational space. The nested communication operator partitions a specified dimension of an input indexable type into segments specified by a segmentation vector and returns an output indexable type that represents the segments. By doing so, the nested communication operator allows data parallel algorithms to operate on the segments as individual units.
    Type: Application
    Filed: December 9, 2010
    Publication date: June 14, 2012
    Applicant: MICROSOFT CORPORATION
    Inventor: Paul F. Ringseth
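A toy sketch (hypothetical names; the patented operator works on indexable types in a data-parallel language) of partitioning one dimension by a segmentation vector so a data-parallel algorithm can treat each segment as one unit.
```python
def partition_dimension(data, segmentation):
    """Split a 1-D sequence into consecutive segments whose lengths are
    given by the segmentation vector."""
    assert sum(segmentation) == len(data), "segments must cover the data"
    segments, start = [], 0
    for length in segmentation:
        segments.append(data[start:start + length])
        start += length
    return segments

values = list(range(10))
segments = partition_dimension(values, [3, 3, 4])
# A data-parallel algorithm can now operate on each segment as one unit:
print([sum(segment) for segment in segments])   # [3, 12, 30]
```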
  • Publication number: 20120151460
    Abstract: A parallel-code optimization system includes a Procedural Concurrency Graph (PCG) generator. The PCG generator produces an initial PCG of a computer program including parallel code, and determines a refined PCG from the initial PCG by applying concurrency-type refinements and interference-type refinements to the initial PCG. The initial PCG and the refined PCG include nodes and edges connecting pairs of the nodes. The nodes represent defined procedures in the parallel code, and each edge represents a may-happen-in-parallel relation, and is associated with a set of lvalues that represents the immediate interference between the corresponding pair of nodes.
    Type: Application
    Filed: December 13, 2010
    Publication date: June 14, 2012
    Inventors: Pramod G. Joisha, Robert Samuel Schreiber, Prithviraj Banerjee, Hans Boehm, Dhruva Chakrabarti
  • Patent number: 8201171
    Abstract: Various technologies and techniques are disclosed for handling data parallel operations. Data parallel operations are composed together to create a more complex data parallel operation. A fusion plan process is performed on a particular complex operation dynamically at runtime. As part of the fusion plan process, an analysis is performed of a structure of the complex operation and input data. One particular algorithm that best preserves parallelism is chosen from multiple algorithms. The structure of the complex operation is revised based on the particular algorithm chosen. A nested complex operation can also be fused, by inlining its contents into an outer complex operation so that parallelism is preserved across nested operation boundaries.
    Type: Grant
    Filed: June 27, 2007
    Date of Patent: June 12, 2012
    Assignee: Microsoft Corporation
    Inventors: John Joseph Duffy, David Callahan
  • Patent number: 8196109
    Abstract: Software developers working on multi-language systems with various debug tools (BPEL, AE, Java, etc.) can use a common debug adaptor (CDA). The CDA implements a method of debugging in a multi-computer program language environment. The method includes registering various debug tools associated with different programming languages in the multi-computer program language environment, each one of the plurality of debug tools providing suspended threads and stack frames in response to a debug event in the multi-computer program language environment. The method can further include receiving the suspended threads and stack frames from the plurality of debug tools. The method can further include correlating the received suspended threads and stack frames under a common suspended thread; and providing the common suspended thread in a debug view. Such a method can have a number of attributes intended to assist developers facing debugging problems in multi-language systems.
    Type: Grant
    Filed: September 9, 2005
    Date of Patent: June 5, 2012
    Assignee: International Business Machines Corporation
    Inventors: Jane Chi-Yan Fung, Grace Hai Yan Lo, William Gerald O'Farrell, Shu Xia Tan
  • Patent number: 8196123
    Abstract: Various technologies and techniques are disclosed for providing an object model for transactional memory. The object model for transactional memory allows transactional semantics to be separated from program flow. Memory transaction objects created using the object model can live beyond the instantiating execution scope, which allows additional details about the memory transaction to be provided and controlled. Transactional memory can be supported even from languages that do not directly expose transactional memory constructs. This is made possible by defining the object model in one or more base class libraries and allowing the language that does not support transactional memory directly to use transactional memory through the object model.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: June 5, 2012
    Assignee: Microsoft Corporation
    Inventor: Martin Taillefer
  • Patent number: 8196146
    Abstract: An information processing apparatus includes a plurality of execution units and a scheduler which controls assignment of a plurality of basic modules of a program to the plurality of execution units. The scheduler detects a parallel degree representing the parallelization ratio of the program when processed in parallel by the plurality of execution units, and detects the load associated with controlling the assignment of the plurality of basic modules during that parallel processing. The scheduler then combines two or more basic modules that are executed successively according to a parallel execution description and assigns them, as a single module, to a single execution unit when the value of the parallel degree exceeds a predetermined value and the value of the load exceeds a predetermined value.
    Type: Grant
    Filed: September 2, 2008
    Date of Patent: June 5, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Ryuji Sakai
  • Patent number: 8191054
    Abstract: Methods and apparatus are provided for a linker to resolve references from shared memory to private memory in a multi-core system.
    Type: Grant
    Filed: October 20, 2006
    Date of Patent: May 29, 2012
    Assignee: Analog Devices, Inc.
    Inventors: Stephen M. Kilbane, Alexander Raikman
  • Patent number: 8185879
    Abstract: A method for tracing a multi-tasking embedded pipelined processor includes executing compiled code including trace controls. Tracing is initiated when execution of the compiled code is initiated. Tracing is stopped when execution of the compiled code is completed. A trace record is formed during tracing. The trace record includes a processor mode indication, an application space identity value, and an instruction set architecture mode indication.
    Type: Grant
    Filed: November 6, 2006
    Date of Patent: May 22, 2012
    Assignee: MIPS Technologies, Inc.
    Inventors: Radhika Thekkath, Franz Treue, Ernest L. Edgar, Richard T. Leatherman
  • Patent number: 8181168
    Abstract: A system comprises a plurality of computation units interconnected by an interconnection network. A method for configuring the system comprises forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyzing each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; and assigning memory access instructions in a given equivalence class to one of the computation units for execution on the assigned computation unit.
    Type: Grant
    Filed: February 7, 2008
    Date of Patent: May 15, 2012
    Assignee: Tilera Corporation
    Inventors: Walter Lee, Robert A. Gottlieb, Vineet Soni, Anant Agarwal, Richard Schooler
  • Patent number: 8176478
    Abstract: Programs having a given instruction-set architecture are executed on a multiprocessor system comprising a plurality of processors, for example of a VLIW type, each of said processors being able to execute, at each processing cycle, a respective maximum number of instructions. The instructions are compiled as instruction words of given length executable on a first processor. At least some of the instruction words of given length are converted into modified-instruction words executable on a second processor. The operation of modifying comprises in turn at least one operation chosen from the group consisting of: splitting the instruction words into modified-instruction words; and entering no-operation instructions in the modified-instruction words.
    Type: Grant
    Filed: June 27, 2008
    Date of Patent: May 8, 2012
    Assignee: STMicroelectronics S.r.l.
    Inventors: Antonio Maria Borneo, Fabrizio Simone Rovati, Danilo Pietro Pau
  • Patent number: 8150789
    Abstract: A model, which defines a mathematical problem, and multiple directives may be received. Each of the multiple directives may be mapped to a respective linear solver instance. The linear solver instances may be launched to execute in parallel. Each of the linear solver instances may use either a primal or a dual algorithm and may further use double arithmetic, exact arithmetic, or hybrid arithmetic, as specified by corresponding ones of the multiple directives. A linear solver instance that uses hybrid arithmetic may start by using double arithmetic and may use exact arithmetic after experiencing a numerical difficulty. After the numerical difficulty is resolved, the linear solver instance that uses hybrid arithmetic may restart and continue to solve the mathematical problem using double arithmetic. After one of the linear solver instances finds an optimal solution, others of the linear solver instances may be stopped and a report may be provided.
    Type: Grant
    Filed: December 29, 2008
    Date of Patent: April 3, 2012
    Assignee: Microsoft Corporation
    Inventors: Min Wei, Alexander Sasha Stojanovic, David Lao
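A rough sketch (hypothetical solver stubs, not Microsoft's solver) of launching several solver configurations in parallel and attempting to stop the rest once one of them finishes.
```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
import time

def solver(directive, problem):
    """Stand-in for a linear solver instance configured by one directive."""
    time.sleep(directive["simulated_runtime"])
    return directive["name"], sum(problem)      # pretend this is the optimum

directives = [{"name": "primal/double", "simulated_runtime": 0.3},
              {"name": "dual/double", "simulated_runtime": 0.1},
              {"name": "primal/hybrid", "simulated_runtime": 0.5}]
problem = [1, 2, 3]

with ThreadPoolExecutor(max_workers=len(directives)) as pool:
    futures = {pool.submit(solver, d, problem) for d in directives}
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    for f in not_done:       # try to cancel the losers (no-op if already running)
        f.cancel()
    winner, value = next(iter(done)).result()

print(f"{winner} finished first with value {value}")
```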
  • Patent number: 8150749
    Abstract: Computer-implemented methods, computer systems, and computer program products are provided for automated generic and parallel aggregation of characteristics and key figures of unsorted mass data of specific economic interest, particularly data associated with financial institutions and with financial affairs in banking practice. The parallel aggregation may reduce the amount of data to a customer-defined granularity in order to facilitate the handling of raw data related to all areas of credit risk management in banking practice. Moreover, the computational efficiency of the software and its run-time performance may be improved in the case of mass data.
    Type: Grant
    Filed: August 18, 2009
    Date of Patent: April 3, 2012
    Assignee: SAP AG
    Inventors: Markus Kahn, Marcus Baumann
  • Patent number: 8151255
    Abstract: A method for detecting a dependence violation in an application that involves executing a plurality of sections of the application in parallel, and logging memory transactions that occur while executing the plurality of sections to obtain a plurality of logs and a plurality of temporary results, where the plurality of logs is compared while executing the plurality of sections to determine whether the dependence violation exists.
    Type: Grant
    Filed: June 26, 2006
    Date of Patent: April 3, 2012
    Assignee: Oracle America, Inc.
    Inventors: Phyllis E. Gustafson, Miguel Angel Lujan Moreno, Michael H. Paleczny, Christopher A. Vick, Olaf Manczak, Jay R. Freeman
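A small sketch (hypothetical log format, not the patented mechanism) of comparing per-section memory logs to flag a dependence violation: two speculatively parallel sections conflict when one writes an address the other reads or writes.
```python
def find_violations(logs):
    """logs: dict section -> list of ('r'|'w', address).
    Returns pairs of sections whose speculative parallel execution
    conflicts (write/read or write/write on the same address)."""
    violations = []
    sections = list(logs)
    for i, a in enumerate(sections):
        for b in sections[i + 1:]:
            writes_a = {addr for kind, addr in logs[a] if kind == "w"}
            writes_b = {addr for kind, addr in logs[b] if kind == "w"}
            touched_a = {addr for _, addr in logs[a]}
            touched_b = {addr for _, addr in logs[b]}
            if writes_a & touched_b or writes_b & touched_a:
                violations.append((a, b))
    return violations

logs = {"section1": [("r", 0x10), ("w", 0x20)],
        "section2": [("r", 0x20), ("w", 0x30)],   # reads what section1 wrote
        "section3": [("r", 0x40)]}
print(find_violations(logs))   # [('section1', 'section2')]
```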
  • Publication number: 20120079466
    Abstract: Systems and methods for the vectorization of software applications are described. In some embodiments, a compiler may automatically generate both scalar and vector versions of a function from a single source code description. A vector interface may be exposed in a persistent dependency database that is associated with the function. This may allow a compiler to make vector function calls from within vectorized loops, rather than making multiple serialized scalar function calls from within a vectorized loop. This may in turn facilitate the vectorization of hierarchical code, which may improve application performance when vector execution resources are available.
    Type: Application
    Filed: September 23, 2010
    Publication date: March 29, 2012
    Inventor: Jeffry E. Gonion
  • Patent number: 8146066
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Grant
    Filed: March 5, 2007
    Date of Patent: March 27, 2012
    Assignee: Google Inc.
    Inventors: Christopher G. Demetriou, Matthew N. Papakipos
  • Patent number: 8140883
    Abstract: Pipelined loop operations are efficiently scheduled. A preliminary as soon as possible (ASAP) schedule for a data operation in a pipelined loop is determined. A producer operation clock cycle associated with a producer operation in the pipelined loop is determined. The producer operation provides a data value for use by the data operation in a subsequent loop. A consumer operation clock cycle associated with a consumer operation in the pipelined loop is determined. The consumer operation obtains the data value from the data operation in a previous loop. The data operation is scheduled at the half-way point between the producer operation clock cycle and the consumer operation clock cycle.
    Type: Grant
    Filed: May 1, 2008
    Date of Patent: March 20, 2012
    Assignee: Altera Corporation
    Inventor: David James Lau
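The placement rule in the abstract above reduces to simple arithmetic; a minimal sketch (hypothetical cycle values) is shown below.
```python
def schedule_data_op(producer_cycle, consumer_cycle):
    """Place the cross-iteration data operation at the midpoint between the
    clock cycle of the producer (in one loop iteration) and the clock cycle
    of the consumer (in the following iteration)."""
    return (producer_cycle + consumer_cycle) // 2

# Producer writes the value at cycle 4 of one iteration; the consumer in the
# next iteration reads it at cycle 12, so the data operation lands at cycle 8.
print(schedule_data_op(4, 12))   # 8
```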
  • Publication number: 20120066668
    Abstract: A general-purpose programming environment allows users to program a GPU as a general-purpose computation engine using familiar C/C++ programming constructs. Users may use declaration specifiers to identify which portions of a program are to be compiled for a CPU or a GPU. Specifically, functions, objects and variables may be specified for GPU binary compilation using declaration specifiers. A compiler separates the GPU binary code and the CPU binary code in a source file using the declaration specifiers. The location of objects and variables in different memory locations in the system may be identified using the declaration specifiers. CTA threading information is also provided for the GPU to support parallel processing.
    Type: Application
    Filed: July 11, 2011
    Publication date: March 15, 2012
    Applicant: NVIDIA Corporation
    Inventors: Ian Buck, Bastiaan Aarts
  • Patent number: 8136102
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Grant
    Filed: March 5, 2007
    Date of Patent: March 13, 2012
    Assignee: Google Inc.
    Inventors: Matthew N. Papakipos, Brian K. Grant, Christopher G. Demetriou, Morgan S. McGuire
  • Patent number: 8136097
    Abstract: A thread debugging device is provided that can debug reliably when at least one thread is debugged among a plurality of threads executed in association with one another. According to the thread debugging device, a target computer (20) executes at least some processing of at least one target thread to be debugged among the plurality of threads, and further executes non-target threads, which are threads other than the at least one target thread among the plurality of threads, during execution of the at least one target thread while restricting access by the non-target threads to at least some hardware resources of the computer (20).
    Type: Grant
    Filed: October 25, 2006
    Date of Patent: March 13, 2012
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Yousuke Konishi, Shinichiro Mikami, Makoto Ishii, Yasuyuki Kinoshita, Atsuhiko Fujimoto, Masayuki Takahashi