For A Parallel Or Multiprocessor System Patents (Class 717/149)
  • Patent number: 8261252
    Abstract: A method and system for converting application code into optimized application code or into execution code suitable for execution on a computation engine with an architecture comprising at least a first and a second level of data memory units are disclosed. In one aspect, the method comprises obtaining application code, the application code comprising data transfer operations between the levels of memory units. The method further comprises converting at least a part of the application code. The converting of application code comprises scheduling of data transfer operations from a first level of memory units to a second level of memory units such that accesses of data accessed multiple times are brought closer together in time than in the original code.
    Type: Grant
    Filed: March 26, 2008
    Date of Patent: September 4, 2012
    Assignees: IMEC, Katholieke Universiteit Leuven
    Inventors: Praveen Raghavan, Murali Jayapala, Francky Catthoor, Absar Javed, Andy Lambrechts
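A minimal sketch (all names hypothetical, not the patented method) of the scheduling idea in the abstract above: given a trace of block transfers between memory levels, independent transfers are reordered so that repeated accesses to the same block land closer together in time, shortening the interval over which the block must stay resident in the smaller memory.
```python
from collections import defaultdict

def cluster_reuses(transfers):
    """Greedily reorder a list of block transfer operations so that
    accesses to the same block become adjacent, shrinking reuse distance.
    Assumes the transfers are independent and may be freely reordered."""
    first_seen = {}
    groups = defaultdict(list)
    for i, block in enumerate(transfers):
        if block not in first_seen:
            first_seen[block] = i
        groups[block].append(block)
    # Emit each block's accesses back-to-back, keeping the original
    # order of first appearance so the overall schedule stays stable.
    schedule = []
    for block in sorted(groups, key=first_seen.get):
        schedule.extend(groups[block])
    return schedule

trace = ["A", "B", "A", "C", "B", "A"]
print(cluster_reuses(trace))   # ['A', 'A', 'A', 'B', 'B', 'C']
```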
  • Patent number: 8261249
    Abstract: Embodiments of the invention provide a method for deploying and running an application on a massively parallel computer system, while minimizing the costs associated with latency, bandwidth, and limited memory resources. The executable code of a program may be divided into multiple code fragments and distributed to different compute nodes of a parallel computing system. During program execution, one compute node may fetch code fragments from other compute nodes as necessary.
    Type: Grant
    Filed: January 8, 2008
    Date of Patent: September 4, 2012
    Assignee: International Business Machines Corporation
    Inventors: Charles Jens Archer, Thomas Michael Gooding, Ruth Janine Poole, Albert Sidelnik
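A toy sketch (hypothetical names, not the patented implementation) of the fetch-on-demand idea: each compute node holds only some code fragments locally and pulls missing fragments from peer nodes the first time they are invoked, caching them afterwards.
```python
class ComputeNode:
    """Simulates a node that caches code fragments fetched from peers."""
    def __init__(self, name, local_fragments, peers=None):
        self.name = name
        self.fragments = dict(local_fragments)   # fragment name -> callable
        self.peers = peers or []

    def fetch(self, frag_name):
        """Return a fragment, pulling it from a peer on a local miss."""
        if frag_name not in self.fragments:
            for peer in self.peers:
                if frag_name in peer.fragments:
                    self.fragments[frag_name] = peer.fragments[frag_name]
                    break
            else:
                raise KeyError(f"{frag_name} not found on any node")
        return self.fragments[frag_name]

    def run(self, frag_name, *args):
        return self.fetch(frag_name)(*args)

node_b = ComputeNode("B", {"square": lambda x: x * x})
node_a = ComputeNode("A", {"double": lambda x: 2 * x}, peers=[node_b])
print(node_a.run("square", 7))   # fetched from node B on first use -> 49
```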
  • Patent number: 8255890
    Abstract: One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The media also store one or more instructions for transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The media further store one or more instructions for receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 28, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
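A rough sketch (all names hypothetical; not MathWorks code) of the flow the abstract describes: choose among candidate data distribution schemes, hand the distributed pieces to worker "labs" for parallel execution, and return the gathered results to the calling program.
```python
from concurrent.futures import ThreadPoolExecutor

def row_blocks(matrix, n):
    """Split a list-of-rows matrix into n contiguous row blocks."""
    step = (len(matrix) + n - 1) // n
    return [matrix[i:i + step] for i in range(0, len(matrix), step)]

def col_blocks(matrix, n):
    """Split a matrix into n contiguous column blocks."""
    cols = len(matrix[0])
    step = (cols + n - 1) // n
    return [[row[i:i + step] for row in matrix] for i in range(0, cols, step)]

def pick_scheme(matrix, schemes, n_labs):
    """Pick the scheme whose largest partition is smallest (a toy cost model)."""
    def cost(scheme):
        parts = scheme(matrix, n_labs)
        return max(sum(len(r) for r in part) for part in parts)
    return min(schemes, key=cost)

def run_parallel(matrix, n_labs=2):
    scheme = pick_scheme(matrix, [row_blocks, col_blocks], n_labs)
    parts = scheme(matrix, n_labs)
    with ThreadPoolExecutor(max_workers=n_labs) as labs:
        partials = list(labs.map(lambda p: sum(sum(r) for r in p), parts))
    return sum(partials)   # combine the lab results for the calling program

m = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(run_parallel(m))   # 36
```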
  • Patent number: 8255889
    Abstract: A device, for performing parallel processing, includes a processor to receive one or more portions of an inner context of a program created for a technical computing environment, and allocate one or more portions of the inner context of the program to two or more labs for parallel execution. The processor is also configured to receive one or more results associated with the parallel execution of the one or more portions from the two or more labs, and provide the one or more results to an outer context of the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 28, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N. Stefansson, Brett Baker, Edric Ellis, Joseph F. Hicklin, John N. Little, Jocelyn Luke Martin, Piotr R. Luszczek, Nausheen B. Moulana, Loren Dean, Roy E. Lurie
  • Patent number: 8255911
    Abstract: According to one embodiment, a parallel processing optimization method is provided for an apparatus that dynamically assigns basic modules to threads and executes the threads in parallel on execution modules, where the program is divided into basic modules that are executable asynchronously with respect to one another and an execution rule defines their execution order. The method includes managing the assigned basic modules together with the identifiers of the threads to which they are assigned, managing an executable set of assignable basic modules, calculating the data transfer cost of each basic module in that set, and selecting the basic module with the minimum transfer cost.
    Type: Grant
    Filed: April 27, 2010
    Date of Patent: August 28, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Ryuji Sakai
  • Patent number: 8255884
    Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
    Type: Grant
    Filed: June 6, 2008
    Date of Patent: August 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
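A simplified sketch (hypothetical toy IR, not IBM's implementation) of the splat-placement idea: walk an instruction list and insert a "splat" (broadcast of a scalar to all vector lanes) only where a scalar definition actually feeds a SIMD operation.
```python
def insert_splats(ir):
    """ir is a list of (op, dest, operands) tuples.  Ops whose name starts
    with 'v' are SIMD; everything else is scalar.  A splat is inserted
    before a SIMD op for each scalar operand it consumes."""
    scalar_defs = set()
    out = []
    for op, dest, operands in ir:
        if op.startswith("v"):
            new_operands = []
            for src in operands:
                if src in scalar_defs:
                    vec = f"{src}_splat"
                    out.append(("vsplat", vec, [src]))   # broadcast the scalar
                    new_operands.append(vec)
                else:
                    new_operands.append(src)
            out.append((op, dest, new_operands))
        else:
            scalar_defs.add(dest)
            out.append((op, dest, operands))
    return out

ir = [("add", "s1", ["a", "b"]),          # scalar add
      ("vmul", "v1", ["s1", "v0"])]       # SIMD multiply consuming scalar s1
for instr in insert_splats(ir):
    print(instr)
```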
  • Patent number: 8250548
    Abstract: A heterogeneous multiprocessor system including a plurality of processor elements having mutually different instruction sets and structures prevents a specific processor element from running short of resources, improving throughput. An executable task is extracted based on a preset dependency relationship between a plurality of tasks, and a plurality of first processors are allocated to a general-purpose processor group based on the dependency relationship among the extracted tasks. A second processor is allocated to an accelerator group, a task to be allocated is determined from the extracted tasks based on a priority value for each task, and the cost of executing the determined task on the first processor is compared with the cost of executing it on the second processor. The task is allocated to whichever of the general-purpose processor group and the accelerator group is judged to have the lower cost.
    Type: Grant
    Filed: January 23, 2007
    Date of Patent: August 21, 2012
    Assignee: Waseda University
    Inventors: Hironori Kasahara, Keiji Kimura, Jun Shirako, Yasutaka Wada, Masaki Ito, Hiroaki Shikano
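A compact sketch (hypothetical cost numbers and names, not the patented scheduler) of the cost comparison above: each ready task is costed on both the general-purpose processor group and the accelerator group, and is handed to whichever group is cheaper.
```python
def schedule(tasks, deps, cpu_cost, acc_cost):
    """tasks: iterable of task names; deps: dict task -> set of prerequisites;
    cpu_cost / acc_cost: dict task -> estimated execution cost on each group.
    Returns a list of (task, chosen_group) in a dependency-respecting order."""
    done, order = set(), []
    remaining = set(tasks)
    while remaining:
        ready = [t for t in remaining if deps.get(t, set()) <= done]
        # Highest priority first: here, priority = number of dependants (toy rule).
        ready.sort(key=lambda t: -sum(t in deps.get(u, set()) for u in remaining))
        for t in ready:
            group = "cpu" if cpu_cost[t] <= acc_cost[t] else "accelerator"
            order.append((t, group))
            done.add(t)
            remaining.remove(t)
    return order

deps = {"C": {"A", "B"}}
print(schedule(["A", "B", "C"], deps,
               cpu_cost={"A": 5, "B": 2, "C": 9},
               acc_cost={"A": 8, "B": 1, "C": 3}))
# [('A', 'cpu'), ('B', 'accelerator'), ('C', 'accelerator')]
```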
  • Patent number: 8250550
    Abstract: A computing device-implemented method includes initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The method also includes transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The method further includes receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 21, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
  • Patent number: 8250549
    Abstract: A computer implemented tool is provided for assisting in the mapping of a computer program to a data processing apparatus in which multiple physical instances of a logical variable in the computer program are required. A computer program is provided as the input to the tool, which analyses the data flow of the program and identifies multiple-physical-instance requirements for logical variables. The tool adds mapping support commands, such as instantiation commands, Direct Memory Access (DMA) move commands and the like, as necessary to support the mapping of the computer program to a data processing apparatus.
    Type: Grant
    Filed: October 23, 2007
    Date of Patent: August 21, 2012
    Assignee: ARM Limited
    Inventors: Alastair David Reid, Edmund Grimley-Evans, Simon Andrew Ford
  • Patent number: 8250547
    Abstract: The present invention provides a method and system for loading and running program images rapidly in a multi-processor system. The method comprises the steps of: starting in a synergistic processor a synergistic processing program listener, which is configured to listen to a notification from a main processor; calling in the main processor a run-synergistic-processing-program function which is configured to notify the synergistic processing program listener to run a synergistic processing program image which is part of the program image and has been transferred to the local store of the synergistic processor; and the synergistic processing program listener running the synergistic processing program image in response to receiving the notification.
    Type: Grant
    Filed: August 22, 2008
    Date of Patent: August 21, 2012
    Assignee: International Business Machines Corporation
    Inventors: Wen Jun Wang, Xiu Hua Huang, Jian Chen, Yuan Li
  • Patent number: 8245207
    Abstract: A method for executing uniprocessor (UP) coded workloads in a computer capable of concurrent thread execution is disclosed. The method identifies threads in the uniprocessor coded workloads (UP-workloads) which can execute concurrently, and identifies threads in the UP-workloads which cannot execute concurrently. First threads which cannot execute concurrently are assigned to a first concurrency group. Second threads which cannot execute concurrently are assigned to a second concurrency group. Any thread in the first concurrency group can execute concurrently with any thread in the second concurrency group. The computer capable of concurrent thread execution then executes the UP-coded workloads in the first concurrency group at substantially the same time as executing the UP-coded workloads in the second concurrency group.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: August 14, 2012
    Assignee: NetApp, Inc.
    Inventors: Robert M. English, Zdenko Kukavica, Konstantinos Roussos
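A small sketch (hypothetical, using Python threads rather than the patented kernel mechanism) of the concurrency-group idea above: threads that must not run concurrently share one lock, so work within a group serializes while work in different groups can overlap.
```python
import threading

class ConcurrencyGroup:
    """Work items in the same group are serialized by a shared lock;
    items in different groups may run at the same time."""
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()

    def run(self, work, *args):
        with self.lock:          # serialize within the group
            return work(*args)

group1, group2 = ConcurrencyGroup("g1"), ConcurrencyGroup("g2")
results = []

def workload(tag):
    results.append(tag)

threads = [threading.Thread(target=group1.run, args=(workload, "g1-a")),
           threading.Thread(target=group1.run, args=(workload, "g1-b")),
           threading.Thread(target=group2.run, args=(workload, "g2-a"))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # g1-a and g1-b never overlap; g2-a may interleave with either
```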
  • Patent number: 8245212
    Abstract: The claimed subject matter provides systems and mechanisms that create frame accurate call trees for threads. The system can include devices that determine the thread on which a break point or event halted execution, identify the location of the event that halted execution, set break points at multiple locations during a stopping event (where stopping events do not typically involve setting or unsetting break points), and construct the frame accurate call tree subset for the thread based at least in part on the break point or event that halted execution and information gleaned from an inspection of the call stack associated with that event or break point.
    Type: Grant
    Filed: February 22, 2008
    Date of Patent: August 14, 2012
    Assignee: Microsoft Corporation
    Inventor: Steven J. Steiner
  • Patent number: 8239843
    Abstract: Parallelize a computer program by scoping program variables at compile time and inserting code into the program. Identify as value-predictable variables those variables that are: defined only once in a loop of the program; not defined in any inner loop of that loop; and used in the loop. Optionally also: identify a code block in the program that contains a variable assignment, and then traverse a path backwards from the block through a control flow graph of the program. Collect in a set all blocks along the path up to a loop header block. For each block in the set, determine the program blocks that logically succeed it and are not in the set. Identify all paths between the block and the determined blocks as failure paths, and insert code into the failure paths. When executed at run time of the program, the inserted code fails the corresponding path.
    Type: Grant
    Filed: March 11, 2008
    Date of Patent: August 7, 2012
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Xiangyun Kong, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 8239846
    Abstract: A device for performing parallel processing includes a processor to initiate a single programming language, and identify, via the single programming language, one or more data distribution schemes for executing a program. The processor also transforms, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocates the parallel program to two or more labs for parallel execution. The processor further receives one or more results associated with the parallel execution of the parallel program from the two or more labs, and provides the one or more results to the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 7, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
  • Patent number: 8239844
    Abstract: A computing device-implemented method includes receiving a program, analyzing and transforming the program, determining an inner context and an outer context of the program based on the analysis of the program, and allocating one or more portions of the inner context of the program to two or more labs for parallel execution. The method also includes receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to the outer context of the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 7, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N. Stefansson, Brett Baker, Edric Ellis, Joseph F. Hicklin, John N. Little, Jocelyn Luke Martin, Piotr R. Luszczek, Nausheen B. Moulana, Loren Dean, Roy E. Lurie
  • Patent number: 8239866
    Abstract: Software rendering and fine grained parallelism are utilized to reduce/avoid memory latency in a multi-processor (MP) system. According to one embodiment, the management of the transfer of data from one processor to another in the MP environment is moved into a low overhead hardware system. The low overhead hardware system may be a FIFO (“First In First Out”) hardware control. Each FIFO may be real or virtual.
    Type: Grant
    Filed: April 24, 2009
    Date of Patent: August 7, 2012
    Assignee: Microsoft Corporation
    Inventor: Susan Carrie
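A minimal sketch (hypothetical, software-only) of the FIFO hand-off the abstract describes: one worker pushes rendered tiles into a bounded FIFO and another consumes them, so neither side waits on the other except when the queue is empty or full.
```python
import queue
import threading

fifo = queue.Queue(maxsize=4)        # bounded FIFO between two workers
SENTINEL = None

def producer():
    for tile in range(8):            # pretend these are rendered tiles
        fifo.put(tile)               # blocks only when the FIFO is full
    fifo.put(SENTINEL)

def consumer():
    while True:
        tile = fifo.get()            # blocks only when the FIFO is empty
        if tile is SENTINEL:
            break
        print("processed tile", tile)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```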
  • Patent number: 8239845
    Abstract: One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for receiving one or more portions of an inner context of a program created for a technical computing environment, allocating one or more portions of the inner context of the program to two or more labs for parallel execution, receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to an outer context of the program.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: August 7, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N. Stefansson, Brett Baker, Edric Ellis, Joseph F. Hicklin, John N. Little, Jocelyn Luke Martin, Piotr R. Luszczek, Nausheen B. Moulana, Loren Dean, Roy E. Lurie
  • Patent number: 8239847
    Abstract: General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program written in a high-level language are automatically translated into a distributed execution plan. Map and reduction computations are automatically added to the plan. Patterns in the sequential program can be automatically identified to trigger map and reduction processing. Direct invocation of map and reduction processing is also provided. One or more portions of the reduce computation are pushed to the map stage and dynamic aggregation is inserted when possible. The system automatically identifies opportunities for partial reductions and aggregation, but also provides a set of extensions in a high-level computing language for the generation and optimization of the distributed execution plan. The extensions include annotations to declare functions suitable for these optimizations.
    Type: Grant
    Filed: March 18, 2009
    Date of Patent: August 7, 2012
    Assignee: Microsoft Corporation
    Inventors: Yuan Yu, Pradeep Kumar Gunda, Michael A Isard
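A compact sketch (hypothetical, not the patented plan generator) of pushing part of the reduction into the map stage: each map partition pre-aggregates its own keys, so the final reduce only merges small partial dictionaries.
```python
from collections import Counter
from functools import reduce

def map_with_partial_reduce(partition):
    """Map each record to (word, 1) pairs and combine them locally."""
    local = Counter()
    for record in partition:
        for word in record.split():
            local[word] += 1
    return local                      # partial aggregation, done map-side

def final_reduce(partials):
    """Merge the small per-partition dictionaries."""
    return reduce(lambda a, b: a + b, partials, Counter())

partitions = [["a b a", "c"], ["b b", "a c"]]
partials = [map_with_partial_reduce(p) for p in partitions]   # could run in parallel
print(final_reduce(partials))   # Counter({'a': 3, 'b': 3, 'c': 2})
```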
  • Patent number: 8234635
    Abstract: In a multi-processor system for performing a parallel processing, each of a plurality of processors includes a communication processing unit for performing control between the processors in a data flow machine-type data-driven control method; and a program processing unit for performing control in each processor in a von Neumann-type program-driven control method. The communication processing unit performs a communication between the processors in synchronization with the program processing unit, and has a function of detecting a communication data hazard between the processors. The program processing unit performs a processing based on an execution code stored in a local memory, and has a function of executing or suspending the execution code, according to a result of detecting the data hazard.
    Type: Grant
    Filed: January 16, 2007
    Date of Patent: July 31, 2012
    Assignee: Tokyo Institute of Technology
    Inventors: Tsuyoshi Isshiki, Hiroaki Kunieda
  • Publication number: 20120192166
    Abstract: Embodiments of a system and method for generating an image configured to program a parallel machine from source code are disclosed. One such parallel machine includes a plurality of state machine elements (SMEs) grouped into pairs, such that SMEs in a pair have a common output. One such method includes converting source code into an automaton comprising a plurality of interconnected states, and converting the automaton into a netlist comprising instances corresponding to states in the automaton, wherein converting includes pairing states corresponding to pairs of SMEs based on the fact that SMEs in a pair have a common output. The netlist can be converted into the image and published.
    Type: Application
    Filed: January 24, 2012
    Publication date: July 26, 2012
    Inventors: Junjuan Xu, Paul Glendenning
  • Publication number: 20120192165
    Abstract: Apparatus, systems, and methods for a compiler are disclosed. One such compiler parses a human readable expression into a syntax tree and converts the syntax tree into an automaton having in-transitions and out-transitions. Converting can include unrolling a quantification in the expression as a function of in-degree limitations, where an in-degree limitation is a limit on the number of transitions into a state of the automaton. The compiler can also convert the automaton into an image for programming a parallel machine, and publish the image. Additional apparatus, systems, and methods are disclosed.
    Type: Application
    Filed: January 24, 2012
    Publication date: July 26, 2012
    Inventors: Junjuan Xu, Paul Glendenning
  • Publication number: 20120192164
    Abstract: Apparatus, systems, and methods for a compiler are described. One such compiler generates machine code corresponding to a set of elements including a general purpose element and a special purpose element. The compiler identifies a portion in an arrangement of relationally connected operators that corresponds to a special purpose element. The compiler also determines whether the portion meets a condition to be mapped to the special purpose element. The compiler also converts the arrangement into an automaton comprising a plurality of states, wherein the portion is converted using a special purpose state that corresponds to the special purpose element if the portion meets the condition. The compiler also converts the automaton into machine code. Additional apparatus, systems, and methods are disclosed.
    Type: Application
    Filed: January 24, 2012
    Publication date: July 26, 2012
    Inventors: Junjuan Xu, Paul Glendenning
  • Patent number: 8230410
    Abstract: An enhanced mechanism for parallel execution of computer programs utilizes a bidding model to allocate additional registers and execution units for stretches of code identified as opportunities for microparallelization. A microparallel processor architecture apparatus permits software (e.g. compiler) to implement short-term parallel execution of stretches of code identified as such before execution. In one embodiment, an additional paired unit, if available, is allocated for execution of an identified stretch of code. Each additional paired unit includes an execution unit and a half set of registers. This apparatus is available for compilers or assembler language coders to use and allows software to unlock parallel execution capabilities that are present in existing computer programs but heretofore were executed sequentially for lack of a suitable apparatus.
    Type: Grant
    Filed: October 26, 2009
    Date of Patent: July 24, 2012
    Assignee: International Business Machines Corporation
    Inventor: Larry W. Loen
  • Patent number: 8230408
    Abstract: In one embodiment, a hardware implementation of an electronic system may be realized by compiling the HDL description into an executable form and executing the processor instructions. By applying data flow separation technique, the operations of the system can be effectively mapped into the instruction set of complex processors for efficient logic evaluation, in some implementations. An array of interconnected processors may be deployed, in some embodiments, to exploit the inherent parallelism in a HDL description.
    Type: Grant
    Filed: June 28, 2005
    Date of Patent: July 24, 2012
    Assignee: Coherent Logix, Incorporated
    Inventor: Tommy Kinming Eng
  • Patent number: 8225300
    Abstract: A device receives a program that includes one of a parallel construct or a distributed construct, creates a target component from the program, and integrates the target component into a target environment to produce a client program that is executable on multiple heterogeneous server platforms.
    Type: Grant
    Filed: July 29, 2008
    Date of Patent: July 17, 2012
    Assignee: The MathWorks, Inc.
    Inventors: Peter Hartwell Webb, Loren Dean, Anthony Paul Astolfi, Jocelyn Luke Martin, Richard John Alcock, James T. Stewart
  • Publication number: 20120180031
    Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.
    Type: Application
    Filed: March 26, 2012
    Publication date: July 12, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
  • Publication number: 20120180030
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Application
    Filed: January 12, 2012
    Publication date: July 12, 2012
    Inventors: William Y. Crutchfield, Brian K. Grant, Matthew N. Papakipos
  • Patent number: 8219981
    Abstract: Code handling, such as interpreting language instructions or performing “just-in-time” compilation, is performed using a heterogeneous processing environment that shares a common memory. In a heterogeneous processing environment that includes a plurality of processors, one of the processors is programmed to perform a dedicated code-handling task, such as perform just-in-time compilation or interpretation of interpreted language instructions, such as Java. The other processors request code handling processing that is performed by the dedicated processor. Speed is achieved using a shared memory map so that the dedicated processor can quickly retrieve data provided by one of the other processors.
    Type: Grant
    Filed: July 15, 2008
    Date of Patent: July 10, 2012
    Assignee: International Business Machines Corporation
    Inventors: Maximino Aguilar, Jr., Mark Richard Nutter, James Michael Stafford
  • Patent number: 8214808
    Abstract: A system and method for speculative assistance to a thread in a heterogeneous processing environment is provided. A first set of instructions is identified in a source code representation (e.g., a source code file) that is suitable for speculative execution. The identified set of instructions is analyzed to determine its processing requirements. Based on the analysis, a processor type is identified that will be used to execute the identified first set of instructions. The processor type is selected from the more than one processor types included in the heterogeneous processing environment. The heterogeneous processing environment includes more than one heterogeneous processing core in a single silicon substrate. The various processing cores can utilize different instruction set architectures (ISAs). An object code representation is then generated for the identified first set of instructions, with the object code representation being adapted to execute on the determined type of processor.
    Type: Grant
    Filed: May 7, 2007
    Date of Patent: July 3, 2012
    Assignee: International Business Machines Corporation
    Inventors: Michael Norman Day, Michael Karl Gschwind, John Kevin Patrick O'Brien, Kathryn O'Brien
  • Patent number: 8214814
    Abstract: Embodiments of the invention enable application programs running across multiple compute nodes of a highly-parallel system to compile source code into native instructions, and subsequently share the optimizations used to compile the source code with other nodes. For example, determining what optimizations to use may consume significant processing power and memory on a node. In cases where multiple nodes exhibit similar characteristics, it is possible that these nodes may use the same set of optimizations when compiling similar pieces of code. Therefore, when one node compiles source code into native instructions, it may share the optimizations used with other similar nodes, thereby removing the burden for the other nodes to figure out which optimizations to use. Thus, while one node may suffer a performance hit for determining the necessary optimizations, other nodes may be saved from this burden by simply using the optimizations provided to them.
    Type: Grant
    Filed: June 24, 2008
    Date of Patent: July 3, 2012
    Assignee: International Business Machines Corporation
    Inventors: Eric L. Barsness, David L. Darrington, Amanda Peters, John Matthew Santosuosso
  • Patent number: 8209674
    Abstract: A spectrum of tier-splitting mechanisms facilitates distributed programming. A rich application model and associated tools enable programmers to write rich distributed applications that can run anywhere. A program can be developed simply as a single tier or tier agnostic application. Subsequently or concurrently, the program can be sliced into multiple tiers in different ways to reflect, for instance, capabilities and/or constraints of a server, client and/or network.
    Type: Grant
    Filed: February 12, 2007
    Date of Patent: June 26, 2012
    Assignee: Microsoft Corporation
    Inventors: Henricus Johannes Maria Meijer, Brian C. Beckman, Christopher W. Brumme, Mark B. Shields, Wei Zhu
  • Publication number: 20120151459
    Abstract: A high level programming language provides a nested communication operator that partitions a computational space. An indexable type with a rank and element type defines the computational space. The nested communication operator partitions a specified dimension of an input indexable type into segments specified by a segmentation vector and returns an output indexable type that represents the segments. By doing so, the nested communication operator allows data parallel algorithms to operate on the segments as individual units.
    Type: Application
    Filed: December 9, 2010
    Publication date: June 14, 2012
    Applicant: MICROSOFT CORPORATION
    Inventor: Paul F. Ringseth
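A toy sketch (hypothetical names; the patented operator works on indexable types in a data-parallel language) of partitioning one dimension by a segmentation vector so a data-parallel algorithm can treat each segment as one unit.
```python
def partition_dimension(data, segmentation):
    """Split a 1-D sequence into consecutive segments whose lengths are
    given by the segmentation vector."""
    assert sum(segmentation) == len(data), "segments must cover the data"
    segments, start = [], 0
    for length in segmentation:
        segments.append(data[start:start + length])
        start += length
    return segments

values = list(range(10))
segments = partition_dimension(values, [3, 3, 4])
# A data-parallel algorithm can now operate on each segment as one unit:
print([sum(segment) for segment in segments])   # [3, 12, 30]
```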
  • Publication number: 20120151460
    Abstract: A parallel-code optimization system includes a Procedural Concurrency Graph (PCG) generator. The PCG generator produces an initial PCG of a computer program including parallel code, and determines a refined PCG from the initial PCG by applying concurrency-type refinements and interference-type refinements to the initial PCG. The initial PCG and the refined PCG include nodes and edges connecting pairs of the nodes. The nodes represent defined procedures in the parallel code, and each edge represents a may-happen-in-parallel relation, and is associated with a set of lvalues that represents the immediate interference between the corresponding pair of nodes.
    Type: Application
    Filed: December 13, 2010
    Publication date: June 14, 2012
    Inventors: Pramod G. Joisha, Robert Samuel Schreiber, Prithviraj Banerjee, Hans Boehm, Dhruva Chakrabarti
  • Patent number: 8201171
    Abstract: Various technologies and techniques are disclosed for handling data parallel operations. Data parallel operations are composed together to create a more complex data parallel operation. A fusion plan process is performed on a particular complex operation dynamically at runtime. As part of the fusion plan process, an analysis is performed of a structure of the complex operation and input data. One particular algorithm that best preserves parallelism is chosen from multiple algorithms. The structure of the complex operation is revised based on the particular algorithm chosen. A nested complex operation can also be fused, by inlining its contents into an outer complex operation so that parallelism is preserved across nested operation boundaries.
    Type: Grant
    Filed: June 27, 2007
    Date of Patent: June 12, 2012
    Assignee: Microsoft Corporation
    Inventors: John Joseph Duffy, David Callahan
  • Patent number: 8196109
    Abstract: Software developers working on multi-language systems with various debug tools (BPEL, AE, Java, etc.) can use a common debug adaptor (CDA). The CDA implements a method of debugging in a multi-computer program language environment. The method includes registering various debug tools associated with different programming languages in the multi-computer program language environment, each one of the plurality of debug tools providing suspended threads and stack frames in response to a debug event in the multi-computer program language environment. The method can further include receiving the suspended threads and stack frames from the plurality of debug tools. The method can further include correlating the received suspended threads and stack frames under a common suspended thread; and providing the common suspended thread in a debug view. Such a method can have a number of attributes intended to assist developers facing debugging problems in multi-language systems.
    Type: Grant
    Filed: September 9, 2005
    Date of Patent: June 5, 2012
    Assignee: International Business Machines Corporation
    Inventors: Jane Chi-Yan Fung, Grace Hai Yan Lo, William Gerald O'Farrell, Shu Xia Tan
  • Patent number: 8196123
    Abstract: Various technologies and techniques are disclosed for providing an object model for transactional memory. The object model for transactional memory allows transactional semantics to be separated from program flow. Memory transaction objects created using the object model can live beyond the instantiating execution scope, which allows additional details about the memory transaction to be provided and controlled. Transactional memory can be supported even from languages that do not directly expose transactional memory constructs. This is made possible by defining the object model in one or more base class libraries and allowing the language that does not support transactional memory directly to use transactional memory through the object model.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: June 5, 2012
    Assignee: Microsoft Corporation
    Inventor: Martin Taillefer
  • Patent number: 8196146
    Abstract: An information processing apparatus includes a plurality of execution units and a scheduler which controls assignment of a plurality of basic modules of a program to the plurality of execution units. The scheduler detects a parallel degree representing the parallelization ratio of the program when processed in parallel by the plurality of execution units, and detects the load associated with controlling the assignment of the plurality of basic modules during that parallel processing. The scheduler then combines two or more basic modules that are executed successively according to a parallel execution description and assigns them, as a single module, to a single execution unit when the value of the parallel degree exceeds a predetermined value and the value of the load exceeds a predetermined value.
    Type: Grant
    Filed: September 2, 2008
    Date of Patent: June 5, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Ryuji Sakai
  • Patent number: 8191054
    Abstract: Methods and apparatus are provided for a linker to resolve references from shared memory to private memory in a multi-core system.
    Type: Grant
    Filed: October 20, 2006
    Date of Patent: May 29, 2012
    Assignee: Analog Devices, Inc.
    Inventors: Stephen M. Kilbane, Alexander Raikman
  • Patent number: 8185879
    Abstract: A method for tracing a multi-tasking embedded pipelined processor includes executing compiled code including trace controls. Tracing is initiated when execution of the compiled code is initiated. Tracing is stopped when execution of the compiled code is completed. A trace record is formed during tracing. The trace record includes a processor mode indication, an application space identity value, and an instruction set architecture mode indication.
    Type: Grant
    Filed: November 6, 2006
    Date of Patent: May 22, 2012
    Assignee: MIPS Technologies, Inc.
    Inventors: Radhika Thekkath, Franz Treue, Ernest L. Edgar, Richard T. Leatherman
  • Patent number: 8181168
    Abstract: A system comprises a plurality of computation units interconnected by an interconnection network. A method for configuring the system comprises forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyzing each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; and assigning memory access instructions in a given equivalence class to one of the computation units for execution on the assigned computation unit.
    Type: Grant
    Filed: February 7, 2008
    Date of Patent: May 15, 2012
    Assignee: Tilera Corporation
    Inventors: Walter Lee, Robert A. Gottlieb, Vineet Soni, Anant Agarwal, Richard Schooler
  • Patent number: 8176478
    Abstract: Programs having a given instruction-set architecture are executed on a multiprocessor system comprising a plurality of processors, for example of a VLIW type, each of said processors being able to execute, at each processing cycle, a respective maximum number of instructions. The instructions are compiled as instruction words of given length executable on a first processor. At least some of the instruction words of given length are converted into modified-instruction words executable on a second processor. The operation of modifying comprises in turn at least one operation chosen from the group consisting of: splitting the instruction words into modified-instruction words; and entering no-operation instructions in the modified-instruction words.
    Type: Grant
    Filed: June 27, 2008
    Date of Patent: May 8, 2012
    Assignee: STMicroelectronics S.r.l.
    Inventors: Antonio Maria Borneo, Fabrizio Simone Rovati, Danilo Pietro Pau
  • Patent number: 8150789
    Abstract: A model, which defines a mathematical problem, and multiple directives may be received. Each of the multiple directives may be mapped to a respective linear solver instance. The linear solver instances may be launched to execute in parallel. Each of the linear solver instances may use either a primal or a dual algorithm and may further use double arithmetic, exact arithmetic, or hybrid arithmetic, as specified by corresponding ones of the multiple directives. A linear solver instance that uses hybrid arithmetic may start by using double arithmetic and may use exact arithmetic after experiencing a numerical difficulty. After the numerical difficulty is resolved, the linear solver instance that uses hybrid arithmetic may restart and continue to solve the mathematical problem using double arithmetic. After one of the linear solver instances finds an optimal solution, others of the linear solver instances may be stopped and a report may be provided.
    Type: Grant
    Filed: December 29, 2008
    Date of Patent: April 3, 2012
    Assignee: Microsoft Corporation
    Inventors: Min Wei, Alexander Sasha Stojanovic, David Lao
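A rough sketch (hypothetical solver stubs, not Microsoft's solver) of launching several solver configurations in parallel and attempting to stop the rest once one of them finishes.
```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
import time

def solver(directive, problem):
    """Stand-in for a linear solver instance configured by one directive."""
    time.sleep(directive["simulated_runtime"])
    return directive["name"], sum(problem)      # pretend this is the optimum

directives = [{"name": "primal/double", "simulated_runtime": 0.3},
              {"name": "dual/double", "simulated_runtime": 0.1},
              {"name": "primal/hybrid", "simulated_runtime": 0.5}]
problem = [1, 2, 3]

with ThreadPoolExecutor(max_workers=len(directives)) as pool:
    futures = {pool.submit(solver, d, problem) for d in directives}
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    for f in not_done:       # try to cancel the losers (no-op if already running)
        f.cancel()
    winner, value = next(iter(done)).result()

print(f"{winner} finished first with value {value}")
```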
  • Patent number: 8150749
    Abstract: Computer-implemented methods, computer systems, and computer program products are provided for automated generic and parallel aggregation of characteristics and key figures of unsorted mass data of specific economic interest, particularly data associated with financial institutions and with financial affairs in banking practice. The parallel aggregation may reduce the amount of data to a customer-defined granularity in order to facilitate the handling of raw data related to all areas of credit risk management in banking practice. Moreover, the computational efficiency of the software and its run-time performance may be improved in the case of mass data.
    Type: Grant
    Filed: August 18, 2009
    Date of Patent: April 3, 2012
    Assignee: SAP AG
    Inventors: Markus Kahn, Marcus Baumann
  • Patent number: 8151255
    Abstract: A method for detecting a dependence violation in an application that involves executing a plurality of sections of the application in parallel, and logging memory transactions that occur while executing the plurality of sections to obtain a plurality of logs and a plurality of temporary results, where the plurality of logs is compared while executing the plurality of sections to determine whether the dependence violation exists.
    Type: Grant
    Filed: June 26, 2006
    Date of Patent: April 3, 2012
    Assignee: Oracle America, Inc.
    Inventors: Phyllis E. Gustafson, Miguel Angel Lujan Moreno, Michael H. Paleczny, Christopher A. Vick, Olaf Manczak, Jay R. Freeman
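A small sketch (hypothetical log format, not the patented mechanism) of comparing per-section memory logs to flag a dependence violation: two speculatively parallel sections conflict when one writes an address the other reads or writes.
```python
def find_violations(logs):
    """logs: dict section -> list of ('r'|'w', address).
    Returns pairs of sections whose speculative parallel execution
    conflicts (write/read or write/write on the same address)."""
    violations = []
    sections = list(logs)
    for i, a in enumerate(sections):
        for b in sections[i + 1:]:
            writes_a = {addr for kind, addr in logs[a] if kind == "w"}
            writes_b = {addr for kind, addr in logs[b] if kind == "w"}
            touched_a = {addr for _, addr in logs[a]}
            touched_b = {addr for _, addr in logs[b]}
            if writes_a & touched_b or writes_b & touched_a:
                violations.append((a, b))
    return violations

logs = {"section1": [("r", 0x10), ("w", 0x20)],
        "section2": [("r", 0x20), ("w", 0x30)],   # reads what section1 wrote
        "section3": [("r", 0x40)]}
print(find_violations(logs))   # [('section1', 'section2')]
```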
  • Publication number: 20120079466
    Abstract: Systems and methods for the vectorization of software applications are described. In some embodiments, a compiler may automatically generate both scalar and vector versions of a function from a single source code description. A vector interface may be exposed in a persistent dependency database that is associated with the function. This may allow a compiler to make vector function calls from within vectorized loops, rather than making multiple serialized scalar function calls from within a vectorized loop. This may in turn facilitate the vectorization of hierarchical code, which may improve application performance when vector execution resources are available.
    Type: Application
    Filed: September 23, 2010
    Publication date: March 29, 2012
    Inventor: Jeffry E. Gonion
  • Patent number: 8146066
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Grant
    Filed: March 5, 2007
    Date of Patent: March 27, 2012
    Assignee: Google Inc.
    Inventors: Christopher G. Demetriou, Matthew N. Papakipos
  • Patent number: 8140883
    Abstract: Pipelined loop operations are efficiently scheduled. A preliminary as soon as possible (ASAP) schedule for a data operation in a pipelined loop is determined. A producer operation clock cycle associated with a producer operation in the pipelined loop is determined. The producer operation provides a data value for use by the data operation in a subsequent loop. A consumer operation clock cycle associated with a consumer operation in the pipelined loop is determined. The consumer operation obtains the data value from the data operation in a previous loop. The data operation is scheduled at the half-way point between the producer operation clock cycle and the consumer operation clock cycle.
    Type: Grant
    Filed: May 1, 2008
    Date of Patent: March 20, 2012
    Assignee: Altera Corporation
    Inventor: David James Lau
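The placement rule in the abstract above reduces to simple arithmetic; a minimal sketch (hypothetical cycle values) is shown below.
```python
def schedule_data_op(producer_cycle, consumer_cycle):
    """Place the cross-iteration data operation at the midpoint between the
    clock cycle of the producer (in one loop iteration) and the clock cycle
    of the consumer (in the following iteration)."""
    return (producer_cycle + consumer_cycle) // 2

# Producer writes the value at cycle 4 of one iteration; the consumer in the
# next iteration reads it at cycle 12, so the data operation lands at cycle 8.
print(schedule_data_op(4, 12))   # 8
```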
  • Publication number: 20120066668
    Abstract: A general-purpose programming environment allows users to program a GPU as a general-purpose computation engine using familiar C/C++ programming constructs. Users may use declaration specifiers to identify which portions of a program are to be compiled for a CPU or a GPU. Specifically, functions, objects and variables may be specified for GPU binary compilation using declaration specifiers. A compiler separates the GPU binary code and the CPU binary code in a source file using the declaration specifiers. The location of objects and variables in different memory locations in the system may be identified using the declaration specifiers. CTA threading information is also provided for the GPU to support parallel processing.
    Type: Application
    Filed: July 11, 2011
    Publication date: March 15, 2012
    Applicant: NVIDIA Corporation
    Inventors: Ian Buck, Bastiaan Aarts
  • Patent number: 8136102
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Grant
    Filed: March 5, 2007
    Date of Patent: March 13, 2012
    Assignee: Google Inc.
    Inventors: Matthew N. Papakipos, Brian K. Grant, Christopher G. Demetriou, Morgan S. McGuire
  • Patent number: 8136097
    Abstract: A thread debugging device is provided that can debug reliably when at least one thread is debugged among a plurality of threads executed in association with one another. According to the thread debugging device, a target computer (20) executes at least some processing of at least one target thread to be debugged among the plurality of threads, and further executes non-target threads, which are threads other than the at least one target thread among the plurality of threads, during execution of the at least one target thread while restricting access by the non-target threads to at least some hardware resources of the computer (20).
    Type: Grant
    Filed: October 25, 2006
    Date of Patent: March 13, 2012
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Yousuke Konishi, Shinichiro Mikami, Makoto Ishii, Yasuyuki Kinoshita, Atsuhiko Fujimoto, Masayuki Takahashi