For A Parallel Or Multiprocessor System Patents (Class 717/149)

Loop compiling (Class 717/150)

Binding data parallel device source code

Patent number: 8756590

Abstract: A compile environment is provided in a computer system that allows programmers to program both CPUs and data parallel devices (e.g., GPUs) using a high level general purpose programming language that has data parallel (DP) extensions. A compilation process translates modular DP code written in the general purpose language into DP device source code in a high level DP device programming language using a set of binding descriptors for the DP device source code. A binder generates a single, self-contained DP device source code unit from the set of binding descriptors. A DP device compiler generates a DP device executable for execution on one or more data parallel devices from the DP device source code unit.

Type: Grant

Filed: June 22, 2010

Date of Patent: June 17, 2014

Assignee: Microsoft Corporation

Inventors: Weirong Zhu, Lingli Zhang, Sukhdeep S. Sodhi, Yosseff Levanoni
Emitting coherent output from multiple threads for printf

Patent number: 8752018

Abstract: One embodiment of the present invention sets forth a technique for emitting coherent output from multiple threads for the printf( ) function. Additionally, parallel (not divergent) execution of the threads for the printf( ) function is maintained when possible to improve run-time performance. Processing of the printf( ) function is separated into two tasks, gathering of the per thread data and formatting the gathered data according to the formatting codes for display. The threads emit a coherent stream of contiguous segments, where each segment includes the format string for the printf( ) function and the gathered data for a thread. The coherent stream is written by the threads and read by a display processor. The display processor executes a single thread to format the gathered data according to the format string for display.

Type: Grant

Filed: June 21, 2011

Date of Patent: June 10, 2014

Assignee: NVIDIA Corporation

Inventors: Stephen Jones, Geoffrey Gerfin
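
The gather-then-format split described in this abstract can be illustrated with a minimal Python sketch. It is not NVIDIA's implementation. `CoherentPrintBuffer`, `gather`, and `drain_and_format` are hypothetical names, a lock stands in for whatever atomic buffer reservation GPU threads would use, and Python's `%` formatting stands in for the display processor.

```python
import threading
from typing import List, Tuple


class CoherentPrintBuffer:
    """Hypothetical buffer: threads gather segments, one consumer formats them."""

    def __init__(self) -> None:
        self._segments: List[Tuple[str, tuple]] = []
        self._lock = threading.Lock()

    def gather(self, fmt: str, *args) -> None:
        # Each call appends one contiguous segment holding the format string
        # and this thread's raw arguments; no formatting happens here.
        with self._lock:
            self._segments.append((fmt, args))

    def drain_and_format(self) -> List[str]:
        # Single-threaded pass standing in for the display processor:
        # take the whole stream, then format each segment in order.
        with self._lock:
            segments, self._segments = self._segments, []
        return [fmt % args for fmt, args in segments]


def worker(buf: CoherentPrintBuffer, tid: int) -> None:
    buf.gather("thread %d: value=%d", tid, tid * tid)


if __name__ == "__main__":
    buf = CoherentPrintBuffer()
    threads = [threading.Thread(target=worker, args=(buf, t)) for t in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for line in buf.drain_and_format():
        print(line)
```

Because each segment carries its own format string and arguments, one thread's output never interleaves with another's; only the order of whole segments depends on scheduling.
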
Throughput-aware software pipelining for highly multi-threaded systems

Patent number: 8752036

Abstract: Embodiments of the invention provide systems and methods for throughput-aware software pipelining in compilers to produce optimal code for single-thread and multi-thread execution on multi-threaded systems. A loop is identified within source code as a candidate for software pipelining. An attempt is made to generate pipelined code (e.g., generate an instruction schedule and a set of register assignments) for the loop in satisfaction of throughput-aware pipelining criteria, like maximum register count, minimum trip count, target core pipeline resource utilization, maximum code size, etc. If the attempt fails to generate code in satisfaction of the criteria, embodiments adjust one or more settings (e.g., by reducing scalarity or latency settings being used to generate the instruction schedule).

Type: Grant

Filed: October 31, 2011

Date of Patent: June 10, 2014

Assignee: Oracle International Corporation

Inventors: Spiros Kalogeropulos, Partha Tirumalai
Application program interface of a parallel-processing computer system that supports multiple programming languages

Patent number: 8745603

Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.

Type: Grant

Filed: May 10, 2013

Date of Patent: June 3, 2014

Assignee: Google Inc.

Inventors: Morgan S. McGuire, Christopher G. Demetriou, Brian K. Grant, Matthew N. Papakipos
Transferring data in a parallel processing environment

Patent number: 8745604

Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a processor, a switch including switching circuitry to forward data over data paths from other tiles to the processor and to switches of other tiles, and a switch memory that stores instruction streams that are able to operate independently for respective output ports of the switch.

Type: Grant

Filed: February 25, 2008

Date of Patent: June 3, 2014

Assignee: Massachusetts Institute of Technology

Inventor: Anant Agarwal
Method and apparatus for run-time statistics dependent program execution using source-coding principles

Patent number: 8739142

Abstract: Disclosed are a method and system for optimized, dynamic data-dependent program execution. The disclosed system comprises a statistics computer which computes statistics of the incoming data at the current time instant, where the said statistics include the probability distribution of the incoming data, the probability distribution over program modules induced by the incoming data, the probability distribution induced over program outputs by the incoming data, and the time-complexity of each program module for the incoming data, wherein the said statistics are computed as a function of current and past data, and previously computed statistics; a plurality of alternative execution path orders designed prior to run-time by the use of an appropriate source code; a source code selector which selects one of the execution path orders as a function of the statistics computed by the statistics computer; and a complexity measurement which measures the time-complexity of the currently selected execution path order.

Type: Grant

Filed: December 21, 2012

Date of Patent: May 27, 2014

Assignee: International Business Machines Corporation

Inventors: Dake He, Ashish Jagmohan, Jian Lou, Ligang Lu
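
One way to picture statistics-driven ordering of program modules is a dispatcher that keeps running counts of which module handles each input and tries modules in decreasing probability-to-cost ratio. This is a simplified, hypothetical Python sketch, not IBM's method: it recomputes an order on every call rather than selecting among orders designed before run time, per-module costs are assumed to be given rather than measured, and `AdaptiveDispatcher` is an illustrative name. The ordering rule is the standard exchange-argument result for sequential trials that stop at the first success.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Optional, Tuple


class AdaptiveDispatcher:
    """Hypothetical sketch: try modules in an order driven by runtime statistics."""

    def __init__(self, modules: Dict[str, Callable[[Any], Optional[Any]]],
                 costs: Dict[str, float]) -> None:
        self.modules = modules
        self.costs = costs  # assumed cost of one attempt, per module
        # Start every count at 1 (add-one smoothing) so no estimate is zero.
        self.hits: Counter = Counter({name: 1 for name in modules})

    def order(self) -> List[str]:
        # Decreasing probability/cost minimizes expected cost when each input
        # is accepted by exactly one module and trying stops at the first hit.
        total = sum(self.hits.values())
        return sorted(self.modules,
                      key=lambda m: (self.hits[m] / total) / self.costs[m],
                      reverse=True)  # stable: ties keep insertion order

    def dispatch(self, x: Any) -> Tuple[Optional[str], Optional[Any]]:
        for name in self.order():
            out = self.modules[name](x)
            if out is not None:
                self.hits[name] += 1  # fold the new input into the statistics
                return name, out
        return None, None
```

As inputs shift toward one module, that module migrates to the front of the order without any change to the modules themselves.
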
Program conversion apparatus and computer readable medium

Patent number: 8732684

Abstract: According to one embodiment, a first program code including a plurality of variables is converted into a second program code to be executed by a multi-core processor including a plurality of cores. Specifically, an access pattern is determined for each variable in the first program code. All variables in the first program code are classified into a plurality of groups, such that the variables in each group share the same access pattern. A member structure is created for each group; each member structure includes the variables of one group. A route-pointer indicating the address in memory of the variables of the member structure is created. The variables in the first program code are converted into the member structures and the route-pointer that indicate those variables in the second program code. The second program code is output to the multi-core processor.

Type: Grant

Filed: January 25, 2011

Date of Patent: May 20, 2014

Assignee: Kabushiki Kaisha Toshiba

Inventors: Nobuaki Tojo, Ken Tanabe, Hidenori Matsuzaki
Methods and system for executing a program in multiple execution environments

Patent number: 8732685

Abstract: A method and medium are disclosed for executing a technical computing program in parallel in multiple execution environments. A program is invoked for execution in a first execution environment, and from that invocation the program is executed in the first execution environment and in one or more additional execution environments to provide parallel execution of the program. New constructs in a technical computing programming language are disclosed for parallel programming of a technical computing program for execution in multiple execution environments. Also disclosed are a system and method for changing an execution environment from a sequential mode of operation to a parallel mode of operation and vice versa.

Type: Grant

Filed: February 3, 2011

Date of Patent: May 20, 2014

Assignee: The MathWorks, Inc.

Inventor: Cleve Moler
Bootup method and device for application program in mobile equipment

Patent number: 8726249

Abstract: A bootup device and method for an application program on mobile equipment improve the bootup speed of the application program. The bootup device has an application management module that boots up a virtual machine module based on the application program to be run. The virtual machine module loads the code of the application program and the Just-in-Time (JIT) compilation results of the application program's bootup process into memory, searches the JIT compilation results for locally JIT-compiled code corresponding to each bootup-process code segment to be executed, and executes the found JIT-compiled code when executing that segment. A storage management module stores and reads the code of the application program and the JIT compilation results obtained from JIT compilation of its bootup process.

Type: Grant

Filed: February 21, 2011

Date of Patent: May 13, 2014

Assignee: ZTE Corporation

Inventors: Youpeng Gu, Lifeng Xu, Wei Hu, Sheng Zhong, Wei Wang, Zemin Wang
Configurable logic integrated circuit having a multidimensional structure of configurable elements

Patent number: 8726250

Abstract: Programming of modules which can be reprogrammed during operation is described. Partitioning of code sequences is also described.

Type: Grant

Filed: March 10, 2010

Date of Patent: May 13, 2014

Assignee: Pact XPP Technologies AG

Inventors: Martin Vorbach, Armin Nückel
Pipelined loop parallelization with pre-computations

Patent number: 8726251

Abstract: Embodiments of the invention provide systems and methods for automatically parallelizing loops with non-speculative pipelined execution of chunks of iterations with pre-computation of selected values. Non-DOALL loops are identified and divided into chunks. The chunks are assigned to separate logical threads, which may be further assigned to hardware threads. As a thread performs its runtime computations, subsequent threads attempt to pre-compute their respective chunks of the loop. These pre-computations may result in a set of assumed initial values and pre-computed final variable values associated with each chunk. As subsequent pre-computed chunks are reached at runtime, those assumed initial values can be verified to determine whether to proceed with runtime computation of the chunk or to avoid runtime execution and instead use the pre-computed final variable values.

Type: Grant

Filed: March 29, 2011

Date of Patent: May 13, 2014

Assignee: Oracle International Corporation

Inventors: Spiros Kalogeropulos, Partha Pal Tirumalai
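
The verify-before-reuse step can be sketched in Python for a loop with a carried dependence. This is an illustration under stated assumptions, not Oracle's implementation: `guess_initial` is a hypothetical predictor of each chunk's incoming state, pre-computation finishes before the verifying pass instead of overlapping with it, and Python threads show the control flow rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # loop-carried state
T = TypeVar("T")  # loop element


def run_chunk(step: Callable[[S, T], S], state: S, chunk: Sequence[T]) -> S:
    # Plain sequential execution of one chunk of iterations.
    for item in chunk:
        state = step(state, item)
    return state


def pipelined_with_precompute(step, init, data, chunk_size, guess_initial):
    """Return (final_state, number_of_chunks_whose_precomputation_was_reused).

    guess_initial(k) is a hypothetical predictor of the state entering chunk k.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    assumed = {k: guess_initial(k) for k in range(1, len(chunks))}

    # Pre-compute every later chunk from its assumed initial value.
    with ThreadPoolExecutor() as pool:
        futures = {k: pool.submit(run_chunk, step, assumed[k], chunks[k]) for k in assumed}
        precomputed = {k: f.result() for k, f in futures.items()}

    # Runtime pass: verify each assumption before trusting the pre-computed value.
    state, reused = init, 0
    for k, chunk in enumerate(chunks):
        if k in assumed and assumed[k] == state:
            state = precomputed[k]  # assumption held: skip re-execution
            reused += 1
        else:
            state = run_chunk(step, state, chunk)  # assumption failed: execute
    return state, reused
```

With an exact predictor every later chunk is reused; with a wrong one the result is still correct, just computed the slow way, because each assumed initial value is checked before its pre-computed result is used.
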
Interactive iterative program parallelization based on dynamic feedback

Patent number: 8726238

Abstract: Interactive iterative program parallelization based on dynamic feedback may, in one aspect, identify a ranked list of one or more candidate pieces of code, each with one or more source refactorings that can be applied to parallelize the code; apply at least one of the refactorings to create revised code; and determine performance data associated with the revised code. The performance data may be used to decide on the next ranked list of possible refactorings.

Type: Grant

Filed: February 22, 2010

Date of Patent: May 13, 2014

Assignee: International Business Machines Corporation

Inventors: Evelyn Duesterwald, Robert M. Fuhrer, Vijay Saraswat
Controlling parallelization of recursion using pluggable policies

Patent number: 8719803

Abstract: A parallelism policy object provides a control parallelism interface whose implementation evaluates parallelism conditions that are left unspecified in the interface. User-defined and other parallelism policy procedures can make recommendations to a worker program for transitioning between sequential program execution and parallel execution. Parallelizing assistance values obtained at runtime can be used in the parallelism conditions on which the recommendations are based. A consistent parallelization policy can be employed across a range of parallel constructs, and inside recursive procedures.

Type: Grant

Filed: June 4, 2008

Date of Patent: May 6, 2014

Assignee: Microsoft Corporation

Inventors: Stephen Toub, Igor Ostrovsky, Joe Duffy, Vance Morrison, Huseyin Yildiz
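
A pluggable policy of this kind can be sketched in Python as an interface with a single decision method, consulted at each level of a recursive divide-and-conquer sum. `ParallelismPolicy`, `DepthAndSizePolicy`, and `recursive_sum` are hypothetical names, the depth and size thresholds are only example parallelism conditions, and the sketch does not reproduce any particular library's API.

```python
import threading
from typing import List, Protocol, Sequence


class ParallelismPolicy(Protocol):
    def should_parallelize(self, depth: int, size: int) -> bool: ...


class DepthAndSizePolicy:
    """Hypothetical policy: fork only near the top of the recursion and only
    for subproblems large enough to be worth a thread."""

    def __init__(self, max_depth: int, min_size: int) -> None:
        self.max_depth = max_depth
        self.min_size = min_size

    def should_parallelize(self, depth: int, size: int) -> bool:
        return depth < self.max_depth and size >= self.min_size


def recursive_sum(data: Sequence[int], policy: ParallelismPolicy, depth: int = 0) -> int:
    if len(data) <= 1:
        return sum(data)
    mid = len(data) // 2
    left, right = data[:mid], data[mid:]
    if policy.should_parallelize(depth, len(data)):
        # Policy recommends parallelism: evaluate the left half on a new thread.
        result: List[int] = []

        def run_left() -> None:
            result.append(recursive_sum(left, policy, depth + 1))

        t = threading.Thread(target=run_left)
        t.start()
        right_sum = recursive_sum(right, policy, depth + 1)
        t.join()
        return result[0] + right_sum
    # Policy recommends sequential execution at this level.
    return recursive_sum(left, policy, depth + 1) + recursive_sum(right, policy, depth + 1)
```

Swapping in a different policy object changes where the recursion forks without touching `recursive_sum`, which is the point of keeping the conditions behind an interface.
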
Speculative multi-threading for instruction prefetch and/or trace pre-build

Patent number: 8719806

Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is a speculative prefetch thread to perform instruction prefetch and/or trace pre-build for the main thread.

Type: Grant

Filed: September 10, 2010

Date of Patent: May 6, 2014

Assignee: Intel Corporation

Inventors: Hong Wang, Tor M. Aamodt, Pedro Marcuello, Jared W. Stark, IV, John P. Shen, Antonio Gonzalez, Per Hammarlund, Gerolf F. Hoflehner, Perry H. Wang, Steve Shih-wei Liao
Architecture for accelerated computer processing

Patent number: 8713545

Abstract: A data processing system includes a host computer, an additional computer, an application module including a first executable code, a module for analyzing said first executable code and a module for generating a second executable code segmented notably into code blocks which are executed in a preferential manner on one of the two computers. The second executable code includes a sub-module for managing the distribution of the processing operations between the host computer and the additional computer and a sub-module for managing the additional computer as a virtual machine which executes the blocks allocated to the additional computer.

Type: Grant

Filed: March 13, 2009

Date of Patent: April 29, 2014

Assignee: Silkan

Inventor: Pierre Fiorini
Method for data management in a collaborative service-oriented workshop

Patent number: 8706702

Abstract: A method for managing data in a collaborative service-oriented workshop, which is adapted to treat objects associated with data representative of real or process data, is provided to share data and resources in an architecture of a workspace. The architecture is adapted to design complex objects and manipulate information technology objects that represent data, which may be representative of a real object or a process based on metadata representing characteristic data. The metadata includes a generic part that is common to all data, a specific part that is inherent to the type of data, and links to other objects. The links make it possible to establish, at a later time, the traceability of the data, or in other words the traceability between the different data produced or used during execution of processes.

Type: Grant

Filed: May 14, 2009

Date of Patent: April 22, 2014

Assignee: Airbus Operations S.A.S.

Inventors: Bernard Marquez, Thierry Chevalier, Philippe Sauvage
Performing parallel processing of distributed arrays

Patent number: 8707281

Abstract: One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The media also store one or more instructions for transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The media further store one or more instructions for receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.

Type: Grant

Filed: July 23, 2012

Date of Patent: April 22, 2014

Assignee: The MathWorks, Inc.

Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
Using parallel processing constructs and dynamically allocating program portions

Patent number: 8707280

Abstract: A computing device-implemented method includes receiving a program, analyzing and transforming the program, determining an inner context and an outer context of the program based on the analysis of the program, and allocating one or more portions of the inner context of the program to two or more labs for parallel execution. The method also includes receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to the outer context of the program.

Type: Grant

Filed: June 29, 2012

Date of Patent: April 22, 2014

Assignee: The MathWorks, Inc.

Inventors: Halldor N. Stefansson, Brett Baker, Edric Ellis, Joseph F. Hicklin, John N. Little, Jocelyn Luke Martin, Piotr R. Luszczek, Nausheen B. Moulana, Loren Dean, Roy E. Lurie
METHOD OF COMPILING PROGRAM TO BE EXECUTED ON MULTI-CORE PROCESSOR, AND TASK MAPPING METHOD AND TASK SCHEDULING METHOD OF RECONFIGURABLE PROCESSOR

Publication number: 20140109069

Abstract: A method of compiling a program to be executed on a multicore processor is provided. The method may include generating an initial solution by mapping a task to a source processing element (PE) and a destination PE, and selecting a communication scheme for transmission of the task from the source PE to the destination PE, approximately optimizing the mapping and communication scheme included in the initial solution, and scheduling the task, wherein the communication scheme is designated in a compiling process.

Type: Application

Filed: October 11, 2013

Publication date: April 17, 2014

Applicants: SEOUL NATIONAL UNIVERSITY R&DB FOUNDATION, SAMSUNG ELECTRONICS CO., LTD.

Inventors: Jin-Hoo LEE, Moo-Kyoung CHUNG, Key-Young CHOI, Yeon-Gon CHO, Soo-Jung RYU
Accelerating generic loop iterators using speculative execution

Patent number: 8701099

Abstract: A method, a system and a computer program product for effectively accelerating loop iterators using speculative execution of iterators. An Efficient Loop Iterator (ELI) utility detects initiation of a target program and initiates/spawns a speculative iterator thread at the start of the basic code block ahead of the code block that initiates a nested loop. The ELI utility assigns the iterator thread to a dedicated processor in a multi-processor system. The speculative thread runs/executes ahead of the execution of the nested loop and calculates indices in a corresponding multidimensional array. The iterator thread adds all the precomputed indices to a single queue. As a result, the ELI utility effectively enables a multidimensional loop to be replaced by a single dimensional loop. At the beginning of (or during) each iteration of the iterator, the ELI utility “dequeues” an entry from the queue to use the entry to access the array upon which the ELI utility iterates.

Type: Grant

Filed: November 2, 2010

Date of Patent: April 15, 2014

Assignee: International Business Machines Corporation

Inventors: Ganesh Bikshandi, Dibyendu Das, Smruti Ranjan Sarangi
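
The run-ahead iterator can be sketched in Python with a helper thread that enumerates every index of a nested loop into a queue, so the consumer runs a single flat loop. Names such as `index_producer` and `flattened_sum` are hypothetical. Python offers no portable way to pin the helper to a dedicated processor, and the global interpreter lock means the sketch shows the data flow, not a speedup. Bounding the queue is a design choice so the helper cannot run arbitrarily far ahead of the consumer.

```python
import itertools
import queue
import threading
from typing import Any, Sequence, Tuple

_DONE = object()  # sentinel marking the end of the index stream


def index_producer(shape: Tuple[int, ...], out: "queue.Queue[Any]") -> None:
    # Helper thread: walk the whole loop nest ahead of the consumer and
    # enqueue every multidimensional index in row-major order.
    for idx in itertools.product(*(range(n) for n in shape)):
        out.put(idx)
    out.put(_DONE)


def flattened_sum(array: Sequence[Any], shape: Tuple[int, ...]) -> int:
    q: "queue.Queue[Any]" = queue.Queue(maxsize=1024)  # bounded run-ahead
    helper = threading.Thread(target=index_producer, args=(shape, q), daemon=True)
    helper.start()
    total = 0
    while True:  # single-dimensional loop replacing the nested one
        idx = q.get()  # dequeue the next precomputed index
        if idx is _DONE:
            break
        value = array
        for i in idx:  # use the index tuple to reach the array element
            value = value[i]
        total += value
    helper.join()
    return total
```
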
Programming system in multi-core environment, and method and program of the same

Patent number: 8694975

Abstract: A first compiler generates one or more object codes from a program code for a first processor included in an arithmetic processing system to which a plurality of processors are mutually connected. A first linker links the generated one or more object codes to generate an execution file for the first processor. A parameter information generation unit generates, based on the information acquired from the first linker, parameter information used in a second processor included in the arithmetic processing system. A second compiler refers to a program code and the parameter information for the second processor to generate one or more object codes. A second linker links the generated one or more object codes to generate an execution file for the second processor.

Type: Grant

Filed: July 23, 2009

Date of Patent: April 8, 2014

Assignee: NEC Corporation

Inventor: Tomoyoshi Kobori
DATA DEPENDENCE ANALYSIS SUPPORT DEVICE, DATA DEPENDENCE ANALYSIS SUPPORT PROGRAM, AND DATA DEPENDENCE ANALYSIS SUPPORT METHOD

Publication number: 20140096117

Abstract: A data dependence analysis support device calculates pointer information by performing a context-sensitive pointer analysis on every pointer used in a program; calculates dataflow information between statements by performing a context-sensitive dataflow analysis, using the context-sensitive pointer information, on all statements in an analysis target region and all statements that might be called upon execution of the analysis target region; and calculates inter-region data dependence information, using the dataflow information, for two or more threaded regions included in the source program.

Type: Application

Filed: September 28, 2012

Publication date: April 3, 2014

Applicant: PANASONIC CORPORATION

Inventor: Akira Tanaka
System and method for ordering tasks with complex interrelationships

Patent number: 8689231

Abstract: One or more embodiments of the invention enable a system and method for ordering tasks with complex interrelationships. The present invention as described herein may be used to produce a linear ordering of tasks with complex interrelationships including dependencies and constraints. In one or more embodiments optional tasks may be permitted such that a given task may or may not be added to the execution queue depending on the scheduling of earlier tasks following evaluation of their dependencies—that is, the system of the invention supports the management of optional tasks in a task ordering operation where some or all of tasks have complex interdependencies.

Type: Grant

Filed: June 30, 2009

Date of Patent: April 1, 2014

Assignee: SAP AG

Inventor: Brent Milnor
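
Kahn's topological sort is one standard way to produce a linear order that respects dependencies, and it extends naturally to optional tasks. The Python sketch here is that textbook algorithm, not SAP's method; representing an optional task as a predicate over the order built so far is an assumption made for illustration. An optional task that is declined, or that depends on a skipped task, is itself skipped; a required task in that position is an error.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def order_tasks(deps: Dict[str, Set[str]],
                optional: Dict[str, Callable[[List[str]], bool]]) -> List[str]:
    """deps maps every task (including tasks with no dependencies) to the set
    of tasks it depends on. optional maps each optional task to a hypothetical
    predicate that decides, given the order so far, whether to schedule it."""
    dependents: Dict[str, List[str]] = {t: [] for t in deps}
    remaining = {t: len(d) for t, d in deps.items()}
    for task, ds in deps.items():
        for d in ds:
            dependents[d].append(task)

    blocked: Set[str] = set()  # tasks with at least one skipped dependency
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order: List[str] = []
    processed = 0
    while ready:
        task = ready.popleft()
        processed += 1
        if task in blocked or (task in optional and not optional[task](order)):
            if task not in optional:
                raise ValueError(f"required task {task!r} depends on a skipped task")
            skipped = True
        else:
            order.append(task)
            skipped = False
        for nxt in sorted(dependents[task]):
            if skipped:
                blocked.add(nxt)
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    if processed != len(deps):
        raise ValueError("dependency cycle detected")
    return order
```
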
Display of data from parallel programming contexts

Patent number: 8689196

Abstract: A debugging interface for use with parallel computing is displayed. When a break state has been entered in a particular code context (such as a method) by a particular execution context (such as a thread), related execution contexts that were also executing in that code context are found. While in the break state, multiple expressions are evaluated for each of the execution contexts. The results are then displayed, optionally with navigation controls that allow the results to be navigated efficiently.

Type: Grant

Filed: December 10, 2010

Date of Patent: April 1, 2014

Assignee: Microsoft Corporation

Inventors: Paul E. Maybee, Daniel Moth
Methods and apparatuses for a compiler server

Patent number: 8677329

Abstract: A method and an apparatus that instructs a compiler server to build or otherwise obtain a compiled code corresponding to a compilation request received from an application are described. The compiler server may be configured to compile source codes for a plurality of independent applications, each running in a separate process, using a plurality of independent compilers, each running in a separate compiler process. A search may be performed in a cache for a compiled code that satisfies a compilation request received from an application. A reply message including the compiled code can be provided for the application, wherein the compiled code is compiled in direct response to the request, or is obtained from the cache if the search identifies in the cache the compiled code that satisfies the compilation request.

Type: Grant

Filed: June 3, 2009

Date of Patent: March 18, 2014

Assignee: Apple Inc.

Inventors: Robert Beretta, Nicholas William Burns, Nathaniel Begeman, Phillip Kent Miller, Geoffrey Grant Stahl
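
A cache-first compile request can be sketched in Python, with the built-in `compile()` standing in for the compiler. In the patent, applications and compilers run as separate processes exchanging messages; this hypothetical `CompilerServer` collapses that into one process, and keying the cache on a SHA-256 hash of the file name and source is an assumption.

```python
import hashlib
import threading
from types import CodeType
from typing import Dict, Tuple


class CompilerServer:
    """Hypothetical in-process stand-in: compile on a cache miss, reuse on a hit."""

    def __init__(self) -> None:
        self._cache: Dict[str, CodeType] = {}
        self._lock = threading.Lock()
        self.compilations = 0

    @staticmethod
    def _key(source: str, filename: str) -> str:
        # The key covers everything that affects the compiled output in this sketch.
        return hashlib.sha256(f"{filename}\0{source}".encode("utf-8")).hexdigest()

    def request(self, source: str, filename: str = "<request>") -> Tuple[CodeType, bool]:
        """Return (compiled_code, served_from_cache)."""
        key = self._key(source, filename)
        with self._lock:  # held across compile so concurrent misses compile once
            cached = self._cache.get(key)
            if cached is not None:
                return cached, True
            code = compile(source, filename, "exec")
            self.compilations += 1
            self._cache[key] = code
            return code, False
```

Holding the lock during compilation keeps the sketch simple at the cost of serializing all misses; a real server would more likely track in-flight requests per key.
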
Parallelization method, system and program

Patent number: 8677334

Abstract: A computer-implemented method, system, and article of manufacture for parallelizing a code configured by coupling a functional block having an internal state and a functional block without any internal state. The method includes: creating and storing a graphical representation where functional blocks are chosen as nodes and connections between functional blocks are chosen as links; visiting the nodes on the graphical representation sequentially, detecting inputs from functional blocks without any internal state to functional blocks having an internal state and storing these functional blocks as a set of use blocks, and detecting inputs from functional blocks having an internal state to functional blocks without any internal state and storing these functional blocks as a set of definition blocks; and forming strands of functional blocks based on information on the set of use blocks and information on the set of definition blocks stored in association with the functional blocks.

Type: Grant

Filed: October 28, 2010

Date of Patent: March 18, 2014

Assignee: International Business Machines Corporation

Inventors: Arquimedes Martinez Canedo, Hideaki Komatsu, Takeo Yoshizawa
Executing multiple child code blocks via a single compiled parent code block

Patent number: 8677332

Abstract: Systems and methods for compiling one or more code blocks written in a programming language are provided. In some aspects, a display associated with an application is provided. The display includes a plurality of graphical objects. It is determined that each of the plurality of graphical objects is associated with a child code block, in a one-to-one association between graphical objects and child code blocks. Each child code block is written in the programming language. The child code blocks associated with the plurality of graphical objects are transformed into a single parent code block. The parent code block, upon compiling, is configured to be reused across execution contexts and to allow injection of a global scope. The parent code block, upon specific execution, includes an execution context for a specified child code block. The parent code block is configured to receive an indication of the specified child code block for initiating execution of the parent code block. The parent code block is compiled.

Type: Grant

Filed: July 24, 2012

Date of Patent: March 18, 2014

Assignee: Google Inc.

Inventors: John Hjelmstad, Malte Ubl
Processors and compiling methods for processors

Patent number: 8677330

Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.

Type: Grant

Filed: June 9, 2010

Date of Patent: March 18, 2014

Assignee: Altera Corporation

Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
Performance analysis of software executing in different sessions

Patent number: 8671400

Abstract: A technique includes providing first objects that are associated with an application session and, in a processor-based system, identifying second objects in another application session that correspond to the first objects, based at least in part on a comparison of the second objects to matching rules associated with the first objects.

Type: Grant

Filed: December 23, 2009

Date of Patent: March 11, 2014

Assignee: Intel Corporation

Inventors: Christopher J. Cormack, Nathaniel Duca, Joseph D. Matarazzo
Automatic Exploitation of Data Parallelism in Streaming Applications

Publication number: 20140068578

Abstract: An embodiment of the invention provides a method for exploiting stateless and stateful data parallelism in a streaming application, wherein a compiler determines whether an operator of the streaming application is safe to parallelize based on a definition of the operator and an instance of the definition. The operator is not safe to parallelize when the operator has selectivity greater than 1, wherein the selectivity is the number of output tuples generated for each input tuple. Parallel regions are formed within the streaming application with the compiler when the operator is safe to parallelize. Synchronization strategies for the parallel regions are determined with the compiler, wherein the synchronization strategies are determined based on the definition of the operator and the instance of the definition. The synchronization strategies of the parallel regions are enforced with a runtime system.

Type: Application

Filed: October 12, 2012

Publication date: March 6, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bugra Gedik, Martin J. Hirzel, Scott A. Schneider, Kun-Lung Wu
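
The selectivity test and an order-preserving parallel region can be sketched in Python. The compiler described in the abstract derives its decisions from operator definitions in a streaming language; here the hypothetical `OperatorDef` simply declares whether an operator is stateless and its maximum selectivity, and stateful parallelism is left out entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class OperatorDef:
    """Hypothetical stand-in for what a compiler learns from an operator's
    definition and instance."""
    fn: Callable[[Any], List[Any]]  # output tuples produced for one input tuple
    stateless: bool
    max_selectivity: float  # declared upper bound on outputs per input


def safe_to_parallelize(op: OperatorDef) -> bool:
    # Selectivity above 1 rules parallelization out, as in the abstract.
    # This sketch additionally handles only stateless operators.
    return op.stateless and op.max_selectivity <= 1


def observed_selectivity(op: OperatorDef, sample: List[Any]) -> float:
    # Output tuples generated per input tuple over a sample.
    return sum(len(op.fn(t)) for t in sample) / len(sample)


def run_region(op: OperatorDef, tuples: Iterable[Any], workers: int = 4) -> List[Any]:
    items = list(tuples)
    if safe_to_parallelize(op):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Executor.map returns results in input order, so this doubles as
            # an order-preserving merge at the end of the parallel region.
            per_input = list(pool.map(op.fn, items))
    else:
        per_input = [op.fn(t) for t in items]
    return [out for outs in per_input for out in outs]
```

An operator that may emit two tuples per input, such as a duplicator, fails the check and runs sequentially.
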
OPTIMIZED DIVISION OF WORK AMONG PROCESSORS IN A HETEROGENEOUS PROCESSING SYSTEM

Publication number: 20140068582

Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.

Type: Application

Filed: September 10, 2012

Publication date: March 6, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
OPTIMIZED DIVISION OF WORK AMONG PROCESSORS IN A HETEROGENEOUS PROCESSING SYSTEM

Publication number: 20140068581

Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.

Type: Application

Filed: August 30, 2012

Publication date: March 6, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
Automatic Exploitation of Data Parallelism in Streaming Applications

Publication number: 20140068577

Abstract: An embodiment of the invention provides a method for exploiting stateless and stateful data parallelism in a streaming application, wherein a compiler determines whether an operator of the streaming application is safe to parallelize based on a definition of the operator and an instance of the definition. The operator is not safe to parallelize when the operator has selectivity greater than 1, wherein the selectivity is the number of output tuples generated for each input tuple. Parallel regions are formed within the streaming application with the compiler when the operator is safe to parallelize. Synchronization strategies for the parallel regions are determined with the compiler, wherein the synchronization strategies are determined based on the definition of the operator and the instance of the definition. The synchronization strategies of the parallel regions are enforced with a runtime system.

Type: Application

Filed: August 28, 2012

Publication date: March 6, 2014

Applicant: International Business Machines Corporation

Inventors: Bugra Gedik, Martin J. Hirzel, Scott A. Schneider, Kun-Lung Wu
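The safety rule stated in this abstract (selectivity greater than 1 blocks parallelization) and one possible synchronization strategy can be sketched as follows. This is a minimal sketch under stated assumptions: `Operator`, `is_safe_to_parallelize` and `run_parallel_region` are hypothetical names, stateful operators are assumed safe only when their state is partitioned by a key, and the "synchronization strategy" shown is a simple ordered merge by sequence number, not necessarily the one the application describes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Operator:
    # Hypothetical operator description; field names are illustrative only.
    name: str
    fn: Callable[[dict], List[dict]]     # one input tuple -> zero or more output tuples
    max_selectivity: float               # declared upper bound on outputs per input tuple
    partition_key: Optional[str] = None  # set when per-key state can be partitioned


def is_safe_to_parallelize(op: Operator, stateful: bool) -> bool:
    # Rule from the abstract: selectivity greater than 1 makes the operator unsafe.
    if op.max_selectivity > 1:
        return False
    # Assumption: stateful operators are safe only when their state is keyed.
    return (not stateful) or op.partition_key is not None


def run_parallel_region(op: Operator, tuples: List[dict], workers: int = 4) -> List[dict]:
    # Split into channels, tagging each tuple with its sequence number.
    channels = [[] for _ in range(workers)]
    for seq, t in enumerate(tuples):
        if op.partition_key is not None:
            idx = hash(t[op.partition_key]) % workers  # same key always hits the same channel
        else:
            idx = seq % workers                        # round-robin for stateless operators
        channels[idx].append((seq, t))

    def process(channel):
        # Each channel is handled sequentially, so keyed state stays consistent.
        return [(seq, op.fn(t)) for seq, t in channel]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_channel = list(pool.map(process, channels))

    # Ordered merge: restore the original tuple order before emitting outputs.
    merged = sorted((pair for ch in per_channel for pair in ch), key=lambda p: p[0])
    return [out for _, outs in merged for out in outs]
```

Because every output list is re-sorted by the input sequence number, a selectivity of 0 or 1 (filters and one-to-one maps) produces the same stream as sequential execution.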
Instruction grouping and ungrouping apparatus and method for an adaptive microprocessor system

Patent number: 8667476

Abstract: Multiple instructions including branch instructions are grouped into a condensed variable length instruction. A trimmed and grouped branch instruction branches to one of the instructions in the same group. Therefore, a grouped instruction including branch(es) inherently executes correct branch behaviors without deploying any branch prediction schemes. In addition, a grouped instruction in a condensed form delivers multiple operations to execute without fetching all of the instructions grouped separately from the instruction memory via its caches while conserving instruction memory and/or cache as well as reducing bit switching on the bus. Software developers can make their own compatible, compact and ciphered instruction sets after grouping existing instructions in their software compiled with existing software compilers and the associated microprocessors.

Type: Grant

Filed: January 6, 2010

Date of Patent: March 4, 2014

Assignee: Adaptmicrosys LLC

Inventor: Yong-Kyu Jung
Auto-generation of concurrent code for multi-core applications

Patent number: 8661424

Abstract: A code generation system comprises a model analyzer configured to identify data dependencies in a data flow diagram that describes functional behavior of an application, wherein the model analyzer is further configured to compute a data and computation map based on the data dependencies and to compute one or more implementation constraints; a model partitioner configured to compute one or more partition boundaries based on the data and computation map and the one or more implementation constraints; and a code generator configured to generate parallelized code based on the data flow diagram, the one or more implementation constraints, and the one or more partition boundaries, wherein the code generator is configured to map the code corresponding to each partition defined by the one or more partition boundaries to one of a plurality of cores of a multi-core processor, and to generate inter-core communication code for at least one line of the data and computation map crossed by the one or more partition boundaries.

Type: Grant

Filed: September 2, 2010

Date of Patent: February 25, 2014

Assignee: Honeywell International Inc.

Inventors: Kirk Schloegel, Devesh Bhatt
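To make the partition-then-communicate idea concrete, here is a deliberately simple sketch. It is an assumption-laden stand-in, not the patented partitioner: blocks are laid out in topological order and cut into roughly equal-cost groups per core, and every data-flow edge that crosses a cut gets a placeholder send/receive pair. The names `partition_dataflow`, `cross_core_edges` and `communication_stubs` are hypothetical, and the generated "code" is just descriptive strings.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (producer block, consumer block)


def partition_dataflow(costs: Dict[str, float], edges: List[Edge],
                       num_cores: int) -> Dict[str, int]:
    # Build predecessor sets so TopologicalSorter can order the blocks.
    deps = {block: set() for block in costs}
    for src, dst in edges:
        deps[dst].add(src)
    order = list(TopologicalSorter(deps).static_order())

    # Assumed constraint: balance total compute cost across cores.
    budget = sum(costs.values()) / num_cores
    assignment: Dict[str, int] = {}
    core, acc = 0, 0.0
    for block in order:
        if acc >= budget and core < num_cores - 1:
            core, acc = core + 1, 0.0  # start a new partition boundary here
        assignment[block] = core
        acc += costs[block]
    return assignment


def cross_core_edges(edges: List[Edge], assignment: Dict[str, int]) -> List[Edge]:
    # Edges whose endpoints land on different cores need inter-core communication.
    return [(s, d) for s, d in edges if assignment[s] != assignment[d]]


def communication_stubs(cross: List[Edge], assignment: Dict[str, int]) -> List[str]:
    # Hypothetical textual stand-ins for generated send/receive code.
    stubs = []
    for src, dst in cross:
        stubs.append(f"core{assignment[src]}: send {src}->{dst} to core{assignment[dst]}")
        stubs.append(f"core{assignment[dst]}: recv {src}->{dst} from core{assignment[src]}")
    return stubs
```

For a four-block chain `a -> b -> c -> d` with unit costs on two cores, this places `a, b` on core 0 and `c, d` on core 1, and only the `b -> c` edge needs communication code.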
Architecture and programming in a parallel processing environment with switch-interconnected processors

Patent number: 8656141

Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a pipelined processor configured to process multiple streams of instructions for the processor; and a switch including switching circuitry to forward data over data paths from other tiles to one or more pipeline stages of the processor and to switches of other tiles. At least some of the data is forwarded based on one or more streams of instructions for the switch.

Type: Grant

Filed: December 13, 2005

Date of Patent: February 18, 2014

Assignee: Massachusetts Institute of Technology

Inventor: Anant Agarwal
Cross-logical entity accelerators

Patent number: 8656375

Abstract: A cross-logical entity group is created that includes one or more accelerators to be shared by a plurality of logical entities. Instantiated on the accelerators are functions that are common across multiple logical entities. The functions to be instantiated are determined, for instance, dynamically during run-time.

Type: Grant

Filed: November 2, 2009

Date of Patent: February 18, 2014

Assignee: International Business Machines Corporation

Inventors: Rajaram B. Krishnamurthy, Thomas A. Gregg
Compiler for providing intrinsic supports for VLIW PAC processors with distributed register files and method thereof

Patent number: 8656376

Abstract: A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.

Type: Grant

Filed: September 1, 2011

Date of Patent: February 18, 2014

Assignee: National Tsing Hua University

Inventors: Jenq Kuen Lee, Chi Bang Kuan
Parallelization method, system, and program

Publication number: 20140047421

Abstract: A segment including a set of blocks necessary to calculate blocks having internal states and blocks having no outputs is extracted by tracing from blocks for use in calculating inputs into the blocks having internal states and from the blocks having no outputs in the reverse direction of dependence. To newly extract segments in which blocks contained in the extracted segments are removed, a set of nodes to be temporarily removed is determined on the basis of parallelism. Segments executable independently of other segments are extracted by tracing from nodes whose child nodes are lost by removal of the nodes in the upstream direction. Segments are divided into upstream segments representing the newly extracted segments and downstream segments representing nodes temporarily removed. Upstream and downstream segments are merged so as to reduce overlapping blocks between segments such that the number of segments is reduced to the number of parallel executions.

Type: Application

Filed: August 21, 2013

Publication date: February 13, 2014

Applicant: International Business Machines Corporation

Inventors: Shuhichi Shimizu, Takeo Yoshizawa
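The first and last steps of this abstract (reverse tracing from stateful and output-less blocks, then merging segments down to the available parallelism) can be approximated as below. This is a simplified sketch under stated assumptions: the intermediate step of temporarily removing nodes and splitting segments into upstream and downstream parts is omitted, outputs of stateful blocks are assumed to come from the previous time step (so tracing stops at them), and merging greedily picks the pair of segments with the largest overlap. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def trace_upstream(starts: Iterable[str], preds: Dict[str, List[str]],
                   stateful: Set[str]) -> Set[str]:
    # Walk against the direction of dependence. Outputs of stateful blocks are
    # available from the previous time step, so the walk does not cross them.
    seen: Set[str] = set()
    stack = [b for b in starts if b not in stateful]
    while stack:
        b = stack.pop()
        if b in seen:
            continue
        seen.add(b)
        stack.extend(p for p in preds.get(b, []) if p not in stateful)
    return seen


def extract_segments(preds: Dict[str, List[str]], stateful: Set[str],
                     sinks: Set[str]) -> List[Set[str]]:
    segments = []
    for s in sorted(stateful):
        # Blocks needed to compute the inputs of a stateful block, plus the block itself.
        segments.append(trace_upstream(preds.get(s, []), preds, stateful) | {s})
    for s in sorted(sinks):
        # Blocks needed to compute a block that has no outputs.
        segments.append(trace_upstream([s], preds, stateful))
    return segments


def merge_segments(segments: List[Set[str]], k: int) -> List[Set[str]]:
    # Greedily merge the pair with the largest overlap until only k segments remain,
    # which removes blocks that would otherwise be computed in two segments.
    segs = [set(s) for s in segments]
    while len(segs) > k:
        i, j = max(((i, j) for i in range(len(segs)) for j in range(i + 1, len(segs))),
                   key=lambda ij: len(segs[ij[0]] & segs[ij[1]]))
        segs[i] |= segs.pop(j)
    return segs
```

Since `j > i` in every candidate pair, popping `segs[j]` never shifts the index of the segment being extended.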
Single thread performance in an in-order multi-threaded processor

Patent number: 8650554

Abstract: A mechanism is provided for improving single-thread performance for a multi-threaded, in-order processor core. In a first phase, a compiler analyzes application code to identify instructions that can be executed in parallel, with a focus on instruction-level parallelism, and removes any register interference between the threads. The compiler inserts, as appropriate, synchronization instructions supported by the apparatus to ensure that the resulting execution of the threads is equivalent to the execution of the application code in a single thread. In a second phase, an operating system schedules the threads produced in the first phase on the hardware threads of a single processor core such that they execute simultaneously. In a third phase, the microprocessor core executes the threads specified by the second phase such that one hardware thread executes each application thread.

Type: Grant

Filed: April 27, 2010

Date of Patent: February 11, 2014

Assignee: International Business Machines Corporation

Inventors: Elmootazbellah N. Elnozahy, Ahmed Gheith
Method and system for dynamically parallelizing application program

Patent number: 8650384

Abstract: Provided is a method and system for dynamically parallelizing an application program. Specifically, provided is a method and system having multi-core control that may verify a number of available threads according to an application program and dynamically parallelize data based on the verified number of available threads. The method and system for dynamically parallelizing the application program may divide a data block to be processed according to the application program based on a relevant data characteristic and dynamically map the threads to division blocks, and thereby enhance a system performance.

Type: Grant

Filed: April 27, 2010

Date of Patent: February 11, 2014

Assignees: Samsung Electronics Co., Ltd., University of Southern California

Inventors: Seung Won Lee, Shi Hwa Lee, Dong-In Kang, Mikyung Kang
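A minimal sketch of the idea in this abstract: check how many threads are available, divide the data block accordingly, and map one thread to each division. The abstract divides blocks "based on a relevant data characteristic"; this sketch simply assumes an even contiguous split, and `split_block` and `parallel_map_blocks` are hypothetical names. Python threads illustrate the mapping but, because of the GIL, do not speed up CPU-bound work.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_block(data: List[T], parts: int) -> List[List[T]]:
    # Near-equal contiguous chunks; the first len(data) % parts chunks get one extra item.
    q, r = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + q + (1 if i < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_map_blocks(fn: Callable[[T], R], data: List[T],
                        available_threads: Optional[int] = None) -> List[R]:
    # "Verify" the number of available threads at call time (assumed: CPU count).
    threads = available_threads or os.cpu_count() or 1
    threads = max(1, min(threads, len(data)))  # never more threads than items
    chunks = split_block(data, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # One task per division block; results come back in chunk order.
        results = pool.map(lambda chunk: [fn(x) for x in chunk], chunks)
    return [y for chunk in results for y in chunk]
```

Because the thread count is read on every call, the same code adapts when fewer threads are available, which is the "dynamic" aspect the abstract emphasizes.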
Counter operation in a state machine lattice

Patent number: 8648621

Abstract: Disclosed are methods and devices, among which is a device that includes a finite state machine lattice. The lattice may include a counter suitable for counting a number of times a programmable element in the lattice detects a condition. The counter may be configured to produce an output in response to counting that the condition was detected a certain number of times. For example, the counter may be configured to output in response to determining a condition was detected at least (or no more than) the certain number of times, determining the condition was detected exactly the certain number of times, or determining the condition was detected within a certain range of times. The counter may be coupled to other counters in the device for determining high-count operations and/or certain quantifiers.

Type: Grant

Filed: December 15, 2011

Date of Patent: February 11, 2014

Assignee: Micron Technology, Inc.

Inventors: Harold B. Noyes, David R. Brown, Paul Glendenning
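The four counter behaviors listed in this abstract (at least, no more than, exactly, within a range) can be modeled in software as below. This is a hypothetical behavioral model, not the hardware: `LatticeCounter`, its mode strings, and `count_matches` are illustrative names, and the "condition" is assumed to be simply seeing a given symbol in the input stream.

```python
from dataclasses import dataclass


@dataclass
class LatticeCounter:
    # Hypothetical software model; mode names are illustrative only.
    mode: str        # "at_least", "at_most", "exactly", or "range"
    low: int
    high: int = 0    # used only by "range"
    count: int = 0

    def detect(self) -> None:
        # Called each time the upstream programmable element reports the condition.
        self.count += 1

    def output(self) -> bool:
        if self.mode == "at_least":
            return self.count >= self.low
        if self.mode == "at_most":
            return self.count <= self.low
        if self.mode == "exactly":
            return self.count == self.low
        if self.mode == "range":
            return self.low <= self.count <= self.high
        raise ValueError(f"unknown mode: {self.mode}")


def count_matches(text: str, symbol: str, counter: LatticeCounter) -> bool:
    # Feed a symbol stream; the condition here is simply "this symbol was seen".
    for ch in text:
        if ch == symbol:
            counter.detect()
    return counter.output()
```

For example, `"banana"` contains three `a` symbols, so a `"range"` counter with bounds 2 and 3 outputs `True`, while an `"exactly"` counter set to 2 outputs `False`.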
Data parallelism aware debugging

Patent number: 8645920

Abstract: The debugging of a kernel in a data parallel environment. A debugger engine interfaces with a data parallel environment that is running one or more data parallel kernels through a first interface. For each of at least one of the one or more kernels, a program object is formulated that abstractly represents the data parallel kernel including data parallel functionality of the kernel. The program object has a second interface that allows information regarding the kernel to be discovered by the debugger user interface module.

Type: Grant

Filed: December 10, 2010

Date of Patent: February 4, 2014

Assignee: Microsoft Corporation

Inventor: Paul E. Maybee
Vectorizing combinations of program operations

Patent number: 8640112

Abstract: System and method for vectorizing combinations of program operations. Program code is received that includes a combination of individually vectorizable program portions that collectively implement a first computation. Each individually vectorizable program portion has at least one array input and at least one array output. The combination of individually vectorizable program portions is transformed into a single vectorizable program portion that is or includes a functional composition of the combination of individually vectorizable program portions. Vectorized executable code implementing the first computation is generated based on the single vectorizable program portion. The generated executable code is directed to SIMD (Single-Instruction-Multiple-Data) computing units of a target processor.

Type: Grant

Filed: March 30, 2011

Date of Patent: January 28, 2014

Assignee: National Instruments Corporation

Inventors: Haoran Yi, Brady C. Duggan, Robert E. Dye, Adam L. Bordelon, Jeffrey L. Kodosky
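The core transformation in this abstract, replacing a chain of individually vectorizable array operations with their functional composition, can be illustrated at the source level. In this sketch (hypothetical names `compose`, `unfused`, `fused`), each stage is an elementwise function; the unfused version materializes one intermediate array per stage, while the fused version applies a single composed kernel in one pass. Actual SIMD code generation, which the patent targets, is outside the scope of this Python illustration.

```python
from functools import reduce
from typing import Callable, List, Sequence

Elementwise = Callable[[float], float]


def compose(stages: Sequence[Elementwise]) -> Elementwise:
    # Returns x -> stages[-1](... stages[1](stages[0](x)) ...)
    return reduce(lambda f, g: (lambda x: g(f(x))), stages, lambda x: x)


def unfused(data: List[float], stages: Sequence[Elementwise]) -> List[float]:
    # Each stage is individually vectorizable and writes a full intermediate array.
    for stage in stages:
        data = [stage(x) for x in data]
    return data


def fused(data: List[float], stages: Sequence[Elementwise]) -> List[float]:
    # One composed kernel, one pass over the data, no intermediate arrays.
    kernel = compose(stages)
    return [kernel(x) for x in data]
```

With stages "multiply by 2", "add 1" and `abs`, both versions map `[-3.0, 0.5, 2.0]` to `[5.0, 2.0, 5.0]`; the fused form is the one a compiler could then emit as a single SIMD loop.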
Dynamic optimization using a resource cost registry

Patent number: 8635606

Abstract: Technologies are generally described for runtime optimization adjusted dynamically according to changing costs of one or more system resources. Multicore systems may encounter dynamic variations in performance associated with the relative cost of related system resources. Furthermore, multicore systems can experience dramatic variations in resource availability and costs. A dynamic registry of system resource costs can be utilized to guide dynamic optimization. The relative scarcity of each resource can be updated dynamically within the registry of system resource costs. A runtime code generating loader and optimizer may be adapted to adjust optimization according to the resource cost registry. Information regarding system resource costs can support optimization tradeoffs based on resource cost functions.

Type: Grant

Filed: October 13, 2009

Date of Patent: January 21, 2014

Assignee: Empire Technology Development LLC

Inventor: Ezekiel John Joseph Kruglick
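A resource cost registry of the kind described here can be sketched as a table of per-unit costs that the optimizer consults when choosing between code variants. Everything in this sketch is an assumption for illustration: the resource names (`cache_kb`, `cycles`), the linear cost function, and the names `ResourceCostRegistry` and `choose_variant`.

```python
from typing import Dict


class ResourceCostRegistry:
    # Hypothetical registry: resource name -> current cost per unit.
    def __init__(self, costs: Dict[str, float]) -> None:
        self._costs = dict(costs)

    def update(self, resource: str, cost: float) -> None:
        # Called at runtime as a resource becomes more or less scarce.
        self._costs[resource] = cost

    def cost_of(self, usage: Dict[str, float]) -> float:
        # Assumed linear cost function: sum of (units used * current unit cost).
        return sum(self._costs.get(r, 0.0) * amount for r, amount in usage.items())


def choose_variant(variants: Dict[str, Dict[str, float]],
                   registry: ResourceCostRegistry) -> str:
    # Pick the code variant whose resource usage is cheapest under current costs.
    return min(variants, key=lambda name: registry.cost_of(variants[name]))
```

With equal unit costs, an "unrolled" variant using 64 KB of cache and 100 cycles (cost 164) beats a "compact" one using 16 KB and 160 cycles (cost 176); if the cache cost is raised to 3.0, the totals become 292 and 208 and the choice flips to "compact".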
Parallel dynamic optimization

Patent number: 8627300

Abstract: Technologies are generally described for parallel dynamic optimization using multicore processors. A runtime compiler may be adapted to generate multiple instances of executable code from a portable intermediate software module. The various instances of executable code may be generated with variations of optimization parameters such that the code instances each express different optimization attempts. A multicore processor may be leveraged to simultaneously execute some, or all, of the various code instances. Preferred optimization parameters may be determined from the executable code instances that may correctly complete in the least time, or may use the least amount of memory, or that may prove superior according to some other fitness metric. Preferred optimization parameters may be used to seed future optimization attempts. Output generated from the preferred instances may be used as soon as the first instance correctly completes the block.

Type: Grant

Filed: October 13, 2009

Date of Patent: January 7, 2014

Assignee: Empire Technology Development LLC

Inventor: Ezekiel John Joseph Kruglick
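The racing scheme in this abstract (run differently optimized instances at once, keep the first correct output, and remember which parameters were fastest) can be sketched with `concurrent.futures`. Assumptions: the "code instances" are hypothetical Python closures built by `make_instance` from a block-size parameter, fitness is wall-clock time, and correctness is checked by a caller-supplied predicate. Threads are used for simplicity; because of the GIL they interleave rather than truly run in parallel, so a real system would use separate cores or processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Hashable, Optional, Sequence, Tuple


def sum_with_block(data: Sequence[int], block: int) -> int:
    # Hypothetical "code instance": the same computation, tuned by a block size.
    total = 0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total


def race_variants(make_instance: Callable[[Hashable], Callable[..., Any]],
                  params: Sequence[Hashable], args: Tuple[Any, ...],
                  check: Callable[[Any], bool]):
    timings: Dict[Hashable, float] = {}
    first_output: Optional[Any] = None

    def timed(p):
        start = time.perf_counter()
        out = make_instance(p)(*args)
        return p, out, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=len(params)) as pool:
        futures = [pool.submit(timed, p) for p in params]
        for fut in as_completed(futures):  # yields instances in completion order
            p, out, elapsed = fut.result()
            if check(out):  # only correct instances count
                timings[p] = elapsed
                if first_output is None:
                    first_output = out  # usable as soon as one instance finishes correctly
    # Preferred parameters: the fastest correct instance (could seed later attempts).
    best = min(timings, key=timings.get) if timings else None
    return first_output, best, timings
```

The design keeps correctness and speed separate: an instance that finishes first but fails `check` is ignored both as output and as a candidate for the preferred parameters.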
Concurrent management of adaptive programs

Patent number: 8627301

Abstract: A method for concurrent management of adaptive programs is disclosed wherein changes in a set of modifiable references are initially identified. A list of uses of the changed references is next computed using records made in structures of the references. The list is next inserted into an elimination queue. Comparison is next made of each of the uses to the other uses to determine independence or dependence thereon. Determined dependent uses are eliminated and the preceding steps are repeated for all determined independent uses until all dependencies have been eliminated.

Type: Grant

Filed: May 18, 2007

Date of Patent: January 7, 2014

Assignee: Intel Corporation

Inventors: Matthew Hammer, Mohan Rajagopalan, Anwar Ghuloum
Technologies for detecting erroneous resumptions in a continuation based runtime

Patent number: 8620991

Abstract: Technologies for enabling a continuation based runtime to accept or reject external stimulus and, in addition, to determine if an external stimulus may be valid for processing at a later point in execution.

Type: Grant

Filed: August 1, 2012

Date of Patent: December 31, 2013

Assignee: Microsoft Corporation

Inventors: Kenneth David Wolf, Justin David Brown, Karthik Raman, Nathan Christopher Talbert, Edmund Samuel Victor Pinto
Compiling software for a hierarchical distributed processing system

Patent number: 8621446

Abstract: Compiling software for a hierarchical distributed processing system including providing to one or more compiling nodes software to be compiled, wherein at least a portion of the software to be compiled is to be executed by one or more other nodes; compiling, by the compiling node, the software; maintaining, by the compiling node, any compiled software to be executed on the compiling node; selecting, by the compiling node, one or more nodes in a next tier of the hierarchy of the distributed processing system in dependence upon whether any compiled software is for the selected node or the selected node's descendants; sending to the selected node only the compiled software to be executed by the selected node or the selected node's descendants.

Type: Grant

Filed: April 29, 2010

Date of Patent: December 31, 2013

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
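The distribution rule in this abstract (each node keeps what it will run, and forwards to a child only the compiled software for that child or its descendants) can be sketched as a recursive walk over the hierarchy. Assumptions: compilation has already produced a mapping from target node to artifact, the hierarchy is a plain `children` dictionary, and `subtree` and `distribute` are hypothetical names.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]        # node -> child nodes in the next tier
Payload = Dict[str, str]           # target node -> compiled artifact


def subtree(node: str, children: Tree) -> Set[str]:
    # The node itself plus all of its descendants.
    nodes = {node}
    for child in children.get(node, []):
        nodes |= subtree(child, children)
    return nodes


def distribute(node: str, payload: Payload, children: Tree,
               received: Dict[str, Payload]) -> None:
    # Keep only the compiled software this node will execute itself.
    received[node] = {t: a for t, a in payload.items() if t == node}
    for child in children.get(node, []):
        below = subtree(child, children)
        part = {t: a for t, a in payload.items() if t in below}
        # Select a child only if something is destined for it or its descendants,
        # and send it only that subset.
        if part:
            distribute(child, part, children, received)
```

A node whose subtree has nothing to run is never contacted, which is what keeps traffic proportional to where code actually executes.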
System and method for large-scale data processing using an application-independent framework

Patent number: 8612510

Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values.

Type: Grant

Filed: January 12, 2010

Date of Patent: December 17, 2013

Assignee: Google Inc.

Inventors: Jeffrey Dean, Sanjay Ghemawat
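The split in this abstract between an application-independent framework and user-specified map and reduce operators can be shown in miniature. In this single-machine sketch, `map_reduce` plays the framework role (parallel map, grouping by key, parallel reduce), while the word-count operators are the user-supplied, application-specific part. The thread pool, the in-memory grouping, and the function names are assumptions; a distributed implementation would shard inputs and intermediate data across machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Any, Callable, Dict, Iterable, List, Tuple


def map_reduce(inputs: List[Tuple[str, str]],
               map_op: Callable[[str, str], Iterable[Tuple[Any, Any]]],
               reduce_op: Callable[[Any, List[Any]], Any],
               workers: int = 4) -> Dict[Any, Any]:
    # Application-independent part: parallelism and grouping know nothing about words.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(lambda item: list(map_op(*item)), inputs)
        groups: Dict[Any, List[Any]] = defaultdict(list)
        for key, value in chain.from_iterable(mapped):
            groups[key].append(value)  # intermediate values grouped by key
        reduced = pool.map(lambda kv: (kv[0], reduce_op(kv[0], kv[1])), groups.items())
        return dict(reduced)


# Application-specific operators for a word count.
def word_count_map(name: str, text: str):
    for word in text.split():
        yield word, 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)
```

Calling `map_reduce([("doc1", "a b a"), ("doc2", "b c")], word_count_map, word_count_reduce)` returns `{"a": 2, "b": 2, "c": 1}`; swapping in different operators reuses the same framework unchanged.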
