For A Parallel Or Multiprocessor System Patents (Class 717/149)
-
Publication number: 20100229161
Abstract: A compile technique for multicore allocation is provided, by which a desired running performance can be achieved. The technique comprises the steps of analyzing a taskization directive, taskizing a specified part, and assigning the task to a specified CPU. In this program-to-tasks decomposition technique, multicore decomposition is performed by allocating tasks to CPUs individually while following a task decomposition directive for a main part designated by a user. When no directive specifies the CPU to be allocated, the relationship with a principal task is judged from the invocation relationship and the dependencies, and the CPU to be allocated is then determined. When allocating tasks to CPUs, an efficient multicore task decomposition is achieved by considering whether to copy one process and assign it to more than one CPU, while weighing the balance between processing speed and resources.
Type: Application
Filed: January 27, 2010
Publication date: September 9, 2010
Applicant: RENESAS TECHNOLOGY CORP.
Inventor: Noriyasu MORI
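The fallback rule for tasks that carry no CPU directive can be sketched in Python. This is a minimal illustration under stated assumptions, not Renesas's compiler. The `Task` fields, the `allocate` function and the `default_cpu` fallback are all hypothetical. Only the invocation relation is modeled: dependencies and the copy-versus-resources trade-off are left out, and the caller graph is assumed to be acyclic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    name: str
    cpu: Optional[int] = None     # CPU named in a user taskization directive, if any
    caller: Optional[str] = None  # task that invokes this one (invocation relation)


def allocate(tasks: Dict[str, Task], default_cpu: int = 0) -> Dict[str, int]:
    """Assign a CPU to every task; tasks without a directive follow their caller.

    Assumes the caller relation is acyclic (no mutual recursion between tasks).
    """
    placement: Dict[str, int] = {}

    def resolve(name: str) -> int:
        if name not in placement:
            task = tasks[name]
            if task.cpu is not None:
                placement[name] = task.cpu              # explicit directive wins
            elif task.caller is not None:
                placement[name] = resolve(task.caller)  # co-locate with the principal task
            else:
                placement[name] = default_cpu           # hypothetical fallback CPU
        return placement[name]

    for name in tasks:
        resolve(name)
    return placement


tasks = {
    "main": Task("main", cpu=0),
    "filter": Task("filter", cpu=1),
    "helper": Task("helper", caller="filter"),
    "log": Task("log", caller="main"),
}
print(allocate(tasks))  # {'main': 0, 'filter': 1, 'helper': 1, 'log': 0}
```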
-
Patent number: 7793278
Abstract: Systems and methods perform affine partitioning on a code stream to produce code segments that may be parallelized. The code segments include copies of the original code stream with inserted conditionals that aid in parallelizing the code. The conditional is formed by determining the constraints on a processor variable determined by the affine partitioning and applying the constraints to the original code stream.
Type: Grant
Filed: September 30, 2005
Date of Patent: September 7, 2010
Assignee: Intel Corporation
Inventors: Zhao Hui Du, Shih-Wei Liao, Gansha Wu, Guei-Yuan Lueh
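Here is a rough Python illustration of conditional insertion. It uses a loop nest whose dependence runs along the diagonal and assumes the affine partition p = i - j, which places dependent iterations on the same processor. Each "processor" runs a full copy of the original loop guarded by the conditional `i - j == p`, which is the constraint on the processor variable. The function names, and the thread pool standing in for processors, are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def original(n):
    a = [[0] * n for _ in range(n)]
    for i in range(1, n):
        for j in range(1, n):
            a[i][j] = a[i - 1][j - 1] + 1  # dependence runs along the diagonal
    return a


def processor_copy(a, n, p):
    # Copy of the original loop nest with the conditional i - j == p inserted:
    # the affine partition maps iteration (i, j) to processor p = i - j.
    for i in range(1, n):
        for j in range(1, n):
            if i - j == p:
                a[i][j] = a[i - 1][j - 1] + 1


def parallel(n):
    a = [[0] * n for _ in range(n)]
    processors = range(-(n - 2), n - 1)  # every value i - j takes for 1 <= i, j < n
    with ThreadPoolExecutor() as pool:
        # Copies write disjoint diagonals, so they need no synchronization.
        list(pool.map(lambda p: processor_copy(a, n, p), processors))
    return a


assert parallel(6) == original(6)
```

Testing the condition on every iteration keeps each copy identical to the original code. A production compiler would normally fold the constraint into the loop bounds instead.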
-
Patent number: 7793276
Abstract: In some embodiments, a method and apparatus for automatically parallelizing a sequential network application through pipeline transformation are described. In one embodiment, the method includes configuring a network processor into a D-stage processor pipeline. Once the processor is configured, a sequential network application program is transformed into D pipeline stages. Once transformed, the D pipeline stages are executed in parallel within the D-stage processor pipeline. In one embodiment, the sequential application program is transformed by modeling it as a flow network model and selecting, from the flow network model, a plurality of preliminary pipeline stages. Other embodiments are described and claimed.
Type: Grant
Filed: November 14, 2003
Date of Patent: September 7, 2010
Assignee: Intel Corporation
Inventors: Jinquan Dai, Luddy Harrison, Bo Huang, Cotton Seed, Long Li
-
Patent number: 7788672
Abstract: According to one embodiment, an information processing apparatus includes a plurality of execution modules and a scheduler that controls the assignment of a plurality of basic modules to the execution modules. The scheduler performs the following steps. When an available execution module that has not been assigned any basic module exists, it assigns to that module a basic module that is waiting for another basic module to finish executing. It measures the execution time of the processing of the basic module itself, and it measures the execution time of the processing for assigning the basic module to an execution module. Based on these two measured execution times, it performs granularity adjustment: it links two or more basic modules that must be executed successively under the execution-sequence constraints, so that they are assigned to an execution module as one set, and it redivides linked basic modules.
Type: Grant
Filed: April 6, 2009
Date of Patent: August 31, 2010
Assignee: Kabushiki Kaisha Toshiba
Inventor: Yasuyuki Tanaka
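The linking half of the granularity adjustment might look like the Python sketch below, which assumes both measurements are already available as numbers. The `ratio` threshold and the function name are hypothetical, because the abstract does not say how the two times are compared. Redividing is not shown.

```python
from typing import List


def link_modules(exec_times: List[float], overhead: float,
                 ratio: float = 10.0) -> List[List[int]]:
    """Group consecutive basic modules so each set runs at least ratio * overhead.

    exec_times: measured execution time of each basic module, in execution order.
    overhead:   measured time to assign one set of work to an execution module.
    ratio:      hypothetical tuning knob; the abstract gives no threshold.
    Only successive modules are linked, which respects the execution sequence.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_time = 0.0
    for idx, t in enumerate(exec_times):
        current.append(idx)
        current_time += t
        if current_time >= ratio * overhead:  # set is now coarse enough
            groups.append(current)
            current, current_time = [], 0.0
    if current:                               # trailing modules form the last set
        groups.append(current)
    return groups


print(link_modules([1, 2, 8, 12, 3], overhead=1.0))  # [[0, 1, 2], [3], [4]]
```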
-
Patent number: 7788650
Abstract: Source code includes a directive to indicate data structures of related data to a compiler. The compiler associates the related data to the same one of multiple processors in a multiprocessor environment. The compiler searches the source code for locks associated with the related data, and generates executable code that is modified with respect to locks written in the source code. The compiler may replace or remove locks written in the source code to protect access to the related data, resulting in an executable program that does not include the locks.
Type: Grant
Filed: May 10, 2005
Date of Patent: August 31, 2010
Assignee: Intel Corporation
Inventors: Erik J. Johnson, Stephen D. Goglin
-
Patent number: 7779393
Abstract: A system for efficiently verifying compliance with a memory consistency model includes a test module and an analysis module. The test module may coordinate an execution of a multithreaded test program on a test platform. If the test platform provides an indication of the order in which writes from multiple processing elements are performed at shared memory locations, the analysis module may use a first set of rules to verify that the results of the execution correspond to a valid ordering of events according to a memory consistency model. If the test platform does not provide an indication of write ordering, the analysis module may use a second set of rules to verify compliance with the memory consistency model.
Type: Grant
Filed: May 25, 2005
Date of Patent: August 17, 2010
Assignee: Oracle America, Inc.
Inventors: Chaiyasit Manovit, Sudheendra G. Hangal, Robert E. Cypher
-
Patent number: 7779382
Abstract: Validity of one or more assertions for any concurrent execution of a plurality of software instructions with at most k−1 context switches can be determined. Validity checking can account for execution of the software instructions in an unbounded stack depth scenario. A finite data domain representation can be used. The software instructions can be represented by a pushdown system. Validity checking can account for thread creation during execution of the plurality of software instructions.
Type: Grant
Filed: December 10, 2004
Date of Patent: August 17, 2010
Assignee: Microsoft Corporation
Inventors: Niels Jakob Rehof, Shaz Qadeer
-
Publication number: 20100205588
Abstract: General-purpose distributed data-parallel computing using a high-level language is disclosed. Data-parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. The distributed execution plan is then executed on large compute clusters. Thus, the developer can write the program using familiar programming constructs in the high-level language. Moreover, developers without experience with distributed compute systems are able to take advantage of such systems.
Type: Application
Filed: February 9, 2009
Publication date: August 12, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Yuan Yu, Dennis Fetterly, Michael Isard, Ulfar Erlingsson, Mihai Budiu
-
Patent number: 7774768
Abstract: An improved method of optimizing the instruction set of a digital processor using code compression. In one embodiment, the method comprises obtaining an assembly language program to be used for the optimization process; calculating the static frequency of each instruction type from the base instruction set; sorting the instruction types by frequency; determining the number and type of instructions necessary for correct program execution; creating a compressed instruction set encoding; re-evaluating the compressed instruction set according to the foregoing steps; and generating an instruction set encoding for the compressed instruction set. Improved compressed instruction formats and register structures useful in a processor are also disclosed. A computer program and apparatus for synthesizing logic implementing the aforementioned data cache architecture and pipeline performance enhancements are further disclosed.
Type: Grant
Filed: May 22, 2006
Date of Patent: August 10, 2010
Assignee: ARC International, PLC
Inventor: Peter Warnes
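The first steps of this method, calculating the static frequency of each instruction type and then sorting, are easy to sketch. This Python version assumes a simple assembly syntax with `;` comments and labels ending in `:`. The function name and the sample program are illustrative only.

```python
from collections import Counter
from typing import List, Tuple


def static_opcode_frequency(asm: str) -> List[Tuple[str, int]]:
    """Count how often each instruction type appears in the program text.

    This is a static count (occurrences in the listing), not a dynamic
    execution count. Results are sorted most frequent first.
    """
    counts: Counter = Counter()
    for line in asm.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments (assumed ';' syntax)
        if not line or line.endswith(":"):    # skip blank lines and labels
            continue
        counts[line.split()[0].lower()] += 1  # mnemonic is the first token
    return counts.most_common()


asm = """
loop:
    ld  r1, [r2]
    add r1, r1, 1   ; increment
    st  r1, [r2]
    add r2, r2, 4
    bne r2, r3, loop
"""
print(static_opcode_frequency(asm))  # [('add', 2), ('ld', 1), ('st', 1), ('bne', 1)]
```

The types at the head of the sorted list are the natural candidates for the short encodings of a compressed instruction set.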
-
Publication number: 20100199257
Abstract: A method and a system for transformation-based program generation using two separate specifications as input: an implementation-neutral specification of the desired computation and a specification of the execution platform. The generated implementation incorporates execution platform opportunities such as parallelism. Operationally, the invention has two broad stages. First, it designs the abstract implementation in the problem domain in terms of an Intermediate Language (IL) that is unfettered by programming language restrictions and requirements. Concurrently, the design is evolved by specializing the IL to encapsulate a plurality of desired design features in the implementation, such as partitioning for multicore and/or instruction-level parallelism. Concurrently, constraints that stand in for implied implementation structures are added to the design and coordinated with other constraints. Second, the IL is refined into implementation code.
Type: Application
Filed: January 31, 2009
Publication date: August 5, 2010
Inventor: Ted James Biggerstaff
-
Patent number: 7765532
Abstract: An Induced Multi-threading (IMT) framework may be configured to induce multi-threaded execution in software code. In one embodiment, the IMT framework may include a concurrent code generator configured to receive marked code having one or more blocks of code marked for concurrent execution. Software code initially intended for sequential execution may have been automatically marked by an automated code marker and/or marked manually to generate the marked code. The concurrent code generator may be configured to generate concurrent code from the marked code. The concurrent code may include one or more tasks configured for concurrent execution in place of the one or more marked blocks of code. In one embodiment, the IMT framework may also include a scheduler configured to schedule one or more of the tasks for multi-threaded execution.
Type: Grant
Filed: October 22, 2002
Date of Patent: July 27, 2010
Assignee: Oracle America, Inc.
Inventors: Bala Dutt, Ajay Kumar, Hanumantha R. Susarla
-
Patent number: 7757222
Abstract: Code is affine partitioned to generate affine partitioning mappings. Parallel code is generated based on the affine partitioning mappings. Generating the parallel code includes coalescing loops in the parallel code generated from the affine partitioning mappings to generate coalesced parallel code and optimizing the coalesced parallel code.
Type: Grant
Filed: September 30, 2005
Date of Patent: July 13, 2010
Assignee: Intel Corporation
Inventors: Shih-wei Liao, Zhao Hui Du, Bu Qi Cheng, Gansha Wu, Guei-Yuan Lueh
-
Publication number: 20100175049
Abstract: Embodiments of the present invention relate to systems, methods and computer storage media for providing Structured Computations Optimized for Parallel Execution (SCOPE), which facilitate analysis of large-scale data sets using the row data of those data sets. SCOPE includes, among other features, an extract command for extracting data bytes from a data stream and structuring the data bytes as data rows having strictly defined columns. SCOPE also includes a process command and a reduce command that identify data rows as inputs. The reduce command also identifies a reduce key on which the reduction is based. SCOPE additionally includes a combine command that identifies two data row sets that are to be combined based on an identified join condition. Additionally, SCOPE includes a select command that leverages the SQL and C# languages to create an expressive script that is capable of analyzing large-scale data sets in a parallel computing environment.
Type: Application
Filed: January 7, 2009
Publication date: July 8, 2010
Applicant: MICROSOFT CORPORATION
Inventors: WILLIAM D. RAMSEY, RONNIE IRA CHAIKEN, DARREN A. SHAKIB, ROBERT JOHN JENKINS, JR., SIMON J. WEAVER, JINGREN ZHOU, DANIEL DEDU-CONSTANTIN, ACHINT SRIVASTAVA
-
Patent number: 7752212
Abstract: A computer-implemented method of creating a schema specific parser for processing Extensible Markup Language (XML) documents can include receiving an XML schema comprising a plurality of components, determining a hierarchy of the plurality of components of the XML schema, and creating an execution plan specifying a hierarchy of XML processing instructions. Each XML processing instruction can be associated with an XML processing function of a virtual machine that performs an XML document processing task. The hierarchy of XML processing instructions can be determined according to the hierarchy of components of the XML schema. An instruction causing the virtual machine to invoke a de-serialization module that extracts at least one item of information from the XML document can be inserted into the execution plan. The execution plan can be compiled into a bytecode version of the execution plan that is interpretable by the virtual machine. The bytecode version of the execution plan can be output.
Type: Grant
Filed: June 5, 2007
Date of Patent: July 6, 2010
Assignee: International Business Machines Corporation
Inventors: Abraham Heifets, Margaret G. Kostoulas, Moshe Morris Emanuel Matsa, Eric Perkins
-
Patent number: 7747996
Abstract: A method for enabling interoperability of a locking synchronization method with a lock-free synchronization method in a multi-threaded environment is presented. The method examines a class file for mutable fields contained in critical code sections. The mutable fields are transferred to a shadow record, and a pointer is substituted in the class field for each transferred mutable field. Code is altered so that the lock-free synchronization method is used if a lock is not held on the object. Atomic compare-and-swap operations are employed after mutable fields are updated during execution of the lock-free synchronization method.
Type: Grant
Filed: May 25, 2006
Date of Patent: June 29, 2010
Assignee: Oracle America, Inc.
Inventor: David Dice
-
Patent number: 7743087
Abstract: The present invention provides a method and system for the dynamic distribution of an array in a parallel computing environment. The present invention obtains a criterion for distributing an array and performs flexible portioning based on the obtained criterion. In some embodiments, analysis may be performed based on the criterion. The flexible portioning is then performed based on the analysis.
Type: Grant
Filed: March 22, 2006
Date of Patent: June 22, 2010
Assignee: The Math Works, Inc.
Inventors: Penelope Anderson, Cleve Moler, Sheung Hun Cheng, Patrick D. Quillen
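One way to picture criterion-driven portioning is a function that turns per-worker weights into element counts. The Python sketch below is an illustration built on assumptions, not The MathWorks' implementation. `weights` stands in for whatever criterion is obtained, such as relative worker speed or available memory. Leftover elements go to the workers with the largest fractional shares, and equal weights reduce to the usual block distribution.

```python
from typing import List, Sequence


def partition_sizes(n: int, weights: Sequence[float]) -> List[int]:
    """Split n array elements among workers in proportion to weights.

    weights is a hypothetical distribution criterion; the sizes always sum to n.
    """
    total = sum(weights)
    exact = [n * w / total for w in weights]
    sizes = [int(x) for x in exact]  # floor of each worker's share
    leftover = n - sum(sizes)
    # Hand the remaining elements to the largest fractional remainders.
    order = sorted(range(len(weights)),
                   key=lambda k: exact[k] - sizes[k], reverse=True)
    for k in order[:leftover]:
        sizes[k] += 1
    return sizes


print(partition_sizes(10, [1, 1, 1]))  # [4, 3, 3]
print(partition_sizes(10, [3, 1]))     # [8, 2]
```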
-
Publication number: 20100153937
Abstract: A computer system for executing a computer program on parallel processors, the system having a compiler for identifying within a computer program concurrency markers that indicate that code between them can be executed in parallel and should be executed with delayed side-effects; and an execution system that is operable to execute the code identified by the concurrency markers to generate a queue of side-effects and after execution of that code is completed, sequentially execute the queue of side-effects.
Type: Application
Filed: January 26, 2007
Publication date: June 17, 2010
Applicant: CODEPLAY SOFTWARE LIMITED
Inventors: Andrew Richards, Andrew Cook, Colin Riley
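The execute-then-replay model can be sketched with a thread pool. In this hypothetical Python version, the region between concurrency markers is modeled as a function `body(item, defer)`, and side effects are zero-argument callables queued with `defer`. Replaying them in item order is an assumption: the abstract only says that the queue is executed sequentially.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrent_block(items, body):
    """Run body(item, defer) for all items in parallel, then apply side effects.

    body must not mutate shared state directly; it calls defer(fn) to queue
    a zero-argument callable that performs the side effect later.
    """
    def run_one(item):
        queue = []                    # side effects produced by this item
        body(item, queue.append)
        return queue

    with ThreadPoolExecutor() as pool:
        queues = list(pool.map(run_one, items))  # map keeps input order
    for queue in queues:              # sequential replay phase
        for effect in queue:
            effect()


log = []


def body(x, defer):
    y = x * x                         # pure computation runs in parallel
    defer(lambda: log.append(y))      # the write to `log` is delayed


run_concurrent_block(range(5), body)
print(log)  # [0, 1, 4, 9, 16]
```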
-
Patent number: 7739667
Abstract: A system for conducting performance analysis for executing tasks. The analysis involves generating a variety of trace information related to performance measures, including parallelism-related information, during execution of the task. In order to generate the trace information, target source code of interest is compiled in such a manner that executing the resulting executable code will generate execution trace information composed of a series of events. Each event stores trace information related to a variety of performance measures for the one or more processors and protection domains used. After the execution trace information has been generated, the system can use that trace information and a trace information description file to produce useful performance measure information. The trace information description file contains information that describes the types of execution events as well as the structure of the stored information.
Type: Grant
Filed: October 19, 2005
Date of Patent: June 15, 2010
Assignee: Cray Inc.
Inventors: Charles David Callahan, II, Keith Arnett Shields, Preston Pengra Briggs, III
-
Patent number: 7730463
Abstract: A computer implemented method, system and computer program product for automatically generating SIMD code. The method begins by analyzing data to be accessed by a targeted loop including at least one statement, where each statement has at least one memory reference, to determine if memory accesses are safe. If memory accesses are safe, the targeted loop is simdized. If not safe, it is determined if a scheme can be applied in which safety need not be guaranteed. If such a scheme can be applied, the targeted loop is simdized according to the scheme. If such a scheme cannot be applied, it is determined if padding is appropriate. If padding is appropriate, the data is padded and the targeted loop is simdized. If padding is not appropriate, non-simdized code is generated based on the targeted loop for handling boundary conditions, and the targeted loop is simdized and combined with the non-simdized code.
Type: Grant
Filed: February 21, 2006
Date of Patent: June 1, 2010
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
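The decision sequence maps directly onto nested conditionals. The Python sketch below pairs it with a plain-Python model of the final fallback: a "simdized" main loop over full vectors, plus scalar code for the leftover boundary elements. The vector width of 4 and all names are assumptions, and no real SIMD instructions are involved.

```python
def choose_plan(accesses_safe: bool, safety_free_scheme: bool,
                padding_ok: bool) -> str:
    """Mirror the abstract's decision order for simdizing a targeted loop."""
    if accesses_safe:
        return "simdize"
    if safety_free_scheme:
        return "simdize with a scheme that needs no safety guarantee"
    if padding_ok:
        return "pad data, then simdize"
    return "simdize, plus non-simdized code for boundaries"


def vector_add_with_epilogue(a, b, width=4):
    """Model of the last fallback: vector-sized main loop plus scalar epilogue."""
    n = len(a)
    out = [0] * n
    main = n - n % width                  # elements covered by full vectors
    for i in range(0, main, width):       # one iteration models one vector op
        out[i:i + width] = [x + y for x, y in zip(a[i:i + width], b[i:i + width])]
    for i in range(main, n):              # non-simdized boundary handling
        out[i] = a[i] + b[i]
    return out


print(choose_plan(False, False, False))
print(vector_add_with_epilogue(list(range(10)), [1] * 10))
```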
-
Patent number: 7730483
Abstract: The invention relates to a system and method for storing project-planning data in an automation system containing automation devices. To simplify changes within the automation system, the project-planning data is stored in a generic, expandable data storage format. Each part of the project-planning data is assigned runtime data, and each set of runtime data is assigned at least one automation device; the runtime data are executable parts of programs on the automation devices assigned to them. Each part of the project-planning data is stored, distributed, in parallel with its assigned runtime data on the automation devices to which that runtime data is assigned.
Type: Grant
Filed: July 28, 2005
Date of Patent: June 1, 2010
Assignee: Siemens Aktiengesellschaft
Inventors: Martin Daimer, Ludwig Karl-Dietze, Andreas Macher, Siegfried Prieler
-
Patent number: 7721273
Abstract: The present invention relates to a system and methodology facilitating automated manufacturing processes in an industrial controller environment. An automation system is provided for automated industrial processing. The system includes an equipment phase object that is executed by a controller engine, wherein the equipment phase object can be accessible from internal instructions within the controller and/or from external instructions directed to the controller such as from a server or another controller across a network connection. A sequencing engine operates with the equipment phase object to facilitate automated industrial processing. The sequencing engine can be adapted to various industrial standards or in accordance with other state type models.
Type: Grant
Filed: June 4, 2004
Date of Patent: May 18, 2010
Assignee: Rockwell Automation Technologies, Inc.
Inventors: Kenwood H. Hall, Stephen D. Ryan, Richard Alan Morse, Kam-Por Yuen, Raymond J. Staron, Paul R. D'Mura, James H. Jarrett, Michael D. Kalan, Robert C. Kline, Jr., Charles Martin Rischar, Christopher E. Stanek, Tao Zhao, Kenneth S. Plache, Shoshana L. Wodzisz, Jan Bezdicek, David A. Johnston, Jeffery W. Brooks
-
Patent number: 7716638
Abstract: A machine readable description of a new feature of a processor is provided by a processor vendor. Control code executing on a processor, such as a traditional operating system kernel, a partitioning kernel, or the like can be programmed to receive the description of the feature and to use information provided by the description to detect, enable and manage operation of the new feature.
Type: Grant
Filed: March 4, 2005
Date of Patent: May 11, 2010
Assignee: Microsoft Corporation
Inventor: Andrew J. Thornton
-
Patent number: 7712080
Abstract: The present invention relates generally to computer programming, and more particularly to systems and methods for parallel distributed programming. Generally, a parallel distributed program is configured to operate across multiple processors and multiple memories. In one aspect of the invention, a parallel distributed program includes a distributed shared variable located across the multiple memories and distributed programs capable of operating across multiple processors.
Type: Grant
Filed: May 21, 2004
Date of Patent: May 4, 2010
Assignee: The Regents of the University of California
Inventors: Lei Pan, Lubomir R. Bic, Michael B. Dillencourt
-
Patent number: 7712090
Abstract: Methods and apparatus, including computer program products, for generating an executable program, including receiving serial compile commands in a pseudo-compiler to compile source code modules, scheduling the serial compile commands in parallel compilers to compile the source code modules, compiling the source code modules in the parallel compilers to generate object code modules, sending compiler completion acknowledgements to a synchronizer, and linking the object code modules in linkers in response to linker initiation commands from the synchronizer.
Type: Grant
Filed: February 7, 2003
Date of Patent: May 4, 2010
Assignee: SAP Aktiengesellschaft
Inventor: Thomas Stuefe
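Here is a minimal sketch of the compile, synchronize and link flow, using Python's `concurrent.futures`. `compile_module` and `link` are hypothetical stand-ins that only compute file names; a real build would invoke the compiler and linker, for example through `subprocess.run`. The synchronizer role is played by the loop over `as_completed`, which gathers one completion acknowledgement per module before linking is initiated.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List


def compile_module(source: str) -> str:
    # Hypothetical stand-in for one compiler invocation; returns the object name.
    return source.rsplit(".", 1)[0] + ".o"


def link(objects: List[str], output: str) -> str:
    # Hypothetical stand-in for the linker; returns a description of the link.
    return f"{output}: " + " ".join(sorted(objects))


def build(sources: List[str], output: str, workers: int = 4) -> str:
    acknowledged: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Serial compile commands are scheduled onto parallel compilers.
        futures = {pool.submit(compile_module, s): s for s in sources}
        for fut in as_completed(futures):  # synchronizer receives acknowledgements
            acknowledged[futures[fut]] = fut.result()
    # Linking starts only after every compile has been acknowledged.
    return link(list(acknowledged.values()), output)


print(build(["a.c", "b.c", "main.c"], "app"))  # app: a.o b.o main.o
```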
-
Patent number: 7707543
Abstract: A method, a device and a system arrangement are disclosed for generating self-contained software components having in each case synchronous and/or asynchronous interfaces with an internal threading model. The concept disclosed enables all necessary synchronization mechanisms to be provided automatically. The concept is based on an asynchronous operation manager used to divert callbacks from a called component into a calling component.
Type: Grant
Filed: November 22, 2005
Date of Patent: April 27, 2010
Assignee: Siemens Aktiengesellschaft
Inventors: Detlef Becker, Karlheinz Dorn, Vladyslav Ukis, Hans-Martin Von Stockhausen
-
Patent number: 7694289
Abstract: Methods for embedding codes executable in a first system having a first microprocessor into codes executable in a second system having a second microprocessor are described herein. In one aspect of the invention, an exemplary method includes providing first codes having a routine, the first codes being compilable to be executed in the first system, and compiling the first codes, resulting in second codes; the second codes comprising opcodes of the routine executable by the first system, which convert the second codes into third codes automatically, the third codes being compilable to be executed by the second system; this is followed by compiling the third codes, resulting in the fourth codes being executable in the second system, and linking the fourth codes, generating an executable image and executing the executable image in the second system. Other methods and apparatuses are also described.
Type: Grant
Filed: December 5, 2005
Date of Patent: April 6, 2010
Assignee: Apple Inc.
Inventor: Keith Stattenfield
-
Patent number: 7689980
Abstract: Linear transformations of statements in code are performed to generate linear expressions associated with the statements. Parallel code is generated using the linear expressions. Generating the parallel code includes splitting the computation-space of the statements into intervals and generating parallel code for the intervals.
Type: Grant
Filed: September 30, 2005
Date of Patent: March 30, 2010
Assignee: Intel Corporation
Inventors: Zhao Hui Du, Shih-wei Liao, Gansha Wu, Guei-Yuan Lueh
-
Open multi-processing reduction implementation in cell broadband engine (CBE) single source compiler
Patent number: 7689977
Abstract: The present disclosure is directed to a method for providing an OpenMP reduction implementation. The method may comprise creating an aggregate of at least one reduction variable in a parallel region or a work-sharing construct; defining a pointer variable, the pointer variable pointing to a dynamic array of the aggregate; creating an initialization routine, an outlined routine and a reduction accumulation routine; replacing the parallel region or the work-sharing construct with a runtime routine, the runtime routine taking a plurality of arguments including an address of the initialization routine, an address of the outlined routine, an address of the reduction accumulation routine, an address of the pointer variable, and a size of the aggregate; and executing the runtime routine when the at least one reduction variable is in the parallel region or the work-sharing construct.
Type: Grant
Filed: April 15, 2009
Date of Patent: March 30, 2010
Assignee: International Business Machines Corporation
Inventors: Guansong Zhang, Shimin Cui, Ettore Tiotto
-
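The OpenMP reduction scheme of patent 7689977 can be modeled in Python to show how the generated pieces fit together. The model makes several assumptions. Routine addresses become Python function objects. A list of dicts plays the dynamic array of aggregates, one slot per thread. Data is dealt to threads cyclically. The aggregate-size argument is unnecessary in Python and is dropped. All names are hypothetical, and the aggregate holds two reduction variables: a sum and a maximum.

```python
import threading
from typing import List, Sequence


def init_routine(slot: dict) -> None:
    # Identity values for the reduction operators: + for total, max for peak.
    slot["total"] = 0
    slot["peak"] = float("-inf")


def outlined_routine(slot: dict, chunk: Sequence[int]) -> None:
    # Body of the parallel region, rewritten to update the thread's private aggregate.
    for x in chunk:
        slot["total"] += x
        slot["peak"] = max(slot["peak"], x)


def accumulate_routine(result: dict, slot: dict) -> None:
    # Combine one thread's partial results into the original reduction variables.
    result["total"] += slot["total"]
    result["peak"] = max(result["peak"], slot["peak"])


def runtime_parallel(init, outlined, accumulate, data, nthreads) -> dict:
    aggregates: List[dict] = [{} for _ in range(nthreads)]  # dynamic array of aggregates
    chunks = [data[t::nthreads] for t in range(nthreads)]   # cyclic work split

    def worker(t: int) -> None:
        init(aggregates[t])
        outlined(aggregates[t], chunks[t])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    result: dict = {}
    init(result)
    for slot in aggregates:                                  # final accumulation
        accumulate(result, slot)
    return result


r = runtime_parallel(init_routine, outlined_routine, accumulate_routine,
                     list(range(1, 101)), 4)
print(r)  # {'total': 5050, 'peak': 100}
```
-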
Patent number: 7689971
Abstract: Methods and apparatuses provide for referencing thread local variables (TLVs) with techniques such as stack address mapping. A method may involve a head pointer that points to a set of thread local variables (TLVs) of a thread. A method according to one embodiment may include an operation for storing the head pointer in a global data structure in a user space of a processing system. The head pointer may subsequently be retrieved from the global data structure and used to access one or more TLVs associated with the thread. In one embodiment, the head pointer is retrieved without executing any kernel system calls. In an example embodiment, the head pointer is stored in a global array, and a stack address for the thread is used to derive an index into the array. Other embodiments are described and claimed.
Type: Grant
Filed: August 9, 2004
Date of Patent: March 30, 2010
Assignee: Intel Corporation
Inventors: Jinzhan Peng, Xiaohua Shi, Guei-Yuan Lueh, Gansha Wu
-
Patent number: 7685583
Abstract: We present a technique for implementing obstruction-free atomic multi-target transactions that target special “transactionable” locations in shared memory. A programming interface for using operations based on these transactions can be structured in several ways, including as n-word compare-and-swap (NCAS) operations or as atomic sequences of single-word loads and stores (e.g., as transactional memory).
Type: Grant
Filed: July 16, 2003
Date of Patent: March 23, 2010
Assignee: Sun Microsystems, Inc.
Inventors: Mark S. Moir, Victor M. Luchangco, Maurice Herlihy
-
Publication number: 20100070958
Abstract: Provided are a program parallelizing method and a program parallelizing apparatus that make it possible to efficiently generate a parallelized program with a shorter parallel execution time. Instructions are scheduled by referring to inter-instruction dependencies. A dependency between an instruction in a function fp/f0 and an instruction of a function fq that is its descendant is analyzed, and parallelization is performed using the analysis result. First, the instructions of the deeper function fq are relatively scheduled, to analyze whether each instruction has a dependency with an instruction of another function fp. When there is an inter-instruction dependency, the instructions of the function fq are scheduled so as to maintain the dependency and realize the shortest execution time.
Type: Application
Filed: November 15, 2007
Publication date: March 18, 2010
Inventor: Masamichi Takagi
-
Patent number: 7681016
Abstract: A low overhead mechanism for supporting speculative execution and code compression in a Very Long Instruction Word (VLIW) microprocessor. Profitable speculations can be determined statically at compile time and a low overhead hardware recovery mechanism used that does not require compensation code.
Type: Grant
Filed: June 30, 2003
Date of Patent: March 16, 2010
Assignee: Critical Blue Ltd.
Inventor: Richard Michael Taylor
-
Patent number: 7673294
Abstract: This invention modifies an irregular software pipelined loop conditioned upon data in a condition register in a compiler scheduled very long instruction word data processor to prevent over-execution upon loop exit. The method replaces a register modifying instruction with an instruction conditional upon the inverse condition register if possible. The method inserts a conditional register move instruction to a previously unused register within the loop if possible without disturbing the schedule. Then a restoring instruction is added after the loop. Alternatively, both these two functions can be performed by a delayed register move instruction. Instruction insertion is into a previously unused instruction slot of an execute packet. These changes can be performed manually or automatically by the compiler.
Type: Grant
Filed: January 18, 2006
Date of Patent: March 2, 2010
Assignee: Texas Instruments Incorporated
Inventors: Elana D. Granston, Jagadeesh Sankaran
-
Patent number: 7673295
Abstract: Compile-time non-concurrency analysis of parallel programs improves execution efficiency by detecting possible data race conditions within program barriers. Subroutines are modeled with control flow graphs and region trees having plural nodes related by edges that represent the hierarchical loop structure and construct relationship of statements. Phase partitioning of the control flow graph allows analysis of statement relationships with programming semantics, such as those of the OpenMP language, that define permitted operations and execution orders.
Type: Grant
Filed: April 27, 2004
Date of Patent: March 2, 2010
Assignee: Sun Microsystems, Inc.
Inventor: Yuan Lin
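The core idea behind phase partitioning is that statements separated by a barrier cannot run concurrently, and it can be shown on a straight-line SPMD region. This Python sketch rests on several assumptions. A flat list of statements stands in for the control flow graph and region tree. Every variable is taken to be shared, and every thread executes every statement. Barriers inside loops are ignored, even though that is exactly the case the region tree exists to handle. The statement encoding and the function name are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List, Tuple

Stmt = Tuple[str, str]  # (kind, variable), kind is "barrier", "read" or "write"


def possible_races(region: List[Stmt]) -> List[Tuple[int, int]]:
    """Return statement-index pairs that may race in an SPMD parallel region.

    Barriers split the region into phases; accesses in different phases never
    run concurrently. Within a phase, two accesses to the same shared variable
    race if at least one is a write. A pair (i, i) means a write statement
    races with itself when executed by different threads.
    """
    phase = 0
    accesses = []  # (statement index, phase, kind, variable)
    for idx, (kind, var) in enumerate(region):
        if kind == "barrier":
            phase += 1
        else:
            accesses.append((idx, phase, kind, var))
    races = []
    for (i, pi, ki, vi), (j, pj, kj, vj) in combinations_with_replacement(accesses, 2):
        if pi == pj and vi == vj and "write" in (ki, kj):
            races.append((i, j))
    return races


region = [("write", "x"), ("barrier", ""), ("read", "x"), ("write", "y"), ("read", "y")]
print(possible_races(region))  # [(0, 0), (3, 3), (3, 4)]
```

The write of `x` and the later read of `x` are not reported, because the barrier places them in different phases.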
-
Publication number: 20100042981
Abstract: Generating parallelized executable code from input code includes statically analyzing the input code to determine aspects of data flow and control flow of the input code; dynamically analyzing the input code to determine additional aspects of data flow and control flow of the input code; generating an intermediate representation of the input code based at least in part on the aspects of data flow and control flow of the input code identified by the static analysis and the additional aspects of data and control flow of the input code identified by the dynamic analysis; processing the intermediate representation to determine portions of the intermediate representation that are eligible for parallel execution; and generating parallelized executable code from the processed intermediate representation.
Type: Application
Filed: August 13, 2009
Publication date: February 18, 2010
Inventors: Robert Scott Dreyer, Joel Kevin Jones, Michael Douglas Sharp, Ivan Dimitrov Baev
-
Publication number: 20100031241
Abstract: A method and apparatus for optimizing source code for use in a parallel computing environment by compiling an application source code, performing analysis, and optimizing the application source code. At the time of compilation, a compiler adds instrumentation to a prepared executable. An analysis program then analyzes the prepared executable and generates an analysis result. The analysis result is then used by the analysis program to optimize the application source code for parallel processing.
Type: Application
Filed: November 17, 2008
Publication date: February 4, 2010
Inventor: Leon Schwartz
-
Patent number: 7657882
Abstract: A dataflow instruction set architecture and execution model, referred to as WaveScalar, which is designed for scalable, low-complexity/high-performance processors, while efficiently providing traditional memory semantics through a mechanism called wave-ordered memory. Wave-ordered memory enables “real-world” programs, written in any language, to be run on the WaveScalar architecture, as well as any out-of-order execution unit. Because it is software-controlled, wave-ordered memory can be disabled to obtain greater parallelism. WaveScalar also includes a software-controlled tag management system.
Type: Grant
Filed: January 21, 2005
Date of Patent: February 2, 2010
Assignee: University of Washington
Inventors: Mark H. Oskin, Steven J. Swanson, Susan J. Eggers
-
Patent number: 7657877
Abstract: A method and device for translating a program to a system including at least one first processor and a reconfigurable unit. Code portions of the program which are suitable for the reconfigurable unit are determined. The remaining code of the program is extracted and/or separated for processing by the first processor.
Type: Grant
Filed: June 20, 2002
Date of Patent: February 2, 2010
Assignee: Pact XPP Technologies AG
Inventors: Martin Vorbach, Armin Nückel, Frank May, Markus Weinhardt, Joao Manuel Paiva Cardoso
-
Patent number: 7657880
Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is permitted to execute Store instructions. Store blocker logic operates to prevent data associated with a Store instruction in a helper thread from being committed to memory. Dependence blocker logic operates to prevent data associated with a Store instruction in a speculative helper thread from being bypassed to a Load instruction in a non-speculative thread.
Type: Grant
Filed: August 1, 2003
Date of Patent: February 2, 2010
Assignee: Intel Corporation
Inventors: Hong Wang, Tor Aamodt, Per Hammarlund, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Steve Shih-wei Liao
-
Publication number: 20090320005
Abstract: A parallelism policy object provides a control parallelism interface whose implementation evaluates parallelism conditions that are left unspecified in the interface. User-defined and other parallelism policy procedures can make recommendations to a worker program for transitioning between sequential program execution and parallel execution. Parallelizing assistance values obtained at runtime can be used in the parallelism conditions on which the recommendations are based. A consistent parallelization policy can be employed across a range of parallel constructs, and inside recursive procedures.
Type: Application
Filed: June 4, 2008
Publication date: December 24, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Stephen Toub, Igor Ostrovsky, Joe Duffy, Vance Morrison, Huseyin Yildiz
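As a rough illustration of the general idea in this abstract (not the claimed implementation), the Python sketch below separates a policy interface, whose parallelism conditions are left abstract, from a hypothetical user-defined policy that decides using values observed at run time (the number of pending items and idle workers). The names `ParallelismPolicy`, `ThresholdPolicy`, and `worker_map`, and the threshold values, are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class ParallelismPolicy(ABC):
    """Interface only: when to switch to parallel execution is left unspecified."""

    @abstractmethod
    def recommend_parallel(self, pending_items: int, idle_workers: int) -> bool:
        ...


class ThresholdPolicy(ParallelismPolicy):
    """Hypothetical user-defined policy driven by values observed at run time."""

    def __init__(self, min_items: int = 1000, min_workers: int = 2) -> None:
        self.min_items = min_items
        self.min_workers = min_workers

    def recommend_parallel(self, pending_items: int, idle_workers: int) -> bool:
        return pending_items >= self.min_items and idle_workers >= self.min_workers


def worker_map(items: List[T], fn: Callable[[T], R],
               policy: ParallelismPolicy, idle_workers: int) -> List[R]:
    # The worker asks the policy for a recommendation instead of hard-coding one.
    if policy.recommend_parallel(len(items), idle_workers):
        with ThreadPoolExecutor(max_workers=idle_workers) as pool:
            return list(pool.map(fn, items))  # parallel path; map keeps input order
    return [fn(x) for x in items]  # sequential path


policy = ThresholdPolicy(min_items=4, min_workers=2)
print(worker_map([1, 2, 3], lambda x: x + 1, policy, idle_workers=8))  # sequential: too few items
```

Because the worker only sees a yes/no recommendation, the same policy object could be consulted at every level of a recursive procedure, which is one way to read the abstract's point about applying a consistent policy across constructs.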
-
Publication number: 20090307671
Abstract: A system and method for modeling simulation and game artificial intelligence as a data management problem. A scripting language that provides game designers and players with a data-driven artificial intelligence scheme for customizing behavior for individual agents. Query processing and indexing techniques to efficiently execute large numbers of agent scripts, thus providing a framework for games with a large number of agents.
Type: Application
Filed: June 8, 2009
Publication date: December 10, 2009
Applicant: CORNELL UNIVERSITY
Inventors: Walker White, Johannes Gehrke, Alan John Demers, Christoph Emanuel Koch
-
Publication number: 20090307655
Abstract: Systems and methods for parallelizing applications that operate on irregular data structures. In an embodiment, the methods and systems enable programmers to use set iterators to express algorithms containing amorphous data parallelism. Parallelization can be achieved by speculatively executing multiple iterations of the iterator in parallel. Conflicts between speculatively executing iterations can be detected and handled using information in class libraries.
Type: Application
Filed: June 10, 2009
Publication date: December 10, 2009
Inventors: Keshav Kumar Pingali, Milind Vidyadhar Kulkarni
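The Python sketch below illustrates the general optimistic pattern this abstract describes, under simplifying assumptions: iterations over an unordered collection run speculatively in parallel against a snapshot, and any iteration whose touched elements overlap an already-committed iteration in the same round is discarded and retried. `TrackedStore` is a hypothetical stand-in for the conflict-tracking class libraries the abstract mentions, the account-transfer workload is invented for illustration, and iterations that add new work items are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set, Tuple


class TrackedStore:
    """Hypothetical class-library wrapper: records touched keys, buffers writes."""

    def __init__(self, base: Dict[str, int]) -> None:
        self.base = base
        self.touched: Set[str] = set()
        self.writes: Dict[str, int] = {}

    def get(self, key: str) -> int:
        self.touched.add(key)
        return self.writes.get(key, self.base[key])

    def set(self, key: str, value: int) -> None:
        self.touched.add(key)
        self.writes[key] = value


def speculative_foreach(items: List, body: Callable, state: Dict[str, int]) -> int:
    """Run body(item, store) for every item; return the number of rounds used."""
    pending, rounds = list(items), 0
    while pending:
        rounds += 1
        snapshot = dict(state)  # every speculative iteration in this round reads this

        def attempt(item):
            store = TrackedStore(snapshot)
            body(item, store)
            return store

        with ThreadPoolExecutor() as pool:
            stores = list(pool.map(attempt, pending))

        claimed: Set[str] = set()
        retry = []
        for item, store in zip(pending, stores):
            if store.touched & claimed:
                retry.append(item)  # conflict: drop buffered writes, re-run next round
            else:
                claimed |= store.touched
                state.update(store.writes)  # commit this iteration's writes
        pending = retry  # the first item of each round always commits, so this ends
    return rounds


def transfer(item: Tuple[str, str, int], store: TrackedStore) -> None:
    src, dst, amount = item
    store.set(src, store.get(src) - amount)
    store.set(dst, store.get(dst) + amount)


accounts = {"a": 10, "b": 10, "c": 10, "d": 10}
rounds = speculative_foreach([("a", "b", 1), ("c", "d", 2), ("b", "c", 3)], transfer, accounts)
print(rounds, accounts)
```

Committed iterations in a round touch disjoint elements and all read the same snapshot, so the result matches running them one after another in commit order; here the third transfer conflicts with the first two and is re-executed in a second round.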
-
Publication number: 20090300591
Abstract: Parallel tasks are created, and the tasks include a first task and a second task. Each task resolves a future. At least one of three possible continuations for each of the tasks is supplied. The three continuations include a success continuation, a cancellation continuation, and a failure continuation. A value is returned as the future of the first task upon a success continuation for the first task. The value from the first task is used in the second task to compute a second future. The cancellation continuation is supplied if the task is cancelled, and the failure continuation is supplied if the task does not return a value and is not cancelled.
Type: Application
Filed: June 2, 2008
Publication date: December 3, 2009
Applicant: MICROSOFT CORPORATION
Inventors: John Duffy, Stephen H. Toub
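A minimal sketch of the success/cancellation/failure continuation idea, written with Python's `asyncio.Future` rather than any API from the application itself. The helpers `attach_continuations` and `then` are hypothetical: the first dispatches to exactly one of three callbacks when a future resolves, and the second builds a second future from the first future's value, propagating cancellation and failure.

```python
import asyncio
from typing import Any, Callable, Optional


def attach_continuations(fut: asyncio.Future,
                         on_success: Optional[Callable[[Any], None]] = None,
                         on_cancel: Optional[Callable[[], None]] = None,
                         on_failure: Optional[Callable[[BaseException], None]] = None) -> None:
    """Run exactly one of the three continuations when fut resolves."""
    def dispatch(f: asyncio.Future) -> None:
        if f.cancelled():
            if on_cancel is not None:
                on_cancel()
        elif f.exception() is not None:
            if on_failure is not None:
                on_failure(f.exception())
        elif on_success is not None:
            on_success(f.result())
    fut.add_done_callback(dispatch)


def then(first: asyncio.Future, fn: Callable[[Any], Any]) -> asyncio.Future:
    """Second future computed from the first future's value via fn."""
    second = first.get_loop().create_future()

    def succeed(value: Any) -> None:
        if second.done():
            return
        try:
            second.set_result(fn(value))
        except Exception as exc:  # fn itself failed
            second.set_exception(exc)

    def fail(exc: BaseException) -> None:
        if not second.done():
            second.set_exception(exc)

    def cancel() -> None:
        second.cancel()  # no-op if second is already done

    attach_continuations(first, on_success=succeed, on_cancel=cancel, on_failure=fail)
    return second


async def demo(outcome: str) -> Any:
    first = asyncio.get_running_loop().create_future()
    second = then(first, lambda v: v * 10)
    if outcome == "success":
        first.set_result(4)
    elif outcome == "cancel":
        first.cancel()
    else:
        first.set_exception(ValueError("boom"))
    await asyncio.wait({second})  # returns once second is done in any state
    if second.cancelled():
        return "cancelled"
    if second.exception() is not None:
        return repr(second.exception())
    return second.result()


print(asyncio.run(demo("success")))
```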
-
Patent number: 7627864
Abstract: A method to optimize speculative parallel thread execution comprises selecting a plurality of partition candidate pairs for speculative parallel thread execution, transforming each partition candidate pair of the plurality of partition candidate pairs to improve the expected performance gain of each pair, and selecting a set of one or more transformed partition candidate pairs that do not interfere with each other and produce a maximum expected performance gain.
Type: Grant
Filed: June 27, 2005
Date of Patent: December 1, 2009
Assignee: Intel Corporation
Inventors: Zhao Hui Du, Tin-fook Ngai, Chu-cheow Lim
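The final selection step in this abstract (choose mutually non-interfering candidates with maximum total expected gain) can be sketched as a small search problem. The Python below assumes the per-candidate gains already reflect the transformation step and that interference is given as a set of conflicting pairs; candidate names and gain values are hypothetical. The exhaustive search is only a placeholder and grows exponentially, so a large candidate set would need a cheaper method.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def best_non_interfering(gains: Dict[str, float],
                         conflicts: Set[FrozenSet[str]]) -> Tuple[float, List[str]]:
    """Exhaustively pick the conflict-free subset with the largest total gain."""
    names = sorted(gains)
    best_total, best_set = 0.0, []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            # Skip any subset containing a pair of candidates that interfere.
            if any(frozenset(pair) in conflicts for pair in combinations(subset, 2)):
                continue
            total = sum(gains[n] for n in subset)
            if total > best_total:
                best_total, best_set = total, list(subset)
    return best_total, best_set


gains = {"loop1": 5.0, "loop2": 4.0, "call3": 3.0}  # hypothetical expected gains
conflicts = {frozenset({"loop1", "loop2"})}         # loop1 and loop2 interfere
print(best_non_interfering(gains, conflicts))
```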
-
Patent number: 7620945
Abstract: One embodiment of the present invention provides a system that supports parallelized generic reduction operations in a parallel programming language, wherein a reduction operation is an associative operation that can be divided into a group of sub-operations that can execute in parallel. During operation, the system detects generic reduction operations in source code. In doing so, the system identifies a set of reduction variables upon which the generic reduction operation will operate, along with a set of initial values for the variables. The system additionally identifies a merge operation that merges partial results from the parallel generic reduction operations into a final result. The system then compiles the program's source code into a form which facilitates executing the generic reduction operations in parallel.
Type: Grant
Filed: August 16, 2005
Date of Patent: November 17, 2009
Assignee: Sun Microsystems, Inc.
Inventors: Yonghong Song, Yuan Lin, Prashanth Narayanaswamy
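The three ingredients named in this abstract (a reduction variable, its initial value, and a merge operation for partial results) map directly onto a small runtime sketch. The Python below is a generic illustration, not the compiler transformation the patent covers; `parallel_reduce` and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
A = TypeVar("A")


def parallel_reduce(data: Sequence[T], initial: A,
                    combine: Callable[[A, T], A],
                    merge: Callable[[A, A], A],
                    chunks: int = 4) -> A:
    """Split data, reduce each chunk from `initial`, then merge the partial results.

    Correct only if `merge` is associative and `initial` is its identity
    (e.g. 0 for +, -inf for max), since every chunk starts from `initial`.
    """
    size = max(1, -(-len(data) // chunks))  # ceiling division
    parts = [data[i:i + size] for i in range(0, len(data), size)]

    def local(part: Sequence[T]) -> A:
        return reduce(combine, part, initial)  # one independent sub-operation

    with ThreadPoolExecutor(max_workers=chunks) as pool:
        partials = list(pool.map(local, parts))
    return reduce(merge, partials, initial)  # merge step combines partial results


print(parallel_reduce(list(range(101)), 0, add, add))
print(parallel_reduce([3, 9, 2, 7], float("-inf"), max, max))
```

Because chunk order is preserved through `pool.map` and the final merge, the sketch also works for operations that are associative but not commutative, such as string concatenation with an empty-string initial value.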
-
Patent number: 7617494
Abstract: The program to be executed is compiled by translating it into native instructions of the instruction-set architecture of the processor system, organizing the instructions deriving from the translation of the program into respective bundles in an order of successive bundles, each bundle grouping together instructions adapted to be executed in parallel by the processor system. The bundles of instructions are ordered into respective sub-bundles, said sub-bundles identifying a first set of instructions, which must be executed before the instructions belonging to the next bundle of said order, and a second set of instructions, which can be executed both before and in parallel with respect to the instructions belonging to said subsequent bundle of said order.
Type: Grant
Filed: July 1, 2003
Date of Patent: November 10, 2009
Assignee: STMicroelectronics S.r.l.
Inventors: Fabrizio Simone Rovati, Antonio Maria Borneo, Danilo Pietro Pau
-
Patent number: 7617488
Abstract: A method and an apparatus for determining processor utilization have been disclosed. In one embodiment, the method includes determining processor utilization in a data processing system and synchronizing execution of a number of threads in the data processing system to prevent interrupting the determining of the processor utilization. Other embodiments have been claimed and described.
Type: Grant
Filed: December 30, 2003
Date of Patent: November 10, 2009
Assignee: Intel Corporation
Inventors: Vasudevan Srinivasan, Avinash P. Chakravarthy
-
Publication number: 20090271774
Abstract: A Veil program analyzes the source code and/or data of an existing sequential target program and determines how best to distribute the target program and data among the processing elements of a multi-processing element computing system. The Veil program analyzes source code loops, data sizes and types to prepare a set of distribution attempts, whereby each distribution is run under a run-time evaluation wrapper and evaluated to determine the optimal distribution across the available processing elements.
Type: Application
Filed: July 3, 2009
Publication date: October 29, 2009
Applicant: MANAGEMENT SERVICES GROUP, INC. d/b/a Global Technical Systems
Inventors: Robert Stephen Gordy, Terry Spitzer
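The internals of the Veil program are not described here; the Python sketch below only illustrates the evaluate-and-choose loop from the abstract, under the assumption that a "distribution" can be modelled as a worker count for a data-parallel loop. The function names are hypothetical, and threads are used only to keep the example self-contained (for CPU-bound Python code, processes would be the more realistic choice). Each candidate is also checked against the first result so that a distribution which changes the output is rejected.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


def run_with_workers(fn: Callable, data: Sequence, workers: int) -> List:
    """One candidate distribution: spread the loop body over `workers` threads."""
    if workers == 1:
        return [fn(x) for x in data]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, data))


def pick_distribution(fn: Callable, data: Sequence,
                      candidates: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    """Time each candidate under a simple wrapper and return the fastest one."""
    timings: Dict[int, float] = {}
    reference = None
    for workers in candidates:
        start = time.perf_counter()
        result = run_with_workers(fn, data, workers)
        timings[workers] = time.perf_counter() - start
        # Reject a distribution that changes the program's results.
        if reference is None:
            reference = result
        elif result != reference:
            raise RuntimeError(f"distribution with {workers} workers changed the output")
    best = min(timings, key=timings.get)
    return best, timings


best, timings = pick_distribution(lambda x: x * x, list(range(10_000)), [1, 2, 4])
print(best, timings)
```

Timings vary from run to run, so the winning worker count is not fixed; the point of the sketch is only the structure of trying each distribution and keeping the best measured one.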
-
Patent number: 7606698
Abstract: A method and apparatus for sharing data between processors within first and second discrete clusters of processors. The method comprises supplying a first amount of data from a first data array in a first discrete cluster of processors to selector logic. A second amount of data from a second data array in a second discrete cluster of processors is also supplied to the selector logic. The first or second amount of data is then selected using the selector logic, and supplied to a shared input port on a processor in the first discrete cluster of processors. The apparatus comprises selector logic for selecting between input data supplied by a first data array and a second data array. The data arrays are located within different discrete clusters of processors. The selected data is then supplied to a shared input port on a processor.
Type: Grant
Filed: September 26, 2006
Date of Patent: October 20, 2009
Assignee: Cadence Design Systems, Inc.
Inventors: Beshara G. Elmufdi, Mitchell G. Poplack
-
Patent number: 7590976
Abstract: The present invention relates to a compiler program, a computer-readable storage medium storing such a compiler program, a compiling method, and a compiling unit, and an object thereof is to automatically generate a reentrant object program. To accomplish this object, an address saving program generator 16a generates an address saving program for saving a data area address of a calling program module; an address setting program generator 16b generates an address setting program for setting a data area address of another program module; a transferring program generator 16c generates a transferring program for the transfer from the calling program module to the other program module; an address resetting program generator 16d generates an address resetting program for reading and resetting the saved data area address; and an accessing program generator 16e generates an accessing program for accessing a data area for the other program module using a relative address from the set data area address.
Type: Grant
Filed: December 26, 2003
Date of Patent: September 15, 2009
Assignee: Panasonic Corporation
Inventors: Masaki Kawai, Takuji Kawamoto, Shusuke Haruna, Yutaka Fujihara