For A Parallel Or Multiprocessor System Patents (Class 717/149)
-
Patent number: 8756590
Abstract: A compile environment is provided in a computer system that allows programmers to program both CPUs and data parallel devices (e.g., GPUs) using a high level general purpose programming language that has data parallel (DP) extensions. A compilation process translates modular DP code written in the general purpose language into DP device source code in a high level DP device programming language using a set of binding descriptors for the DP device source code. A binder generates a single, self-contained DP device source code unit from the set of binding descriptors. A DP device compiler generates a DP device executable for execution on one or more data parallel devices from the DP device source code unit.
Type: Grant
Filed: June 22, 2010
Date of Patent: June 17, 2014
Assignee: Microsoft Corporation
Inventors: Weirong Zhu, Lingli Zhang, Sukhdeep S. Sodhi, Yosseff Levanoni
-
Patent number: 8752018
Abstract: One embodiment of the present invention sets forth a technique for emitting coherent output from multiple threads for the printf( ) function. Additionally, parallel (not divergent) execution of the threads for the printf( ) function is maintained when possible to improve run-time performance. Processing of the printf( ) function is separated into two tasks, gathering of the per thread data and formatting the gathered data according to the formatting codes for display. The threads emit a coherent stream of contiguous segments, where each segment includes the format string for the printf( ) function and the gathered data for a thread. The coherent stream is written by the threads and read by a display processor. The display processor executes a single thread to format the gathered data according to the format string for display.
Type: Grant
Filed: June 21, 2011
Date of Patent: June 10, 2014
Assignee: NVIDIA Corporation
Inventors: Stephen Jones, Geoffrey Gerfin
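The two-task split described in this abstract (gather per-thread data first, format later on a single thread) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names `gather`, `format_segments`, and `run` are hypothetical, and a thread pool stands in for the parallel threads and the display processor.

```python
from concurrent.futures import ThreadPoolExecutor


def gather(fmt, *args):
    """Phase 1, run by each worker: record the format string and raw values.

    No formatting happens here, so every worker follows the same code path.
    """
    return (fmt, args)


def format_segments(segments):
    """Phase 2, run by a single consumer: apply each format string in order."""
    return [fmt % args for fmt, args in segments]


def run(num_threads=4):
    def worker(tid):
        # Hypothetical per-thread payload: the thread id and its square.
        return gather("thread %d computed %d", tid, tid * tid)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in submission order, so each worker's
        # segment stays contiguous and the stream is never interleaved.
        segments = list(pool.map(worker, range(num_threads)))
    return format_segments(segments)
```

Deferring the string formatting to one consumer keeps the workers' code path uniform, which is the property the abstract attributes to the technique.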
-
Patent number: 8752036
Abstract: Embodiments of the invention provide systems and methods for throughput-aware software pipelining in compilers to produce optimal code for single-thread and multi-thread execution on multi-threaded systems. A loop is identified within source code as a candidate for software pipelining. An attempt is made to generate pipelined code (e.g., generate an instruction schedule and a set of register assignments) for the loop in satisfaction of throughput-aware pipelining criteria, like maximum register count, minimum trip count, target core pipeline resource utilization, maximum code size, etc. If the attempt fails to generate code in satisfaction of the criteria, embodiments adjust one or more settings (e.g., by reducing scalarity or latency settings being used to generate the instruction schedule).
Type: Grant
Filed: October 31, 2011
Date of Patent: June 10, 2014
Assignee: Oracle International Corporation
Inventors: Spiros Kalogeropulos, Partha Tirumalai
-
Patent number: 8745603
Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
Type: Grant
Filed: May 10, 2013
Date of Patent: June 3, 2014
Assignee: Google Inc.
Inventors: Morgan S. McGuire, Christopher G. Demetriou, Brian K. Grant, Matthew N. Papakipos
-
Patent number: 8745604
Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a processor, a switch including switching circuitry to forward data over data paths from other tiles to the processor and to switches of other tiles, and a switch memory that stores instruction streams that are able to operate independently for respective output ports of the switch.
Type: Grant
Filed: February 25, 2008
Date of Patent: June 3, 2014
Assignee: Massachusetts Institute of Technology
Inventor: Anant Agarwal
-
Patent number: 8739142
Abstract: Disclosed are a method and system for optimized, dynamic data-dependent program execution. The disclosed system comprises a statistics computer which computes statistics of the incoming data at the current time instant, where the said statistics include the probability distribution of the incoming data, the probability distribution over program modules induced by the incoming data, the probability distribution induced over program outputs by the incoming data, and the time-complexity of each program module for the incoming data, wherein the said statistics are computed as a function of current and past data, and previously computed statistics; a plurality of alternative execution path orders designed prior to run-time by the use of an appropriate source code; a source code selector which selects one of the execution path orders as a function of the statistics computed by the statistics computer; a complexity measurement which measures the time-complexity of the currently selected execution path-order.
Type: Grant
Filed: December 21, 2012
Date of Patent: May 27, 2014
Assignee: International Business Machines Corporation
Inventors: Dake He, Ashish Jagmohan, Jian Lou, Ligang Lu
-
Patent number: 8732684
Abstract: According to one embodiment, a first program code including a plurality of variables is converted to a second program code to be executed by a multi-core processor including a plurality of cores. Specifically, an access pattern of each variable in the first program code is decided. All variables in the first program code are classified into a plurality of groups, each containing variables that share the same access pattern. A member structure of each group having variables belonging to the same access pattern is created. Each member structure includes variables of one group. A route-pointer indicating an address (in a memory) of variables of the member structure is created. The variables in the first program code are converted to the member structure and the route-pointer (in the second program code) that indicate the variables. The second program code is outputted to the multi-core processor.
Type: Grant
Filed: January 25, 2011
Date of Patent: May 20, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventors: Nobuaki Tojo, Ken Tanabe, Hidenori Matsuzaki
-
Patent number: 8732685
Abstract: A method and medium are disclosed for executing a technical computing program in parallel in multiple execution environments. A program is invoked for execution in a first execution environment and from the invocation the program is executed in the first execution environment and one or more additional execution environments to provide for parallel execution of the program. New constructs in a technical computing programming language are disclosed for parallel programming of a technical computing program for execution in multiple execution environments. Also disclosed are a system and method for changing the mode of operation of an execution environment from a sequential mode to a parallel mode of operation and vice-versa.
Type: Grant
Filed: February 3, 2011
Date of Patent: May 20, 2014
Assignee: The Mathworks, Inc.
Inventor: Cleve Moler
-
Patent number: 8726249
Abstract: A bootup device and method for an application program on mobile equipment are provided to improve the bootup speed of the application program on the mobile equipment. The bootup device has an application management module that boots up a virtual machine module based on the application program to be run. The virtual machine module loads codes of the application program and Just in Time (JIT) compilation results of a bootup process of the application program into a memory, searches the JIT compilation results for local JIT compiled codes corresponding to the bootup process code segment to be executed, and executes the found local JIT compiled codes when executing each bootup process code segment of the application program. A storage management module stores and reads the codes of the application program and the JIT compilation results obtained from the JIT compilation of the bootup process of the application program.
Type: Grant
Filed: February 21, 2011
Date of Patent: May 13, 2014
Assignee: ZTE Corporation
Inventors: Youpeng Gu, Lifeng Xu, Wei Hu, Sheng Zhong, Wei Wang, Zemin Wang
-
Patent number: 8726250
Abstract: Programming of modules which can be reprogrammed during operation is described. Partitioning of code sequences is also described.
Type: Grant
Filed: March 10, 2010
Date of Patent: May 13, 2014
Assignee: Pact XPP Technologies AG
Inventors: Martin Vorbach, Armin Nückel
-
Patent number: 8726251
Abstract: Embodiments of the invention provide systems and methods for automatically parallelizing loops with non-speculative pipelined execution of chunks of iterations with pre-computation of selected values. Non-DOALL loops are identified and divided into chunks. The chunks are assigned to separate logical threads, which may be further assigned to hardware threads. As a thread performs its runtime computations, subsequent threads attempt to pre-compute their respective chunks of the loop. These pre-computations may result in a set of assumed initial values and pre-computed final variable values associated with each chunk. As subsequent pre-computed chunks are reached at runtime, those assumed initial values can be verified to determine whether to proceed with runtime computation of the chunk or to avoid runtime execution and instead use the pre-computed final variable values.
Type: Grant
Filed: March 29, 2011
Date of Patent: May 13, 2014
Assignee: Oracle International Corporation
Inventors: Spiros Kalogeropulos, Partha Pal Tirumalai
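The verify-or-recompute step in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loop body (a saturating sum), the single shared `guess` for every chunk's initial value, and the names `run_chunk` and `chunked_execute` are all hypothetical, and the pre-computation is simulated sequentially rather than on separate threads.

```python
def run_chunk(initial, chunk):
    """Loop body with a carried dependence: a running sum that saturates at 100."""
    state = initial
    for value in chunk:
        state = min(state + value, 100)
    return state


def chunked_execute(data, chunk_size, start, guess):
    """Return (final_state, number_of_chunks_whose_precomputed_result_was_reused)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Pre-computation (would run ahead on other threads): every chunk is
    # evaluated from an assumed initial value, here the same `guess`.
    precomputed = [(guess, run_chunk(guess, chunk)) for chunk in chunks]

    state, reused = start, 0
    for chunk, (assumed, final) in zip(chunks, precomputed):
        if state == assumed:
            # Assumption verified: skip the chunk and take its final value.
            state = final
            reused += 1
        else:
            # Assumption failed: execute the chunk normally.
            state = run_chunk(state, chunk)
    return state, reused
```

With a saturating loop like this one, a guess equal to the saturation value is likely to be verified for all but the first chunk, which is the kind of situation where skipping runtime execution pays off.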
-
Patent number: 8726238
Abstract: Interactive iterative program parallelization based on dynamic feedback program parallelization, in one aspect, may identify a ranked list of one or more candidate pieces of code each with one or more source refactorings that can be applied to parallelize the code, apply at least one of the one or more refactorings to create a revised code, and determine performance data associated with the revised code. The performance data may be used to make decisions on identifying the next possible ranked list of refactorings.
Type: Grant
Filed: February 22, 2010
Date of Patent: May 13, 2014
Assignee: International Business Machines Corporation
Inventors: Evelyn Duesterwald, Robert M. Fuhrer, Vijay Saraswat
-
Patent number: 8719803
Abstract: A parallelism policy object provides a control parallelism interface whose implementation evaluates parallelism conditions that are left unspecified in the interface. User-defined and other parallelism policy procedures can make recommendations to a worker program for transitioning between sequential program execution and parallel execution. Parallelizing assistance values obtained at runtime can be used in the parallelism conditions on which the recommendations are based. A consistent parallelization policy can be employed across a range of parallel constructs, and inside recursive procedures.
Type: Grant
Filed: June 4, 2008
Date of Patent: May 6, 2014
Assignee: Microsoft Corporation
Inventors: Stephen Toub, Igor Ostrovsky, Joe Duffy, Vance Morrison, Huseyin Yildiz
-
Patent number: 8719806
Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is a speculative prefetch thread to perform instruction prefetch and/or trace pre-build for the main thread.
Type: Grant
Filed: September 10, 2010
Date of Patent: May 6, 2014
Assignee: Intel Corporation
Inventors: Hong Wang, Tor M. Aamodt, Pedro Marcuello, Jared W. Stark, IV, John P. Shen, Antonio Gonzalez, Per Hammarlund, Gerolf F. Hoflehner, Perry H. Wang, Steve Shih-wei Liao
-
Patent number: 8713545
Abstract: A data processing system includes a host computer, an additional computer, an application module including a first executable code, a module for analyzing said first executable code and a module for generating a second executable code segmented notably into code blocks which are executed in a preferential manner on one of the two computers. The second executable code includes a sub-module for managing the distribution of the processing operations between the host computer and the additional computer and a sub-module for managing the additional computer as a virtual machine which executes the blocks allocated to the additional computer.
Type: Grant
Filed: March 13, 2009
Date of Patent: April 29, 2014
Assignee: Silkan
Inventor: Pierre Fiorini
-
Patent number: 8706702
Abstract: A method for managing data in a collaborative service-oriented workshop, which is adapted to treat objects associated with data representative of real or process data, is provided to share data and resources in an architecture of a workspace. The architecture is adapted to design complex objects and manipulate information technology objects that represent data, which may be representative of a real object or a process based on metadata representing characteristic data. The metadata includes a generic part that is common to all data, a specific part that is inherent to the type of data, and links to other objects. The links make it possible to establish, at a later time, the traceability of the data, or in other words the traceability between the different data produced or used during execution of processes.
Type: Grant
Filed: May 14, 2009
Date of Patent: April 22, 2014
Assignee: Airbus Operations S.A.S.
Inventors: Bernard Marquez, Thierry Chevalier, Philippe Sauvage
-
Patent number: 8707281
Abstract: One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The media also store one or more instructions for transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The media further store one or more instructions for receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.
Type: Grant
Filed: July 23, 2012
Date of Patent: April 22, 2014
Assignee: The MathWorks, Inc.
Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
-
Patent number: 8707280
Abstract: A computing device-implemented method includes receiving a program, analyzing and transforming the program, determining an inner context and an outer context of the program based on the analysis of the program, and allocating one or more portions of the inner context of the program to two or more labs for parallel execution. The method also includes receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to the outer context of the program.
Type: Grant
Filed: June 29, 2012
Date of Patent: April 22, 2014
Assignee: The MathWorks, Inc.
Inventors: Halldor N Stefansson, Brett Baker, Edric Ellis, Joseph F Hicklin, John N Little, Jocelyn Luke Martin, Piotr R Luszczek, Nausheen B Moulana, Loren Dean, Roy E. Lurie
-
Publication number: 20140109069
Abstract: A method of compiling a program to be executed on a multicore processor is provided. The method may include generating an initial solution by mapping a task to a source processing element (PE) and a destination PE, and selecting a communication scheme for transmission of the task from the source PE to the destination PE, approximately optimizing the mapping and communication scheme included in the initial solution, and scheduling the task, wherein the communication scheme is designated in a compiling process.
Type: Application
Filed: October 11, 2013
Publication date: April 17, 2014
Applicants: SEOUL NATIONAL UNIVERSITY R&DB FOUNDATION, SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jin-Hoo LEE, Moo-Kyoung CHUNG, Key-Young CHOI, Yeon-Gon CHO, Soo-Jung RYU
-
Patent number: 8701099
Abstract: A method, a system and a computer program product for effectively accelerating loop iterators using speculative execution of iterators. An Efficient Loop Iterator (ELI) utility detects initiation of a target program and initiates/spawns a speculative iterator thread at the start of the basic code block ahead of the code block that initiates a nested loop. The ELI utility assigns the iterator thread to a dedicated processor in a multi-processor system. The speculative thread runs/executes ahead of the execution of the nested loop and calculates indices in a corresponding multidimensional array. The iterator thread adds all the precomputed indices to a single queue. As a result, the ELI utility effectively enables a multidimensional loop to be replaced by a single dimensional loop. At the beginning of (or during) each iteration of the iterator, the ELI utility “dequeues” an entry from the queue to use the entry to access the array upon which the ELI utility iterates.
Type: Grant
Filed: November 2, 2010
Date of Patent: April 15, 2014
Assignee: International Business Machines Corporation
Inventors: Ganesh Bikshandi, Dibyendu Das, Smruti Ranjan Sarangi
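The idea of replacing a nested loop with a single loop fed by a queue of precomputed indices can be sketched as follows. This is an illustrative approximation only: the function name `sum_via_index_queue` is hypothetical, a Python thread stands in for the dedicated processor, and a sentinel object marks the end of the index stream, which the abstract does not specify.

```python
import itertools
import queue
import threading


def sum_via_index_queue(array):
    """Sum a 2-D list by consuming (row, col) indices from a single queue."""
    rows, cols = len(array), len(array[0])
    indices = queue.Queue()
    done = object()  # sentinel marking the end of the index stream

    def iterator_thread():
        # Runs ahead of the consumer and precomputes every index pair.
        for index in itertools.product(range(rows), range(cols)):
            indices.put(index)
        indices.put(done)

    producer = threading.Thread(target=iterator_thread)
    producer.start()

    total = 0
    while True:  # single-dimensional loop replacing the nested loop
        index = indices.get()
        if index is done:
            break
        row, col = index
        total += array[row][col]

    producer.join()
    return total
```

The consumer never computes loop bounds or index arithmetic itself; it only dequeues, which is the separation of concerns the abstract describes.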
-
Patent number: 8694975
Abstract: A first compiler generates one or more object codes from a program code for a first processor included in an arithmetic processing system to which a plurality of processors are mutually connected. A first linker links the generated one or more object codes to generate an execution file for the first processor. A parameter information generation unit generates, based on the information acquired from the first linker, parameter information used in a second processor included in the arithmetic processing system. A second compiler refers to a program code and the parameter information for the second processor to generate one or more object codes. A second linker links the generated one or more object codes to generate an execution file for the second processor.
Type: Grant
Filed: July 23, 2009
Date of Patent: April 8, 2014
Assignee: NEC Corporation
Inventor: Tomoyoshi Kobori
-
Publication number: 20140096117
Abstract: A data dependence analysis support device calculates pointer information by performing a context-sensitive pointer analysis on every pointer used in a program; calculates dataflow information between statements by performing a context-sensitive dataflow analysis, using the context-sensitive pointer information, on all statements in an analysis target region and all statements that might be called upon execution of the analysis target region; and calculates inter-region data dependence information, using the dataflow information, for two or more threaded regions included in the source program.
Type: Application
Filed: September 28, 2012
Publication date: April 3, 2014
Applicant: PANASONIC CORPORATION
Inventor: Akira Tanaka
-
Patent number: 8689231
Abstract: One or more embodiments of the invention enable a system and method for ordering tasks with complex interrelationships. The present invention as described herein may be used to produce a linear ordering of tasks with complex interrelationships including dependencies and constraints. In one or more embodiments optional tasks may be permitted such that a given task may or may not be added to the execution queue depending on the scheduling of earlier tasks following evaluation of their dependencies; that is, the system of the invention supports the management of optional tasks in a task ordering operation where some or all of the tasks have complex interdependencies.
Type: Grant
Filed: June 30, 2009
Date of Patent: April 1, 2014
Assignee: SAP AG
Inventor: Brent Milnor
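A linear ordering with optional tasks, as this abstract outlines, can be approximated with a topological sort. The sketch below is not the patented method: it uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), and the `order_tasks` function, its `optional` and `skip` parameters, and the rule "drop an optional task if any prerequisite was not scheduled" are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def order_tasks(deps, optional=frozenset(), skip=frozenset()):
    """Return a linear execution order for the tasks in `deps`.

    deps     -- maps each task to the set of tasks it depends on
    optional -- tasks that are dropped when a prerequisite was not scheduled
    skip     -- tasks excluded up front (hypothetical external constraint)
    """
    scheduled = []
    done = set()
    # static_order() yields every task after all of its prerequisites.
    for task in TopologicalSorter(deps).static_order():
        if task in skip:
            continue
        prerequisites = set(deps.get(task, ()))
        if task in optional and not prerequisites <= done:
            # An earlier scheduling decision removed a prerequisite,
            # so this optional task is left out of the queue.
            continue
        scheduled.append(task)
        done.add(task)
    return scheduled
```

For example, with `deps = {"build": {"fetch"}, "docs": {"lint"}, "test": {"build"}}`, marking `docs` optional and skipping `lint` leaves `docs` out of the queue while the `fetch`, `build`, `test` chain is kept in dependency order.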
-
Patent number: 8689196
Abstract: The display of a debugging interface for use with parallel computing. When a break state has been entered in a particular code context (such as a method) by a particular execution context (such as a thread), related execution contexts are found that were also executing in the particular code context. While in the break state, multiple expressions are then evaluated for each of the execution contexts. The results are then displayed with perhaps navigation controls that allow the results to be efficiently navigated.
Type: Grant
Filed: December 10, 2010
Date of Patent: April 1, 2014
Assignee: Microsoft Corporation
Inventors: Paul E. Maybee, Daniel Moth
-
Patent number: 8677329
Abstract: A method and an apparatus that instructs a compiler server to build or otherwise obtain a compiled code corresponding to a compilation request received from an application are described. The compiler server may be configured to compile source codes for a plurality of independent applications, each running in a separate process, using a plurality of independent compilers, each running in a separate compiler process. A search may be performed in a cache for a compiled code that satisfies a compilation request received from an application. A reply message including the compiled code can be provided for the application, wherein the compiled code is compiled in direct response to the request, or is obtained from the cache if the search identifies in the cache the compiled code that satisfies the compilation request.
Type: Grant
Filed: June 3, 2009
Date of Patent: March 18, 2014
Assignee: Apple Inc.
Inventors: Robert Beretta, Nicholas William Burns, Nathaniel Begeman, Phillip Kent Miller, Geoffrey Grant Stahl
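The compile-or-fetch-from-cache behavior in this abstract can be illustrated with a small in-process sketch. It is not the described system: the `CompileServer` class and its `request` method are hypothetical, Python's built-in `compile()` stands in for the independent compiler processes, and the cache key (a SHA-256 hash of the mode and source) is an assumed design.

```python
import hashlib


class CompileServer:
    """Answers compilation requests, reusing cached results when possible."""

    def __init__(self):
        self._cache = {}
        self.hits = 0  # number of requests served from the cache

    def request(self, source, mode="exec"):
        # Requests with identical source and mode share one cache entry.
        key = hashlib.sha256(f"{mode}\0{source}".encode()).hexdigest()
        code = self._cache.get(key)
        if code is not None:
            self.hits += 1
            return code
        # Cache miss: compile in direct response to the request.
        code = compile(source, "<request>", mode)
        self._cache[key] = code
        return code
```

A real server of the kind described would receive requests over inter-process messages from several applications; the sketch keeps only the lookup-then-compile decision.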
-
Patent number: 8677334
Abstract: A computer-implemented method, system, and article of manufacture for parallelizing a code configured by coupling a functional block having an internal state and a functional block without any internal state. The method includes: creating and storing a graphical representation where functional blocks are chosen as nodes and connections between functional blocks are chosen as links; visiting the nodes on the graphical representation sequentially, detecting inputs from functional blocks without any internal state to functional blocks having an internal state and storing these functional blocks as a set of use blocks, and detecting inputs from functional blocks having an internal state to functional blocks without any internal state and storing these functional blocks as a set of definition blocks; and forming strands of functional blocks based on information on the set of use blocks and information on the set of definition blocks stored in association with the functional blocks.
Type: Grant
Filed: October 28, 2010
Date of Patent: March 18, 2014
Assignee: International Business Machines Corporation
Inventors: Arquimedes Martinez Canedo, Hideaki Komatsu, Takeo Yoshizawa
-
Patent number: 8677332
Abstract: Systems and methods for compiling one or more code blocks written in a programming language are provided. In some aspects, a display associated with an application is provided. The display includes a plurality of graphical objects. It is determined that each of the plurality of graphical objects is associated with a child code block in a one-to-one association between graphical objects and child code blocks. Each child code block is written in the programming language. The child code blocks associated with the plurality of graphical objects are transformed into a single parent code block. The parent code block, upon compiling, is configured to be reused across execution contexts and to allow injection of global scope. The parent code block, upon specific execution, includes an execution context for a specified child code block. The parent code block is configured to receive an indication of the specified child code block for initiating execution of the parent code block. The parent code block is compiled.
Type: Grant
Filed: July 24, 2012
Date of Patent: March 18, 2014
Assignee: Google Inc.
Inventors: John Hjelmstad, Malte Ubl
-
Patent number: 8677330
Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.
Type: Grant
Filed: June 9, 2010
Date of Patent: March 18, 2014
Assignee: Altera Corporation
Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
-
Patent number: 8671400
Abstract: A technique includes providing first objects that are associated with an application session and in a processor-based system, identifying second objects in another application session corresponding to the first objects based at least in part on a comparison of the second objects to matching rules associated with the first objects.
Type: Grant
Filed: December 23, 2009
Date of Patent: March 11, 2014
Assignee: Intel Corporation
Inventors: Christopher J. Cormack, Nathaniel Duca, Joseph D. Matarazzo
-
Publication number: 20140068578
Abstract: An embodiment of the invention provides a method for exploiting stateless and stateful data parallelism in a streaming application, wherein a compiler determines whether an operator of the streaming application is safe to parallelize based on a definition of the operator and an instance of the definition. The operator is not safe to parallelize when the operator has selectivity greater than 1, wherein the selectivity is the number of output tuples generated for each input tuple. Parallel regions are formed within the streaming application with the compiler when the operator is safe to parallelize. Synchronization strategies for the parallel regions are determined with the compiler, wherein the synchronization strategies are determined based on the definition of the operator and the instance of the definition. The synchronization strategies of the parallel regions are enforced with a runtime system.
Type: Application
Filed: October 12, 2012
Publication date: March 6, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bugra Gedik, Martin J. Hirzel, Scott A. Schneider, Kun-Lung Wu
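The selectivity rule stated in this abstract (an operator with selectivity greater than 1 is not safe to parallelize) lends itself to a short sketch. Only that one rule comes from the abstract; the `Operator` record, the function names, and the choice to form a region from each maximal run of consecutive safe operators are assumptions made for illustration, and the other definition-based and instance-based checks are not modeled.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Operator:
    name: str
    selectivity: float  # output tuples generated per input tuple


def safe_to_parallelize(op):
    """Apply the single rule taken from the abstract: selectivity must not exceed 1."""
    return op.selectivity <= 1


def parallel_regions(pipeline):
    """Group consecutive safe operators into candidate parallel regions."""
    regions = []
    for safe, run in groupby(pipeline, key=safe_to_parallelize):
        if safe:
            regions.append([op.name for op in run])
    return regions
```

In a pipeline of `parse`, `filter`, `split`, `enrich` where only `split` emits several tuples per input, this grouping yields two candidate regions, one on each side of `split`.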
-
Publication number: 20140068582
Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.
Type: Application
Filed: September 10, 2012
Publication date: March 6, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
-
Publication number: 20140068581
Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.
Type: Application
Filed: August 30, 2012
Publication date: March 6, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
-
Publication number: 20140068577
Abstract: An embodiment of the invention provides a method for exploiting stateless and stateful data parallelism in a streaming application, wherein a compiler determines whether an operator of the streaming application is safe to parallelize based on a definition of the operator and an instance of the definition. The operator is not safe to parallelize when the operator has selectivity greater than 1, wherein the selectivity is the number of output tuples generated for each input tuple. Parallel regions are formed within the streaming application with the compiler when the operator is safe to parallelize. Synchronization strategies for the parallel regions are determined with the compiler, wherein the synchronization strategies are determined based on the definition of the operator and the instance of the definition. The synchronization strategies of the parallel regions are enforced with a runtime system.
Type: Application
Filed: August 28, 2012
Publication date: March 6, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bugra Gedik, Martin J. Hirzel, Scott A. Schneider, Kun-Lung Wu
-
Patent number: 8667476
Abstract: Multiple instructions including branch instructions are grouped into a condensed variable length instruction. A trimmed and grouped branch instruction branches to one of the instructions in the same group. Therefore, a grouped instruction including branch(es) inherently executes correct branch behaviors without deploying any branch prediction schemes. In addition, a grouped instruction in a condensed form delivers multiple operations to execute without fetching all of the instructions grouped separately from the instruction memory via its caches while conserving instruction memory and/or cache as well as decreasing the number of bit switching on the bus. Software developers can make their own compatible, compact and ciphered instruction sets after grouping existing instructions in their software compiled with existing software compilers and the associated microprocessors.
Type: Grant
Filed: January 6, 2010
Date of Patent: March 4, 2014
Assignee: Adaptmicrosys LLC
Inventor: Yong-Kyu Jung
-
Patent number: 8661424
Abstract: A code generation system comprises a model analyzer configured to identify data dependencies in a data flow diagram that describes functional behavior of an application, wherein the model analyzer is further configured to compute a data and computation map based on the data dependencies and to compute one or more implementation constraints; a model partitioner configured to compute one or more partition boundaries based on the data and computation map and the one or more implementation constraints; and a code generator configured to generate parallelized code based on the data flow diagram, the one or more implementation constraints, and the one or more partition boundaries, wherein the code generator is configured to map the code corresponding to each partition defined by the one or more partition boundaries to one of a plurality of cores of a multi-core processor, and to generate inter-core communication code for at least one line of the data and computation map crossed by the one or more partition boundari
Type: Grant
Filed: September 2, 2010
Date of Patent: February 25, 2014
Assignee: Honeywell International Inc.
Inventors: Kirk Schloegel, Devesh Bhatt
-
Patent number: 8656141
Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a pipelined processor configured to process multiple streams of instructions for the processor; and a switch including switching circuitry to forward data over data paths from other tiles to one or more pipeline stages of the processor and to switches of other tiles. At least some of the data is forwarded based on one or more streams of instructions for the switch.
Type: Grant
Filed: December 13, 2005
Date of Patent: February 18, 2014
Assignee: Massachusetts Institute of Technology
Inventor: Anant Agarwal
-
Patent number: 8656375
Abstract: A cross-logical entity group is created that includes one or more accelerators to be shared by a plurality of logical entities. Instantiated on the accelerators are functions that are common across multiple logical entities. The functions to be instantiated are determined, for instance, dynamically during run-time.
Type: Grant
Filed: November 2, 2009
Date of Patent: February 18, 2014
Assignee: International Business Machines Corporation
Inventors: Rajaram B. Krishnamurthy, Thomas A. Gregg
-
Patent number: 8656376
Abstract: A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.
Type: Grant
Filed: September 1, 2011
Date of Patent: February 18, 2014
Assignee: National Tsing Hua University
Inventors: Jenq Kuen Lee, Chi Bang Kuan
-
Publication number: 20140047421
Abstract: A segment including a set of blocks necessary to calculate blocks having internal states and blocks having no outputs is extracted by tracing from blocks for use in calculating inputs into the blocks having internal states and from the blocks having no outputs in the reverse direction of dependence. To newly extract segments in which blocks contained in the extracted segments are removed, a set of nodes to be temporarily removed is determined on the basis of parallelism. Segments executable independently of other segments are extracted by tracing from nodes whose child nodes are lost by removal of the nodes in the upstream direction. Segments are divided into upstream segments representing the newly extracted segments and downstream segments representing nodes temporarily removed. Upstream and downstream segments are merged so as to reduce overlapping blocks between segments such that the number of segments is reduced to the number of parallel executions.
Type: Application
Filed: August 21, 2013
Publication date: February 13, 2014
Applicant: International Business Machines Corporation
Inventors: Shuhichi Shimizu, Takeo Yoshizawa
-
Patent number: 8650554
Abstract: A mechanism is provided for improving single-thread performance for a multi-threaded, in-order processor core. In a first phase, a compiler analyzes application code to identify instructions that can be executed in parallel with focus on instruction-level parallelism and removing any register interference between the threads. The compiler inserts as appropriate synchronization instructions supported by the apparatus to ensure that the resulting execution of the threads is equivalent to the execution of the application code in a single thread. In a second phase, an operating system schedules the threads produced in the first phase on the hardware threads of a single processor core such that they execute simultaneously. In a third phase, the microprocessor core executes the threads specified by the second phase such that there is one hardware thread executing an application thread.
Type: Grant
Filed: April 27, 2010
Date of Patent: February 11, 2014
Assignee: International Business Machines Corporation
Inventors: Elmootazbellah N. Elnozahy, Ahmed Gheith
-
Patent number: 8650384
Abstract: Provided is a method and system for dynamically parallelizing an application program. Specifically, provided is a method and system having multi-core control that may verify a number of available threads according to an application program and dynamically parallelize data based on the verified number of available threads. The method and system for dynamically parallelizing the application program may divide a data block to be processed according to the application program based on a relevant data characteristic and dynamically map the threads to division blocks, and thereby enhance a system performance.
Type: Grant
Filed: April 27, 2010
Date of Patent: February 11, 2014
Assignees: Samsung Electronics Co., Ltd., University of Southern California
Inventors: Seung Won Lee, Shi Hwa Lee, Dong-In Kang, Mikyung Kang
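Illustrative sketch (not the patented method): the idea of dividing a data block according to the number of available threads and mapping one thread to each division block might look roughly like the Python below. The names `split_block` and `parallel_sum` are hypothetical, and using `os.cpu_count()` as a stand-in for "verifying the number of available threads" is an assumption made only for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_block(data, parts):
    """Divide `data` into `parts` contiguous chunks of near-equal size."""
    # Never create more chunks than elements, and always at least one chunk.
    parts = max(1, min(parts, len(data)))
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional element.
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, available_threads=None):
    """Sum `data` using one worker thread per division block."""
    # Assumption: the CPU count approximates the number of available threads.
    n = available_threads or os.cpu_count() or 1
    chunks = split_block(data, n)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # Each thread is mapped to exactly one division block.
        return sum(pool.map(sum, chunks))
```

Passing a different `available_threads` value changes only how the block is divided, not the result, which is the property a dynamic scheme of this kind relies on.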
-
Patent number: 8648621
Abstract: Disclosed are methods and devices, among which is a device that includes a finite state machine lattice. The lattice may include a counter suitable for counting a number of times a programmable element in the lattice detects a condition. The counter may be configured to output in response to counting the condition was detected a certain number of times. For example, the counter may be configured to output in response to determining a condition was detected at least (or no more than) the certain number of times, determining the condition was detected exactly the certain number of times, or determining the condition was detected within a certain range of times. The counter may be coupled to other counters in the device for determining high-count operations and/or certain quantifiers.
Type: Grant
Filed: December 15, 2011
Date of Patent: February 11, 2014
Assignee: Micron Technology, Inc.
Inventors: Harold B Noyes, David R. Brown, Paul Glendenning
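Illustrative sketch (a software model, not the hardware described in the patent): the four output conditions listed in this abstract (at least, no more than, exactly, and within a range) can be modeled with a small counter class. The class name `ConditionCounter` and the mode strings are hypothetical choices for this example.

```python
class ConditionCounter:
    """Counts detections and reports whether the configured quantifier holds."""

    MODES = {"at_least", "at_most", "exactly", "range"}

    def __init__(self, mode, low, high=None):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        if mode == "range" and high is None:
            raise ValueError("range mode needs an upper bound")
        self.mode, self.low, self.high = mode, low, high
        self.count = 0

    def detect(self):
        """Record one detection of the condition."""
        self.count += 1

    def output(self):
        """Return True when the count satisfies the configured quantifier."""
        if self.mode == "at_least":
            return self.count >= self.low
        if self.mode == "at_most":
            return self.count <= self.low
        if self.mode == "exactly":
            return self.count == self.low
        return self.low <= self.count <= self.high  # "range" mode
```

These modes correspond loosely to regular-expression quantifiers such as `{n,}`, `{,n}`, `{n}`, and `{m,n}`, which is one plausible reading of "certain quantifiers" in the abstract.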
-
Patent number: 8645920
Abstract: The debugging of a kernel in a data parallel environment. A debugger engine interfaces with a data parallel environment that is running one or more data parallel kernels through a first interface. For each of at least one of the one or more kernels, a program object is formulated that abstractly represents the data parallel kernel including data parallel functionality of the kernel. The program object has a second interface that allows information regarding the kernel to be discovered by the debugger user interface module.
Type: Grant
Filed: December 10, 2010
Date of Patent: February 4, 2014
Assignee: Microsoft Corporation
Inventor: Paul E. Maybee
-
Patent number: 8640112
Abstract: System and method for vectorizing combinations of program operations. Program code is received that includes a combination of individually vectorizable program portions that collectively implement a first computation. Each individually vectorizable program portion has at least one array input and at least one array output. The combination of individually vectorizable program portions is transformed into a single vectorizable program portion that is or includes a functional composition of the combination of individually vectorizable program portions. Vectorized executable code implementing the first computation is generated based on the single vectorizable program portion. The generated executable code is directed to SIMD (Single-Instruction-Multiple-Data) computing units of a target processor.
Type: Grant
Filed: March 30, 2011
Date of Patent: January 28, 2014
Assignee: National Instruments Corporation
Inventors: Haoran Yi, Brady C. Duggan, Robert E. Dye, Adam L. Bordelon, Jeffrey L. Kodosky
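Illustrative sketch (covers only the composition step, and generates no SIMD code): a chain of element-wise stages, each with an array input and an array output, can be replaced by a single pass that applies the functional composition of the stages to each element. The helper names `compose`, `run_separately`, and `run_fused` are hypothetical.

```python
from functools import reduce


def compose(*funcs):
    """Return a function that applies `funcs` to its argument from left to right."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def run_separately(stages, xs):
    """Apply each element-wise stage in its own pass, building a temporary list each time."""
    for stage in stages:
        xs = [stage(x) for x in xs]
    return xs


def run_fused(stages, xs):
    """Apply the composition of all stages in a single pass over the input."""
    fused = compose(*stages)
    return [fused(x) for x in xs]
```

Both functions return the same values for pure element-wise stages; the fused form avoids the intermediate arrays, which is what makes a single loop an attractive target for a vectorizing back end.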
-
Patent number: 8635606
Abstract: Technologies are generally described for runtime optimization adjusted dynamically according to changing costs of one or more system resources. Multicore systems may encounter dynamic variations in performance associated with the relative cost of related system resources. Furthermore, multicore systems can experience dramatic variations in resource availability and costs. A dynamic registry of system resource costs can be utilized to guide dynamic optimization. The relative scarcity of each resource can be updated dynamically within the registry of system resource costs. A runtime code generating loader and optimizer may be adapted to adjust optimization according to the resource cost registry. Information regarding system resource costs can support optimization tradeoffs based on resource cost functions.
Type: Grant
Filed: October 13, 2009
Date of Patent: January 21, 2014
Assignee: Empire Technology Development LLC
Inventor: Ezekiel John Joseph Kruglick
-
Patent number: 8627300
Abstract: Technologies are generally described for parallel dynamic optimization using multicore processors. A runtime compiler may be adapted to generate multiple instances of executable code from a portable intermediate software module. The various instances of executable code may be generated with variations of optimization parameters such that the code instances each express different optimization attempts. A multicore processor may be leveraged to simultaneously execute some, or all, of the various code instances. Preferred optimization parameters may be determined from the executable code instances that may correctly complete in the least time, or may use the least amount of memory, or that may prove superior according to some other fitness metric. Preferred optimization parameters may be used to seed future optimization attempts. Output generated from the preferred instances may be used as soon as the first instance correctly completes block.
Type: Grant
Filed: October 13, 2009
Date of Patent: January 7, 2014
Assignee: Empire Technology Development LLC
Inventor: Ezekiel John Joseph Kruglick
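Illustrative sketch (a toy model, not the runtime compiler described in the patent): several variants of the same computation are run concurrently, incorrect results are discarded, and the variant that completed correctly in the least time is reported as preferred. The names `run_variant` and `pick_preferred`, and the use of a known `expected` value as the correctness check, are assumptions made for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_variant(name, func, arg):
    """Run one code variant and return (name, result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(arg)
    return name, result, time.perf_counter() - start


def pick_preferred(variants, arg, expected):
    """Run all variants concurrently and return the name of the fastest correct one."""
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        futures = [pool.submit(run_variant, n, f, arg) for n, f in variants.items()]
        outcomes = [f.result() for f in futures]
    # Assumption: correctness is checked against a known expected value.
    correct = [o for o in outcomes if o[1] == expected]
    if not correct:
        raise ValueError("no variant produced the expected result")
    # Fitness metric here is elapsed time; memory use would be another option.
    return min(correct, key=lambda o: o[2])[0]
```

In CPython, threads do not run CPU-bound code truly in parallel, so this only models the selection logic; the returned name could be stored and used to seed later optimization attempts, as the abstract suggests.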
-
Patent number: 8627301
Abstract: A method for concurrent management of adaptive programs is disclosed wherein changes in a set of modifiable references are initially identified. A list of uses of the changed references is next computed using records made in structures of the references. The list is next inserted into an elimination queue. Comparison is next made of each of the uses to the other uses to determine independence or dependence thereon. Determined dependent uses are eliminated and the preceding steps are repeated for all determined independent uses until all dependencies have been eliminated.
Type: Grant
Filed: May 18, 2007
Date of Patent: January 7, 2014
Assignee: Intel Corporation
Inventors: Matthew Hammer, Mohan Rajagopalan, Anwar Ghuloum
-
Patent number: 8620991
Abstract: Technologies for enabling a continuation based runtime to accept or reject external stimulus and, in addition, to determine if an external stimulus may be valid for processing at a later point in execution.
Type: Grant
Filed: August 1, 2012
Date of Patent: December 31, 2013
Assignee: Microsoft Corporation
Inventors: Kenneth David Wolf, Justin David Brown, Karthik Raman, Nathan Christopher Talbert, Edmund Samuel Victor Pinto
-
Patent number: 8621446
Abstract: Compiling software for a hierarchical distributed processing system including providing to one or more compiling nodes software to be compiled, wherein at least a portion of the software to be compiled is to be executed by one or more other nodes; compiling, by the compiling node, the software; maintaining, by the compiling node, any compiled software to be executed on the compiling node; selecting, by the compiling node, one or more nodes in a next tier of the hierarchy of the distributed processing system in dependence upon whether any compiled software is for the selected node or the selected node's descendants; sending to the selected node only the compiled software to be executed by the selected node or selected node's descendant.
Type: Grant
Filed: April 29, 2010
Date of Patent: December 31, 2013
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
-
Patent number: 8612510
Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values.
Type: Grant
Filed: January 12, 2010
Date of Patent: December 17, 2013
Assignee: Google Inc.
Inventors: Jeffrey Dean, Sanjay Ghemawat
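Illustrative sketch (single-process, with no distribution or parallelism): the split between an application-independent framework and user-specified map and reduce operators can be shown with a word count. The function names `map_reduce`, `word_count_map`, and `word_count_reduce` are hypothetical, and in-memory strings stand in for the input files.

```python
from collections import defaultdict


def map_reduce(inputs, map_op, reduce_op):
    """Application-independent driver: apply map_op, group by key, apply reduce_op."""
    intermediate = defaultdict(list)
    for record in inputs:
        # The map operator yields (key, value) pairs of intermediate data.
        for key, value in map_op(record):
            intermediate[key].append(value)
    # The reduce operator merges all intermediate values that share a key.
    return {key: reduce_op(key, values) for key, values in intermediate.items()}


def word_count_map(line):
    """Application-specific map operator: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word, counts):
    """Application-specific reduce operator: total the counts for one word."""
    return sum(counts)
```

Only the two `word_count_*` functions are specific to the application; swapping them for other operators reuses `map_reduce` unchanged, which is the separation the abstract describes.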