For A Parallel Or Multiprocessor System Patents (Class 717/149)
  • Patent number: 8756590
    Abstract: A compile environment is provided in a computer system that allows programmers to program both CPUs and data parallel devices (e.g., GPUs) using a high level general purpose programming language that has data parallel (DP) extensions. A compilation process translates modular DP code written in the general purpose language into DP device source code in a high level DP device programming language using a set of binding descriptors for the DP device source code. A binder generates a single, self-contained DP device source code unit from the set of binding descriptors. A DP device compiler generates a DP device executable for execution on one or more data parallel devices from the DP device source code unit.
    Type: Grant
    Filed: June 22, 2010
    Date of Patent: June 17, 2014
    Assignee: Microsoft Corporation
    Inventors: Weirong Zhu, Lingli Zhang, Sukhdeep S. Sodhi, Yosseff Levanoni
  • Patent number: 8752018
    Abstract: One embodiment of the present invention sets forth a technique for emitting coherent output from multiple threads for the printf( ) function. Additionally, parallel (not divergent) execution of the threads for the printf( ) function is maintained when possible to improve run-time performance. Processing of the printf( ) function is separated into two tasks: gathering the per-thread data and formatting the gathered data according to the formatting codes for display. The threads emit a coherent stream of contiguous segments, where each segment includes the format string for the printf( ) function and the gathered data for a thread. The coherent stream is written by the threads and read by a display processor. The display processor executes a single thread to format the gathered data according to the format string for display. (A host-side sketch of the gather-then-format split follows this entry.)
    Type: Grant
    Filed: June 21, 2011
    Date of Patent: June 10, 2014
    Assignee: NVIDIA Corporation
    Inventors: Stephen Jones, Geoffrey Gerfin
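    Illustrative sketch: a minimal host-side C++ analogue of the gather-then-format split described in this entry, assuming hypothetical names (Segment, stream_mutex); it is not NVIDIA's GPU implementation. Worker threads only append their gathered arguments as contiguous segments tagged with the shared format string, and a single thread later formats every segment for display.

      // Hedged sketch of the gather-then-format idea in patent 8752018.
      // Names are illustrative; the patent targets GPU threads and a display
      // processor, while this analogue uses std::thread on the host.
      #include <cstdio>
      #include <mutex>
      #include <thread>
      #include <vector>

      struct Segment {              // one contiguous segment per emitting thread
          const char* format;       // format string shared by all threads
          int         thread_id;    // gathered per-thread data
          double      value;
      };

      int main() {
          const char* fmt = "thread %d computed %f\n";
          std::vector<Segment> stream;   // coherent stream of contiguous segments
          std::mutex stream_mutex;

          // Task 1: threads gather data only (no formatting), in parallel.
          std::vector<std::thread> workers;
          for (int t = 0; t < 4; ++t) {
              workers.emplace_back([&, t] {
                  Segment s{fmt, t, t * 1.5};              // per-thread data
                  std::lock_guard<std::mutex> lock(stream_mutex);
                  stream.push_back(s);                     // append a contiguous segment
              });
          }
          for (auto& w : workers) w.join();

          // Task 2: a single "display" thread formats every gathered segment.
          for (const Segment& s : stream)
              std::printf(s.format, s.thread_id, s.value);
          return 0;
      }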
  • Patent number: 8752036
    Abstract: Embodiments of the invention provide systems and methods for throughput-aware software pipelining in compilers to produce optimal code for single-thread and multi-thread execution on multi-threaded systems. A loop is identified within source code as a candidate for software pipelining. An attempt is made to generate pipelined code (e.g., generate an instruction schedule and a set of register assignments) for the loop in satisfaction of throughput-aware pipelining criteria, like maximum register count, minimum trip count, target core pipeline resource utilization, maximum code size, etc. If the attempt fails to generate code in satisfaction of the criteria, embodiments adjust one or more settings (e.g., by reducing scalarity or latency settings being used to generate the instruction schedule). (A sketch of such an adjust-and-retry loop follows this entry.)
    Type: Grant
    Filed: October 31, 2011
    Date of Patent: June 10, 2014
    Assignee: Oracle International Corporation
    Inventors: Spiros Kalogeropulos, Partha Tirumalai
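    Illustrative sketch: a hedged stand-in for the criteria-driven pipelining control flow described above; the PipelineSettings, Criteria, and try_pipeline names are invented, and the cost model is a placeholder for a real modulo scheduler.

      // Hedged sketch of criteria-driven software pipelining with setting
      // adjustment, loosely following patent 8752036. All types, thresholds,
      // and the cost model are hypothetical.
      #include <iostream>

      struct PipelineSettings { int scalarity; int assumed_latency; };
      struct PipelinedCode    { int registers_used; int code_size; };
      struct Criteria         { int max_registers; int max_code_size; };

      // Stand-in for the scheduler: pretend register pressure and code size
      // shrink as the scalarity and latency settings are reduced.
      PipelinedCode try_pipeline(const PipelineSettings& s) {
          return {8 * s.scalarity, 40 * s.scalarity + s.assumed_latency};
      }

      bool satisfies(const PipelinedCode& c, const Criteria& k) {
          return c.registers_used <= k.max_registers && c.code_size <= k.max_code_size;
      }

      int main() {
          Criteria criteria{16, 120};
          PipelineSettings settings{4, 32};
          while (settings.scalarity >= 1) {
              PipelinedCode code = try_pipeline(settings);
              if (satisfies(code, criteria)) {
                  std::cout << "pipelined with scalarity " << settings.scalarity << "\n";
                  return 0;
              }
              --settings.scalarity;            // adjust a setting on failure
              settings.assumed_latency /= 2;
          }
          std::cout << "fell back to non-pipelined code\n";   // give up on pipelining
          return 0;
      }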
  • Patent number: 8745603
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Grant
    Filed: May 10, 2013
    Date of Patent: June 3, 2014
    Assignee: Google Inc.
    Inventors: Morgan S. McGuire, Christopher G. Demetriou, Brian K. Grant, Matthew N. Papakipos
  • Patent number: 8745604
    Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a processor, a switch including switching circuitry to forward data over data paths from other tiles to the processor and to switches of other tiles, and a switch memory that stores instruction streams that are able to operate independently for respective output ports of the switch.
    Type: Grant
    Filed: February 25, 2008
    Date of Patent: June 3, 2014
    Assignee: Massachusetts Institute of Technology
    Inventor: Anant Agarwal
  • Patent number: 8739142
    Abstract: Disclosed are a method and system for optimized, dynamic data-dependent program execution. The disclosed system comprises a statistics computer which computes statistics of the incoming data at the current time instant, where the said statistics include the probability distribution of the incoming data, the probability distribution over program modules induced by the incoming data, the probability distribution induced over program outputs by the incoming data, and the time-complexity of each program module for the incoming data, wherein the said statistics are computed as a function of current and past data and previously computed statistics; a plurality of alternative execution path orders designed prior to run-time by the use of an appropriate source code; a source code selector which selects one of the execution path orders as a function of the statistics computed by the statistics computer; and a complexity measurement which measures the time-complexity of the currently selected execution path order. (A sketch of statistics-driven path-order selection follows this entry.)
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: May 27, 2014
    Assignee: International Business Machines Corporation
    Inventors: Dake He, Ashish Jagmohan, Jian Lou, Ligang Lu
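    Illustrative sketch: a hedged toy of the statistics computer and path-order selector described above; the running hit-rate estimate, the two path orders, and the selection threshold are all invented for illustration.

      // Hedged sketch of statistics-driven execution path-order selection,
      // loosely following patent 8739142. Modules and the data model are
      // invented; a real system would track richer distributions.
      #include <array>
      #include <iostream>

      // Two alternative path orders prepared before run time: check the cheap
      // predicate first, or check the expensive predicate first.
      enum class PathOrder { CheapFirst, ExpensiveFirst };

      struct Statistics {
          double cheap_hit_rate = 0.5;   // probability the cheap test already decides
          void update(bool cheap_hit) {  // running average over current and past data
              cheap_hit_rate = 0.95 * cheap_hit_rate + 0.05 * (cheap_hit ? 1.0 : 0.0);
          }
      };

      PathOrder select_order(const Statistics& s) {
          // If the cheap test rarely decides, putting the expensive test first
          // avoids paying for both on most inputs.
          return s.cheap_hit_rate >= 0.3 ? PathOrder::CheapFirst
                                         : PathOrder::ExpensiveFirst;
      }

      int main() {
          Statistics stats;
          std::array<int, 8> data{1, 7, 2, 9, 3, 8, 4, 6};
          for (int x : data) {
              PathOrder order = select_order(stats);   // source code selector
              bool cheap_hit = (x < 5);                // cheap data-dependent test
              stats.update(cheap_hit);                 // statistics computer
              std::cout << x << " handled with order "
                        << (order == PathOrder::CheapFirst ? "cheap-first" : "expensive-first")
                        << "\n";
          }
          return 0;
      }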
  • Patent number: 8732684
    Abstract: According to one embodiment, a first program code including a plurality of variables is converted to a second program code to be executed by a multi-core processor including a plurality of cores. Specifically, an access pattern of each variable in the first program code is decided. All variables in the first program code are classified into a plurality of groups, each of which contains variables having the same access pattern. A member structure of each group having variables belonging to the same access pattern is created. Each member structure includes variables of one group. A route-pointer indicating an address (in a memory) of variables of the member structure is created. The variables in the first program code are converted to the member structure and the route-pointer (in the second program code) that indicate the variables. The second program code is outputted to the multi-core processor. (A sketch of the grouping-and-route-pointer transformation follows this entry.)
    Type: Grant
    Filed: January 25, 2011
    Date of Patent: May 20, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Nobuaki Tojo, Ken Tanabe, Hidenori Matsuzaki
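    Illustrative sketch: a hedged before/after flavor of the conversion described above, with invented group names; variables with the same access pattern are collected into member structures and reached through route-pointers in the converted code.

      // Illustrative shape of the transformation in patent 8732684: variables
      // with the same access pattern are grouped into a member structure, and
      // a route-pointer holds the address of each group. Names are hypothetical.
      #include <iostream>

      // Group of variables that are only read on the cores (same access pattern).
      struct ReadOnlyGroup  { int coeff_a; int coeff_b; };
      // Group of variables that are read and written (another access pattern).
      struct ReadWriteGroup { int accumulator; int counter; };

      int main() {
          // Storage for the member structures, e.g. placed in core-local memory.
          ReadOnlyGroup  ro{3, 5};
          ReadWriteGroup rw{0, 0};

          // Route-pointers: the converted (second) program accesses every
          // variable indirectly through the pointer to its group.
          ReadOnlyGroup*  ro_route = &ro;
          ReadWriteGroup* rw_route = &rw;

          // Original code such as  acc += a * x;  ++count;  becomes:
          int x = 7;
          rw_route->accumulator += ro_route->coeff_a * x;
          ++rw_route->counter;

          std::cout << rw_route->accumulator << " " << rw_route->counter << "\n";
          return 0;
      }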
  • Patent number: 8732685
    Abstract: A method and medium are disclosed for executing a technical computing program in parallel in multiple execution environments. A program is invoked for execution in a first execution environment and from the invocation the program is executed in the first execution environment and one or more additional execution environments to provide for parallel execution of the program. New constructs in a technical computing programming language are disclosed for parallel programming of a technical computing program for execution in multiple execution environments. It is also further disclosed a system and method for changing the mode of operation of an execution environment from a sequential mode to a parallel mode of operation and vice-versa.
    Type: Grant
    Filed: February 3, 2011
    Date of Patent: May 20, 2014
    Assignee: The Mathworks, Inc.
    Inventor: Cleve Moler
  • Patent number: 8726249
    Abstract: A bootup device and method for an application program on a mobile equipment to improve the bootup speed of the application program on the mobile equipment. The bootup device has an application management module that boots up a virtual machine module based on the application program to be run; a virtual machine module that loads codes of the application program and Just-in-Time (JIT) compilation results of a bootup process of the application program into a memory, searches the JIT compilation results for local JIT-compiled codes corresponding to the bootup process code segment to be executed, and executes the found local JIT-compiled codes when executing each bootup process code segment of the application program; and a storage management module that stores and reads the codes of the application program and the JIT compilation results obtained from the JIT compilation of the bootup process of the application program.
    Type: Grant
    Filed: February 21, 2011
    Date of Patent: May 13, 2014
    Assignee: ZTE Corporation
    Inventors: Youpeng Gu, Lifeng Xu, Wei Hu, Sheng Zhong, Wei Wang, Zemin Wang
  • Patent number: 8726250
    Abstract: Programming of modules which can be reprogrammed during operation is described. Partitioning of code sequences is also described.
    Type: Grant
    Filed: March 10, 2010
    Date of Patent: May 13, 2014
    Assignee: Pact XPP Technologies AG
    Inventors: Martin Vorbach, Armin Nückel
  • Patent number: 8726251
    Abstract: Embodiments of the invention provide systems and methods for automatically parallelizing loops with non-speculative pipelined execution of chunks of iterations with pre-computation of selected values. Non-DOALL loops are identified and divided into chunks. The chunks are assigned to separate logical threads, which may be further assigned to hardware threads. As a thread performs its runtime computations, subsequent threads attempt to pre-compute their respective chunks of the loop. These pre-computations may result in a set of assumed initial values and pre-computed final variable values associated with each chunk. As subsequent pre-computed chunks are reached at runtime, those assumed initial values can be verified to determine whether to proceed with runtime computation of the chunk or to avoid runtime execution and instead use the pre-computed final variable values. (A sketch of this verify-or-reuse step follows this entry.)
    Type: Grant
    Filed: March 29, 2011
    Date of Patent: May 13, 2014
    Assignee: Oracle International Corporation
    Inventors: Spiros Kalogeropulos, Partha Pal Tirumalai
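    Illustrative sketch: a hedged two-chunk version of the pre-compute-then-verify flow described above, for a loop with a carried accumulator; the loop body, the chunking into two halves, and the assumed initial value of 0 are invented for illustration.

      // Hedged sketch of the verify-or-reuse idea in patent 8726251 for a loop
      // that is not DOALL because each chunk's result depends on the value
      // carried out of the previous chunk.
      #include <cstddef>
      #include <future>
      #include <iostream>
      #include <vector>

      long run_chunk(long carried_in, const std::vector<long>& a,
                     std::size_t lo, std::size_t hi) {
          long acc = carried_in;
          for (std::size_t i = lo; i < hi; ++i)
              acc = (acc * 3 + a[i]) % 1000003;   // loop-carried dependence
          return acc;
      }

      int main() {
          std::vector<long> a(1000, 1);
          const std::size_t mid = a.size() / 2;

          // Speculation: while chunk 0 runs, pre-compute chunk 1 from an
          // assumed initial value (a guess for the carried-in accumulator;
          // a real system might use the value seen at speculation time).
          long assumed_initial = 0;
          std::future<long> precomputed =
              std::async(std::launch::async, run_chunk, assumed_initial,
                         std::cref(a), mid, a.size());

          long after_chunk0 = run_chunk(0, a, 0, mid);

          // Verification: reuse the pre-computed final value only if the
          // assumption about the chunk's initial value turned out correct;
          // otherwise recompute (the unused future simply runs to completion).
          long result = (after_chunk0 == assumed_initial)
                            ? precomputed.get()
                            : run_chunk(after_chunk0, a, mid, a.size());
          std::cout << result << "\n";
          return 0;
      }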
  • Patent number: 8726238
    Abstract: Interactive iterative program parallelization based on dynamic feedback is disclosed. Program parallelization, in one aspect, may identify a ranked list of one or more candidate pieces of code, each with one or more source refactorings that can be applied to parallelize the code, apply at least one of the one or more refactorings to create revised code, and determine performance data associated with the revised code. The performance data may be used to make decisions on identifying the next possible ranked list of refactorings.
    Type: Grant
    Filed: February 22, 2010
    Date of Patent: May 13, 2014
    Assignee: International Business Machines Corporation
    Inventors: Evelyn Duesterwald, Robert M. Fuhrer, Vijay Saraswat
  • Patent number: 8719803
    Abstract: A parallelism policy object provides a control parallelism interface whose implementation evaluates parallelism conditions that are left unspecified in the interface. User-defined and other parallelism policy procedures can make recommendations to a worker program for transitioning between sequential program execution and parallel execution. Parallelizing assistance values obtained at runtime can be used in the parallelism conditions on which the recommendations are based. A consistent parallelization policy can be employed across a range of parallel constructs, and inside recursive procedures. (A sketch of a worker consulting such a policy follows this entry.)
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: May 6, 2014
    Assignee: Microsoft Corporation
    Inventors: Stephen Toub, Igor Ostrovsky, Joe Duffy, Vance Morrison, Huseyin Yildiz
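    Illustrative sketch: a hedged example of a worker consulting a parallelism policy object inside a recursive procedure; the ParallelismPolicy interface, the depth-and-size condition, and the recursive sum are invented and are not the patented API.

      // Hedged sketch of policy-driven switching between sequential and
      // parallel execution, in the spirit of patent 8719803.
      #include <algorithm>
      #include <cstddef>
      #include <future>
      #include <iostream>
      #include <thread>
      #include <vector>

      struct ParallelismPolicy {
          virtual ~ParallelismPolicy() = default;
          // The parallelism condition itself is left to the implementation.
          virtual bool recommend_parallel(std::size_t work_size, int depth) const = 0;
      };

      // One possible policy: go parallel only for large work near the top of
      // the recursion, using a runtime-obtained assistance value (hardware threads).
      struct DepthAndSizePolicy : ParallelismPolicy {
          bool recommend_parallel(std::size_t work_size, int depth) const override {
              unsigned hw = std::max(1u, std::thread::hardware_concurrency());
              return work_size > 10000 && depth < static_cast<int>(hw);
          }
      };

      // A worker algorithm (recursive sum) that consults the policy at each level.
      long sum(const std::vector<int>& v, std::size_t lo, std::size_t hi,
               const ParallelismPolicy& policy, int depth) {
          if (hi - lo <= 1) return (hi > lo) ? v[lo] : 0;
          std::size_t mid = lo + (hi - lo) / 2;
          if (policy.recommend_parallel(hi - lo, depth)) {
              auto left = std::async(std::launch::async, sum, std::cref(v), lo, mid,
                                     std::cref(policy), depth + 1);
              long right = sum(v, mid, hi, policy, depth + 1);
              return left.get() + right;
          }
          long total = 0;                          // sequential path
          for (std::size_t i = lo; i < hi; ++i) total += v[i];
          return total;
      }

      int main() {
          std::vector<int> v(100000, 1);
          DepthAndSizePolicy policy;
          std::cout << sum(v, 0, v.size(), policy, 0) << "\n";   // prints 100000
          return 0;
      }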
  • Patent number: 8719806
    Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is a speculative prefetch thread to perform instruction prefetch and/or trace pre-build for the main thread.
    Type: Grant
    Filed: September 10, 2010
    Date of Patent: May 6, 2014
    Assignee: Intel Corporation
    Inventors: Hong Wang, Tor M. Aamodt, Pedro Marcuello, Jared W. Stark, IV, John P. Shen, Antonio Gonzalez, Per Hammarlund, Gerolf F. Hoflehner, Perry H. Wang, Steve Shih-wei Liao
  • Patent number: 8713545
    Abstract: A data processing system includes a host computer, an additional computer, an application module including a first executable code, a module for analyzing said first executable code and a module for generating a second executable code segmented, notably, into code blocks which are executed preferentially on one of the two computers. The second executable code includes a sub-module for managing the distribution of the processing operations between the host computer and the additional computer and a sub-module for managing the additional computer as a virtual machine which executes the blocks allocated to the additional computer.
    Type: Grant
    Filed: March 13, 2009
    Date of Patent: April 29, 2014
    Assignee: Silkan
    Inventor: Pierre Fiorini
  • Patent number: 8706702
    Abstract: A method for managing data in a collaborative service-oriented workshop, which is adapted to treat objects associated with data representative of real or process data, is provided to share data and resources in an architecture of a workspace. The architecture is adapted to design complex objects and manipulate information technology objects that represent data, which may be representative of a real object or a process based on metadata representing characteristic data. The metadata includes a generic part that is common to all data, a specific part that is inherent to the type of data, and links to other objects. The links make it possible to establish, at a later time, the traceability of the data, or in other words the traceability between the different data produced or used during execution of processes.
    Type: Grant
    Filed: May 14, 2009
    Date of Patent: April 22, 2014
    Assignee: Airbus Operations S.A.S.
    Inventors: Bernard Marquez, Thierry Chevalier, Philippe Sauvage
  • Patent number: 8707281
    Abstract: One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The media also store one or more instructions for transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The media further store one or more instructions for receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.
    Type: Grant
    Filed: July 23, 2012
    Date of Patent: April 22, 2014
    Assignee: The MathWorks, Inc.
    Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
  • Patent number: 8707280
    Abstract: A computing device-implemented method includes receiving a program, analyzing and transforming the program, determining an inner context and an outer context of the program based on the analysis of the program, and allocating one or more portions of the inner context of the program to two or more labs for parallel execution. The method also includes receiving one or more results associated with the parallel execution of the one or more portions from the two or more labs, and providing the one or more results to the outer context of the program.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: April 22, 2014
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N Stefansson, Brett Baker, Edric Ellis, Joseph F Hicklin, John N Little, Jocelyn Luke Martin, Piotr R Luszczek, Nausheen B Moulana, Loren Dean, Roy E. Lurie
  • Publication number: 20140109069
    Abstract: A method of compiling a program to be executed on a multicore processor is provided. The method may include generating an initial solution by mapping a task to a source processing element (PE) and a destination PE, and selecting a communication scheme for transmission of the task from the source PE to the destination PE, approximately optimizing the mapping and communication scheme included in the initial solution, and scheduling the task, wherein the communication scheme is designated in a compiling process.
    Type: Application
    Filed: October 11, 2013
    Publication date: April 17, 2014
    Applicants: SEOUL NATIONAL UNIVERSITY R&DB FOUNDATION, SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jin-Hoo LEE, Moo-Kyoung CHUNG, Key-Young CHOI, Yeon-Gon CHO, Soo-Jung RYU
  • Patent number: 8701099
    Abstract: A method, a system and a computer program product for effectively accelerating loop iterators using speculative execution of iterators. An Efficient Loop Iterator (ELI) utility detects initiation of a target program and initiates/spawns a speculative iterator thread at the start of the basic code block ahead of the code block that initiates a nested loop. The ELI utility assigns the iterator thread to a dedicated processor in a multi-processor system. The speculative thread runs/executes ahead of the execution of the nested loop and calculates indices in a corresponding multidimensional array. The iterator thread adds all the precomputed indices to a single queue. As a result, the ELI utility effectively enables a multidimensional loop to be replaced by a single-dimensional loop. At the beginning of (or during) each iteration of the iterator, the ELI utility “dequeues” an entry from the queue and uses the entry to access the array upon which the ELI utility iterates. (A sketch of the index-precomputing iterator thread follows this entry.)
    Type: Grant
    Filed: November 2, 2010
    Date of Patent: April 15, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ganesh Bikshandi, Dibyendu Das, Smruti Ranjan Sarangi
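    Illustrative sketch: a hedged analogue of the ELI flow described above; an iterator thread precomputes flattened indices of a 2-D array into a queue, and the consumer runs a single-dimensional loop that dequeues them. The IndexQueue type and loop bounds are invented.

      // Hedged sketch of the Efficient Loop Iterator idea in patent 8701099:
      // a helper thread runs ahead of the main loop and enqueues precomputed
      // indices so the consumer needs only one single-dimensional loop.
      #include <condition_variable>
      #include <cstddef>
      #include <iostream>
      #include <mutex>
      #include <queue>
      #include <thread>
      #include <vector>

      struct IndexQueue {
          std::queue<std::size_t> q;
          std::mutex m;
          std::condition_variable cv;
          bool done = false;

          void push(std::size_t idx) {
              { std::lock_guard<std::mutex> lock(m); q.push(idx); }
              cv.notify_one();
          }
          void finish() {
              { std::lock_guard<std::mutex> lock(m); done = true; }
              cv.notify_one();
          }
          bool pop(std::size_t& idx) {       // false once the producer is done
              std::unique_lock<std::mutex> lock(m);
              cv.wait(lock, [&] { return !q.empty() || done; });
              if (q.empty()) return false;
              idx = q.front(); q.pop();
              return true;
          }
      };

      int main() {
          const std::size_t rows = 200, cols = 300;
          std::vector<double> a(rows * cols, 1.0);
          IndexQueue indices;

          // Speculative iterator thread: replaces the nested i/j loops by
          // precomputing every flattened index into a single queue.
          std::thread iterator([&] {
              for (std::size_t i = 0; i < rows; ++i)
                  for (std::size_t j = 0; j < cols; ++j)
                      indices.push(i * cols + j);
              indices.finish();
          });

          // Main loop: one single-dimensional loop that dequeues indices.
          double total = 0.0;
          std::size_t idx;
          while (indices.pop(idx)) total += a[idx];

          iterator.join();
          std::cout << total << "\n";        // prints 60000
          return 0;
      }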
  • Patent number: 8694975
    Abstract: A first compiler generates one or more object codes from a program code for a first processor included in an arithmetic processing system to which a plurality of processors are mutually connected. A first linker links the generated one or more object codes to generate an execution file for the first processor. A parameter information generation unit generates, based on the information acquired from the first linker, parameter information used in a second processor included in the arithmetic processing system. A second compiler refers to a program code and the parameter information for the second processor to generate one or more object codes. A second linker links the generated one or more object codes to generate an execution file for the second processor.
    Type: Grant
    Filed: July 23, 2009
    Date of Patent: April 8, 2014
    Assignee: NEC Corporation
    Inventor: Tomoyoshi Kobori
  • Publication number: 20140096117
    Abstract: A data dependence analysis support device calculates pointer information by performing a context-sensitive pointer analysis on every pointer used in a program; calculates dataflow information between statements by performing a context-sensitive dataflow analysis, using the context-sensitive pointer information, on all statements in an analysis target region and all statements that might be called upon execution of the analysis target region; and calculates inter-region data dependence information, using the dataflow information, for two or more threaded regions included in the source program.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Applicant: PANASONIC CORPORATION
    Inventor: Akira Tanaka
  • Patent number: 8689231
    Abstract: One or more embodiments of the invention enable a system and method for ordering tasks with complex interrelationships. The present invention as described herein may be used to produce a linear ordering of tasks with complex interrelationships including dependencies and constraints. In one or more embodiments optional tasks may be permitted such that a given task may or may not be added to the execution queue depending on the scheduling of earlier tasks following evaluation of their dependencies—that is, the system of the invention supports the management of optional tasks in a task ordering operation where some or all of tasks have complex interdependencies.
    Type: Grant
    Filed: June 30, 2009
    Date of Patent: April 1, 2014
    Assignee: SAP AG
    Inventor: Brent Milnor
  • Patent number: 8689196
    Abstract: The display of a debugging interface for use with parallel computing. When a break state has been entered in a particular code context (such as a method) by a particular execution context (such as a thread), related execution contexts are found that were also executing in the particular code context. While in the break state, multiple expressions are then evaluated for each of the execution contexts. The results are then displayed with perhaps navigation controls that allow the results to be efficiently navigated.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: April 1, 2014
    Assignee: Microsoft Corporation
    Inventors: Paul E. Maybee, Daniel Moth
  • Patent number: 8677329
    Abstract: A method and an apparatus that instructs a compiler server to build or otherwise obtain a compiled code corresponding to a compilation request received from an application are described. The compiler server may be configured to compile source codes for a plurality of independent applications, each running in a separate process, using a plurality of independent compilers, each running in a separate compiler process. A search may be performed in a cache for a compiled code that satisfies a compilation request received from an application. A reply message including the compiled code can be provided for the application, wherein the compiled code is compiled in direct response to the request, or is obtained from the cache if the search identifies in the cache the compiled code that satisfies the compilation request.
    Type: Grant
    Filed: June 3, 2009
    Date of Patent: March 18, 2014
    Assignee: Apple Inc.
    Inventors: Robert Beretta, Nicholas William Burns, Nathaniel Begeman, Phillip Kent Miller, Geoffrey Grant Stahl
  • Patent number: 8677334
    Abstract: A computer-implemented method, system, and article of manufacture for parallelizing a code configured by coupling a functional block having an internal state and a functional block without any internal state. The method includes: creating and storing a graphical representation where functional blocks are chosen as nodes and connections between functional blocks are chosen as links; visiting the nodes on the graphical representation sequentially, detecting inputs from functional blocks without any internal state to functional blocks having an internal state and storing these functional blocks as a set of use blocks, and detecting inputs from functional blocks having an internal state to functional blocks without any internal state and storing these functional blocks as a set of definition blocks; and forming strands of functional blocks based on information on the set of use blocks and information on the set of definition blocks stored in association with the functional blocks.
    Type: Grant
    Filed: October 28, 2010
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Arquimedes Martinez Canedo, Hideaki Komatsu, Takeo Yoshizawa
  • Patent number: 8677332
    Abstract: Systems and methods for compiling one or more code blocks written in a programming language are provided. In some aspects, a display associated with an application is provided. The display includes a plurality of graphical objects. It is determined that each of the plurality of graphical objects is associated with a child code block in a one-to-one association between graphical objects and child code blocks. Each child code block is written in the programming language. The child code blocks associated with the plurality of graphical objects are transformed into a single parent code block. The parent code block, upon compiling, is configured to be reused across execution contexts and to allow injection of global scope. The parent code block, upon specific execution, includes an execution context for the specified child code block. The parent code block is configured to receive an indication of the specified child code block for initiating execution of the parent code block. The parent code block is then compiled.
    Type: Grant
    Filed: July 24, 2012
    Date of Patent: March 18, 2014
    Assignee: Google Inc.
    Inventors: John Hjelmstad, Malte Ubl
  • Patent number: 8677330
    Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.
    Type: Grant
    Filed: June 9, 2010
    Date of Patent: March 18, 2014
    Assignee: Altera Corporation
    Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
  • Patent number: 8671400
    Abstract: A technique includes providing first objects that are associated with an application session and in a processor-based system, identifying second objects in another application session corresponding to the first objects based at least in part on a comparison of the second objects to matching rules associated with the first objects.
    Type: Grant
    Filed: December 23, 2009
    Date of Patent: March 11, 2014
    Assignee: Intel Corporation
    Inventors: Christopher J. Cormack, Nathaniel Duca, Joseph D. Matarazzo
  • Publication number: 20140068578
    Abstract: An embodiment of the invention provides a method for exploiting stateless and stateful data parallelism in a streaming application, wherein a compiler determines whether an operator of the streaming application is safe to parallelize based on a definition of the operator and an instance of the definition. The operator is not safe to parallelize when the operator has selectivity greater than 1, wherein the selectivity is the number of output tuples generated for each input tuple. Parallel regions are formed within the streaming application with the compiler when the operator is safe to parallelize. Synchronization strategies for the parallel regions are determined with the compiler, wherein the synchronization strategies are determined based on the definition of the operator and the instance of the definition. The synchronization strategies of the parallel regions are enforced with a runtime system. (A sketch of the selectivity-based safety check follows this entry.)
    Type: Application
    Filed: October 12, 2012
    Publication date: March 6, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bugra Gedik, Martin J. Hirzel, Scott A. Schneider, Kun-Lung Wu
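    Illustrative sketch: a hedged version of the safety check described above; the selectivity-greater-than-1 test comes from the abstract, while the OperatorDefinition fields and the partitioned-state condition for stateful operators are assumptions added for illustration.

      // Hedged sketch of a compile-time parallelization safety check in the
      // spirit of this publication. The operator model is invented.
      #include <iostream>
      #include <string>

      struct OperatorDefinition {
          std::string name;
          bool stateless;            // no state carried between tuples
          bool partitioned_state;    // assumption: stateful but keyed per partition
          double selectivity;        // output tuples produced per input tuple
      };

      bool safe_to_parallelize(const OperatorDefinition& op) {
          if (op.selectivity > 1.0) return false;        // may multiply tuples
          return op.stateless || op.partitioned_state;   // state must not be shared
      }

      int main() {
          OperatorDefinition filter{"Filter", true,  false, 0.4};
          OperatorDefinition split {"Split",  true,  false, 3.0};
          OperatorDefinition join  {"Join",   false, false, 0.8};
          for (const auto& op : {filter, split, join})
              std::cout << op.name << ": "
                        << (safe_to_parallelize(op) ? "parallel region" : "sequential")
                        << "\n";
          return 0;
      }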
  • Publication number: 20140068582
    Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.
    Type: Application
    Filed: September 10, 2012
    Publication date: March 6, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
  • Publication number: 20140068581
    Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
  • Publication number: 20140068577
    Abstract: An embodiment of the invention provides a method for exploiting stateless and stateful data parallelism in a streaming application, wherein a compiler determines whether an operator of the streaming application is safe to parallelize based on a definition of the operator and an instance of the definition. The operator is not safe to parallelize when the operator has selectivity greater than 1, wherein the selectivity is the number of output tuples generated for each input tuple. Parallel regions are formed within the streaming application with the compiler when the operator is safe to parallelize. Synchronization strategies for the parallel regions are determined with the compiler, wherein the synchronization strategies are determined based on the definition of the operator and the instance of the definition. The synchronization strategies of the parallel regions are enforced with a runtime system.
    Type: Application
    Filed: August 28, 2012
    Publication date: March 6, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bugra Gedik, Martin J. Hirzel, Scott A. Schneider, Kun-Lung Wu
  • Patent number: 8667476
    Abstract: Multiple instructions including branch instructions are grouped into a condensed variable length instruction. A trimmed and grouped branch instruction branches to one of the instructions in the same group. Therefore, a grouped instruction including branch(es) inherently executes correct branch behaviors without deploying any branch prediction schemes. In addition, a grouped instruction in a condensed form delivers multiple operations to execute without fetching all of the instructions grouped separately from the instruction memory via its caches while conserving instruction memory and/or cache as well as decreasing the number of bit switching on the bus. Software developers can make their own compatible, compact and ciphered instruction sets after grouping existing instructions in their software compiled with existing software compilers and the associated microprocessors.
    Type: Grant
    Filed: January 6, 2010
    Date of Patent: March 4, 2014
    Assignee: Adaptmicrosys LLC
    Inventor: Yong-Kyu Jung
  • Patent number: 8661424
    Abstract: A code generation system comprises a model analyzer configured to identify data dependencies in a data flow diagram that describes functional behavior of an application, wherein the model analyzer is further configured to compute a data and computation map based on the data dependencies and to compute one or more implementation constraints; a model partitioner configured to compute one or more partition boundaries based on the data and computation map and the one or more implementation constraints; and a code generator configured to generate parallelized code based on the data flow diagram, the one or more implementation constraints, and the one or more partition boundaries, wherein the code generator is configured to map the code corresponding to each partition defined by the one or more partition boundaries to one of a plurality of cores of a multi-core processor, and to generate inter-core communication code for at least one line of the data and computation map crossed by the one or more partition boundaries.
    Type: Grant
    Filed: September 2, 2010
    Date of Patent: February 25, 2014
    Assignee: Honeywell International Inc.
    Inventors: Kirk Schloegel, Devesh Bhatt
  • Patent number: 8656141
    Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a pipelined processor configured to process multiple streams of instructions for the processor; and a switch including switching circuitry to forward data over data paths from other tiles to one or more pipeline stages of the processor and to switches of other tiles. At least some of the data is forwarded based on one or more streams of instructions for the switch.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: February 18, 2014
    Assignee: Massachusetts Institute of Technology
    Inventor: Anant Agarwal
  • Patent number: 8656375
    Abstract: A cross-logical entity group is created that includes one or more accelerators to be shared by a plurality of logical entities. Instantiated on the accelerators are functions that are common across multiple logical entities. The functions to be instantiated are determined, for instance, dynamically during run-time.
    Type: Grant
    Filed: November 2, 2009
    Date of Patent: February 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Rajaram B. Krishnamurthy, Thomas A. Gregg
  • Patent number: 8656376
    Abstract: A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.
    Type: Grant
    Filed: September 1, 2011
    Date of Patent: February 18, 2014
    Assignee: National Tsing Hua University
    Inventors: Jenq Kuen Lee, Chi Bang Kuan
  • Publication number: 20140047421
    Abstract: A segment including a set of blocks necessary to calculate blocks having internal states and blocks having no outputs is extracted by tracing from blocks for use in calculating inputs into the blocks having internal states and from the blocks having no outputs in the reverse direction of dependence. To newly extract segments in which blocks contained in the extracted segments are removed, a set of nodes to be temporarily removed is determined on the basis of parallelism. Segments executable independently of other segments are extracted by tracing from nodes whose child nodes are lost by removal of the nodes in the upstream direction. Segments are divided into upstream segments representing the newly extracted segments and downstream segments representing nodes temporarily removed. Upstream and downstream segments are merged so as to reduce overlapping blocks between segments such that the number of segments is reduced to the number of parallel executions.
    Type: Application
    Filed: August 21, 2013
    Publication date: February 13, 2014
    Applicant: International Business Machines Corporation
    Inventors: Shuhichi Shimizu, Takeo Yoshizawa
  • Patent number: 8650554
    Abstract: A mechanism is provided for improving single-thread performance for a multi-threaded, in-order processor core. In a first phase, a compiler analyzes application code to identify instructions that can be executed in parallel with focus on instruction-level parallelism and removing any register interference between the threads. The compiler inserts as appropriate synchronization instructions supported by the apparatus to ensure that the resulting execution of the threads is equivalent to the execution of the application code in a single thread. In a second phase, an operating system schedules the threads produced in the first phase on the hardware threads of a single processor core such that they execute simultaneously. In a third phase, the microprocessor core executes the threads specified by the second phase such that there is one hardware thread executing an application thread.
    Type: Grant
    Filed: April 27, 2010
    Date of Patent: February 11, 2014
    Assignee: International Business Machines Corporation
    Inventors: Elmootazbellah N. Elnozahy, Ahmed Gheith
  • Patent number: 8650384
    Abstract: Provided is a method and system for dynamically parallelizing an application program. Specifically, provided is a method and system having multi-core control that may verify the number of available threads according to an application program and dynamically parallelize data based on the verified number of available threads. The method and system for dynamically parallelizing the application program may divide a data block to be processed according to the application program based on a relevant data characteristic and dynamically map the threads to division blocks, and thereby enhance system performance. (A sketch of thread-count-based data division follows this entry.)
    Type: Grant
    Filed: April 27, 2010
    Date of Patent: February 11, 2014
    Assignees: Samsung Electronics Co., Ltd., University of Southern California
    Inventors: Seung Won Lee, Shi Hwa Lee, Dong-In Kang, Mikyung Kang
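    Illustrative sketch: a hedged example of verifying the number of available threads at run time and dividing a data block into division blocks mapped to those threads; the squaring workload and the ceiling-division chunking are invented.

      // Hedged sketch of run-time data division across available threads,
      // in the spirit of patent 8650384.
      #include <algorithm>
      #include <cstddef>
      #include <iostream>
      #include <thread>
      #include <vector>

      int main() {
          std::vector<int> data(100000, 2);

          // Verify the number of available threads dynamically.
          unsigned n_threads = std::thread::hardware_concurrency();
          if (n_threads == 0) n_threads = 2;               // fallback

          // Divide the data block into division blocks and map a thread to each.
          std::vector<std::thread> threads;
          std::size_t chunk = (data.size() + n_threads - 1) / n_threads;
          for (unsigned t = 0; t < n_threads; ++t) {
              std::size_t lo = t * chunk;
              std::size_t hi = std::min(data.size(), lo + chunk);
              threads.emplace_back([&data, lo, hi] {
                  for (std::size_t i = lo; i < hi; ++i)
                      data[i] *= data[i];                  // independent work per block
              });
          }
          for (auto& th : threads) th.join();
          std::cout << data.front() << " ... " << data.back() << "\n";   // 4 ... 4
          return 0;
      }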
  • Patent number: 8648621
    Abstract: Disclosed are methods and devices, among which is a device that includes a finite state machine lattice. The lattice may include a counter suitable for counting a number of times a programmable element in the lattice detects a condition. The counter may be configured to output in response to counting that the condition was detected a certain number of times. For example, the counter may be configured to output in response to determining that a condition was detected at least (or no more than) the certain number of times, that the condition was detected exactly the certain number of times, or that the condition was detected within a certain range of times. The counter may be coupled to other counters in the device for determining high-count operations and/or certain quantifiers.
    Type: Grant
    Filed: December 15, 2011
    Date of Patent: February 11, 2014
    Assignee: Micron Technology, Inc.
    Inventors: Harold B Noyes, David R. Brown, Paul Glendenning
  • Patent number: 8645920
    Abstract: The debugging of a kernel in a data parallel environment. A debugger engine interfaces with a data parallel environment that is running one or more data parallel kernels through a first interface. For each of at least one of the one or more kernels, a program object is formulated that abstractly represents the data parallel kernel including data parallel functionality of the kernel. The program object has a second interface that allows information regarding the kernel to be discovered by the debugger user interface module.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: February 4, 2014
    Assignee: Microsoft Corporation
    Inventor: Paul E. Maybee
  • Patent number: 8640112
    Abstract: System and method for vectorizing combinations of program operations. Program code is received that includes a combination of individually vectorizable program portions that collectively implement a first computation. Each individually vectorizable program portion has at least one array input and at least one array output. The combination of individually vectorizable program portions is transformed into a single vectorizable program portion that is or includes a functional composition of the combination of individually vectorizable program portions. Vectorized executable code implementing the first computation is generated based on the single vectorizable program portion. The generated executable code is directed to SIMD (Single-Instruction-Multiple-Data) computing units of a target processor. (A sketch of fusing composed array operations into one loop follows this entry.)
    Type: Grant
    Filed: March 30, 2011
    Date of Patent: January 28, 2014
    Assignee: National Instruments Corporation
    Inventors: Haoran Yi, Brady C. Duggan, Robert E. Dye, Adam L. Bordelon, Jeffrey L. Kodosky
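    Illustrative sketch: a hedged scalar rendering of composing two individually vectorizable array operations into a single fused loop; a real system would generate SIMD code for the fused body, and all function names here are invented.

      // Hedged sketch of functional composition of vectorizable portions into
      // one fused, vectorizable loop, in the spirit of patent 8640112.
      #include <cstddef>
      #include <iostream>
      #include <vector>

      // Two individually vectorizable portions: y = a + b, z = y * c.
      std::vector<float> add(const std::vector<float>& a, const std::vector<float>& b) {
          std::vector<float> out(a.size());
          for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
          return out;
      }
      std::vector<float> mul(const std::vector<float>& a, const std::vector<float>& b) {
          std::vector<float> out(a.size());
          for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
          return out;
      }

      // The functional composition transformed into one fused loop: a
      // single-pass body that a vectorizer can map to SIMD units directly.
      std::vector<float> fused_add_mul(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       const std::vector<float>& c) {
          std::vector<float> out(a.size());
          for (std::size_t i = 0; i < a.size(); ++i)
              out[i] = (a[i] + b[i]) * c[i];
          return out;
      }

      int main() {
          std::vector<float> a(8, 1.0f), b(8, 2.0f), c(8, 4.0f);
          std::vector<float> unfused = mul(add(a, b), c);
          std::vector<float> fused   = fused_add_mul(a, b, c);
          std::cout << unfused[0] << " == " << fused[0] << "\n";   // 12 == 12
          return 0;
      }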
  • Patent number: 8635606
    Abstract: Technologies are generally described for runtime optimization adjusted dynamically according to changing costs of one or more system resources. Multicore systems may encounter dynamic variations in performance associated with the relative cost of related system resources. Furthermore, multicore systems can experience dramatic variations in resource availability and costs. A dynamic registry of system resource costs can be utilized to guide dynamic optimization. The relative scarcity of each resource can be updated dynamically within the registry of system resource costs. A runtime code generating loader and optimizer may be adapted to adjust optimization according to the resource cost registry. Information regarding system resource costs can support optimization tradeoffs based on resource cost functions.
    Type: Grant
    Filed: October 13, 2009
    Date of Patent: January 21, 2014
    Assignee: Empire Technology Development LLC
    Inventor: Ezekiel John Joseph Kruglick
  • Patent number: 8627300
    Abstract: Technologies are generally described for parallel dynamic optimization using multicore processors. A runtime compiler may be adapted to generate multiple instances of executable code from a portable intermediate software module. The various instances of executable code may be generated with variations of optimization parameters such that the code instances each express different optimization attempts. A multicore processor may be leveraged to simultaneously execute some, or all, of the various code instances. Preferred optimization parameters may be determined from the executable code instances that may correctly complete in the least time, or may use the least amount of memory, or that may prove superior according to some other fitness metric. Preferred optimization parameters may be used to seed future optimization attempts. Output generated from the preferred instances may be used as soon as the first instance correctly completes.
    Type: Grant
    Filed: October 13, 2009
    Date of Patent: January 7, 2014
    Assignee: Empire Technology Development LLC
    Inventor: Ezekiel John Joseph Kruglick
  • Patent number: 8627301
    Abstract: A method for concurrent management of adaptive programs is disclosed wherein changes in a set of modifiable references are initially identified. A list of uses of the changed references is next computed using records made in structures of the references. The list is next inserted into an elimination queue. Comparison is next made of each of the uses to the other uses to determine independence or dependence thereon. Determined dependent uses are eliminated and the preceding steps are repeated for all determined independent uses until all dependencies have been eliminated.
    Type: Grant
    Filed: May 18, 2007
    Date of Patent: January 7, 2014
    Assignee: Intel Corporation
    Inventors: Matthew Hammer, Mohan Rajagopalan, Anwar Ghuloum
  • Patent number: 8620991
    Abstract: Technologies for enabling a continuation based runtime to accept or reject external stimulus and, in addition, to determine if an external stimulus may be valid for processing at a later point in execution.
    Type: Grant
    Filed: August 1, 2012
    Date of Patent: December 31, 2013
    Assignee: Microsoft Corporation
    Inventors: Kenneth David Wolf, Justin David Brown, Karthik Raman, Nathan Christopher Talbert, Edmund Samuel Victor Pinto
  • Patent number: 8621446
    Abstract: Compiling software for a hierarchical distributed processing system including providing to one or more compiling nodes software to be compiled, wherein at least a portion of the software to be compiled is to be executed by one or more other nodes; compiling, by the compiling node, the software; maintaining, by the compiling node, any compiled software to be executed on the compiling node; selecting, by the compiling node, one or more nodes in a next tier of the hierarchy of the distributed processing system in dependence upon whether any compiled software is for the selected node or the selected node's descendants; sending to the selected node only the compiled software to be executed by the selected node or selected node's descendant.
    Type: Grant
    Filed: April 29, 2010
    Date of Patent: December 31, 2013
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Patent number: 8612510
    Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values. (A word-count sketch of the framework/operator split follows this entry.)
    Type: Grant
    Filed: January 12, 2010
    Date of Patent: December 17, 2013
    Assignee: Google Inc.
    Inventors: Jeffrey Dean, Sanjay Ghemawat
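    Illustrative sketch: a hedged single-process word-count analogue of the framework/operator split described above; run_mapreduce stands in for the application-independent framework, and the map and reduce operators are the user-specified parts. Distribution and parallelization across machines are omitted.

      // Hedged sketch of the application-independent framework plus
      // user-specified map and reduce operators, in the spirit of patent 8612510.
      #include <iostream>
      #include <map>
      #include <sstream>
      #include <string>
      #include <utility>
      #include <vector>

      using KV = std::pair<std::string, int>;

      // User-specified, application-specific operators.
      std::vector<KV> word_count_map(const std::string& input) {
          std::vector<KV> out;
          std::istringstream words(input);
          for (std::string w; words >> w;) out.emplace_back(w, 1);   // intermediate values
          return out;
      }
      int word_count_reduce(const std::vector<int>& values) {
          int total = 0;
          for (int v : values) total += v;
          return total;
      }

      // Application-independent framework: applies the map operator to every
      // input, groups intermediate values by key, then reduces each key.
      template <typename MapFn, typename ReduceFn>
      std::map<std::string, int> run_mapreduce(const std::vector<std::string>& inputs,
                                               MapFn map_op, ReduceFn reduce_op) {
          std::map<std::string, std::vector<int>> grouped;
          for (const auto& input : inputs)             // in the patent, parallelized
              for (const auto& kv : map_op(input))     // across many machines
                  grouped[kv.first].push_back(kv.second);
          std::map<std::string, int> result;
          for (const auto& [key, values] : grouped) result[key] = reduce_op(values);
          return result;
      }

      int main() {
          std::vector<std::string> files{"to be or not to be", "be quick"};
          for (const auto& [word, count] :
               run_mapreduce(files, word_count_map, word_count_reduce))
              std::cout << word << ": " << count << "\n";
          return 0;
      }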