For A Parallel Or Multiprocessor System Patents (Class 717/149)
  • Patent number: 9286196
    Abstract: A method, apparatus and computer program, each for optimizing execution of a computer program is disclosed in which a topology-based control flow analysis of basic blocks of the computer program is performed and a data flow analysis of the instructions within the basic blocks is performed to determine if each instruction of said computer program is uniform or non-uniform (variant or invariant). Subsequently, when the computer program is executed, storage of a copy of a variable dependent on a uniform instruction is suppressed.
    Type: Grant
    Filed: January 8, 2015
    Date of Patent: March 15, 2016
    Assignee: ARM Limited
    Inventors: Jiangning Liu, Zhenqiang Chen
  • Patent number: 9286044
    Abstract: Hybrid parallelization strategies for machine learning programs on top of MapReduce are provided. In one embodiment, a method of and computer program product for parallel execution of machine learning programs are provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent.
    Type: Grant
    Filed: June 27, 2014
    Date of Patent: March 15, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan
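
The partitioning step in the abstract above (iterations of a parallel for statement split into tasks, each holding at least one independent iteration) can be illustrated with a minimal sketch. It is not the patented method: the names `partition_iterations` and `run_parfor` are hypothetical, a local thread pool stands in for MapReduce, and the choice of contiguous, near-equal tasks is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_iterations(num_iterations, num_tasks):
    """Split range(num_iterations) into contiguous, non-empty tasks."""
    if num_iterations <= 0:
        return []
    # Never create more tasks than iterations, so every task has at least one.
    num_tasks = max(1, min(num_tasks, num_iterations))
    base, extra = divmod(num_iterations, num_tasks)
    tasks, start = [], 0
    for t in range(num_tasks):
        size = base + (1 if t < extra else 0)
        tasks.append(range(start, start + size))
        start += size
    return tasks


def run_parfor(body, num_iterations, num_tasks):
    """Run body(i) for every i; assumes the iterations are independent."""
    tasks = partition_iterations(num_iterations, num_tasks)
    if not tasks:
        return []

    def run_task(task):
        return [body(i) for i in task]

    # A thread pool is an assumed stand-in for the distributed backend.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        chunks = list(pool.map(run_task, tasks))
    return [result for chunk in chunks for result in chunk]
```

Because `pool.map` returns results in task order, the flattened output keeps the original iteration order regardless of which task finishes first.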
  • Patent number: 9280330
    Abstract: An apparatus and method for executing code are provided. The apparatus includes a memory manager that allocates a stack in memory to store processed data that needs to be retained; a loop generator that divides program code programmed to be processed in parallel into regions based on a barrier function, transforms a region that includes the processed data that needs to be retained in the stack into a first coalescing loop, and transforms a region that uses the processed data stored in the stack into a second coalescing loop such that the transformed program code may be serially processed; and a loop changer that reverses a processing order of the second coalescing loop in comparison to a processing order of the first coalescing loop.
    Type: Grant
    Filed: March 31, 2014
    Date of Patent: March 8, 2016
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jin-Seok Lee, Seong-Gun Kim, Dong-Hoon Yoo, Seok-Joong Hwang
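
The abstract above describes two coalescing loops around a barrier, with retained data kept on a stack and the second loop running in the reverse order of the first. A minimal sketch of why the reversal matters follows; the function `run_serialized` and its callback parameters are hypothetical, and the kernel is assumed to have exactly one barrier.

```python
def run_serialized(num_work_items, before_barrier, after_barrier):
    """Serially emulate work-items that synchronize at a single barrier.

    before_barrier(wid) returns the value a work-item must retain;
    after_barrier(wid, retained) consumes it after the barrier.
    """
    stack = []

    # First coalescing loop: run the pre-barrier region for every work-item
    # and push the data that has to survive the barrier.
    for wid in range(num_work_items):
        stack.append(before_barrier(wid))

    out = [None] * num_work_items

    # Second coalescing loop: iterate in reverse so that each pop() returns
    # the value pushed by the same work-item (last in, first out).
    for wid in reversed(range(num_work_items)):
        retained = stack.pop()
        out[wid] = after_barrier(wid, retained)

    return out
```

With a forward second loop, `pop()` would hand work-item 0 the value retained by the last work-item; reversing the loop lets a plain stack serve as the storage without per-item indexing.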
  • Patent number: 9262141
    Abstract: In one embodiment, a computer-implemented method for concurrently processing at least a portion of a graphical model is provided. The method may include obtaining the graphical model; recognizing a pattern in the graphical model, the pattern suitable for concurrent processing; and employing concurrent processing using multi-thread, multi-core, or multi-processor computing device when executing the pattern in the graphical model.
    Type: Grant
    Filed: September 10, 2007
    Date of Patent: February 16, 2016
    Assignee: The MathWorks, Inc.
    Inventors: Donald Paul Orofino, II, Ramamurthy Mani, Michael James Longfritz
  • Patent number: 9250867
    Abstract: A computer-implemented method for creating a program for a multi-processor system comprising a plurality of interspersed processors and memories. A user may specify or create source code using a programming language. The source code specifies a plurality of tasks and communication of data among the plurality of tasks. However, the source code may not (and preferably is not required to) 1) explicitly specify which physical processor will execute each task and 2) explicitly specify which communication mechanism to use among the plurality of tasks. The method then creates machine language instructions based on the source code, wherein the machine language instructions are designed to execute on the plurality of processors. Creation of the machine language instructions comprises assigning tasks for execution on respective processors and selecting communication mechanisms between the processors based on location of the respective processors and required data communication to satisfy system requirements.
    Type: Grant
    Filed: May 22, 2014
    Date of Patent: February 2, 2016
    Assignee: Coherent Logix, Incorporated
    Inventors: John Mark Beardslee, Michael B. Doerr, Tommy K. Eng
  • Patent number: 9213570
    Abstract: A method for conveying a data packet received from a network to a virtual machine instantiated on a computer system coupled to the network, and a medium and system for carrying out the method, is described. In the method, a guest receive pointer queue of a component executing in the virtual machine is inspected in order to identify a location in a guest receive packet data buffer that is available to receive packet data. Data from the data packet received from the network is copied into the guest receive packet data buffer at the identified location, and a standard receive interrupt is raised in the virtual machine.
    Type: Grant
    Filed: January 29, 2015
    Date of Patent: December 15, 2015
    Assignee: VMware, Inc.
    Inventor: Michael Nelson
  • Patent number: 9207946
    Abstract: An approach is provided in which a distributed runtime environment executes a software application that includes isolated runtime constructs corresponding to an isolated runtime environment. During the execution, the distributed runtime environment identifies isolated runtime constructs included in the software application and selects distributed runtime constructs corresponding to the isolated runtime constructs. In turn, the distributed runtime environment executes the distributed runtime constructs in lieu of executing the isolated runtime constructs.
    Type: Grant
    Filed: August 27, 2013
    Date of Patent: December 8, 2015
    Assignee: International Business Machines Corporation
    Inventor: Douglas Davis
  • Patent number: 9170846
    Abstract: A distributed data-parallel execution (DDPE) system splits a computational problem into a plurality of sub-problems using a branch-and-bound algorithm, designates a synchronous stop time for a “plurality of processors” (for example, a cluster) for each round of execution, processes the search tree by recursively using a branch-and-bound algorithm in multiple rounds (without inter-processor communications), determines if further processing is required based on the processing round state data, and terminates processing on the processors when processing is completed.
    Type: Grant
    Filed: March 29, 2011
    Date of Patent: October 27, 2015
    Inventors: Daniel Delling, Mihai Budiu, Renato F. Werneck
  • Patent number: 9148669
    Abstract: A method and system for encoding a digital video signal using a plurality of parallel processors. A digital picture is received that is composed of one or more GOPs. The CPU then determines the number of GOPs that need to be encoded and divides them into groups. The number of GOPs in a group may equal the number of parallel processors in the multi-core platform available to encode. The CPU transfers in a single batch to the multi-core platform, a frame of equal rank from each GOP contained in the first group. The multi-core platform encodes the frames in parallel, rearranges the encoded byte stream chunk into normal display order sequence and stores the encoded byte stream. The process may repeat until all the GOPs in the first group have been encoded. Upon completion the multi-core platform outputs the encoded byte stream in normal display order sequence.
    Type: Grant
    Filed: March 10, 2011
    Date of Patent: September 29, 2015
    Assignee: Sony Corporation
    Inventors: Jonathan Huang, Tsaifa Yu
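
The batching scheme in the abstract above (GOPs divided into groups sized to the processor count, then one frame of equal rank from each GOP sent per batch) can be sketched as a generator. The name `frame_batches` is hypothetical, frames are represented as opaque labels, and the handling of GOPs of unequal length is an assumption not stated in the abstract.

```python
def frame_batches(gops, num_processors):
    """Yield batches holding the frame of equal rank from each GOP in a group.

    gops is a list of GOPs, each a list of frames in encoding order.
    """
    for start in range(0, len(gops), num_processors):
        # A group holds at most one GOP per available processor.
        group = gops[start:start + num_processors]
        longest = max(len(gop) for gop in group)
        for rank in range(longest):
            # Assumption: shorter GOPs simply drop out of later batches.
            yield [gop[rank] for gop in group if rank < len(gop)]
```

For three two-frame GOPs and two processors, the first group yields `["a0", "b0"]` then `["a1", "b1"]`, and the leftover GOP is processed alone.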
  • Patent number: 9146732
    Abstract: The invention provides a method and system to execute applications on a mobile device. The applications may be compiled on a remote server and sent to the mobile device before execution. The applications may be updated by the remote server without interaction by the mobile device user.
    Type: Grant
    Filed: May 9, 2014
    Date of Patent: September 29, 2015
    Assignee: MFOUNDRY, INC.
    Inventor: Rodney Aiglstorfer
  • Patent number: 9146834
    Abstract: Various arrangements for debugging code are presented. A computer system, such as a web server, may compile code into compiled code. The code may contain one or more subsections, including a first taskflow. A selection of the first taskflow may be received from a remote, developer computer system via a network. The selection of the first taskflow may indicate that the first taskflow is to be debugged. Execution of the first taskflow of the compiled code may occur by the computer system. While the computer system is executing the first taskflow of the compiled code, debugging functionality of the first taskflow may be provided to the developer computer system.
    Type: Grant
    Filed: May 1, 2014
    Date of Patent: September 29, 2015
    Assignee: Oracle International Corporation
    Inventor: John Smiljanic
  • Patent number: 9128747
    Abstract: Systems and methods for optimizing the performance of software applications are described. Embodiments include computer implemented steps for identifying at least two constituent software components for parallel execution, executing the identified software components, profiling the performance of the one or more software components at an execution time, creating an optimization model with the set of data gathered from profiling the execution of the one or more software components, and marking at least two software components for execution in parallel in a subsequent execution on the basis of the optimization model. In additional embodiments, the optimization model may be reconfigured on the basis of a cost-benefit analysis of parallelization, and the software components involved marked for sequential execution if the resource overhead associated with parallelization exceeds the corresponding resource or throughput benefit.
    Type: Grant
    Filed: June 25, 2012
    Date of Patent: September 8, 2015
    Assignee: Infosys Limited
    Inventor: Prasanna Rajaraman
  • Patent number: 9110872
    Abstract: Genetic sequence data occurring in genome sequences is represented for efficient access of the sequence information in a defined storage scheme. A described replet-sequence matrix data structure allows the compression and efficient access of sequence information. The data structure allows the dynamic change of ontology: the replet-information table can evolve by adding, updating, removing replets, and the set of replets present in the table represent the ontology at the moment. The data structure enables the sequence information to be processed in parallel, and also enables multiple views of the sequence data to exist along with replet specific information.
    Type: Grant
    Filed: October 31, 2003
    Date of Patent: August 18, 2015
    Assignee: International Business Machines Corporation
    Inventor: Jagir Razak Jainul Abdeen Hussan
  • Patent number: 9098709
    Abstract: A method of converting an original application into a cloud-hosted application includes splitting the original application into a plurality of application components along security relevant boundaries, mapping the application components to hosting infrastructure boundaries, and using a mechanism to enforce a privacy policy of a user. The mapping may include assigning each application component to a distinct virtual machine, which acts as a container for its assigned component.
    Type: Grant
    Filed: November 13, 2012
    Date of Patent: August 4, 2015
    Assignee: International Business Machines Corporation
    Inventors: Mihai Christodorescu, Dimitrios Pendarakis, Kapil K. Singh
  • Patent number: 9063826
    Abstract: A method of mapping processes to processors in a parallel computing environment where a parallel application is to be run on a cluster of nodes wherein at least one of the nodes has multiple processors sharing a common memory, the method comprising using compiler based communication analysis to map Message Passing Interface processes to processors on the nodes, whereby at least some more heavily communicating processes are mapped to processors within nodes. Other methods, apparatus, and computer readable media are also provided.
    Type: Grant
    Filed: November 28, 2011
    Date of Patent: June 23, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dibyendu Das, Nagarajan Kathiresan, Rajan Ravindran, Bhaskaran Venkatsubramaniam
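
The mapping idea in the abstract above (place the more heavily communicating MPI processes on processors within the same node) can be illustrated with a greedy sketch. This is not the patented analysis: the communication matrix is assumed to be given rather than derived by a compiler, `map_processes` is a hypothetical name, and the sketch assumes the cluster can always supply another node.

```python
def map_processes(comm, cores_per_node):
    """Return a node index per process; comm[i][j] is traffic between i and j."""
    n = len(comm)
    # Visit process pairs from heaviest to lightest communication volume.
    pairs = sorted(
        ((comm[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    node_of = {}  # process -> node
    load = []     # load[k] = processes already placed on node k

    def node_with_free_cores(needed):
        for k, used in enumerate(load):
            if cores_per_node - used >= needed:
                return k
        load.append(0)  # assumption: another node is always available
        return len(load) - 1

    def place(process, node):
        node_of[process] = node
        load[node] += 1

    for volume, i, j in pairs:
        if volume == 0:
            break
        if i not in node_of and j not in node_of and cores_per_node >= 2:
            # Both unplaced: co-locate the pair on one node.
            node = node_with_free_cores(2)
            place(i, node)
            place(j, node)
        elif (i in node_of) != (j in node_of):
            # One already placed: join it if its node still has a free core.
            placed, other = (i, j) if i in node_of else (j, i)
            if load[node_of[placed]] < cores_per_node:
                place(other, node_of[placed])

    # Anything still unplaced goes wherever a core is free.
    for p in range(n):
        if p not in node_of:
            place(p, node_with_free_cores(1))
    return [node_of[p] for p in range(n)]
```

A greedy pass like this is only a heuristic; it makes no claim of minimizing inter-node traffic overall.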
  • Patent number: 9043769
    Abstract: The present invention relates to a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop, partitioning the loop into partitions executable and mappable on physical hardware with optimal instruction level parallelism, optimizing the loop iterations and/or loop counter for ideal mapping on hardware, chaining the loop partitions generating a list representing the execution sequence of the partitions.
    Type: Grant
    Filed: December 28, 2010
    Date of Patent: May 26, 2015
    Assignee: Hyperion Core Inc.
    Inventor: Martin Vorbach
  • Patent number: 9043770
    Abstract: In one embodiment, a machine-implemented method programs a heterogeneous multi-processor computer system to run a plurality of program modules, wherein each program module is to be run on one of the processors. The system includes a plurality of processors of two or more different processor types. According to the recited method, machine-implemented offline processing is performed using a plurality of SIET tools of a scheduling information extracting toolkit (SIET) and a plurality of SBT tools of a schedule building toolkit (SBT). A program module applicability analyzer (PMAA) determines whether a first processor of a first processor type is capable of running a first program module without compiling the first program module. Machine-implemented online processing is performed using realtime data to test the scheduling software and the selected schedule solution.
    Type: Grant
    Filed: January 23, 2013
    Date of Patent: May 26, 2015
    Assignee: LSI Corporation
    Inventors: Pavel Aleksandrovich Aliseychik, Petrus Sebastiaan Adrianus Daniel Evers, Denis Vasilevich Parfenov, Alexander Nikolaevich Filippov, Denis Vladimirovich Zaytsev
  • Patent number: 9038040
    Abstract: Partitioning programs between a general purpose core and one or more accelerators is provided. A compiler front end is provided for converting a program source code in a corresponding high level programming language into an intermediate code representation. This intermediate code representation is provided to an interprocedural optimizer which determines which core processor or accelerator each portion of the program should execute on and partitions the program into sub-programs based on this set of decisions. The interprocedural optimizer may further add instructions to the partitions to coordinate and synchronize the sub-programs as required. Each sub-program is compiled on an appropriate compiler backend for the instruction set architecture of the particular core processor or accelerator selected to execute the sub-program. The compiled sub-programs are then linked to thereby generate an executable program.
    Type: Grant
    Filed: January 25, 2006
    Date of Patent: May 19, 2015
    Assignee: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel A. Prener
  • Patent number: 9036162
    Abstract: An image sensing and printing digital camera device includes a housing defining a slot for receiving a printed instruction card having printed thereon an array of dots representing a programming script, the housing further storing therein a roll of print media; an area image sensor for sensing an image and generating pixel data representing the image; a linear image sensor for scanning the array of dots on the card and converting the array of dots into a data signal; a microcontroller provided in the housing, the microcontroller for decoding the data signal into the programming script and applying the programming script on the pixel data; and a printing mechanism for printing the pixel data, having applied thereto the programming script, on the roll of print media. The microcontroller integrates on a single chip a VLIW processor, a printhead interface, and an output buffer effecting communication between the VLIW processor and the printhead interface.
    Type: Grant
    Filed: July 3, 2012
    Date of Patent: May 19, 2015
    Assignee: Google Inc.
    Inventor: Kia Silverbrook
  • Patent number: 9021426
    Abstract: According to one embodiment of the present disclosure, hardware initialization code and error action information are retrieved from separate storage areas. The hardware initialization code includes code that initializes a device, and also includes placeholders corresponding to actions that are performed when the device fails initialization. Likewise, the error action information describes the actions that are performed when the device fails initialization. The error action information is converted into macros that include lines of code. As such, the error action placeholders are matched to the macros and, in turn, each of the error action placeholders is replaced with the lines of code corresponding to the matched macros.
    Type: Grant
    Filed: December 4, 2012
    Date of Patent: April 28, 2015
    Assignee: International Business Machines Corporation
    Inventors: Daniel M. Crowell, John Farrugia, Michael J. Jones, David Dean Sanner
  • Publication number: 20150113514
    Abstract: Methods are provided for source-to-source transformations for graph processing on many-core platforms. A method includes receiving a graph application including one graph, expressed by a graph application programming interface configured for defining and manipulating graphs. The method further includes transforming, by a source-to-source compiler, the graph application into a plurality of parallel code variants. Each of the plurality of parallel code variants is specifically configured for parallel execution by a target one of a plurality of different many-core processors. The method also includes selecting and tuning, by a runtime component, a particular one of the parallel code variants for the parallel execution responsive to graph application characteristics, graph data, and an underlying code execution platform of the plurality of different many-core processors.
    Type: Application
    Filed: October 9, 2014
    Publication date: April 23, 2015
    Inventors: Srimat Chakradhar, Michela Becchi, Da Li
  • Patent number: 9015683
    Abstract: Provided is a method of transforming program code written such that a plurality of work-items are allocated respectively to and concurrently executed on a plurality of processing elements included in a computing unit. A program code translator may identify, in the program code, two or more code regions, which are to be enclosed by work-item coalescing loops (WCLs), based on a synchronization barrier function contained in the program code, such that the work-items are serially executable on a smaller number of processing elements than a number of the processing elements, and may enclose the identified code regions with the WCLs, respectively.
    Type: Grant
    Filed: December 23, 2010
    Date of Patent: April 21, 2015
    Assignees: Samsung Electronics Co., Ltd., SNU R&DB Foundation
    Inventors: Seung-Mo Cho, Jong-Deok Choi, Jaejin Lee
  • Patent number: 9015688
    Abstract: Methods and apparatuses associated with vectorization of scalar callee functions are disclosed herein. In various embodiments, compiling a first program may include generating one or more vectorized versions of a scalar callee function of the first program, based at least in part on vectorization annotations of the first program. Additionally, compiling may include generating one or more vectorized function signatures respectively associated with the one or more vectorized versions of the scalar callee function. The one or more vectorized function signatures may enable an appropriate vectorized version of the scalar callee function to be matched and invoked for a generic call from a caller function of a second program to a vectorized version of the scalar callee function.
    Type: Grant
    Filed: April 1, 2011
    Date of Patent: April 21, 2015
    Assignee: Intel Corporation
    Inventors: Xinmin Tian, Sergey Stanislavoich Kozhukhov, Sergey Victorovich Preis, Robert Yehuda Geva, Konstantin Anatolyevich Pyjov, Hideki Sato, Milind Baburao Girkar, Aleksei Gurievich Kasov, Nikolay Vladimirovich Panchenko
  • Patent number: 9009726
    Abstract: A “Concurrent Sharing Model” provides a programming model based on revisions and isolation types for concurrent revisions of states, data, or variables shared between two or more concurrent tasks or programs. This model enables revisions of shared states, data, or variables to maintain determinacy despite nondeterministic scheduling between concurrent tasks or programs. More specifically, the Concurrent Sharing Model provides various techniques wherein shared states, data, or variables are conceptually replicated on forks, and only copied or written if necessary, then deterministically merged on joins such that concurrent tasks or programs can work with independent local copies of the shared states, data, or variables while ensuring automated conflict resolution.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: April 14, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Sebastian Burckhardt, Daniel Johannes Pieter Leijen, Alexandro Baldassin
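
The fork and join behaviour in the abstract above (state conceptually replicated on fork, merged deterministically on join) can be sketched as follows. All names here (`Revision`, `fork`, `join`, the `merge` parameter) are hypothetical, the snapshot is an eager shallow copy rather than the copy-only-if-necessary scheme the abstract mentions, and the rule that the joined task's write wins by default is an assumption of this sketch.

```python
import threading


class Revision:
    """A task-local view of shared state taken at fork time."""

    def __init__(self, base):
        self.base = dict(base)  # eager snapshot; a real system would copy lazily
        self.writes = {}        # only the keys this task actually wrote

    def read(self, key):
        return self.writes.get(key, self.base[key])

    def write(self, key, value):
        self.writes[key] = value


def fork(parent_state, task):
    """Start task(revision) on its own isolated view of parent_state."""
    revision = Revision(parent_state)
    thread = threading.Thread(target=task, args=(revision,))
    thread.start()
    return revision, thread


def join(parent_state, revision, thread, merge=None):
    """Wait for the task, then fold its writes into parent_state.

    merge maps a key to f(main, joined, original); other keys take the
    joined task's value (assumed default rule).
    """
    thread.join()
    for key, value in revision.writes.items():
        if merge and key in merge:
            parent_state[key] = merge[key](parent_state[key], value, revision.base[key])
        else:
            parent_state[key] = value
```

Since the child only ever touches its own `Revision` and the merge runs at the join point, the final state does not depend on how the threads were scheduled.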
  • Patent number: 9009660
    Abstract: Programming in a multiprocessor environment includes accepting a program specification that defines a plurality of processing modules and one or more channels for sending data between ports of the modules, mapping each of the processing modules to run on a set of one or more processing engines of a network of interconnected processing engines, and for at least some of the channels, assigning one or more elements of one or more processing engines in the network to the channel for sending data between respective processing modules.
    Type: Grant
    Filed: November 29, 2006
    Date of Patent: April 14, 2015
    Assignee: Tilera Corporation
    Inventors: Patrick Robert Griffin, Walter Lee, Anant Agarwal, David Wentzlaff
  • Publication number: 20150100948
    Abstract: An approach to generating irreducible modules. The approach includes a method that includes receiving, by at least one computing device, data associated with a specification. The method includes defining, by the at least one computing device, a pattern on the received data. The pattern reduces a set of rules into a single condition. The method includes generating, by the at least one computing device, an irreducible module based on the pattern. The irreducible module has one output dependent variable and is associated with a data flow application.
    Type: Application
    Filed: October 8, 2013
    Publication date: April 9, 2015
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: William J. Lewis
  • Publication number: 20150100949
    Abstract: A method for processing computer program code to enable different parts of the computer program code to be executed by different processing elements of a plurality of communicating processing elements. The method comprises identifying at least one first part of the computer program code, which is to be executed by a particular one of said processing elements. The method further comprises identifying at least one further part of the computer code which is related to the at least one first part of the computer code. The at least one first part of the computer program code and the at least one further part of the computer program code are caused to be executed by the particular one of said processing elements.
    Type: Application
    Filed: December 15, 2014
    Publication date: April 9, 2015
    Applicant: CODEPLAY SOFTWARE LIMITED
    Inventors: Jens-Uwe Dolinsky, Andrew Richards, Colin Riley
  • Patent number: 9003383
    Abstract: The subject system provides the ability to parallelize pre-existing serial code by importing and encapsulating all of the serial code into an object-oriented flowchart language utilizing an analytic engine so that the imported code can be efficiently executed taking advantage of the partially ordered transitive flowchart system. The importation examines the serial code to ascertain what elements may be processed under an atomic time to instantiate them as either Action or Test objects, whereas statements which require more than atomic time are instantiated as Task objects, with the Action, Test and Task objects being processable by separate processors to establish parallel processing, or by the multitasking afforded by the partially ordered transitive flowchart system.
    Type: Grant
    Filed: July 5, 2012
    Date of Patent: April 7, 2015
    Assignee: You Know Solutions, LLC
    Inventors: Ronald J. Lavallee, Thomas C. Peacock
  • Patent number: 8997071
    Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: March 31, 2015
    Assignee: International Business Machines Corporation
    Inventors: Tong Chen, John K. P. O'Brien, Zehra N. Sura
  • Patent number: 8990791
    Abstract: Partitioned global address space (PGAS) programming language source code is retrieved by an executed PGAS compiler. At least one shared memory array access indexed by an affine expression that includes a distinct thread identifier that is constant and different for each of a group of program execution threads targeted to execute the PGAS source code is identified within the PGAS source code. It is determined whether the at least one shared memory array access results in a local shared memory access by all of the group of program execution threads for all references to the at least one shared memory array access during execution of a compiled executable of the PGAS source code. A direct memory access executable code is generated for each shared memory array access determined to result in the local shared memory access by all of the group of program execution threads.
    Type: Grant
    Filed: July 29, 2011
    Date of Patent: March 24, 2015
    Assignee: International Business Machines Corporation
    Inventors: Salem Derisavi, Ettore Tiotto
  • Patent number: 8972959
    Abstract: A method of converting a program code of a program running in multi-thread to a program code which causes fewer lock collisions. The method includes reading the program code into a memory and searching the program code for a first conditional statement making a branch to a path, which is in a synchronized block and has no side effect on the synchronized block; duplicating the path having no side effect to which the branch is made by the searched first conditional statement into the outside of the synchronized block; and adding a second conditional statement into the program code in response to the duplication, wherein the second conditional statement is a conditional statement making a branch to the duplicated path having no side effect. Also provided is a system and an article of manufacture which causes a computer to carry out the steps of the above method.
    Type: Grant
    Filed: April 27, 2010
    Date of Patent: March 3, 2015
    Assignee: International Business Machines Corporation
    Inventor: Kazuaki Ishizaki
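
The transformation in the abstract above (duplicate a side-effect-free path out of a synchronized block and guard it with a second conditional) can be illustrated by hand on a small memoizing class. The class `Memo` and its method names are hypothetical, and the sketch assumes cache entries are never removed or changed once written, which is what makes the unlocked read safe here.

```python
import threading


class Memo:
    def __init__(self, compute):
        self._compute = compute
        self._cache = {}
        self._lock = threading.Lock()
        self.lock_acquisitions = 0  # instrumentation for the example only

    def get_original(self, key):
        """Before: the side-effect-free hit path sits inside the lock."""
        with self._lock:
            self.lock_acquisitions += 1
            if key in self._cache:        # first conditional
                return self._cache[key]   # path with no side effect
            value = self._compute(key)
            self._cache[key] = value
            return value

    def get_transformed(self, key):
        """After: the hit path is duplicated outside the lock."""
        missing = object()
        hit = self._cache.get(key, missing)
        if hit is not missing:            # second conditional, added outside
            return hit                    # duplicated path, no lock taken
        return self.get_original(key)     # fall back to the synchronized block
```

Repeated lookups of a cached key no longer contend for the lock; only misses enter the synchronized block, where the original check still guards against a concurrent fill.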
  • Publication number: 20150058832
    Abstract: Systems and methods for the parallelization of software applications are described. In some embodiments, a compiler may automatically identify within source code dependencies of a function called by another function. A persistent database may be generated to store identified dependencies. When calls to the function are encountered within the source code, the persistent database may be checked, and a parallelized implementation of the function may be employed dependent upon the dependency indicated in the persistent database.
    Type: Application
    Filed: November 4, 2014
    Publication date: February 26, 2015
    Inventor: Jeffry E. Gonion
  • Patent number: 8966459
    Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.
    Type: Grant
    Filed: February 20, 2014
    Date of Patent: February 24, 2015
    Assignee: Altera Corporation
    Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
  • Patent number: 8966461
    Abstract: A medium, method, and apparatus are disclosed for eliding superfluous function invocations in a vector-processing environment. A compiler receives program code comprising a width-contingent invocation of a function. The compiler creates a width-specific executable version of the program code by determining a vector width of a target computer system and omitting the function from the width-specific executable if the vector width meets one or more criteria. For example, the compiler may omit the function call if the vector width is greater than a minimum size.
    Type: Grant
    Filed: September 29, 2011
    Date of Patent: February 24, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benedict R. Gaster, Lee W. Howes, Mark D. Hummel
  • Patent number: 8959499
    Abstract: A data parallel pipeline may specify multiple parallel data objects that contain multiple elements and multiple parallel operations that operate on the parallel data objects. Based on the data parallel pipeline, a dataflow graph of deferred parallel data objects and deferred parallel operations corresponding to the data parallel pipeline may be generated and one or more graph transformations may be applied to the dataflow graph to generate a revised dataflow graph that includes one or more of the deferred parallel data objects and deferred, combined parallel data operations. The deferred, combined parallel operations may be executed to produce materialized parallel data objects corresponding to the deferred parallel data objects.
    Type: Grant
    Filed: September 20, 2013
    Date of Patent: February 17, 2015
    Assignee: Google Inc.
    Inventors: Craig D. Chambers, Ashish Raniwala, Frances J. Perry, Stephen R. Adams, Robert R. Henry, Robert Bradshaw, Nathan Weizenbaum
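The deferred-evaluation idea in the abstract above can be illustrated with a minimal sketch. It assumes a linear chain of element-wise operations rather than a general dataflow graph, and the names `Deferred`, `parallel_do`, `fused`, and `run` are hypothetical, chosen for illustration only; they are not taken from the patent or from any library.

```python
from concurrent.futures import ThreadPoolExecutor


class Deferred:
    """A deferred parallel collection: a source plus pending element-wise ops.

    Hypothetical illustration only; it models a linear chain, not a full graph.
    """

    def __init__(self, source, ops=()):
        self.source = source
        self.ops = tuple(ops)

    def parallel_do(self, fn):
        # Record the operation instead of executing it (deferred evaluation).
        return Deferred(self.source, self.ops + (fn,))

    def fused(self):
        # "Graph transformation": combine all pending ops into one function,
        # so the data is traversed once instead of once per operation.
        ops = self.ops

        def combined(x):
            for fn in ops:
                x = fn(x)
            return x

        return combined

    def run(self, workers=4):
        # Execute the combined operation to materialize the result.
        combined = self.fused()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(combined, self.source))


pipeline = Deferred([1, 2, 3]).parallel_do(lambda x: x + 1).parallel_do(lambda x: x * 2)
result = pipeline.run()
```

Nothing is computed until `run` is called; at that point the two recorded operations are applied as a single combined function per element.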
  • Patent number: 8959498
    Abstract: A parallelization method, system and program. A program expressed by a block diagram or the like is divided into strands, and calculation time is balanced among the strands. The functional blocks are divided into strands, and the strand with the maximum calculation time in the strand set is found. One or more movable blocks in that strand are then found. The next step is to obtain the calculation time of each strand after a movable block is moved to the strand in the input or output direction, according to the block's property, and to move the block to the strand that most reduces the calculation time of the strand that had the maximum calculation time before the move. This process repeats until the calculation time is no longer reduced. The strands are then transformed into source code, which is compiled and assigned to separate cores or processors for execution.
    Type: Grant
    Filed: February 22, 2011
    Date of Patent: February 17, 2015
    Assignee: International Business Machines Corporation
    Inventors: Hideaki Komatsu, Takeo Yoshizawa
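The balancing loop described in the abstract above might be sketched roughly as follows. This is a simplified illustration, not the patented method: it assumes each block is just a numeric cost, that every block in the heaviest strand is movable, and that a block may move to any other strand (the abstract restricts moves to the input or output direction). The function name `balance` is hypothetical.

```python
def balance(strands):
    """Greedily move blocks out of the heaviest strand until no move helps.

    `strands` is a list of lists of block costs (an assumed representation).
    """
    strands = [list(s) for s in strands]  # work on a copy
    while True:
        loads = [sum(s) for s in strands]
        heavy = max(range(len(strands)), key=lambda i: loads[i])
        current_max = loads[heavy]
        best = None  # (resulting maximum load, block index, target strand)
        for b, cost in enumerate(strands[heavy]):
            for target in range(len(strands)):
                if target == heavy:
                    continue
                trial = list(loads)
                trial[heavy] -= cost
                trial[target] += cost
                new_max = max(trial)
                # Keep only moves that strictly reduce the maximum load.
                if new_max < current_max and (best is None or new_max < best[0]):
                    best = (new_max, b, target)
        if best is None:
            return strands  # no move reduces the maximum any further
        _, b, target = best
        strands[target].append(strands[heavy].pop(b))


balanced = balance([[5, 3, 2], [1]])
```

Because every accepted move strictly lowers the maximum load, the loop terminates. Like any greedy scheme, it can stop at a local optimum.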
  • Patent number: 8959497
    Abstract: One embodiment of the present invention sets forth a technique for partitioning a predecessor thread program into sub-programs and dynamically spawning a thread grid of the sub-programs based on the outcome of a conditional statement in the predecessor thread program. The programming instructions for the predecessor thread program are analyzed to assess the benefit of partitioning the thread program at a conditional statement into sub-programs. If the predecessor thread program is partitioned, then each branch of the conditional statement may be used to form a separate sub-program. Predicate tables are populated at the predecessor thread program run-time to establish which possible instances of the thread sub-programs should be spawned in subsequent execution phases.
    Type: Grant
    Filed: August 29, 2008
    Date of Patent: February 17, 2015
    Assignee: NVIDIA Corporation
    Inventors: John A. Stratton, David Luebke
  • Patent number: 8954941
    Abstract: Method of generating respective instruction compaction schemes for subsets of instructions to be processed by a programmable processor, comprising the steps of a) receiving at least one input code sample representative for software to be executed on the programmable processor, the input code comprising a plurality of instructions defining a first set of instructions (S1), b) initializing a set of removed instructions as empty (S3), c) determining the most compact representation of the first set of instructions (S4) d) comparing the size of said most compact representation with a threshold value (S5), e) carrying out steps e1 to e3 if the size is larger than said threshold value, e1) determining which instruction of the first set of instructions has a highest coding cost (S6), e2) removing said instruction having the highest coding cost from the first set of instructions and (S7), e3) adding said instruction to the set of removed instructions (S8), f) repeating steps b-f, wherein the first set of instructions
    Type: Grant
    Filed: September 3, 2010
    Date of Patent: February 10, 2015
    Assignee: Intel Corporation
    Inventors: Hendrik Tjeerd Joannes Zwartenkot, Alexander Augusteijn, Yuanging Guo, Jürgen Von Oerthel, Jeroen Anton Johan Leijten, Erwan Yann Maurice Le Thenaff
  • Patent number: 8949786
    Abstract: A method and system for parallelization of sequential computer program code are described. In one embodiment, an automatic parallelization system includes a syntactic analyzer to analyze the structure of the sequential computer program code to identify the positions to insert SPI to the sequential computer code; a profiler for profiling the sequential computer program code by preparing a call graph to determine the dependency of each line of the sequential computer program code and the time required for the execution of each function of the sequential computer program code; an analyzer to determine parallelizability of the sequential computer program code from the information obtained by analyzing and profiling of the sequential computer program code; and a code generator to insert SPI to the sequential computer program code upon determination of parallelizability to obtain parallel computer program code, which is further outputted to a parallel computing environment for execution and the method thereof.
    Type: Grant
    Filed: December 1, 2009
    Date of Patent: February 3, 2015
    Assignee: KPIT Technologies Limited
    Inventors: Vinay G. Vaidya, Ranadive Priti, Sah Sudhakar
  • Patent number: 8949852
    Abstract: Some embodiments provide a system that increases parallelization in a computer program. During operation, the system obtains a binary associative operator and an ordered set of elements associated with a prefix operation in the computer program. Next, the system divides the elements into multiple sets of contiguous iterations based on a number of processors used to execute the computer program. The system then performs, in parallel on the processors, a set of local reductions on the contiguous iterations using the binary associative operator. Afterwards, the system calculates a set of boundary prefixes between the contiguous iterations using the local reductions. Finally, the system applies, in parallel on the processors, the boundary prefixes to the contiguous iterations using the binary associative operator to obtain a set of prefixes for the prefix operation.
    Type: Grant
    Filed: June 29, 2009
    Date of Patent: February 3, 2015
    Assignee: Oracle America, Inc.
    Inventor: Robert E. Cypher
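The three phases named in the abstract above (local reductions, boundary prefixes, final application) correspond to a well-known chunked prefix-scan scheme, which can be sketched as below. This is an illustrative sketch only, not the patented implementation: the helper names `split_contiguous` and `parallel_prefix` are hypothetical, and a thread pool stands in for the processors (in CPython, threads show the structure but do not speed up pure-Python work). It requires Python 3.8+ for the `initial` argument of `itertools.accumulate`.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import accumulate
import operator


def split_contiguous(items, parts):
    """Split items into at most `parts` contiguous, non-empty chunks."""
    n = len(items)
    parts = max(1, min(parts, n))
    size, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_prefix(items, op, workers=4):
    """Inclusive prefix scan of `items` under the associative operator `op`."""
    if not items:
        return []
    chunks = split_contiguous(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: reduce each contiguous chunk independently, in parallel.
        totals = list(pool.map(lambda c: reduce(op, c), chunks))
        # Phase 2: boundaries[i] combines everything before chunk i + 1.
        boundaries = list(accumulate(totals[:-1], op))

        # Phase 3: scan each chunk in parallel, seeded with its boundary prefix.
        def scan_chunk(indexed):
            idx, chunk = indexed
            if idx == 0:
                return list(accumulate(chunk, op))
            seeded = accumulate(chunk, op, initial=boundaries[idx - 1])
            return list(seeded)[1:]  # drop the seed itself

        scanned = list(pool.map(scan_chunk, enumerate(chunks)))
    return [x for part in scanned for x in part]


sums = parallel_prefix([1, 2, 3, 4, 5, 6, 7], operator.add, workers=3)
```

The operator only needs to be associative, not commutative: the boundary prefix is always applied on the left, so the scheme also works for string concatenation, for example.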
  • Patent number: 8949808
    Abstract: Systems and methods for the vectorization of software applications are described. In some embodiments, a compiler may automatically generate both scalar and vector versions of a function from a single source code description. A vector interface may be exposed in a persistent dependency database that is associated with the function. This may allow a compiler to make vector function calls from within vectorized loops, rather than making multiple serialized scalar function calls from within a vectorized loop. This may in turn facilitate the vectorization of hierarchical code, which may improve application performance when vector execution resources are available.
    Type: Grant
    Filed: September 23, 2010
    Date of Patent: February 3, 2015
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8949806
    Abstract: A system comprises a plurality of computation units interconnected by an interconnection network.
    Type: Grant
    Filed: August 17, 2012
    Date of Patent: February 3, 2015
    Assignee: Tilera Corporation
    Inventors: Walter Lee, Robert A. Gottlieb, Vineet Soni, Anant Agarwal, Richard Schooler
  • Patent number: 8949807
    Abstract: A device receives, via a technical computing environment, a program that includes a parallel construct and a command to be executed by graphical processing units, and analyzes the program. The device also creates, based on the parallel construct and the analysis, one or more instances of the command to be executed in parallel by the graphical processing units, and transforms, via the technical computing environment, the one or more command instances into one or more command instances that are executable by the graphical processing units. The device further allocates the one or more transformed command instances to the graphical processing units for parallel execution, and receives, from the graphical processing units, one or more results associated with parallel execution of the one or more transformed command instances by the graphical processing units.
    Type: Grant
    Filed: September 30, 2013
    Date of Patent: February 3, 2015
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N. Stefansson, Edric Ellis
  • Patent number: 8949805
    Abstract: A method for processing computer program code to enable different parts of the computer program code to be executed by different processing elements of a plurality of communicating processing elements. The method comprises identifying at least one first part of the computer program code, which is to be executed by a particular one of said processing elements. The method further comprises identifying at least one further part of the computer code which is related to the at least one first part of the computer code. The at least one first part of the computer program code and the at least one further part of the computer program code are caused to be executed by the particular one of said processing elements.
    Type: Grant
    Filed: June 11, 2010
    Date of Patent: February 3, 2015
    Assignee: Codeplay Software Limited
    Inventors: Jens-Uwe Dolinsky, Andrew Richards, Colin Riley
  • Patent number: 8935682
    Abstract: A device initiates a technical computing environment (TCE), and receives, via the TCE, a program command that permits the TCE to access a graphical processing unit that is remote to the device, where the program command permits the TCE to seamlessly transfer data to the remote GPU. The device transforms, via the TCE, the program command into a program command that is executable by the remote GPU, and provides the transformed program command to the remote GPU for execution. The device also receives, from the remote GPU, one or more results associated with execution of the transformed program command by the remote GPU, and utilizes the one or more results via the TCE.
    Type: Grant
    Filed: September 6, 2013
    Date of Patent: January 13, 2015
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N. Stefansson, Edric Ellis, Jocelyn Luke Martin
  • Patent number: 8935681
    Abstract: A method comprising encrypting an original plain text file and making it available to a user as a protected file, and issuing to said user a user program and a user license to enable said user to decrypt the protected file and view an image of the original file while preventing the image of the original file from being copied to any file, other than as a further protected file. The image is preferably stored in a memory not backed up to the computer swap file. Preferably, the user program comprises an editor program and the user saves editorial changes to the original image in an encrypted difference file, separate from the original file. Both files are then used to re-create the edited image using the editor program and user license. The user program may comprise any computer tool including compilers.
    Type: Grant
    Filed: September 29, 2005
    Date of Patent: January 13, 2015
    Assignees: MStar Semiconductor, Inc., MStar Software R&D (Shenzhen) Ltd., MStar France SAS, MStar Semiconductor, Inc.
    Inventor: John David Mersh
  • Patent number: 8930926
    Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
    Type: Grant
    Filed: April 16, 2010
    Date of Patent: January 6, 2015
    Assignee: Reservoir Labs, Inc.
    Inventors: Cedric Bastoul, Richard A. Lethin, Allen K. Leung, Benoit J. Meister, Peter Szilagyi, Nicolas T. Vasilache, David E. Wohlford
  • Patent number: 8930888
    Abstract: Modelling a serialized object stream can include receiving a stream of bytes corresponding to the serialized form of a first object, creating an empty initial model for containing a generic object and a generic class, and, upon detection of a class from the stream, constructing a corresponding generic class object in the model using a processor. Upon detection of a new object from the stream, a corresponding generic object in the model can be constructed. Further objects and classes in the model that are associated with the generic objects and classes can be referenced.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: January 6, 2015
    Assignee: International Business Machines Corporation
    Inventor: Julien Canches
  • Patent number: 8924929
    Abstract: A system and methods are disclosed for executing a technical computing program in parallel in multiple execution environments. A program is invoked for execution in a first execution environment and from the invocation the program is executed in the first execution environment and one or more additional execution environments to provide for parallel execution of the program. New constructs in a technical computing programming language are disclosed for parallel programming of a technical computing program for execution in multiple execution environments. Also disclosed are a system and method for changing the mode of operation of an execution environment from a sequential mode to a parallel mode of operation and vice versa.
    Type: Grant
    Filed: August 28, 2009
    Date of Patent: December 30, 2014
    Assignee: The MathWorks, Inc.
    Inventor: Cleve Moler
  • Patent number: 8924946
    Abstract: Systems and methods for replacing inferior code segments with optimal code segments. Systems and methods for making such replacements for programming languages using Message Passing Interface (MPI) are provided. For example, at the compiler level, point-to-point code segments may be identified and replaced with all-to-all code segments. Programming code may include X10, Chapel and other programming languages that support parallel for loops.
    Type: Grant
    Filed: November 24, 2010
    Date of Patent: December 30, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ganesh Bikshandi, Krishna Nandivada Venkata, Igor Peshansky, Vijay Anand Saraswat