For A Parallel Or Multiprocessor System Patents (Class 717/149)
-
Patent number: 9286196Abstract: A method, apparatus and computer program, each for optimizing execution of a computer program is disclosed in which a topology-based control flow analysis of basic blocks of the computer program is performed and a data flow analysis of the instructions within the basic blocks is performed to determine if each instruction of said computer program is uniform or non-uniform (variant or invariant). Subsequently, when the computer program is executed, storage of a copy of a variable dependent on a uniform instruction is suppressed.Type: GrantFiled: January 8, 2015Date of Patent: March 15, 2016Assignee: ARM LimitedInventors: Jiangning Liu, Zhenqiang Chen
-
Patent number: 9286044Abstract: Hybrid parallelization strategies for machine learning programs on top of MapReduce are provided. In one embodiment, a method of and computer program product for parallel execution of machine learning programs are provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent.Type: GrantFiled: June 27, 2014Date of Patent: March 15, 2016Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan
-
Patent number: 9280330Abstract: An apparatus and method for executing code are provided. The apparatus includes a memory manager that allocates a stack in memory to store processed data that needs to be retained; a loop generator that divides program code programmed to be processed in parallel into regions based on a barrier function, transforms a region that includes the processed data that needs to be retained in the stack into a first coalescing loop, and transforms a region that uses the processed data stored in the stack into a second coalescing loop such that the transformed program code may be serially processed; and a loop changer that reverses a processing order of the second coalescing loop in comparison to a processing order of the first coalescing loop.Type: GrantFiled: March 31, 2014Date of Patent: March 8, 2016Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Jin-Seok Lee, Seong-Gun Kim, Dong-Hoon Yoo, Seok-Joong Hwang
-
Patent number: 9262141Abstract: In one embodiment, a computer-implemented method for concurrently processing at least a portion of a graphical model is provided. The method may include obtaining the graphical model; recognizing a pattern in the graphical model, the pattern suitable for concurrent processing; and employing concurrent processing using multi-thread, multi-core, or multi-processor computing device when executing the pattern in the graphical model.Type: GrantFiled: September 10, 2007Date of Patent: February 16, 2016Assignee: The MathWorks, Inc.Inventors: Donald Paul Orofino, II, Ramamurthy Mani, Michael James Longfritz
-
Patent number: 9250867Abstract: A computer-implemented method for creating a program for a multi-processor system comprising a plurality of interspersed processors and memories. A user may specify or create source code using a programming language. The source code specifies a plurality of tasks and communication of data among the plurality of tasks. However, the source code may not (and preferably is not required to) 1) explicitly specify which physical processor will execute each task and 2) explicitly specify which communication mechanism to use among the plurality of tasks. The method then creates machine language instructions based on the source code, wherein the machine language instructions are designed to execute on the plurality of processors. Creation of the machine language instructions comprises assigning tasks for execution on respective processors and selecting communication mechanisms between the processors based on location of the respective processors and required data communication to satisfy system requirements.Type: GrantFiled: May 22, 2014Date of Patent: February 2, 2016Assignee: Coherent Logix, IncorporatedInventors: John Mark Beardslee, Michael B. Doerr, Tommy K. Eng
-
Patent number: 9213570Abstract: A method for conveying a data packet received from a network to a virtual machine instantiated on a computer system coupled to the network, and a medium and system for carrying out the method, is described. In the method, a guest receive pointer queue of a component executing in the virtual machine is inspected in order to identify a location in a guest receive packet data buffer that is available to receive packet data. Data from the data packet received from the network is copied into the guest receive packet data buffer at the identified location, and a standard receive interrupt is raised in the virtual machine.Type: GrantFiled: January 29, 2015Date of Patent: December 15, 2015Assignee: VMware, Inc.Inventor: Michael Nelson
-
Patent number: 9207946Abstract: An approach is provided in which a distributed runtime environment executes a software application that includes isolated runtime constructs corresponding to an isolated runtime environment. During the execution, the distributed runtime environment identifies isolated runtime constructs included in the software application and selects distributed runtime constructs corresponding to the isolated runtime constructs. In turn, the distributed runtime environment executes the distributed runtime constructs in lieu of executing the isolated runtime constructs.Type: GrantFiled: August 27, 2013Date of Patent: December 8, 2015Assignee: International Business Machines CorporationInventor: Douglas Davis
-
Patent number: 9170846Abstract: A distributed data-parallel execution (DDPE) system splits a computational problem into a plurality of sub-problems using a branch-and-bound algorithm, designates a synchronous stop time for a “plurality of processors” (for example, a cluster) for each round of execution, processes the search tree by recursively using a branch-and-bound algorithm in multiple rounds (without inter-processor communications), determines if further processing is required based on the processing round state data, and terminates processing on the processors when processing is completed.Type: GrantFiled: March 29, 2011Date of Patent: October 27, 2015Inventors: Daniel Delling, Mihai Budiu, Renato F. Werneck
-
Patent number: 9148669Abstract: A method and system for encoding a digital video signal using a plurality of parallel processors. A digital picture is received that is composed of one or more GOPs. The CPU then determines the number of GOPs that need to be encoded and divides them into groups. The number of GOPs in a group may equal the number of parallel processors in the multi-core platform available to encode. The CPU transfers in a single batch to the multi-core platform, a frame of equal rank from each GOP contained in the first group. The multi-core platform encodes the frames in parallel, rearranges the encoded byte stream chunk into normal display order sequence and stores the encoded byte stream. The process may repeat until all the GOPs in the first group have been encoded. Upon completion the multi-core platform outputs the encoded byte stream in normal display order sequence.Type: GrantFiled: March 10, 2011Date of Patent: September 29, 2015Assignee: Sony CorporationInventors: Jonathan Huang, Tsaifa Yu
-
Patent number: 9146732Abstract: The invention provides a method and system to execute applications on a mobile device. The applications may be compiled on a remote server and sent to the mobile device before execution. The applications may be updated by the remote server without interaction by the mobile device user.Type: GrantFiled: May 9, 2014Date of Patent: September 29, 2015Assignee: MFOUNDRY, INC.Inventor: Rodney Aiglstorfer
-
Patent number: 9146834Abstract: Various arrangements for debugging code are presented. A computer system, such as a web server, may compile code into compiled code. The code may contain one or more subsections, include a first taskflow. A selection of the first taskflow may be received from a remote, developer computer system via a network. The selection of the first taskflow may indicate that the first taskflow is to be debugged. Execution of the first taskflow of the compiled code may occur by the computer system. While the computer system is executing the first taskflow of the compiled code, debugging functionality of the first taskflow may be provided to the developer computer system.Type: GrantFiled: May 1, 2014Date of Patent: September 29, 2015Assignee: Oracle International CorporationInventor: John Smiljanic
-
Patent number: 9128747Abstract: Systems and method for optimizing the performance of software applications are described. Embodiments include computer implemented steps for identifying at least two constituent software components for parallel execution, executing the identified software components, profiling the performance of the one or more software components at an execution time, creating an optimization model with the set of data gathered from profiling the execution of the one or more software components, and marking at least two software components for execution in parallel in a subsequent execution on the basis of the optimization model. In additional embodiments, the optimization model may be reconfigured on the basis of a cost-benefit analysis of parallelization, and the software components involved marked for sequential execution if the resource overhead associated with parallelization exceeds the corresponding resource or throughput benefit.Type: GrantFiled: June 25, 2012Date of Patent: September 8, 2015Assignee: Infosys LimitedInventor: Prasanna Rajaraman
-
Patent number: 9110872Abstract: Genetic sequence data occurring in genome sequences is represented for efficient access of the sequence information in a defined storage scheme. A described replet-sequence matrix data structure allows the compression and efficient access of sequence information. The data structure allows the dynamic change of ontology: the replet-information table can evolve by adding, updating, removing replets, and the set of replets present in the table represent the ontology at the moment. The data structure enables the sequence information to be processed in parallel, and also enables multiple views of the sequence data to exist along with replet specific information.Type: GrantFiled: October 31, 2003Date of Patent: August 18, 2015Assignee: International Business Machines CorporationInventor: Jagir Razak Jainul Abdeen Hussan
-
Patent number: 9098709Abstract: A method of converting an original application into a cloud-hosted application includes splitting the original application into a plurality of application components along security relevant boundaries, mapping the application components to hosting infrastructure boundaries, and using a mechanism to enforce a privacy policy of a user. The mapping may include assigning each application component to a distinct virtual machine, which acts as a container for its assigned component.Type: GrantFiled: November 13, 2012Date of Patent: August 4, 2015Assignee: International Business Machines CorporationInventors: Mihai Christodorescu, Dimitrios Pendarakis, Kapil K. Singh
-
Patent number: 9063826Abstract: A method of mapping processes to processors in a parallel computing environment where a parallel application is to be run on a cluster of nodes wherein at least one of the nodes has multiple processors sharing a common memory, the method comprising using compiler based communication analysis to map Message Passing Interface processes to processors on the nodes, whereby at least some more heavily communicating processes are mapped to processors within nodes. Other methods, apparatus, and computer readable media are also provided.Type: GrantFiled: November 28, 2011Date of Patent: June 23, 2015Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Dibyendu Das, Nagarajan Kathiresan, Rajan Ravindran, Bhaskaran Venkatsubramaniam
-
Patent number: 9043769Abstract: The present invention relates to a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop, partitioning the loop into partitions executable and mappable on physical hardware with optimal instruction level parallelism, optimizing the loop iterations and/or loop counter for ideal mapping on hardware, chaining the loop partitions generating a list representing the execution sequence of the partitions.Type: GrantFiled: December 28, 2010Date of Patent: May 26, 2015Assignee: Hyperion Core Inc.Inventor: Martin Vorbach
-
Patent number: 9043770Abstract: In one embodiment, a machine-implemented method programs a heterogeneous multi-processor computer system to run a plurality of program modules, wherein each program module is to be run on one of the processors The system includes a plurality of processors of two or more different processor types. According to the recited method, machine-implemented offline processing is performed using a plurality of SIET tools of a scheduling information extracting toolkit (SIET) and a plurality of SBT tools of a schedule building toolkit (SBT). A program module applicability analyzer (PMAA) determines whether a first processor of a first processor type is capable of running a first program module without compiling the first program module. Machine-implemented online processing is performed using realtime data to test the scheduling software and the selected schedule solution.Type: GrantFiled: January 23, 2013Date of Patent: May 26, 2015Assignee: LSI CorporationInventors: Pavel Aleksandrovich Aliseychik, Petrus Sebastiaan Adrianus Daniel Evers, Denis Vasilevich Parfenov, Alexander Nikolaevich Filippov, Denis Vladimirovich Zaytsev
-
Patent number: 9038040Abstract: Partitioning programs between a general purpose core and one or more accelerators is provided. A compiler front end is provided for converting a program source code in a corresponding high level programming language into an intermediate code representation. This intermediate code representation is provided to an interprocedural optimizer which determines which core processor or accelerator each portion of the program should execute on and partitions the program into sub-programs based on this set of decisions. The interprocedural optimizer may further add instructions to the partitions to coordinate and synchronize the sub-programs as required. Each sub-program is compiled on an appropriate compiler backend for the instruction set architecture of the particular core processor or accelerator selected to execute the sub-program. The compiled sub-programs and then linked to thereby generate an executable program.Type: GrantFiled: January 25, 2006Date of Patent: May 19, 2015Assignee: International Business Machines CorporationInventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel A. Prener
-
Patent number: 9036162Abstract: An image sensing and printing digital camera device includes a housing defining a slot for receiving a printed instruction card having printed thereon an array of dots representing a programming script, the housing further storing therein a roll of print media; an area image sensor for sensing an image and generating pixel data representing the image; a linear image sensor for scanning the array of dots on the card and converting the array of dots into a data signal; a microcontroller provided in the housing, the microcontroller for decoding the data signal into the programming script and applying the programming script on the pixel data; and a printing mechanism for printing the pixel data, having applied thereto the programming script, on the roll of print media. The microcontroller integrates on a single chip a VLIW processor, a printhead interface, and an output buffer effecting communication between the VLIW processor and the printhead interface.Type: GrantFiled: July 3, 2012Date of Patent: May 19, 2015Assignee: Google Inc.Inventor: Kia Silverbrook
-
Patent number: 9021426Abstract: According to one embodiment of the present disclosure, hardware initialization code and error action information are retrieved from separate storage areas. The hardware initialization code includes code that initializes a device, and also includes placeholders corresponding to actions that are performed when the device fails initialization. Likewise, the error action information describes the actions that are performed when the device fails initialization. The error action information is converted into macros that include lines of code. As such, the error action placeholders are matched to the macros and, in turn, each of the error action placeholders is replaced with the lines of code corresponding to the matched macros.Type: GrantFiled: December 4, 2012Date of Patent: April 28, 2015Assignee: International Business Machines CorporationInventors: Daniel M. Crowell, John Farrugia, Michael J. Jones, David Dean Sanner
-
Publication number: 20150113514Abstract: Methods are provided for source-to-source transformations for graph processing on many-core platforms. A method includes receiving a graph application including one graph, expressed by a graph application programming interface configured for defining and manipulating graphs. The method further includes transforming, by a source-to-source compiler, the graph application into a plurality of parallel code variants. Each of the plurality of parallel code variants is specifically configured for parallel execution by a target one of a plurality of different many-core processors. The method also includes selecting and tuning, by a runtime component, a particular one of the parallel code variants for the parallel execution responsive to graph application characteristics, graph data, and an underlying code execution platform of the plurality of different many-core processors.Type: ApplicationFiled: October 9, 2014Publication date: April 23, 2015Inventors: Srimat Chakradhar, Michela Becchi, Da Li
-
Patent number: 9015683Abstract: Provided is a method of transforming program code written such that a plurality of work-items are allocated respectively to and concurrently executed on a plurality of processing elements included in a computing unit. A program code translator may identify, in the program code, two or more code regions, which are to be enclosed by work-item coalescing loops (WCLs), based on a synchronization barrier function contained in the program code, such that the work-items are serially executable on a smaller number of processing elements than a number of the processing elements, and may enclose the identified code regions with the WCLs, respectively.Type: GrantFiled: December 23, 2010Date of Patent: April 21, 2015Assignees: Samsung Electronics Co., Ltd., SNU R&DB FoundationInventors: Seung-Mo Cho, Jong-Deok Choi, Jaejin Lee
-
Patent number: 9015688Abstract: Methods and apparatuses associated with vectorization of scalar callee functions are disclosed herein. In various embodiments, compiling a first program may include generating one or more vectorized versions of a scalar callee function of the first program, based at least in part on vectorization annotations of the first program. Additionally, compiling may include generating one or more vectorized function signatures respectively associated with the one or more vectorized versions of the scalar callee function. The one or more vectorized function signatures may enable an appropriate vectorized version of the scalar callee function to be matched and invoked for a generic call from a caller function of a second program to a vectorized version of the scalar callee function.Type: GrantFiled: April 1, 2011Date of Patent: April 21, 2015Assignee: Intel CorporationInventors: Xinmin Tian, Sergey Stanislavoich Kozhukhov, Sergey Victorovich Preis, Robert Yehuda Geva, Konstantin Anatolyevich Pyjov, Hideki Sato, Milind Baburao Girkar, Aleksei Gurievich Kasov, Nikolay Vladimirovich Panchenko
-
Patent number: 9009726Abstract: A “Concurrent Sharing Model” provides a programming model based on revisions and isolation types for concurrent revisions of states, data, or variables shared between two or more concurrent tasks or programs. This model enables revisions of shared states, data, or variables to maintain determinacy despite nondeterministic scheduling between concurrent tasks or programs. More specifically, the Concurrent Sharing Model provides various techniques wherein shared states, data, or variables are conceptually replicated on forks, and only copied or written if necessary, then deterministically merged on joins such that concurrent tasks or programs can work with independent local copies of the shared states, data, or variables while ensuring automated conflict resolution.Type: GrantFiled: December 10, 2010Date of Patent: April 14, 2015Assignee: Microsoft Technology Licensing, LLCInventors: Sebastian Burckhardt, Daniel Johannes Pieter Leijen, Alexandro Baldassin
-
Patent number: 9009660Abstract: Programming in a multiprocessor environment includes accepting a program specification that defines a plurality of processing modules and one or more channels for sending data between ports of the modules, mapping each of the processing modules to run on a set of one or more processing engines of a network of interconnected processing engines, and for at least some of the channels, assigning one or more elements of one or more processing engines in the network to the channel for sending data between respective processing modules.Type: GrantFiled: November 29, 2006Date of Patent: April 14, 2015Assignee: Tilera CorporationInventors: Patrick Robert Griffin, Walter Lee, Anant Agarwal, David Wentzlaff
-
Publication number: 20150100948Abstract: An approach to generating irreducible modules. The approach includes a method that includes receiving, by at least one computing device, data associated with a specification. The method includes defining, by the at least one computing device, a pattern on the received data. The pattern reduces a set of rules into a single condition. The method includes generating, by the at least one computing device, an irreducible module based on the pattern. The irreducible module has one output dependent variable and is associated with a data flow application.Type: ApplicationFiled: October 8, 2013Publication date: April 9, 2015Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: William J. Lewis
-
Publication number: 20150100949Abstract: A method for processing computer program code to enable different parts of the computer program code to be executed by different processing elements of a plurality of communicating processing elements. The method comprises identifying at least one first part of the computer program code, which is to be executed by a particular one of said processing elements. The method further comprises identifying at least one further part of the computer code which is related to the at least one first part of the computer code. The at least one first part of the computer program code and the at least one further part of the computer program code are caused to be executed by the particular one of said processing elements.Type: ApplicationFiled: December 15, 2014Publication date: April 9, 2015Applicant: CODEPLAY SOFTWARE LIMITEDInventors: Jens-Uwe Dolinsky, Andrew Richards, Colin Riley
-
Patent number: 9003383Abstract: The subject system provides the ability to parallelize pre-existing serial code by importing and encapsulating all of the serial code into an object orientated flowchart language utilizing an analytic engine so that the imported code can be efficiently executed taking advantage of the partially ordered transitive flowchart system. The importation examines the serial code to ascertain what elements may be processed under an atomic time to instantiate them as either Action or Test objects, whereas statements which require more than atomic time are instantiated as Task object, with the Action, Test and Task objects being processable by separate processors to establish parallel processing, or by the multitasking afforded by the partially ordered transitive flowchart system.Type: GrantFiled: July 5, 2012Date of Patent: April 7, 2015Assignee: You Know Solutions, LLCInventors: Ronald J. Lavallee, Thomas C. Peacock
-
Patent number: 8997071Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.Type: GrantFiled: September 10, 2012Date of Patent: March 31, 2015Assignee: International Business Machines CorporationInventors: Tong Chen, John K. P. O'Brien, Zehra N. Sura
-
Patent number: 8990791Abstract: Partitioned global address space (PGAS) programming language source code is retrieved by an executed PGAS compiler. At least one shared memory array access indexed by an affine expression that includes a distinct thread identifier that is constant and different for each of a group of program execution threads targeted to execute the PGAS source code is identified within the PGAS source code. It is determined whether the at least one shared memory array access results in a local shared memory access by all of the group of program execution threads for all references to the at least one shared memory array access during execution of a compiled executable of the PGAS source code. A direct memory access executable code is generated for each shared memory array access determined to result in the local shared memory access by all of the group of program execution threads.Type: GrantFiled: July 29, 2011Date of Patent: March 24, 2015Assignee: International Business Machines CorporationInventors: Salem Derisavi, Ettore Tiotto
-
Patent number: 8972959Abstract: A method of converting a program code of a program running in multi-thread to a program code which causes fewer lock collisions. The method includes reading the program code into a memory and searching the program code for a first conditional statement making a branch to a path, which is in a synchronized block and has no side effect on the synchronized block; duplicating the path having no side effect to which the branch is made by the searched first conditional statement into the outside of the synchronized block; and adding a second conditional statement into the program code in response to the duplication, wherein the second conditional statement is a conditional statement making a branch to the duplicated path having no side effect. Also provided is a system and an article of manufacture which causes a computer to carry out the steps of the above method.Type: GrantFiled: April 27, 2010Date of Patent: March 3, 2015Assignee: International Business Machines CorporationInventor: Kazuaki Ishizaki
-
Publication number: 20150058832Abstract: System and methods for the parallelization of software applications are described. In some embodiments, a compiler may automatically identify within source code dependencies of a function called by another function. A persistent database may be generated to store identified dependencies. When calls the function are encountered within the source code, the persistent database may be checked, and a parallelized implementation of the function may be employed dependent upon the dependency indicated in the persistent database.Type: ApplicationFiled: November 4, 2014Publication date: February 26, 2015Inventor: Jeffry E. Gonion
-
Patent number: 8966459Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.Type: GrantFiled: February 20, 2014Date of Patent: February 24, 2015Assignee: Altera CorporationInventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
-
Patent number: 8966461Abstract: A medium, method, and apparatus are disclosed for eliding superfluous function invocations in a vector-processing environment. A compiler receives program code comprising a width-contingent invocation of a function. The compiler creates a width-specific executable version of the program code by determining a vector width of a target computer system and omitting the function from the width-specific executable if the vector width meets one or more criteria. For example, the compiler may omit the function call if the vector width is greater than a minimum size.Type: GrantFiled: September 29, 2011Date of Patent: February 24, 2015Assignee: Advanced Micro Devices, Inc.Inventors: Benedict R. Gaster, Lee W. Howes, Mark D. Hummel
-
Patent number: 8959499Abstract: A data parallel pipeline may specify multiple parallel data objects that contain multiple elements and multiple parallel operations that operate on the parallel data objects. Based on the data parallel pipeline, a dataflow graph of deferred parallel data objects and deferred parallel operations corresponding to the data parallel pipeline may be generated and one or more graph transformations may be applied to the dataflow graph to generate a revised dataflow graph that includes one or more of the deferred parallel data objects and deferred, combined parallel data operations. The deferred, combined parallel operations may be executed to produce materialized parallel data objects corresponding to the deferred parallel data objects.Type: GrantFiled: September 20, 2013Date of Patent: February 17, 2015Assignee: Google Inc.Inventors: Craig D. Chambers, Ashish Raniwala, Frances J. Perry, Stephen R. Adams, Robert R. Henry, Robert Bradshaw, Nathan Weizenbaum
-
Patent number: 8959498Abstract: A parallelization method, system and program. A program expressed by a block diagram or the like is divided into strands and a balance in calculation time is made among the strands. The functional blocks are divided into strands and the strand involving the maximum calculation time from a strand set is found. One or more movable blocks in the strand involving the maximum calculation time is found. The next step is obtaining calculation time of each strand after the movable block is moved to the strand in the input or output direction according to its property, and moving the block to a strand most largely reducing the calculation time of the strand having the maximum calculation time before the movement. This process loops until calculation time is no longer reduced. Strands are then transformed into source codes. Source codes are compiled and assigned to separate cores or processors for execution.Type: GrantFiled: February 22, 2011Date of Patent: February 17, 2015Assignee: International Business Machines CorporationInventors: Hideaki Komatsu, Takeo Yoshizawa
-
Patent number: 8959497Abstract: One embodiment of the present invention sets forth a technique for partitioning a predecessor thread program into sub-programs and dynamically spawning a thread grid of the sub-programs based on the outcome of a conditional statement in the predecessor thread program. The programming instructions for the predecessor thread program are analyzed to assess the benefit of partitioning the thread program at a conditional statement into sub-programs. If the predecessor thread program is partitioned, then each branch of the conditional statement may be used to form a separate sub-program. Predicate tables are populated at the predecessor thread program run-time to establish which possible instances of the thread sub-programs should be spawned in subsequent execution phases.Type: GrantFiled: August 29, 2008Date of Patent: February 17, 2015Assignee: NVIDIA CorporationInventors: John A. Stratton, David Luebke
-
Patent number: 8954941Abstract: Method of generating respective instruction compaction schemes for subsets of instructions to be processed by a programmable processor, comprising the steps of a) receiving at least one input code sample representative for software to be executed on the programmable processor, the input code comprising a plurality of instructions defining a first set of instructions (S1), b) initializing a set of removed instructions as empty (S3), c) determining the most compact representation of the first set of instructions (S4) d) comparing the size of said most compact representation with a threshold value (S5), e) carrying out steps e1 to e3 if the size is larger than said threshold value, e1) determining which instruction of the first set of instructions has a highest coding cost (S6), e2) removing said instruction having the highest coding cost from the first set of instructions and (S7), e3) adding said instruction to the set of removed instructions (S8), f) repeating steps b-f, wherein the first set of instructionsType: GrantFiled: September 3, 2010Date of Patent: February 10, 2015Assignee: Intel CorporationInventors: Hendrik Tjeerd Joannes Zwartenkot, Alexander Augusteijn, Yuanging Guo, Jürgen Von Oerthel, Jeroen Anton Johan Leijten, Erwan Yann Maurice Le Thenaff
-
Patent number: 8949786Abstract: A method and system for parallelization of sequential computer program code are described. In one embodiment, an automatic parallelization system includes a syntactic analyzer to analyze the structure of the sequential computer program code to identify the positions to insert SPI to the sequential computer code; a profiler for profiling the sequential computer program code by preparing call graph to determine dependency of each line of the sequential computer program code and the time required for the execution of each function of the sequential computer program code; an analyzer to determine parallelizability of the sequential computer program code from the information obtained by analyzing and profiling of the sequential computer program code; and a code generator to insert SPI to the sequential computer program code upon determination of parallelizability to obtain parallel computer program code, which is further outputted to a parallel computing environment for execution and the method thereof.Type: GrantFiled: December 1, 2009Date of Patent: February 3, 2015Assignee: KPIT Technologies LimitedInventors: Vinay G. Vaidya, Ranadive Priti, Sah Sudhakar
-
Patent number: 8949852Abstract: Some embodiments provide a system that increases parallelization in a computer program. During operation, the system obtains a binary associative operator and a ordered set of elements associated with a prefix operation in the computer program. Next, the system divides the elements into multiple sets of contiguous iterations based on a number of processors used to execute the computer program. The system then performs, in parallel on the processors, a set of local reductions on the contiguous iterations using the binary associative operator. Afterwards, the system calculates a set of boundary prefixes between the contiguous iterations using the local reductions. Finally, the system applies, in parallel on the processors, the boundary prefixes to the contiguous iterations using the binary associative operator to obtain a set of prefixes for the prefix operation.Type: GrantFiled: June 29, 2009Date of Patent: February 3, 2015Assignee: Oracle America, Inc.Inventor: Robert E. Cypher
-
Patent number: 8949808Abstract: Systems and methods for the vectorization of software applications are described. In some embodiments, a compiler may automatically generate both scalar and vector versions of a function from a single source code description. A vector interface may be exposed in a persistent dependency database that is associated with the function. This may allow a compiler to make vector function calls from within vectorized loops, rather than making multiple serialized scalar function calls from within a vectorized loop. This may in turn facilitate the vectorization of hierarchical code, which may improve application performance when vector execution resources are available.Type: GrantFiled: September 23, 2010Date of Patent: February 3, 2015Assignee: Apple Inc.Inventor: Jeffry E. Gonion
-
Patent number: 8949806Abstract: A system comprises a plurality of computation units interconnected by an interconnection network.Type: GrantFiled: August 17, 2012Date of Patent: February 3, 2015Assignee: Tilera CorporationInventors: Walter Lee, Robert A. Gottlieb, Vineet Soni, Anant Agarwal, Richard Schooler
-
Patent number: 8949807Abstract: A device receives, via a technical computing environment, a program that includes a parallel construct and a command to be executed by graphical processing units, and analyzes the program. The device also creates, based on the parallel construct and the analysis, one or more instances of the command to be executed in parallel by the graphical processing units, and transforms, via the technical computing environment, the one or more command instances into one or more command instances that are executable by the graphical processing units. The device further allocates the one or more transformed command instances to the graphical processing units for parallel execution, and receives, from the graphical processing units, one or more results associated with parallel execution of the one or more transformed command instances by the graphical processing units.Type: GrantFiled: September 30, 2013Date of Patent: February 3, 2015Assignee: The MathWorks, Inc.Inventors: Halldor N. Stefansson, Edric Ellis
-
Patent number: 8949805Abstract: A method for processing computer program code to enable different parts of the computer program code to be executed by different processing elements of a plurality of communicating processing elements. The method comprises identifying at least one first part of the computer program code, which is to be executed by a particular one of said processing elements. The method further comprises identifying at least one further part of the computer code which is related to the at least one first part of the computer code. The at least one first part of the computer program code and the at least one further part of the computer program code are caused to be executed by the particular one of said processing elements.Type: GrantFiled: June 11, 2010Date of Patent: February 3, 2015Assignee: Codeplay Software LimitedInventors: Jens-Uwe Dolinsky, Andrew Richards, Colin Riley
-
Patent number: 8935682Abstract: A device initiates a technical computing environment (TCE), and receives, via the TCE, a program command that permits the TCE to access a graphical processing unit that is remote to the device, where the program command permits the TCE to seamlessly transfer data to the remote GPU. The device transforms, via the TCE, the program command into a program command that is executable by the remote GPU, and provides the transformed program command to the remote GPU for execution. The device also receives, from the remote GPU, one or more results associated with execution of the transformed program command by the remote GPU, and utilizes the one or more results via the TCE.Type: GrantFiled: September 6, 2013Date of Patent: January 13, 2015Assignee: The MathWorks, Inc.Inventors: Halldor N. Stefansson, Edric Ellis, Jocelyn Luke Martin
-
Patent number: 8935681Abstract: A method comprising encrypting an original plain text file and making it available to a user as a protected file, and issuing to said user a user program and a user license to enable said user to decrypt the protected file and view an image of the original file while preventing the image of the original file from being copied to any file, other than as a further protected file. The image is preferably stored in a memory not backed up to the computer swap file. Preferably, the user program comprises an editor program and the user saves editorial changes to the original image in an encrypted difference file, separate from the original file. Both files are then used to re-create the edited image using the editor program and user license. The user program may comprise any computer tool including compilers.Type: GrantFiled: September 29, 2005Date of Patent: January 13, 2015Assignees: MStar Semiconductor, Inc., MStar Software R&D (Shenzhen) Ltd., MStar France SAS, MStar Semiconductor, Inc.Inventor: John David Mersh
-
Patent number: 8930926Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.Type: GrantFiled: April 16, 2010Date of Patent: January 6, 2015Assignee: Reservoir Labs, Inc.Inventors: Cedric Bastoul, Richard A. Lethin, Allen K. Leung, Benoit J. Meister, Peter Szilagyi, Nicolas T. Vasilache, David E. Wohlford
-
Patent number: 8930888Abstract: Modelling a serialized object stream can include receiving a stream of bytes corresponding to the serialized form of a first object, creating an empty initial model for containing a generic object and a generic class, and, upon detection of a class from the stream, constructing a corresponding generic class object in the model using a processor. Upon detection of a new object from the stream, a corresponding generic object in the model can be constructed. Further objects and classes in the model that are associated with the generic objects and classes can be referenced.Type: GrantFiled: June 28, 2012Date of Patent: January 6, 2015Assignee: International Business Machines CorporationInventor: Julien Canches
-
Patent number: 8924929Abstract: A system and methods are disclosed for executing a technical computing program in parallel in multiple execution environments. A program is invoked for execution in a first execution environment and from the invocation the program is executed in the first execution environment and one or more additional execution environments to provide for parallel execution of the program. New constructs in a technical computing programming language are disclosed for parallel programming of a technical computing program for execution in multiple execution environments. It is also further disclosed a system and method for changing the mode of operation of an execution environment from a sequential mode to a parallel mode of operation and vice-versa.Type: GrantFiled: August 28, 2009Date of Patent: December 30, 2014Assignee: The MathWorks, Inc.Inventor: Cleve Moler
-
Patent number: 8924946Abstract: Systems and methods for replacing inferior code segments with optimal code segments. Systems and methods for making such replacements for programming languages using Message Passing Interface (MPI) are provided. For example, at the compiler level, point-to-point code segments may be identified and replaced with all-to-all code segments. Programming code may include X10, Chapel and other programming languages that support parallel for loop.Type: GrantFiled: November 24, 2010Date of Patent: December 30, 2014Assignee: International Business Machines CorporationInventors: Ganesh Bikshandi, Krishna Nandivada Venkata, Igor Peshansky, Vijay Anand Saraswat