Including Scheduling Instructions Patents (Class 717/161)
  • Patent number: 11893370
    Abstract: According to one aspect, a method for compiling by a compilation tool a source code into a computer-executable code comprises receiving the source code as input of the compilation tool, translating the source code into an object code comprising machine instructions executable by a processor, then introducing, between machine instructions of the object code, additional instructions selected from illegal instructions and no-operation instructions so as to obtain the executable code, then delivering the executable code as output of the compilation tool.
    Type: Grant
    Filed: October 19, 2021
    Date of Patent: February 6, 2024
    Assignee: STMicroelectronics (Grand Ouest) SAS
    Inventors: Michel Jaouen, Stephane Le Roy, Moise Gergaud
  • Patent number: 11875197
    Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: January 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Patent number: 11775299
    Abstract: Disclosed are methods, systems, and other techniques for modeling concurrency between a set of nodes to be executed on a set of execution engines of an integrated circuit device. A computation graph that includes the set of nodes is received. A set of edges connecting the set of nodes are determined based on the computation graph. An edge type for each of the set of edges is determined based on the computation graph, the edge type indicating a type of synchronization between connected nodes. A vector clock is generated for each of the set of nodes, the vector clock for a particular node being calculated based on the vector clock for each connected preceding node and the edge type for the one of the set of edges that connects each connected preceding node and the particular node.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: October 3, 2023
    Assignee: Amazon Technologies, Inc.
    Inventor: Drazen Borkovic
  • Patent number: 11556517
    Abstract: An example operation includes one or more of solving, by a scheduler node, integer programming problem of maximizing a sum of organizations' endorsing peers that run chaincodes from a plurality of chaincodes within a consortium, making, by the scheduler node, endorsement policies (EPs) for the chaincodes from the plurality of the chaincodes to be satisfiable at any time, applying administrator's constraint of available endorsing peers to the maximized sum of organizations' endorsing peers, and adding resulting endorsing peers to a maintenance list.
    Type: Grant
    Filed: May 17, 2020
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Nir Rozenbaum, Artem Barger, Yacov Manevich
  • Patent number: 11550554
    Abstract: A computer device is provided that includes a processor configured to receive a source code for a program including at least two code files, and process the source code for the program to generate a machine-level code file for each of the at least two code files of the source code. The processor is further configured to generate control flow graph data for each machine-level code file generated for the at least two code files of the source code, generate a machine-level intermediate representation for each machine-level code file using a machine-level code file and the generated control flow graph data for that machine-level code file, merge the machine-level intermediate representations into a merged machine-level intermediate representation, and perform machine-level optimizations on the merged machine-level intermediate representation and output an optimized merged machine-level intermediate representation.
    Type: Grant
    Filed: January 7, 2021
    Date of Patent: January 10, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventor: Xiang Li
  • Patent number: 11483256
    Abstract: Systems and methods are disclosed for reducing latency and power consumption of on-chip movement through an approximate communication framework for network-on-chips (“NoCs”). The technology leverages the fact that big data applications (e.g., recognition, mining, and synthesis) can tolerate modest error and transfers data with the necessary accuracy, thereby improving the energy-efficiency and performance of multi-core processors.
    Type: Grant
    Filed: May 4, 2021
    Date of Patent: October 25, 2022
    Assignee: The George Washington University
    Inventors: Ahmed Louri, Yuechen Chen
  • Patent number: 11379943
    Abstract: To optimize the compilation of shaders for execution within an application, a computer system discovers the context in which the shaders are executed. The application is compiled and executed on a target platform. Snapshots of the application during execution are captured. A snapshot includes data and commands passed between the central processing unit and the graphics processing unit of the target platform to generate a single frame of graphics data. The shaders used in these snapshots are identified. These shaders are compiled with a number of different permutations of available compiler options, resulting in sets of differently compiled shaders. The snapshot is re-executed with the sets of differently compiled shaders, and performance is measured. The set of compiler options that results in compiled shaders providing better performance can be used as the set of compilation parameters for the set of shaders for this application.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: July 5, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Ivan Nevraev, Cole Brooking, J. Andrew Goossen, Eric Christoffersen, Jason Strayer
  • Patent number: 11269711
    Abstract: Failure impact analysis (or “impact analysis”) is a process that involves identifying effects of a network event that are may or will results from the network event. In one example, this disclosure describes a method that includes generating, by a control system managing a resource group, a resource graph that models resource and event dependencies between a plurality of resources within the resource group; detecting, by the control system, a first event affecting a first resource of the plurality of resources, wherein the first event is a network event; and identifying, by the control system and based on the dependencies modeled by the resource graph, a second resource that is expected to be affected by the first event.
    Type: Grant
    Filed: July 14, 2020
    Date of Patent: March 8, 2022
    Assignee: Juniper Networks, Inc.
    Inventors: Jayanthi R, Javier Antich, Chandrasekhar A
  • Patent number: 11265351
    Abstract: A management system manages a plurality of information handling systems by creating custom policies for each information handling system based on information gathered from or about each information handling system indicating, e.g., the user's intent, use, request for usage, security posture, productivity needs, and/or behavior. The management system creates custom policies to avoid unnecessarily impacting a user's productivity.
    Type: Grant
    Filed: January 24, 2019
    Date of Patent: March 1, 2022
    Assignee: Dell Products L.P.
    Inventors: Carlton A. Andrews, Girish S. Dhoble, Joseph Kozlowski
  • Patent number: 11262989
    Abstract: A computing system includes a compatibility graph builder to generate a compatibility graph based on a dependency graph representing program source code, where the compatibility graph indicates compatibility relationships between operations represented in the dependency graph, a clique generator coupled with the compatibility graph builder to generate a set of candidate vector packings based on the compatibility relationships indicated in the compatibility graph, a set cover generator coupled with the clique generator to select a subset of vector packings from the set of candidate vector packings, and a vector code generator coupled with the set cover generator to generate the vector code based on the selected subset of vector packings.
    Type: Grant
    Filed: October 24, 2019
    Date of Patent: March 1, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Abhilash Bhandari, Venugopal Raghavan, Mohammad Asghar Ahmad Shahid, Anupama Rajesh Rasale
  • Patent number: 11256513
    Abstract: There is provided an apparatus that includes input circuitry to receive input data and output circuitry to output a sequence of instructions to be executed by data processing circuitry. Generation circuitry performs a generation process to generate the sequence of instructions using the input data. The sequence of instructions comprises an indirect control flow instruction having a field that indicates where a target of the indirect control flow instruction is stored. The generation process causes at least one of the instructions in the sequence of instructions to store a state of control flow speculation after execution of the indirect control flow instruction. The at least one of the instructions in the sequence of instructions that stores the state of control flow speculation is inhibited from being subject to data value speculation by the data processing circuitry.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: February 22, 2022
    Assignee: Arm Limited
    Inventors: Richard William Earnshaw, Kristof Evariste Georges Beyls, James Greenhalgh
  • Patent number: 11200038
    Abstract: Techniques for an ultra-fact software compilation of source code are provided. A compiler receives software code and may divide it into code sections. A map of ordered nodes may be generated, such that each node in the map may include a code section and the order of the nodes indicates an execution order of the software code. Each code section may be compiled into an executable object in parallel and independently from other code sections. A binary executable may be generated by linking executable objects generated from the code sections. The methodology significantly differs from existing source code compilation techniques because conventional compilers build executable sequentially, whereas the embodiments divide the source code into multiple smaller code sections and compile them individually and in parallel. Compiling multiple code sections improves the compilations in order of magnitude from conventional techniques.
    Type: Grant
    Filed: June 25, 2020
    Date of Patent: December 14, 2021
    Assignee: PayPal, Inc.
    Inventor: Abraham Richard Hoffman
  • Patent number: 11093298
    Abstract: A method for job management in an HPC environment includes determining an unallocated subset from a plurality of HPC nodes, with each of the unallocated HPC nodes comprising an integrated fabric. An HPC job is selected from a job queue and executed using at least a portion of the unallocated subset of nodes.
    Type: Grant
    Filed: March 5, 2020
    Date of Patent: August 17, 2021
    Assignee: Raytheon Company
    Inventors: Shannon V. Davidson, Anthony N. Richoux
  • Patent number: 11010302
    Abstract: A mechanism is described for facilitating general purpose input/output data capture and neutral cache system for autonomous machines. A method of embodiments, as described herein, includes capturing, by an image capturing device, one or more images of one or more objects, where the one or more images represent input data associated with a neural network. The method may further include determining accuracy of first output results generated by a default neural caching system by comparing the first output results with second output results predicted by a custom neural caching system. The method may further include outputting, based on the accuracy, a final output results including at least one of the first output results or the second output results.
    Type: Grant
    Filed: October 5, 2016
    Date of Patent: May 18, 2021
    Assignee: INTEL CORPORATION
    Inventors: Liwei Ma, Jiqiang Song
  • Patent number: 10901715
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for lazy compilation and kernel fusion in dynamic computation graphs. One of the operations is performed by generating an input graph based on translation of user code into an expression graph. The expression graph represents control flow dependencies of operations of the generated input graph. Optimization of the input graph is then performed by iterative application of optimization rules to the input graph. An optimized version of the input graph results from the application of the optimization rules. A transformation graph then is generated by comparing changes made from the original input graph to the final optimized version of the input graph. The transformation graph provides a blueprint such that the system may recreate the optimization of a similarly structured later generated input graph without having to reapply the optimization rules.
    Type: Grant
    Filed: September 26, 2019
    Date of Patent: January 26, 2021
    Inventor: Jonathan Raiman
  • Patent number: 10890958
    Abstract: The present disclosure provides a centralized power meter for a signal processing circuit, comprising: M sample buffers, each configured to buffer samples respectively from at least one of N sources, and trigger a request for power calculation of the buffered samples in response to the buffered samples, the request having a corresponding priority; a switch, configured to route the requests from the M sample buffers to one or more power calculation cores; the one or more power calculation cores, each configured to retrieve the samples from the sample buffer in an order of their corresponding priorities, in response to the routed requests, and to perform power calculation of the retrieved samples, wherein N and M are integers no less than 1, and N is no less than M. The present disclosure further provides a centralized power calculation method.
    Type: Grant
    Filed: September 9, 2015
    Date of Patent: January 12, 2021
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Gan Wen, Ge Huang, Kaifeng Zhang
  • Patent number: 10884744
    Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask.
    Type: Grant
    Filed: August 18, 2017
    Date of Patent: January 5, 2021
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Andrey Naraikin, Christopher J. Hughes
  • Patent number: 10606635
    Abstract: Provided is an accelerator control apparatus including a data management table storing a name assigned to data and an identifier for an accelerator that stores the data on a local memory by associating the name and the identifier; a data management unit that is configured to determine, when receiving a first process that accepts data assigned with the name as input data, the accelerator that stores the data on the local memory, by referring to the data management table; and a task processing unit that is configured to control the accelerator being determined by data management unit to execute the first process.
    Type: Grant
    Filed: May 10, 2016
    Date of Patent: March 31, 2020
    Assignee: NEC CORPORATION
    Inventors: Jun Suzuki, Masaki Kan, Yuki Hayashi
  • Patent number: 10402176
    Abstract: Methods, apparatus, systems and articles of manufacture to compiler compile code to generate dataflow code are described. An example compiler apparatus includes an intermediate representation transformer to transform input software code to intermediate representation code; an instruction selector to insert machine instructions of a target execution platform in the intermediate representation code to generate machine intermediate representation code; and a target machine transformer to: convert a portion of the machine intermediate representation code to dataflow code to generate dataflow intermediate representation code; and allocate registers within the dataflow intermediate representation code.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: September 3, 2019
    Assignee: Intel Corporation
    Inventors: Kent Glossop, Kermin Fleming, Yongzhi Zhang, Simon Steely, Jr., Jim Sukha, Uma Srinivasan
  • Patent number: 10379885
    Abstract: A method and system for enhanced local communing optimization of compilation of a program. Within a first pass of a two pass approach, a determination is made as to where in the program to evaluate volatile expressions that can be commoned. In a second pass of the two pass approach, all remaining expressions that are not volatile expressions are commoned.
    Type: Grant
    Filed: November 16, 2017
    Date of Patent: August 13, 2019
    Assignee: International Business Machines Corporation
    Inventors: Andrew J. Craik, Patrick R. Doyle, Vijay Sundaresan
  • Patent number: 10360652
    Abstract: A processor comprising hardware logic configured to execute of a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.
    Type: Grant
    Filed: June 13, 2014
    Date of Patent: July 23, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
  • Patent number: 10102908
    Abstract: A microcomputer comprising a microprocessor unit and a first memory unit is disclosed. In one aspect, the microprocessor unit comprises at least one functional unit and at least one register. Further, the at least one register is a wide register comprising a plurality of second memory units which are capable to each contain one word, the wide register being adapted so that the second memory units are simultaneously accessible by the first memory unit, and at least part of the second memory units are separately accessible by the at least one functional unit. Further, the first memory unit is an embedded non-volatile memory unit.
    Type: Grant
    Filed: February 15, 2018
    Date of Patent: October 16, 2018
    Assignee: IMEC
    Inventors: Francky Catthoor, Komalan Manu Perumkunnil, Stefan Cosemans
  • Patent number: 9513976
    Abstract: A computer-implemented method and a corresponding computer system for emulation of Extended Memory Semantics (EMS) operations. The method and system include obtaining a set of computer instructions that include an EMS operation, converting the EMS operation into a corresponding atomic memory operation (AMO), and executing the AMO on at least one processor of a computer.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: December 6, 2016
    Assignee: Intel Corporation
    Inventor: Roger Golliver
  • Patent number: 9495225
    Abstract: A thread priority control mechanism is provided which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. In one aspect of the present invention, a thread-level speculation mechanism is provided which has content-addressable memory, an address register and a comparator for recording transaction footprints, and a control logic circuit for supporting memory synchronization instructions. This supports hardware transaction memory in detecting transaction conflicts. This thread-level speculation mechanism includes a priority up bit for recording an attribute operand in a memory synchronization instruction, a means for generating a priority up event when a thread wake-up event has occurred and the priority up bit is 1, and a means for preventing the CAM from storing the load/store address when the instruction is a non-transaction instruction.
    Type: Grant
    Filed: October 24, 2013
    Date of Patent: November 15, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christian Jacobi, Marcel Mitran, Moriyoshi Ohara
  • Patent number: 9436582
    Abstract: Embodiments include dividing source code for an application into multiple program fragments by generating a control flow graph for the multiple program fragments. The control flow graph represents a graph structure with nodes representing the multiple program fragments and edges representing an execution order of the program fragments. Aspects include searching for a chosen assertion statement within a program fragment, wherein the chosen assertion statement must be satisfied for correct execution of the chosen program fragment. Aspects also include identifying an immediate parent program fragment for the chosen program fragment using the control flow graph and calculating an immediate parent assertion statement for the immediate parent program fragment using the chosen assertion logic statement. The immediate parent assertion statement is an over-approximate pre-condition of the chosen program fragment.
    Type: Grant
    Filed: November 18, 2015
    Date of Patent: September 6, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Viresh Paruthi, Mitra Purandare
  • Patent number: 9405596
    Abstract: Code versioning for enabling transactional memory region promotion may include receiving a portion of candidate source code; outlining the portion of candidate source code received for parallel execution; wrapping a critical region with entry and exit routines to enter into a speculation sub-process, wherein the entry and exit routines also gather conflict statistics at run time; and generating an outlined code portion comprising multiple loop versions using a processor.
    Type: Grant
    Filed: October 2, 2014
    Date of Patent: August 2, 2016
    Assignee: GlobalFoundries, Inc.
    Inventors: Hans Boettiger, Yaoqing Gao, Martin Ohmacht, Kai-Ting Amy Wang
  • Patent number: 9367658
    Abstract: Embodiments of the invention provide a method and apparatus for generating programmable logic for a hardware accelerator, the method comprising: generating a graph of nodes representing the programmable logic to be implemented in hardware; identifying nodes within the graph that affect external flow control of the programmable logic; retaining the identified nodes and removing or replacing all nodes which do not affect external flow control of the programmable logic in a modified graph; and simulating the modified graph or building a corresponding circuit of the retained nodes.
    Type: Grant
    Filed: June 22, 2011
    Date of Patent: June 14, 2016
    Assignee: Maxeler Technologies Ltd.
    Inventors: Oliver Pell, James Huggett
  • Patent number: 9355175
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing search results. In one aspect, a method includes receiving a query. A plurality of search results responsive to the query are identified. The search results are analyzed to determine that at least a first search result is associated with a first answer box topic. The search results are provided along with an answer box precursor for the first answer box topic.
    Type: Grant
    Filed: May 27, 2011
    Date of Patent: May 31, 2016
    Assignee: Google Inc.
    Inventors: Tal Cohen, Ziv Bar-Yossef, Igor Tsvetkov, Adi Mano, Oren Naim, Nitsan Oz, Nir Andelman, Pravir K. Gupta
  • Patent number: 9348600
    Abstract: A method and a system are provided for prioritizing the fetching of instructions for each of a plurality of executing instruction threads in a multi-threaded processor. Instructions come from at least one source of instructions. Each thread has a number of threads buffered for execution in an instruction buffer. A first metric for each thread is determined based on the number of instructions currently buffered. A second metric is then determined for each thread, this being an execution based metric. A priority order for the threads is determined from the first and second metrics, and an instruction is fetched from the source for the thread with the highest determined priority which is requesting an instruction.
    Type: Grant
    Filed: February 9, 2009
    Date of Patent: May 24, 2016
    Assignee: Imagination Technologies Limited
    Inventor: Andrew Webber
  • Patent number: 9317265
    Abstract: Disclosed here are methods, systems, paradigms and structures for optimizing intermediate representation (IR) of a script code for atomic execution. Atomic execution of the script is achieved by generating portions of the IR as an atomic transaction. In an atomic transaction, a series of operations either all execute, or none executes. The IR includes checkpoints that evaluate to one of two possible values. The checkpoint evaluates to a first value when there is no error during execution, and evaluates to a second value when an error occurs. The IR is optimized for atomic execution by regenerating a portion of the IR including the checkpoint and code associated with the checkpoint as a transaction. When an error occurs during the execution of the transaction, the transaction is aborted and a state of execution of the script code is reverted to a state prior to the beginning of the transaction.
    Type: Grant
    Filed: March 25, 2013
    Date of Patent: April 19, 2016
    Assignee: Facebook, Inc.
    Inventors: Ali-Reza Adl-Tabatabai, Guilherme de Lima Ottoni, Michael Paleczny
  • Patent number: 9292287
    Abstract: Provided is a loop scheduling method including scheduling a first loop using execution units, and scheduling a second loop using execution units available as a result of the scheduling of the first loop. An n-th loop (n>2) may be scheduled using a result of scheduling an (n?1)-th loop, similar to the (n?1)-th loop. The first loop may be a higher priority loop than the second loop.
    Type: Grant
    Filed: July 14, 2014
    Date of Patent: March 22, 2016
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yeon Bok Lee, Young Hwan Park, Ho Yang, Keshava Prasad
  • Patent number: 9239712
    Abstract: Apparatuses and methods may provide for determining a level of performance for processing one or more loops by a dynamic compiler and executing code optimizations to generate a pipelined schedule for the one or more loops that achieves the determined level of performance within a prescribed time period. In one example, a dependence graph may be established for the one or more loops, and each dependence graph may be partitioned into stages based on the level of performance.
    Type: Grant
    Filed: March 29, 2013
    Date of Patent: January 19, 2016
    Assignee: Intel Corporation
    Inventors: Hongbo Rong, Hyunchul Park, Youfeng Wu
  • Patent number: 9158658
    Abstract: A method, system, and computer program product for detecting merge conflicts and compilation errors in a collaborative integrated development environment are provided in the illustrative embodiments. Prior to at least one user committing a set of uncommitted changes associated with a source code to a repository, the computer receives the set of uncommitted changes associated with the source code. The computer creates at least one temporary branch corresponding to the set of uncommitted changes associated with the source code. The computer device merges the at least one temporary branch to corresponding portions of the source code. The computer determines whether a merge conflict has occurred. If the merge conflict occurred, the computer communicates a first notification to the at least one user, the first notification indicating the merge conflict.
    Type: Grant
    Filed: October 15, 2013
    Date of Patent: October 13, 2015
    Assignee: International Business Machines Corporation
    Inventors: George T. Bigwood, Jason T. McMann, Michael G. Nikitaides, Kaleb D. Walton
  • Patent number: 9152422
    Abstract: An apparatus and method for compressing trace data is provided. The apparatus includes a detection unit configured to detect trace data corresponding to one or more function units performing a substantially significant operation in a reconfigurable processor as valid trace data, and a compression unit configured to compress the valid trace data.
    Type: Grant
    Filed: July 7, 2011
    Date of Patent: October 6, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jae-Young Kim, Dong-Hoon Yoo, Yeon-Gon Cho, Hee-Jun Shim, Chang-Moo Kim
  • Patent number: 9063735
    Abstract: Provided are a reconfigurable processor, which is capable of reducing the probability of an incorrect computation by analyzing the dependence between memory access instructions and allocating the memory access instructions between a plurality of processing elements (PEs) based on the results of the analysis, and a method of controlling the reconfigurable processor. The reconfigurable processor extracts an execution trace from simulation results, and analyzes the memory dependence between instructions included in different iterations based on parts of the execution trace of memory access instructions.
    Type: Grant
    Filed: October 13, 2011
    Date of Patent: June 23, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hee-Jin Ahn, Dong-Hoon Yoo, Bernhard Egger, Min-Wook Ahn, Jin-Seok Lee, Tai-Song Jin, Won-Sub Kim
  • Patent number: 9043582
    Abstract: Systems and methods for static code scheduling are disclosed. A method can include receiving an intermediate representation of source code, building a directed acyclic graph (DAG) for the intermediate representation, and creating chains of dependent instructions from the DAG for cluster formation. The chains are merged into clusters and each node in the DAG is marked with an identifier of a cluster it is part of to generate a marked instruction DAG. Instruction DAG scheduling is then performed using information about the clusters to generate an ordered intermediate representation of the source code.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: May 26, 2015
    Assignee: Qualcomm Innovation Center, Inc.
    Inventor: Sergei Larin
  • Publication number: 20150100950
    Abstract: A method for scheduling loop processing of a reconfigurable processor includes generating a dependence graph of instructions for the loop processing; mapping a first register file of the reconfigurable processor on an arrow indicating inter-iteration dependence on the dependence graph; and searching for schedules of the instructions based on the mapping result.
    Type: Application
    Filed: October 6, 2014
    Publication date: April 9, 2015
    Inventors: Min-wook AHN, Won-sub Kim, Tai-song Jin, Seung-won Lee, Jin-seok Lee
  • Patent number: 8990547
    Abstract: Systems, methodologies, computer-readable media, and other embodiments associated with ordering instructions are described. One exemplary system embodiment can include an analysis logic configured to analyze executable instructions from an executable program. A re-write logic can be configured to re-order selected load instructions within the executable program based on latency times for the selected load instructions.
    Type: Grant
    Filed: August 23, 2005
    Date of Patent: March 24, 2015
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: James R. Callister, Richard E. Hank, Teresa L. Johnson
  • Patent number: 8984525
    Abstract: A method for job management in an HPC environment includes determining an unallocated subset from a plurality of HPC nodes, with each of the unallocated HPC nodes comprising an integrated fabric. An HPC job is selected from a job queue and executed using at least a portion of the unallocated subset of nodes.
    Type: Grant
    Filed: October 11, 2013
    Date of Patent: March 17, 2015
    Assignee: Raytheon Company
    Inventors: Shannon V. Davidson, Anthony N. Richoux
  • Patent number: 8972961
    Abstract: A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the design comprises data for processor instruction latency and execution pipes, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream.
    Type: Grant
    Filed: May 11, 2011
    Date of Patent: March 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: Juergen Koehl, Jens Leenstra, Philipp Panitz, Hans Schlenker
  • Patent number: 8959497
    Abstract: One embodiment of the present invention sets forth a technique for partitioning a predecessor thread program into sub-programs and dynamically spawning a thread grid of the sub-programs based on the outcome of a conditional statement in the predecessor thread program. The programming instructions for the predecessor thread program are analyzed to assess the benefit of partitioning the thread program at a conditional statement into sub-programs. If the predecessor thread program is partitioned, then each branch of the conditional statement may be used to form a separate sub-program. Predicate tables are populated at the predecessor thread program run-time to establish which possible instances of the thread sub-programs should be spawned in subsequent execution phases.
    Type: Grant
    Filed: August 29, 2008
    Date of Patent: February 17, 2015
    Assignee: NVIDIA Corporation
    Inventors: John A. Stratton, David Luebke
  • Patent number: 8954946
    Abstract: A static branch prediction method and code execution method for a pipeline processor, and a code compiling method for static branch prediction, are provided herein. The static branch prediction method includes predicting a conditional branch code as taken or not-taken, adding the prediction information, converting the conditional branch code into a jump target address setting (JTS) code including target address information, branch time information, and a test code, and scheduling codes in a block. The code may be scheduled into a last slot of the block, and the JTS code may be scheduled into an empty slot after all the other codes in the block are scheduled. When the conditional branch code is predicted as taken in the prediction operation, a target address indicated by the target address information may be fetched at a cycle time indicated by the branch time information.
    Type: Grant
    Filed: January 25, 2010
    Date of Patent: February 10, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Tai-song Jin, Dong-kwan Suh, Suk-jin Kim
  • Patent number: 8949809
    Abstract: A system and associated method for automatically pipeline parallelizing a nested loop in sequential code over a predefined number of threads. Pursuant to task dependencies of the nested loop, each subloop of the nested loop are allocated to a respective thread. Combinations of stage partitions executing the nested loop are configured for parallel execution of a subloop where permitted. For each combination of stage partitions, a respective bottleneck is calculated and a combination with a minimum bottleneck is selected for parallelization.
    Type: Grant
    Filed: March 1, 2012
    Date of Patent: February 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: Pradeep Varma, Manish Gupta, Monika Gupta, Naga Praveen Kumar Katta
  • Patent number: 8949820
    Abstract: A technique for streaming from a media device involves enabling a local device to function as a streaming server. An example of a method according to the technique includes inserting a removable storage device that includes programs associated with a streaming application, running one or more of the programs, ensuring that a streaming software player is installed, and executing a streaming-related activity associated with the streaming application. An example of a system according to the technique includes a means for providing a streaming application that expects content to be found on a media drive, a means for intercepting requests for content expected to be found on the media drive, and a means for honoring the requests with content from a different media location.
    Type: Grant
    Filed: November 26, 2012
    Date of Patent: February 3, 2015
    Assignee: Numecent Holdings, Inc.
    Inventors: Jeffrey de Vries, Greg Zavertnik, Ann Hubbell
  • Patent number: 8943487
    Abstract: Particular embodiments optimize a C++ function comprising one or more loops for symbolic execution, comprising for each loop, if there is a branching condition within the loop, then rewrite the loop to move the branching condition outside the loop. Particular embodiments may further optimize the C++ function through simplified symbolic expressions and adding constructs forcing delayed interpretation of symbolic expressions during the symbolic execution.
    Type: Grant
    Filed: January 20, 2011
    Date of Patent: January 27, 2015
    Assignee: Fujitsu Limited
    Inventors: Guodong Li, Sreeranga P. Rajan, Indradeep Ghosh
  • Patent number: 8935685
    Abstract: A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the design comprises data for processor instruction latency and execution pipes, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream.
    Type: Grant
    Filed: April 28, 2012
    Date of Patent: January 13, 2015
    Assignee: International Business Machines Corporation
    Inventors: Juergen Koehl, Jens Leenstra, Philipp Panitz, Hans Schlenker
  • Patent number: 8935686
    Abstract: A condition detected by a virtual routine may be treated by setting an error code or raising an exception, depending on circumstances. Enhanced vtable layouts promote availability of both error-ID-based and exception-based virtual routines, while maintaining compatibility. Compilers treat virtual routines based on their circumstances. One enhanced vtable includes error-ID-based routine pointers in a COM-layout-compatible portion and exception-based routine pointers in an extension. For a virtual routine not overridden by a derived class, a compiler generates a direct call. For an object instance of a specific type, the compiler generates a direct exception-based call for the object's routine. For a factory-sourced object's routine, the compiler generates a virtual exception-based call. When the virtual routine belongs to a component having an enhanced vtable, the compiler may generate a virtual call using the exception-based routine pointer. Code wrappers between COM and native format may also be used.
    Type: Grant
    Filed: August 28, 2012
    Date of Patent: January 13, 2015
    Assignee: Microsoft Corporation
    Inventors: Deon Brewis, James Springfield, Sridhar S. Madhugiri
  • Patent number: 8930926
    Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
    Type: Grant
    Filed: April 16, 2010
    Date of Patent: January 6, 2015
    Assignee: Reservoir Labs, Inc.
    Inventors: Cedric Bastoul, Richard A. Lethin, Allen K. Leung, Benoit J. Meister, Peter Szilagyi, Nicolas T. Vasilache, David E. Wohlford
  • Patent number: 8918770
    Abstract: A system and method for compiling includes, for a parallelizable code portion of an application stored on a computer readable storage medium, determining one or more variables that are to be transferred to and/or from a coprocessor if the parallelizable code portion were to be offloaded. A start location and an end location are determined for at least one of the one or more variables as a size in memory. The parallelizable code portion is transformed by inserting an offload construct around the parallelizable code portion and passing the one or more variables and the size as arguments of the offload construct such that the parallelizable code portion is offloaded to a coprocessor at runtime.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: December 23, 2014
    Assignee: NEC Laboratories America, Inc.
    Inventors: Nishkam Ravi, Tao Bao, Ozcan Ozturk, Srimat Chakradhar
  • Patent number: 8898648
    Abstract: A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
    Type: Grant
    Filed: November 30, 2012
    Date of Patent: November 25, 2014
    Assignee: International Business Machines Corporation
    Inventors: I-Hsin Chung, Guojing Cong, Hiroki Murata, Yasushi Negishi, Hui-Fang Wen