Including Scheduling Instructions Patents (Class 717/161)

System and process for compiling a source code

Patent number: 11893370

Abstract: According to one aspect, a method for compiling by a compilation tool a source code into a computer-executable code comprises receiving the source code as input of the compilation tool, translating the source code into an object code comprising machine instructions executable by a processor, then introducing, between machine instructions of the object code, additional instructions selected from illegal instructions and no-operation instructions so as to obtain the executable code, then delivering the executable code as output of the compilation tool.

Type: Grant

Filed: October 19, 2021

Date of Patent: February 6, 2024

Assignee: STMicroelectronics (Grand Ouest) SAS

Inventors: Michel Jaouen, Stephane Le Roy, Moise Gergaud
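
The padding step in the abstract of patent 11893370 above can be illustrated with a minimal sketch. It assumes the object code is a plain list of instruction strings and that filler is chosen at random; the abstract does not say how positions or filler kinds are selected, so `pad_object_code`, `max_pad`, and the `NOP`/`ILLEGAL` placeholders are hypothetical names used only for illustration.

```python
import random

NOP = "nop"          # placeholder for a no-operation instruction
ILLEGAL = "illegal"  # placeholder for an opcode the processor rejects

def pad_object_code(instructions, rng, max_pad=2):
    """Return a copy of `instructions` with 0..max_pad filler instructions
    inserted between consecutive original instructions (assumed policy)."""
    out = []
    for i, ins in enumerate(instructions):
        out.append(ins)
        if i < len(instructions) - 1:  # only pad *between* instructions
            for _ in range(rng.randint(0, max_pad)):
                out.append(rng.choice([NOP, ILLEGAL]))
    return out

object_code = ["mov r0, #1", "add r0, r0, #2", "bx lr"]
executable = pad_object_code(object_code, random.Random(0))

# Stripping the filler recovers the original instructions in order.
recovered = [i for i in executable if i not in (NOP, ILLEGAL)]
```

A real tool would also have to make sure that inserted illegal instructions are never reached during normal execution; the sketch does not model that.
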
Management of thrashing in a GPU

Patent number: 11875197

Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.

Type: Grant

Filed: December 29, 2020

Date of Patent: January 16, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
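
The throttling rule in the abstract of patent 11875197 above reduces to a simple decision, sketched below. The function name, the parameters, and the use of a raw spilled-register count as the thrashing measure are assumptions made for illustration; the abstract only says detection is based "at least in part" on that count.

```python
def allowed_wavefronts(first_limit, second_limit, spilled_registers, threshold):
    """Return the cap on concurrently executing wavefronts.

    first_limit:  cap used when no thrashing is detected
    second_limit: lower cap used while thrashing is above the threshold
    """
    if second_limit >= first_limit:
        raise ValueError("second_limit must be less than first_limit")
    thrashing = spilled_registers > threshold  # assumed detection rule
    return second_limit if thrashing else first_limit
```
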
Vector clocks for highly concurrent execution engines

Patent number: 11775299

Abstract: Disclosed are methods, systems, and other techniques for modeling concurrency between a set of nodes to be executed on a set of execution engines of an integrated circuit device. A computation graph that includes the set of nodes is received. A set of edges connecting the set of nodes are determined based on the computation graph. An edge type for each of the set of edges is determined based on the computation graph, the edge type indicating a type of synchronization between connected nodes. A vector clock is generated for each of the set of nodes, the vector clock for a particular node being calculated based on the vector clock for each connected preceding node and the edge type for the one of the set of edges that connects each connected preceding node and the particular node.

Type: Grant

Filed: March 29, 2021

Date of Patent: October 3, 2023

Assignee: Amazon Technologies, Inc.

Inventor: Drazen Borkovic
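
As a rough illustration of the calculation described in the abstract of patent 11775299 above, the sketch below computes one vector clock per node from the clocks of its predecessors and the type of each connecting edge. The two edge types (`"sync"`, which merges the predecessor's clock, and `"unsynced"`, which contributes nothing) and the rule of incrementing the node's own engine entry are assumptions borrowed from textbook vector clocks, not details taken from the patent.

```python
from graphlib import TopologicalSorter

def vector_clocks(engine_of, edges, engines):
    """engine_of: node -> engine name
    edges:     iterable of (src, dst, edge_type) triples
    engines:   ordered list of engine names (one clock entry each)
    Returns node -> clock as a tuple of ints."""
    slot = {e: i for i, e in enumerate(engines)}
    preds = {n: [] for n in engine_of}
    for src, dst, etype in edges:
        preds[dst].append((src, etype))

    # Visit predecessors before successors.
    order = TopologicalSorter(
        {n: [s for s, _ in p] for n, p in preds.items()}
    ).static_order()

    clocks = {}
    for node in order:
        clock = [0] * len(engines)
        for src, etype in preds[node]:
            if etype == "sync":  # assumed: synchronized edge carries the clock
                clock = [max(a, b) for a, b in zip(clock, clocks[src])]
            # assumed: "unsynced" edges give no ordering information
        clock[slot[engine_of[node]]] += 1  # this node's own event
        clocks[node] = tuple(clock)
    return clocks

def concurrent(a, b):
    """Two clocks are concurrent when neither dominates the other."""
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    return not a_le_b and not b_le_a

clocks = vector_clocks(
    {"A": "pe", "B": "act", "C": "pe"},
    [("A", "B", "sync"), ("A", "C", "sync"), ("B", "C", "unsynced")],
    ["pe", "act"],
)
```

In this example `B` and `C` end up with clocks that neither dominate nor are dominated by each other, so the model treats them as able to run concurrently.
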
Blockchain maintenance

Patent number: 11556517

Abstract: An example operation includes one or more of solving, by a scheduler node, an integer programming problem of maximizing a sum of organizations' endorsing peers that run chaincodes from a plurality of chaincodes within a consortium, making, by the scheduler node, endorsement policies (EPs) for the chaincodes from the plurality of the chaincodes to be satisfiable at any time, applying administrator's constraint of available endorsing peers to the maximized sum of organizations' endorsing peers, and adding resulting endorsing peers to a maintenance list.

Type: Grant

Filed: May 17, 2020

Date of Patent: January 17, 2023

Assignee: International Business Machines Corporation

Inventors: Nir Rozenbaum, Artem Barger, Yacov Manevich
Merged machine-level intermediate representation optimizations

Patent number: 11550554

Abstract: A computer device is provided that includes a processor configured to receive a source code for a program including at least two code files, and process the source code for the program to generate a machine-level code file for each of the at least two code files of the source code. The processor is further configured to generate control flow graph data for each machine-level code file generated for the at least two code files of the source code, generate a machine-level intermediate representation for each machine-level code file using a machine-level code file and the generated control flow graph data for that machine-level code file, merge the machine-level intermediate representations into a merged machine-level intermediate representation, and perform machine-level optimizations on the merged machine-level intermediate representation and output an optimized merged machine-level intermediate representation.

Type: Grant

Filed: January 7, 2021

Date of Patent: January 10, 2023

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor: Xiang Li
Systems and methods for approximate communication framework for network-on-chips

Patent number: 11483256

Abstract: Systems and methods are disclosed for reducing latency and power consumption of on-chip movement through an approximate communication framework for network-on-chips (“NoCs”). The technology leverages the fact that big data applications (e.g., recognition, mining, and synthesis) can tolerate modest error and transfers data with the necessary accuracy, thereby improving the energy-efficiency and performance of multi-core processors.

Type: Grant

Filed: May 4, 2021

Date of Patent: October 25, 2022

Assignee: The George Washington University

Inventors: Ahmed Louri, Yuechen Chen
Optimizing compilation of shaders

Patent number: 11379943

Abstract: To optimize the compilation of shaders for execution within an application, a computer system discovers the context in which the shaders are executed. The application is compiled and executed on a target platform. Snapshots of the application during execution are captured. A snapshot includes data and commands passed between the central processing unit and the graphics processing unit of the target platform to generate a single frame of graphics data. The shaders used in these snapshots are identified. These shaders are compiled with a number of different permutations of available compiler options, resulting in sets of differently compiled shaders. The snapshot is re-executed with the sets of differently compiled shaders, and performance is measured. The set of compiler options that results in compiled shaders providing better performance can be used as the set of compilation parameters for the set of shaders for this application.

Type: Grant

Filed: January 31, 2019

Date of Patent: July 5, 2022

Assignee: Microsoft Technology Licensing, LLC

Inventors: Ivan Nevraev, Cole Brooking, J. Andrew Goossen, Eric Christoffersen, Jason Strayer
Failure impact analysis of network events

Patent number: 11269711

Abstract: Failure impact analysis (or “impact analysis”) is a process that involves identifying effects that may or will result from a network event. In one example, this disclosure describes a method that includes generating, by a control system managing a resource group, a resource graph that models resource and event dependencies between a plurality of resources within the resource group; detecting, by the control system, a first event affecting a first resource of the plurality of resources, wherein the first event is a network event; and identifying, by the control system and based on the dependencies modeled by the resource graph, a second resource that is expected to be affected by the first event.

Type: Grant

Filed: July 14, 2020

Date of Patent: March 8, 2022

Assignee: Juniper Networks, Inc.

Inventors: Jayanthi R, Javier Antich, Chandrasekhar A
Dynamic policy creation based on user or system behavior

Patent number: 11265351

Abstract: A management system manages a plurality of information handling systems by creating custom policies for each information handling system based on information gathered from or about each information handling system indicating, e.g., the user's intent, use, request for usage, security posture, productivity needs, and/or behavior. The management system creates custom policies to avoid unnecessarily impacting a user's productivity.

Type: Grant

Filed: January 24, 2019

Date of Patent: March 1, 2022

Assignee: Dell Products L.P.

Inventors: Carlton A. Andrews, Girish S. Dhoble, Joseph Kozlowski
Automatic generation of efficient vector code with low overhead in a time-efficient manner independent of vector width

Patent number: 11262989

Abstract: A computing system includes a compatibility graph builder to generate a compatibility graph based on a dependency graph representing program source code, where the compatibility graph indicates compatibility relationships between operations represented in the dependency graph, a clique generator coupled with the compatibility graph builder to generate a set of candidate vector packings based on the compatibility relationships indicated in the compatibility graph, a set cover generator coupled with the clique generator to select a subset of vector packings from the set of candidate vector packings, and a vector code generator coupled with the set cover generator to generate the vector code based on the selected subset of vector packings.

Type: Grant

Filed: October 24, 2019

Date of Patent: March 1, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Abhilash Bhandari, Venugopal Raghavan, Mohammad Asghar Ahmad Shahid, Anupama Rajesh Rasale
Indirect control flow instructions and inhibiting data value speculation

Patent number: 11256513

Abstract: There is provided an apparatus that includes input circuitry to receive input data and output circuitry to output a sequence of instructions to be executed by data processing circuitry. Generation circuitry performs a generation process to generate the sequence of instructions using the input data. The sequence of instructions comprises an indirect control flow instruction having a field that indicates where a target of the indirect control flow instruction is stored. The generation process causes at least one of the instructions in the sequence of instructions to store a state of control flow speculation after execution of the indirect control flow instruction. The at least one of the instructions in the sequence of instructions that stores the state of control flow speculation is inhibited from being subject to data value speculation by the data processing circuitry.

Type: Grant

Filed: March 14, 2019

Date of Patent: February 22, 2022

Assignee: Arm Limited

Inventors: Richard William Earnshaw, Kristof Evariste Georges Beyls, James Greenhalgh
Fast compiling source code without dependencies

Patent number: 11200038

Abstract: Techniques for ultra-fast software compilation of source code are provided. A compiler receives software code and may divide it into code sections. A map of ordered nodes may be generated, such that each node in the map may include a code section and the order of the nodes indicates an execution order of the software code. Each code section may be compiled into an executable object in parallel and independently from other code sections. A binary executable may be generated by linking executable objects generated from the code sections. The methodology significantly differs from existing source code compilation techniques because conventional compilers build executables sequentially, whereas the embodiments divide the source code into multiple smaller code sections and compile them individually and in parallel. Compiling multiple code sections in parallel improves compilation speed by an order of magnitude compared with conventional techniques.

Type: Grant

Filed: June 25, 2020

Date of Patent: December 14, 2021

Assignee: PayPal, Inc.

Inventor: Abraham Richard Hoffman
System and method for topology-aware job scheduling and backfilling in an HPC environment

Patent number: 11093298

Abstract: A method for job management in an HPC environment includes determining an unallocated subset from a plurality of HPC nodes, with each of the unallocated HPC nodes comprising an integrated fabric. An HPC job is selected from a job queue and executed using at least a portion of the unallocated subset of nodes.

Type: Grant

Filed: March 5, 2020

Date of Patent: August 17, 2021

Assignee: Raytheon Company

Inventors: Shannon V. Davidson, Anthony N. Richoux
General purpose input/output data capture and neural cache system for autonomous machines

Patent number: 11010302

Abstract: A mechanism is described for facilitating general purpose input/output data capture and neural cache system for autonomous machines. A method of embodiments, as described herein, includes capturing, by an image capturing device, one or more images of one or more objects, where the one or more images represent input data associated with a neural network. The method may further include determining accuracy of first output results generated by a default neural caching system by comparing the first output results with second output results predicted by a custom neural caching system. The method may further include outputting, based on the accuracy, final output results including at least one of the first output results or the second output results.

Type: Grant

Filed: October 5, 2016

Date of Patent: May 18, 2021

Assignee: INTEL CORPORATION

Inventors: Liwei Ma, Jiqiang Song
Lazy compilation and kernel fusion in dynamic computation graphs

Patent number: 10901715

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for lazy compilation and kernel fusion in dynamic computation graphs. One of the operations is performed by generating an input graph based on translation of user code into an expression graph. The expression graph represents control flow dependencies of operations of the generated input graph. Optimization of the input graph is then performed by iterative application of optimization rules to the input graph. An optimized version of the input graph results from the application of the optimization rules. A transformation graph then is generated by comparing changes made from the original input graph to the final optimized version of the input graph. The transformation graph provides a blueprint such that the system may recreate the optimization of a similarly structured later generated input graph without having to reapply the optimization rules.

Type: Grant

Filed: September 26, 2019

Date of Patent: January 26, 2021

Inventor: Jonathan Raiman
Centralized power meter and centralized power calculation method

Patent number: 10890958

Abstract: The present disclosure provides a centralized power meter for a signal processing circuit, comprising: M sample buffers, each configured to buffer samples respectively from at least one of N sources, and trigger a request for power calculation of the buffered samples in response to the buffered samples, the request having a corresponding priority; a switch, configured to route the requests from the M sample buffers to one or more power calculation cores; the one or more power calculation cores, each configured to retrieve the samples from the sample buffer in an order of their corresponding priorities, in response to the routed requests, and to perform power calculation of the retrieved samples, wherein N and M are integers no less than 1, and N is no less than M. The present disclosure further provides a centralized power calculation method.

Type: Grant

Filed: September 9, 2015

Date of Patent: January 12, 2021

Assignee: Telefonaktiebolaget LM Ericsson (publ)

Inventors: Gan Wen, Ge Huang, Kaifeng Zhang
System and method of loop vectorization by compressing indices and data elements from iterations based on a control mask

Patent number: 10884744

Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask.

Type: Grant

Filed: August 18, 2017

Date of Patent: January 5, 2021

Assignee: Intel Corporation

Inventors: Mikhail Plotnikov, Andrey Naraikin, Christopher J. Hughes
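
The two steps named in the abstract of patent 10884744 above, building a control mask from the loop condition and compressing the iteration indexes under that mask, can be sketched in scalar Python as follows. The function names and the sample data are hypothetical; real implementations would use vector mask and compress instructions rather than list comprehensions.

```python
def control_mask(values, condition):
    """One bit per iteration: 1 if the loop operation is to be executed,
    0 if it is to be bypassed."""
    return [1 if condition(v) else 0 for v in values]

def compress(items, mask):
    """Pack the items whose mask bit is 1 into contiguous positions."""
    return [x for x, bit in zip(items, mask) if bit]

data = [3, -1, 4, -1, 5, -9, 2, 6]
indexes = list(range(len(data)))

mask = control_mask(data, lambda v: v > 0)   # [1, 0, 1, 0, 1, 0, 1, 1]
active_indexes = compress(indexes, mask)     # [0, 2, 4, 6, 7]
active_data = compress(data, mask)           # [3, 4, 5, 2, 6]
```

After compression the surviving iterations are densely packed, so the loop body can run on full vectors instead of mostly masked-off lanes.
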
Accelerator control apparatus, accelerator control method, and storage medium

Patent number: 10606635

Abstract: Provided is an accelerator control apparatus including a data management table storing a name assigned to data and an identifier for an accelerator that stores the data on a local memory by associating the name and the identifier; a data management unit that is configured to determine, when receiving a first process that accepts data assigned with the name as input data, the accelerator that stores the data on the local memory, by referring to the data management table; and a task processing unit that is configured to control the accelerator being determined by the data management unit to execute the first process.

Type: Grant

Filed: May 10, 2016

Date of Patent: March 31, 2020

Assignee: NEC CORPORATION

Inventors: Jun Suzuki, Masaki Kan, Yuki Hayashi
Methods and apparatus to compile code to generate data flow code

Patent number: 10402176

Abstract: Methods, apparatus, systems and articles of manufacture to compile code to generate dataflow code are described. An example compiler apparatus includes an intermediate representation transformer to transform input software code to intermediate representation code; an instruction selector to insert machine instructions of a target execution platform in the intermediate representation code to generate machine intermediate representation code; and a target machine transformer to: convert a portion of the machine intermediate representation code to dataflow code to generate dataflow intermediate representation code; and allocate registers within the dataflow intermediate representation code.

Type: Grant

Filed: December 27, 2017

Date of Patent: September 3, 2019

Assignee: Intel Corporation

Inventors: Kent Glossop, Kermin Fleming, Yongzhi Zhang, Simon Steely, Jr., Jim Sukha, Uma Srinivasan
Enhanced local commoning

Patent number: 10379885

Abstract: A method and system for enhanced local commoning optimization of compilation of a program. Within a first pass of a two pass approach, a determination is made as to where in the program to evaluate volatile expressions that can be commoned. In a second pass of the two pass approach, all remaining expressions that are not volatile expressions are commoned.

Type: Grant

Filed: November 16, 2017

Date of Patent: August 13, 2019

Assignee: International Business Machines Corporation

Inventors: Andrew J. Craik, Patrick R. Doyle, Vijay Sundaresan
Wavefront resource virtualization

Patent number: 10360652

Abstract: A processor comprising hardware logic configured to execute a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.

Type: Grant

Filed: June 13, 2014

Date of Patent: July 23, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
Method and device to reduce leakage and dynamic energy consumption in high-speed memories

Patent number: 10102908

Abstract: A microcomputer comprising a microprocessor unit and a first memory unit is disclosed. In one aspect, the microprocessor unit comprises at least one functional unit and at least one register. Further, the at least one register is a wide register comprising a plurality of second memory units which are capable to each contain one word, the wide register being adapted so that the second memory units are simultaneously accessible by the first memory unit, and at least part of the second memory units are separately accessible by the at least one functional unit. Further, the first memory unit is an embedded non-volatile memory unit.

Type: Grant

Filed: February 15, 2018

Date of Patent: October 16, 2018

Assignee: IMEC

Inventors: Francky Catthoor, Komalan Manu Perumkunnil, Stefan Cosemans
Providing extended memory semantics with atomic memory operations

Patent number: 9513976

Abstract: A computer-implemented method and a corresponding computer system for emulation of Extended Memory Semantics (EMS) operations. The method and system include obtaining a set of computer instructions that include an EMS operation, converting the EMS operation into a corresponding atomic memory operation (AMO), and executing the AMO on at least one processor of a computer.

Type: Grant

Filed: December 30, 2011

Date of Patent: December 6, 2016

Assignee: Intel Corporation

Inventor: Roger Golliver
Parallel execution mechanism and operating method thereof

Patent number: 9495225

Abstract: A thread priority control mechanism is provided which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. In one aspect of the present invention, a thread-level speculation mechanism is provided which has content-addressable memory, an address register and a comparator for recording transaction footprints, and a control logic circuit for supporting memory synchronization instructions. This supports hardware transaction memory in detecting transaction conflicts. This thread-level speculation mechanism includes a priority up bit for recording an attribute operand in a memory synchronization instruction, a means for generating a priority up event when a thread wake-up event has occurred and the priority up bit is 1, and a means for preventing the CAM from storing the load/store address when the instruction is a non-transaction instruction.

Type: Grant

Filed: October 24, 2013

Date of Patent: November 15, 2016

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Christian Jacobi, Marcel Mitran, Moriyoshi Ohara
Calculating an immediate parent assertion statement for program verification

Patent number: 9436582

Abstract: Embodiments include dividing source code for an application into multiple program fragments by generating a control flow graph for the multiple program fragments. The control flow graph represents a graph structure with nodes representing the multiple program fragments and edges representing an execution order of the program fragments. Aspects include searching for a chosen assertion statement within a program fragment, wherein the chosen assertion statement must be satisfied for correct execution of the chosen program fragment. Aspects also include identifying an immediate parent program fragment for the chosen program fragment using the control flow graph and calculating an immediate parent assertion statement for the immediate parent program fragment using the chosen assertion logic statement. The immediate parent assertion statement is an over-approximate pre-condition of the chosen program fragment.

Type: Grant

Filed: November 18, 2015

Date of Patent: September 6, 2016

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Viresh Paruthi, Mitra Purandare
Code versioning for enabling transactional memory promotion

Patent number: 9405596

Abstract: Code versioning for enabling transactional memory region promotion may include receiving a portion of candidate source code; outlining the portion of candidate source code received for parallel execution; wrapping a critical region with entry and exit routines to enter into a speculation sub-process, wherein the entry and exit routines also gather conflict statistics at run time; and generating an outlined code portion comprising multiple loop versions using a processor.

Type: Grant

Filed: October 2, 2014

Date of Patent: August 2, 2016

Assignee: GlobalFoundries, Inc.

Inventors: Hans Boettiger, Yaoqing Gao, Martin Ohmacht, Kai-Ting Amy Wang
Method and apparatus for designing and generating a stream processor

Patent number: 9367658

Abstract: Embodiments of the invention provide a method and apparatus for generating programmable logic for a hardware accelerator, the method comprising: generating a graph of nodes representing the programmable logic to be implemented in hardware; identifying nodes within the graph that affect external flow control of the programmable logic; retaining the identified nodes and removing or replacing all nodes which do not affect external flow control of the programmable logic in a modified graph; and simulating the modified graph or building a corresponding circuit of the retained nodes.

Type: Grant

Filed: June 22, 2011

Date of Patent: June 14, 2016

Assignee: Maxeler Technologies Ltd.

Inventors: Oliver Pell, James Huggett
Triggering answer boxes

Patent number: 9355175

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing search results. In one aspect, a method includes receiving a query. A plurality of search results responsive to the query are identified. The search results are analyzed to determine that at least a first search result is associated with a first answer box topic. The search results are provided along with an answer box precursor for the first answer box topic.

Type: Grant

Filed: May 27, 2011

Date of Patent: May 31, 2016

Assignee: Google Inc.

Inventors: Tal Cohen, Ziv Bar-Yossef, Igor Tsvetkov, Adi Mano, Oren Naim, Nitsan Oz, Nir Andelman, Pravir K. Gupta
Prioritising of instruction fetching in microprocessor systems

Patent number: 9348600

Abstract: A method and a system are provided for prioritizing the fetching of instructions for each of a plurality of executing instruction threads in a multi-threaded processor. Instructions come from at least one source of instructions. Each thread has a number of instructions buffered for execution in an instruction buffer. A first metric for each thread is determined based on the number of instructions currently buffered. A second metric is then determined for each thread, this being an execution based metric. A priority order for the threads is determined from the first and second metrics, and an instruction is fetched from the source for the thread with the highest determined priority which is requesting an instruction.

Type: Grant

Filed: February 9, 2009

Date of Patent: May 24, 2016

Assignee: Imagination Technologies Limited

Inventor: Andrew Webber
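
The selection step in the abstract of patent 9348600 above can be sketched as follows. How the two metrics are combined is not stated in the abstract; the sketch assumes a lexicographic order (fewest buffered instructions first, then the higher execution-based metric), and the dictionary keys are hypothetical.

```python
def pick_thread(threads):
    """threads: list of dicts with keys
         'name'        thread identifier
         'buffered'    instructions currently buffered (first metric input)
         'exec_metric' execution-based metric, higher means more urgent
         'requesting'  True if the thread is requesting an instruction
    Returns the name of the thread to fetch for, or None."""
    candidates = [t for t in threads if t["requesting"]]
    if not candidates:
        return None
    # Assumed combination: emptier buffer wins, exec_metric breaks ties.
    best = max(candidates, key=lambda t: (-t["buffered"], t["exec_metric"]))
    return best["name"]

threads = [
    {"name": "A", "buffered": 4, "exec_metric": 1, "requesting": True},
    {"name": "B", "buffered": 1, "exec_metric": 0, "requesting": False},
    {"name": "C", "buffered": 2, "exec_metric": 3, "requesting": True},
]
chosen = pick_thread(threads)
```

Thread `B` has the emptiest buffer but is not requesting an instruction, so it is excluded before the priority comparison.
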
Optimizing intermediate representation of script code for atomic execution

Patent number: 9317265

Abstract: Disclosed here are methods, systems, paradigms and structures for optimizing intermediate representation (IR) of a script code for atomic execution. Atomic execution of the script is achieved by generating portions of the IR as an atomic transaction. In an atomic transaction, a series of operations either all execute, or none executes. The IR includes checkpoints that evaluate to one of two possible values. The checkpoint evaluates to a first value when there is no error during execution, and evaluates to a second value when an error occurs. The IR is optimized for atomic execution by regenerating a portion of the IR including the checkpoint and code associated with the checkpoint as a transaction. When an error occurs during the execution of the transaction, the transaction is aborted and a state of execution of the script code is reverted to a state prior to the beginning of the transaction.

Type: Grant

Filed: March 25, 2013

Date of Patent: April 19, 2016

Assignee: Facebook, Inc.

Inventors: Ali-Reza Adl-Tabatabai, Guilherme de Lima Ottoni, Michael Paleczny
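
The all-or-nothing behavior described in the abstract of patent 9317265 above can be illustrated with a small sketch. It assumes execution state is a dictionary and that a failing checkpoint is signaled by an exception; `CheckpointFailed` and `run_atomically` are hypothetical names, and the snapshot-and-restore strategy stands in for whatever mechanism the patent actually uses.

```python
import copy

class CheckpointFailed(Exception):
    """Raised by an operation whose checkpoint detects an error."""

def run_atomically(state, operations):
    """Apply every operation to `state`, or none of them.

    Returns True if the transaction committed, False if it was aborted
    and `state` was reverted to its value before the transaction began."""
    snapshot = copy.deepcopy(state)
    try:
        for op in operations:
            op(state)
    except CheckpointFailed:
        state.clear()
        state.update(snapshot)  # revert to the pre-transaction state
        return False
    return True

def increment(state):
    state["x"] += 1

def failing_checkpoint(state):
    raise CheckpointFailed()

state = {"x": 1}
committed = run_atomically(state, [increment, increment])
aborted = run_atomically(state, [increment, failing_checkpoint])
```
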
Method of scheduling loops for processor having a plurality of functional units

Patent number: 9292287

Abstract: Provided is a loop scheduling method including scheduling a first loop using execution units, and scheduling a second loop using execution units available as a result of the scheduling of the first loop. An n-th loop (n>2) may be scheduled using a result of scheduling an (n-1)-th loop, similar to the (n-1)-th loop. The first loop may be a higher priority loop than the second loop.

Type: Grant

Filed: July 14, 2014

Date of Patent: March 22, 2016

Assignee: Samsung Electronics Co., Ltd.

Inventors: Yeon Bok Lee, Young Hwan Park, Ho Yang, Keshava Prasad
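
The idea in the abstract of patent 9292287 above, that each loop is scheduled on the execution units left over by the loops before it, can be reduced to the sketch below. It assumes each loop simply requests a number of units and ignores cycle-level placement; `schedule_loops` and the unit names are hypothetical.

```python
def schedule_loops(units, loops):
    """units: list of execution-unit names
    loops: list of (loop_name, units_needed), highest priority first
    Each loop receives units from those left free by earlier loops."""
    free = list(units)
    plan = {}
    for name, needed in loops:
        granted = free[:needed]      # may be fewer than requested
        plan[name] = granted
        free = free[len(granted):]   # the remainder goes to the next loop
    return plan

plan = schedule_loops(["fu0", "fu1", "fu2", "fu3"], [("loop1", 3), ("loop2", 2)])
```

Here the higher-priority `loop1` takes three units and `loop2` is scheduled on the single unit that remains.
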
Software pipelining at runtime

Patent number: 9239712

Abstract: Apparatuses and methods may provide for determining a level of performance for processing one or more loops by a dynamic compiler and executing code optimizations to generate a pipelined schedule for the one or more loops that achieves the determined level of performance within a prescribed time period. In one example, a dependence graph may be established for the one or more loops, and each dependence graph may be partitioned into stages based on the level of performance.

Type: Grant

Filed: March 29, 2013

Date of Patent: January 19, 2016

Assignee: Intel Corporation

Inventors: Hongbo Rong, Hyunchul Park, Youfeng Wu
Detecting merge conflicts and compilation errors in a collaborative integrated development environment

Patent number: 9158658

Abstract: A method, system, and computer program product for detecting merge conflicts and compilation errors in a collaborative integrated development environment are provided in the illustrative embodiments. Prior to at least one user committing a set of uncommitted changes associated with a source code to a repository, the computer receives the set of uncommitted changes associated with the source code. The computer creates at least one temporary branch corresponding to the set of uncommitted changes associated with the source code. The computer device merges the at least one temporary branch to corresponding portions of the source code. The computer determines whether a merge conflict has occurred. If the merge conflict occurred, the computer communicates a first notification to the at least one user, the first notification indicating the merge conflict.

Type: Grant

Filed: October 15, 2013

Date of Patent: October 13, 2015

Assignee: International Business Machines Corporation

Inventors: George T. Bigwood, Jason T. McMann, Michael G. Nikitaides, Kaleb D. Walton
Apparatus and method for compressing trace data

Patent number: 9152422

Abstract: An apparatus and method for compressing trace data is provided. The apparatus includes a detection unit configured to detect trace data corresponding to one or more function units performing a substantially significant operation in a reconfigurable processor as valid trace data, and a compression unit configured to compress the valid trace data.

Type: Grant

Filed: July 7, 2011

Date of Patent: October 6, 2015

Assignee: Samsung Electronics Co., Ltd.

Inventors: Jae-Young Kim, Dong-Hoon Yoo, Yeon-Gon Cho, Hee-Jun Shim, Chang-Moo Kim
Reconfigurable processor and method for processing loop having memory dependency

Patent number: 9063735

Abstract: Provided are a reconfigurable processor, which is capable of reducing the probability of an incorrect computation by analyzing the dependence between memory access instructions and allocating the memory access instructions between a plurality of processing elements (PEs) based on the results of the analysis, and a method of controlling the reconfigurable processor. The reconfigurable processor extracts an execution trace from simulation results, and analyzes the memory dependence between instructions included in different iterations based on parts of the execution trace of memory access instructions.

Type: Grant

Filed: October 13, 2011

Date of Patent: June 23, 2015

Assignee: Samsung Electronics Co., Ltd.

Inventors: Hee-Jin Ahn, Dong-Hoon Yoo, Bernhard Egger, Min-Wook Ahn, Jin-Seok Lee, Tai-Song Jin, Won-Sub Kim
Enhanced instruction scheduling during compilation of high level source code for improved executable code

Patent number: 9043582

Abstract: Systems and methods for static code scheduling are disclosed. A method can include receiving an intermediate representation of source code, building a directed acyclic graph (DAG) for the intermediate representation, and creating chains of dependent instructions from the DAG for cluster formation. The chains are merged into clusters and each node in the DAG is marked with an identifier of a cluster it is part of to generate a marked instruction DAG. Instruction DAG scheduling is then performed using information about the clusters to generate an ordered intermediate representation of the source code.

Type: Grant

Filed: September 14, 2012

Date of Patent: May 26, 2015

Assignee: Qualcomm Innovation Center, Inc.

Inventor: Sergei Larin
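
The steps named in the abstract of patent 9043582 above (build a DAG, form chains, merge chains into clusters, mark nodes, schedule with cluster information) can be sketched roughly as follows. Everything specific here is an assumption for illustration: the `(dest, sources)` instruction format with each value written once, the greedy chain walk, the rule that a single-node chain joins its predecessor's cluster, and the scheduling heuristic that prefers staying in the current cluster. The abstract does not say which rules the patented method uses.

```python
from collections import defaultdict


def build_dag(instrs):
    """Build true-dependence edges; instrs is a list of (dest, [sources])."""
    last_def, succs, preds = {}, defaultdict(set), defaultdict(set)
    for i, (dest, sources) in enumerate(instrs):
        for src in sources:
            if src in last_def:
                succs[last_def[src]].add(i)
                preds[i].add(last_def[src])
        last_def[dest] = i
    return succs, preds


def form_chains(n, succs):
    """Greedily walk from each unvisited node to its first unvisited successor."""
    chain_of, chains = {}, []
    for start in range(n):
        if start in chain_of:
            continue
        chain, node = [], start
        while node is not None:
            chain_of[node] = len(chains)
            chain.append(node)
            node = next((s for s in sorted(succs[node]) if s not in chain_of), None)
        chains.append(chain)
    return chains


def merge_chains(chains, preds):
    """Mark each node with a cluster id (assumed rule: a lone node joins a predecessor)."""
    cluster_of = {node: cid for cid, chain in enumerate(chains) for node in chain}
    for chain in chains:
        if len(chain) == 1 and preds[chain[0]]:
            cluster_of[chain[0]] = cluster_of[min(preds[chain[0]])]
    return cluster_of


def schedule(n, succs, preds, cluster_of):
    """List-schedule ready nodes, preferring the cluster of the last scheduled node."""
    waiting = {i: len(preds[i]) for i in range(n)}
    ready = [i for i in range(n) if waiting[i] == 0]
    order, current = [], None
    while ready:
        ready.sort(key=lambda i: (cluster_of[i] != current, i))
        node = ready.pop(0)
        order.append(node)
        current = cluster_of[node]
        for s in succs[node]:
            waiting[s] -= 1
            if waiting[s] == 0:
                ready.append(s)
    return order


# Hypothetical seven-instruction block.
instrs = [("a", []), ("b", []), ("c", ["a"]), ("d", ["b"]),
          ("e", ["c"]), ("f", ["d", "e"]), ("g", ["a"])]
succs, preds = build_dag(instrs)
chains = form_chains(len(instrs), succs)
cluster_of = merge_chains(chains, preds)
order = schedule(len(instrs), succs, preds, cluster_of)
```

The result is an ordering of the original instructions that respects every dependence edge while keeping instructions of one cluster adjacent where possible.
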
Method and apparatus for instruction scheduling using software pipelining

Publication number: 20150100950

Abstract: A method for scheduling loop processing of a reconfigurable processor includes generating a dependence graph of instructions for the loop processing; mapping a first register file of the reconfigurable processor on an arrow indicating inter-iteration dependence on the dependence graph; and searching for schedules of the instructions based on the mapping result.

Type: Application

Filed: October 6, 2014

Publication date: April 9, 2015

Inventors: Min-wook Ahn, Won-sub Kim, Tai-song Jin, Seung-won Lee, Jin-seok Lee
Systems and methods for re-ordering instructions

Patent number: 8990547

Abstract: Systems, methodologies, computer-readable media, and other embodiments associated with ordering instructions are described. One exemplary system embodiment can include an analysis logic configured to analyze executable instructions from an executable program. A re-write logic can be configured to re-order selected load instructions within the executable program based on latency times for the selected load instructions.

Type: Grant

Filed: August 23, 2005

Date of Patent: March 24, 2015

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: James R. Callister, Richard E. Hank, Teresa L. Johnson
System and method for topology-aware job scheduling and backfilling in an HPC environment

Patent number: 8984525

Abstract: A method for job management in an HPC environment includes determining an unallocated subset from a plurality of HPC nodes, with each of the unallocated HPC nodes comprising an integrated fabric. An HPC job is selected from a job queue and executed using at least a portion of the unallocated subset of nodes.

Type: Grant

Filed: October 11, 2013

Date of Patent: March 17, 2015

Assignee: Raytheon Company

Inventors: Shannon V. Davidson, Anthony N. Richoux
Instruction scheduling approach to improve processor performance

Patent number: 8972961

Abstract: A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the design comprises data for processor instruction latency and execution pipes, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream.

Type: Grant

Filed: May 11, 2011

Date of Patent: March 3, 2015

Assignee: International Business Machines Corporation

Inventors: Juergen Koehl, Jens Leenstra, Philipp Panitz, Hans Schlenker
System and method for dynamically spawning thread blocks within multi-threaded processing systems

Patent number: 8959497

Abstract: One embodiment of the present invention sets forth a technique for partitioning a predecessor thread program into sub-programs and dynamically spawning a thread grid of the sub-programs based on the outcome of a conditional statement in the predecessor thread program. The programming instructions for the predecessor thread program are analyzed to assess the benefit of partitioning the thread program at a conditional statement into sub-programs. If the predecessor thread program is partitioned, then each branch of the conditional statement may be used to form a separate sub-program. Predicate tables are populated at the predecessor thread program run-time to establish which possible instances of the thread sub-programs should be spawned in subsequent execution phases.

Type: Grant

Filed: August 29, 2008

Date of Patent: February 17, 2015

Assignee: NVIDIA Corporation

Inventors: John A. Stratton, David Luebke
Static branch prediction method and code execution method for pipeline processor, and code compiling method for static branch prediction

Patent number: 8954946

Abstract: A static branch prediction method and code execution method for a pipeline processor, and a code compiling method for static branch prediction, are provided herein. The static branch prediction method includes predicting a conditional branch code as taken or not-taken, adding the prediction information, converting the conditional branch code into a jump target address setting (JTS) code including target address information, branch time information, and a test code, and scheduling codes in a block. The code may be scheduled into a last slot of the block, and the JTS code may be scheduled into an empty slot after all the other codes in the block are scheduled. When the conditional branch code is predicted as taken in the prediction operation, a target address indicated by the target address information may be fetched at a cycle time indicated by the branch time information.

Type: Grant

Filed: January 25, 2010

Date of Patent: February 10, 2015

Assignee: Samsung Electronics Co., Ltd.

Inventors: Tai-song Jin, Dong-kwan Suh, Suk-jin Kim
Automatic pipeline parallelization of sequential code

Patent number: 8949809

Abstract: A system and associated method for automatically pipeline parallelizing a nested loop in sequential code over a predefined number of threads. Pursuant to task dependencies of the nested loop, each subloop of the nested loop is allocated to a respective thread. Combinations of stage partitions executing the nested loop are configured for parallel execution of a subloop where permitted. For each combination of stage partitions, a respective bottleneck is calculated and a combination with a minimum bottleneck is selected for parallelization.

Type: Grant

Filed: March 1, 2012

Date of Patent: February 3, 2015

Assignee: International Business Machines Corporation

Inventors: Pradeep Varma, Manish Gupta, Monika Gupta, Naga Praveen Kumar Katta
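
The selection step in the abstract of patent 8949809 above (calculate a bottleneck for each combination of stage partitions, then pick the minimum) can be illustrated with a small sketch. It assumes a deliberately simple model that is not taken from the patent: subloops have fixed, known costs, stages are contiguous runs of subloops, each stage runs on one thread, and the bottleneck of a partition is the cost of its most expensive stage. The function name `best_partition` is hypothetical, and the sketch ignores the dependence analysis and the parallel execution of a single subloop that the abstract also mentions.

```python
from itertools import combinations


def best_partition(costs, threads):
    """Return (bottleneck, stages) for the cheapest contiguous stage partition.

    costs: per-subloop cost in program order.
    threads: maximum number of pipeline stages (one thread per stage).
    """
    n = len(costs)
    best = None
    for stage_count in range(1, min(threads, n) + 1):
        # Choose stage_count - 1 cut points between subloops.
        for cuts in combinations(range(1, n), stage_count - 1):
            bounds = (0, *cuts, n)
            stages = [costs[a:b] for a, b in zip(bounds, bounds[1:])]
            bottleneck = max(sum(stage) for stage in stages)
            if best is None or bottleneck < best[0]:
                best = (bottleneck, stages)
    return best


# Five subloops with assumed costs, pipelined over at most three threads.
bottleneck, stages = best_partition([4, 2, 3, 5, 1], threads=3)
```

Exhaustive enumeration is only practical for a handful of subloops; it is used here because it mirrors the "for each combination" wording most directly.
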
Streaming from a media device

Patent number: 8949820

Abstract: A technique for streaming from a media device involves enabling a local device to function as a streaming server. An example of a method according to the technique includes inserting a removable storage device that includes programs associated with a streaming application, running one or more of the programs, ensuring that a streaming software player is installed, and executing a streaming-related activity associated with the streaming application. An example of a system according to the technique includes a means for providing a streaming application that expects content to be found on a media drive, a means for intercepting requests for content expected to be found on the media drive, and a means for honoring the requests with content from a different media location.

Type: Grant

Filed: November 26, 2012

Date of Patent: February 3, 2015

Assignee: Numecent Holdings, Inc.

Inventors: Jeffrey de Vries, Greg Zavertnik, Ann Hubbell
Optimizing libraries for validating C++ programs using symbolic execution

Patent number: 8943487

Abstract: Particular embodiments optimize a C++ function comprising one or more loops for symbolic execution, comprising for each loop, if there is a branching condition within the loop, then rewrite the loop to move the branching condition outside the loop. Particular embodiments may further optimize the C++ function through simplified symbolic expressions and adding constructs forcing delayed interpretation of symbolic expressions during the symbolic execution.

Type: Grant

Filed: January 20, 2011

Date of Patent: January 27, 2015

Assignee: Fujitsu Limited

Inventors: Guodong Li, Sreeranga P. Rajan, Indradeep Ghosh
Instruction scheduling approach to improve processor performance

Patent number: 8935685

Abstract: A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the design comprises data for processor instruction latency and execution pipes, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream.

Type: Grant

Filed: April 28, 2012

Date of Patent: January 13, 2015

Assignee: International Business Machines Corporation

Inventors: Juergen Koehl, Jens Leenstra, Philipp Panitz, Hans Schlenker
Error-code and exception-based function dispatch tables

Patent number: 8935686

Abstract: A condition detected by a virtual routine may be treated by setting an error code or raising an exception, depending on circumstances. Enhanced vtable layouts promote availability of both error-ID-based and exception-based virtual routines, while maintaining compatibility. Compilers treat virtual routines based on their circumstances. One enhanced vtable includes error-ID-based routine pointers in a COM-layout-compatible portion and exception-based routine pointers in an extension. For a virtual routine not overridden by a derived class, a compiler generates a direct call. For an object instance of a specific type, the compiler generates a direct exception-based call for the object's routine. For a factory-sourced object's routine, the compiler generates a virtual exception-based call. When the virtual routine belongs to a component having an enhanced vtable, the compiler may generate a virtual call using the exception-based routine pointer. Code wrappers between COM and native format may also be used.

Type: Grant

Filed: August 28, 2012

Date of Patent: January 13, 2015

Assignee: Microsoft Corporation

Inventors: Deon Brewis, James Springfield, Sridhar S. Madhugiri
System, methods and apparatus for program optimization for multi-threaded processor architectures

Patent number: 8930926

Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

Type: Grant

Filed: April 16, 2010

Date of Patent: January 6, 2015

Assignee: Reservoir Labs, Inc.

Inventors: Cedric Bastoul, Richard A. Lethin, Allen K. Leung, Benoit J. Meister, Peter Szilagyi, Nicolas T. Vasilache, David E. Wohlford
Compiler for X86-based many-core coprocessors

Patent number: 8918770

Abstract: A system and method for compiling includes, for a parallelizable code portion of an application stored on a computer readable storage medium, determining one or more variables that are to be transferred to and/or from a coprocessor if the parallelizable code portion were to be offloaded. A start location and an end location are determined for at least one of the one or more variables as a size in memory. The parallelizable code portion is transformed by inserting an offload construct around the parallelizable code portion and passing the one or more variables and the size as arguments of the offload construct such that the parallelizable code portion is offloaded to a coprocessor at runtime.

Type: Grant

Filed: August 24, 2012

Date of Patent: December 23, 2014

Assignee: NEC Laboratories America, Inc.

Inventors: Nishkam Ravi, Tao Bao, Ozcan Ozturk, Srimat Chakradhar
Methodology for fast detection of false sharing in threaded scientific codes

Patent number: 8898648

Abstract: A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.

Type: Grant

Filed: November 30, 2012

Date of Patent: November 25, 2014

Assignee: International Business Machines Corporation

Inventors: I-Hsin Chung, Guojing Cong, Hiroki Murata, Yasushi Negishi, Hui-Fang Wen
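
The final check described in the abstract of patent 8898648 above, whether two different portions of one cache line are accessed, can be sketched over a recorded access trace. The trace format `(thread, address, size, is_write)`, the 64-byte line size, the requirement that at least one access be a write, and the name `false_sharing_lines` are all assumptions for illustration; the sketch also assumes no access straddles a line boundary. It does not model the profiling, static analysis, or mapping libraries named in the abstract.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache line size in bytes


def false_sharing_lines(trace, line_size=LINE_SIZE):
    """Return cache line indices where different threads touch disjoint bytes.

    trace: iterable of (thread, address, size, is_write) tuples.
    A line is flagged when two threads access non-overlapping byte ranges
    in it and at least one of the two accesses is a write.
    """
    per_line = defaultdict(list)
    for thread, addr, size, is_write in trace:
        per_line[addr // line_size].append((thread, addr, addr + size, is_write))

    flagged = set()
    for line, accesses in per_line.items():
        for i, (t1, lo1, hi1, w1) in enumerate(accesses):
            for t2, lo2, hi2, w2 in accesses[i + 1:]:
                disjoint = hi1 <= lo2 or hi2 <= lo1
                if t1 != t2 and disjoint and (w1 or w2):
                    flagged.add(line)
    return sorted(flagged)


# Threads 0 and 1 write adjacent 8-byte slots in line 0 (false sharing),
# and both touch the same 8 bytes in line 2 (true sharing, not flagged).
example_trace = [(0, 0, 8, True), (1, 8, 8, True),
                 (0, 128, 8, True), (1, 128, 8, False)]
flagged_lines = false_sharing_lines(example_trace)
```
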
