Including Scheduling Instructions Patents (Class 717/161)
-
Patent number: 12254348
Abstract: An information processing apparatus includes: an obtainer that obtains sensing data; a common neural network (NN) that inputs the sensing data into an inference model to obtain a result of inference and information on a processing time for a plurality of tasks subsequent to the processing performed by the inference model; and an NN inference computation management unit that determines a task schedule for a task processing unit that processes the plurality of subsequent tasks to process the plurality of subsequent tasks on the basis of the information on the processing time for the plurality of subsequent tasks and inputs the result of the inference into the task processing unit to process the plurality of subsequent tasks according to the determined task schedule.
Type: Grant
Filed: December 29, 2022
Date of Patent: March 18, 2025
Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
Inventors: Masaki Takahashi, Yohei Nakata, Yasunori Ishii, Tomoyuki Okuno
-
Patent number: 12182554
Abstract: Solutions for improving parallelization of computer programs interleave machine instruction placement in memory. A compiler decomposes a software loop in stages to interleave instructions such that, for contiguous sets of instructions having some minimum length (e.g., each set has at least two to four instructions), instructions within a set have no dependency on prior instructions within that set. This enables the compiled program to be more fully parallelized, for example either by a superscalar processor executing the compiled program or by the compiler turning each set of instructions into a very long instruction word (VLIW), and so to benefit automatically from the disclosed interleaving of instructions that eliminates dependencies.
Type: Grant
Filed: September 9, 2022
Date of Patent: December 31, 2024
Assignee: Microsoft Technology Licensing, LLC.
Inventor: Roman Snytsar
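As a rough illustration of the interleaving idea in the abstract above (not the patented compiler pass), the Python sketch below emits an instruction order in which each run of `width` consecutive instructions comes from different loop iterations, so no instruction in a run depends on an earlier one in the same run. The stage names, the `interleave` helper, and the assumption that loop iterations are independent of one another are hypothetical.

```python
def interleave(stages, iterations, width):
    """Emit stage instructions so that `width` consecutive instructions
    always belong to different iterations (assumed independent)."""
    order = []
    for base in range(0, iterations, width):
        group = range(base, min(base + width, iterations))
        for stage in stages:
            # Same stage, different iterations: no dependency inside this run.
            order.extend(f"{stage}[{i}]" for i in group)
    return order


# Hypothetical three-stage loop body, four iterations, runs of two.
order = interleave(["load", "mul", "store"], iterations=4, width=2)
print(order)
```

Within one iteration the stages still execute in their original order (`load`, then `mul`, then `store`); only instructions from different iterations are placed side by side.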
-
Patent number: 11893370
Abstract: According to one aspect, a method for compiling by a compilation tool a source code into a computer-executable code comprises receiving the source code as input of the compilation tool, translating the source code into an object code comprising machine instructions executable by a processor, then introducing, between machine instructions of the object code, additional instructions selected from illegal instructions and no-operation instructions so as to obtain the executable code, then delivering the executable code as output of the compilation tool.
Type: Grant
Filed: October 19, 2021
Date of Patent: February 6, 2024
Assignee: STMicroelectronics (Grand Ouest) SAS
Inventors: Michel Jaouen, Stephane Le Roy, Moise Gergaud
-
Patent number: 11875197
Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.
Type: Grant
Filed: December 29, 2020
Date of Patent: January 16, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
-
Patent number: 11775299
Abstract: Disclosed are methods, systems, and other techniques for modeling concurrency between a set of nodes to be executed on a set of execution engines of an integrated circuit device. A computation graph that includes the set of nodes is received. A set of edges connecting the set of nodes are determined based on the computation graph. An edge type for each of the set of edges is determined based on the computation graph, the edge type indicating a type of synchronization between connected nodes. A vector clock is generated for each of the set of nodes, the vector clock for a particular node being calculated based on the vector clock for each connected preceding node and the edge type for the one of the set of edges that connects each connected preceding node and the particular node.
Type: Grant
Filed: March 29, 2021
Date of Patent: October 3, 2023
Assignee: Amazon Technologies, Inc.
Inventor: Drazen Borkovic
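The vector-clock calculation described in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: the edge type names (`program_order`, `semaphore`, `unsynchronized`), the rule that only the first two propagate a predecessor's clock, and the engine names are all hypothetical.

```python
from graphlib import TopologicalSorter

# Assumption: only these hypothetical edge types carry ordering information.
PROPAGATING = {"program_order", "semaphore"}


def vector_clocks(engine_of, edges, engines):
    """engine_of: node -> engine name; edges: (src, dst) -> edge type."""
    slot = {engine: i for i, engine in enumerate(engines)}
    preds = {node: [] for node in engine_of}
    for (src, dst), kind in edges.items():
        preds[dst].append((src, kind))
    graph = {node: [p for p, _ in ps] for node, ps in preds.items()}
    clocks = {}
    for node in TopologicalSorter(graph).static_order():
        clock = [0] * len(engines)
        for pred, kind in preds[node]:
            if kind in PROPAGATING:
                # Merge the predecessor's clock component-wise.
                clock = [max(a, b) for a, b in zip(clock, clocks[pred])]
        clock[slot[engine_of[node]]] += 1  # tick this node's own engine
        clocks[node] = clock
    return clocks


def happens_before(a, b):
    """True if clock a is component-wise <= clock b and they differ."""
    return a != b and all(x <= y for x, y in zip(a, b))


engine_of = {"a": "pe", "b": "pe", "c": "dma", "d": "dma"}
edges = {
    ("a", "b"): "program_order",
    ("a", "c"): "semaphore",
    ("c", "d"): "program_order",
}
clocks = vector_clocks(engine_of, edges, ["pe", "dma"])
print(clocks)
```

Two nodes whose clocks are not ordered by `happens_before` in either direction (here `b` and `c`) may run concurrently under this model.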
-
Patent number: 11556517
Abstract: An example operation includes one or more of solving, by a scheduler node, an integer programming problem of maximizing a sum of organizations' endorsing peers that run chaincodes from a plurality of chaincodes within a consortium, making, by the scheduler node, endorsement policies (EPs) for the chaincodes from the plurality of the chaincodes to be satisfiable at any time, applying administrator's constraint of available endorsing peers to the maximized sum of organizations' endorsing peers, and adding resulting endorsing peers to a maintenance list.
Type: Grant
Filed: May 17, 2020
Date of Patent: January 17, 2023
Assignee: International Business Machines Corporation
Inventors: Nir Rozenbaum, Artem Barger, Yacov Manevich
-
Patent number: 11550554
Abstract: A computer device is provided that includes a processor configured to receive a source code for a program including at least two code files, and process the source code for the program to generate a machine-level code file for each of the at least two code files of the source code. The processor is further configured to generate control flow graph data for each machine-level code file generated for the at least two code files of the source code, generate a machine-level intermediate representation for each machine-level code file using a machine-level code file and the generated control flow graph data for that machine-level code file, merge the machine-level intermediate representations into a merged machine-level intermediate representation, and perform machine-level optimizations on the merged machine-level intermediate representation and output an optimized merged machine-level intermediate representation.
Type: Grant
Filed: January 7, 2021
Date of Patent: January 10, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Xiang Li
-
Patent number: 11483256
Abstract: Systems and methods are disclosed for reducing latency and power consumption of on-chip movement through an approximate communication framework for network-on-chips (“NoCs”). The technology leverages the fact that big data applications (e.g., recognition, mining, and synthesis) can tolerate modest error and transfers data with the necessary accuracy, thereby improving the energy-efficiency and performance of multi-core processors.
Type: Grant
Filed: May 4, 2021
Date of Patent: October 25, 2022
Assignee: The George Washington University
Inventors: Ahmed Louri, Yuechen Chen
-
Patent number: 11379943
Abstract: To optimize the compilation of shaders for execution within an application, a computer system discovers the context in which the shaders are executed. The application is compiled and executed on a target platform. Snapshots of the application during execution are captured. A snapshot includes data and commands passed between the central processing unit and the graphics processing unit of the target platform to generate a single frame of graphics data. The shaders used in these snapshots are identified. These shaders are compiled with a number of different permutations of available compiler options, resulting in sets of differently compiled shaders. The snapshot is re-executed with the sets of differently compiled shaders, and performance is measured. The set of compiler options that results in compiled shaders providing better performance can be used as the set of compilation parameters for the set of shaders for this application.
Type: Grant
Filed: January 31, 2019
Date of Patent: July 5, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Ivan Nevraev, Cole Brooking, J. Andrew Goossen, Eric Christoffersen, Jason Strayer
-
Patent number: 11269711
Abstract: Failure impact analysis (or “impact analysis”) is a process that involves identifying effects of a network event that may or will result from the network event. In one example, this disclosure describes a method that includes generating, by a control system managing a resource group, a resource graph that models resource and event dependencies between a plurality of resources within the resource group; detecting, by the control system, a first event affecting a first resource of the plurality of resources, wherein the first event is a network event; and identifying, by the control system and based on the dependencies modeled by the resource graph, a second resource that is expected to be affected by the first event.
Type: Grant
Filed: July 14, 2020
Date of Patent: March 8, 2022
Assignee: Juniper Networks, Inc.
Inventors: Jayanthi R, Javier Antich, Chandrasekhar A
-
Patent number: 11265351
Abstract: A management system manages a plurality of information handling systems by creating custom policies for each information handling system based on information gathered from or about each information handling system indicating, e.g., the user's intent, use, request for usage, security posture, productivity needs, and/or behavior. The management system creates custom policies to avoid unnecessarily impacting a user's productivity.
Type: Grant
Filed: January 24, 2019
Date of Patent: March 1, 2022
Assignee: Dell Products L.P.
Inventors: Carlton A. Andrews, Girish S. Dhoble, Joseph Kozlowski
-
Patent number: 11262989
Abstract: A computing system includes a compatibility graph builder to generate a compatibility graph based on a dependency graph representing program source code, where the compatibility graph indicates compatibility relationships between operations represented in the dependency graph, a clique generator coupled with the compatibility graph builder to generate a set of candidate vector packings based on the compatibility relationships indicated in the compatibility graph, a set cover generator coupled with the clique generator to select a subset of vector packings from the set of candidate vector packings, and a vector code generator coupled with the set cover generator to generate the vector code based on the selected subset of vector packings.
Type: Grant
Filed: October 24, 2019
Date of Patent: March 1, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Abhilash Bhandari, Venugopal Raghavan, Mohammad Asghar Ahmad Shahid, Anupama Rajesh Rasale
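The pipeline named in the abstract above (compatibility graph, cliques as candidate packings, set cover to pick a subset) can be sketched on a toy example. The Python below is an illustration under simplifying assumptions, not the patented system: two operations are treated as compatible when they share an opcode and have no direct dependence, cliques are enumerated by brute force, and the set cover is a greedy pass that only accepts packings disjoint from those already chosen. All names are hypothetical.

```python
from itertools import combinations


def build_compatibility(ops, depends):
    """ops: name -> opcode; depends: set of (a, b) direct dependence pairs."""
    compat = set()
    for a, b in combinations(sorted(ops), 2):
        independent = (a, b) not in depends and (b, a) not in depends
        if ops[a] == ops[b] and independent:
            compat.add((a, b))
    return compat


def candidate_packings(ops, compat, width):
    """Every clique of size `width` in the compatibility graph."""
    return [group for group in combinations(sorted(ops), width)
            if all(pair in compat for pair in combinations(group, 2))]


def select_packings(ops, candidates):
    """Greedy cover: repeatedly take a packing made only of uncovered ops."""
    uncovered, chosen = set(ops), []
    while candidates:
        best = max(candidates, key=lambda g: len(uncovered & set(g)))
        if len(uncovered & set(best)) < len(best):
            break  # no remaining packing is fully uncovered
        chosen.append(best)
        uncovered -= set(best)
    return chosen, sorted(uncovered)


ops = {"a0": "add", "a1": "add", "a2": "add", "a3": "add",
       "m0": "mul", "m1": "mul"}
depends = {("a0", "a3")}  # a3 consumes the result of a0
compat = build_compatibility(ops, depends)
chosen, leftover = select_packings(ops, candidate_packings(ops, compat, 2))
print(chosen, leftover)
```

Operations left in `leftover` would stay scalar; a vector code generator would emit one vector instruction per chosen packing.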
-
Patent number: 11256513
Abstract: There is provided an apparatus that includes input circuitry to receive input data and output circuitry to output a sequence of instructions to be executed by data processing circuitry. Generation circuitry performs a generation process to generate the sequence of instructions using the input data. The sequence of instructions comprises an indirect control flow instruction having a field that indicates where a target of the indirect control flow instruction is stored. The generation process causes at least one of the instructions in the sequence of instructions to store a state of control flow speculation after execution of the indirect control flow instruction. The at least one of the instructions in the sequence of instructions that stores the state of control flow speculation is inhibited from being subject to data value speculation by the data processing circuitry.
Type: Grant
Filed: March 14, 2019
Date of Patent: February 22, 2022
Assignee: Arm Limited
Inventors: Richard William Earnshaw, Kristof Evariste Georges Beyls, James Greenhalgh
-
Patent number: 11200038
Abstract: Techniques for an ultra-fast software compilation of source code are provided. A compiler receives software code and may divide it into code sections. A map of ordered nodes may be generated, such that each node in the map may include a code section and the order of the nodes indicates an execution order of the software code. Each code section may be compiled into an executable object in parallel and independently from other code sections. A binary executable may be generated by linking executable objects generated from the code sections. The methodology significantly differs from existing source code compilation techniques because conventional compilers build executables sequentially, whereas the embodiments divide the source code into multiple smaller code sections and compile them individually and in parallel. Compiling multiple code sections in parallel improves compilation by an order of magnitude over conventional techniques.
Type: Grant
Filed: June 25, 2020
Date of Patent: December 14, 2021
Assignee: PayPal, Inc.
Inventor: Abraham Richard Hoffman
-
Patent number: 11093298
Abstract: A method for job management in an HPC environment includes determining an unallocated subset from a plurality of HPC nodes, with each of the unallocated HPC nodes comprising an integrated fabric. An HPC job is selected from a job queue and executed using at least a portion of the unallocated subset of nodes.
Type: Grant
Filed: March 5, 2020
Date of Patent: August 17, 2021
Assignee: Raytheon Company
Inventors: Shannon V. Davidson, Anthony N. Richoux
-
Patent number: 11010302
Abstract: A mechanism is described for facilitating general purpose input/output data capture and neural cache system for autonomous machines. A method of embodiments, as described herein, includes capturing, by an image capturing device, one or more images of one or more objects, where the one or more images represent input data associated with a neural network. The method may further include determining accuracy of first output results generated by a default neural caching system by comparing the first output results with second output results predicted by a custom neural caching system. The method may further include outputting, based on the accuracy, final output results including at least one of the first output results or the second output results.
Type: Grant
Filed: October 5, 2016
Date of Patent: May 18, 2021
Assignee: INTEL CORPORATION
Inventors: Liwei Ma, Jiqiang Song
-
Patent number: 10901715
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for lazy compilation and kernel fusion in dynamic computation graphs. One of the operations is performed by generating an input graph based on translation of user code into an expression graph. The expression graph represents control flow dependencies of operations of the generated input graph. Optimization of the input graph is then performed by iterative application of optimization rules to the input graph. An optimized version of the input graph results from the application of the optimization rules. A transformation graph then is generated by comparing changes made from the original input graph to the final optimized version of the input graph. The transformation graph provides a blueprint such that the system may recreate the optimization of a similarly structured later generated input graph without having to reapply the optimization rules.
Type: Grant
Filed: September 26, 2019
Date of Patent: January 26, 2021
Inventor: Jonathan Raiman
-
Patent number: 10890958
Abstract: The present disclosure provides a centralized power meter for a signal processing circuit, comprising: M sample buffers, each configured to buffer samples respectively from at least one of N sources, and trigger a request for power calculation of the buffered samples in response to the buffered samples, the request having a corresponding priority; a switch, configured to route the requests from the M sample buffers to one or more power calculation cores; the one or more power calculation cores, each configured to retrieve the samples from the sample buffer in an order of their corresponding priorities, in response to the routed requests, and to perform power calculation of the retrieved samples, wherein N and M are integers no less than 1, and N is no less than M. The present disclosure further provides a centralized power calculation method.
Type: Grant
Filed: September 9, 2015
Date of Patent: January 12, 2021
Assignee: Telefonaktiebolaget LM Ericsson (publ)
Inventors: Gan Wen, Ge Huang, Kaifeng Zhang
-
Patent number: 10884744
Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask.
Type: Grant
Filed: August 18, 2017
Date of Patent: January 5, 2021
Assignee: Intel Corporation
Inventors: Mikhail Plotnikov, Andrey Naraikin, Christopher J. Hughes
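The two steps in the abstract above, building a control mask from the loop condition and then compressing the iteration indexes according to that mask, can be mimicked in scalar Python. This is only an illustrative sketch: the helper names, the sample data, and the choice of 1 and 0 as the "execute" and "bypass" bit values are assumptions, and a real vectorizer would use hardware mask and compress instructions rather than list comprehensions.

```python
def control_mask(values, condition):
    """One bit per iteration: 1 = execute the operation, 0 = bypass it."""
    return [1 if condition(v) else 0 for v in values]


def compress_indexes(start, mask):
    """Pack the indexes of the enabled iterations contiguously."""
    return [start + lane for lane, bit in enumerate(mask) if bit]


# Hypothetical loop: double every positive element, skip the rest.
data = [3, -1, 4, -5, 9, -2, 6, 0]
mask = control_mask(data, lambda v: v > 0)
packed = compress_indexes(0, mask)

result = list(data)
for i in packed:  # dense work list: only iterations that do real work
    result[i] = data[i] * 2

print(mask, packed, result)
```

Because `packed` holds only the enabled iterations, the work that follows runs over a dense list instead of wasting lanes on bypassed iterations.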
-
Patent number: 10606635
Abstract: Provided is an accelerator control apparatus including a data management table storing a name assigned to data and an identifier for an accelerator that stores the data on a local memory by associating the name and the identifier; a data management unit that is configured to determine, when receiving a first process that accepts data assigned with the name as input data, the accelerator that stores the data on the local memory, by referring to the data management table; and a task processing unit that is configured to control the accelerator being determined by data management unit to execute the first process.
Type: Grant
Filed: May 10, 2016
Date of Patent: March 31, 2020
Assignee: NEC CORPORATION
Inventors: Jun Suzuki, Masaki Kan, Yuki Hayashi
-
Patent number: 10402176
Abstract: Methods, apparatus, systems and articles of manufacture to compile code to generate dataflow code are described. An example compiler apparatus includes an intermediate representation transformer to transform input software code to intermediate representation code; an instruction selector to insert machine instructions of a target execution platform in the intermediate representation code to generate machine intermediate representation code; and a target machine transformer to: convert a portion of the machine intermediate representation code to dataflow code to generate dataflow intermediate representation code; and allocate registers within the dataflow intermediate representation code.
Type: Grant
Filed: December 27, 2017
Date of Patent: September 3, 2019
Assignee: Intel Corporation
Inventors: Kent Glossop, Kermin Fleming, Yongzhi Zhang, Simon Steely, Jr., Jim Sukha, Uma Srinivasan
-
Patent number: 10379885
Abstract: A method and system for enhanced local commoning optimization of compilation of a program. Within a first pass of a two pass approach, a determination is made as to where in the program to evaluate volatile expressions that can be commoned. In a second pass of the two pass approach, all remaining expressions that are not volatile expressions are commoned.
Type: Grant
Filed: November 16, 2017
Date of Patent: August 13, 2019
Assignee: International Business Machines Corporation
Inventors: Andrew J. Craik, Patrick R. Doyle, Vijay Sundaresan
-
Patent number: 10360652
Abstract: A processor comprising hardware logic configured to execute a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.
Type: Grant
Filed: June 13, 2014
Date of Patent: July 23, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
-
Patent number: 10102908
Abstract: A microcomputer comprising a microprocessor unit and a first memory unit is disclosed. In one aspect, the microprocessor unit comprises at least one functional unit and at least one register. Further, the at least one register is a wide register comprising a plurality of second memory units, each capable of containing one word, the wide register being adapted so that the second memory units are simultaneously accessible by the first memory unit, and at least part of the second memory units are separately accessible by the at least one functional unit. Further, the first memory unit is an embedded non-volatile memory unit.
Type: Grant
Filed: February 15, 2018
Date of Patent: October 16, 2018
Assignee: IMEC
Inventors: Francky Catthoor, Komalan Manu Perumkunnil, Stefan Cosemans
-
Patent number: 9513976
Abstract: A computer-implemented method and a corresponding computer system for emulation of Extended Memory Semantics (EMS) operations. The method and system include obtaining a set of computer instructions that include an EMS operation, converting the EMS operation into a corresponding atomic memory operation (AMO), and executing the AMO on at least one processor of a computer.
Type: Grant
Filed: December 30, 2011
Date of Patent: December 6, 2016
Assignee: Intel Corporation
Inventor: Roger Golliver
-
Patent number: 9495225
Abstract: A thread priority control mechanism is provided which uses the completion event of the preceding transaction to raise the priority of the next transaction in the order of execution when the transaction status has been changed from speculative to non-speculative. In one aspect of the present invention, a thread-level speculation mechanism is provided which has content-addressable memory, an address register and a comparator for recording transaction footprints, and a control logic circuit for supporting memory synchronization instructions. This supports hardware transaction memory in detecting transaction conflicts. This thread-level speculation mechanism includes a priority up bit for recording an attribute operand in a memory synchronization instruction, a means for generating a priority up event when a thread wake-up event has occurred and the priority up bit is 1, and a means for preventing the CAM from storing the load/store address when the instruction is a non-transaction instruction.
Type: Grant
Filed: October 24, 2013
Date of Patent: November 15, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Christian Jacobi, Marcel Mitran, Moriyoshi Ohara
-
Patent number: 9436582
Abstract: Embodiments include dividing source code for an application into multiple program fragments by generating a control flow graph for the multiple program fragments. The control flow graph represents a graph structure with nodes representing the multiple program fragments and edges representing an execution order of the program fragments. Aspects include searching for a chosen assertion statement within a program fragment, wherein the chosen assertion statement must be satisfied for correct execution of the chosen program fragment. Aspects also include identifying an immediate parent program fragment for the chosen program fragment using the control flow graph and calculating an immediate parent assertion statement for the immediate parent program fragment using the chosen assertion logic statement. The immediate parent assertion statement is an over-approximate pre-condition of the chosen program fragment.
Type: Grant
Filed: November 18, 2015
Date of Patent: September 6, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Viresh Paruthi, Mitra Purandare
-
Patent number: 9405596
Abstract: Code versioning for enabling transactional memory region promotion may include receiving a portion of candidate source code; outlining the portion of candidate source code received for parallel execution; wrapping a critical region with entry and exit routines to enter into a speculation sub-process, wherein the entry and exit routines also gather conflict statistics at run time; and generating an outlined code portion comprising multiple loop versions using a processor.
Type: Grant
Filed: October 2, 2014
Date of Patent: August 2, 2016
Assignee: GlobalFoundries, Inc.
Inventors: Hans Boettiger, Yaoqing Gao, Martin Ohmacht, Kai-Ting Amy Wang
-
Patent number: 9367658
Abstract: Embodiments of the invention provide a method and apparatus for generating programmable logic for a hardware accelerator, the method comprising: generating a graph of nodes representing the programmable logic to be implemented in hardware; identifying nodes within the graph that affect external flow control of the programmable logic; retaining the identified nodes and removing or replacing all nodes which do not affect external flow control of the programmable logic in a modified graph; and simulating the modified graph or building a corresponding circuit of the retained nodes.
Type: Grant
Filed: June 22, 2011
Date of Patent: June 14, 2016
Assignee: Maxeler Technologies Ltd.
Inventors: Oliver Pell, James Huggett
-
Patent number: 9355175
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing search results. In one aspect, a method includes receiving a query. A plurality of search results responsive to the query are identified. The search results are analyzed to determine that at least a first search result is associated with a first answer box topic. The search results are provided along with an answer box precursor for the first answer box topic.
Type: Grant
Filed: May 27, 2011
Date of Patent: May 31, 2016
Assignee: Google Inc.
Inventors: Tal Cohen, Ziv Bar-Yossef, Igor Tsvetkov, Adi Mano, Oren Naim, Nitsan Oz, Nir Andelman, Pravir K. Gupta
-
Patent number: 9348600
Abstract: A method and a system are provided for prioritizing the fetching of instructions for each of a plurality of executing instruction threads in a multi-threaded processor. Instructions come from at least one source of instructions. Each thread has a number of instructions buffered for execution in an instruction buffer. A first metric for each thread is determined based on the number of instructions currently buffered. A second metric is then determined for each thread, this being an execution based metric. A priority order for the threads is determined from the first and second metrics, and an instruction is fetched from the source for the thread with the highest determined priority which is requesting an instruction.
Type: Grant
Filed: February 9, 2009
Date of Patent: May 24, 2016
Assignee: Imagination Technologies Limited
Inventor: Andrew Webber
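A small Python sketch can make the selection rule in the abstract above concrete. The abstract does not say how the two metrics are computed or combined, so everything specific here is an assumption for illustration: the first metric is taken as the free space in a hypothetical 8-entry instruction buffer, the second is an arbitrary per-thread integer, and the priority is their sum.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    name: str
    buffered: int      # instructions currently in the instruction buffer
    exec_metric: int   # hypothetical execution-based metric (higher = more urgent)
    requesting: bool   # whether the thread is asking for an instruction


def pick_thread(threads, capacity=8):
    """Return the name of the highest-priority thread that is requesting."""
    def priority(t):
        fill_metric = capacity - t.buffered  # emptier buffer -> higher priority
        return fill_metric + t.exec_metric   # assumed combination rule
    for t in sorted(threads, key=priority, reverse=True):
        if t.requesting:
            return t.name
    return None


threads = [
    ThreadState("T0", buffered=6, exec_metric=3, requesting=True),
    ThreadState("T1", buffered=1, exec_metric=2, requesting=False),
    ThreadState("T2", buffered=3, exec_metric=1, requesting=True),
]
winner = pick_thread(threads)
print(winner)
```

`T1` has the highest combined priority but is not requesting an instruction, so the fetch goes to the next thread in priority order that is.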
-
Patent number: 9317265
Abstract: Disclosed here are methods, systems, paradigms and structures for optimizing intermediate representation (IR) of a script code for atomic execution. Atomic execution of the script is achieved by generating portions of the IR as an atomic transaction. In an atomic transaction, a series of operations either all execute, or none executes. The IR includes checkpoints that evaluate to one of two possible values. The checkpoint evaluates to a first value when there is no error during execution, and evaluates to a second value when an error occurs. The IR is optimized for atomic execution by regenerating a portion of the IR including the checkpoint and code associated with the checkpoint as a transaction. When an error occurs during the execution of the transaction, the transaction is aborted and a state of execution of the script code is reverted to a state prior to the beginning of the transaction.
Type: Grant
Filed: March 25, 2013
Date of Patent: April 19, 2016
Assignee: Facebook, Inc.
Inventors: Ali-Reza Adl-Tabatabai, Guilherme de Lima Ottoni, Michael Paleczny
-
Patent number: 9292287
Abstract: Provided is a loop scheduling method including scheduling a first loop using execution units, and scheduling a second loop using execution units available as a result of the scheduling of the first loop. An n-th loop (n>2) may be scheduled using a result of scheduling an (n-1)-th loop, similar to the (n-1)-th loop. The first loop may be a higher priority loop than the second loop.
Type: Grant
Filed: July 14, 2014
Date of Patent: March 22, 2016
Assignee: Samsung Electronics Co., Ltd.
Inventors: Yeon Bok Lee, Young Hwan Park, Ho Yang, Keshava Prasad
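The cascading allocation in the abstract above, where each loop is scheduled onto whatever execution units the previous loop left free, can be sketched as follows. This is a deliberately simplified illustration: a real scheduler places individual operations in time slots, whereas this hypothetical `schedule_loops` helper only hands out whole units, and the unit names and per-loop demands are made up.

```python
def schedule_loops(loops, units):
    """loops: list of (name, units_wanted), highest priority first.
    Each loop takes units from those left over by the loops before it."""
    free = list(units)
    plan = {}
    for name, wanted in loops:
        plan[name] = free[:wanted]   # may receive fewer units than requested
        free = free[wanted:]         # the remainder goes to the next loop
    return plan


units = [f"fu{i}" for i in range(6)]
plan = schedule_loops([("loop1", 3), ("loop2", 2), ("loop3", 2)], units)
print(plan)
```

In this run the third loop asks for two units but only one is left after the first two loops have been scheduled.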
-
Patent number: 9239712
Abstract: Apparatuses and methods may provide for determining a level of performance for processing one or more loops by a dynamic compiler and executing code optimizations to generate a pipelined schedule for the one or more loops that achieves the determined level of performance within a prescribed time period. In one example, a dependence graph may be established for the one or more loops, and each dependence graph may be partitioned into stages based on the level of performance.
Type: Grant
Filed: March 29, 2013
Date of Patent: January 19, 2016
Assignee: Intel Corporation
Inventors: Hongbo Rong, Hyunchul Park, Youfeng Wu
-
Patent number: 9158658
Abstract: A method, system, and computer program product for detecting merge conflicts and compilation errors in a collaborative integrated development environment are provided in the illustrative embodiments. Prior to at least one user committing a set of uncommitted changes associated with a source code to a repository, the computer receives the set of uncommitted changes associated with the source code. The computer creates at least one temporary branch corresponding to the set of uncommitted changes associated with the source code. The computer device merges the at least one temporary branch to corresponding portions of the source code. The computer determines whether a merge conflict has occurred. If the merge conflict occurred, the computer communicates a first notification to the at least one user, the first notification indicating the merge conflict.
Type: Grant
Filed: October 15, 2013
Date of Patent: October 13, 2015
Assignee: International Business Machines Corporation
Inventors: George T. Bigwood, Jason T. McMann, Michael G. Nikitaides, Kaleb D. Walton
-
Patent number: 9152422
Abstract: An apparatus and method for compressing trace data is provided. The apparatus includes a detection unit configured to detect trace data corresponding to one or more function units performing a substantially significant operation in a reconfigurable processor as valid trace data, and a compression unit configured to compress the valid trace data.
Type: Grant
Filed: July 7, 2011
Date of Patent: October 6, 2015
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jae-Young Kim, Dong-Hoon Yoo, Yeon-Gon Cho, Hee-Jun Shim, Chang-Moo Kim
-
Patent number: 9063735
Abstract: Provided are a reconfigurable processor, which is capable of reducing the probability of an incorrect computation by analyzing the dependence between memory access instructions and allocating the memory access instructions between a plurality of processing elements (PEs) based on the results of the analysis, and a method of controlling the reconfigurable processor. The reconfigurable processor extracts an execution trace from simulation results, and analyzes the memory dependence between instructions included in different iterations based on parts of the execution trace of memory access instructions.
Type: Grant
Filed: October 13, 2011
Date of Patent: June 23, 2015
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hee-Jin Ahn, Dong-Hoon Yoo, Bernhard Egger, Min-Wook Ahn, Jin-Seok Lee, Tai-Song Jin, Won-Sub Kim
-
Patent number: 9043582
Abstract: Systems and methods for static code scheduling are disclosed. A method can include receiving an intermediate representation of source code, building a directed acyclic graph (DAG) for the intermediate representation, and creating chains of dependent instructions from the DAG for cluster formation. The chains are merged into clusters and each node in the DAG is marked with an identifier of a cluster it is part of to generate a marked instruction DAG. Instruction DAG scheduling is then performed using information about the clusters to generate an ordered intermediate representation of the source code.
Type: Grant
Filed: September 14, 2012
Date of Patent: May 26, 2015
Assignee: Qualcomm Innovation Center, Inc.
Inventor: Sergei Larin
-
Publication number: 20150100950
Abstract: A method for scheduling loop processing of a reconfigurable processor includes generating a dependence graph of instructions for the loop processing; mapping a first register file of the reconfigurable processor on an arrow indicating inter-iteration dependence on the dependence graph; and searching for schedules of the instructions based on the mapping result.
Type: Application
Filed: October 6, 2014
Publication date: April 9, 2015
Inventors: Min-wook AHN, Won-sub Kim, Tai-song Jin, Seung-won Lee, Jin-seok Lee
-
Patent number: 8990547
Abstract: Systems, methodologies, computer-readable media, and other embodiments associated with ordering instructions are described. One exemplary system embodiment can include an analysis logic configured to analyze executable instructions from an executable program. A re-write logic can be configured to re-order selected load instructions within the executable program based on latency times for the selected load instructions.
Type: Grant
Filed: August 23, 2005
Date of Patent: March 24, 2015
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: James R. Callister, Richard E. Hank, Teresa L. Johnson
-
Patent number: 8984525
Abstract: A method for job management in an HPC environment includes determining an unallocated subset from a plurality of HPC nodes, with each of the unallocated HPC nodes comprising an integrated fabric. An HPC job is selected from a job queue and executed using at least a portion of the unallocated subset of nodes.
Type: Grant
Filed: October 11, 2013
Date of Patent: March 17, 2015
Assignee: Raytheon Company
Inventors: Shannon V. Davidson, Anthony N. Richoux
-
Patent number: 8972961
Abstract: A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the design comprises data for processor instruction latency and execution pipes, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream.
Type: Grant
Filed: May 11, 2011
Date of Patent: March 3, 2015
Assignee: International Business Machines Corporation
Inventors: Juergen Koehl, Jens Leenstra, Philipp Panitz, Hans Schlenker
-
Patent number: 8959497
Abstract: One embodiment of the present invention sets forth a technique for partitioning a predecessor thread program into sub-programs and dynamically spawning a thread grid of the sub-programs based on the outcome of a conditional statement in the predecessor thread program. The programming instructions for the predecessor thread program are analyzed to assess the benefit of partitioning the thread program at a conditional statement into sub-programs. If the predecessor thread program is partitioned, then each branch of the conditional statement may be used to form a separate sub-program. Predicate tables are populated at the predecessor thread program run-time to establish which possible instances of the thread sub-programs should be spawned in subsequent execution phases.
Type: Grant
Filed: August 29, 2008
Date of Patent: February 17, 2015
Assignee: NVIDIA Corporation
Inventors: John A. Stratton, David Luebke
-
Patent number: 8954946
Abstract: A static branch prediction method and code execution method for a pipeline processor, and a code compiling method for static branch prediction, are provided herein. The static branch prediction method includes predicting a conditional branch code as taken or not-taken, adding the prediction information, converting the conditional branch code into a jump target address setting (JTS) code including target address information, branch time information, and a test code, and scheduling codes in a block. The code may be scheduled into a last slot of the block, and the JTS code may be scheduled into an empty slot after all the other codes in the block are scheduled. When the conditional branch code is predicted as taken in the prediction operation, a target address indicated by the target address information may be fetched at a cycle time indicated by the branch time information.
Type: Grant
Filed: January 25, 2010
Date of Patent: February 10, 2015
Assignee: Samsung Electronics Co., Ltd.
Inventors: Tai-song Jin, Dong-kwan Suh, Suk-jin Kim
-
Patent number: 8949820
Abstract: A technique for streaming from a media device involves enabling a local device to function as a streaming server. An example of a method according to the technique includes inserting a removable storage device that includes programs associated with a streaming application, running one or more of the programs, ensuring that a streaming software player is installed, and executing a streaming-related activity associated with the streaming application. An example of a system according to the technique includes a means for providing a streaming application that expects content to be found on a media drive, a means for intercepting requests for content expected to be found on the media drive, and a means for honoring the requests with content from a different media location.
Type: Grant
Filed: November 26, 2012
Date of Patent: February 3, 2015
Assignee: Numecent Holdings, Inc.
Inventors: Jeffrey de Vries, Greg Zavertnik, Ann Hubbell
-
Patent number: 8949809
Abstract: A system and associated method for automatically pipeline parallelizing a nested loop in sequential code over a predefined number of threads. Pursuant to task dependencies of the nested loop, each subloop of the nested loop is allocated to a respective thread. Combinations of stage partitions executing the nested loop are configured for parallel execution of a subloop where permitted. For each combination of stage partitions, a respective bottleneck is calculated and a combination with a minimum bottleneck is selected for parallelization.
Type: Grant
Filed: March 1, 2012
Date of Patent: February 3, 2015
Assignee: International Business Machines Corporation
Inventors: Pradeep Varma, Manish Gupta, Monika Gupta, Naga Praveen Kumar Katta
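Illustration (not taken from the patent): the selection step in the abstract above, computing a bottleneck for each candidate partition and keeping the minimum, can be sketched as a brute-force search. The Python sketch below assumes, purely for the example, that subloops have known per-iteration costs, that a pipeline stage is a contiguous run of subloops executed by one thread, and that a stage's cost is the sum of its subloop costs. The name `best_partition` is hypothetical.

```python
from itertools import combinations


def best_partition(costs, threads):
    """Return (bottleneck, stages) with the smallest maximum stage cost.

    costs: assumed per-iteration cost of each subloop, in program order.
    threads: maximum number of pipeline stages (one thread per stage).
    """
    n = len(costs)
    best = None
    for stage_count in range(1, min(threads, n) + 1):
        # Choose where to cut the sequence into contiguous stages.
        for cuts in combinations(range(1, n), stage_count - 1):
            bounds = (0, *cuts, n)
            stages = [costs[a:b] for a, b in zip(bounds, bounds[1:])]
            # The slowest stage limits pipeline throughput.
            bottleneck = max(sum(stage) for stage in stages)
            if best is None or bottleneck < best[0]:
                best = (bottleneck, stages)
    return best


bottleneck, stages = best_partition([4, 2, 3, 7, 1], threads=3)
print(bottleneck, stages)
```

Exhaustive enumeration is only practical for a handful of subloops; it is used here to keep the selection criterion visible, not as a claim about how the patented system searches.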
-
Patent number: 8943487
Abstract: Particular embodiments optimize a C++ function comprising one or more loops for symbolic execution, comprising for each loop, if there is a branching condition within the loop, then rewrite the loop to move the branching condition outside the loop. Particular embodiments may further optimize the C++ function through simplified symbolic expressions and adding constructs forcing delayed interpretation of symbolic expressions during the symbolic execution.
Type: Grant
Filed: January 20, 2011
Date of Patent: January 27, 2015
Assignee: Fujitsu Limited
Inventors: Guodong Li, Sreeranga P. Rajan, Indradeep Ghosh
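Illustration (not taken from the patent): moving a branching condition out of a loop resembles the classic compiler transformation known as loop unswitching. The sketch below shows the idea in Python rather than C++ for brevity, and it assumes the condition does not change between iterations, which the abstract does not state. Both function names are hypothetical.

```python
def transform_original(values, flag):
    # The branch on `flag` is re-evaluated on every iteration, so a
    # symbolic executor may fork a new path per iteration.
    out = []
    for v in values:
        if flag:
            out.append(v * 2)
        else:
            out.append(v + 1)
    return out


def transform_unswitched(values, flag):
    # The loop-invariant branch is tested once, outside the loop; each
    # resulting loop body is then branch-free.
    out = []
    if flag:
        for v in values:
            out.append(v * 2)
    else:
        for v in values:
            out.append(v + 1)
    return out


print(transform_unswitched([1, 2, 3], True))
```

The two functions compute the same result; the rewritten form trades some code duplication for a single decision point before the loop.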
-
Patent number: 8935685
Abstract: A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the design comprises data for processor instruction latency and execution pipes, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream.
Type: Grant
Filed: April 28, 2012
Date of Patent: January 13, 2015
Assignee: International Business Machines Corporation
Inventors: Juergen Koehl, Jens Leenstra, Philipp Panitz, Hans Schlenker
-
Patent number: 8935686
Abstract: A condition detected by a virtual routine may be treated by setting an error code or raising an exception, depending on circumstances. Enhanced vtable layouts promote availability of both error-ID-based and exception-based virtual routines, while maintaining compatibility. Compilers treat virtual routines based on their circumstances. One enhanced vtable includes error-ID-based routine pointers in a COM-layout-compatible portion and exception-based routine pointers in an extension. For a virtual routine not overridden by a derived class, a compiler generates a direct call. For an object instance of a specific type, the compiler generates a direct exception-based call for the object's routine. For a factory-sourced object's routine, the compiler generates a virtual exception-based call. When the virtual routine belongs to a component having an enhanced vtable, the compiler may generate a virtual call using the exception-based routine pointer. Code wrappers between COM and native format may also be used.
Type: Grant
Filed: August 28, 2012
Date of Patent: January 13, 2015
Assignee: Microsoft Corporation
Inventors: Deon Brewis, James Springfield, Sridhar S. Madhugiri
-
Patent number: 8930926
Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
Type: Grant
Filed: April 16, 2010
Date of Patent: January 6, 2015
Assignee: Reservoir Labs, Inc.
Inventors: Cedric Bastoul, Richard A. Lethin, Allen K. Leung, Benoit J. Meister, Peter Szilagyi, Nicolas T. Vasilache, David E. Wohlford