Branching (e.g., Delayed Branch, Loop Control, Branch Predict, Interrupt) Patents (Class 712/233)
  • Patent number: 9824214
    Abstract: This invention teaches a system and methods of detecting software vulnerabilities in a computer program by analyzing the compiled code and optionally the source code of the computer program. The invention models compiled software to examine both control flow and data flow properties of the target program. A comprehensive instruction model is used for each instruction of the compiled code, and is complemented by a control flow graph that includes all potential control flow paths of the instruction. A data flow model is used to record the flow of unsafe data during the execution of the program. The system analyzes the data flow model and creates a security finding corresponding to each instruction that calls an unsafe function on unsafe data. The security findings are aggregated in a security report. The system further uses precomputation to improve performance by caching 1-to-many data flow mapping for each basic block in the code.
    Type: Grant
    Filed: February 3, 2016
    Date of Patent: November 21, 2017
    Assignee: SECURISEA, INC.
    Inventor: Joshua M. Daymont
  • Patent number: 9715593
    Abstract: This invention discloses a system and methods of detecting software vulnerabilities in a computer program. The invention models compiled software to examine both control flow and data flow properties of the target program. A comprehensive instruction model is used for each instruction of the compiled code, and is complemented by a control flow graph that includes all potential control flow paths of the instruction. A data flow model is used to record the flow of unsafe data during the execution of the program. The system analyzes the data flow model and creates a security finding corresponding to each instruction that calls an unsafe function on unsafe data. These security findings are aggregated in a security report along with the corresponding debug information, remediation recommendations and any ancillary information related to each instruction that triggered the security finding.
    Type: Grant
    Filed: August 30, 2016
    Date of Patent: July 25, 2017
    Assignee: SECURISEA, INC.
    Inventor: Joshua M. Daymont
  • Patent number: 9678866
    Abstract: A transactional memory (TM) includes a control circuit pipeline and an associated memory unit. The memory unit stores a plurality of rings. The pipeline maintains, for each ring, a head pointer and a tail pointer. A ring operation stage of the pipeline maintains the pointers as values are put onto and are taken off the rings. A put command causes the TM to put a value into a ring, provided the ring is not full. A get command causes the TM to take a value off a ring, provided the ring is not empty. A put with low priority command causes the TM to put a value into a ring, provided the ring has at least a predetermined amount of free buffer space. A get from a set of rings command causes the TM to get a value from the highest priority non-empty ring (of a specified set of rings).
    Type: Grant
    Filed: May 29, 2015
    Date of Patent: June 13, 2017
    Assignee: Netronome Systems, Inc.
    Inventor: Gavin J. Stark
  • Patent number: 9659168
    Abstract: A method of generating identification data for identifying software is disclosed. The method includes executing said software so as to alter one or more addresses of a memory stack reserved in memory for execution of the software. Identification data is then generated for identifying the software based on the one or more altered addresses of the memory stack.
    Type: Grant
    Filed: October 25, 2013
    Date of Patent: May 23, 2017
    Assignee: NXP B.V.
    Inventor: Arnaud Collard
  • Patent number: 9639361
    Abstract: A trace unit for generating items of trace data indicative of processing activities of a processor executing a stream of instructions, the unit includes trace circuitry for monitoring a behavior of the processor; storage circuitry for storing current trace control data for controlling the trace circuitry; a data store for storing at least some of the trace control data; the trace circuitry being configured to store the trace control data in the data store in response to detection of execution of the group of instructions, wherein the trace circuitry is responsive to detecting the at least one processor cancelling at least one group of the speculatively executed instructions to retrieve at least some of the trace control data stored in the data store for the group of instructions executed before the cancelled speculatively executed instructions and to store the retrieved trace control data in the storage circuitry.
    Type: Grant
    Filed: March 12, 2014
    Date of Patent: May 2, 2017
    Assignee: ARM LIMITED
    Inventors: Paul Anthony Gilkerson, John Michael Horley
  • Patent number: 9632780
    Abstract: A system serialization capability is provided to facilitate processing in those environments that allow multiple processors to update the same resources. The system serialization capability is used to facilitate processing in a multi-processing environment in which guests and hosts use locks to provide serialization. The system serialization capability includes a diagnose instruction which is issued after the host acquires a lock, eliminating the need for the guest to acquire the lock.
    Type: Grant
    Filed: December 3, 2013
    Date of Patent: April 25, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Lisa C. Heller
  • Patent number: 9612881
    Abstract: Apparatuses, methods, and systems are configured to perform unambiguous parameter sampling in a heterogeneous multi-core or multi-threaded environment by masking one or more thread requests; and, in response to bus activity ceasing for the one or more masked thread requests and completing any routine being processed for the one or more masked threads, processing a command by executing at least one of a command routine or a command thread, wherein the command routine or the command thread reads the parameter using thread atomicity with deterministic synchronization. One or more thread requests may be selected for masking by monitoring thread activity for each of a plurality of threads.
    Type: Grant
    Filed: March 30, 2015
    Date of Patent: April 4, 2017
    Assignee: NXP USA, Inc.
    Inventor: Graham Edmiston
  • Patent number: 9606850
    Abstract: A data processing apparatus comprises processing circuitry for executing a stream of instructions, and exception handling circuitry for selecting, from one or more exceptions, an exception to be handled by the processing circuitry. The unselected exceptions are referred to as pending exceptions. The data processing apparatus further comprises trace generating circuitry that generates trace data packets in dependence on activity of the processing circuitry. The trace generating circuitry detects pending exceptions and, if an exception is detected to be pending, includes an indication of the pending exception in at least one trace data packet. By tracking when a particular exception is pended, rather than when it is selected for handling by the processing circuitry, it is possible to more precisely determine when the exception occurred, as opposed to when it is finally handled.
    Type: Grant
    Filed: March 12, 2013
    Date of Patent: March 28, 2017
    Assignee: ARM Limited
    Inventors: John Michael Horley, Simon John Craske
  • Patent number: 9563430
    Abstract: Embodiments relate to multithreaded branch prediction. An aspect includes a system for dynamically evaluating how to share entries of a multithreaded branch prediction structure. The system includes a first-level branch target buffer coupled to a processor circuit. The processor circuit is configured to perform a method. The method includes receiving a search request to locate branch prediction information associated with the search request, and searching for an entry corresponding to the search request in the first-level branch prediction structure. The entry is not allowed based on a thread state of the entry indicating that the entry has caused a problem on a thread associated with the thread state.
    Type: Grant
    Filed: March 19, 2014
    Date of Patent: February 7, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James J. Bonanno, Daniel Lipetz, Brian R. Prasky, Anthony Saporito
  • Patent number: 9495237
    Abstract: Corruption of call stacks is detected by using guard words placed in the call stacks. A called function executing on a processor of a computing environment checks a guard word in a stack frame of a calling function. The checking determines whether the guard word has an expected value. Based on determining the guard word has an unexpected value, an indication of corruption of the stack frame is provided.
    Type: Grant
    Filed: January 6, 2016
    Date of Patent: November 15, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Ronald I. McIntosh
  • Patent number: 9483391
    Abstract: According to one general aspect, a method may include monitoring the execution of at least a portion of a software application. The method may also include collecting subroutine call information regarding a plurality of subroutine calls included by the portion of the software application, wherein one or more of the subroutine calls is selected for detailed data recording. The method may further include pruning, as the software application is being executed, a subroutine call tree to include only the subroutine calls selected for detailed data recording and one or more parent subroutine calls of each subroutine calls selected for detailed data recording.
    Type: Grant
    Filed: January 21, 2016
    Date of Patent: November 1, 2016
    Assignee: Identify Software Ltd.
    Inventors: Eyal Koren, Asaf Dafner, Shiri Semo Judelman
  • Patent number: 9477478
    Abstract: The disclosure relates to predicting simple and polymorphic branch instructions. An embodiment of the disclosure detects that a program instruction is a branch instruction, determines whether a program counter for the branch instruction is stored in a program counter filter, and, if the program counter is stored in the program counter filter, prevents the program counter from being stored in a first level predictor.
    Type: Grant
    Filed: May 16, 2012
    Date of Patent: October 25, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Kulin N. Kothari, Michael William Morrow, James Norris Dieffenderfer, Michael Scott McIlvaine, Brian Michael Stempel, Daren Eugene Streett
  • Patent number: 9465746
    Abstract: Gathering diagnostics during a transactional execution in a transactional memory environment, a transactional memory environment for performing transactional executions is provided. Included is identifying a first indicator, by a computer system, signaling a beginning instruction of a transaction comprising a plurality of instructions; generating, by the computer system, a computed digest based on the execution of at least one of the plurality of instructions; accumulating, by the computer system, a diagnostic data of the transaction based on the execution of the plurality of instructions; identifying, by the computer system, a second indicator associated with the plurality of instructions signaling an ending instruction of the transaction comprising the plurality of instructions; and based on an abort of the transaction, not saving the memory store data of the transaction to memory.
    Type: Grant
    Filed: January 24, 2014
    Date of Patent: October 11, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Valentina Salapura
  • Patent number: 9460020
    Abstract: Gathering diagnostics during a transactional execution in a transactional memory environment, a transactional memory environment for performing transactional executions is provided. Included is identifying a first indicator, by a computer system, signaling a beginning instruction of a transaction comprising a plurality of instructions; generating, by the computer system, a computed digest based on the execution of at least one of the plurality of instructions; accumulating, by the computer system, a diagnostic data of the transaction based on the execution of the plurality of instructions; identifying, by the computer system, a second indicator associated with the plurality of instructions signaling an ending instruction of the transaction comprising the plurality of instructions; and based on an abort of the transaction, not saving the memory store data of the transaction to memory.
    Type: Grant
    Filed: August 11, 2015
    Date of Patent: October 4, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Valentina Salapura
  • Patent number: 9454659
    Abstract: This invention teaches a system and methods of detecting software vulnerabilities in a computer program by analyzing the compiled code and optionally the source code of the computer program. The invention models compiled software to examine both control flow and dataflow properties of the target program. A comprehensive instruction model is used for each instruction of the compiled code, and is complemented by a control flow graph that includes all potential control flow paths of the instruction. A data flow model is used to record the flow of unsafe data during the execution of the program. The system analyzes the data flow model and creates a security finding corresponding to each instruction that calls an unsafe function on unsafe data. These security findings are aggregated in a security report along with the corresponding debug information, any ancillary information, remediation recommendations and the optional source code information for each instruction that triggered the security finding.
    Type: Grant
    Filed: August 15, 2014
    Date of Patent: September 27, 2016
    Assignee: SECURISEA, INC.
    Inventor: Joshua M. Daymont
  • Patent number: 9442755
    Abstract: A method and a system are provided for hardware scheduling of indexed barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated and when each thread reaches the barrier instruction, the thread pauses execution of the instructions. A first sub-group of threads in the plurality of threads is associated with a first sub-barrier index and a second sub-group of threads in the plurality of threads is associated with a second sub-barrier index. When the barrier instruction can be scheduled for execution, threads in the first sub-group are executed serially and threads in the second sub-group are executed serially and at least one thread in the first sub-group is executed in parallel with at least one thread in the second sub-group.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: September 13, 2016
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Tero Tapani Karras
  • Patent number: 9430236
    Abstract: Embodiments relate to code stack management. An aspect includes a processor configured to execute a software application. Another aspect includes a code stack memory area and a data stack memory area, the code stack memory area being separate from the data stack memory area. Another aspect includes maintaining a data stack in the data stack memory area, the data stack comprising a plurality of stack frames comprising one or more data variables corresponding to the execution of the software application. Another aspect includes maintaining a code stack in the code stack memory area, the code stack comprising a plurality of code stack entries comprising executable computer code corresponding to the execution of the software application.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: August 30, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 9395984
    Abstract: Swapping branch direction history(ies) in response to a branch prediction table swap instruction(s), and related systems and methods are disclosed. In one embodiment, a branch history management circuit is configured to process a branch prediction table swap instruction. In response to the branch prediction table swap instruction, the branch history management circuit is configured to swap a prior branch direction history set assigned to a current software code region from cache memory, into a branch prediction table (BPT) for use in branch prediction. The current branch direction history set is swapped out of the BPT and stored in cache memory to avoid being overwritten. In this manner, branch direction history sets assigned to particular software code regions are used for branch prediction when processing the particular software code regions. Therefore, branch prediction accuracy and instruction processing throughput of an instruction processing system are increased.
    Type: Grant
    Filed: September 12, 2012
    Date of Patent: July 19, 2016
    Assignee: QUALCOMM Incorporated
    Inventor: John William Haskins, Jr.
  • Patent number: 9280297
    Abstract: A transactional memory (TM) includes a control circuit pipeline and an associated memory unit. The memory unit stores a plurality of rings. The pipeline maintains, for each ring, a head pointer and a tail pointer. A ring operation stage of the pipeline maintains the pointers as values are put onto and are taken off the rings. A put command causes the TM to put a value into a ring, provided the ring is not full. A get command causes the TM to take a value off a ring, provided the ring is not empty. A put with low priority command causes the TM to put a value into a ring, provided the ring has at least a predetermined amount of free buffer space. A get from a set of rings command causes the TM to get a value from the highest priority non-empty ring (of a specified set of rings).
    Type: Grant
    Filed: February 25, 2015
    Date of Patent: March 8, 2016
    Assignee: Netronome Systems, Inc.
    Inventor: Gavin J. Stark
  • Patent number: 9280389
    Abstract: A device, such as a constrained device that includes a processing device and memory, schedules user-defined independently executable functions to execute from a single stack common to all user-defined independently executable functions according to availability and priority of the user-defined independently executable functions relative to other user-defined independently executable functions and preempts currently running user-defined independently executable function by placing the particular user-defined independently executable function on a single stack that has register values for the currently running user-defined independently executable function.
    Type: Grant
    Filed: December 30, 2014
    Date of Patent: March 8, 2016
    Assignee: Tyco Fire & Security GmbH
    Inventors: Vincent J. Lipsio, Jr., Paul B. Rasband
  • Patent number: 9201689
    Abstract: A method and system for software emulation of hardware support for multi-threaded processing using virtual hardware threads is provided. A software threading system executes on a node that has one or more processors, each with one or more hardware threads. The node has access to local memory and access to remote memory. The software threading system manages the execution of tasks of a user program. The software threading system switches between the virtual hardware threads representing the tasks as the tasks issue remote memory access requests while in user privilege mode. Thus, the software threading system emulates more hardware threads than the underlying hardware supports and switches the virtual hardware threads without the overhead of a context switch to the operating system or change in privilege mode.
    Type: Grant
    Filed: April 22, 2011
    Date of Patent: December 1, 2015
    Assignee: Cray Inc.
    Inventors: Steven L. Scott, Gregory B. Titus, Sung-Eun Choi, Troy A. Johnson, David Mizell, Michael F. Ringenburg, Karlon West
  • Patent number: 9183014
    Abstract: Systems and methods of enabling virtual calls in a single instruction multiple data (SIMD) environment may involve detecting a virtual call of a function and using a single dispatch of the function to invoke the virtual call for two or more channels of the virtual call. In one example, it is determined that the two or more channels share a common target address and a single dispatch of the function is conducted with respect to the common target address. The process may be iterated for additional channels of the virtual call that share a common target address.
    Type: Grant
    Filed: February 16, 2011
    Date of Patent: November 10, 2015
    Assignee: Intel Corporation
    Inventors: Wei-Yu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
  • Patent number: 9135145
    Abstract: A system and methods are provided for distributed tracing in a distributed application. In one embodiment, a method includes observing a plurality of messages sent and received among components of the distributed application, generating a probabilistic model of a call flow from observed messages of the distributed system, and constructing a call flow graph based on the probabilistic model for the distributed application. Distributed tracing may include observing messages by performing the subscription-based observation techniques and operations to receive, message traces describing messages being communicated among components of the distributed application. In this regard, the tracing service may merge message traces from different instrumentation points with message traces obtained by observing message queues to generate a probabilistic model and call flow graph.
    Type: Grant
    Filed: January 28, 2013
    Date of Patent: September 15, 2015
    Assignee: Rackspace US, Inc.
    Inventors: Paul Voccio, Matthew Charles Dietz
  • Patent number: 9053325
    Abstract: A decryption key management system includes a memory, a memory controller, a decryption engine, and an on-chip crypto-accelerator. A key blob and an encrypted code are stored in the memory. The memory controller fetches the key blob and stores it in a memory buffer. The decryption engine fetches the key blob and decrypts it using an OTP key to generate a decryption key. The decryption key is used to decrypt the encrypted code and generate a decrypted code.
    Type: Grant
    Filed: August 22, 2013
    Date of Patent: June 9, 2015
    Assignee: FREESCALE SEMICONDUCTOR, INC.
    Inventors: Mohit Arora, Rakesh Pandey
  • Publication number: 20150106604
    Abstract: A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. When the processor processes a given instruction of a given instruction type, the processor updates a corresponding performance counter. When the performance counter reaches a threshold, the processor generates an interrupt and compares a location of the given instruction with stored locations in a given list. If a match is not found, then the processor processes an instruction following the given instruction in the computer program without processing intermediate instrumentation code. If a match is found, then the processor processes instrumentation code. Regardless of whether or not the instrumentation code is processed, when control flow returns to the computer program, the corresponding performance counter is initialized with a random value.
    Type: Application
    Filed: October 15, 2013
    Publication date: April 16, 2015
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Joseph L. Greathouse, David S. Christie
  • Patent number: 9009734
    Abstract: One or more embodiments of the invention is a computer-implemented method for speculatively executing application event responses. The method includes the steps of identifying one or more event responses that could be issued for execution by an application being executed by a master process, for each event response, generating a child process to execute the event response, determining that a first event response included in the one or more event responses has been issued for execution by the application, committing the child process associated with the first event response as a new master process, and aborting the master process and all child processes other than the child process associated with the first event response.
    Type: Grant
    Filed: March 6, 2012
    Date of Patent: April 14, 2015
    Assignee: AUTODESK, Inc.
    Inventor: Francesco Iorio
  • Publication number: 20150100768
    Abstract: A single instruction multiple thread (SIMT) processor 2 includes scheduling circuitry 8 for calculating a next scheduled execution point for execution circuits 4 which execute respective threads corresponding to a common program. In addition to calculating the next scheduled execution point, the scheduling circuitry determines a runner up execution point which would have been determined as the next scheduled execution point if the threads which actually correspond to the next scheduled execution point had been removed from consideration. This runner up execution point is used to identify points of re-convergence within the program flow and as part of the operation of a static branch predictor 10.
    Type: Application
    Filed: October 8, 2013
    Publication date: April 9, 2015
    Applicant: ARM LIMITED
    Inventors: Rune HOLM, JR., David Hennah MANSELL
  • Patent number: 8959319
    Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.
    Type: Grant
    Filed: December 2, 2011
    Date of Patent: February 17, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor
  • Patent number: 8943299
    Abstract: A pointer is for pointing to a next-to-read location within a stack of information. For pushing information onto the stack: a value is saved of the pointer, which points to a first location within the stack as being the next-to-read location; the pointer is updated so that it points to a second location within the stack as being the next-to-read location; and the information is written for storage at the second location. For popping the information from the stack: in response to the pointer, the information is read from the second location as the next-to-read location; and the pointer is restored to equal the saved value so that it points to the first location as being the next-to-read location.
    Type: Grant
    Filed: June 17, 2010
    Date of Patent: January 27, 2015
    Assignee: International Business Machines Corporation
    Inventors: Kattamuri Ekanadham, Brian R. Konigsburg, David S. Levitan, Jose E. Moreira, David Mui, Il Park
  • Publication number: 20150026442
    Abstract: A method, system and computer program product embodied on a computer-readable medium are provided for managing the execution of out-of-order instructions. The method includes the steps of receiving a plurality of instructions and identifying a subset of instructions in the plurality of instructions to be executed out-of-order.
    Type: Application
    Filed: July 18, 2013
    Publication date: January 22, 2015
    Applicant: NVIDIA Corporation
    Inventors: Olivier Giroux, Robert Ohannessian, Jr., Jack H. Choquette, William Parsons Newhall, Jr.
  • Patent number: 8924693
    Abstract: The described embodiments include a processor that executes vector instructions. While dispatching instructions at runtime, the processor encounters a predicate-generating instruction. Upon determining that a result of the predicate-generating instruction is predictable, the processor dispatches a prediction micro-operation associated with the predicate-generating instruction, wherein the prediction micro-operation generates a predicted result vector for the predicate-generating instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. When executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true.
    Type: Grant
    Filed: May 12, 2011
    Date of Patent: December 30, 2014
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8914622
    Abstract: Processors may be tested according to various implementations. In one general implementation, a process for processor testing may include randomly generating a first plurality of branch instructions for a first portion of an instruction set, each branch instruction in the first portion branching to a respective instruction in a second portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. The process may also include randomly generating a second plurality of branch instructions for the second portion of the instruction set, each branch instruction in the second portion branching to a respective instruction in the first portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. The process may additionally include generating a plurality of instructions to increment a counter when each branch instruction is encountered during execution.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: December 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Abhishek Bansal, Nitin P. Gupta, Brad L. Herold, Jayakumar N. Sankarannair
  • Publication number: 20140365752
    Abstract: A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
    Type: Application
    Filed: June 7, 2013
    Publication date: December 11, 2014
    Inventors: Lee W. HOWES, Benedict R. GASTER, Michael C. HOUSTON
  • Patent number: 8898441
    Abstract: A first hardware thread executes a software program instruction, which instructs the first hardware thread to initiate a second hardware thread. As such, the first hardware thread identifies one or more register values accessible by the first hardware thread. Next, the first hardware thread copies the identified register values to one or more registers accessible by the second hardware thread. In turn, the second hardware thread accesses the copied register values included in the accessible registers and executes software code accordingly.
    Type: Grant
    Filed: April 21, 2012
    Date of Patent: November 25, 2014
    Assignee: International Business Machines Corporation
    Inventors: Giles Roger Frazier, Ronald P. Hall
  • Patent number: 8862861
    Abstract: Techniques are disclosed relating to a processor that is configured to execute control transfer instructions (CTIs). In some embodiments, the processor includes a mechanism that suppresses results of mispredicted younger CTIs on a speculative execution path. This mechanism permits the branch predictor to maintain its fidelity, and eliminates spurious flushes of the pipeline. In one embodiment, a misprediction bit is be used to indicate that a misprediction has occurred, and younger CTIs than the CTI that was mispredicted are suppressed. In some embodiments, the processor may be configured to execute instruction streams from multiple threads. Each thread may include a misprediction indication. CTIs in each thread may execute in program order with respect to other CTIs of the thread, while instructions other than CTIs may execute out of program order.
    Type: Grant
    Filed: September 8, 2011
    Date of Patent: October 14, 2014
    Assignee: Oracle International Corporation
    Inventors: Christopher H. Olson, Manish K. Shah
  • Patent number: 8850436
    Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask and the synchronization point is each pushed onto a stack.
    Type: Grant
    Filed: September 28, 2010
    Date of Patent: September 30, 2014
    Assignee: NVIDIA Corporation
    Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
  • Publication number: 20140258693
    Abstract: A method and a system are provided for hardware scheduling of barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated, and when each thread reaches the barrier instruction during execution of program, it is determined whether the thread participates in the barrier instruction. The threads that participate in the barrier instruction are then serially executed to process one or more instructions of the program that follow the barrier instruction. A method and system are also provided for impatient scheduling of barrier instructions. When a portion of the threads that is greater than a minimum number of threads and less than all of the threads in the plurality of threads reaches the barrier instruction each of the threads in the portion is serially executed to process one or more instructions of the program that follow the barrier instruction.
    Type: Application
    Filed: March 11, 2013
    Publication date: September 11, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: John Erik Lindholm, Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine
  • Patent number: 8825958
    Abstract: A digital system is provided for high-performance cache systems. The digital system includes a processor core and a cache control unit. The processor core is capable of being coupled to a first memory containing executable instructions and a second memory with a faster speed than the first memory. Further, the processor core is configured to execute one or more instructions of the executable instructions from the second memory. The cache control unit is configured to be couple to the first memory, the second memory, and the processor core to fill at least the one or more instructions from the first memory to the second memory before the processor core executes the one or more instructions.
    Type: Grant
    Filed: August 8, 2013
    Date of Patent: September 2, 2014
    Assignee: Shanghai Xin Hao Micro Electronics Co. Ltd.
    Inventors: Kenneth Chenghao Lin, Haoqi Ren
  • Publication number: 20140244986
    Abstract: A system and method to select a packet format based on a number of executed threads is disclosed. In a particular embodiment, a method includes determining, at a multi-threaded processor, a number of threads of a plurality of threads executing during a time period. A packet format is determined from a plurality of formats based at least in part on the determined number of threads. Data associated with execution of an instruction by a particular thread is stored in accordance with the selected format in a memory (e.g., a buffer).
    Type: Application
    Filed: February 26, 2013
    Publication date: August 28, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: Prasanna Kumar Balasundaram, Suresh K. Venkumahanti
  • Patent number: 8793474
    Abstract: A first hardware thread executes a software program instruction, which instructs the first hardware thread to initiate a second hardware thread. As such, the first hardware thread identifies one or more register values accessible by the first hardware thread. Next, the first hardware thread copies the identified register values to one or more registers accessible by the second hardware thread. In turn, the second hardware thread accesses the copied register values included in the accessible registers and executes software code accordingly.
    Type: Grant
    Filed: September 20, 2010
    Date of Patent: July 29, 2014
    Assignee: International Business Machines Corporation
    Inventors: Giles Roger Frazier, Ronald P. Hall
  • Patent number: 8782382
    Abstract: In one embodiment, a processor includes an execution unit and at least one last branch record (LBR) register to store address information of a branch taken during program execution. This register may further store a transaction indicator to indicate whether the branch was taken during a transactional memory (TM) transaction. This register may further store an abort indicator to indicate whether the branch was caused by a transaction abort. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 6, 2013
    Date of Patent: July 15, 2014
    Assignee: Intel Corporation
    Inventors: Ravi Rajwar, Peter Lachner, Laura A. Knauth, Konrad K. Lai
  • Patent number: 8782381
    Abstract: Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs.
    Type: Grant
    Filed: April 12, 2012
    Date of Patent: July 15, 2014
    Assignee: International Business Machines Corporation
    Inventors: Tong Chen, Brian Flachs, Brad W. Michael, Mark R. Nutter, John K. P. O'Brien, Kathryn M. O'Brien, Tao Zhang
  • Patent number: 8769539
    Abstract: A method and apparatus are provided to control the order of execution of load and store operations. Also provided is a computer readable storage device encoded with data for adapting a manufacturing facility to create the apparatus. One embodiment of the method includes determining whether a first group, comprising at least one or more instructions, is to be selected from a scheduling queue of a processor for execution using either a first execution mode or a second execution mode. The method also includes, responsive to determining that the first group is to be selected for execution using the second execution mode, preventing selection of the first group until a second group, comprising at least one or more instructions, that entered the scheduling queue prior to the first group is selected for execution.
    Type: Grant
    Filed: November 16, 2010
    Date of Patent: July 1, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Daniel Hopper, Suzanne Plummer, Christopher D. Bryant
  • Publication number: 20140181485
    Abstract: A data processing apparatus 2 supports speculative execution and the use of sticky bits. A different version of a sticky bit is associated with each segment of the speculative program flow. The segments of the program flow are separated by speculation nodes corresponding to program instructions which may be followed by a plurality of different alternative program instruction serving as the next program instruction. When a speculation node is resolved, then the segments separated by that speculation node are merged and the sticky bit values for those two segments are merged.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: ARM LIMITED
    Inventors: Luca SCALABRINO, Cédric Denis Robert Airaud, Guillaume Schon, Frederic Jean Denis Arsanto
  • Patent number: 8751776
    Abstract: A branch target address table is provided for each branch instruction having a plurality of branch targets. Each branch target address table stores a history of a plurality of branch target addresses determined in the past by executing a corresponding branch instruction. A branch target prediction unit predicts a predicted branch target address with respect to a branch instruction with reference to the history of branch target addresses stored in the branch target address table corresponding to the branch instruction. The predicted branch target address obtained as a result of the prediction is stored, for example, in a predicted branch target address storage unit in association with the branch instruction, and is referenced by an instruction fetch control unit at the time of prefetching a branch target instruction.
    Type: Grant
    Filed: June 10, 2013
    Date of Patent: June 10, 2014
    Assignee: Fujitsu Limited
    Inventor: Megumi Ukai
  • Patent number: 8726292
    Abstract: A system and method for Inter-Thread Communication using software interrupts in a multithread processor are disclosed. Bits in a shared control register and/or a private control register can enable an Inter-Thread Communication path. When the interrupt is triggered, one thread processor raises an interrupt in another thread processor.
    Type: Grant
    Filed: August 25, 2005
    Date of Patent: May 13, 2014
    Assignee: Broadcom Corporation
    Inventors: Kimming So, Jason Leonard
  • Patent number: 8713293
    Abstract: An electronic circuit (4000) includes a bias value generator circuit (3900) operable to supply a varying bias value in a programmable range, and an instruction circuit (3625, 4010) responsive to a first instruction to program the range of said bias value generator circuit (3900) and further responsive to a second instruction having an operand to repeatedly issue said second instruction with said operand varied in an operand value range determined as a function of the varying bias value.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: April 29, 2014
    Assignee: Texas Instruments Incorporated
    Inventors: Kenichi Tashiro, Hiroyuki Mizuno, Yuji Umemoto
  • Patent number: 8694760
    Abstract: A branch prediction mechanism within an information processing device comprises a call stack where function arguments are stacked when function calls are performed. The call stack stores arguments relating to branch instructions within the function. The branch prediction mechanism stores the branch instruction address, the leading value of the call stack, and the branch destination address at branch instruction execution time, which are in correspondence, in a branch result buffer. A branch prediction unit obtains the branch instruction address and leading value of the call stack when notified of branch instruction execution, searches the branch result buffer for a branch destination corresponding to the address and leading value, and predicts the search result as the branch destination of the executed branch instruction. An instruction fetch unit fetches instructions from the branch destination predicted by the branch prediction unit.
    Type: Grant
    Filed: May 19, 2010
    Date of Patent: April 8, 2014
    Assignee: Panasonic Corporation
    Inventor: Katsushige Amano
  • Patent number: 8694973
    Abstract: Methods and systems for executing a code stream of non-native binary code on a computing system are disclosed. One method includes parsing the code stream to detect a plurality of elements including one or more branch destinations, and traversing the code stream to detect a plurality of non-native operators. The method also includes executing a pattern matching algorithm against the plurality of non-native operators to find combinations of two or more non-native operators that do not span across a detected branch destination and that correspond to one or more target operators executable by the computing system. The method further includes generating a second code stream executable on the computing system including the one or more target operators.
    Type: Grant
    Filed: September 27, 2011
    Date of Patent: April 8, 2014
    Assignee: Unisys Corporation
    Inventor: Andrew Ward Beale
  • Publication number: 20140075165
    Abstract: This disclosure is directed to techniques for executing subroutines in a single instruction, multiple data (SIMD) processing system that is subject to divergent thread conditions. In particular, a resume counter-based approach for managing divergent thread state is described that utilizes program module-specific minimum resume counters (MINRCs) for the efficient processing of control flow instructions. In some examples, the techniques of this disclosure may include using a main program MINRC to control the execution of a main program module and subroutine-specific MINRCs to control the execution of subroutine program modules. Techniques are also described for managing the main program MINRC and subroutine-specific MINRCs when subroutine call and return instructions are executed. Techniques are also described for updating a subroutine-specific MINRC to ensure that the updated MINRC value for the subroutine-specific MINRC is within the program space allocated for the subroutine.
    Type: Application
    Filed: September 10, 2012
    Publication date: March 13, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventor: Lin Chen