Branching (e.g., Delayed Branch, Loop Control, Branch Predict, Interrupt) Patents (Class 712/233)
-
Patent number: 9824214
Abstract: This invention teaches a system and methods of detecting software vulnerabilities in a computer program by analyzing the compiled code and optionally the source code of the computer program. The invention models compiled software to examine both control flow and data flow properties of the target program. A comprehensive instruction model is used for each instruction of the compiled code, and is complemented by a control flow graph that includes all potential control flow paths of the instruction. A data flow model is used to record the flow of unsafe data during the execution of the program. The system analyzes the data flow model and creates a security finding corresponding to each instruction that calls an unsafe function on unsafe data. The security findings are aggregated in a security report. The system further uses precomputation to improve performance by caching 1-to-many data flow mapping for each basic block in the code.
Type: Grant
Filed: February 3, 2016
Date of Patent: November 21, 2017
Assignee: SECURISEA, INC.
Inventor: Joshua M. Daymont
-
Patent number: 9715593
Abstract: This invention discloses a system and methods of detecting software vulnerabilities in a computer program. The invention models compiled software to examine both control flow and data flow properties of the target program. A comprehensive instruction model is used for each instruction of the compiled code, and is complemented by a control flow graph that includes all potential control flow paths of the instruction. A data flow model is used to record the flow of unsafe data during the execution of the program. The system analyzes the data flow model and creates a security finding corresponding to each instruction that calls an unsafe function on unsafe data. These security findings are aggregated in a security report along with the corresponding debug information, remediation recommendations and any ancillary information related to each instruction that triggered the security finding.
Type: Grant
Filed: August 30, 2016
Date of Patent: July 25, 2017
Assignee: SECURISEA, INC.
Inventor: Joshua M. Daymont
-
Patent number: 9678866
Abstract: A transactional memory (TM) includes a control circuit pipeline and an associated memory unit. The memory unit stores a plurality of rings. The pipeline maintains, for each ring, a head pointer and a tail pointer. A ring operation stage of the pipeline maintains the pointers as values are put onto and are taken off the rings. A put command causes the TM to put a value into a ring, provided the ring is not full. A get command causes the TM to take a value off a ring, provided the ring is not empty. A put with low priority command causes the TM to put a value into a ring, provided the ring has at least a predetermined amount of free buffer space. A get from a set of rings command causes the TM to get a value from the highest priority non-empty ring (of a specified set of rings).
Type: Grant
Filed: May 29, 2015
Date of Patent: June 13, 2017
Assignee: Netronome Systems, Inc.
Inventor: Gavin J. Stark
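The four ring commands in this abstract can be illustrated with a minimal software sketch. This is not the patented hardware pipeline: the class name `Ring`, the `reserve` parameter, and the convention that the set of rings is passed in priority order are all assumptions made for illustration.

```python
class Ring:
    """Software stand-in for one ring, tracked by head and tail pointers."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0    # next slot to read
        self.tail = 0    # next slot to write
        self.count = 0   # number of stored values

    def free(self):
        return len(self.buf) - self.count

    def _write(self, value):
        self.buf[self.tail] = value
        self.tail = (self.tail + 1) % len(self.buf)
        self.count += 1

    def put(self, value):
        # Plain put: succeeds only if the ring is not full.
        if self.free() == 0:
            return False
        self._write(value)
        return True

    def put_low_priority(self, value, reserve):
        # Low-priority put: succeeds only if at least `reserve`
        # slots are free (assumed meaning of "predetermined amount").
        if self.free() < reserve:
            return False
        self._write(value)
        return True

    def get(self):
        # Get: returns None if the ring is empty.
        if self.count == 0:
            return None
        value = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return value


def get_from_set(rings):
    """Take a value from the first non-empty ring.

    Assumption: `rings` is ordered from highest to lowest priority.
    """
    for ring in rings:
        if ring.count:
            return ring.get()
    return None
```

A caller would treat a `False` return from either put as a signal to retry or drop the value; the abstract does not say which.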
-
Patent number: 9659168
Abstract: A method of generating identification data for identifying software is disclosed. The method includes executing said software so as to alter one or more addresses of a memory stack reserved in memory for execution of the software. Identification data is then generated for identifying the software based on the one or more altered addresses of the memory stack.
Type: Grant
Filed: October 25, 2013
Date of Patent: May 23, 2017
Assignee: NXP B.V.
Inventor: Arnaud Collard
-
Patent number: 9639361
Abstract: A trace unit for generating items of trace data indicative of processing activities of a processor executing a stream of instructions, the unit includes trace circuitry for monitoring a behavior of the processor; storage circuitry for storing current trace control data for controlling the trace circuitry; a data store for storing at least some of the trace control data; the trace circuitry being configured to store the trace control data in the data store in response to detection of execution of the group of instructions, wherein the trace circuitry is responsive to detecting the at least one processor cancelling at least one group of the speculatively executed instructions to retrieve at least some of the trace control data stored in the data store for the group of instructions executed before the cancelled speculatively executed instructions and to store the retrieved trace control data in the storage circuitry.
Type: Grant
Filed: March 12, 2014
Date of Patent: May 2, 2017
Assignee: ARM LIMITED
Inventors: Paul Anthony Gilkerson, John Michael Horley
-
Patent number: 9632780
Abstract: A system serialization capability is provided to facilitate processing in those environments that allow multiple processors to update the same resources. The system serialization capability is used to facilitate processing in a multi-processing environment in which guests and hosts use locks to provide serialization. The system serialization capability includes a diagnose instruction which is issued after the host acquires a lock, eliminating the need for the guest to acquire the lock.
Type: Grant
Filed: December 3, 2013
Date of Patent: April 25, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Lisa C. Heller
-
Patent number: 9612881
Abstract: Apparatuses, methods, and systems are configured to perform unambiguous parameter sampling in a heterogeneous multi-core or multi-threaded environment by masking one or more thread requests; and, in response to bus activity ceasing for the one or more masked thread requests and completing any routine being processed for the one or more masked threads, processing a command by executing at least one of a command routine or a command thread, wherein the command routine or the command thread reads the parameter using thread atomicity with deterministic synchronization. One or more thread requests may be selected for masking by monitoring thread activity for each of a plurality of threads.
Type: Grant
Filed: March 30, 2015
Date of Patent: April 4, 2017
Assignee: NXP USA, Inc.
Inventor: Graham Edmiston
-
Patent number: 9606850
Abstract: A data processing apparatus comprises processing circuitry for executing a stream of instructions, and exception handling circuitry for selecting, from one or more exceptions, an exception to be handled by the processing circuitry. The unselected exceptions are referred to as pending exceptions. The data processing apparatus further comprises trace generating circuitry that generates trace data packets in dependence on activity of the processing circuitry. The trace generating circuitry detects pending exceptions and, if an exception is detected to be pending, includes an indication of the pending exception in at least one trace data packet. By tracking when a particular exception is pended, rather than when it is selected for handling by the processing circuitry, it is possible to more precisely determine when the exception occurred, as opposed to when it is finally handled.
Type: Grant
Filed: March 12, 2013
Date of Patent: March 28, 2017
Assignee: ARM Limited
Inventors: John Michael Horley, Simon John Craske
-
Patent number: 9563430
Abstract: Embodiments relate to multithreaded branch prediction. An aspect includes a system for dynamically evaluating how to share entries of a multithreaded branch prediction structure. The system includes a first-level branch target buffer coupled to a processor circuit. The processor circuit is configured to perform a method. The method includes receiving a search request to locate branch prediction information associated with the search request, and searching for an entry corresponding to the search request in the first-level branch prediction structure. The entry is not allowed based on a thread state of the entry indicating that the entry has caused a problem on a thread associated with the thread state.
Type: Grant
Filed: March 19, 2014
Date of Patent: February 7, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: James J. Bonanno, Daniel Lipetz, Brian R. Prasky, Anthony Saporito
-
Patent number: 9495237
Abstract: Corruption of call stacks is detected by using guard words placed in the call stacks. A called function executing on a processor of a computing environment checks a guard word in a stack frame of a calling function. The checking determines whether the guard word has an expected value. Based on determining the guard word has an unexpected value, an indication of corruption of the stack frame is provided.
Type: Grant
Filed: January 6, 2016
Date of Patent: November 15, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Ronald I. McIntosh
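The guard-word check can be modeled on a simulated stack. The sketch below is only an illustration of the idea: the guard value `GUARD`, the frame layout (locals followed by one guard word), and all function names are assumptions, not details from the patent.

```python
GUARD = 0xC0DEFACE  # hypothetical expected guard value


def push_frame(stack, n_locals):
    """Append a frame of n_locals words followed by one guard word.

    Returns the index of the first local; the guard word sits at
    index base + n_locals.
    """
    base = len(stack)
    stack.extend([0] * n_locals)
    stack.append(GUARD)
    return base


def unchecked_copy(stack, base, data):
    # Deliberately performs no bounds check, like an unsafe copy
    # into a fixed-size local buffer.
    for i, word in enumerate(data):
        stack[base + i] = word


def guard_intact(stack, base, n_locals):
    """Check a callee could run against its caller's frame."""
    return stack[base + n_locals] == GUARD
```

Writing one word more than the frame's locals overwrites the guard, so `guard_intact` then returns `False`, which corresponds to the abstract's indication of corruption.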
-
Patent number: 9483391
Abstract: According to one general aspect, a method may include monitoring the execution of at least a portion of a software application. The method may also include collecting subroutine call information regarding a plurality of subroutine calls included by the portion of the software application, wherein one or more of the subroutine calls is selected for detailed data recording. The method may further include pruning, as the software application is being executed, a subroutine call tree to include only the subroutine calls selected for detailed data recording and one or more parent subroutine calls of each subroutine call selected for detailed data recording.
Type: Grant
Filed: January 21, 2016
Date of Patent: November 1, 2016
Assignee: Identify Software Ltd.
Inventors: Eyal Koren, Asaf Dafner, Shiri Semo Judelman
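The pruning rule (keep selected calls and their ancestors, drop everything else) can be sketched as a recursive filter. Note the simplification: the abstract prunes while the application runs, whereas this sketch prunes a finished tree. The tuple representation `(name, children)` and the function name `prune` are assumptions.

```python
def prune(node, selected):
    """Return a pruned copy of a call tree, or None if nothing survives.

    node is a (name, children) tuple; selected is the set of call names
    chosen for detailed recording (hypothetical representation).
    """
    name, children = node
    kept = []
    for child in children:
        pruned = prune(child, selected)
        if pruned is not None:
            kept.append(pruned)
    # Keep this call if it is selected or is a parent of a kept call.
    if name in selected or kept:
        return (name, kept)
    return None
```

For a tree `main -> a -> {b, c}` and `main -> d` with only `b` selected, the result keeps the chain `main -> a -> b` and discards `c` and `d`.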
-
Patent number: 9477478
Abstract: The disclosure relates to predicting simple and polymorphic branch instructions. An embodiment of the disclosure detects that a program instruction is a branch instruction, determines whether a program counter for the branch instruction is stored in a program counter filter, and, if the program counter is stored in the program counter filter, prevents the program counter from being stored in a first level predictor.
Type: Grant
Filed: May 16, 2012
Date of Patent: October 25, 2016
Assignee: QUALCOMM Incorporated
Inventors: Kulin N. Kothari, Michael William Morrow, James Norris Dieffenderfer, Michael Scott McIlvaine, Brian Michael Stempel, Daren Eugene Streett
-
Patent number: 9465746
Abstract: Gathering diagnostics during a transactional execution in a transactional memory environment, a transactional memory environment for performing transactional executions is provided. Included is identifying a first indicator, by a computer system, signaling a beginning instruction of a transaction comprising a plurality of instructions; generating, by the computer system, a computed digest based on the execution of at least one of the plurality of instructions; accumulating, by the computer system, a diagnostic data of the transaction based on the execution of the plurality of instructions; identifying, by the computer system, a second indicator associated with the plurality of instructions signaling an ending instruction of the transaction comprising the plurality of instructions; and based on an abort of the transaction, not saving the memory store data of the transaction to memory.
Type: Grant
Filed: January 24, 2014
Date of Patent: October 11, 2016
Assignee: International Business Machines Corporation
Inventors: Michael Karl Gschwind, Valentina Salapura
-
Patent number: 9460020
Abstract: Gathering diagnostics during a transactional execution in a transactional memory environment, a transactional memory environment for performing transactional executions is provided. Included is identifying a first indicator, by a computer system, signaling a beginning instruction of a transaction comprising a plurality of instructions; generating, by the computer system, a computed digest based on the execution of at least one of the plurality of instructions; accumulating, by the computer system, a diagnostic data of the transaction based on the execution of the plurality of instructions; identifying, by the computer system, a second indicator associated with the plurality of instructions signaling an ending instruction of the transaction comprising the plurality of instructions; and based on an abort of the transaction, not saving the memory store data of the transaction to memory.
Type: Grant
Filed: August 11, 2015
Date of Patent: October 4, 2016
Assignee: International Business Machines Corporation
Inventors: Michael Karl Gschwind, Valentina Salapura
-
Patent number: 9454659
Abstract: This invention teaches a system and methods of detecting software vulnerabilities in a computer program by analyzing the compiled code and optionally the source code of the computer program. The invention models compiled software to examine both control flow and dataflow properties of the target program. A comprehensive instruction model is used for each instruction of the compiled code, and is complemented by a control flow graph that includes all potential control flow paths of the instruction. A data flow model is used to record the flow of unsafe data during the execution of the program. The system analyzes the data flow model and creates a security finding corresponding to each instruction that calls an unsafe function on unsafe data. These security findings are aggregated in a security report along with the corresponding debug information, any ancillary information, remediation recommendations and the optional source code information for each instruction that triggered the security finding.
Type: Grant
Filed: August 15, 2014
Date of Patent: September 27, 2016
Assignee: SECURISEA, INC.
Inventor: Joshua M. Daymont
-
Patent number: 9442755
Abstract: A method and a system are provided for hardware scheduling of indexed barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated and when each thread reaches the barrier instruction, the thread pauses execution of the instructions. A first sub-group of threads in the plurality of threads is associated with a first sub-barrier index and a second sub-group of threads in the plurality of threads is associated with a second sub-barrier index. When the barrier instruction can be scheduled for execution, threads in the first sub-group are executed serially and threads in the second sub-group are executed serially and at least one thread in the first sub-group is executed in parallel with at least one thread in the second sub-group.
Type: Grant
Filed: March 15, 2013
Date of Patent: September 13, 2016
Assignee: NVIDIA Corporation
Inventors: John Erik Lindholm, Tero Tapani Karras
-
Patent number: 9430236
Abstract: Embodiments relate to code stack management. An aspect includes a processor configured to execute a software application. Another aspect includes a code stack memory area and a data stack memory area, the code stack memory area being separate from the data stack memory area. Another aspect includes maintaining a data stack in the data stack memory area, the data stack comprising a plurality of stack frames comprising one or more data variables corresponding to the execution of the software application. Another aspect includes maintaining a code stack in the code stack memory area, the code stack comprising a plurality of code stack entries comprising executable computer code corresponding to the execution of the software application.
Type: Grant
Filed: September 30, 2014
Date of Patent: August 30, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind
-
Patent number: 9395984
Abstract: Swapping branch direction history(ies) in response to a branch prediction table swap instruction(s), and related systems and methods are disclosed. In one embodiment, a branch history management circuit is configured to process a branch prediction table swap instruction. In response to the branch prediction table swap instruction, the branch history management circuit is configured to swap a prior branch direction history set assigned to a current software code region from cache memory, into a branch prediction table (BPT) for use in branch prediction. The current branch direction history set is swapped out of the BPT and stored in cache memory to avoid being overwritten. In this manner, branch direction history sets assigned to particular software code regions are used for branch prediction when processing the particular software code regions. Therefore, branch prediction accuracy and instruction processing throughput of an instruction processing system are increased.
Type: Grant
Filed: September 12, 2012
Date of Patent: July 19, 2016
Assignee: QUALCOMM Incorporated
Inventor: John William Haskins, Jr.
-
Patent number: 9280297
Abstract: A transactional memory (TM) includes a control circuit pipeline and an associated memory unit. The memory unit stores a plurality of rings. The pipeline maintains, for each ring, a head pointer and a tail pointer. A ring operation stage of the pipeline maintains the pointers as values are put onto and are taken off the rings. A put command causes the TM to put a value into a ring, provided the ring is not full. A get command causes the TM to take a value off a ring, provided the ring is not empty. A put with low priority command causes the TM to put a value into a ring, provided the ring has at least a predetermined amount of free buffer space. A get from a set of rings command causes the TM to get a value from the highest priority non-empty ring (of a specified set of rings).
Type: Grant
Filed: February 25, 2015
Date of Patent: March 8, 2016
Assignee: Netronome Systems, Inc.
Inventor: Gavin J. Stark
-
Patent number: 9280389
Abstract: A device, such as a constrained device that includes a processing device and memory, schedules user-defined independently executable functions to execute from a single stack common to all user-defined independently executable functions according to availability and priority of the user-defined independently executable functions relative to other user-defined independently executable functions and preempts a currently running user-defined independently executable function by placing the particular user-defined independently executable function on a single stack that has register values for the currently running user-defined independently executable function.
Type: Grant
Filed: December 30, 2014
Date of Patent: March 8, 2016
Assignee: Tyco Fire & Security GmbH
Inventors: Vincent J. Lipsio, Jr., Paul B. Rasband
-
Patent number: 9201689
Abstract: A method and system for software emulation of hardware support for multi-threaded processing using virtual hardware threads is provided. A software threading system executes on a node that has one or more processors, each with one or more hardware threads. The node has access to local memory and access to remote memory. The software threading system manages the execution of tasks of a user program. The software threading system switches between the virtual hardware threads representing the tasks as the tasks issue remote memory access requests while in user privilege mode. Thus, the software threading system emulates more hardware threads than the underlying hardware supports and switches the virtual hardware threads without the overhead of a context switch to the operating system or change in privilege mode.
Type: Grant
Filed: April 22, 2011
Date of Patent: December 1, 2015
Assignee: Cray Inc.
Inventors: Steven L. Scott, Gregory B. Titus, Sung-Eun Choi, Troy A. Johnson, David Mizell, Michael F. Ringenburg, Karlon West
-
Patent number: 9183014
Abstract: Systems and methods of enabling virtual calls in a single instruction multiple data (SIMD) environment may involve detecting a virtual call of a function and using a single dispatch of the function to invoke the virtual call for two or more channels of the virtual call. In one example, it is determined that the two or more channels share a common target address and a single dispatch of the function is conducted with respect to the common target address. The process may be iterated for additional channels of the virtual call that share a common target address.
Type: Grant
Filed: February 16, 2011
Date of Patent: November 10, 2015
Assignee: Intel Corporation
Inventors: Wei-Yu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
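The iteration described here, one dispatch per group of channels that share a target address, can be sketched in scalar code. This is a rough software analogy rather than the SIMD mechanism itself; `dispatch_virtual_call`, the `table` mapping of addresses to functions, and the convention that each function receives the list of channel indices are all assumptions.

```python
def dispatch_virtual_call(targets, table):
    """Invoke a per-channel virtual call with one dispatch per target.

    targets[i] is the resolved function address for channel i.
    table maps an address to a function that takes a list of channel
    indices and returns one result per channel (assumed convention).
    Returns (results, number_of_dispatches).
    """
    pending = set(range(len(targets)))
    results = [None] * len(targets)
    dispatches = 0
    while pending:
        # Pick the lowest pending channel and gather every channel
        # that shares its target address.
        addr = targets[min(pending)]
        group = [ch for ch in sorted(pending) if targets[ch] == addr]
        for ch, value in zip(group, table[addr](group)):
            results[ch] = value
        dispatches += 1
        pending -= set(group)
    return results, dispatches
```

With four channels resolving to two distinct addresses, the loop runs twice instead of four times, which is the saving the abstract points at.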
-
Patent number: 9135145
Abstract: A system and methods are provided for distributed tracing in a distributed application. In one embodiment, a method includes observing a plurality of messages sent and received among components of the distributed application, generating a probabilistic model of a call flow from observed messages of the distributed system, and constructing a call flow graph based on the probabilistic model for the distributed application. Distributed tracing may include observing messages by performing the subscription-based observation techniques and operations to receive message traces describing messages being communicated among components of the distributed application. In this regard, the tracing service may merge message traces from different instrumentation points with message traces obtained by observing message queues to generate a probabilistic model and call flow graph.
Type: Grant
Filed: January 28, 2013
Date of Patent: September 15, 2015
Assignee: Rackspace US, Inc.
Inventors: Paul Voccio, Matthew Charles Dietz
-
Patent number: 9053325
Abstract: A decryption key management system includes a memory, a memory controller, a decryption engine, and an on-chip crypto-accelerator. A key blob and an encrypted code are stored in the memory. The memory controller fetches the key blob and stores it in a memory buffer. The decryption engine fetches the key blob and decrypts it using an OTP key to generate a decryption key. The decryption key is used to decrypt the encrypted code and generate a decrypted code.
Type: Grant
Filed: August 22, 2013
Date of Patent: June 9, 2015
Assignee: FREESCALE SEMICONDUCTOR, INC.
Inventors: Mohit Arora, Rakesh Pandey
-
Publication number: 20150106604
Abstract: A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. When the processor processes a given instruction of a given instruction type, the processor updates a corresponding performance counter. When the performance counter reaches a threshold, the processor generates an interrupt and compares a location of the given instruction with stored locations in a given list. If a match is not found, then the processor processes an instruction following the given instruction in the computer program without processing intermediate instrumentation code. If a match is found, then the processor processes instrumentation code. Regardless of whether or not the instrumentation code is processed, when control flow returns to the computer program, the corresponding performance counter is initialized with a random value.
Type: Application
Filed: October 15, 2013
Publication date: April 16, 2015
Applicant: Advanced Micro Devices, Inc.
Inventors: Joseph L. Greathouse, David S. Christie
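The sampling loop in this abstract can be approximated in software as follows. This is a loose sketch under several assumptions: the instruction stream is modeled as a list of program-counter values that are all of the counted instruction type, the random reinitialization draws from `[0, threshold)`, and the names `sample_instrumentation`, `watched`, and `rng` are hypothetical.

```python
import random


def sample_instrumentation(trace, watched, threshold, rng):
    """Return the program counters at which instrumentation code ran.

    trace     -- sequence of program-counter values, one per counted
                 instruction (assumed model of the instruction stream)
    watched   -- set of locations that have instrumentation attached
    threshold -- counter value that triggers the interrupt
    rng       -- random.Random instance used to reinitialize the counter
    """
    counter = rng.randrange(threshold)
    instrumented = []
    for pc in trace:
        counter += 1
        if counter >= threshold:
            # "Interrupt": compare the location against the stored list.
            if pc in watched:
                instrumented.append(pc)  # stand-in for running the code
            # Counter is reinitialized with a random value either way.
            counter = rng.randrange(threshold)
    return instrumented
```

Randomizing the restart value spreads the sample points across the trace, so a loop whose length divides the threshold is not always sampled at the same instruction.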
-
Patent number: 9009734
Abstract: One or more embodiments of the invention is a computer-implemented method for speculatively executing application event responses. The method includes the steps of identifying one or more event responses that could be issued for execution by an application being executed by a master process, for each event response, generating a child process to execute the event response, determining that a first event response included in the one or more event responses has been issued for execution by the application, committing the child process associated with the first event response as a new master process, and aborting the master process and all child processes other than the child process associated with the first event response.
Type: Grant
Filed: March 6, 2012
Date of Patent: April 14, 2015
Assignee: AUTODESK, Inc.
Inventor: Francesco Iorio
-
Publication number: 20150100768
Abstract: A single instruction multiple thread (SIMT) processor 2 includes scheduling circuitry 8 for calculating a next scheduled execution point for execution circuits 4 which execute respective threads corresponding to a common program. In addition to calculating the next scheduled execution point, the scheduling circuitry determines a runner up execution point which would have been determined as the next scheduled execution point if the threads which actually correspond to the next scheduled execution point had been removed from consideration. This runner up execution point is used to identify points of re-convergence within the program flow and as part of the operation of a static branch predictor 10.
Type: Application
Filed: October 8, 2013
Publication date: April 9, 2015
Applicant: ARM LIMITED
Inventors: Rune HOLM, JR., David Hennah MANSELL
-
Patent number: 8959319
Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.
Type: Grant
Filed: December 2, 2011
Date of Patent: February 17, 2015
Assignee: Advanced Micro Devices, Inc.
Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor
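The ordering in this abstract (defer the larger thread set on a stack, run the smaller set first) can be sketched sequentially. The function name `run_divergent`, the use of plain Python callables for the two code paths, and the tie-breaking rule when both sets are the same size are assumptions for illustration.

```python
def run_divergent(threads, cond, then_fn, else_fn):
    """Execute a divergent conditional, smaller thread set first.

    threads -- list of thread identifiers
    cond    -- predicate evaluated per thread
    then_fn -- code for threads where cond is true
    else_fn -- code for threads where cond is false
    Returns (results by thread id, order in which the sets ran).
    """
    taken = [t for t in threads if cond(t)]
    not_taken = [t for t in threads if not cond(t)]
    # Stable sort by set size; on a tie the taken set runs first
    # (an arbitrary choice made for this sketch).
    smaller, larger = sorted(
        [(taken, then_fn), (not_taken, else_fn)], key=lambda s: len(s[0])
    )
    stack = [larger]          # push the larger set for later
    results, order = {}, []
    ids, fn = smaller         # run the smaller set now
    for t in ids:
        results[t] = fn(t)
    order.append(ids)
    ids, fn = stack.pop()     # then resume the deferred larger set
    for t in ids:
        results[t] = fn(t)
    order.append(ids)
    return results, order
```

With nested conditionals, deferring the larger set at each level is what bounds stack growth, since the set being executed at least halves each time; the abstract itself does not state this motivation, so treat it as background rather than a claim about the patent.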
-
Patent number: 8943299
Abstract: A pointer is for pointing to a next-to-read location within a stack of information. For pushing information onto the stack: a value is saved of the pointer, which points to a first location within the stack as being the next-to-read location; the pointer is updated so that it points to a second location within the stack as being the next-to-read location; and the information is written for storage at the second location. For popping the information from the stack: in response to the pointer, the information is read from the second location as the next-to-read location; and the pointer is restored to equal the saved value so that it points to the first location as being the next-to-read location.
Type: Grant
Filed: June 17, 2010
Date of Patent: January 27, 2015
Assignee: International Business Machines Corporation
Inventors: Kattamuri Ekanadham, Brian R. Konigsburg, David S. Levitan, Jose E. Moreira, David Mui, Il Park
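The save-and-restore pointer scheme can be sketched as a stack whose entries each remember the previous next-to-read location. The class name `LinkedStack`, the per-slot `saved` array, and the round-robin choice of where a push writes are assumptions; the abstract does not say how the second location is chosen.

```python
class LinkedStack:
    """Stack in which each push saves the prior next-to-read pointer."""

    def __init__(self, size):
        self.slots = [None] * size   # stored information
        self.saved = [None] * size   # pointer value saved at each push
        self.ptr = None              # next-to-read location
        self.next_free = 0           # assumed allocation: round robin

    def push(self, info):
        loc = self.next_free
        self.next_free = (self.next_free + 1) % len(self.slots)
        self.saved[loc] = self.ptr   # save the current pointer value
        self.ptr = loc               # point at the new location
        self.slots[loc] = info       # write the information there

    def pop(self):
        loc = self.ptr
        info = self.slots[loc]       # read from the next-to-read location
        self.ptr = self.saved[loc]   # restore the saved pointer value
        return info
```

Because a pop restores the pointer rather than decrementing it, a push that lands in a non-adjacent slot still unwinds to the correct earlier entry.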
-
Publication number: 20150026442
Abstract: A method, system and computer program product embodied on a computer-readable medium are provided for managing the execution of out-of-order instructions. The method includes the steps of receiving a plurality of instructions and identifying a subset of instructions in the plurality of instructions to be executed out-of-order.
Type: Application
Filed: July 18, 2013
Publication date: January 22, 2015
Applicant: NVIDIA Corporation
Inventors: Olivier Giroux, Robert Ohannessian, Jr., Jack H. Choquette, William Parsons Newhall, Jr.
-
Patent number: 8924693
Abstract: The described embodiments include a processor that executes vector instructions. While dispatching instructions at runtime, the processor encounters a predicate-generating instruction. Upon determining that a result of the predicate-generating instruction is predictable, the processor dispatches a prediction micro-operation associated with the predicate-generating instruction, wherein the prediction micro-operation generates a predicted result vector for the predicate-generating instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. When executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true.
Type: Grant
Filed: May 12, 2011
Date of Patent: December 30, 2014
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Patent number: 8914622
Abstract: Processors may be tested according to various implementations. In one general implementation, a process for processor testing may include randomly generating a first plurality of branch instructions for a first portion of an instruction set, each branch instruction in the first portion branching to a respective instruction in a second portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. The process may also include randomly generating a second plurality of branch instructions for the second portion of the instruction set, each branch instruction in the second portion branching to a respective instruction in the first portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. The process may additionally include generating a plurality of instructions to increment a counter when each branch instruction is encountered during execution.
Type: Grant
Filed: April 30, 2012
Date of Patent: December 16, 2014
Assignee: International Business Machines Corporation
Inventors: Abhishek Bansal, Nitin P. Gupta, Brad L. Herold, Jayakumar N. Sankarannair
-
Publication number: 20140365752
Abstract: A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
Type: Application
Filed: June 7, 2013
Publication date: December 11, 2014
Inventors: Lee W. HOWES, Benedict R. GASTER, Michael C. HOUSTON
-
Patent number: 8898441
Abstract: A first hardware thread executes a software program instruction, which instructs the first hardware thread to initiate a second hardware thread. As such, the first hardware thread identifies one or more register values accessible by the first hardware thread. Next, the first hardware thread copies the identified register values to one or more registers accessible by the second hardware thread. In turn, the second hardware thread accesses the copied register values included in the accessible registers and executes software code accordingly.
Type: Grant
Filed: April 21, 2012
Date of Patent: November 25, 2014
Assignee: International Business Machines Corporation
Inventors: Giles Roger Frazier, Ronald P. Hall
-
Patent number: 8862861
Abstract: Techniques are disclosed relating to a processor that is configured to execute control transfer instructions (CTIs). In some embodiments, the processor includes a mechanism that suppresses results of mispredicted younger CTIs on a speculative execution path. This mechanism permits the branch predictor to maintain its fidelity, and eliminates spurious flushes of the pipeline. In one embodiment, a misprediction bit is used to indicate that a misprediction has occurred, and younger CTIs than the CTI that was mispredicted are suppressed. In some embodiments, the processor may be configured to execute instruction streams from multiple threads. Each thread may include a misprediction indication. CTIs in each thread may execute in program order with respect to other CTIs of the thread, while instructions other than CTIs may execute out of program order.
Type: Grant
Filed: September 8, 2011
Date of Patent: October 14, 2014
Assignee: Oracle International Corporation
Inventors: Christopher H. Olson, Manish K. Shah
-
Patent number: 8850436Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask and the synchronization point are each pushed onto a stack.Type: GrantFiled: September 28, 2010Date of Patent: September 30, 2014Assignee: NVIDIA CorporationInventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
-
Publication number: 20140258693Abstract: A method and a system are provided for hardware scheduling of barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated, and when each thread reaches the barrier instruction during execution of the program, it is determined whether the thread participates in the barrier instruction. The threads that participate in the barrier instruction are then serially executed to process one or more instructions of the program that follow the barrier instruction. A method and system are also provided for impatient scheduling of barrier instructions. When a portion of the threads that is greater than a minimum number of threads and less than all of the threads in the plurality of threads reaches the barrier instruction, each of the threads in the portion is serially executed to process one or more instructions of the program that follow the barrier instruction.Type: ApplicationFiled: March 11, 2013Publication date: September 11, 2014Applicant: NVIDIA CORPORATIONInventors: John Erik Lindholm, Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine
-
Patent number: 8825958Abstract: A digital system is provided for high-performance cache systems. The digital system includes a processor core and a cache control unit. The processor core is capable of being coupled to a first memory containing executable instructions and a second memory with a faster speed than the first memory. Further, the processor core is configured to execute one or more instructions of the executable instructions from the second memory. The cache control unit is configured to be coupled to the first memory, the second memory, and the processor core to fill at least the one or more instructions from the first memory to the second memory before the processor core executes the one or more instructions.Type: GrantFiled: August 8, 2013Date of Patent: September 2, 2014Assignee: Shanghai Xin Hao Micro Electronics Co. Ltd.Inventors: Kenneth Chenghao Lin, Haoqi Ren
-
Publication number: 20140244986Abstract: A system and method to select a packet format based on a number of executed threads is disclosed. In a particular embodiment, a method includes determining, at a multi-threaded processor, a number of threads of a plurality of threads executing during a time period. A packet format is determined from a plurality of formats based at least in part on the determined number of threads. Data associated with execution of an instruction by a particular thread is stored in accordance with the selected format in a memory (e.g., a buffer).Type: ApplicationFiled: February 26, 2013Publication date: August 28, 2014Applicant: QUALCOMM INCORPORATEDInventors: Prasanna Kumar Balasundaram, Suresh K. Venkumahanti
-
Patent number: 8793474Abstract: A first hardware thread executes a software program instruction, which instructs the first hardware thread to initiate a second hardware thread. As such, the first hardware thread identifies one or more register values accessible by the first hardware thread. Next, the first hardware thread copies the identified register values to one or more registers accessible by the second hardware thread. In turn, the second hardware thread accesses the copied register values included in the accessible registers and executes software code accordingly.Type: GrantFiled: September 20, 2010Date of Patent: July 29, 2014Assignee: International Business Machines CorporationInventors: Giles Roger Frazier, Ronald P. Hall
-
Patent number: 8782382Abstract: In one embodiment, a processor includes an execution unit and at least one last branch record (LBR) register to store address information of a branch taken during program execution. This register may further store a transaction indicator to indicate whether the branch was taken during a transactional memory (TM) transaction. This register may further store an abort indicator to indicate whether the branch was caused by a transaction abort. Other embodiments are described and claimed.Type: GrantFiled: March 6, 2013Date of Patent: July 15, 2014Assignee: Intel CorporationInventors: Ravi Rajwar, Peter Lachner, Laura A. Knauth, Konrad K. Lai
-
Patent number: 8782381Abstract: Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs.Type: GrantFiled: April 12, 2012Date of Patent: July 15, 2014Assignee: International Business Machines CorporationInventors: Tong Chen, Brian Flachs, Brad W. Michael, Mark R. Nutter, John K. P. O'Brien, Kathryn M. O'Brien, Tao Zhang
-
Patent number: 8769539Abstract: A method and apparatus are provided to control the order of execution of load and store operations. Also provided is a computer readable storage device encoded with data for adapting a manufacturing facility to create the apparatus. One embodiment of the method includes determining whether a first group, comprising at least one or more instructions, is to be selected from a scheduling queue of a processor for execution using either a first execution mode or a second execution mode. The method also includes, responsive to determining that the first group is to be selected for execution using the second execution mode, preventing selection of the first group until a second group, comprising at least one or more instructions, that entered the scheduling queue prior to the first group is selected for execution.Type: GrantFiled: November 16, 2010Date of Patent: July 1, 2014Assignee: Advanced Micro Devices, Inc.Inventors: Daniel Hopper, Suzanne Plummer, Christopher D. Bryant
-
Publication number: 20140181485Abstract: A data processing apparatus 2 supports speculative execution and the use of sticky bits. A different version of a sticky bit is associated with each segment of the speculative program flow. The segments of the program flow are separated by speculation nodes corresponding to program instructions which may be followed by a plurality of different alternative program instructions serving as the next program instruction. When a speculation node is resolved, then the segments separated by that speculation node are merged and the sticky bit values for those two segments are merged.Type: ApplicationFiled: December 21, 2012Publication date: June 26, 2014Applicant: ARM LIMITEDInventors: Luca SCALABRINO, Cédric Denis Robert Airaud, Guillaume Schon, Frederic Jean Denis Arsanto
-
Patent number: 8751776Abstract: A branch target address table is provided for each branch instruction having a plurality of branch targets. Each branch target address table stores a history of a plurality of branch target addresses determined in the past by executing a corresponding branch instruction. A branch target prediction unit predicts a predicted branch target address with respect to a branch instruction with reference to the history of branch target addresses stored in the branch target address table corresponding to the branch instruction. The predicted branch target address obtained as a result of the prediction is stored, for example, in a predicted branch target address storage unit in association with the branch instruction, and is referenced by an instruction fetch control unit at the time of prefetching a branch target instruction.Type: GrantFiled: June 10, 2013Date of Patent: June 10, 2014Assignee: Fujitsu LimitedInventor: Megumi Ukai
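The per-branch target history described in this abstract can be illustrated with a minimal Python sketch. It is not the patented design: the class names, the history depth, and the prediction policy (most frequent target in the recent history, ties broken by recency) are all assumptions made for illustration, since the abstract does not say how the history is turned into a prediction.

```python
from collections import Counter, deque


class BranchTargetTable:
    """History of recently resolved targets for ONE branch instruction.

    The depth and the majority-vote policy are illustrative assumptions.
    """

    def __init__(self, depth=4):
        # Bounded history: the oldest target is dropped when full.
        self.history = deque(maxlen=depth)

    def record(self, target):
        self.history.append(target)

    def predict(self):
        if not self.history:
            return None
        counts = Counter(self.history)
        best = max(counts.values())
        # Walk from newest to oldest so ties go to the most recent target.
        for target in reversed(self.history):
            if counts[target] == best:
                return target


class TargetPredictor:
    """One BranchTargetTable per multi-target branch, keyed by branch address."""

    def __init__(self, depth=4):
        self.depth = depth
        self.tables = {}

    def update(self, branch_pc, target):
        if branch_pc not in self.tables:
            self.tables[branch_pc] = BranchTargetTable(self.depth)
        self.tables[branch_pc].record(target)

    def predict(self, branch_pc):
        table = self.tables.get(branch_pc)
        return table.predict() if table else None
```

In this sketch a fetch unit would call `predict(branch_pc)` before the branch resolves and `update(branch_pc, target)` once the actual target is known.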
-
Patent number: 8726292Abstract: A system and method for Inter-Thread Communication using software interrupts in a multithread processor are disclosed. Bits in a shared control register and/or a private control register can enable an Inter-Thread Communication path. When the interrupt is triggered, one thread processor raises an interrupt in another thread processor.Type: GrantFiled: August 25, 2005Date of Patent: May 13, 2014Assignee: Broadcom CorporationInventors: Kimming So, Jason Leonard
-
Patent number: 8713293Abstract: An electronic circuit (4000) includes a bias value generator circuit (3900) operable to supply a varying bias value in a programmable range, and an instruction circuit (3625, 4010) responsive to a first instruction to program the range of said bias value generator circuit (3900) and further responsive to a second instruction having an operand to repeatedly issue said second instruction with said operand varied in an operand value range determined as a function of the varying bias value.Type: GrantFiled: September 28, 2011Date of Patent: April 29, 2014Assignee: Texas Instruments IncorporatedInventors: Kenichi Tashiro, Hiroyuki Mizuno, Yuji Umemoto
-
Patent number: 8694760Abstract: A branch prediction mechanism within an information processing device comprises a call stack where function arguments are stacked when function calls are performed. The call stack stores arguments relating to branch instructions within the function. The branch prediction mechanism stores the branch instruction address, the leading value of the call stack, and the branch destination address at branch instruction execution time, which are in correspondence, in a branch result buffer. A branch prediction unit obtains the branch instruction address and leading value of the call stack when notified of branch instruction execution, searches the branch result buffer for a branch destination corresponding to the address and leading value, and predicts the search result as the branch destination of the executed branch instruction. An instruction fetch unit fetches instructions from the branch destination predicted by the branch prediction unit.Type: GrantFiled: May 19, 2010Date of Patent: April 8, 2014Assignee: Panasonic CorporationInventor: Katsushige Amano
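The lookup described in this abstract, where a branch destination is found from the branch instruction address together with the leading value of the call stack, can be sketched in Python as a simple keyed buffer. The class and method names are hypothetical, and the unbounded dictionary stands in for what would be a fixed-size hardware structure.

```python
class BranchResultBuffer:
    """Maps (branch address, leading call-stack value) to a branch destination.

    A dictionary is an illustrative stand-in for the hardware buffer; a real
    buffer would have limited capacity and a replacement policy.
    """

    def __init__(self):
        self.entries = {}

    def record(self, branch_addr, stack_top, destination):
        # Called at branch execution time with the resolved destination.
        self.entries[(branch_addr, stack_top)] = destination

    def predict(self, branch_addr, stack_top):
        # Returns None when no matching entry has been recorded yet.
        return self.entries.get((branch_addr, stack_top))
```

Keying on the argument at the top of the call stack lets the same branch inside a function be predicted differently for different callers' arguments, which is the point of pairing the two values.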
-
Patent number: 8694973Abstract: Methods and systems for executing a code stream of non-native binary code on a computing system are disclosed. One method includes parsing the code stream to detect a plurality of elements including one or more branch destinations, and traversing the code stream to detect a plurality of non-native operators. The method also includes executing a pattern matching algorithm against the plurality of non-native operators to find combinations of two or more non-native operators that do not span across a detected branch destination and that correspond to one or more target operators executable by the computing system. The method further includes generating a second code stream executable on the computing system including the one or more target operators.Type: GrantFiled: September 27, 2011Date of Patent: April 8, 2014Assignee: Unisys CorporationInventor: Andrew Ward Beale
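The constraint in this abstract, that a combination of non-native operators may be replaced only if it does not span a branch destination, can be shown with a small Python sketch. The function name, the operator names, and the pattern table are hypothetical, and passing unmatched operators through unchanged is a simplification; the abstract does not describe how single operators are translated.

```python
def fuse_operators(ops, branch_dests, patterns):
    """Replace operator combinations with target operators.

    ops          -- list of non-native operator names, in code-stream order
    branch_dests -- set of indices into ops that are branch destinations
    patterns     -- dict mapping a tuple of operators to one target operator

    A combination may START at a branch destination, but no later operator
    inside it may be one: a jump into the middle of a fused sequence would
    otherwise skip part of the replaced code.
    """
    # Try longer patterns first so the largest combination wins.
    ordered = sorted(patterns, key=len, reverse=True)
    out = []
    i = 0
    while i < len(ops):
        for pat in ordered:
            n = len(pat)
            spans_dest = any(j in branch_dests for j in range(i + 1, i + n))
            if tuple(ops[i:i + n]) == pat and not spans_dest:
                out.append(patterns[pat])
                i += n
                break
        else:
            # No usable pattern here: emit the operator unchanged (assumption).
            out.append(ops[i])
            i += 1
    return out
```

For example, with the pattern `("LOAD", "ADD") -> "LOAD_ADD"`, a `LOAD`/`ADD` pair is fused unless the `ADD` is itself a branch destination.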
-
Publication number: 20140075165Abstract: This disclosure is directed to techniques for executing subroutines in a single instruction, multiple data (SIMD) processing system that is subject to divergent thread conditions. In particular, a resume counter-based approach for managing divergent thread state is described that utilizes program module-specific minimum resume counters (MINRCs) for the efficient processing of control flow instructions. In some examples, the techniques of this disclosure may include using a main program MINRC to control the execution of a main program module and subroutine-specific MINRCs to control the execution of subroutine program modules. Techniques are also described for managing the main program MINRC and subroutine-specific MINRCs when subroutine call and return instructions are executed. Techniques are also described for updating a subroutine-specific MINRC to ensure that the updated MINRC value for the subroutine-specific MINRC is within the program space allocated for the subroutine.Type: ApplicationFiled: September 10, 2012Publication date: March 13, 2014Applicant: QUALCOMM INCORPORATEDInventor: Lin Chen