Simultaneous Parallel Fetching Or Executing Of Both Branch And Fall-through Path Patents (Class 712/235)
  • Patent number: 11409575
    Abstract: The present disclosure provides a computation method and product thereof. The computation method adopts a fusion method to perform machine learning computations. Technical effects of the present disclosure include fewer computations and less power consumption.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: August 9, 2022
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
    Inventors: Shaoli Liu, Yuzhe Luo
  • Patent number: 11249758
    Abstract: Establishing a conditional branch frame barrier is described. A conditional branch in a function epilogue is used to provide frame-specific control. The conditional branch evaluates a return condition to determine whether to return from a callee function to a calling function, or to execute a slow path instead. The return condition is evaluated based on a thread local value. The thread local value is set such that returns to potentially unsafe frames in a call stack are prohibited. The prohibition to return to a potentially unsafe frame may be referred to as a “frame barrier.” Additionally, the thread local value may be used to establish safepointing and/or thread local handshakes, both after execution of a function body and after execution of a loop body.
    Type: Grant
    Filed: January 6, 2021
    Date of Patent: February 15, 2022
    Assignee: Oracle International Corporation
    Inventor: Erik Österlund
  • Patent number: 10955900
    Abstract: Examples of techniques for speculation throttling for reliability management are described herein. An aspect includes determining that a power state of a processor is above a speculation throttling threshold. Another aspect includes, based on determining that the power state of the processor is above the speculation throttling threshold, throttling speculation in the processor. Another aspect includes determining that the power state of the processor is above a power proxy threshold, wherein the power proxy threshold is higher than the speculation throttling threshold. Another aspect includes, based on determining that the power state of the processor is above the power proxy threshold, enabling a performance throttle unit of the processor.
    Type: Grant
    Filed: December 4, 2018
    Date of Patent: March 23, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Rahul Rao, Preetham M. Lobo
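The two-threshold scheme in the abstract of patent 10955900 can be read as a simple decision rule. The following is a minimal sketch under assumptions, not the patented implementation: the function name, the numeric scale of the power state, and the returned dictionary keys are hypothetical.

```python
def throttle_actions(power_state, speculation_threshold, power_proxy_threshold):
    """Decide which throttles to engage for a given power state.

    Assumption: power_state and both thresholds share one numeric scale.
    The abstract only requires that the power proxy threshold be higher
    than the speculation throttling threshold.
    """
    if power_proxy_threshold <= speculation_threshold:
        raise ValueError("power proxy threshold must exceed speculation threshold")
    return {
        # First stage: cut back speculative work.
        "throttle_speculation": power_state > speculation_threshold,
        # Second stage: enable the performance throttle unit as well.
        "performance_throttle": power_state > power_proxy_threshold,
    }
```

Because the second threshold is strictly higher, any power state that enables the performance throttle also throttles speculation in this sketch.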
  • Patent number: 10908913
    Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch and the one branch.
    Type: Grant
    Filed: August 9, 2019
    Date of Patent: February 2, 2021
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 10901743
    Abstract: Systems, methods, and computer-readable media are described for performing speculative execution of both paths/branches of a weakly predicted branch instruction. A branch instruction may be fetched from an instruction queue and determined to be a weakly predicted branch instruction, in which case, both paths of the branch instruction may be dispatched and speculatively executed. When the actual path taken becomes known, instructions corresponding to the path not taken may be flushed. Instructions from both paths of a weakly predicted branch instruction that are speculatively executed may be dispatched and executed in an interleaved manner.
    Type: Grant
    Filed: July 19, 2018
    Date of Patent: January 26, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth L. Ward, Dung Q. Nguyen, Susan E. Eisen, Hung Le
  • Patent number: 10877767
    Abstract: There is provided an apparatus that includes processing circuitry for performing processing operations specified by program instructions and a target register that stores a target program address. A value register stores a data value. There is also provided an architectural register and an instruction decoder that decodes the program instructions to generate control signals to control the processing circuitry to perform the processing operations. The instruction decoder includes branch instruction decoding circuitry that decodes a register restoring branch instruction to cause the processing circuitry to determine whether the target program address and the data value are valid. If the target program address and the data value are both valid then the processing circuitry is caused to branch to the target program address and update the architectural register to store the data value. Otherwise an error action is taken.
    Type: Grant
    Filed: June 15, 2017
    Date of Patent: December 29, 2020
    Assignee: ARM Limited
    Inventors: Alasdair Grant, Edmund Thomas Grimley Evans
  • Patent number: 10838723
    Abstract: Techniques are disclosed relating to speculative writes to special-purpose registers (SPRs). In some embodiments, the disclosed techniques may reduce or avoid system instruction stalls while waiting for SPR writes, which may improve processor performance. In some embodiments, a processor includes a first storage element configured to store a non-speculative value of a special-purpose register and speculative storage circuitry configured to store one or more speculative values of the special-purpose register based on one or more speculatively-performed writes to the special-purpose register. In some embodiments, the processor includes control circuitry configured to: propagate the non-speculative value of the special-purpose register to control other circuitry and provide a youngest speculative value of the special-purpose register in the speculative storage circuitry as a speculative read of the special-purpose register.
    Type: Grant
    Filed: February 27, 2019
    Date of Patent: November 17, 2020
    Assignee: Apple Inc.
    Inventors: Christopher M. Tsay, Conrado Blasco, Deepankar Duggal, Richard F. Russo
  • Patent number: 10817299
    Abstract: A data processing apparatus is provided that includes a plurality of control flow execution circuits to simultaneously execute a first control flow instruction having a first type and a second control flow instruction having a second type from a plurality of instructions. A control flow prediction update circuit updates at most one of: a prediction of the first control flow instruction based on a result of the first control flow instruction, and a prediction of the second control flow instruction based on a result of the second control flow instruction.
    Type: Grant
    Filed: September 7, 2018
    Date of Patent: October 27, 2020
    Assignee: Arm Limited
    Inventors: Yasuo Ishii, Chris Abernathy
  • Patent number: 10802882
    Abstract: A method accelerates memory access in a network using thread progress based arbitration. A memory controller identifies a prioritized thread from multiple threads in an application. The prioritized thread reaches a synchronization barrier after the other threads due to the thread encountering more events than the other threads before reaching the barrier, where the events are from a group consisting of instruction executions, cache misses, and load/store operations in a core. The memory controller detects a cache miss by the prioritized thread during execution of the prioritized thread after the barrier is reached by the multiple threads. The memory controller then retrieves and returns data from the memory that cures the cache miss for the prioritized thread before retrieving data that cures cache misses for the other threads by applying thread progress based arbitration in the network.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: October 13, 2020
    Assignee: International Business Machines Corporation
    Inventors: Su Liu, Jinho Lee, Inseok Hwang, Eric Rozner
  • Patent number: 10698684
    Abstract: Systems, methods, and apparatuses are provided for code injection and code interception in an operating system having multiple subsystem environments. Code injection into a target process can rely on generation of a virtual process that can permit analysis of information loaded in a memory image of the target process regardless of the host environment in which the target process is executed. Based at least on information collected via the analysis, code can be injected into the target process while preserving integrity of the target process. Code interception also can exploit the analysis for suitable hooking that preserves integrity of the target process. Code interception can utilize relocatable tokenized code that can be parameterized through token replacement.
    Type: Grant
    Filed: June 12, 2017
    Date of Patent: June 30, 2020
    Assignee: PEGASYSTEMS INC.
    Inventor: Stephen M. Beckett
  • Patent number: 10558464
    Abstract: Embodiments include load-balancing a plurality of simultaneous threads of a processor. An example method includes computing a minimum group count for a thread from the plurality of threads. The minimum group count indicates a minimum number of groups of instructions to be assigned to the thread. The method further includes computing a maximum allowed group count for the thread. The maximum allowed group count indicates a maximum number of groups of instructions to be assigned to the thread. The method further includes issuing one or more groups of instructions for execution by the thread based on the minimum group count and the maximum allowed group count for the thread.
    Type: Grant
    Filed: February 9, 2017
    Date of Patent: February 11, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gregory W. Alexander, Stephen Duffy, David S. Hutton, Christian Jacobi, Anthony Saporito, Somin Song
  • Patent number: 10503471
    Abstract: An electronic device according to some example embodiments includes a clock management circuit configured to control a clock signal and a processor circuit directly connected to the clock management circuit and configured to provide a clock control request for the clock signal to the clock management circuit according to an operation status of the processor circuit.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: December 10, 2019
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Donguk Moon
  • Patent number: 10496413
    Abstract: A processor includes a memory to hold a buffer to store data dependencies comprising nodes and edges for each of a plurality of micro-operations. The nodes include a first node for dispatch, a second node for execution, and a third node for commit. A detector circuit is to queue, in the buffer, the nodes of a micro-operation; add, to determine a node weight for each of the nodes of the micro-operation, an edge weight to a previous node weight of a connected micro-operation that yields a maximum node weight for the node, wherein the node weight comprises a number of execution cycles of an OOO pipeline of the processor and the edge weight comprises a number of execution cycles to execute the connected micro-operation; and identify, as a critical path, a path through the data dependencies that yields the maximum node weight for the micro-operation.
    Type: Grant
    Filed: February 15, 2017
    Date of Patent: December 3, 2019
    Assignee: Intel Corporation
    Inventors: Jayesh Gaur, Pooja Roy, Sreenivas Subramoney, Hong Wang, Ronak Singhal
  • Patent number: 10417000
    Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch and the one branch.
    Type: Grant
    Filed: October 13, 2017
    Date of Patent: September 17, 2019
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 10387159
    Abstract: Methods and apparatuses relate to emulating architectural performance monitoring in a binary translation system. In one embodiment, a processor includes an architectural performance counter to maintain an architectural value associated with instruction execution, a register to store the architectural value of the architectural performance counter, binary translation logic to embed an architectural value from the architectural performance counter into a stream of translated instructions having a transactional code region and to store the architectural value into the register, and an execution unit to execute the transactional code region of the stream of translated instructions. The binary translation logic is configured to add the architectural value from the register to the architectural performance counter upon completion of the transactional code region of the stream of translated instructions.
    Type: Grant
    Filed: February 4, 2015
    Date of Patent: August 20, 2019
    Assignee: Intel Corporation
    Inventors: Jason M Agron, Polychronis Xekalakis, Paul Caprioli, Jiwei Oliver Lu, Koichi Yamada
  • Patent number: 10346172
    Abstract: Embodiments include a technique for caching of perceptron branch patterns using ternary content addressable memory. The technique includes defining a table of perceptrons, each perceptron having a plurality of weights with each weight being associated with a bit location in a history vector, and defining a TCAM, the TCAM having a number of entries, wherein each entry includes a number of bit pairs, the number of bit pairs being equal to a number of weights for each associated perceptron. The technique also includes associating the TCAM with an array of x-bit saturating counters, and performing a branch prediction for a history vector of a given branch, the branch prediction indicating a perceptron prediction. The technique includes determining a most influential bit location in the history vector, the most influential bit location having a greatest weight of an associated perceptron.
    Type: Grant
    Filed: January 13, 2017
    Date of Patent: July 9, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James J. Bonanno, Brian R. Prasky
  • Patent number: 10209994
    Abstract: Provided is a method for predicting a target address using a set of Indirect Target TAgged GEometric (ITTAGE) tables and a target address pattern table. A branch instruction that is to be executed may be identified. A first tag for the branch instruction may be determined. The first tag may be a unique identifier that corresponds to the branch instruction. Using the tag, the branch instruction may be determined to be in a target address pattern table, and an index may be generated. A predicted target address for the branch instruction may be determined using the generated index and the largest ITTAGE table. Instructions associated with the predicted target address may be fetched.
    Type: Grant
    Filed: December 18, 2017
    Date of Patent: February 19, 2019
    Assignee: International Business Machines Corporation
    Inventors: Satish Kumar Sadasivam, Puneeth A. H. Bhat, Shruti Saxena
  • Patent number: 10175983
    Abstract: Exemplary methods, apparatuses, and systems assign a plurality of branch instructions within a computer program to a plurality of prime numbers. Each branch instruction is assigned a unique prime number within the plurality of prime numbers. A run-time branch trace value is determined to be divisible, without a remainder, by a first prime number of the plurality of prime numbers. The run-time branch trace value was generated during execution of the computer program. An output is generated indicating that a first branch instruction assigned to the first prime number was executed.
    Type: Grant
    Filed: September 1, 2016
    Date of Patent: January 8, 2019
    Assignee: VMware, Inc.
    Inventor: Rajiv Madampath
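The divisibility test in the abstract of patent 10175983 can be illustrated with a short sketch. The abstract does not say how the run-time trace value is built; the code below assumes it is the product of the primes assigned to the distinct branches executed, and every function name is hypothetical.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (adequate for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def assign_primes(branch_ids):
    """Give each branch instruction its own unique prime."""
    return dict(zip(branch_ids, first_primes(len(branch_ids))))


def trace_value(executed, prime_of):
    """Assumed encoding: multiply the primes of the distinct executed branches."""
    return prod(prime_of[b] for b in set(executed))


def executed_branches(trace, prime_of):
    """A branch was executed if its prime divides the trace without remainder."""
    return sorted(b for b, p in prime_of.items() if trace % p == 0)
```

Under this assumed encoding, unique factorization guarantees that a prime divides the trace only if its branch contributed to the product, so the check produces no false positives.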
  • Patent number: 9965279
    Abstract: An apparatus for processing data includes first execution circuitry, such as an out-of-order processor, and second execution circuitry, such as an in-order processor. The first execution circuitry is of higher performance but uses more energy than the second execution circuitry. Control circuitry switches between the first execution circuitry being active and the second execution circuitry being active. The control circuitry includes prediction circuitry which is configured to predict a predicted identity of a next sequence of program instructions to be executed in dependence upon a most recently executed sequence of program instructions and then in dependence upon this predicted identity to predict a predicted execution target corresponding to whether the next sequence of program instructions should be executed by the first execution circuitry or the second execution circuitry.
    Type: Grant
    Filed: November 29, 2013
    Date of Patent: May 8, 2018
    Assignee: The Regents of the University of Michigan
    Inventors: Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
  • Patent number: 9952869
    Abstract: A system and method are provided for executing a conditional branch instruction. The system and method may include a branch predictor to predict one or more instructions that depend on the conditional branch instruction and a branch mis-prediction buffer to store correct instructions that were not predicted by the branch predictor during a branch mis-prediction.
    Type: Grant
    Filed: November 4, 2009
    Date of Patent: April 24, 2018
    Assignee: Ceva D.S.P. Ltd.
    Inventors: Jeffrey Allan (Alon) Jacob (Yaakov), Michael Boukaya
  • Patent number: 9910672
    Abstract: A method and load and store buffer for issuing a load instruction to a data cache. The method includes determining whether there are any unresolved store instructions in the store buffer that are older than the load instruction. If there is at least one unresolved store instruction in the store buffer older than the load instruction, it is determined whether the oldest unresolved store instruction in the store buffer is within a speculation window for the load instruction. If the oldest unresolved store instruction is within the speculation window for the load instruction, the load instruction is speculatively issued to the data cache. Otherwise, the load instruction is stalled until any unresolved store instructions outside the speculation window are resolved. The speculation window is a short window that defines a number of instructions or store instructions that immediately precede the load instruction.
    Type: Grant
    Filed: June 15, 2016
    Date of Patent: March 6, 2018
    Assignee: MIPS Tech, LLC
    Inventors: Hugh Jackson, Anand Khot
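The issue decision described in the abstract of patent 9910672 can be sketched as follows. This is an illustration under assumptions: instructions are identified by hypothetical sequence numbers that increase in program order, the window is counted in instructions rather than store instructions, and a load with no older unresolved store is simply allowed to issue.

```python
def may_issue_load(load_seq, unresolved_store_seqs, window):
    """Return True if the load may be issued to the data cache now.

    load_seq: sequence number of the load (assumed to grow in program order).
    unresolved_store_seqs: sequence numbers of stores whose addresses are unresolved.
    window: number of instructions immediately preceding the load that
            form the speculation window.
    """
    older = [s for s in unresolved_store_seqs if s < load_seq]
    if not older:
        # No older unresolved store: nothing to speculate past.
        return True
    # Only the oldest unresolved store matters: if it lies inside the
    # window, every other older unresolved store does too.
    return load_seq - min(older) <= window
```

When the function returns False, the load would be stalled until the unresolved stores outside the window are resolved, at which point the check can be repeated.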
  • Patent number: 9817666
    Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch and the one branch.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: November 14, 2017
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 9720880
    Abstract: Embodiments are provided for an asynchronous processor using master and assisted tokens. In an embodiment, an apparatus for an asynchronous processor comprises a memory to cache a plurality of instructions, a feedback engine to decode the instructions from the memory, and a plurality of XUs coupled to the feedback engine and arranged in a token ring architecture. Each one of the XUs is configured to receive an instruction of the instructions from the feedback engine, and receive a master token associated with a resource and further receive an assisted token for the master token. Upon determining that the assisted token and the master token are received in an abnormal order, the XU is configured to detect an operation status for the instruction in association with the assisted token, and upon determining a needed action in accordance with the operation status and the assisted token, perform the needed action.
    Type: Grant
    Filed: September 8, 2014
    Date of Patent: August 1, 2017
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
  • Patent number: 9626188
    Abstract: Embodiments relate to a method and computer program product for relative offset branching in a reduced instruction set computing (RISC) architecture. One aspect is a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A relative offset value is acquired from the instruction stream. The relative offset value is formatted as an offset relative to a program counter value and sized as a multiple of the fixed instruction width. The relative offset value is added with the program counter value to form a branch target address value. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.
    Type: Grant
    Filed: September 5, 2014
    Date of Patent: April 18, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 9619301
    Abstract: A method of operating a multi-core processor. In one embodiment, each processor core is provided with its own private cache and the device comprises or has access to a common memory, and the method comprises executing a processing thread on a selected first processor core, and implementing a normal access mode for executing an operation within a processing thread and comprising allocating sole responsibility for writing data to given blocks of said common memory, to respective processor cores. The method further comprises implementing a speculative execution mode switchable to override said normal access mode. This speculative execution mode comprises, upon identification of said operation within said processing thread, transferring responsibility for performing said operation to a plurality of second processor cores, and optionally performing said operation on the first processor core as well.
    Type: Grant
    Filed: April 5, 2012
    Date of Patent: April 11, 2017
    Assignee: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)
    Inventors: Andras Vajda, Per Stenström
  • Patent number: 9606804
    Abstract: Embodiments relate to a method and computer program product for absolute address branching in a reduced instruction set computing (RISC) architecture. One aspect is a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A branch target address value is acquired from the instruction stream. The branch target address value represents a target address of the branch instruction. The branch target address value is formatted as an absolute address and sized as a multiple of the fixed instruction width. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.
    Type: Grant
    Filed: September 5, 2014
    Date of Patent: March 28, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 9563427
    Abstract: Embodiments relate to a system for relative offset branching in a reduced instruction set computing (RISC) architecture. One aspect is a system that includes memory and a processing circuit communicatively coupled to the memory. The system is configured to perform a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A relative offset value is acquired from the instruction stream. The relative offset value is formatted as an offset relative to a program counter value and sized as a multiple of the fixed instruction width. The relative offset value is added with the program counter value to form a branch target address value. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.
    Type: Grant
    Filed: May 30, 2014
    Date of Patent: February 7, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 9547494
    Abstract: Embodiments relate to a system for absolute address branching in a reduced instruction set computing (RISC) architecture. One aspect is a system that includes memory and a processing circuit communicatively coupled to the memory. The system is configured to perform a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A branch target address value is acquired from the instruction stream. The branch target address value represents a target address of the branch instruction. The branch target address value is formatted as an absolute address and sized as a multiple of the fixed instruction width. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.
    Type: Grant
    Filed: May 30, 2014
    Date of Patent: January 17, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 9501284
    Abstract: A processor includes a mechanism that checks for and flushes only speculative loads and any respective dependent instructions that are younger than an executed wait for event (WEV) instruction, and which also match an address of a store instruction that has been determined to have been executed by a different processor prior to execution of the paired SEV instruction by the different processor. The mechanism may allow speculative loads that do not match the address of any store instruction that has been determined to have been executed by a different processor prior to execution of the paired SEV instruction by the different processor.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: November 22, 2016
    Assignee: Apple Inc.
    Inventors: Pradeep Kanapathipillai, Richard F. Russo, Sandeep Gupta, Conrado Blasco
  • Patent number: 9418224
    Abstract: An IC card stores history information indicating information relating to a command executed for each logical channel in a storage portion and determines the validity of the command based on history information of a logical channel specified by the command stored in the storage portion when the command is supplied from an external device, performs a process corresponding to the command when the validity of the command is determined, and stores information relating to the executed command in the storage portion as history information of the logical channel.
    Type: Grant
    Filed: July 30, 2008
    Date of Patent: August 16, 2016
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventor: Satoshi Sekiya
  • Patent number: 9361110
    Abstract: A method is provided for controlling a pipeline operation of a processor. The processor is coupled to a memory containing executable computer instructions. The method includes determining a branch instruction to be executed by the processor, and providing both an address of a branch target instruction of the branch instruction and an address of a next instruction following the branch instruction in a program sequence. The method also includes determining a branch decision with respect to the branch instruction based on at least the address of the branch target instruction provided, and selecting at least one of the branch target instruction and the next instruction as a proper instruction to be executed by an execution unit of the processor, based on the branch decision and before the branch instruction is executed by the execution unit, such that the pipeline operation is not stalled whether or not a branch is taken with respect to the branch instruction.
    Type: Grant
    Filed: December 31, 2010
    Date of Patent: June 7, 2016
    Inventor: Kenneth Chenghao Lin
  • Patent number: 9361150
    Abstract: Only a particular number of applications on a computing device are active at any given time, with applications that are not active being suspended. A policy is applied to determine when an application is to be suspended. However, an operating system component can have a particular application be exempted from being suspended (e.g., due to an operation being performed by the application). Additionally, an operating system component can have an application that has been suspended resumed (e.g., due to a desire of another application to communicate with the suspended application).
    Type: Grant
    Filed: September 30, 2013
    Date of Patent: June 7, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Benjamin S. Srour, Michael H. Krause, Richard K. Neves, Arun U. Kishan, Hari Pulapaka, David B. Probert, Zinaida A. Pozen
  • Patent number: 9342480
    Abstract: An apparatus and method for generating a very long instruction word (VLIW) command that supports predicated execution, and a VLIW processor and method for processing a VLIW are provided herein. The VLIW command includes an instruction bundle formed of a plurality of instructions to be executed in parallel and a single value indicating predicated execution, and is generated using the apparatus and method for generating a VLIW command. The VLIW processor decodes the instruction bundle and executes the instructions, which are included in the decoded instruction bundle, in parallel, according to the value indicating predicated execution.
    Type: Grant
    Filed: October 28, 2013
    Date of Patent: May 17, 2016
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Bernhard Egger, Soo-jung Ryu, Dong-hoon Yoo, Il-hyun Park
  • Patent number: 9304898
    Abstract: Technologies are generally described herein for compressing an array using hardware-based compression and performing various instructions on the compressed array. Some example technologies may receive an instruction adapted to access an address in an array. The technologies may determine whether the address is compressible. If the address is compressible, then the technologies may determine a compressed address of a compressed array based on the address. The compressed array may represent a compressed layout of the array where a reduced size of each compressed element in the compressed array is smaller than an original size of each element in the array. The technologies may access the compressed array at the compressed address in accordance with the instruction.
    Type: Grant
    Filed: August 30, 2011
    Date of Patent: April 5, 2016
    Assignee: Empire Technology Development LLC
    Inventor: Yan Solihin
  • Patent number: 9292470
    Abstract: A microprocessor includes hardware registers that instantiate the Intel 64 Architecture R8-R15 GPRs. The microprocessor associates with each of the R8-R15 GPRs a respective unique MSR address. The microprocessor also includes hardware registers that instantiate the ARM Architecture GPRs. In response to an ARM MRRC instruction that specifies the respective unique MSR address of one of the R8-R15 GPRs, the microprocessor reads the contents of the hardware register that instantiates the specified one of the R8-R15 GPRs into the hardware registers that instantiate two of the ARM Architecture GPRs. In response to an ARM MCRR instruction that specifies the respective unique MSR address of one of the R8-R15 GPRs, the microprocessor writes into the hardware register that instantiates the specified one of the R8-R15 GPRs the contents of the hardware registers that instantiate two of the ARM Architecture GPRs. The hardware registers may be shared by the two Architectures.
    Type: Grant
    Filed: May 1, 2013
    Date of Patent: March 22, 2016
    Assignee: VIA Technologies, Inc.
    Inventor: Mark John Ebersole
  • Patent number: 9286137
    Abstract: Systems and methods may provide for detecting a time critical code section associated with a real time processor core and suspending execution on a suspendable processor core in response to the time critical code section. Additionally, execution on the suspendable core may be resumed when the real time processor core reaches the end of the time critical code section. In one example, execution is suspended by issuing an inter-processor interrupt (IPI) from the real time core to the suspendable core, wherein execution may be resumed when the real time core conducts a write to a memory location that is monitored by the suspendable core during suspension of execution.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: March 15, 2016
    Assignee: Intel Corporation
    Inventors: Ian Betts, Alexander Komarov, Anton Langebner
  • Patent number: 9275002
    Abstract: The present invention relates to a processor which comprises processing elements that execute instructions in parallel and are connected together with point-to-point communication links called data communication links (DCL). The instructions use DCLs to communicate data between them. In order to realize those communications, they specify the DCLs from which they take their operands, and the DCLs to which they write their results. The DCLs allow the instructions to synchronize their executions and to explicitly manage the data they manipulate. Communications are explicit and are used to realize the storage of temporary variables, which is decoupled from the storage of long-living variables.
    Type: Grant
    Filed: January 31, 2011
    Date of Patent: March 1, 2016
    Inventors: Philippe Manet, Bertrand Rousseau
  • Patent number: 9207958
    Abstract: In one general aspect, a system includes an abstract machine instruction stream, a virtual machine coprocessor configured to receive an instruction from the abstract machine instruction stream and to generate one or more native machine instructions in response to the received instruction, and a processor coupled to the virtual machine coprocessor and operable to execute the native machine instructions generated by the virtual machine coprocessor. The virtual machine coprocessor is operable to generate one or more native machine instructions to explicitly control the virtual machine coprocessor.
    Type: Grant
    Filed: August 8, 2003
    Date of Patent: December 8, 2015
    Assignee: ARM FINANCE OVERSEAS LIMITED
    Inventor: Kevin D. Kissell
  • Patent number: 9135215
    Abstract: Communicating among nodes in a network includes sending a packet from an origin node to a destination node over a route including plural nodes. At each node in the route, routing of the packet is initiated according to a predicted path concurrently with verifying the correctness of the predicted path based on analyzing route information in the packet. In response to results of verifying the correctness of the predicted path, either the routing of the packet is completed according to the predicted path, or routing of the packet is initiated according to an actual path based on the route information in the packet.
    Type: Grant
    Filed: September 20, 2010
    Date of Patent: September 15, 2015
    Assignee: Tilera Corporation
    Inventors: Ian Rudolf Bratt, Carl G. Ramey, Matthew Mattina
  • Patent number: 8959319
    Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.
    Type: Grant
    Filed: December 2, 2011
    Date of Patent: February 17, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor
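The ordering described in the abstract above (defer the larger set of threads on a stack, run the smaller set first) can be sketched in software as follows. This is an illustrative simulation only: the function name, the string labels "then" and "else", and the tie-breaking rule are assumptions, not details from the patent.

```python
from typing import Callable, Iterable, List, Tuple


def run_divergent_branch(
    thread_ids: Iterable[int],
    condition: Callable[[int], bool],
) -> List[Tuple[int, str]]:
    """Return the order in which threads run each side of a divergent branch."""
    ids = list(thread_ids)
    taken = [t for t in ids if condition(t)]
    not_taken = [t for t in ids if not condition(t)]

    # Pair each thread set with a label for the code it must execute.
    groups = [("then", taken), ("else", not_taken)]
    # sorted() is stable, so on a tie the "then" side counts as the smaller set
    # (an arbitrary choice made for this sketch).
    smaller, larger = sorted(groups, key=lambda g: len(g[1]))

    stack = [larger]                       # push the larger set's identifier
    trace = [(t, smaller[0]) for t in smaller[1]]   # run the smaller set first
    deferred = stack.pop()                 # then resume the deferred larger set
    trace += [(t, deferred[0]) for t in deferred[1]]
    return trace
```

For example, with five threads where only threads 3 and 4 evaluate the condition as true, the two-thread "then" side runs first and the three-thread "else" side runs after it is popped from the stack.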
  • Patent number: 8843928
    Abstract: A method and system of efficient use and programming of a multi-processing core device. The system includes a programming construct that is based on stream-domain code. A programmable core based computing device is disclosed. The computing device includes a plurality of processing cores coupled to each other. A memory stores stream-domain code including a stream defining a stream destination module and a stream source module. The stream source module places data values in the stream and the stream conveys data values from the stream source module to the stream destination module. A runtime system detects when the data values are available to the stream destination module and schedules the stream destination module for execution on one of the plurality of processing cores.
    Type: Grant
    Filed: January 21, 2011
    Date of Patent: September 23, 2014
    Assignee: QST Holdings, LLC
    Inventors: Paul Master, Frederick Furtek
  • Patent number: 8812826
    Abstract: In one implementation, processor testing may include the ability to randomly generate a first plurality of branch instructions for a first portion of an instruction set, each branch instruction in the first portion branching to a respective instruction in a second portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. Processor testing may also include the ability to randomly generate a second plurality of branch instructions for the second portion of the instruction set, each branch instruction in the second portion branching to a respective instruction in the first portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. Processor testing may additionally include the ability to generate a plurality of instructions to increment a counter when each branch instruction is encountered during execution.
    Type: Grant
    Filed: October 20, 2010
    Date of Patent: August 19, 2014
    Assignee: International Business Machines Corporation
    Inventors: Abhishek Bansal, Nitin Gupta, Brad L. Herold, Jayakumar N Sankarannair
  • Publication number: 20130346731
    Abstract: Instructions are tracked in a processor. A completion unit in the processor receives an instruction group to add to a table to form a received instruction group. In response to receiving the received instruction group, the completion unit determines whether an entry is present that contains a previously stored instruction group in a first location and has space for storing the received instruction group. In response to the entry being present, the completion unit stores the received instruction group in a second location in the entry to form a stored instruction group.
    Type: Application
    Filed: August 26, 2013
    Publication date: December 26, 2013
    Applicant: International Business Machines Corporation
    Inventors: Christopher M. Abernathy, Hung Q. Le, Dung Q. Nguyen, Benjamin W. Stolt
  • Patent number: 8601177
    Abstract: A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data.
    Type: Grant
    Filed: June 27, 2012
    Date of Patent: December 3, 2013
    Assignee: Intel Corporation
    Inventor: Thomas A. Piazza
  • Patent number: 8572355
    Abstract: One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.
    Type: Grant
    Filed: September 13, 2010
    Date of Patent: October 29, 2013
    Assignee: Nvidia Corporation
    Inventors: Guillermo Juan Rozas, Brett W. Coon
  • Patent number: 8539501
    Abstract: Processes requiring access to shared resources are adapted to issue a reservation request, such that a place in a resource access queue, such as one administered by means of a semaphore system, can be reserved for the process. The reservation is issued by a Reservation Management module at a time calculated to ensure that the reservation reaches the head of the queue as closely as possible to the moment at which the process actually needs access to the resource. The calculation may be made on the basis of priority information concerning the process itself, and statistical information gathered concerning historical performance of the queue.
    Type: Grant
    Filed: March 6, 2012
    Date of Patent: September 17, 2013
    Assignee: International Business Machines Corporation
    Inventors: Chiara Conti, Mariella Corbacio, Giuseppe Longobardi, Alessandra Masci, Enrico Nocerini, Pia Toro
  • Patent number: 8539211
    Abstract: A multi-threaded processor comprises a processing unit (PU) for concurrently processing multiple threads. A register file means (RF) is provided having a plurality of registers, wherein a first register (LI) is used for storing loop invariant values and N second registers (LVI-LVN) are each used for storing loop variant values. Furthermore, N program counters (PCI-PCN) are provided, each associated with one of the multiple threads, where N is the number of threads being processed.
    Type: Grant
    Filed: January 17, 2006
    Date of Patent: September 17, 2013
    Assignee: Nytell Software LLC
    Inventor: Jan Hoogerbrugge
  • Patent number: 8528000
    Abstract: The execution environment provides for scalability where components will execute in parallel and exploit various patterns of parallelism. Dataflow applications are represented by reusable dataflow graphs called map components, while the executable version is called a prepared map. Using runtime properties, the prepared map is executed in parallel with a thread allocated to each map process. The execution environment monitors threads, detects and corrects deadlocks, and logs and controls program exceptions; in addition, data input and output ports of the map components are processed in parallel to take advantage of data partitioning schemes. Port implementation supports multi-state null value tokens to more accurately report exceptions. Data tokens are batched to minimize synchronization and transportation overhead and thread contention.
    Type: Grant
    Filed: May 6, 2010
    Date of Patent: September 3, 2013
    Assignee: Pervasive Software, Inc.
    Inventors: Larry Lee Schumacher, Agustin Gonzales-Tuchmann, Laurence Tobin Yogman, Paul C. Dingman
  • Patent number: 8522250
    Abstract: Processes requiring access to shared resources are adapted to issue a reservation request, such that a place in a resource access queue, such as one administered by means of a semaphore system, can be reserved for the process. The reservation is issued by a Reservation Management module at a time calculated to ensure that the reservation reaches the head of the queue as closely as possible to the moment at which the process actually needs access to the resource. The calculation may be made on the basis of priority information concerning the process itself, and statistical information gathered concerning historical performance of the queue.
    Type: Grant
    Filed: August 17, 2011
    Date of Patent: August 27, 2013
    Assignee: International Business Machines Corporation
    Inventors: Chiara Conti, Mariella Corbacio, Giuseppe Longobardi, Alessandra Masci, Enrico Nocerini, Pia Toro
  • Patent number: 8521998
    Abstract: A method and apparatus for tracking instructions in a processor. A completion unit in the processor receives an instruction group to add to a table to form a received instruction group. In response to receiving the received instruction group, the completion unit determines whether an entry is present that contains a previously stored instruction group in a first location and has space for storing the received instruction group. In response to the entry being present, the completion unit stores the received instruction group in a second location in the entry to form a stored instruction group.
    Type: Grant
    Filed: June 4, 2010
    Date of Patent: August 27, 2013
    Assignee: International Business Machines Corporation
    Inventors: Christopher Michael Abernathy, Hung Qui Le, Dung Quoc Nguyen, Benjamin Walter Stolt
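The tracking scheme in the abstract above (place a received instruction group in the second location of an entry that already holds a group in its first location) can be sketched roughly as below. The class and method names are hypothetical, and allocating a fresh entry when no such entry exists is an assumption; the abstract only describes the case where a suitable entry is present.

```python
from typing import List, Optional


class CompletionTable:
    """Toy model of a table whose entries hold up to two instruction groups."""

    def __init__(self) -> None:
        # Each entry is [first_location, second_location].
        self.entries: List[List[Optional[str]]] = []

    def add_group(self, group: str) -> int:
        """Store a received group and return the index of the entry used."""
        for index, entry in enumerate(self.entries):
            # Entry holds a previously stored group and still has space.
            if entry[0] is not None and entry[1] is None:
                entry[1] = group
                return index
        # Assumed fallback: no suitable entry, so start a new one.
        self.entries.append([group, None])
        return len(self.entries) - 1
```

In this sketch, adding three groups in sequence fills entry 0 with the first two groups and opens entry 1 for the third, which is how sharing entries could reduce the number of table entries consumed.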