Simultaneous Parallel Fetching Or Executing Of Both Branch And Fall-through Path Patents (Class 712/235)
-
Patent number: 12111913Abstract: A secure processor with fault detection has a core thread which executes with a redundant branch processor thread. In one configuration, the core thread is operative on a fully functional core processor configured to execute a complete instruction set, and the redundant branch processor thread contains only initialization instructions and flow control instructions such as branch instructions and is operative on a redundant branch processor which is configured to execute a subset of the complete instruction set, specifically a branch control variable initialization and a branch instruction, thereby greatly simplifying the redundant branch processor architecture. Fault conditions are detected by comparing either a history of branch taken/not taken and branch targets, or a comparison of program counter activity for the core thread and redundant branch processor thread.Type: GrantFiled: September 26, 2021Date of Patent: October 8, 2024Assignee: Ceremorphic, Inc.Inventors: Lizy Kurian John, Heonchul Park, Venkat Mattela
-
Patent number: 11921843Abstract: A fault detecting multi-thread pipeline processor with fault detection is operative with a single pipeline stage which generates branch status comprising at least one of branch taken/not_taken, branch direction, and branch target. A first thread has control and data instructions, the control instructions comprising loop instructions including unconditional and conditional branch instructions, loop initialization instructions, loop arithmetic instructions, and no operation (NOP) instructions. A second thread has only control instructions and either has the non-control instructions replaced with NOP instructions, or removed entirely. A fault detector compares the branch status of the first thread and second thread and asserts a fault output when they do not match.Type: GrantFiled: September 26, 2021Date of Patent: March 5, 2024Assignee: Ceremorphic, Inc.Inventors: Lizy Kurian John, Heonchul Park, Venkat Mattela
-
Patent number: 11409575Abstract: The present disclosure provides a computation method and product thereof. The computation method adopts a fusion method to perform machine learning computations. Technical effects of the present disclosure include fewer computations and less power consumption.Type: GrantFiled: December 18, 2019Date of Patent: August 9, 2022Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTDInventors: Shaoli Liu, Yuzhe Luo
-
Patent number: 11249758Abstract: Establishing a conditional branch frame barrier is described. A conditional branch in a function epilogue is used to provide frame-specific control. The conditional branch evaluates a return condition to determine whether to return from a callee function to a calling function, or to execute a slow path instead. The return condition is evaluated based on a thread local value. The thread local value is set such that returns to potentially unsafe frames in a call stack are prohibited. The prohibition to return to a potentially unsafe frame may be referred to as a “frame barrier.” Additionally, the thread local value may be used to establish safepointing and/or thread local handshakes, both after execution of a function body and after execution of a loop body.Type: GrantFiled: January 6, 2021Date of Patent: February 15, 2022Assignee: Oracle International CorporationInventor: Erik Österlund
-
Patent number: 10955900Abstract: Examples of techniques for speculation throttling for reliability management are described herein. An aspect includes determining that a power state of a processor is above a speculation throttling threshold. Another aspect includes, based on determining that the power state of the processor is above the speculation throttling threshold, throttling speculation in the processor. Another aspect includes determining that the power state of the processor is above a power proxy threshold, wherein the power proxy threshold is higher than the speculation throttling threshold. Another aspect includes, based on determining that the power state of the processor is above the power proxy threshold, enabling a performance throttle unit of the processor.Type: GrantFiled: December 4, 2018Date of Patent: March 23, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Rahul Rao, Preetham M. Lobo
-
Patent number: 10908913Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch the one branch.Type: GrantFiled: August 9, 2019Date of Patent: February 2, 2021Assignee: Intel CorporationInventor: Mohammad Abdallah
-
Patent number: 10901743Abstract: Systems, methods, and computer-readable media are described for performing speculative execution of both paths/branches of a weakly predicted branch instruction. A branch instruction may be fetched from an instruction queue and determined to be a weakly predicted branch instruction, in which case, both paths of the branch instruction may be dispatched and speculatively executed. When the actual path taken becomes known, instructions corresponding to the path not taken may be flushed. Instructions from both paths of a weakly predicted branch instruction that are speculatively executed may be dispatch and executed in an interleaved manner.Type: GrantFiled: July 19, 2018Date of Patent: January 26, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Kenneth L. Ward, Dung Q. Nguyen, Susan E. Eisen, Hung Le
-
Patent number: 10877767Abstract: There is provided an apparatus that includes processing circuitry for performing processing operations specified by program instructions and a target register that stores a target program address. A value register stores a data value. There is also provided an architectural register and an instruction decoder that decodes the program instructions to generate control signals to control the processing circuitry to perform the processing operations. The instruction decoder includes branch instruction decoding circuitry that decodes a register restoring branch instruction to cause the processing circuitry to determine whether the target program address and the data value are valid. If the target program address and the data value are both valid then the processing circuitry is caused to branch to the target program address and update the architectural register to store the data value. Otherwise an error action is taken.Type: GrantFiled: June 15, 2017Date of Patent: December 29, 2020Assignee: ARM LimitedInventors: Alasdair Grant, Edmund Thomas Grimley Evans
-
Patent number: 10838723Abstract: Techniques are disclosed relating to speculative writes to special-purpose registers (SPRs). In some embodiments, the disclosed techniques may reduce or avoid system instruction stalls while waiting for SPR writes, which may improve processor performance. In some embodiments, a processor includes a first storage element configured to store a non-speculative value of a special-purpose register and speculative storage circuitry configured to store one or more speculative values of the special-purpose register based on one or more speculatively-performed writes to the special-purpose register. In some embodiments, the processor includes control circuitry configured to: propagate the non-speculative value of the special-purpose register to control other circuitry and provide a youngest speculative value of the special-purpose register in the speculative storage circuitry as a speculative read of the special-purpose register.Type: GrantFiled: February 27, 2019Date of Patent: November 17, 2020Assignee: Apple Inc.Inventors: Christopher M. Tsay, Conrado Blasco, Deepankar Duggal, Richard F. Russo
-
Patent number: 10817299Abstract: A data processing apparatus is provided that includes a plurality of control flow execution circuits to simultaneously execute a first control flow instruction having a first type and a second control flow instruction having a second type from a plurality of instructions. A control flow prediction update circuit updates at most one of: a prediction of the first control flow instruction based on a result of the first control flow instruction, and a prediction of the second control flow instruction based on a result of the second control flow instruction.Type: GrantFiled: September 7, 2018Date of Patent: October 27, 2020Assignee: Arm LimitedInventors: Yasuo Ishii, Chris Abernathy
-
Patent number: 10802882Abstract: A method accelerates memory access in a network using thread progress based arbitration. A memory controller identifies a prioritized thread from multiple threads in an application. The prioritized thread reaches a synchronization barrier after the other threads due to the thread encountering more events than the other threads before reaching the barrier, where the events are from a group consisting of instruction executions, cache misses, and load/store operations in a core. The memory controller detects a cache miss by the prioritized thread during execution of the prioritized thread after the barrier is reached by the multiple threads. The memory controller then retrieves and returns data from the memory that cures the cache miss for the prioritized thread before retrieving data that cures cache misses for the other threads by applying thread progress based arbitration in the network.Type: GrantFiled: December 13, 2018Date of Patent: October 13, 2020Assignee: International Business Machines CorporationInventors: Su Liu, Jinho Lee, Inseok Hwang, Eric Rozner
-
Patent number: 10698684Abstract: Systems, methods, and apparatuses are provided for code injection and code interception in an operating systems having multiple subsystem environments. Code injection into a target process can rely on generation of a virtual process that can permit analysis of information loaded in a memory image of the target process regardless of the host environment in which the target process is executed. Based at least on information collected via the analysis, code can be injected into the target process while preserving integrity of the target process. Code interception also can exploit the analysis for suitable hooking that preserves integrity of target process. Code interception can utilize relocatable tokenized code that can be parameterized through token replacement.Type: GrantFiled: June 12, 2017Date of Patent: June 30, 2020Assignee: PEGASYSYTEMS INC.Inventor: Stephen M. Beckett
-
Patent number: 10558464Abstract: Embodiments include load-balancing a plurality of simultaneous threads of a processor. An example method includes computing a minimum group count for a thread from the plurality of threads. The minimum group count indicates a minimum number of groups of instructions to be assigned to the thread. The method further includes computing a maximum allowed group count for the thread. The maximum allowed group count indicates a maximum number of groups of instructions to be assigned to the thread. The method further includes issuing one or more groups of instructions for execution by the thread based on the minimum group count and the maximum allowed group count for the thread.Type: GrantFiled: February 9, 2017Date of Patent: February 11, 2020Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Gregory W. Alexander, Stephen Duffy, David S. Hutton, Christian Jacobi, Anthony Saporito, Somin Song
-
Patent number: 10503471Abstract: An electronic device according to some example embodiments includes a clock management circuit configured to control a clock signal and a processor circuit directly connected to the clock management circuit and configured to provide a clock control request for the clock signal to the clock management circuit according to an operation status of the processor circuit.Type: GrantFiled: June 9, 2017Date of Patent: December 10, 2019Assignee: Samsung Electronics Co., Ltd.Inventor: Donguk Moon
-
Patent number: 10496413Abstract: A processor includes a memory to hold a buffer to store data dependencies comprising nodes and edges for each of a plurality of micro-operations. The nodes include a first node for dispatch, a second node for execution, and a third node for commit. A detector circuit is to queue, in the buffer, the nodes of a micro-operation; add, to determine a node weight for each of the nodes of the micro-operation, an edge weight to a previous node weight of a connected micro-operation that yields a maximum node weight for the node, wherein the node weight comprises a number of execution cycles of an OOO pipeline of the processor and the edge weight comprises a number of execution cycles to execute the connected micro-operation; and identify, as a critical path, a path through the data dependencies that yields the maximum node weight for the micro-operation.Type: GrantFiled: February 15, 2017Date of Patent: December 3, 2019Assignee: Intel CorporationInventors: Jayesh Gaur, Pooja Roy, Sreenivas Subramoney, Hong Wang, Ronak Singhal
-
Patent number: 10417000Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and sing a front end track table to track both the delayed branch the one branch.Type: GrantFiled: October 13, 2017Date of Patent: September 17, 2019Assignee: Intel CorporationInventor: Mohammad Abdallah
-
Patent number: 10387159Abstract: Methods and apparatuses relate to emulating architectural performance monitoring in a binary translation system. In one embodiment, a processor includes an architectural performance counter to maintain an architectural value associated with instruction execution, a register to store the architectural value of the architectural performance counter, binary translation logic to embed an architectural value from the architectural performance counter into a stream of translated instructions having a transactional code region and to store the architectural value into the register, and an execution unit to execute the transactional code region of the stream of translated instructions. The binary translation logic is configured to add the architectural value from the register to the architectural performance counter upon completion of the transactional code region of the stream of translated instructions.Type: GrantFiled: February 4, 2015Date of Patent: August 20, 2019Assignee: Intel CorporationInventors: Jason M Agron, Polychronis Xekalakis, Paul Caprioli, Jiwei Oliver Lu, Koichi Yamada
-
Patent number: 10346172Abstract: Embodiments include a technique for caching of perceptron branch patterns using ternary content addressable memory. The technique includes defining a table of perceptrons, each perceptron having a plurality of weights with each weight being associated with a bit location in a history vector, and defining a TCAM, the TCAM having a number of entries, wherein each entry includes a number of bit pairs, the number of bit pairs being equal to a number of weights for each associated perceptron. The technique also includes associating the TCAM with an array of x-bit saturating counters, and performing a branch prediction for a history vector of a given branch, the branch prediction indicating a perceptron prediction. The technique includes determining a most influential bit location in the history vector, the most influential bit location having a greatest weight of an associated perceptron.Type: GrantFiled: January 13, 2017Date of Patent: July 9, 2019Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: James J. Bonanno, Brian R. Prasky
-
Patent number: 10209994Abstract: Provided is a method for predicting a target address using a set of Indirect Target TAgged GEometric (ITTAGE) tables and a target address pattern table. A branch instruction that is to be executed may be identified. A first tag for the branch instruction may be determined. The first tag may be a unique identifier that corresponds to the branch instruction. Using the tag, the branch instruction may be determined to be in a target address pattern table, and an index may be generated. A predicted target address for the branch instruction may be determined using the generated index and the largest ITTAGE table. Instructions associated with the predicted target address may be fetched.Type: GrantFiled: December 18, 2017Date of Patent: February 19, 2019Assignee: International Business Machines CorporationInventors: Satish Kumar Sadasivam, Puneeth A. H. Bhat, Shruti Saxena
-
Patent number: 10175983Abstract: Exemplary methods, apparatuses, and systems assign a plurality of branch instructions within a computer program to a plurality of prime numbers. Each branch instruction is assigned a unique prime number within the plurality of prime numbers. A run-time branch trace value is determined to be divisible, without a remainder, by a first prime number of the plurality of prime numbers. The run-time branch trace value was generated during execution of the computer program. An output is generated indicating that a first branch instruction assigned to the first prime number was executed.Type: GrantFiled: September 1, 2016Date of Patent: January 8, 2019Assignee: VMware, Inc.Inventor: Rajiv Madampath
-
Patent number: 9965279Abstract: An apparatus for processing data includes first execution circuitry, such as an out-of-order processor, and second execution circuitry, such as an in-order processor. The first execution circuitry is of higher performance but uses more energy than the second execution circuitry. Control circuitry switches between the first execution circuitry being active and the second execution circuitry being active. The control circuitry includes prediction circuitry which is configured to predict a predicted identity of a next sequence of program instructions to be executed in dependence upon a most recently executed sequence of program instructions and then in dependence upon this predicted identity to predict a predicted execution target corresponding to whether the next sequence of program instructions should be executed by the first execution circuitry or the second execution circuitry.Type: GrantFiled: November 29, 2013Date of Patent: May 8, 2018Assignee: The Regents of the University of MichiganInventors: Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
-
Patent number: 9952869Abstract: A system and method is provided for executing a conditional branch instruction. The system and method may include a branch predictor to predict one or more instructions that depend on the conditional branch instruction and a branch mis-prediction buffer to store correct instructions that were not predicted by the branch predictor during a branch mis-prediction.Type: GrantFiled: November 4, 2009Date of Patent: April 24, 2018Assignee: Ceva D.S.P. Ltd.Inventors: Jeffrey Allan (Alon) Jacob (Yaakov), Michael Boukaya
-
Patent number: 9910672Abstract: A method and load and store buffer for issuing a load instruction to a data cache. The method includes determining whether there are any unresolved store instructions in the store buffer that are older than the load instruction. If there is at least one unresolved store instruction in the store buffer older than the load instruction, it is determined whether the oldest unresolved store instruction in the store buffer is within a speculation window for the load instruction. If the oldest unresolved store instruction is within the speculation window for the load instruction, the load instruction is speculatively issued to the data cache. Otherwise, the load instruction is stalled until any unresolved store instructions outside the speculation window are resolved. The speculation window is a short window that defines a number of instructions or store instructions that immediately precede the load instruction.Type: GrantFiled: June 15, 2016Date of Patent: March 6, 2018Assignee: MIPS Tech, LLCInventors: Hugh Jackson, Anand Khot
-
Patent number: 9817666Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch the one branch.Type: GrantFiled: March 17, 2014Date of Patent: November 14, 2017Assignee: Intel CorporationInventor: Mohammad Abdallah
-
Patent number: 9720880Abstract: Embodiments are provided for an asynchronous processor using master and assisted tokens. In an embodiment, an apparatus for an asynchronous processor comprises a memory to cache a plurality of instructions, a feedback engine to decode the instructions from the memory, and a plurality of XUs coupled to the feedback engine and arranged in a token ring architecture. Each one of the XUs is configured to receive an instruction of the instructions form the feedback engine, and receive a master token associated with a resource and further receive an assisted token for the master token. Upon determining that the assisted token and the master token are received in an abnormal order, the XU is configured to detect an operation status for the instruction in association with the assisted token, and upon determining a needed action in accordance with the operation status and the assisted token, perform the needed action.Type: GrantFiled: September 8, 2014Date of Patent: August 1, 2017Assignee: HUAWEI TECHNOLOGIES CO., LTD.Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
-
Patent number: 9626188Abstract: Embodiments relate to a method and computer program product for relative offset branching in a reduced instruction set computing (RISC) architecture. One aspect is a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A relative offset value is acquired from the instruction stream. The relative offset value is formatted as an offset relative to a program counter value and sized as a multiple of the fixed instruction width. The relative offset value is added with the program counter value to form a branch target address value. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.Type: GrantFiled: September 5, 2014Date of Patent: April 18, 2017Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Michael K. Gschwind
-
Patent number: 9619301Abstract: A method of operating a multi-core processor. In one embodiment, each processor core is provided with its own private cache and the device comprises or has access to a common memory, and the method comprises executing a processing thread on a selected first processor core, and implementing a normal access mode for executing an operation within a processing thread and comprising allocating sole responsibility for writing data to given blocks of said common memory, to respective processor cores. The method further comprises implementing a speculative execution mode switchable to override said normal access mode. This speculative execution mode comprises, upon identification of said operation within said processing thread, transferring responsibility for performing said operation to a plurality of second processor cores, and optionally performing said operation on the first processor core as well.Type: GrantFiled: April 5, 2012Date of Patent: April 11, 2017Assignee: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)Inventors: Andras Vajda, Per Stenström
-
Patent number: 9606804Abstract: Embodiments relate to a method and computer program product for absolute address branching in a reduced instruction set computing (RISC) architecture. One aspect is a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A branch target address value is acquired from the instruction stream. The branch target address value represents a target address of the branch instruction. The branch target address value is formatted as an absolute address and sized as a multiple of the fixed instruction width. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.Type: GrantFiled: September 5, 2014Date of Patent: March 28, 2017Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Michael K. Gschwind
-
Patent number: 9563427Abstract: Embodiments relate to a system for relative offset branching in a reduced instruction set computing (RISC) architecture. One aspect is a system that includes memory and a processing circuit communicatively coupled to the memory. The system is configured to perform a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A relative offset value is acquired from the instruction stream. The relative offset value is formatted as an offset relative to a program counter value and sized as a multiple of the fixed instruction width. The relative offset value is added with the program counter value to form a branch target address value. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.Type: GrantFiled: May 30, 2014Date of Patent: February 7, 2017Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Michael K. Gschwind
-
Patent number: 9547494Abstract: Embodiments relate to a system for absolute address branching in a reduced instruction set computing (RISC) architecture. One aspect is a system that includes memory and a processing circuit communicatively coupled to the memory. The system is configured to perform a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A branch target address value is acquired from the instruction stream. The branch target address value represents a target address of the branch instruction. The branch target address value is formatted as an absolute address and sized as a multiple of the fixed instruction width. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.Type: GrantFiled: May 30, 2014Date of Patent: January 17, 2017Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Michael K. Gschwind
-
Patent number: 9501284Abstract: A processor includes a mechanism that checks for and flushes only speculative loads and any respective dependent instructions that are younger than an executed wait for event (WEV) instruction, and which also match an address of a store instruction that has been determined to have been executed by a different processor prior to execution of the paired SEV instruction by the different processor. The mechanism may allow speculative loads that do not match the address of any store instruction that has been determined to have been executed by a different processor prior to execution of the paired SEV instruction by the different processor.Type: GrantFiled: September 30, 2014Date of Patent: November 22, 2016Assignee: Apple Inc.Inventors: Pradeep Kanapathipillai, Richard F. Russo, Sandeep Gupta, Conrado Blasco
-
Patent number: 9418224Abstract: An IC card stores history information indicating information relating to a command executed for each logical channel in a storage portion and determines the validity of the command based on history information of a logical channel specified by the command stored in the storage portion when the command is supplied from an external device, performs a process corresponding to the command when the validity of the command is determined, and stores information relating to the executed command in the storage portion as history information of the logical channel.Type: GrantFiled: July 30, 2008Date of Patent: August 16, 2016Assignee: KABUSHIKI KAISHA TOSHIBAInventor: Satoshi Sekiya
-
Patent number: 9361150Abstract: Only a particular number of applications on a computing device are active at any given time, with applications that are not active being suspended. A policy is applied to determine when an application is to be suspended. However, an operating system component can have a particular application be exempted from being suspended (e.g., due to an operation being performed by the application). Additionally, an operating system component can have an application that has been suspended resumed (e.g., due to a desire of another application to communicate with the suspended application).Type: GrantFiled: September 30, 2013Date of Patent: June 7, 2016Assignee: Microsoft Technology Licensing, LLCInventors: Benjamin S. Srour, Michael H. Krause, Richard K. Neves, Arun U. Kishan, Hari Pulapaka, David B. Probert, Zinaida A. Pozen
-
Patent number: 9361110Abstract: A method is provided for controlling a pipeline operation of a processor. The processor is coupled to a memory containing executable computer instructions. The method includes determining a branch instruction to be executed by the processor, and providing both an address of a branch target instruction of the branch instruction and an address of a next instruction following the branch instruction in a program sequence. The method also includes determining a branch decision with respect to the branch instruction based on at least the address of the branch target instruction provided, and selecting at least one of the branch target instruction and the next instruction as a proper instruction to be executed by an execution unit of the processor, based on the branch decision and before the branch instruction is executed by the execution unit, such that the pipeline operation is not stalled whether or not a branch is taken with respect to the branch instruction.Type: GrantFiled: December 31, 2010Date of Patent: June 7, 2016Inventor: Kenneth Chenghao Lin
-
Patent number: 9342480Abstract: An apparatus and method for generating a very long instruction word (VLIW) command that supports predicated execution, and a VLIW processor and method for processing a VLIW are provided herein. The VLIW command includes an instruction bundle formed of a plurality of instructions to be executed in parallel and a single value indicating predicated execution, and is generated using the apparatus and method for generating a VLIW command. The VLIW processor decodes the instruction bundle and executes the instructions, which are included in the decoded instruction bundle, in parallel, according to the value indicating predicated execution.Type: GrantFiled: October 28, 2013Date of Patent: May 17, 2016Assignee: Samsung Electronics Co., Ltd.Inventors: Bernhard Egger, Soo-jung Ryu, Dong-hoon Yoo, Il-hyun Park
-
Patent number: 9304898Abstract: Technologies are generally described herein for compressing an array using hardware-based compression and performing various instructions on the compressed array. Some example technologies may receive an instruction adapted to access an address in an array. The technologies may determine whether address is compressible. If the address is compressible, then the technologies may determine a compressed address of a compressed array based on the address. The compressed array may represent a compressed layout of the array where a reduced size of each compressed element in the compressed array is smaller than an original size of each element in the array. The technologies may access the compressed array at the compressed address in accordance with the instruction.Type: GrantFiled: August 30, 2011Date of Patent: April 5, 2016Assignee: Empire Technology Development LLCInventor: Yan Solihin
-
Patent number: 9292470Abstract: A microprocessor includes hardware registers that instantiate the Intel 64 Architecture R8-R15 GPRs. The microprocessor associates with each of the R8-R15 GPRs a respective unique MSR address. The microprocessor also includes hardware registers that instantiate the ARM Architecture GPRs. In response to an ARM MRRC instruction that specifies the respective unique MSR address of one of the R8-R15 GPRs, the microprocessor reads the contents of the hardware register that instantiates the specified one of the R8-R15 GPRs into the hardware registers that instantiate two of the ARM GPRs registers. In response to an ARM MCRR instruction that specifies the respective unique MSR address of one of the R8-R15 GPRs, the microprocessor writes into the hardware register that instantiates the specified one of the R8-R15 GPRs the contents of the hardware registers that instantiate two of the ARM Architecture GPRs registers. The hardware registers may be shared by the two Architectures.Type: GrantFiled: May 1, 2013Date of Patent: March 22, 2016Assignee: VIA Technologies, Inc.Inventor: Mark John Ebersole
-
Patent number: 9286137Abstract: Systems and methods may provide for detecting a time critical code section associated with a real time processor core and suspending execution on a suspendable processor core in response to the time critical code section. Additionally, execution on the suspendable core may be resumed when the real time processor core reaches the end of the time critical code section. In one example, execution is suspended by issuing an inter-processor interrupt (IPI) from the real time core to the suspendable core, wherein execution may be resumed when the real time core conducts a write to a memory location that is monitored by the suspendable core during suspension of execution.Type: GrantFiled: September 14, 2012Date of Patent: March 15, 2016Assignee: Intel CorporationInventors: Ian Betts, Alexander Komarov, Anton Langebner
-
Tile-based processor architecture model for high-efficiency embedded homogeneous multicore platforms
Patent number: 9275002Abstract: The present invention relates to a processor which comprises processing elements that execute instructions in parallel and are connected together with point-to-point communication links called data communication links (DCL). The instructions use DCLs to communicate data between them. In order to realize those communications, they specify the DCLs from which they take their operands, and the DCLs to which they write their results. The DCLs allow the instructions to synchronize their executions and to explicitly manage the data they manipulate. Communications are explicit and are used to realize the storage of temporary variables, which is decoupled from the storage of long-living variables.Type: GrantFiled: January 31, 2011Date of Patent: March 1, 2016Inventors: Philippe Manet, Bertrand Rousseau -
Patent number: 9207958Abstract: In one general aspect, a system includes an abstract machine instruction stream, a virtual machine coprocessor configured to receive an instruction from the abstract machine instruction stream and to generate one or more native machine instructions in response to the received instruction, and a processor coupled to the virtual machine coprocessor and operable to execute the native machine instructions generated by the virtual machine coprocessor. The virtual machine coprocessor is operable to generate one or more native machine instructions to explicitly control the virtual machine coprocessor.Type: GrantFiled: August 8, 2003Date of Patent: December 8, 2015Assignee: ARM FINANCE OVERSEAS LIMITEDInventor: Kevin D. Kissell
-
Patent number: 9135215Abstract: Communicating among nodes in a network includes: sending a packet from an origin node to a destination node over a route including plural nodes. At each node in the route, routing of the packet is initiated according to a predicted path concurrently with verifying the correctness of the predicted path based on analyzing route information in the packet. In response to results of verifying the correctness of the predicted path, the routing of the packet is completed according to the predicted path or initiating a routing of the packet according to an actual path based on the route information in the packet.Type: GrantFiled: September 20, 2010Date of Patent: September 15, 2015Assignee: Tilera CorporationInventors: Ian Rudolf Bratt, Carl G. Ramey, Matthew Mattina
-
Patent number: 8959319Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.Type: GrantFiled: December 2, 2011Date of Patent: February 17, 2015Assignee: Advanced Micro Devices, Inc.Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor
-
Patent number: 8843928Abstract: A method and system of efficient use and programming of a multi-processing core device. The system includes a programming construct that is based on stream-domain code. A programmable core based computing device is disclosed. The computing device includes a plurality of processing cores coupled to each other. A memory stores stream-domain code including a stream defining a stream destination module and a stream source module. The stream source module places data values in the stream and the stream conveys data values from the stream source module to the stream destination module. A runtime system detects when the data values are available to the stream destination module and schedules the stream destination module for execution on one of the plurality of processing cores.Type: GrantFiled: January 21, 2011Date of Patent: September 23, 2014Assignee: QST Holdings, LLCInventors: Paul Master, Frederick Furtek
-
Patent number: 8812826Abstract: In one implementation, processor testing may include the ability to randomly generate a first plurality of branch instructions for a first portion of an instruction set, each branch instruction in the first portion branching to a respective instruction in a second portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. Processor testing may also include the ability to randomly generate a second plurality of branch instructions for the second portion of the instruction set, each branch instruction in the second portion branching to a respective instruction in the first portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. Processor testing may additionally include the ability to generate a plurality of instructions to increment a counter when each branch instruction is encountered during execution.Type: GrantFiled: October 20, 2010Date of Patent: August 19, 2014Assignee: International Business Machines CorporationInventors: Abhishek Bansal, Nitin Gupta, Brad L. Herold, Jayakumar N Sankarannair
-
Publication number: 20130346731Abstract: Instructions are tracked in a processor. A completion unit in the processor receives an instruction group to add to a table to form a received instruction group. In response to receiving the received instruction group, the completion unit determines whether an entry is present that contains a previously stored instruction group in a first location and has space for storing the received instruction group. In response to the entry being present, the completion unit stores the received instruction group in a second location in the entry to form a stored instruction group.Type: ApplicationFiled: August 26, 2013Publication date: December 26, 2013Applicant: International Business Machines CorporationInventors: Christopher M. Abernathy, Hung Q. Le, Dung Q. Nguyen, Benjamin W. Stolt
-
Patent number: 8601177Abstract: A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data.Type: GrantFiled: June 27, 2012Date of Patent: December 3, 2013Assignee: Intel CorporationInventor: Thomas A. Piazza
-
Patent number: 8572355Abstract: One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.Type: GrantFiled: September 13, 2010Date of Patent: October 29, 2013Assignee: Nvidia CorporationInventors: Guillermo Juan Rozas, Brett W. Coon
-
Patent number: 8539211Abstract: A multi-threaded processor comprises a processing unit (PU) for concurrently processing multiple threads. A register file means (RF) is provided having a plurality of registers, wherein a first register (LI) is used for storing loop invariant values and N second registers (LVI-LVN) are each used for storing loop variant values. Furthermore N program counters (PCI-PCN) are provided each being associated to one of the multiple threads, wherein N being the number of threads being processed.Type: GrantFiled: January 17, 2006Date of Patent: September 17, 2013Assignee: Nytell Software LLCInventor: Jan Hoogerbrugge
-
Patent number: 8539501Abstract: Processes requiring access to shared resources are adapted to issue a reservation request, such that a place in a resource access queue, such as one administered by means of a semaphore system, can be reserved for the process. The reservation is issued by a Reservation Management module at a time calculated to ensure that the reservation reaches the head of the queue as closely as possible to the moment at which the process actually needs access to the resource. The calculation may be made on the basis of priority information concerning the process itself, and statistical information gathered concerning historical performance of the queue.Type: GrantFiled: March 6, 2012Date of Patent: September 17, 2013Assignee: International Business Machines CorporationInventors: Chiara Conti, Mariella Corbacio, Giuseppe Longobardi, Alessandra Masci, Enrico Nocerini, Pia Toro
-
Patent number: 8528000Abstract: The execution environment provides for scalability where components will execute in parallel and exploit various patterns of parallelism. Dataflow applications are represented by reusable dataflow graphs called map components, while the executable version is called a prepared map. Using runtime properties the prepared map is executed in parallel with a thread allocated to each map process. The execution environment not only monitors threads, detects and corrects deadlocks, logs and controls program exceptions, but also data input and output ports of the map components are processed in parallel to take advantage of data partitioning schemes. Port implementation supports multi-state null value tokens to more accurately report exceptions. Data tokens are batched to minimize synchronization and transportation overhead and thread contention.Type: GrantFiled: May 6, 2010Date of Patent: September 3, 2013Assignee: Pervasive Software, Inc.Inventors: Larry Lee Schumacher, Agustin Gonzales-Tuchmann, Laurence Tobin Yogman, Paul C. Dingman