Abstract: A branch predictor for use in a processor includes a Level 1 branch predictor, a Level 2 branch predictor, a match determining circuit, and an override determining circuit. The Level 1 branch predictor generates a Level 1 branch prediction. The Level 2 branch predictor generates a Level 2 branch prediction. The match determining circuit determines whether the Level 1 and Level 2 branch predictions match. The override determining circuit determines whether to override the Level 1 branch prediction with the Level 2 branch prediction. The Level 1 branch prediction is used when the Level 1 and Level 2 branch predictions match or when the Level 1 and Level 2 branch predictions do not match and the Level 1 branch prediction is not overridden. The Level 2 branch prediction is used when the Level 1 and Level 2 branch predictions do not match and the Level 1 branch prediction is overridden.
Type:
Grant
Filed:
December 22, 2010
Date of Patent:
July 22, 2014
Assignee:
Advanced Micro Devices, Inc.
Inventors:
Trivikram Krishnamurthy, Anthony Jarvis
Abstract: A processor and a method for privilege escalation in a processor are provided. The method may comprise fetching an instruction from a fetch address, where the instruction requires the processor to be in supervisor mode for execution, and determining whether the fetch address is within a predetermined address range. The instruction is filtered through an instruction mask and then it is determined whether the instruction, after being filtered through the mask, equals the value in an instruction value compare register. The processor privilege is raised to supervisor mode for execution of the instruction in response to the fetch address being within the predetermined address range and the filtered instruction equaling the value in the instruction value compare register, wherein the processor privilege is raised to supervisor mode without use of an interrupt. The processor privilege returns to its previous level after execution of the instruction.
Type:
Grant
Filed:
December 14, 2010
Date of Patent:
July 15, 2014
Assignee:
International Business Machines Corporation
Abstract: A multiple stage branch prediction system includes a branch target address cache (BTAC) and a branch predictor circuit. The BTAC is configured to store a BTAC entry. The branch predictor circuit is configured to store state information. The branch predictor circuit utilizes the state information to predict the direction of a branch instruction and to manage the BTAC entry based on modified state information prior to resolution of the branch instruction.
Abstract: A device employing techniques to optimize Context-based Adaptive Binary Arithmetic Coding (CABAC) for the H.264 video decoding is provided. The device includes a processing circuit operative to implement a set of instructions to decode multiple bins simultaneously and renormalize an offset register and a range register after the multiple bins are decoded. The range register and offset registers may be 32 or 64 bits. The use of a larger range register allows renormalization to be skipped when enough bits are still in the range register.
Abstract: The invention is a method and system for providing trace data in a pipelined data processor. Aspects of the invention including providing a trace pipeline in parallel to the execution pipeline, providing trace information on whether conditional instructions are complete or not, providing trace information on the interrupt status of the processor, replacing instructions in the processor with functionally equivalent instructions that also produce trace information and modifying the scheduling of instructions in the processor based on the occupancy of the trace output buffer.
Type:
Grant
Filed:
April 28, 2009
Date of Patent:
July 8, 2014
Assignee:
Imagination Technologies, Limited
Inventors:
Robert Graham Isherwood, Ian Oliver, Andrew Webber
Abstract: A set of helper thread binaries is created from a set of main thread binaries. The helper thread monitors software or hardware ports for incoming data events. When the helper thread detects an incoming event, the helper thread asynchronously executes instructions that calculate incoming data needed by the main thread.
Type:
Grant
Filed:
February 1, 2008
Date of Patent:
July 8, 2014
Assignee:
International Business Machines Corporation
Inventors:
Ravi K. Arimilli, Juan C. Rubio, Balaram Sinharoy
Abstract: Methods and apparatuses are provided for increased efficiency in a processor via early instruction completion. An apparatus is provided for increased efficiency in a processor via early instruction completion. The apparatus comprises an execution unit for processing instructions and determining whether a later issued instruction is ready for completion or an earlier issued instruction is ready for completion and a retire unit for retiring the later issued instruction when the later instruction is ready for completion or to retire the earlier instruction when later instruction is not ready for completion and the earlier issued instruction has a known good completion status. A method is provided for increased efficiency in a processor via early instruction completion. The method comprises completing an earlier issued instruction having a known good completion status ahead of a later issued instruction when the later issued instruction is not ready for completion.
Type:
Grant
Filed:
April 15, 2011
Date of Patent:
July 1, 2014
Assignee:
Advanced Micro Devices, Inc.
Inventors:
Michael D Estlick, Kevin Hurd, Jay Fleischman
Abstract: Method, apparatus, and system for a programmable event-driven yield mechanism. The mechanism may disrupt processing of a program to deliver a yield event. The event may be treated as a fault-like yield event or a trap-like event. For a fault-like yield event, the faulting instruction is canceled before retirement and processor state is not updated before the yield event is delivered. For a trap-like yield event the instruction causing the trap is retired and the yield event is delivered on an interrupt boundary. Multiple pending yield events may be handled according to priority. Other embodiments are also described and claimed.
Type:
Grant
Filed:
March 31, 2006
Date of Patent:
June 24, 2014
Assignee:
Intel Corporation
Inventors:
Xiang Zou, Hong Wang, Robert Knight, Robert Geva, Gautham Chinya, Scott Dion Rodgers, Chris Newburn, Bryant E. Bigbee, Per Hammarlund, Ittai Anati, Jim B. Crossland, John P. Shen
Abstract: An apparatus providing for a secure execution environment is presented. The apparatus includes a microprocessor and a secure non-volatile memory. The a microprocessor is configured to execute non-secure application programs and a secure application program, where the non-secure application programs are accessed from a system memory via a system bus. The microprocessor has a plurality of timers which are visible and accessible only by the secure application program when executing in a secure execution mode. The secure non-volatile memory is coupled to the microprocessor via a private bus and is configured to store the secure application program in encrypted form. Transactions over the private bus between the microprocessor and the secure non-volatile memory are isolated from the system bus, the system memory, and corresponding system bus resources within the microprocessor.
Abstract: The described embodiments provide a processor for generating a result vector with incremented or decremented values from an input vector. During operation, the processor receives an input vector and a control vector. The processor then copies a value contained in a selected element of the input vector. The processor next generates the result vector, which involves writing an incremented or decremented value to the result vector, depending on the value of the control vector and the embodiment. In addition, a predicate vector can be used to control the values that are written to the result vector.
Type:
Grant
Filed:
June 30, 2009
Date of Patent:
June 24, 2014
Assignee:
Apple Inc.
Inventors:
Jeffry E. Gonion, Keith E. Diefendorff, Jr.
Abstract: Described are techniques synchronizing processing of at least two code threads. A first thread executing in user space is provided. A second thread executing in kernel space is provided. A global mutex lock is provided for synchronizing processing between said first thread and said second thread. One of said first thread and said second thread holds the global mutex lock and is identified as a current owner of the global mutex lock. The other of said first thread and said second thread requests the global mutex lock and is blocked until the current owner of the global mutex lock releases the global mutex lock. The global mutex lock is held by at most one thread at a point in time and is identified as the current owner.
Abstract: A branch target address table is provided for each branch instruction having a plurality of branch targets. Each branch target address table stores a history of a plurality of branch target addresses determined in the past by executing a corresponding branch instruction. A branch target prediction unit predicts a predicted branch target address with respect to a branch instruction with reference to the history of branch target addresses stored in the branch target address table corresponding to the branch instruction. The predicted branch target address obtained as a result of the prediction is stored, for example, in a predicted branch target address storage unit in association with the branch instruction, and is referenced by an instruction fetch control unit at the time of prefetching a branch target instruction.
Abstract: A method for implementing command acceleration. The method includes receiving a first set of instructions from a first processor, wherein the first set of instructions are formatted in accordance with a microarchitecture of the first processor. The first set of instructions are translated into a second set of instructions, wherein the second set of instructions are formatted in accordance with a microarchitecture of a second processor. The second set instructions are then transmitted to the second processor for execution by the second processor.
Type:
Grant
Filed:
November 4, 2005
Date of Patent:
May 27, 2014
Assignee:
NVIDIA Corporation
Inventors:
Ashish Karandikar, Shirish Gadre, Amir H. Salek
Abstract: A program processing device comprises a CPU for carrying out predetermined processing according to a program; an internal memory storing the program and data generated by the CPU by carrying out the program, and a data acquiring circuit connected to an external program processing device, for acquiring the program from the external program processing device to write into the internal memory, wherein the CPU, the internal memory, a debug processing circuit, and the data acquiring circuit are integrally mounted on the same semiconductor substrate.
Abstract: Systems and methods for performing single instruction multiple data (SIMD) operations on a data set. The methods may include examining a structure of the data set to determine what reorganization may be necessary to facilitate SIMD processing. The method may include selecting a stored bit mask corresponding to the organization of the data set and loading the bit mask into an application specific register (ASR). Subsequently, the data may be reorganized inline according to the ASR as the data is loaded into the SIMD functional unit such that the SIMD functional unit may operate on the data set. The results of the SIMD operation may be written to a results register.
Abstract: A method for generating a digital signal pattern at M outputs involves retrieving an instruction from memory comprising a first set of bits identifying a first group of N outputs that includes fewer than all of the M outputs, and a second set of N bits each corresponding to a respective output included in the identified first group of N outputs. For each of the M outputs that is included in the identified first group of N outputs, the signal at the output is toggled if the one of the N bits corresponding to that output is in a first state and is kept in the same state if the one of the N bits corresponding to that output is in a second state. For each of the M outputs that is not included in the identified first group of N outputs, the signal at that output is kept in the same state.
Type:
Grant
Filed:
December 3, 2007
Date of Patent:
May 20, 2014
Assignee:
Analog Devices, Inc.
Inventors:
Christopher Jacobs, Andreas D. Olofsson, Paul Kettle
Abstract: Embodiments of the present invention execute an anti-prefetch instruction. These embodiments start by decoding instructions in a decode unit in a processor to prepare the instructions for execution. Upon decoding an anti-prefetch instruction, these embodiments stall the decode unit to prevent decoding subsequent instructions. These embodiments then execute the anti-prefetch instruction, wherein executing the anti-prefetch instruction involves: (1) sending a prefetch request for a cache line in an L1 cache; (2) determining if the prefetch request hits in the L1 cache; (3) if the prefetch request hits in the L1 cache, determining if the cache line contains a predetermined value; and (4) conditionally performing subsequent operations based on whether the prefetch request hits in the L1 cache or the value of the data in the cache line.
Type:
Grant
Filed:
April 16, 2008
Date of Patent:
May 20, 2014
Assignee:
Oracle America, Inc.
Inventors:
Paul Caprioli, Sherman H. Yip, Gideon N. Levinsky
Abstract: A programming language may include hint instructions that may notify a programming idiom accelerator that a programming idiom is coming. An idiom begin hint exposes the programming idiom to the programming idiom accelerator. Thus, the programming idiom accelerator need not perform pattern matching or other forms of analysis to recognize a sequence of instructions. Rather, the programmer may insert idiom hint instructions, such as an idiom begin hint, to expose the idiom to the programming idiom accelerator. Similarly, an idiom end hint may mark the end of the programming idiom.
Type:
Grant
Filed:
February 1, 2008
Date of Patent:
May 13, 2014
Assignee:
International Business Machines Corporation
Inventors:
Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
Abstract: In one embodiment, a processor can perform a function call from a main program to a function that is to operate on at least one vector-type operand, in which only scalar values are passed to the function, and input values to the function including the at least one vector-type operand are to be renamed from virtual registers identified in the function to physical registers of a vector register file, and output values from the function including the at least one vector-type operand are to be renamed from virtual registers identified in the function to physical registers of the vector register file. Other embodiments are described and claimed.
Abstract: To provide a device to reconfigure multi-level logic networks, which enable logic modification and reconfiguration of a multi-level logic network with small circuit area and low-power dissipation in a simple manner. For example, in the case of reconfiguring a multi-level logic network following logic modification for deleting an output vector F(b) of an objective logic function F(X) corresponding to an input vector b, unmodified pq elements are selected one by one from the nearest pq element EG to an output side. At this time, among output values of pq elements closer to an input side than selected pq elements, output values corresponding to the input vector, which equal an output value corresponding to any input variable X other than the input vector b are considered modified and thus not selected. Then, a selected output value corresponding to the input vector b is rewritten to an “invalid value”.