Instruction Decoding (e.g., By Microinstruction, Start Address Generator, Hardwired) Patents (Class 712/208)

Decoding instruction to accommodate plural instruction interpretations (e.g., different dialects, languages, emulation, etc.) (Class 712/209)

Decoding instruction to accommodate variable length instruction or operand (Class 712/210)

Decoding instruction to generate an address of a microroutine (Class 712/211)

Decoding by plural parallel decoders (Class 712/212)

Predecoding of instruction component (Class 712/213)

INSTRUCTION SET ARCHITECTURE WITH EXTENSIBLE REGISTER ADDRESSING

Publication number: 20150026435

Abstract: A method and circuit arrangement selectively source and/or write data from/to extended registers of an extended register file based in part on whether an operand address of an instruction references a primary register of primary register file configured to store a pointer to the extended register. Control logic connected to the primary register file and the extended register file determines whether the operand address references a primary register configured to store a pointer, and responsive to the determination, the control logic causes execution logic to selectively source and/or write data from/to the extended register pointed to by the pointer stored in the referenced primary register.

Type: Application

Filed: July 22, 2013

Publication date: January 22, 2015

Applicant: International Business Machines Corporation

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
Highly Efficient Different Precision Complex Multiply Accumulate to Enhance Chip Rate Functionality in DSSS Cellular Systems

Publication number: 20150019842

Abstract: This invention is a digital signal processor capable of performing correlation of data with pseudo noise for code division multiple access (CDMA) decoding using clusters. Each cluster includes plural multipliers. The multipliers multiply real and imaginary parts of packed data by corresponding pseudo noise data. Within a cluster the real parts and the imaginary parts of the products are summed separately. This forms plural complex number outputs equal in number to the number of clusters. The pseudo noise data is offset relative to the data input differing amounts for different clusters. The clusters are divided into first half clusters receiving data from even numbered slots and second half clusters receiving data from odd numbered slots. The correlation unit includes a mask input to selectively zero a multiplier product.

Type: Application

Filed: July 9, 2014

Publication date: January 15, 2015

Inventors: Mujibur Rahman, Peter Richard Dent, Timothy David Anderson, Duc Quang Bui
Instruction operation code generation system

Patent number: 8935512

Abstract: It is possible to increase the processor instruction set design job efficiency and reduce workload on designers in investigation of an instruction set. An instruction operation code generation system includes an operation code bit width decision means, an instruction sorting means, and an operation code value decision means. The operation code bit width decision means decides a bit width that can be assigned for an operation code of each instruction according to specification data associated with a processor instruction set. The instruction sorting means sorts the instructions according to the operation code bit width. The operation code value decision means decides the value of the operation code of each instruction.

Type: Grant

Filed: November 19, 2007

Date of Patent: January 13, 2015

Assignee: NEC Corporation

Inventor: Takahiro Kumura
LOOP STREAMING DETECTOR FOR STANDARD AND COMPLEX INSTRUCTION TYPES

Publication number: 20150012726

Abstract: A processor includes a microcode storage comprising a plurality of microcode flows and a decode logic coupled to the microcode storage. The decode logic is configured to receive a first instruction, decode the first instruction into an entry point vector to a first microcode flow in the microcode storage, the entry point vector comprising a first indicator specifying a number of clock cycles associated with the first microcode flow, initiate the microcode storage, wherein the microcode storage inserts microinstructions of the first microcode flow into an instruction queue, count clock cycles after initiating the microcode storage, and decode a second instruction without first receiving a return from the microcode storage, wherein the second instruction is decoded at a particular clock cycle based on the number of clock cycles associated with the first microcode flow.

Type: Application

Filed: July 3, 2013

Publication date: January 8, 2015

Inventors: Jonathan D. Combs, Jonathan Y. Tong
PROCESSING DEVICE AND CONTROL METHOD OF PROCESSING DEVICE

Publication number: 20150012727

Abstract: A processing device has: a plurality of registers configured to correspond to a plurality of accessible register windows; and an instruction decoder configured to inhibit, when, during an execution of a first instruction of changing a number of a current register window by one, a second instruction of changing the number of the current register window by one in a direction same as a direction of the first instruction is input, a decode of the second instruction until when the execution of the first instruction is completed, and to perform, when, during the execution of the first instruction of changing the number of the current register window by one, a second instruction of changing the number of the current register window by one in a direction opposite to the direction of the first instruction is input, a decode of the second instruction during the execution of the first instruction.

Type: Application

Filed: May 16, 2014

Publication date: January 8, 2015

Applicant: FUJITSU LIMITED

Inventor: YASUNOBU AKIZUKI
Instruction and logic to length decode X86 instructions

Patent number: 8930678

Abstract: Techniques to increase the consumption rate of raw instruction bytes within an instruction fetch unit. An instruction fetch unit according to embodiments of the present invention may include a prefetch buffer, a set of bypass multiplexers, an array of bypass latches, a byte-block multiplexer, an instruction alignment multiplexer, a predecode cache, and an instruction length decoder. Raw instruction bytes may be steered from the bypass latches into macro-instructions for consumption by the instruction length decoder, which may generate micro-instructions from the macro-instructions. Embodiments of the present invention may de-couple a latency for reading raw instruction bytes from the prefetch buffer from consuming raw instruction bytes by the instruction length decoder.

Type: Grant

Filed: April 26, 2012

Date of Patent: January 6, 2015

Assignee: Intel Corporation

Inventors: Venkateswara R. Madduri, Hoichi Cheong, Jonathan Y. Tong
MODE DEPENDENT PARTIAL WIDTH LOAD TO WIDER REGISTER PROCESSORS, METHODS, AND SYSTEMS

Publication number: 20150006856

Abstract: A method of an aspect is performed by a processor. The method includes receiving a partial width load instruction. The partial width load instruction indicates a memory location of a memory as a source operand and indicates a register as a destination operand. The method includes loading data from the indicated memory location to the processor in response to the partial width load instruction. The method includes writing at least a portion of the loaded data to a partial width of the register in response to the partial width load instruction. The method includes finishing writing the register with a set of bits stored in a remaining width of the register that have bit values that depend on a partial width load mode of the processor. The partial width load instruction does not indicate the partial width load mode. Other methods, processors, and systems are also disclosed.

Type: Application

Filed: June 28, 2013

Publication date: January 1, 2015

Inventors: William C. RASH, Yazmin A. SANTIAGO, Martin Guy DIXON
APPARATUS AND METHOD TO ACCELERATE COMPRESSION AND DECOMPRESSION OPERATIONS

Publication number: 20150006853

Abstract: A processor is described that includes an instruction execution pipeline having an instruction fetch unit to fetch and decode an instruction. The processor also has an execution unit to execute the instruction. The execution unit has a state machine and content addressable memory (CAM) circuitry. The state machine is to receive a pointer to a stream of DEFLATE encoded information, fetch a section of the DEFLATE encoded information and apply the section of the DEFLATE encoded information to the CAM to obtain decoded DEFLATE information.

Type: Application

Filed: June 29, 2013

Publication date: January 1, 2015

Inventors: Vinodh GOPAL, James D. GUILFORD, Gilbert M. WOLRICH
PACKED DATA ELEMENT PREDICATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Publication number: 20150006858

Abstract: A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, to perform the masked version of the given packed data operation.

Type: Application

Filed: June 28, 2013

Publication date: January 1, 2015

Inventors: BRET L. TOLL, Buford M. Guy, Ronak Singhal, Mishali Nail
PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO TRANSCODE VARIABLE LENGTH CODE POINTS OF UNICODE CHARACTERS

Publication number: 20150006857

Abstract: A processor includes a plurality of packed data registers. The processor also includes a decode unit to decode a packed variable length code point length determination instruction. The instruction is to indicate a first source packed data that is to have a plurality of packed variable length code points that are each to represent a character. The instruction is also to indicate a destination storage location. The processor also has an execution unit coupled with the decode unit and the packed data registers. The execution unit, in response to the instruction, is to store a result packed data in the indicated destination storage location. The result packed data is to have a length for each of the plurality of the packed variable length code points. Other processors, methods, systems, and instructions are also disclosed.

Type: Application

Filed: June 28, 2013

Publication date: January 1, 2015

Inventor: SHIH J. KUO
PROCESSOR, MULTIPROCESSOR SYSTEM, COMPILER, SOFTWARE SYSTEM, MEMORY CONTROL SYSTEM, AND COMPUTER SYSTEM

Publication number: 20140380021

Abstract: A processor includes: a first instruction processing unit that, in a first mode, receives a first input including instructions included in a first instruction set; a second instruction processing unit that, in a second mode, receives the first input, the second instruction processing unit having a simpler configuration than the first instruction processing unit; a third instruction processing unit that, in a third mode, receives a second input including instructions included in a second instruction set, the second instruction set including part of the instructions included in the first instruction set, the third instruction processing unit having a simpler configuration than the first instruction processing unit and the second instruction processing unit; a selection unit that selects, according to a mode, a result of decoding by one of the instruction processing units; and an instruction execution unit that executes an instruction according to the selected result of decoding.

Type: Application

Filed: September 5, 2014

Publication date: December 25, 2014

Inventor: Naoki OCHI
Mapping same logical register specifier for different instruction sets with divergent association to architectural register file using common address format

Patent number: 8914615

Abstract: A processor core supports execution of program instruction from both a first instruction set and a second instruction set. An architectural register file 18 containing architectural registers is shared by the two instruction sets. The two instruction sets employ logical register specifiers which for at least some values of those logical registers specifiers correspond to different architectural registers within the architectural register file 18. A first decoder 4 for the first instruction set and a second decoder 6 for the second instruction set serve to decode the logical register specifiers to a common register addressing format. This common register addressing format is used to supply register specifiers to renaming circuitry 10 for supporting register renaming in conjunction with a physical register file 16 and an architectural register file 18.

Type: Grant

Filed: December 2, 2011

Date of Patent: December 16, 2014

Assignee: ARM Limited

Inventors: Glen Andrew Harris, James Nolan Hardage, Mark Carpenter Glass
Divided central data processing

Patent number: 8910181

Abstract: A circuit configuration for a data processing system and a corresponding method for executing multiple tasks by way of a central processing unit having a processing capacity assigned to the processing unit, the circuit configuration being configured to distribute the processing capacity of the processing unit uniformly among the respective tasks, and to process the respective tasks in time-offset fashion until they are respectively executed.

Type: Grant

Filed: March 17, 2011

Date of Patent: December 9, 2014

Assignee: Robert Bosch GmbH

Inventors: Eberhard Boehl, Ruben Bartholomae
Gathering and Scattering Multiple Data Elements

Publication number: 20140344553

Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.

Type: Application

Filed: May 20, 2013

Publication date: November 20, 2014

Inventors: Christopher J. Hughes, Yen-Kuang (Y.K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
PROVIDING STATUS OF A PROCESSING DEVICE WITH PERIODIC SYNCHRONIZATION POINT IN INSTRUCTION TRACING SYSTEM

Publication number: 20140344552

Abstract: In accordance with embodiments disclosed herein, there is provided systems and methods for providing status of a processing device with a periodic synchronization point in an instruction tracing system. For example, the method may include generating a boundary packet based on a unique byte pattern in a packet log. The boundary packet provides a starting point for packet decode. The method may also include generating a plurality of state packets based on status information of the processor. The plurality of state packets follows the boundary packet when outputted into the packet log.

Type: Application

Filed: May 16, 2013

Publication date: November 20, 2014

Inventors: Frank Binns, Matthew C. Merten, Mayank Bomb, Beeman C. Strong, Peter Lachner, Jason W. Brandt, Itamar Kazachinsky, Ofer Levy, MD A. Rahman
Changing opcode of subsequent instruction when same destination address is not used as source address by intervening instructions

Patent number: 8892851

Abstract: A circuit arrangement and method support compression and expansion of instruction opcodes by detecting successive address targeting and decoding a first opcode of an instruction into a second opcode in response to detecting successive address targeting. The circuit arrangement and method execute instructions in an instruction stream and detect successive address targeting by two or more instructions in the instruction stream without the targeted address being utilized as a source address in an instruction executed between the first and second instructions in the instruction stream. Then, based on that detection, the opcode of the second instruction is modified, changed, or appended to such that a different opcode is indicated by the second instruction, such that executing the second instruction causes a different unique type of operation to be performed.

Type: Grant

Filed: November 2, 2011

Date of Patent: November 18, 2014

Assignee: International Business Machines Corporation

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
SYNCHRONISATION OF EXECUTION THREADS ON A MULTI-THREADED PROCESSOR

Publication number: 20140331029

Abstract: Method and apparatus are provided for synchronising execution of a plurality of threads on a multi-threaded processor. A program executed by a thread can have a number of synchronisation points corresponding to points where execution is to be synchronised with another thread. Execution of a thread is paused when it reaches a synchronisation point until at least one other thread with which it is intended to be synchronised reaches a corresponding synchronisation point. Execution is subsequently resumed. A control core maintains status data for threads and can cause a thread that is ready to run to use execution resources that were occupied by a thread that is waiting for a synchronization event.

Type: Application

Filed: February 11, 2014

Publication date: November 6, 2014

Applicant: IMAGINATION TECHNOLOGIES LIMITED

Inventor: Yoong Chert Foo
Flexible channel decoder

Patent number: 8879670

Abstract: A configurable Turbo-LDPC decoder having A set of P>1 Soft-Input-Soft-Output decoding units (DP0-DPP-1; DPi) for iteratively decoding both Turbo- and LDPC-encoded input data, each of the decoding units having first (I1i) and second (I2i) input ports and first (O1i) and second (O2i) output ports for intermediate data; First and second memories (M1, M2) for storing the intermediate data, each of the first and second memories comprising P independently readable and writable memory blocks having respective input and output ports; and A configurable switching network (SN) for connecting the first input and output ports of the decoding units to the output and input ports of the first memory, and the second input and output ports of the decoding units to the output and input ports of the second memory.

Type: Grant

Filed: September 8, 2010

Date of Patent: November 4, 2014

Assignee: Agence Spatiale Europeenne

Inventors: Giuseppe Gentile, Massimo Rovini, Paolo Burzigotti, Luca Fanucci
Detecting logically non-significant operation based on opcode and operand and setting flag to decode address specified in subsequent instruction as different address

Patent number: 8880852

Abstract: A method, apparatus, and program product execute instructions of an instruction stream and detect logically non-significant operations in the instruction stream. Then, based on that detection, a target or source address of a subsequent instruction is adjusted. In some instances, doing so enables a greater number of addresses, e.g., registers, to be accessed in a given number of bit positions within an instruction format.

Type: Grant

Filed: October 26, 2011

Date of Patent: November 4, 2014

Assignee: International Business Machines Corporation

Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
Microprocessor that fuses load-alu-store and JCC macroinstructions

Patent number: 8856496

Abstract: A microprocessor receives first and second program-adjacent macroinstructions of the microprocessor instruction set architecture. The first macroinstruction loads an operand from a location in memory, performs an arithmetic/logic operation using the loaded operand to generate a result, and stores the result back to the memory location. The second macroinstruction jumps to a target address if condition codes satisfy a specified condition and otherwise executes the next sequential instruction. An instruction translator simultaneously translates the first and second program-adjacent macroinstructions into first, second, and third micro-operations for execution by execution units. The first micro-operation calculates the memory location address and loads the operand therefrom.

Type: Grant

Filed: February 25, 2011

Date of Patent: October 7, 2014

Assignee: Via Technologies, Inc.

Inventors: G. Glenn Henry, Terry Parks
System using a unique marker with each software code-block

Patent number: 8850410

Abstract: A system and method for improving software maintainability, performance, and/or security by associating a unique marker to each software code-block; the system comprising of a plurality of processors, a plurality of code-blocks, and a marker associated with each code-block. The system may also include a special hardware register (code-block marker hardware register) in each processor for identifying the markers of the code-blocks executed by the processor, without changing any of the plurality of code-blocks.

Type: Grant

Filed: January 29, 2010

Date of Patent: September 30, 2014

Assignee: International Business Machines Corporation

Inventors: Ramanjaneya S. Burugula, Joefon Jann, Pratap C. Pattnaik
Microprocessor that fuses MOV/ALU/JCC instructions

Patent number: 8850164

Abstract: A microprocessor receives first, second, and third program-adjacent macroinstructions. The first macroinstruction moves a first operand to a first register from a second register. The second macroinstruction performs an arithmetic/logic operation using the first operand in the second register and a second operand in a third register to generate a result, loads the result back into the first register, and updates condition codes based on the result. The third macroinstruction conditionally jumps to a target address. An instruction translator simultaneously translates the first, second, and third program-adjacent macroinstructions into a single micro-operation for execution by an execution unit. The micro-operation performs the arithmetic/logic operation using the first operand in the second register and the second operand in third register to generate the result, loads the result back into the first register, updates the condition codes based on the result, and conditionally jumps to the target address.

Type: Grant

Filed: February 25, 2011

Date of Patent: September 30, 2014

Assignee: Via Technologies, Inc.

Inventor: Terry Parks
SWITCHING BETWEEN DEDICATED FUNCTION HARDWARE AND USE OF A SOFTWARE ROUTINE TO GENERATE RESULT DATA

Publication number: 20140289499

Abstract: An apparatus for processing data 2 is provided including processing circuitry 24 controlled by an instruction decoder 20 in response to a stream of program instructions. There is also provided dedicated function hardware 12 configured to receive output data from the processing circuitry and to perform a dedicated processing operation. The instruction decoder 20 is responsive to an end instruction 54 and a software processing flag (blend_shade_enabled) to control the processing circuitry to end a current software routine, to generate output data and in dependence upon the software processing flag either trigger processing of the output data by the dedicated function hardware or trigger the processing circuitry to perform a further software routine upon the output data to generate software generated result data instead of hardware generated result data as generated by the dedicated hardware circuitry.

Type: Application

Filed: June 9, 2014

Publication date: September 25, 2014

Inventors: Simon JONES, Andreas ENGH-HALSTVEDT, Aske Simon CHRISTENSEN
Microprocessor that fuses MOV/ALU instructions

Patent number: 8843729

Abstract: A microprocessor receives first and second program-adjacent macroinstructions of the instruction set architecture of the microprocessor. The first macroinstruction instructs the microprocessor to move a first operand to a first architectural register from a second architectural register. The second macroinstruction instructs the microprocessor to perform an arithmetic/logic operation using the first operand in the second architectural register and a second operand in a third architectural register to generate a result and to load the result back into the first architectural register. An instruction translator simultaneously translates the first and second program-adjacent macroinstructions into a single micro-operation for execution by an execution unit.

Type: Grant

Filed: February 25, 2011

Date of Patent: September 23, 2014

Assignee: VIA Technologies, Inc.

Inventor: Terry Parks
PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO CONSOLIDATE UNMASKED ELEMENTS OF OPERATION MASKS

Publication number: 20140281396

Abstract: An instruction processing apparatus of an aspect includes a plurality of operation mask registers. The apparatus also includes a decode unit to receive an operation mask consolidation instruction. The operation mask consolidation instruction is to indicate a source operation mask register, of the plurality of operation mask registers, and a destination storage location. The source operation mask register is to include a source operation mask that is to include a plurality of masked elements that are to be disposed within a plurality of unmasked elements. An execution unit is coupled with the decode unit. The execution unit, in response to the operation mask consolidation instruction, is to store a consolidated operation mask in the destination storage location. The consolidated operation mask is to include the unmasked elements from the source operation mask consolidated together. Other apparatus, methods, systems, and instructions are also disclosed.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Inventor: Ashish Jha
FUSIBLE INSTRUCTIONS AND LOGIC TO PROVIDE OR-TEST AND AND-TEST FUNCTIONALITY USING MULTIPLE TEST SOURCES

Publication number: 20140281397

Abstract: Fusible instructions and logic provide OR-test and AND-test functionality on multiple test sources. Some embodiments include a processor decode stage to decode a test instruction for execution, the instruction specifying first, second and third source data operands, and an operation type. Execution units, responsive to the decoded test instruction, perform one logical operation, according to the specified operation type, between data from the first and second source data operands, and perform a second logical operation between the data from the third source data operand and the result of the first logical operation to set a condition flag. Some embodiments generate the test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the test instruction through a just-in-time compiler. Some embodiments also fuse the test instruction with a subsequent conditional branch instruction, and perform a branch according to how the condition flag is set.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Inventors: MAXIM LOKTYUKHIN, ROBERT VALENTINE, JULIAN C. HORN, MARK J. CHARNEY
METHODS AND APPARATUS FOR FUSING INSTRUCTIONS TO PROVIDE OR-TEST AND AND-TEST FUNCTIONALITY ON MULTIPLE TEST SOURCES

Publication number: 20140281389

Abstract: Methods and apparatus are disclosed for fusing instructions to provide OR-test and AND-test functionality on multiple test sources. Some embodiments include fetching instructions, said instructions including a first instruction specifying a first operand destination, a second instruction specifying a second operand source, and a third instruction specifying a branch condition. A portion of the plurality of instructions are fused into a single micro-operation, the portion including both the first and second instructions if said first operand destination and said second operand source are the same, and said branch condition is dependent upon the second instruction. Some embodiments generate a novel test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the novel test instruction through a just-in-time compiler.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Inventors: MAXIM LOKTYUKHIN, ROBERT VALENTINE, JULIAN C. HORN, MARK J. CHARNEY
METHOD TO IMPROVE SPEED OF EXECUTING RETURN BRANCH INSTRUCTIONS IN A PROCESSOR

Publication number: 20140281394

Abstract: An apparatus and method for executing call branch and return branch instructions in a processor by utilizing a link register stack. The processor includes a branch counter that is initialized to zero, and is set to zero each time the processor decodes a link register manipulating instruction other than a call branch instruction. The branch counter is incremented by one each time a call branch instruction is decoded and an address is pushed onto the link register stack. In response to decoding a return branch instruction and provided the branch counter is not zero, a target address for the decoded return branch instruction is popped off the link register stack, the branch counter is decremented, and there is no need to check the target address for correctness.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: QUALCOMM INCORPORATED

Inventors: Rodney Wayne Smith, Jeffery M. Schottmiller, Michael Scott McIlvaine, Brian Michael Stempel, Melinda J. Brown, Daren Eugene Streett
INSTRUCTION EMULATION PROCESSORS, METHODS, AND SYSTEMS

Publication number: 20140281398

Abstract: A processor of an aspect includes decode logic to receive a first instruction and to determine that the first instruction is to be emulated. The processor also includes emulation mode aware post-decode instruction processor logic coupled with the decode logic. The emulation mode aware post-decode instruction processor logic is to process one or more control signals decoded from an instruction. The instruction is one of a set of one or more instructions used to emulate the first instruction. The one or more control signals are to be processed differently by the emulation mode aware post-decode instruction processor logic when in an emulation mode than when not in the emulation mode. Other apparatus are also disclosed as well as methods and systems.

Type: Application

Filed: March 16, 2013

Publication date: September 18, 2014

Inventors: WILLIAM C. RASH, Martin G. Dixon, Yazmin A. Santiago
METHOD AND APPARATUS FOR FORWARDING LITERAL GENERATED DATA TO DEPENDENT INSTRUCTIONS MORE EFFICIENTLY USING A CONSTANT CACHE

Publication number: 20140281391

Abstract: A processor to a store constant value (immediate or literal) in a cache upon decoding a move immediate instruction in which the immediate is to be moved (copied or written) to an architected register. The constant value is stored in an entry in the cache. Each entry in the cache includes a field to indicate whether its stored constant value is valid, and a field to associate the entry with an architected register. Once a constant value is stored in the cache, it is immediately available for forwarding to a processor pipeline where a decoded instruction may need the constant value as an operand.

Type: Application

Filed: March 14, 2013

Publication date: September 18, 2014

Applicant: QUALCOMM INCORPORATED

Inventors: James Norris Dieffenderfer, Michael William Morrow, Rodney Wayne Smith, Jeffery M. Schottmiller, Daniel S. Higdon, Michael Scott McIlvaine, Brian Michael Stempel, Kulin N. Kothari
Systems, Apparatuses, and Methods for Reducing the Number of Short Integer Multiplications

Publication number: 20140281395

Abstract: Systems, methods, and apparatuses for calculating a square of a data value of a first source operand, a square of a data value of a second source operand, and a multiplication of the data of the first and second operands only using one multiplication are described.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Inventors: Ilya Albrekht, Elmoustapha Ould-Ahmed-Vall
REORDER-BUFFER-BASED STATIC CHECKPOINTING FOR RENAME TABLE REBUILDING

Publication number: 20140281393

Abstract: Out-of-order CPUs, devices and methods diminish the time penalty from stalling the pipe to rebuild a rename table, such as due to a misprediction. A microprocessor can include a pipe that has a decoder, a dispatcher, and at least one execution unit. A rename table stores rename data, and a check-point table (“CPT”) stores rename data received from the dispatcher. A Re-Order Buffer (“ROB”) stores ROB data, and has a static mapping relationship with the CPT. If the rename table is flushed, such as due to a misprediction, the rename table is rebuilt at least in part by concurrent copying of rename data stored in the CPT, in coordination with walking the ROB.

Type: Application

Filed: March 14, 2013

Publication date: September 18, 2014

Inventors: Ravi Iyengar, Prarthna Santhanakrishnan
PROFILING CODE PORTIONS TO GENERATE TRANSLATIONS

Publication number: 20140281392

Abstract: The disclosure provides a micro-processing system operable in a hardware decoder mode and in a translation mode. In the hardware decoder mode, the hardware decoder receives and decodes non-native ISA instructions into native instructions for execution in a processing pipeline. In the translation mode, native translations of non-native ISA instructions are executed in the processing pipeline without using the hardware decoder. The system includes a code portion profile stored in hardware that changes dynamically in response to use of the hardware decoder to execute portions of non-native ISA code. The code portion profile is then used to dynamically form new native translations executable in the translation mode.

Type: Application

Filed: March 14, 2013

Publication date: September 18, 2014

Applicant: NVIDIA CORPORATION

Inventors: Nathan Tuck, Alexander Klaiber, Ross Segelken, David Dunn, Ben Hertzberg, Rupert Brauch, Thomas Kistler, Guillermo J. Rozas, Madhu Swarna
INSTRUCTION AND LOGIC TO PROVIDE VECTOR HORIZONTAL COMPARE FUNCTIONALITY

Publication number: 20140258683

Abstract: Instructions and logic provide vector horizontal compare functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read values from data fields of the specified size in the source operand, corresponding to the mask and compare the values for equality. In some embodiments, responsive to a detection of inequality, a trap may be taken. In some alternative embodiments, a flag may be set. In other alternative embodiments, a mask field may be set to a masked state for the corresponding unequal value(s). In some embodiments, responsive to all unmasked data fields of the source operand being equal to a particular value, that value may be broadcast to all data fields of the specified size in the destination operand.

Type: Application

Filed: November 30, 2011

Publication date: September 11, 2014

Applicant: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Suleyman Sair, Kshitij A. Doshi
Using vector atomic memory operation to handle data of different lengths

Patent number: 8826252

Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which operates on data of lengths other than the limited number of vector supported data lengths. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector of length N, wherein the virtual interior loop includes one or more vector atomic memory operation (AMO) instructions, used to resolve false conflicts.

Type: Grant

Filed: June 12, 2009

Date of Patent: September 2, 2014

Assignee: Cray Inc.

Inventor: Terry D. Greyzck
EXTENSIBLE EXECUTION UNIT INTERFACE ARCHITECTURE

Publication number: 20140229713

Abstract: A method and circuit arrangement tightly couple together decode logic associated with multiple types of execution units and having varying priorities to enable instructions that are decoded as valid instructions for multiple types of execution units to be forwarded to a highest priority type of execution unit among the multiple types of execution units. Among other benefits, when an auxiliary execution unit is coupled to a general purpose processing core with the decode logic for the auxiliary execution unit tightly coupled with the decode logic for the general purpose processing core, the auxiliary execution unit may be used to effectively overlay new functionality for an existing instruction that is normally executed by the general purpose processing core, e.g., to patch a design flaw in the general purpose processing core or to provide improved performance for specialized applications.

Type: Application

Filed: March 11, 2013

Publication date: August 14, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
INDIRECT INSTRUCTION PREDICATION

Publication number: 20140229712

Abstract: A method, circuit arrangement, and program product for selectively predicating instructions in an instruction stream by determining a first register address from an instruction, determining a second register address based on a value stored at the first register address, and determining whether to predicate the instruction based at least in part on a value stored at the second register address. Predication logic may analyze the instruction to determine the first register address, analyze a register corresponding to the first register address to determine the second register address, and communicate a predication signal to an execution unit based at least in part on the value stored at the second register address.

Type: Application

Filed: February 27, 2013

Publication date: August 14, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
INDIRECT INSTRUCTION PREDICATION

Publication number: 20140229711

Abstract: A method, circuit arrangement, and program product for selectively predicating instructions in an instruction stream by determining a first register address from an instruction, determining a second register address based on a value stored at the first register address, and determining whether to predicate the instruction based at least in part on a value stored at the second register address. Predication logic may analyze the instruction to determine the first register address, analyze a register corresponding to the first register address to determine the second register address, and communicate a predication signal to an execution unit based at least in part on the value stored at the second register address.

Type: Application

Filed: February 13, 2013

Publication date: August 14, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
LOCAL INSTRUCTION LOOP BUFFER UTILIZING EXECUTION UNIT REGISTER FILE

Publication number: 20140229714

Abstract: A method and circuit arrangement utilize a register file of an execution unit as a local instruction loop buffer to enable suitable algorithms, such as DSP algorithms, to be fetched and executed directly within the execution unit, and often enabling other logic circuits utilized for other, general purpose workloads to either be powered down or freed up to handle other workloads.

Type: Application

Filed: March 12, 2013

Publication date: August 14, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
Inter reference picture set signaling and prediction on an electronic device

Patent number: 8805098

Abstract: A previously signaled reference picture set (RPS) corresponding to a current picture is indicated. A first flag for a picture in the previously signaled RPS is set if the picture is to be used as a reference picture for the current picture. A bitstream is sent.

Type: Grant

Filed: May 31, 2013

Date of Patent: August 12, 2014

Assignee: Sharp Laboratories of America, Inc.

Inventor: Sachin G. Deshpande
Set sampling controls instruction

Patent number: 8806178

Abstract: A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss.

Type: Grant

Filed: August 14, 2013

Date of Patent: August 12, 2014

Assignee: International Business Machines Corporation

Inventors: Jane H. Bartik, Lisa C. Heller, Damian L. Osisek, Donald W. Schmidt, Patrick M. West, Jr., Phil C. Yeh
PROCESSOR AND COMPILER

Publication number: 20140223142

Abstract: A Very Long Instruction Word (VLIW) processor having an instruction set with a reduced size resulting in a small number of bits being necessary to specify registers. The VLIW processor includes a register file, and first through third operation units, and executes a very long instruction word. Further, the very long instruction word includes a register specifying field which specifies a least one of the registers in the register file and a plurality of instructions. The operand of each instruction includes bits src1, src2, and dst, which indicate whether or not the registers specified by the register specifying field are to be used as the source register and the destination register.

Type: Application

Filed: April 8, 2014

Publication date: August 7, 2014

Applicant: Panasonic Corporation

Inventors: Takahiro KAGEYAMA, Hideshi NISHIDA, Takeshi TANAKA, Kouji NAKAJIMA
Prioritized assignment of sub-range registers of circularly addressable extended register file to loop variables in RISC processor

Patent number: 8799626

Abstract: A segmental allocation method of expanding RISC processor register includes the steps of a) setting an instruction format of the RISC processor, the destination register field being set having 6 bits to correspond to 64 registers and at least one source register field having at least 4 bits to correspond to at least 16 registers; b) providing two solutions to the problem resulting from that the instruction format in the step a) goes beyond range under some circumstances; and c) setting a register segment allocation algorithm having the steps of c1) providing and grouping a plurality of pseudo registers; c2) prioritizing the pseudo registers in each of the groups; c3) combining the groups pursuant to the priorities thereof; and c4) locating the physical register of lowest computational cost.

Type: Grant

Filed: September 9, 2011

Date of Patent: August 5, 2014

Assignee: National Chung Cheng University

Inventors: Rong-Guey Chang, Yuan-Shin Hwang, Chia-Hsien Su
DATA PROCESSING APPARATUS AND METHOD FOR CONTROLLING USE OF AN ISSUE QUEUE

Publication number: 20140215189

Abstract: An apparatus and method includes execution circuitry including a wide operand execution unit configured to allow up to N bits of operand data to be processed during execution of a single instruction. Decoder circuitry decodes and generates, for each instruction, at least one control data block identifying an operation to be performed by the execution circuitry and at least two re-combineable control data blocks for the instruction. Issue queue control circuitry then allocates a slot in the issue queue for each of the at least two data blocks and up to M bits of associated operand data, and marks those allocated slots to identify that they contain re-combineable control data blocks. The issue queue control circuitry issues the combined block to said wide operand execution unit along with the operand data contained in each of the allocated slots for said at least two control data blocks.

Type: Application

Filed: January 29, 2013

Publication date: July 31, 2014

Applicant: ARM LIMITED

Inventors: Cedric Denis Robert AIRAUD, Luca SCALABRINO, Frederic Jean Denis ARSANTO, Guillaume SCHON, Frederic Claude Marie PIRY, Albin Pierick TONNERRE
Multi-Level Dispatch for a Superscalar Processor

Publication number: 20140215188

Abstract: In an embodiment, a processor includes a multi-level dispatch circuit configured to supply operations for execution by multiple parallel execution pipelines. The multi-level dispatch circuit may include multiple dispatch buffers, each of which is coupled to multiple reservation stations. Each reservation station may be coupled to a respective execution pipeline and may be configured to schedule instruction operations (ops) for execution in the respective execution pipeline. The sets of reservation stations coupled to each dispatch buffer may be non-overlapping. Thus, if a given op is to be executed in a given execution pipeline, the op may be sent to the dispatch buffer which is coupled to the reservation station that provides ops to the given execution pipeline.

Type: Application

Filed: January 25, 2013

Publication date: July 31, 2014

Applicant: Apple Inc.

Inventors: John H. Mylius, Gerard R. Williams, III, Shyam Sundar Balasubramanian, Conrado Blasco-Allue
Microcomputer having a protection function in a register

Patent number: 8789169

Abstract: A control unit controls execution of an instruction according to a decode result of an instruction code. A GRA register stores an access attribute for each of the plurality of general-purpose registers. A mode storage unit stores modes for controlling an operation of a CPU. When the control unit makes a request for access to the general-purpose register, register access allowance determining circuit determines whether the access to the general-purpose register in question is to be allowed or not, depending on the access attribute stored in the GRA register and the mode stored in the mode storage unit. Therefore, the number of the general-purpose registers used corresponding to the mode can be changed, and efficiency of use of the general-purpose registers can be optimized.

Type: Grant

Filed: May 19, 2010

Date of Patent: July 22, 2014

Assignee: Renesas Electronics Corporation

Inventors: Sugako Otani, Hiroyuki Kondo
INSTRUCTION AND LOGIC TO PROVIDE VECTOR SCATTER-OP AND GATHER-OP FUNCTIONALITY

Publication number: 20140201498

Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.

Type: Application

Filed: September 26, 2011

Publication date: July 17, 2014

Applicant: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
Processor to execute shift right merge instructions

Patent number: 8782377

Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.

Type: Grant

Filed: May 22, 2012

Date of Patent: July 15, 2014

Assignee: Intel Corporation

Inventors: Julien Sebot, William W. Macy, Eric Debes, Huy V. Nguyen
METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM

Publication number: 20140195782

Abstract: A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.

Type: Application

Filed: March 30, 2012

Publication date: July 10, 2014

Inventors: Kirk S. Yap, Gilbert M. Wolrich, James D. Guilford, Vinodh Gopal, Erdinc Ozturk, Sean M. Gulley, Wajdi K. Feghali, Martin G. Dixon
METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE VECTOR ADDRESS CONFLICT RESOLUTION WITH VECTOR POPULATION COUNT FUNCTIONALITY

Publication number: 20140189307

Abstract: Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations.

Type: Application

Filed: December 29, 2012

Publication date: July 3, 2014

Inventors: Robert Valentine, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Brett L. Toll

prev 1 2 3 4 5 6 7 8 … next