Instruction Decoding (e.g., By Microinstruction, Start Address Generator, Hardwired) Patents (Class 712/208)
  • Publication number: 20150026435
    Abstract: A method and circuit arrangement selectively source and/or write data from/to extended registers of an extended register file based in part on whether an operand address of an instruction references a primary register of primary register file configured to store a pointer to the extended register. Control logic connected to the primary register file and the extended register file determines whether the operand address references a primary register configured to store a pointer, and responsive to the determination, the control logic causes execution logic to selectively source and/or write data from/to the extended register pointed to by the pointer stored in the referenced primary register.
    Type: Application
    Filed: July 22, 2013
    Publication date: January 22, 2015
    Applicant: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20150019842
    Abstract: This invention is a digital signal processor capable of performing correlation of data with pseudo noise for code division multiple access (CDMA) decoding using clusters. Each cluster includes plural multipliers. The multipliers multiply real and imaginary parts of packed data by corresponding pseudo noise data. Within a cluster the real parts and the imaginary parts of the products are summed separately. This forms plural complex number outputs equal in number to the number of clusters. The pseudo noise data is offset relative to the data input differing amounts for different clusters. The clusters are divided into first half clusters receiving data from even numbered slots and second half clusters receiving data from odd numbered slots. The correlation unit includes a mask input to selectively zero a multiplier product.
    Type: Application
    Filed: July 9, 2014
    Publication date: January 15, 2015
    Inventors: Mujibur Rahman, Peter Richard Dent, Timothy David Anderson, Duc Quang Bui
  • Patent number: 8935512
    Abstract: It is possible to increase the processor instruction set design job efficiency and reduce workload on designers in investigation of an instruction set. An instruction operation code generation system includes an operation code bit width decision means, an instruction sorting means, and an operation code value decision means. The operation code bit width decision means decides a bit width that can be assigned for an operation code of each instruction according to specification data associated with a processor instruction set. The instruction sorting means sorts the instructions according to the operation code bit width. The operation code value decision means decides the value of the operation code of each instruction.
    Type: Grant
    Filed: November 19, 2007
    Date of Patent: January 13, 2015
    Assignee: NEC Corporation
    Inventor: Takahiro Kumura
  • Publication number: 20150012726
    Abstract: A processor includes a microcode storage comprising a plurality of microcode flows and a decode logic coupled to the microcode storage. The decode logic is configured to receive a first instruction, decode the first instruction into an entry point vector to a first microcode flow in the microcode storage, the entry point vector comprising a first indicator specifying a number of clock cycles associated with the first microcode flow, initiate the microcode storage, wherein the microcode storage inserts microinstructions of the first microcode flow into an instruction queue, count clock cycles after initiating the microcode storage, and decode a second instruction without first receiving a return from the microcode storage, wherein the second instruction is decoded at a particular clock cycle based on the number of clock cycles associated with the first microcode flow.
    Type: Application
    Filed: July 3, 2013
    Publication date: January 8, 2015
    Inventors: Jonathan D. Combs, Jonathan Y. Tong
  • Publication number: 20150012727
    Abstract: A processing device has: a plurality of registers configured to correspond to a plurality of accessible register windows; and an instruction decoder configured to inhibit, when, during an execution of a first instruction of changing a number of a current register window by one, a second instruction of changing the number of the current register window by one in a direction same as a direction of the first instruction is input, a decode of the second instruction until when the execution of the first instruction is completed, and to perform, when, during the execution of the first instruction of changing the number of the current register window by one, a second instruction of changing the number of the current register window by one in a direction opposite to the direction of the first instruction is input, a decode of the second instruction during the execution of the first instruction.
    Type: Application
    Filed: May 16, 2014
    Publication date: January 8, 2015
    Applicant: FUJITSU LIMITED
    Inventor: YASUNOBU AKIZUKI
  • Patent number: 8930678
    Abstract: Techniques to increase the consumption rate of raw instruction bytes within an instruction fetch unit. An instruction fetch unit according to embodiments of the present invention may include a prefetch buffer, a set of bypass multiplexers, an array of bypass latches, a byte-block multiplexer, an instruction alignment multiplexer, a predecode cache, and an instruction length decoder. Raw instruction bytes may be steered from the bypass latches into macro-instructions for consumption by the instruction length decoder, which may generate micro-instructions from the macro-instructions. Embodiments of the present invention may de-couple a latency for reading raw instruction bytes from the prefetch buffer from consuming raw instruction bytes by the instruction length decoder.
    Type: Grant
    Filed: April 26, 2012
    Date of Patent: January 6, 2015
    Assignee: Intel Corporation
    Inventors: Venkateswara R. Madduri, Hoichi Cheong, Jonathan Y. Tong
  • Publication number: 20150006856
    Abstract: A method of an aspect is performed by a processor. The method includes receiving a partial width load instruction. The partial width load instruction indicates a memory location of a memory as a source operand and indicates a register as a destination operand. The method includes loading data from the indicated memory location to the processor in response to the partial width load instruction. The method includes writing at least a portion of the loaded data to a partial width of the register in response to the partial width load instruction. The method includes finishing writing the register with a set of bits stored in a remaining width of the register that have bit values that depend on a partial width load mode of the processor. The partial width load instruction does not indicate the partial width load mode. Other methods, processors, and systems are also disclosed.
    Type: Application
    Filed: June 28, 2013
    Publication date: January 1, 2015
    Inventors: William C. RASH, Yazmin A. SANTIAGO, Martin Guy DIXON
  • Publication number: 20150006853
    Abstract: A processor is described that includes an instruction execution pipeline having an instruction fetch unit to fetch and decode an instruction. The processor also has an execution unit to execute the instruction. The execution unit has a state machine and content addressable memory (CAM) circuitry. The state machine is to receive a pointer to a stream of DEFLATE encoded information, fetch a section of the DEFLATE encoded information and apply the section of the DEFLATE encoded information to the CAM to obtain decoded DEFLATE information.
    Type: Application
    Filed: June 29, 2013
    Publication date: January 1, 2015
    Inventors: Vinodh GOPAL, James D. GUILFORD, Gilbert M. WOLRICH
  • Publication number: 20150006858
    Abstract: A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, to perform the masked version of the given packed data operation.
    Type: Application
    Filed: June 28, 2013
    Publication date: January 1, 2015
    Inventors: BRET L. TOLL, Buford M. Guy, Ronak Singhal, Mishali Nail
  • Publication number: 20150006857
    Abstract: A processor includes a plurality of packed data registers. The processor also includes a decode unit to decode a packed variable length code point length determination instruction. The instruction is to indicate a first source packed data that is to have a plurality of packed variable length code points that are each to represent a character. The instruction is also to indicate a destination storage location. The processor also has an execution unit coupled with the decode unit and the packed data registers. The execution unit, in response to the instruction, is to store a result packed data in the indicated destination storage location. The result packed data is to have a length for each of the plurality of the packed variable length code points. Other processors, methods, systems, and instructions are also disclosed.
    Type: Application
    Filed: June 28, 2013
    Publication date: January 1, 2015
    Inventor: SHIH J. KUO
  • Publication number: 20140380021
    Abstract: A processor includes: a first instruction processing unit that, in a first mode, receives a first input including instructions included in a first instruction set; a second instruction processing unit that, in a second mode, receives the first input, the second instruction processing unit having a simpler configuration than the first instruction processing unit; a third instruction processing unit that, in a third mode, receives a second input including instructions included in a second instruction set, the second instruction set including part of the instructions included in the first instruction set, the third instruction processing unit having a simpler configuration than the first instruction processing unit and the second instruction processing unit; a selection unit that selects, according to a mode, a result of decoding by one of the instruction processing units; and an instruction execution unit that executes an instruction according to the selected result of decoding.
    Type: Application
    Filed: September 5, 2014
    Publication date: December 25, 2014
    Inventor: Naoki OCHI
  • Patent number: 8914615
    Abstract: A processor core supports execution of program instruction from both a first instruction set and a second instruction set. An architectural register file 18 containing architectural registers is shared by the two instruction sets. The two instruction sets employ logical register specifiers which for at least some values of those logical registers specifiers correspond to different architectural registers within the architectural register file 18. A first decoder 4 for the first instruction set and a second decoder 6 for the second instruction set serve to decode the logical register specifiers to a common register addressing format. This common register addressing format is used to supply register specifiers to renaming circuitry 10 for supporting register renaming in conjunction with a physical register file 16 and an architectural register file 18.
    Type: Grant
    Filed: December 2, 2011
    Date of Patent: December 16, 2014
    Assignee: ARM Limited
    Inventors: Glen Andrew Harris, James Nolan Hardage, Mark Carpenter Glass
  • Patent number: 8910181
    Abstract: A circuit configuration for a data processing system and a corresponding method for executing multiple tasks by way of a central processing unit having a processing capacity assigned to the processing unit, the circuit configuration being configured to distribute the processing capacity of the processing unit uniformly among the respective tasks, and to process the respective tasks in time-offset fashion until they are respectively executed.
    Type: Grant
    Filed: March 17, 2011
    Date of Patent: December 9, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Eberhard Boehl, Ruben Bartholomae
  • Publication number: 20140344553
    Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.
    Type: Application
    Filed: May 20, 2013
    Publication date: November 20, 2014
    Inventors: Christopher J. Hughes, Yen-Kuang (Y.K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
  • Publication number: 20140344552
    Abstract: In accordance with embodiments disclosed herein, there is provided systems and methods for providing status of a processing device with a periodic synchronization point in an instruction tracing system. For example, the method may include generating a boundary packet based on a unique byte pattern in a packet log. The boundary packet provides a starting point for packet decode. The method may also include generating a plurality of state packets based on status information of the processor. The plurality of state packets follows the boundary packet when outputted into the packet log.
    Type: Application
    Filed: May 16, 2013
    Publication date: November 20, 2014
    Inventors: Frank Binns, Matthew C. Merten, Mayank Bomb, Beeman C. Strong, Peter Lachner, Jason W. Brandt, Itamar Kazachinsky, Ofer Levy, MD A. Rahman
  • Patent number: 8892851
    Abstract: A circuit arrangement and method support compression and expansion of instruction opcodes by detecting successive address targeting and decoding a first opcode of an instruction into a second opcode in response to detecting successive address targeting. The circuit arrangement and method execute instructions in an instruction stream and detect successive address targeting by two or more instructions in the instruction stream without the targeted address being utilized as a source address in an instruction executed between the first and second instructions in the instruction stream. Then, based on that detection, the opcode of the second instruction is modified, changed, or appended to such that a different opcode is indicated by the second instruction, such that executing the second instruction causes a different unique type of operation to be performed.
    Type: Grant
    Filed: November 2, 2011
    Date of Patent: November 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20140331029
    Abstract: Method and apparatus are provided for synchronising execution of a plurality of threads on a multi-threaded processor. A program executed by a thread can have a number of synchronisation points corresponding to points where execution is to be synchronised with another thread. Execution of a thread is paused when it reaches a synchronisation point until at least one other thread with which it is intended to be synchronised reaches a corresponding synchronisation point. Execution is subsequently resumed. A control core maintains status data for threads and can cause a thread that is ready to run to use execution resources that were occupied by a thread that is waiting for a synchronization event.
    Type: Application
    Filed: February 11, 2014
    Publication date: November 6, 2014
    Applicant: IMAGINATION TECHNOLOGIES LIMITED
    Inventor: Yoong Chert Foo
  • Patent number: 8879670
    Abstract: A configurable Turbo-LDPC decoder having A set of P>1 Soft-Input-Soft-Output decoding units (DP0-DPP-1; DPi) for iteratively decoding both Turbo- and LDPC-encoded input data, each of the decoding units having first (I1i) and second (I2i) input ports and first (O1i) and second (O2i) output ports for intermediate data; First and second memories (M1, M2) for storing the intermediate data, each of the first and second memories comprising P independently readable and writable memory blocks having respective input and output ports; and A configurable switching network (SN) for connecting the first input and output ports of the decoding units to the output and input ports of the first memory, and the second input and output ports of the decoding units to the output and input ports of the second memory.
    Type: Grant
    Filed: September 8, 2010
    Date of Patent: November 4, 2014
    Assignee: Agence Spatiale Europeenne
    Inventors: Giuseppe Gentile, Massimo Rovini, Paolo Burzigotti, Luca Fanucci
  • Patent number: 8880852
    Abstract: A method, apparatus, and program product execute instructions of an instruction stream and detect logically non-significant operations in the instruction stream. Then, based on that detection, a target or source address of a subsequent instruction is adjusted. In some instances, doing so enables a greater number of addresses, e.g., registers, to be accessed in a given number of bit positions within an instruction format.
    Type: Grant
    Filed: October 26, 2011
    Date of Patent: November 4, 2014
    Assignee: International Business Machines Corporation
    Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
  • Patent number: 8856496
    Abstract: A microprocessor receives first and second program-adjacent macroinstructions of the microprocessor instruction set architecture. The first macroinstruction loads an operand from a location in memory, performs an arithmetic/logic operation using the loaded operand to generate a result, and stores the result back to the memory location. The second macroinstruction jumps to a target address if condition codes satisfy a specified condition and otherwise executes the next sequential instruction. An instruction translator simultaneously translates the first and second program-adjacent macroinstructions into first, second, and third micro-operations for execution by execution units. The first micro-operation calculates the memory location address and loads the operand therefrom.
    Type: Grant
    Filed: February 25, 2011
    Date of Patent: October 7, 2014
    Assignee: Via Technologies, Inc.
    Inventors: G. Glenn Henry, Terry Parks
  • Patent number: 8850410
    Abstract: A system and method for improving software maintainability, performance, and/or security by associating a unique marker to each software code-block; the system comprising of a plurality of processors, a plurality of code-blocks, and a marker associated with each code-block. The system may also include a special hardware register (code-block marker hardware register) in each processor for identifying the markers of the code-blocks executed by the processor, without changing any of the plurality of code-blocks.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: September 30, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ramanjaneya S. Burugula, Joefon Jann, Pratap C. Pattnaik
  • Patent number: 8850164
    Abstract: A microprocessor receives first, second, and third program-adjacent macroinstructions. The first macroinstruction moves a first operand to a first register from a second register. The second macroinstruction performs an arithmetic/logic operation using the first operand in the second register and a second operand in a third register to generate a result, loads the result back into the first register, and updates condition codes based on the result. The third macroinstruction conditionally jumps to a target address. An instruction translator simultaneously translates the first, second, and third program-adjacent macroinstructions into a single micro-operation for execution by an execution unit. The micro-operation performs the arithmetic/logic operation using the first operand in the second register and the second operand in third register to generate the result, loads the result back into the first register, updates the condition codes based on the result, and conditionally jumps to the target address.
    Type: Grant
    Filed: February 25, 2011
    Date of Patent: September 30, 2014
    Assignee: Via Technologies, Inc.
    Inventor: Terry Parks
  • Publication number: 20140289499
    Abstract: An apparatus for processing data 2 is provided including processing circuitry 24 controlled by an instruction decoder 20 in response to a stream of program instructions. There is also provided dedicated function hardware 12 configured to receive output data from the processing circuitry and to perform a dedicated processing operation. The instruction decoder 20 is responsive to an end instruction 54 and a software processing flag (blend_shade_enabled) to control the processing circuitry to end a current software routine, to generate output data and in dependence upon the software processing flag either trigger processing of the output data by the dedicated function hardware or trigger the processing circuitry to perform a further software routine upon the output data to generate software generated result data instead of hardware generated result data as generated by the dedicated hardware circuitry.
    Type: Application
    Filed: June 9, 2014
    Publication date: September 25, 2014
    Inventors: Simon JONES, Andreas ENGH-HALSTVEDT, Aske Simon CHRISTENSEN
  • Patent number: 8843729
    Abstract: A microprocessor receives first and second program-adjacent macroinstructions of the instruction set architecture of the microprocessor. The first macroinstruction instructs the microprocessor to move a first operand to a first architectural register from a second architectural register. The second macroinstruction instructs the microprocessor to perform an arithmetic/logic operation using the first operand in the second architectural register and a second operand in a third architectural register to generate a result and to load the result back into the first architectural register. An instruction translator simultaneously translates the first and second program-adjacent macroinstructions into a single micro-operation for execution by an execution unit.
    Type: Grant
    Filed: February 25, 2011
    Date of Patent: September 23, 2014
    Assignee: VIA Technologies, Inc.
    Inventor: Terry Parks
  • Publication number: 20140281396
    Abstract: An instruction processing apparatus of an aspect includes a plurality of operation mask registers. The apparatus also includes a decode unit to receive an operation mask consolidation instruction. The operation mask consolidation instruction is to indicate a source operation mask register, of the plurality of operation mask registers, and a destination storage location. The source operation mask register is to include a source operation mask that is to include a plurality of masked elements that are to be disposed within a plurality of unmasked elements. An execution unit is coupled with the decode unit. The execution unit, in response to the operation mask consolidation instruction, is to store a consolidated operation mask in the destination storage location. The consolidated operation mask is to include the unmasked elements from the source operation mask consolidated together. Other apparatus, methods, systems, and instructions are also disclosed.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventor: Ashish Jha
  • Publication number: 20140281397
    Abstract: Fusible instructions and logic provide OR-test and AND-test functionality on multiple test sources. Some embodiments include a processor decode stage to decode a test instruction for execution, the instruction specifying first, second and third source data operands, and an operation type. Execution units, responsive to the decoded test instruction, perform one logical operation, according to the specified operation type, between data from the first and second source data operands, and perform a second logical operation between the data from the third source data operand and the result of the first logical operation to set a condition flag. Some embodiments generate the test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the test instruction through a just-in-time compiler. Some embodiments also fuse the test instruction with a subsequent conditional branch instruction, and perform a branch according to how the condition flag is set.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventors: MAXIM LOKTYUKHIN, ROBERT VALENTINE, JULIAN C. HORN, MARK J. CHARNEY
  • Publication number: 20140281389
    Abstract: Methods and apparatus are disclosed for fusing instructions to provide OR-test and AND-test functionality on multiple test sources. Some embodiments include fetching instructions, said instructions including a first instruction specifying a first operand destination, a second instruction specifying a second operand source, and a third instruction specifying a branch condition. A portion of the plurality of instructions are fused into a single micro-operation, the portion including both the first and second instructions if said first operand destination and said second operand source are the same, and said branch condition is dependent upon the second instruction. Some embodiments generate a novel test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the novel test instruction through a just-in-time compiler.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventors: MAXIM LOKTYUKHIN, ROBERT VALENTINE, JULIAN C. HORN, MARK J. CHARNEY
  • Publication number: 20140281394
    Abstract: An apparatus and method for executing call branch and return branch instructions in a processor by utilizing a link register stack. The processor includes a branch counter that is initialized to zero, and is set to zero each time the processor decodes a link register manipulating instruction other than a call branch instruction. The branch counter is incremented by one each time a call branch instruction is decoded and an address is pushed onto the link register stack. In response to decoding a return branch instruction and provided the branch counter is not zero, a target address for the decoded return branch instruction is popped off the link register stack, the branch counter is decremented, and there is no need to check the target address for correctness.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: Rodney Wayne Smith, Jeffery M. Schottmiller, Michael Scott McIlvaine, Brian Michael Stempel, Melinda J. Brown, Daren Eugene Streett
  • Publication number: 20140281398
    Abstract: A processor of an aspect includes decode logic to receive a first instruction and to determine that the first instruction is to be emulated. The processor also includes emulation mode aware post-decode instruction processor logic coupled with the decode logic. The emulation mode aware post-decode instruction processor logic is to process one or more control signals decoded from an instruction. The instruction is one of a set of one or more instructions used to emulate the first instruction. The one or more control signals are to be processed differently by the emulation mode aware post-decode instruction processor logic when in an emulation mode than when not in the emulation mode. Other apparatus are also disclosed as well as methods and systems.
    Type: Application
    Filed: March 16, 2013
    Publication date: September 18, 2014
    Inventors: WILLIAM C. RASH, Martin G. Dixon, Yazmin A. Santiago
  • Publication number: 20140281391
    Abstract: A processor to a store constant value (immediate or literal) in a cache upon decoding a move immediate instruction in which the immediate is to be moved (copied or written) to an architected register. The constant value is stored in an entry in the cache. Each entry in the cache includes a field to indicate whether its stored constant value is valid, and a field to associate the entry with an architected register. Once a constant value is stored in the cache, it is immediately available for forwarding to a processor pipeline where a decoded instruction may need the constant value as an operand.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: James Norris Dieffenderfer, Michael William Morrow, Rodney Wayne Smith, Jeffery M. Schottmiller, Daniel S. Higdon, Michael Scott McIlvaine, Brian Michael Stempel, Kulin N. Kothari
  • Publication number: 20140281395
    Abstract: Systems, methods, and apparatuses for calculating a square of a data value of a first source operand, a square of a data value of a second source operand, and a multiplication of the data of the first and second operands only using one multiplication are described.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventors: Ilya Albrekht, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20140281393
    Abstract: Out-of-order CPUs, devices and methods diminish the time penalty from stalling the pipe to rebuild a rename table, such as due to a misprediction. A microprocessor can include a pipe that has a decoder, a dispatcher, and at least one execution unit. A rename table stores rename data, and a check-point table (“CPT”) stores rename data received from the dispatcher. A Re-Order Buffer (“ROB”) stores ROB data, and has a static mapping relationship with the CPT. If the rename table is flushed, such as due to a misprediction, the rename table is rebuilt at least in part by concurrent copying of rename data stored in the CPT, in coordination with walking the ROB.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Inventors: Ravi Iyengar, Prarthna Santhanakrishnan
  • Publication number: 20140281392
    Abstract: The disclosure provides a micro-processing system operable in a hardware decoder mode and in a translation mode. In the hardware decoder mode, the hardware decoder receives and decodes non-native ISA instructions into native instructions for execution in a processing pipeline. In the translation mode, native translations of non-native ISA instructions are executed in the processing pipeline without using the hardware decoder. The system includes a code portion profile stored in hardware that changes dynamically in response to use of the hardware decoder to execute portions of non-native ISA code. The code portion profile is then used to dynamically form new native translations executable in the translation mode.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Nathan Tuck, Alexander Klaiber, Ross Segelken, David Dunn, Ben Hertzberg, Rupert Brauch, Thomas Kistler, Guillermo J. Rozas, Madhu Swarna
  • Publication number: 20140258683
    Abstract: Instructions and logic provide vector horizontal compare functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read values from data fields of the specified size in the source operand, corresponding to the mask and compare the values for equality. In some embodiments, responsive to a detection of inequality, a trap may be taken. In some alternative embodiments, a flag may be set. In other alternative embodiments, a mask field may be set to a masked state for the corresponding unequal value(s). In some embodiments, responsive to all unmasked data fields of the source operand being equal to a particular value, that value may be broadcast to all data fields of the specified size in the destination operand.
    Type: Application
    Filed: November 30, 2011
    Publication date: September 11, 2014
    Applicant: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Suleyman Sair, Kshitij A. Doshi
  • Patent number: 8826252
    Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which operates on data of lengths other than the limited number of vector supported data lengths. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector of length N, wherein the virtual interior loop includes one or more vector atomic memory operation (AMO) instructions, used to resolve false conflicts.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: September 2, 2014
    Assignee: Cray Inc.
    Inventor: Terry D. Greyzck
  • Publication number: 20140229713
    Abstract: A method and circuit arrangement tightly couple together decode logic associated with multiple types of execution units and having varying priorities to enable instructions that are decoded as valid instructions for multiple types of execution units to be forwarded to a highest priority type of execution unit among the multiple types of execution units. Among other benefits, when an auxiliary execution unit is coupled to a general purpose processing core with the decode logic for the auxiliary execution unit tightly coupled with the decode logic for the general purpose processing core, the auxiliary execution unit may be used to effectively overlay new functionality for an existing instruction that is normally executed by the general purpose processing core, e.g., to patch a design flaw in the general purpose processing core or to provide improved performance for specialized applications.
    Type: Application
    Filed: March 11, 2013
    Publication date: August 14, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20140229712
    Abstract: A method, circuit arrangement, and program product for selectively predicating instructions in an instruction stream by determining a first register address from an instruction, determining a second register address based on a value stored at the first register address, and determining whether to predicate the instruction based at least in part on a value stored at the second register address. Predication logic may analyze the instruction to determine the first register address, analyze a register corresponding to the first register address to determine the second register address, and communicate a predication signal to an execution unit based at least in part on the value stored at the second register address.
    Type: Application
    Filed: February 27, 2013
    Publication date: August 14, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20140229711
    Abstract: A method, circuit arrangement, and program product for selectively predicating instructions in an instruction stream by determining a first register address from an instruction, determining a second register address based on a value stored at the first register address, and determining whether to predicate the instruction based at least in part on a value stored at the second register address. Predication logic may analyze the instruction to determine the first register address, analyze a register corresponding to the first register address to determine the second register address, and communicate a predication signal to an execution unit based at least in part on the value stored at the second register address.
    Type: Application
    Filed: February 13, 2013
    Publication date: August 14, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20140229714
    Abstract: A method and circuit arrangement utilize a register file of an execution unit as a local instruction loop buffer to enable suitable algorithms, such as DSP algorithms, to be fetched and executed directly within the execution unit, and often enabling other logic circuits utilized for other, general purpose workloads to either be powered down or freed up to handle other workloads.
    Type: Application
    Filed: March 12, 2013
    Publication date: August 14, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 8805098
    Abstract: A previously signaled reference picture set (RPS) corresponding to a current picture is indicated. A first flag for a picture in the previously signaled RPS is set if the picture is to be used as a reference picture for the current picture. A bitstream is sent.
    Type: Grant
    Filed: May 31, 2013
    Date of Patent: August 12, 2014
    Assignee: Sharp Laboratories of America, Inc.
    Inventor: Sachin G. Deshpande
  • Patent number: 8806178
    Abstract: A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss.
    Type: Grant
    Filed: August 14, 2013
    Date of Patent: August 12, 2014
    Assignee: International Business Machines Corporation
    Inventors: Jane H. Bartik, Lisa C. Heller, Damian L. Osisek, Donald W. Schmidt, Patrick M. West, Jr., Phil C. Yeh
  • Publication number: 20140223142
    Abstract: A Very Long Instruction Word (VLIW) processor having an instruction set with a reduced size resulting in a small number of bits being necessary to specify registers. The VLIW processor includes a register file, and first through third operation units, and executes a very long instruction word. Further, the very long instruction word includes a register specifying field which specifies a least one of the registers in the register file and a plurality of instructions. The operand of each instruction includes bits src1, src2, and dst, which indicate whether or not the registers specified by the register specifying field are to be used as the source register and the destination register.
    Type: Application
    Filed: April 8, 2014
    Publication date: August 7, 2014
    Applicant: Panasonic Corporation
    Inventors: Takahiro KAGEYAMA, Hideshi NISHIDA, Takeshi TANAKA, Kouji NAKAJIMA
  • Patent number: 8799626
    Abstract: A segmental allocation method of expanding RISC processor register includes the steps of a) setting an instruction format of the RISC processor, the destination register field being set having 6 bits to correspond to 64 registers and at least one source register field having at least 4 bits to correspond to at least 16 registers; b) providing two solutions to the problem resulting from that the instruction format in the step a) goes beyond range under some circumstances; and c) setting a register segment allocation algorithm having the steps of c1) providing and grouping a plurality of pseudo registers; c2) prioritizing the pseudo registers in each of the groups; c3) combining the groups pursuant to the priorities thereof; and c4) locating the physical register of lowest computational cost.
    Type: Grant
    Filed: September 9, 2011
    Date of Patent: August 5, 2014
    Assignee: National Chung Cheng University
    Inventors: Rong-Guey Chang, Yuan-Shin Hwang, Chia-Hsien Su
  • Publication number: 20140215189
    Abstract: An apparatus and method includes execution circuitry including a wide operand execution unit configured to allow up to N bits of operand data to be processed during execution of a single instruction. Decoder circuitry decodes and generates, for each instruction, at least one control data block identifying an operation to be performed by the execution circuitry and at least two re-combineable control data blocks for the instruction. Issue queue control circuitry then allocates a slot in the issue queue for each of the at least two data blocks and up to M bits of associated operand data, and marks those allocated slots to identify that they contain re-combineable control data blocks. The issue queue control circuitry issues the combined block to said wide operand execution unit along with the operand data contained in each of the allocated slots for said at least two control data blocks.
    Type: Application
    Filed: January 29, 2013
    Publication date: July 31, 2014
    Applicant: ARM LIMITED
    Inventors: Cedric Denis Robert AIRAUD, Luca SCALABRINO, Frederic Jean Denis ARSANTO, Guillaume SCHON, Frederic Claude Marie PIRY, Albin Pierick TONNERRE
  • Publication number: 20140215188
    Abstract: In an embodiment, a processor includes a multi-level dispatch circuit configured to supply operations for execution by multiple parallel execution pipelines. The multi-level dispatch circuit may include multiple dispatch buffers, each of which is coupled to multiple reservation stations. Each reservation station may be coupled to a respective execution pipeline and may be configured to schedule instruction operations (ops) for execution in the respective execution pipeline. The sets of reservation stations coupled to each dispatch buffer may be non-overlapping. Thus, if a given op is to be executed in a given execution pipeline, the op may be sent to the dispatch buffer which is coupled to the reservation station that provides ops to the given execution pipeline.
    Type: Application
    Filed: January 25, 2013
    Publication date: July 31, 2014
    Applicant: Apple Inc.
    Inventors: John H. Mylius, Gerard R. Williams, III, Shyam Sundar Balasubramanian, Conrado Blasco-Allue
  • Patent number: 8789169
    Abstract: A control unit controls execution of an instruction according to a decode result of an instruction code. A GRA register stores an access attribute for each of the plurality of general-purpose registers. A mode storage unit stores modes for controlling an operation of a CPU. When the control unit makes a request for access to the general-purpose register, register access allowance determining circuit determines whether the access to the general-purpose register in question is to be allowed or not, depending on the access attribute stored in the GRA register and the mode stored in the mode storage unit. Therefore, the number of the general-purpose registers used corresponding to the mode can be changed, and efficiency of use of the general-purpose registers can be optimized.
    Type: Grant
    Filed: May 19, 2010
    Date of Patent: July 22, 2014
    Assignee: Renesas Electronics Corporation
    Inventors: Sugako Otani, Hiroyuki Kondo
  • Publication number: 20140201498
    Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
    Type: Application
    Filed: September 26, 2011
    Publication date: July 17, 2014
    Applicant: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
  • Patent number: 8782377
    Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
    Type: Grant
    Filed: May 22, 2012
    Date of Patent: July 15, 2014
    Assignee: Intel Corporation
    Inventors: Julien Sebot, William W. Macy, Eric Debes, Huy V. Nguyen
  • Publication number: 20140195782
    Abstract: A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
    Type: Application
    Filed: March 30, 2012
    Publication date: July 10, 2014
    Inventors: Kirk S. Yap, Gilbert M. Wolrich, James D. Guilford, Vinodh Gopal, Erdinc Ozturk, Sean M. Gulley, Wajdi K. Feghali, Martin G. Dixon
  • Publication number: 20140189307
    Abstract: Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Application
    Filed: December 29, 2012
    Publication date: July 3, 2014
    Inventors: Robert Valentine, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Brett L. Toll