Instruction Decoding (e.g., By Microinstruction, Start Address Generator, Hardwired) Patents (Class 712/208)
-
Publication number: 20150026435Abstract: A method and circuit arrangement selectively source and/or write data from/to extended registers of an extended register file based in part on whether an operand address of an instruction references a primary register of primary register file configured to store a pointer to the extended register. Control logic connected to the primary register file and the extended register file determines whether the operand address references a primary register configured to store a pointer, and responsive to the determination, the control logic causes execution logic to selectively source and/or write data from/to the extended register pointed to by the pointer stored in the referenced primary register.Type: ApplicationFiled: July 22, 2013Publication date: January 22, 2015Applicant: International Business Machines CorporationInventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Publication number: 20150019842Abstract: This invention is a digital signal processor capable of performing correlation of data with pseudo noise for code division multiple access (CDMA) decoding using clusters. Each cluster includes plural multipliers. The multipliers multiply real and imaginary parts of packed data by corresponding pseudo noise data. Within a cluster the real parts and the imaginary parts of the products are summed separately. This forms plural complex number outputs equal in number to the number of clusters. The pseudo noise data is offset relative to the data input differing amounts for different clusters. The clusters are divided into first half clusters receiving data from even numbered slots and second half clusters receiving data from odd numbered slots. The correlation unit includes a mask input to selectively zero a multiplier product.Type: ApplicationFiled: July 9, 2014Publication date: January 15, 2015Inventors: Mujibur Rahman, Peter Richard Dent, Timothy David Anderson, Duc Quang Bui
-
Patent number: 8935512Abstract: It is possible to increase the processor instruction set design job efficiency and reduce workload on designers in investigation of an instruction set. An instruction operation code generation system includes an operation code bit width decision means, an instruction sorting means, and an operation code value decision means. The operation code bit width decision means decides a bit width that can be assigned for an operation code of each instruction according to specification data associated with a processor instruction set. The instruction sorting means sorts the instructions according to the operation code bit width. The operation code value decision means decides the value of the operation code of each instruction.Type: GrantFiled: November 19, 2007Date of Patent: January 13, 2015Assignee: NEC CorporationInventor: Takahiro Kumura
-
Publication number: 20150012726Abstract: A processor includes a microcode storage comprising a plurality of microcode flows and a decode logic coupled to the microcode storage. The decode logic is configured to receive a first instruction, decode the first instruction into an entry point vector to a first microcode flow in the microcode storage, the entry point vector comprising a first indicator specifying a number of clock cycles associated with the first microcode flow, initiate the microcode storage, wherein the microcode storage inserts microinstructions of the first microcode flow into an instruction queue, count clock cycles after initiating the microcode storage, and decode a second instruction without first receiving a return from the microcode storage, wherein the second instruction is decoded at a particular clock cycle based on the number of clock cycles associated with the first microcode flow.Type: ApplicationFiled: July 3, 2013Publication date: January 8, 2015Inventors: Jonathan D. Combs, Jonathan Y. Tong
-
Publication number: 20150012727Abstract: A processing device has: a plurality of registers configured to correspond to a plurality of accessible register windows; and an instruction decoder configured to inhibit, when, during an execution of a first instruction of changing a number of a current register window by one, a second instruction of changing the number of the current register window by one in a direction same as a direction of the first instruction is input, a decode of the second instruction until when the execution of the first instruction is completed, and to perform, when, during the execution of the first instruction of changing the number of the current register window by one, a second instruction of changing the number of the current register window by one in a direction opposite to the direction of the first instruction is input, a decode of the second instruction during the execution of the first instruction.Type: ApplicationFiled: May 16, 2014Publication date: January 8, 2015Applicant: FUJITSU LIMITEDInventor: YASUNOBU AKIZUKI
-
Patent number: 8930678Abstract: Techniques to increase the consumption rate of raw instruction bytes within an instruction fetch unit. An instruction fetch unit according to embodiments of the present invention may include a prefetch buffer, a set of bypass multiplexers, an array of bypass latches, a byte-block multiplexer, an instruction alignment multiplexer, a predecode cache, and an instruction length decoder. Raw instruction bytes may be steered from the bypass latches into macro-instructions for consumption by the instruction length decoder, which may generate micro-instructions from the macro-instructions. Embodiments of the present invention may de-couple a latency for reading raw instruction bytes from the prefetch buffer from consuming raw instruction bytes by the instruction length decoder.Type: GrantFiled: April 26, 2012Date of Patent: January 6, 2015Assignee: Intel CorporationInventors: Venkateswara R. Madduri, Hoichi Cheong, Jonathan Y. Tong
-
Publication number: 20150006856Abstract: A method of an aspect is performed by a processor. The method includes receiving a partial width load instruction. The partial width load instruction indicates a memory location of a memory as a source operand and indicates a register as a destination operand. The method includes loading data from the indicated memory location to the processor in response to the partial width load instruction. The method includes writing at least a portion of the loaded data to a partial width of the register in response to the partial width load instruction. The method includes finishing writing the register with a set of bits stored in a remaining width of the register that have bit values that depend on a partial width load mode of the processor. The partial width load instruction does not indicate the partial width load mode. Other methods, processors, and systems are also disclosed.Type: ApplicationFiled: June 28, 2013Publication date: January 1, 2015Inventors: William C. RASH, Yazmin A. SANTIAGO, Martin Guy DIXON
-
Publication number: 20150006853Abstract: A processor is described that includes an instruction execution pipeline having an instruction fetch unit to fetch and decode an instruction. The processor also has an execution unit to execute the instruction. The execution unit has a state machine and content addressable memory (CAM) circuitry. The state machine is to receive a pointer to a stream of DEFLATE encoded information, fetch a section of the DEFLATE encoded information and apply the section of the DEFLATE encoded information to the CAM to obtain decoded DEFLATE information.Type: ApplicationFiled: June 29, 2013Publication date: January 1, 2015Inventors: Vinodh GOPAL, James D. GUILFORD, Gilbert M. WOLRICH
-
Publication number: 20150006858Abstract: A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, to perform the masked version of the given packed data operation.Type: ApplicationFiled: June 28, 2013Publication date: January 1, 2015Inventors: BRET L. TOLL, Buford M. Guy, Ronak Singhal, Mishali Nail
-
Publication number: 20150006857Abstract: A processor includes a plurality of packed data registers. The processor also includes a decode unit to decode a packed variable length code point length determination instruction. The instruction is to indicate a first source packed data that is to have a plurality of packed variable length code points that are each to represent a character. The instruction is also to indicate a destination storage location. The processor also has an execution unit coupled with the decode unit and the packed data registers. The execution unit, in response to the instruction, is to store a result packed data in the indicated destination storage location. The result packed data is to have a length for each of the plurality of the packed variable length code points. Other processors, methods, systems, and instructions are also disclosed.Type: ApplicationFiled: June 28, 2013Publication date: January 1, 2015Inventor: SHIH J. KUO
-
Publication number: 20140380021Abstract: A processor includes: a first instruction processing unit that, in a first mode, receives a first input including instructions included in a first instruction set; a second instruction processing unit that, in a second mode, receives the first input, the second instruction processing unit having a simpler configuration than the first instruction processing unit; a third instruction processing unit that, in a third mode, receives a second input including instructions included in a second instruction set, the second instruction set including part of the instructions included in the first instruction set, the third instruction processing unit having a simpler configuration than the first instruction processing unit and the second instruction processing unit; a selection unit that selects, according to a mode, a result of decoding by one of the instruction processing units; and an instruction execution unit that executes an instruction according to the selected result of decoding.Type: ApplicationFiled: September 5, 2014Publication date: December 25, 2014Inventor: Naoki OCHI
-
Patent number: 8914615Abstract: A processor core supports execution of program instruction from both a first instruction set and a second instruction set. An architectural register file 18 containing architectural registers is shared by the two instruction sets. The two instruction sets employ logical register specifiers which for at least some values of those logical registers specifiers correspond to different architectural registers within the architectural register file 18. A first decoder 4 for the first instruction set and a second decoder 6 for the second instruction set serve to decode the logical register specifiers to a common register addressing format. This common register addressing format is used to supply register specifiers to renaming circuitry 10 for supporting register renaming in conjunction with a physical register file 16 and an architectural register file 18.Type: GrantFiled: December 2, 2011Date of Patent: December 16, 2014Assignee: ARM LimitedInventors: Glen Andrew Harris, James Nolan Hardage, Mark Carpenter Glass
-
Patent number: 8910181Abstract: A circuit configuration for a data processing system and a corresponding method for executing multiple tasks by way of a central processing unit having a processing capacity assigned to the processing unit, the circuit configuration being configured to distribute the processing capacity of the processing unit uniformly among the respective tasks, and to process the respective tasks in time-offset fashion until they are respectively executed.Type: GrantFiled: March 17, 2011Date of Patent: December 9, 2014Assignee: Robert Bosch GmbHInventors: Eberhard Boehl, Ruben Bartholomae
-
Publication number: 20140344553Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.Type: ApplicationFiled: May 20, 2013Publication date: November 20, 2014Inventors: Christopher J. Hughes, Yen-Kuang (Y.K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
-
Publication number: 20140344552Abstract: In accordance with embodiments disclosed herein, there is provided systems and methods for providing status of a processing device with a periodic synchronization point in an instruction tracing system. For example, the method may include generating a boundary packet based on a unique byte pattern in a packet log. The boundary packet provides a starting point for packet decode. The method may also include generating a plurality of state packets based on status information of the processor. The plurality of state packets follows the boundary packet when outputted into the packet log.Type: ApplicationFiled: May 16, 2013Publication date: November 20, 2014Inventors: Frank Binns, Matthew C. Merten, Mayank Bomb, Beeman C. Strong, Peter Lachner, Jason W. Brandt, Itamar Kazachinsky, Ofer Levy, MD A. Rahman
-
Patent number: 8892851Abstract: A circuit arrangement and method support compression and expansion of instruction opcodes by detecting successive address targeting and decoding a first opcode of an instruction into a second opcode in response to detecting successive address targeting. The circuit arrangement and method execute instructions in an instruction stream and detect successive address targeting by two or more instructions in the instruction stream without the targeted address being utilized as a source address in an instruction executed between the first and second instructions in the instruction stream. Then, based on that detection, the opcode of the second instruction is modified, changed, or appended to such that a different opcode is indicated by the second instruction, such that executing the second instruction causes a different unique type of operation to be performed.Type: GrantFiled: November 2, 2011Date of Patent: November 18, 2014Assignee: International Business Machines CorporationInventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Publication number: 20140331029Abstract: Method and apparatus are provided for synchronising execution of a plurality of threads on a multi-threaded processor. A program executed by a thread can have a number of synchronisation points corresponding to points where execution is to be synchronised with another thread. Execution of a thread is paused when it reaches a synchronisation point until at least one other thread with which it is intended to be synchronised reaches a corresponding synchronisation point. Execution is subsequently resumed. A control core maintains status data for threads and can cause a thread that is ready to run to use execution resources that were occupied by a thread that is waiting for a synchronization event.Type: ApplicationFiled: February 11, 2014Publication date: November 6, 2014Applicant: IMAGINATION TECHNOLOGIES LIMITEDInventor: Yoong Chert Foo
-
Patent number: 8879670Abstract: A configurable Turbo-LDPC decoder having A set of P>1 Soft-Input-Soft-Output decoding units (DP0-DPP-1; DPi) for iteratively decoding both Turbo- and LDPC-encoded input data, each of the decoding units having first (I1i) and second (I2i) input ports and first (O1i) and second (O2i) output ports for intermediate data; First and second memories (M1, M2) for storing the intermediate data, each of the first and second memories comprising P independently readable and writable memory blocks having respective input and output ports; and A configurable switching network (SN) for connecting the first input and output ports of the decoding units to the output and input ports of the first memory, and the second input and output ports of the decoding units to the output and input ports of the second memory.Type: GrantFiled: September 8, 2010Date of Patent: November 4, 2014Assignee: Agence Spatiale EuropeenneInventors: Giuseppe Gentile, Massimo Rovini, Paolo Burzigotti, Luca Fanucci
-
Patent number: 8880852Abstract: A method, apparatus, and program product execute instructions of an instruction stream and detect logically non-significant operations in the instruction stream. Then, based on that detection, a target or source address of a subsequent instruction is adjusted. In some instances, doing so enables a greater number of addresses, e.g., registers, to be accessed in a given number of bit positions within an instruction format.Type: GrantFiled: October 26, 2011Date of Patent: November 4, 2014Assignee: International Business Machines CorporationInventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
-
Patent number: 8856496Abstract: A microprocessor receives first and second program-adjacent macroinstructions of the microprocessor instruction set architecture. The first macroinstruction loads an operand from a location in memory, performs an arithmetic/logic operation using the loaded operand to generate a result, and stores the result back to the memory location. The second macroinstruction jumps to a target address if condition codes satisfy a specified condition and otherwise executes the next sequential instruction. An instruction translator simultaneously translates the first and second program-adjacent macroinstructions into first, second, and third micro-operations for execution by execution units. The first micro-operation calculates the memory location address and loads the operand therefrom.Type: GrantFiled: February 25, 2011Date of Patent: October 7, 2014Assignee: Via Technologies, Inc.Inventors: G. Glenn Henry, Terry Parks
-
Patent number: 8850410Abstract: A system and method for improving software maintainability, performance, and/or security by associating a unique marker to each software code-block; the system comprising of a plurality of processors, a plurality of code-blocks, and a marker associated with each code-block. The system may also include a special hardware register (code-block marker hardware register) in each processor for identifying the markers of the code-blocks executed by the processor, without changing any of the plurality of code-blocks.Type: GrantFiled: January 29, 2010Date of Patent: September 30, 2014Assignee: International Business Machines CorporationInventors: Ramanjaneya S. Burugula, Joefon Jann, Pratap C. Pattnaik
-
Patent number: 8850164Abstract: A microprocessor receives first, second, and third program-adjacent macroinstructions. The first macroinstruction moves a first operand to a first register from a second register. The second macroinstruction performs an arithmetic/logic operation using the first operand in the second register and a second operand in a third register to generate a result, loads the result back into the first register, and updates condition codes based on the result. The third macroinstruction conditionally jumps to a target address. An instruction translator simultaneously translates the first, second, and third program-adjacent macroinstructions into a single micro-operation for execution by an execution unit. The micro-operation performs the arithmetic/logic operation using the first operand in the second register and the second operand in third register to generate the result, loads the result back into the first register, updates the condition codes based on the result, and conditionally jumps to the target address.Type: GrantFiled: February 25, 2011Date of Patent: September 30, 2014Assignee: Via Technologies, Inc.Inventor: Terry Parks
-
Publication number: 20140289499Abstract: An apparatus for processing data 2 is provided including processing circuitry 24 controlled by an instruction decoder 20 in response to a stream of program instructions. There is also provided dedicated function hardware 12 configured to receive output data from the processing circuitry and to perform a dedicated processing operation. The instruction decoder 20 is responsive to an end instruction 54 and a software processing flag (blend_shade_enabled) to control the processing circuitry to end a current software routine, to generate output data and in dependence upon the software processing flag either trigger processing of the output data by the dedicated function hardware or trigger the processing circuitry to perform a further software routine upon the output data to generate software generated result data instead of hardware generated result data as generated by the dedicated hardware circuitry.Type: ApplicationFiled: June 9, 2014Publication date: September 25, 2014Inventors: Simon JONES, Andreas ENGH-HALSTVEDT, Aske Simon CHRISTENSEN
-
Patent number: 8843729Abstract: A microprocessor receives first and second program-adjacent macroinstructions of the instruction set architecture of the microprocessor. The first macroinstruction instructs the microprocessor to move a first operand to a first architectural register from a second architectural register. The second macroinstruction instructs the microprocessor to perform an arithmetic/logic operation using the first operand in the second architectural register and a second operand in a third architectural register to generate a result and to load the result back into the first architectural register. An instruction translator simultaneously translates the first and second program-adjacent macroinstructions into a single micro-operation for execution by an execution unit.Type: GrantFiled: February 25, 2011Date of Patent: September 23, 2014Assignee: VIA Technologies, Inc.Inventor: Terry Parks
-
Publication number: 20140281396Abstract: An instruction processing apparatus of an aspect includes a plurality of operation mask registers. The apparatus also includes a decode unit to receive an operation mask consolidation instruction. The operation mask consolidation instruction is to indicate a source operation mask register, of the plurality of operation mask registers, and a destination storage location. The source operation mask register is to include a source operation mask that is to include a plurality of masked elements that are to be disposed within a plurality of unmasked elements. An execution unit is coupled with the decode unit. The execution unit, in response to the operation mask consolidation instruction, is to store a consolidated operation mask in the destination storage location. The consolidated operation mask is to include the unmasked elements from the source operation mask consolidated together. Other apparatus, methods, systems, and instructions are also disclosed.Type: ApplicationFiled: March 15, 2013Publication date: September 18, 2014Inventor: Ashish Jha
-
Publication number: 20140281397Abstract: Fusible instructions and logic provide OR-test and AND-test functionality on multiple test sources. Some embodiments include a processor decode stage to decode a test instruction for execution, the instruction specifying first, second and third source data operands, and an operation type. Execution units, responsive to the decoded test instruction, perform one logical operation, according to the specified operation type, between data from the first and second source data operands, and perform a second logical operation between the data from the third source data operand and the result of the first logical operation to set a condition flag. Some embodiments generate the test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the test instruction through a just-in-time compiler. Some embodiments also fuse the test instruction with a subsequent conditional branch instruction, and perform a branch according to how the condition flag is set.Type: ApplicationFiled: March 15, 2013Publication date: September 18, 2014Inventors: MAXIM LOKTYUKHIN, ROBERT VALENTINE, JULIAN C. HORN, MARK J. CHARNEY
-
Publication number: 20140281389Abstract: Methods and apparatus are disclosed for fusing instructions to provide OR-test and AND-test functionality on multiple test sources. Some embodiments include fetching instructions, said instructions including a first instruction specifying a first operand destination, a second instruction specifying a second operand source, and a third instruction specifying a branch condition. A portion of the plurality of instructions are fused into a single micro-operation, the portion including both the first and second instructions if said first operand destination and said second operand source are the same, and said branch condition is dependent upon the second instruction. Some embodiments generate a novel test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the novel test instruction through a just-in-time compiler.Type: ApplicationFiled: March 15, 2013Publication date: September 18, 2014Inventors: MAXIM LOKTYUKHIN, ROBERT VALENTINE, JULIAN C. HORN, MARK J. CHARNEY
-
Publication number: 20140281394Abstract: An apparatus and method for executing call branch and return branch instructions in a processor by utilizing a link register stack. The processor includes a branch counter that is initialized to zero, and is set to zero each time the processor decodes a link register manipulating instruction other than a call branch instruction. The branch counter is incremented by one each time a call branch instruction is decoded and an address is pushed onto the link register stack. In response to decoding a return branch instruction and provided the branch counter is not zero, a target address for the decoded return branch instruction is popped off the link register stack, the branch counter is decremented, and there is no need to check the target address for correctness.Type: ApplicationFiled: March 15, 2013Publication date: September 18, 2014Applicant: QUALCOMM INCORPORATEDInventors: Rodney Wayne Smith, Jeffery M. Schottmiller, Michael Scott McIlvaine, Brian Michael Stempel, Melinda J. Brown, Daren Eugene Streett
-
Publication number: 20140281398Abstract: A processor of an aspect includes decode logic to receive a first instruction and to determine that the first instruction is to be emulated. The processor also includes emulation mode aware post-decode instruction processor logic coupled with the decode logic. The emulation mode aware post-decode instruction processor logic is to process one or more control signals decoded from an instruction. The instruction is one of a set of one or more instructions used to emulate the first instruction. The one or more control signals are to be processed differently by the emulation mode aware post-decode instruction processor logic when in an emulation mode than when not in the emulation mode. Other apparatus are also disclosed as well as methods and systems.Type: ApplicationFiled: March 16, 2013Publication date: September 18, 2014Inventors: WILLIAM C. RASH, Martin G. Dixon, Yazmin A. Santiago
-
Publication number: 20140281391Abstract: A processor to a store constant value (immediate or literal) in a cache upon decoding a move immediate instruction in which the immediate is to be moved (copied or written) to an architected register. The constant value is stored in an entry in the cache. Each entry in the cache includes a field to indicate whether its stored constant value is valid, and a field to associate the entry with an architected register. Once a constant value is stored in the cache, it is immediately available for forwarding to a processor pipeline where a decoded instruction may need the constant value as an operand.Type: ApplicationFiled: March 14, 2013Publication date: September 18, 2014Applicant: QUALCOMM INCORPORATEDInventors: James Norris Dieffenderfer, Michael William Morrow, Rodney Wayne Smith, Jeffery M. Schottmiller, Daniel S. Higdon, Michael Scott McIlvaine, Brian Michael Stempel, Kulin N. Kothari
-
Publication number: 20140281395Abstract: Systems, methods, and apparatuses for calculating a square of a data value of a first source operand, a square of a data value of a second source operand, and a multiplication of the data of the first and second operands only using one multiplication are described.Type: ApplicationFiled: March 15, 2013Publication date: September 18, 2014Inventors: Ilya Albrekht, Elmoustapha Ould-Ahmed-Vall
-
Publication number: 20140281393Abstract: Out-of-order CPUs, devices and methods diminish the time penalty from stalling the pipe to rebuild a rename table, such as due to a misprediction. A microprocessor can include a pipe that has a decoder, a dispatcher, and at least one execution unit. A rename table stores rename data, and a check-point table (“CPT”) stores rename data received from the dispatcher. A Re-Order Buffer (“ROB”) stores ROB data, and has a static mapping relationship with the CPT. If the rename table is flushed, such as due to a misprediction, the rename table is rebuilt at least in part by concurrent copying of rename data stored in the CPT, in coordination with walking the ROB.Type: ApplicationFiled: March 14, 2013Publication date: September 18, 2014Inventors: Ravi Iyengar, Prarthna Santhanakrishnan
-
Publication number: 20140281392Abstract: The disclosure provides a micro-processing system operable in a hardware decoder mode and in a translation mode. In the hardware decoder mode, the hardware decoder receives and decodes non-native ISA instructions into native instructions for execution in a processing pipeline. In the translation mode, native translations of non-native ISA instructions are executed in the processing pipeline without using the hardware decoder. The system includes a code portion profile stored in hardware that changes dynamically in response to use of the hardware decoder to execute portions of non-native ISA code. The code portion profile is then used to dynamically form new native translations executable in the translation mode.Type: ApplicationFiled: March 14, 2013Publication date: September 18, 2014Applicant: NVIDIA CORPORATIONInventors: Nathan Tuck, Alexander Klaiber, Ross Segelken, David Dunn, Ben Hertzberg, Rupert Brauch, Thomas Kistler, Guillermo J. Rozas, Madhu Swarna
-
Publication number: 20140258683Abstract: Instructions and logic provide vector horizontal compare functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read values from data fields of the specified size in the source operand, corresponding to the mask and compare the values for equality. In some embodiments, responsive to a detection of inequality, a trap may be taken. In some alternative embodiments, a flag may be set. In other alternative embodiments, a mask field may be set to a masked state for the corresponding unequal value(s). In some embodiments, responsive to all unmasked data fields of the source operand being equal to a particular value, that value may be broadcast to all data fields of the specified size in the destination operand.Type: ApplicationFiled: November 30, 2011Publication date: September 11, 2014Applicant: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Suleyman Sair, Kshitij A. Doshi
-
Patent number: 8826252Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which operates on data of lengths other than the limited number of vector supported data lengths. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector of length N, wherein the virtual interior loop includes one or more vector atomic memory operation (AMO) instructions, used to resolve false conflicts.Type: GrantFiled: June 12, 2009Date of Patent: September 2, 2014Assignee: Cray Inc.Inventor: Terry D. Greyzck
-
Publication number: 20140229713Abstract: A method and circuit arrangement tightly couple together decode logic associated with multiple types of execution units and having varying priorities to enable instructions that are decoded as valid instructions for multiple types of execution units to be forwarded to a highest priority type of execution unit among the multiple types of execution units. Among other benefits, when an auxiliary execution unit is coupled to a general purpose processing core with the decode logic for the auxiliary execution unit tightly coupled with the decode logic for the general purpose processing core, the auxiliary execution unit may be used to effectively overlay new functionality for an existing instruction that is normally executed by the general purpose processing core, e.g., to patch a design flaw in the general purpose processing core or to provide improved performance for specialized applications.Type: ApplicationFiled: March 11, 2013Publication date: August 14, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Publication number: 20140229712Abstract: A method, circuit arrangement, and program product for selectively predicating instructions in an instruction stream by determining a first register address from an instruction, determining a second register address based on a value stored at the first register address, and determining whether to predicate the instruction based at least in part on a value stored at the second register address. Predication logic may analyze the instruction to determine the first register address, analyze a register corresponding to the first register address to determine the second register address, and communicate a predication signal to an execution unit based at least in part on the value stored at the second register address.Type: ApplicationFiled: February 27, 2013Publication date: August 14, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Publication number: 20140229711Abstract: A method, circuit arrangement, and program product for selectively predicating instructions in an instruction stream by determining a first register address from an instruction, determining a second register address based on a value stored at the first register address, and determining whether to predicate the instruction based at least in part on a value stored at the second register address. Predication logic may analyze the instruction to determine the first register address, analyze a register corresponding to the first register address to determine the second register address, and communicate a predication signal to an execution unit based at least in part on the value stored at the second register address.Type: ApplicationFiled: February 13, 2013Publication date: August 14, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Publication number: 20140229714Abstract: A method and circuit arrangement utilize a register file of an execution unit as a local instruction loop buffer to enable suitable algorithms, such as DSP algorithms, to be fetched and executed directly within the execution unit, and often enabling other logic circuits utilized for other, general purpose workloads to either be powered down or freed up to handle other workloads.Type: ApplicationFiled: March 12, 2013Publication date: August 14, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Patent number: 8805098Abstract: A previously signaled reference picture set (RPS) corresponding to a current picture is indicated. A first flag for a picture in the previously signaled RPS is set if the picture is to be used as a reference picture for the current picture. A bitstream is sent.Type: GrantFiled: May 31, 2013Date of Patent: August 12, 2014Assignee: Sharp Laboratories of America, Inc.Inventor: Sachin G. Deshpande
-
Patent number: 8806178Abstract: A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss.Type: GrantFiled: August 14, 2013Date of Patent: August 12, 2014Assignee: International Business Machines CorporationInventors: Jane H. Bartik, Lisa C. Heller, Damian L. Osisek, Donald W. Schmidt, Patrick M. West, Jr., Phil C. Yeh
-
Publication number: 20140223142Abstract: A Very Long Instruction Word (VLIW) processor having an instruction set with a reduced size resulting in a small number of bits being necessary to specify registers. The VLIW processor includes a register file, and first through third operation units, and executes a very long instruction word. Further, the very long instruction word includes a register specifying field which specifies a least one of the registers in the register file and a plurality of instructions. The operand of each instruction includes bits src1, src2, and dst, which indicate whether or not the registers specified by the register specifying field are to be used as the source register and the destination register.Type: ApplicationFiled: April 8, 2014Publication date: August 7, 2014Applicant: Panasonic CorporationInventors: Takahiro KAGEYAMA, Hideshi NISHIDA, Takeshi TANAKA, Kouji NAKAJIMA
-
Patent number: 8799626Abstract: A segmental allocation method of expanding RISC processor register includes the steps of a) setting an instruction format of the RISC processor, the destination register field being set having 6 bits to correspond to 64 registers and at least one source register field having at least 4 bits to correspond to at least 16 registers; b) providing two solutions to the problem resulting from that the instruction format in the step a) goes beyond range under some circumstances; and c) setting a register segment allocation algorithm having the steps of c1) providing and grouping a plurality of pseudo registers; c2) prioritizing the pseudo registers in each of the groups; c3) combining the groups pursuant to the priorities thereof; and c4) locating the physical register of lowest computational cost.Type: GrantFiled: September 9, 2011Date of Patent: August 5, 2014Assignee: National Chung Cheng UniversityInventors: Rong-Guey Chang, Yuan-Shin Hwang, Chia-Hsien Su
-
Publication number: 20140215189Abstract: An apparatus and method includes execution circuitry including a wide operand execution unit configured to allow up to N bits of operand data to be processed during execution of a single instruction. Decoder circuitry decodes and generates, for each instruction, at least one control data block identifying an operation to be performed by the execution circuitry and at least two re-combineable control data blocks for the instruction. Issue queue control circuitry then allocates a slot in the issue queue for each of the at least two data blocks and up to M bits of associated operand data, and marks those allocated slots to identify that they contain re-combineable control data blocks. The issue queue control circuitry issues the combined block to said wide operand execution unit along with the operand data contained in each of the allocated slots for said at least two control data blocks.Type: ApplicationFiled: January 29, 2013Publication date: July 31, 2014Applicant: ARM LIMITEDInventors: Cedric Denis Robert AIRAUD, Luca SCALABRINO, Frederic Jean Denis ARSANTO, Guillaume SCHON, Frederic Claude Marie PIRY, Albin Pierick TONNERRE
-
Publication number: 20140215188Abstract: In an embodiment, a processor includes a multi-level dispatch circuit configured to supply operations for execution by multiple parallel execution pipelines. The multi-level dispatch circuit may include multiple dispatch buffers, each of which is coupled to multiple reservation stations. Each reservation station may be coupled to a respective execution pipeline and may be configured to schedule instruction operations (ops) for execution in the respective execution pipeline. The sets of reservation stations coupled to each dispatch buffer may be non-overlapping. Thus, if a given op is to be executed in a given execution pipeline, the op may be sent to the dispatch buffer which is coupled to the reservation station that provides ops to the given execution pipeline.Type: ApplicationFiled: January 25, 2013Publication date: July 31, 2014Applicant: Apple Inc.Inventors: John H. Mylius, Gerard R. Williams, III, Shyam Sundar Balasubramanian, Conrado Blasco-Allue
-
Patent number: 8789169Abstract: A control unit controls execution of an instruction according to a decode result of an instruction code. A GRA register stores an access attribute for each of the plurality of general-purpose registers. A mode storage unit stores modes for controlling an operation of a CPU. When the control unit makes a request for access to the general-purpose register, register access allowance determining circuit determines whether the access to the general-purpose register in question is to be allowed or not, depending on the access attribute stored in the GRA register and the mode stored in the mode storage unit. Therefore, the number of the general-purpose registers used corresponding to the mode can be changed, and efficiency of use of the general-purpose registers can be optimized.Type: GrantFiled: May 19, 2010Date of Patent: July 22, 2014Assignee: Renesas Electronics CorporationInventors: Sugako Otani, Hiroyuki Kondo
-
Publication number: 20140201498Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.Type: ApplicationFiled: September 26, 2011Publication date: July 17, 2014Applicant: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
-
Patent number: 8782377Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.Type: GrantFiled: May 22, 2012Date of Patent: July 15, 2014Assignee: Intel CorporationInventors: Julien Sebot, William W. Macy, Eric Debes, Huy V. Nguyen
-
Publication number: 20140195782Abstract: A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.Type: ApplicationFiled: March 30, 2012Publication date: July 10, 2014Inventors: Kirk S. Yap, Gilbert M. Wolrich, James D. Guilford, Vinodh Gopal, Erdinc Ozturk, Sean M. Gulley, Wajdi K. Feghali, Martin G. Dixon
-
Publication number: 20140189307Abstract: Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations.Type: ApplicationFiled: December 29, 2012Publication date: July 3, 2014Inventors: Robert Valentine, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Brett L. Toll