Instruction Decoding (e.g., By Microinstruction, Start Address Generator, Hardwired) Patents (Class 712/208)
-
Publication number: 20140189308Abstract: Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store an offset for a data element in a memory. A destination register has corresponding data fields, each of these data fields to store a variable second plurality of bits to store a conflict mask having a mask bit for each offset. Responsive to decoding a vector conflict instruction, execution units compare the offset in each data field with every less significant data field to determine if they hold a matching offset, and in corresponding conflict masks in the destination register, set any mask bits corresponding to a less significant data field with a matching offset. Vector address conflict detection can be used with variable sized elements and to generate conflict masks to resolve dependencies in gather-modify-scatter SIMD operations.Type: ApplicationFiled: December 29, 2012Publication date: July 3, 2014Inventors: Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Brett L. Toll, Mark J. Charney, Milind B. Girkar
-
Publication number: 20140189309Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.Type: ApplicationFiled: December 29, 2012Publication date: July 3, 2014Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
-
Publication number: 20140189287Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.Type: ApplicationFiled: December 27, 2012Publication date: July 3, 2014Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall
-
Publication number: 20140189306Abstract: An enhanced loop streaming detection mechanism is provided in a processor to reduce power consumption. The processor includes a decoder to decode instructions in a loop into micro-operations, and a loop streaming detector to detect the presence of the loop in the micro-operations. The processor also includes a loop characteristic tracker unit to identify hardware components downstream from the decoder that are not to be used by the micro-operations in the loop, and to disable the identified hardware components. The processor also includes execution circuitry to execute the micro-operations in the loop with the identified hardware components disabled.Type: ApplicationFiled: December 27, 2012Publication date: July 3, 2014Inventors: Matthew C. Merten, Justin M. Deinlein, Yury N. Ilin, Alexandre J. Farcy, Tong Li, Srikanth T. Srinivasan
-
Patent number: 8769247Abstract: Methods and apparatuses are provided for increased efficiency in a processor via early instruction completion. An apparatus is provided for increased efficiency in a processor via early instruction completion. The apparatus comprises an execution unit for processing instructions and determining whether a later issued instruction is ready for completion or an earlier issued instruction is ready for completion and a retire unit for retiring the later issued instruction when the later instruction is ready for completion or to retire the earlier instruction when later instruction is not ready for completion and the earlier issued instruction has a known good completion status. A method is provided for increased efficiency in a processor via early instruction completion. The method comprises completing an earlier issued instruction having a known good completion status ahead of a later issued instruction when the later issued instruction is not ready for completion.Type: GrantFiled: April 15, 2011Date of Patent: July 1, 2014Assignee: Advanced Micro Devices, Inc.Inventors: Michael D Estlick, Kevin Hurd, Jay Fleischman
-
Publication number: 20140181580Abstract: According to one embodiment, a processor includes an instruction decoder to decode an instruction to read a plurality of data elements from memory, the instruction having a first operand specifying a storage location, a second operand specifying a bitmask having one or more bits, each bit corresponding to one of the data elements, and a third operand specifying a memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the instruction, to read one or more data elements speculatively, based on the bitmask specified by the second operand, from a memory location based on the memory address indicated by the third operand, and to store the one or more data elements in the storage location indicated by the first operand.Type: ApplicationFiled: December 21, 2012Publication date: June 26, 2014Inventors: Jayashankar BHARADWAJ, Nalini VASUDEVAN, Victor W. LEE, Sara S. BAGHSORKHI, Albert HARTONO, Daehyun KIM
-
Publication number: 20140181476Abstract: A scheduler implementing a dependency matrix having restricted entries is disclosed. A processing device of the disclosure includes a decode unit to decode an instruction and a scheduler communicably coupled to the decode unit. In one embodiment, the scheduler is configured to receive the decoded instruction, determine that the decoded instruction qualifies for allocation as a restricted reservation station (RS) entry type in a dependency matrix maintained by the scheduler, identify RS entries in the dependency matrix that are free for allocation, allocate one of the identified free RS entries with information of the decoded instruction in the dependency matrix, and update a row of the dependency matrix corresponding to the claimed RS entry with source dependency information of the decoded instruction.Type: ApplicationFiled: December 21, 2012Publication date: June 26, 2014Inventors: Srikanth T. Srinivasan, Matthew C. Merten, Bambang Sutanto, Rahul R. Kulkarni, Justin M. Deinlein, James D. Hadley
-
Publication number: 20140181477Abstract: In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of execution cycles consumed in execution of the vector instruction when at least some of the individual data elements are not to be operated on by the vector instruction. Other embodiments are described and claimed.Type: ApplicationFiled: December 21, 2012Publication date: June 26, 2014Inventors: Aniruddha S. Vaidya, Anahita Shayesteh, Dong Hyuk Woo, Saikat Saharoy, Mani Azimi
-
Publication number: 20140176584Abstract: A data processing system is used to evaluate a data processing function by executing a sequence of program instructions including an intermediate value generating instruction Inst0 and an intermediate value consuming instruction Inst1. In dependence upon one or more input operands to the evaluation, an embedded opcode within the intermediate value passed between the intermediate value generating instruction and the intermediate value consuming instruction may be set to have a value indicating that a substitute instruction should be used in place of the intermediate value consuming instruction. The instructions may be floating point instructions, such as a floating point power instruction evaluating the data processing function ab.Type: ApplicationFiled: February 26, 2014Publication date: June 26, 2014Applicant: ARM LimitedInventor: Jorn NYSTAD
-
Publication number: 20140173256Abstract: A method of associating operation codes with instructions for execution in a processor includes the steps of assigning the operation codes to the instructions in a manner that allows a given instruction to have multiple assigned operation codes and selecting a particular one of the multiple assigned operation codes for use in executing a program containing the given instruction. The assigning step may be implemented in conjunction with design of the processor, and may further comprise the steps of determining frequency of occurrence of adjacent pairs of instructions in one or more programs likely to be run on the processor, and assigning the operation codes to the instructions based at least in part on the determined frequency of occurrence of the adjacent pairs of instructions. The selecting step may be implemented in conjunction with code generation for the program containing the given instruction, for example, in a code assembler.Type: ApplicationFiled: February 21, 2014Publication date: June 19, 2014Applicant: LSI CorporationInventors: Prasad AVSS, Jacob Matthews
-
Publication number: 20140173255Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.Type: ApplicationFiled: December 18, 2012Publication date: June 19, 2014Inventors: Hariharan L. Thantry, Mani Azimi
-
Patent number: 8751823Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for obfuscating branches in computer code. A compiler or a post-compilation tool can obfuscate branches by receiving source code, and compiling the source code to yield computer-executable code. The compiler identifies branches in the computer-executable code, and determines a return address and a destination value for each branch. Then, based on the return address and the destination value for each branch, the compiler constructs a binary tree with nodes and leaf nodes, each node storing a balanced value, and each leaf node storing a destination value. The non-leaf nodes are arranged such that searching the binary tree by return address leads to a corresponding destination value. Then the compiler inserts the binary tree in the computer-executable code and replaces each branch with instructions in the computer-executable code for performing a branching operation based on the binary tree.Type: GrantFiled: August 1, 2011Date of Patent: June 10, 2014Assignee: Apple Inc.Inventors: Gideon M. Myles, Julien Lerouge, Jon McLachlan, Ganna Zaks, Augustin J. Farrugia
-
Publication number: 20140156972Abstract: In an embodiment, the present invention includes a processor having an execution logic to execute instructions and a control transfer termination (CTT) logic coupled to the execution logic. This logic is to cause a CTT fault to be raised if a target instruction of a control transfer instruction is not a CTT instruction. Other embodiments are described and claimed.Type: ApplicationFiled: November 30, 2012Publication date: June 5, 2014Inventors: Vedyvas Shanbhogue, Jason W. Brandt, Uday R. Savagaonkar, Ravi L. Sahita
-
Patent number: 8745419Abstract: A processor includes a device providing a throttling power output signal. The throttling power output signal is used to determine when to logically throttle the power consumed by the processor. At least one core in the processor includes a pipeline having a decode pipe; and a logical power throttling unit coupled to the device to receive the output signal, and coupled to the decode pipe. Following the logical power throttling unit receiving the power throttling output signal satisfying a predetermined criterion, the logical power throttling unit causes the decode pipe to reduce an average number of instructions decoded per processor cycle without physically changing the processor cycle or any processor supply voltages.Type: GrantFiled: June 21, 2012Date of Patent: June 3, 2014Assignee: Oracle America, Inc.Inventors: Shailender Chaudhry, Quinn A. Jacobson, Marc Tremblay
-
Publication number: 20140149718Abstract: Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core.Type: ApplicationFiled: November 28, 2012Publication date: May 29, 2014Inventors: Christopher J. Hughes, Changkyu Kim, Daehyun Kim, Victor W. Lee, Jong Soo Park
-
Patent number: 8707013Abstract: In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. The register set includes a plurality of legacy predicate registers. Separate from the legacy predicate registers, a plurality of on-demand predicate registers are selectively signaled without changing the opcode space for the DSP.Type: GrantFiled: July 13, 2010Date of Patent: April 22, 2014Assignee: Texas Instruments IncorporatedInventors: Jagadeesh Sankaran, Joseph R. Zbiciak, Steven D. Krueger
-
Patent number: 8700886Abstract: A processor configured to operate with multiple operation codes for each of a plurality of instructions comprises memory circuitry and processing circuitry coupled to the memory circuitry. The processing circuitry is configured to decode a first operation code to produce a given one of the instructions and to decode a second operation code different than the first operation code to also produce the given instruction. Thus, the same instruction is produced for execution by the processing circuitry regardless of whether the first operation code or the second operation code is decoded. The assignment of multiple operation codes to a given instruction may occur in conjunction with the design of the processor, and dynamic selection of a particular one of those operation codes may be performed in conjunction with assembly of code for execution by the processor.Type: GrantFiled: May 30, 2007Date of Patent: April 15, 2014Assignee: Agere Systems LLCInventors: Prasad Avss, Jacob Mathews
-
Publication number: 20140101414Abstract: In response to determining an operation is a dependent operation, a mapper of a processor determines the source registers of the operation from which the dependent operation depends. The mapper translates the dependent operation to a new operation that uses as its source operands at least one of the determined source registers and a source register of the dependent operation. The new operation is independent of other pending operations and therefore can be executed without waiting for execution of other operations, thus reducing execution latency.Type: ApplicationFiled: October 9, 2012Publication date: April 10, 2014Applicant: Advanced Micro Devices, Inc.Inventor: Kevin A. Hurd
-
Publication number: 20140095831Abstract: An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.Type: ApplicationFiled: September 28, 2012Publication date: April 3, 2014Inventors: Edward T. Grochowski, Dennis R. Bradford, George Z. Chrysos, Andrew T. Forsyth, Michael D. Upton, Lisa K. Wu
-
Publication number: 20140095834Abstract: Instructions and logic provide extended vector suffix comparisons for Boyer-Moore searches. Some embodiments, responsive to an instruction specifying: a pattern source operand and a target source operand, compare each of m data elements of the pattern operand with each data element of the target operand. A first and second equal ordered aggregation operation are performed from the comparisons according to the m data elements of the pattern source operand. A result of the first and second aggregation operations indicating whether or not a possible match exists between the m data elements of the pattern source operand and d data element positions relative to data elements of the target source operand is stored. Ordering of the data elements of the pattern and the target operands may be reversed for the second aggregation operation, and d may be a sum of m?1 and the quantity of target operand elements in some embodiments.Type: ApplicationFiled: September 30, 2012Publication date: April 3, 2014Inventor: Shih J. Kuo
-
Publication number: 20140089638Abstract: Various techniques for processing instructions that specify multiple destinations. A first portion of a processor pipeline is configured to split a multi-destination instruction into a plurality of single-destination operations. A second portion of the pipeline is configured to process the plurality of single-destination operations. A third portion of the pipeline is configured to merge the plurality of single-destination operations into one or more multi-destination operations. The one or more multi-destination operations may be performed. The first portion of the pipeline may include a decode unit. The second portion of the pipeline may include a map unit, which may in turn include circuitry configured to maintain a list of free architectural registers and a mapping table that maps physical registers to architectural registers. The third portion of the pipeline may comprise a dispatch unit. In some embodiments, this may provide certain advantages such as reduced area and/or power consumption.Type: ApplicationFiled: September 26, 2012Publication date: March 27, 2014Applicant: APPLE INC.Inventors: John H. Mylius, Gerard R. Williams III, James B. Keller, Fang Liu, Shyam Sundar
-
Publication number: 20140082328Abstract: According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.Type: ApplicationFiled: September 14, 2012Publication date: March 20, 2014Applicant: INTEL CORPORATIONInventors: Vinodh Gopal, Erdinc Ozturk, James D. Guilford, Gilbert M. Wolrich
-
Publication number: 20140082329Abstract: Trustworthy systems require that code be validated as genuine. Most systems implement this requirement prior to execution by matching a cryptographic hash of the binary file against a reference hash value, leaving the code vulnerable to run time compromises, such as code injection, return and jump-oriented programming, and illegal linking of the code to compromised library functions. The Run-time Execution Validator (REV) validates, as the program executes, the control flow path and instructions executed along the control flow path. REV uses a signature cache integrated into the processor pipeline to perform live validation of executions, at basic block boundaries, and ensures that changes to the program state are not made by the instructions within a basic block until the control flow path into the basic block and the instructions within the basic block are both validated.Type: ApplicationFiled: September 16, 2013Publication date: March 20, 2014Applicant: The Research Foundation of State University of New YorkInventor: Kanad Ghose
-
Publication number: 20140068229Abstract: Coding circuitry comprises at least an encoder configured to encode an instruction address for transmission to a decoder. The encoder is operative to identify the instruction address as belonging to a particular one of a plurality of groups of instruction addresses associated with respective distinct program constructs, and to encode the instruction address based on the identified group. The decoder is operative to identify the encoded instruction address as belonging to the particular one of a plurality of groups of instruction addresses associated with respective distinct program constructs, and to decode the encoded instruction address based on the identified group. The coding circuitry may be implemented as part of an integrated circuit or other processing device that includes associated processor and memory elements. In such an arrangement, the processor may generate the instruction address for delivery over a bus to the memory.Type: ApplicationFiled: August 28, 2012Publication date: March 6, 2014Applicant: LSI CorporationInventors: Prakash Krishnamoorthy, Ramesh C. Tekumalla, Parag Madhani
-
Publication number: 20140059326Abstract: A calculation-processing-device includes: a decoder unit including, a first-counter to increment a first-count-value and to decrement the-first-count-value, and a second-counter configured to increment a second-count-value and to decrement the second-count-value; a first-instruction-executing-unit to execute an instruction of the first-class; a second-instruction-executing-unit to execute an instruction of the-second class; a first-instruction holding unit including a plurality of first-entries, to input the instruction of the first-class held in one of the plurality of first-entries into the first-instruction-executing-unit; a second-instruction-holding-unit including a plurality of second-entries, to input the instruction of the second-class held in one of the plurality of second-entries into the second-instruction-executing-unit; and first-control-unit to output the second-release-notification, and to change the output timing of the second-release-notification when a predetermined relationship is establishType: ApplicationFiled: June 19, 2013Publication date: February 27, 2014Inventors: Sota SAKASHITA, Yasunobu Akizuki, Toshio Yoshida
-
Publication number: 20140052963Abstract: A technique to perform three-source instructions. At least one embodiment of the invention relates to converting a three-source instruction into at least two instructions identifying no more than two source values.Type: ApplicationFiled: October 25, 2013Publication date: February 20, 2014Inventors: Avinash Sodani, Stephan Jourdan, Alexandre Farcy, Per Hammarlund
-
Publication number: 20140052962Abstract: A processing system includes a microprocessor, a hardware decoder arranged within the microprocessor, and a translator operatively coupled to the microprocessor. The hardware decoder is configured to decode instruction code non-native to the microprocessor for execution in the microprocessor. The translator is configured to form a translation of the instruction code in an instruction set native to the microprocessor and to connect a branch instruction in the translation to a chaining stub. The chaining stub is configured to selectively cause additional instruction code at a target address of the branch instruction to be received in the hardware decoder without causing the processing system to search for a translation of additional instruction code at the target address.Type: ApplicationFiled: August 15, 2012Publication date: February 20, 2014Applicant: NVIDIA CORPORATIONInventors: Ben Hertzberg, Nathan Tuck
-
Patent number: 8656376Abstract: A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.Type: GrantFiled: September 1, 2011Date of Patent: February 18, 2014Assignee: National Tsing Hua UniversityInventors: Jenq Kuen Lee, Chi Bang Kuan
-
Patent number: 8650386Abstract: A data processor includes a first register file including registers, a second register file including registers, a number of which is larger than that of the registers of the first register file, an instruction decoder and an operation unit. The instruction decoder decodes an instruction described in first and second instruction formats. The first instruction format includes a first register-addressing field for designating the first register file. The second instruction format includes a second register-addressing field for designating the second register file, a size of which is larger than that of the first register-addressing field. The operation unit executes an instruction described in the first and second instruction formats using operand data stored in the first and second register files, respectively, based on the instruction decoder, and executes operations in parallel, a number of which is determined by a certain field included in the second instruction format.Type: GrantFiled: April 12, 2013Date of Patent: February 11, 2014Assignee: Panasonic CorporationInventors: Takeshi Kishida, Masaitsu Nakajima
-
Publication number: 20140040597Abstract: Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.Type: ApplicationFiled: August 8, 2012Publication date: February 6, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
-
Publication number: 20140040598Abstract: Embodiments relate to vector processing in an active memory device. An aspect includes a system for vector processing in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Execution of the sub-instructions is repeated in parallel for multiple iterations, by the processing element, based on the iteration count. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.Type: ApplicationFiled: August 8, 2012Publication date: February 6, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair, Daniel A. Prener
-
Publication number: 20140040601Abstract: Embodiments relate to vector processor predication in an active memory device. An aspect includes a method for vector processor predication in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.Type: ApplicationFiled: August 3, 2012Publication date: February 6, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
-
Publication number: 20140040602Abstract: A storage method, a memory and a storage system that have an accumulated write feature are provided in which the OR and AND operation are shifted from CPU/ALU (controller) to the memory, and the frequency for switching data transmission lines between read and write instructions can be reduced. In the memory, the interface unit includes a write arithmetic instruction interface, a write instruction interface, and an address instruction interface; the instruction/address decoder is configured to decode a write arithmetic instruction, a write instruction and an address instruction; and the pFET has a higher driving capability than the data switches, and the nFET has a lower driving capability than the data switches. The storage method, memory and storage system can reduce work load of CPU/ALU, and enable continuous data writing to the memory.Type: ApplicationFiled: December 30, 2011Publication date: February 6, 2014Applicant: Xi'an Sinochip Semiconductors Co., Ltd.Inventor: Hoffmann Jochen
-
Publication number: 20140040595Abstract: A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing free list and architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iiii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration.Type: ApplicationFiled: August 1, 2012Publication date: February 6, 2014Applicant: FREESCALE SEMICONDUCTOR, INC.Inventor: Thang M. Tran
-
Patent number: 8639945Abstract: A microprocessor includes a storage element that stores decryption key data and a fetch unit that fetches and decrypts program instructions using a value of the decryption key data stored in the storage element. The fetch unit fetches an instance of a branch and switch key instruction and decrypts it using a first value of the decryption key data stored in the storage element. If the branch is taken, the microprocessor loads the storage element with a second value of the decryption key data for subsequent use by the fetch unit to decrypt an instruction fetched at a target address specified by the branch and switch key instruction. If the branch is not taken, the microprocessor retains the first value of the decryption key data in the storage element for subsequent use by the fetch unit to decrypt an instruction sequentially following the branch and switch key instruction.Type: GrantFiled: April 21, 2011Date of Patent: January 28, 2014Assignee: VIA Technologies, Inc.Inventors: G. Glenn Henry, Terry Parks, Brent Bean, Thomas A. Crispin
-
Publication number: 20140025933Abstract: A method for reducing a number of operations replayed in a processor includes decoding an operation to determine a memory address and a command in the operation. If data is not in a way predictor based on the memory address, a suppress wakeup signal is sent to an operation scheduler, and the operation scheduler suppresses waking up other operations that are dependent on the data.Type: ApplicationFiled: July 17, 2012Publication date: January 23, 2014Applicant: Advanced Micro Devices, Inc.Inventors: Ganesh Venkataramanan, Mike Butler, Krishnan V. Ramani
-
Patent number: 8635621Abstract: The invention relates to a method and apparatus for execution scheduling of a program thread of an application program and executing the scheduled program thread on a data processing system. The method includes: providing an application program thread priority to a thread execution scheduler; selecting for execution the program thread from a plurality of program threads inserted into the thread execution queue, wherein the program thread is selected for execution using a round-robin selection scheme, and wherein the round-robin selection scheme selects the program thread based on an execution priority associated with the program thread bit; placing the program thread in a data processing execution queue within the data processing system; and removing the program thread from the thread execution queue after a successful execution of the program thread by the data processing system.Type: GrantFiled: August 22, 2008Date of Patent: January 21, 2014Assignee: International Business Machines CorporationInventors: David Stephen Levitan, Jeffrey Richard Summers
-
Publication number: 20140019723Abstract: An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code for execution on the ASMP is analyzed and a determination is made as to whether to allow the program code, or a code segment thereof to execute on a first core natively or to use binary translation on the code and execute the translated code on a second core which consumes less power than the first core during execution.Type: ApplicationFiled: December 28, 2011Publication date: January 16, 2014Inventors: Koichi Yamada, Ronny Ronen, Wei Li, Boris Ginzburg, Gadi Haber, Konstantin Levit-Gurevich, Esfir Natanzon, Alon Naveh, Eliezer Weissmann, Michael Mishaeli
-
Publication number: 20140013084Abstract: The disclosure provides a device for implementing address buffer management of a processor, including: an assembler configured to perform operations to obtain intermediate values when the assembler encodes a set instruction for an address automatic-increment value and boundary values, and to encapsulate the intermediate values into the set instruction for the address automatic-increment value and boundary values; and a processor configured to determine, according to the intermediate values, whether to perform the address automatic-increment operation or the address automatic-decrement operation, so as to achieve the address buffer management.Type: ApplicationFiled: August 24, 2011Publication date: January 9, 2014Applicant: ZTE CorporationInventors: Lihuang Li, Chunyu Tian, Hui Ren
-
Publication number: 20140013083Abstract: A cache coprocessing unit in a computing system includes a cache array to store data, a hardware decode unit to decode instructions that are offloaded from being executed by an execution cluster of the computing system to reduce load and store operations between the execution cluster and the cache coprocessing unit, and a set of one or more operation units to perform operations on the cache array according to the decoded instructions.Type: ApplicationFiled: December 30, 2011Publication date: January 9, 2014Inventor: Ashish Jha
-
Publication number: 20130346728Abstract: In one embodiment, the present invention includes an instruction decoder that can receive an incoming instruction and a path select signal and decode the incoming instruction into a first instruction code or a second instruction code responsive to the path select signal. The two different instruction codes, both representing the same incoming instruction may be used by an execution unit to perform an operation optimized for different data lengths. Other embodiments are described and claimed.Type: ApplicationFiled: August 28, 2013Publication date: December 26, 2013Inventors: Ohad Falik, Lihu Rappoport, Ron Gabor, Yulia Kurolap, Michael Mishaeli
-
Publication number: 20130339668Abstract: Embodiments of systems, apparatuses, and methods for performing delta decoding on packed data elements of a source and storing the results in packed data elements of a destination using a single vector packed delta decode instruction are described.Type: ApplicationFiled: December 28, 2011Publication date: December 19, 2013Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Tracy Garrett Drysdale
-
Publication number: 20130339667Abstract: A method of changing a value of associated with a logical address in a computing device. The method includes: receiving an instruction at an instruction decoder, the instruction including a target register expressed as a logical value; determining at an instruction decoder that a result of the instruction is to set the target register to a constant value, the target register being in a physical register file associated with an execution unit; and mapping, in a register mapper, the logical address to a location represented by a special register tag.Type: ApplicationFiled: March 13, 2013Publication date: December 19, 2013Applicant: International Business Machines CorporationInventors: Gregory W. Alexander, Brian D. Barrick, Fadi Y. Busaba, Chung-Lum K. Shum
-
Publication number: 20130339666Abstract: A method of changing a value of associated with a logical address in a computing device. The method includes: receiving an instruction at an instruction decoder, the instruction including a target register expressed as a logical value; determining at an instruction decoder that a result of the instruction is to set the target register to a constant value, the target register being in a physical register file associated with an execution unit; and mapping, in a register mapper, the logical address to a location represented by a special register tag.Type: ApplicationFiled: June 15, 2012Publication date: December 19, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Gregory W. Alexander, Brian D. Barrick, Fadi Y. Busaba, Bruce C. Giamei, Edward T. Malley, Chung-Lung K. Shum
-
Patent number: 8607209Abstract: A processor framework includes a compiler to add control information to an instruction sequence at compile time. The control information is added in the instruction sequence prior to a control-flow changing instruction. Microarchitecture is configured to use the control information at runtime to predict an outcome of the control-flow changing instruction prior to fetching the control-flow changing instruction.Type: GrantFiled: January 18, 2005Date of Patent: December 10, 2013Assignee: BlueRISC Inc.Inventors: Saurabh Chheda, Kristopher Carver, Raksit Ashok
-
Publication number: 20130326195Abstract: Preventing execution of parity-error-induced unpredictable instructions, and related processor systems, methods, and computer-readable media are disclosed. In this regard, a method for processing instructions in a central processing unit (CPU) is provided. The method comprises decoding an instruction comprising a plurality of bits, and generating a parity error indicator indicating whether a parity error exists in the plurality of bits prior to execution of the instruction. If the parity error indicator indicates that the parity error exists in the plurality of bits, one or more of the plurality of bits are modified to indicate a no execution operation (NOP), without effecting a roll back of a program counter of the CPU and without re-decoding the instruction. In this manner, the possibility of the parity error causing an inadvertent execution of an unpredictable instruction is reduced.Type: ApplicationFiled: March 7, 2013Publication date: December 5, 2013Applicant: QUALCOMM INCORPORATEDInventors: Michael Scott McIlvaine, James Norris Dieffenderfer, Brian Michael Stempel, Leslie Mark DeBruyne, Melinda J. Brown
-
Publication number: 20130326194Abstract: Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register.Type: ApplicationFiled: November 21, 2012Publication date: December 5, 2013Inventor: Gopalan Ramanujam
-
Publication number: 20130326196Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed unary value decoding using masks in response to a single vector packed unary decoding using masks instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described.Type: ApplicationFiled: December 23, 2011Publication date: December 5, 2013Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm
-
Publication number: 20130326192Abstract: Embodiments of systems, apparatuses, and methods for performing a mask broadcast instruction in a computer processor are described. In some embodiments, the execution of a mask broadcast instruction causes a broadcast of a data element of the source operand to a destination register of the destination operand according to the broadcast size.Type: ApplicationFiled: December 22, 2011Publication date: December 5, 2013Inventors: Elmoustapha Ould-Ahmed-Vall, Milind Baburao Girkar, Robert C. Valentine, Suleyman Sair, Jesus Corbal San Adrian
-
Publication number: 20130311753Abstract: This invention constitutes a method and apparatus for enabling parallel computations of intermediate operations which are generic in many algorithms in given applications and also contain most of the computationally intensive operations. The method includes designing a set of intermediate level functions suitable for predefined application, obtaining instructions corresponding to intermediate level operations from a processor, computing the addresses of the operands and the results, performing computations involved in multiple intermediate level operations. In an exemplary embodiment the apparatus consists of a local data address generator that computes the addresses of a plurality of operands and results, a programmable computational unit that performs parallels computations of the intermediate level operations and a local memory interface that is interfaced to local memory organized in multiple blocks.Type: ApplicationFiled: August 28, 2012Publication date: November 21, 2013Inventor: VENU KANDADAI