Instruction Decoding (e.g., By Microinstruction, Start Address Generator, Hardwired) Patents (Class 712/208)

Decoding instruction to accommodate plural instruction interpretations (e.g., different dialects, languages, emulation, etc.) (Class 712/209)

Decoding instruction to accommodate variable length instruction or operand (Class 712/210)

Decoding instruction to generate an address of a microroutine (Class 712/211)

Decoding by plural parallel decoders (Class 712/212)

Predecoding of instruction component (Class 712/213)

METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE VECTOR ADDRESS CONFLICT DETECTION FUNCTIONALITY

Publication number: 20140189308

Abstract: Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store an offset for a data element in a memory. A destination register has corresponding data fields, each of these data fields to store a variable second plurality of bits to store a conflict mask having a mask bit for each offset. Responsive to decoding a vector conflict instruction, execution units compare the offset in each data field with every less significant data field to determine if they hold a matching offset, and in corresponding conflict masks in the destination register, set any mask bits corresponding to a less significant data field with a matching offset. Vector address conflict detection can be used with variable sized elements and to generate conflict masks to resolve dependencies in gather-modify-scatter SIMD operations.
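
The comparison rule in this abstract (each data field is checked against every less significant field, and matching positions are flagged in that field's mask) can be illustrated with a small scalar model. This is only a sketch of the described behavior, not the claimed hardware; the function name `conflict_masks` and the use of a Python list as the "register" are assumptions made for illustration.

```python
def conflict_masks(offsets):
    """Return one conflict mask per element.

    Bit j of masks[i] is set when element j (j < i, i.e. a less
    significant data field) holds the same offset as element i.
    """
    masks = []
    for i, current in enumerate(offsets):
        mask = 0
        for j in range(i):  # only less significant fields are compared
            if offsets[j] == current:
                mask |= 1 << j
        masks.append(mask)
    return masks


if __name__ == "__main__":
    # Elements 0 and 2 share offset 8; elements 1 and 3 share offset 4.
    print(conflict_masks([8, 4, 8, 4]))  # [0, 0, 1, 2]
```

An element whose mask is zero has no earlier duplicate, so in a gather-modify-scatter loop it could be processed in the first pass without a dependency.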

Type: Application

Filed: December 29, 2012

Publication date: July 3, 2014

Inventors: Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Brett L. Toll, Mark J. Charney, Milind B. Girkar

METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE PERMUTE CONTROLS WITH LEADING ZERO COUNT FUNCTIONALITY

Publication number: 20140189309

Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
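
As a rough illustration of the per-field leading zero count described here, the following sketch counts the most significant contiguous zero bits of each element. The function name `vector_lzcnt` and the default 32-bit field width are assumptions; the abstract does not fix a width.

```python
def vector_lzcnt(elements, width=32):
    """Count leading zero bits in each element of a vector.

    Each element is treated as an unsigned integer of `width` bits
    (the width is an assumption for this sketch).
    """
    counts = []
    for value in elements:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value!r} does not fit in {width} bits")
        # bit_length() is the position of the highest set bit, so the
        # remaining high bits are the leading zeros.
        counts.append(width - value.bit_length())
    return counts


if __name__ == "__main__":
    print(vector_lzcnt([0, 1, 0x80000000, 0x00FF0000]))  # [32, 31, 0, 8]
```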

Type: Application

Filed: December 29, 2012

Publication date: July 3, 2014

Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine

COLLAPSING OF MULTIPLE NESTED LOOPS, METHODS AND INSTRUCTIONS

Publication number: 20140189287

Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.
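
One way to picture a multi-dimensional loop counter update is as an odometer: the innermost counter advances by some amount and any overflow carries into the next dimension. The sketch below models that idea in software and uses it to walk a two-level loop nest with a single loop. The names `update_counters` and `iterate_collapsed`, the innermost-first ordering, and the "done" flag are assumptions for illustration, not details taken from the abstract.

```python
def update_counters(counters, bounds, amount=1):
    """Advance a multi-dimensional loop counter by `amount` iterations.

    counters[0] is the innermost dimension (an assumption of this sketch).
    Returns (new_counters, done), where done is True once the update
    carries out of the outermost dimension.
    """
    result = list(counters)
    carry = amount
    for dim, bound in enumerate(bounds):
        total = result[dim] + carry
        result[dim] = total % bound
        carry = total // bound
    return result, carry > 0


def iterate_collapsed(bounds):
    """Visit every point of a loop nest using one flat loop."""
    counters = [0] * len(bounds)
    visited = []
    done = False
    while not done:
        visited.append(tuple(counters))
        counters, done = update_counters(counters, bounds)
    return visited


if __name__ == "__main__":
    # Inner loop of 3 iterations nested in an outer loop of 2.
    print(iterate_collapsed([3, 2]))
```

The sketch assumes every bound is at least 1.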

Type: Application

Filed: December 27, 2012

Publication date: July 3, 2014

Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall

ENHANCED LOOP STREAMING DETECTOR TO DRIVE LOGIC OPTIMIZATION

Publication number: 20140189306

Abstract: An enhanced loop streaming detection mechanism is provided in a processor to reduce power consumption. The processor includes a decoder to decode instructions in a loop into micro-operations, and a loop streaming detector to detect the presence of the loop in the micro-operations. The processor also includes a loop characteristic tracker unit to identify hardware components downstream from the decoder that are not to be used by the micro-operations in the loop, and to disable the identified hardware components. The processor also includes execution circuitry to execute the micro-operations in the loop with the identified hardware components disabled.

Type: Application

Filed: December 27, 2012

Publication date: July 3, 2014

Inventors: Matthew C. Merten, Justin M. Deinlein, Yury N. Ilin, Alexandre J. Farcy, Tong Li, Srikanth T. Srinivasan

Processor with increased efficiency via early instruction completion

Patent number: 8769247

Abstract: Methods and apparatuses are provided for increased efficiency in a processor via early instruction completion. An apparatus is provided for increased efficiency in a processor via early instruction completion. The apparatus comprises an execution unit for processing instructions and determining whether a later issued instruction is ready for completion or an earlier issued instruction is ready for completion and a retire unit for retiring the later issued instruction when the later instruction is ready for completion or to retire the earlier instruction when later instruction is not ready for completion and the earlier issued instruction has a known good completion status. A method is provided for increased efficiency in a processor via early instruction completion. The method comprises completing an earlier issued instruction having a known good completion status ahead of a later issued instruction when the later issued instruction is not ready for completion.

Type: Grant

Filed: April 15, 2011

Date of Patent: July 1, 2014

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael D Estlick, Kevin Hurd, Jay Fleischman

SPECULATIVE NON-FAULTING LOADS AND GATHERS

Publication number: 20140181580

Abstract: According to one embodiment, a processor includes an instruction decoder to decode an instruction to read a plurality of data elements from memory, the instruction having a first operand specifying a storage location, a second operand specifying a bitmask having one or more bits, each bit corresponding to one of the data elements, and a third operand specifying a memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the instruction, to read one or more data elements speculatively, based on the bitmask specified by the second operand, from a memory location based on the memory address indicated by the third operand, and to store the one or more data elements in the storage location indicated by the first operand.
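
The three operands described here (a destination, a per-element bitmask, and a memory address) can be modeled with a short masked-read sketch. The abstract does not say what happens when a speculative read would fault, so the handling below, which skips the element and reports it in a returned completion mask, is purely an assumption, as are the function name `masked_gather` and the list standing in for memory.

```python
def masked_gather(memory, base, mask, count):
    """Read element i from memory[base + i] when bit i of `mask` is set.

    Returns (dest, completed), where bit i of `completed` is set for each
    element that was actually read.
    """
    dest = [0] * count
    completed = 0
    for i in range(count):
        if not (mask >> i) & 1:
            continue  # element not requested by the bitmask
        address = base + i
        if 0 <= address < len(memory):
            dest[i] = memory[address]
            completed |= 1 << i
        # ASSUMPTION: an out-of-range access is suppressed rather than
        # raised, to mimic a "non-faulting" speculative read.
    return dest, completed


if __name__ == "__main__":
    memory = [10, 20, 30, 40]
    print(masked_gather(memory, base=2, mask=0b1011, count=4))
    # ([30, 40, 0, 0], 3)
```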

Type: Application

Filed: December 21, 2012

Publication date: June 26, 2014

Inventors: Jayashankar BHARADWAJ, Nalini VASUDEVAN, Victor W. LEE, Sara S. BAGHSORKHI, Albert HARTONO, Daehyun KIM

Scheduler Implementing Dependency Matrix Having Restricted Entries

Publication number: 20140181476

Abstract: A scheduler implementing a dependency matrix having restricted entries is disclosed. A processing device of the disclosure includes a decode unit to decode an instruction and a scheduler communicably coupled to the decode unit. In one embodiment, the scheduler is configured to receive the decoded instruction, determine that the decoded instruction qualifies for allocation as a restricted reservation station (RS) entry type in a dependency matrix maintained by the scheduler, identify RS entries in the dependency matrix that are free for allocation, allocate one of the identified free RS entries with information of the decoded instruction in the dependency matrix, and update a row of the dependency matrix corresponding to the claimed RS entry with source dependency information of the decoded instruction.

Type: Application

Filed: December 21, 2012

Publication date: June 26, 2014

Inventors: Srikanth T. Srinivasan, Matthew C. Merten, Bambang Sutanto, Rahul R. Kulkarni, Justin M. Deinlein, James D. Hadley

Compressing Execution Cycles For Divergent Execution In A Single Instruction Multiple Data (SIMD) Processor

Publication number: 20140181477

Abstract: In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of execution cycles consumed in execution of the vector instruction when at least some of the individual data elements are not to be operated on by the vector instruction. Other embodiments are described and claimed.
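
A simple way to see how cycles might be saved when the execution unit is narrower than the vector is to count only the unit-width chunks that contain at least one active element. The abstract does not describe the compression mechanism, so skipping wholly inactive chunks is an assumption of this sketch, as are the name `execution_cycles` and the default widths of 16 and 4.

```python
def execution_cycles(active_mask, vector_width=16, unit_width=4):
    """Count passes needed through a narrow unit for one vector instruction.

    Bit i of `active_mask` marks element i as active. A chunk of
    `unit_width` elements with no active bits is skipped (assumed policy).
    """
    chunk_bits = (1 << unit_width) - 1
    cycles = 0
    for start in range(0, vector_width, unit_width):
        if (active_mask >> start) & chunk_bits:
            cycles += 1
    return cycles


if __name__ == "__main__":
    print(execution_cycles(0xFFFF))  # 4: every chunk has active elements
    print(execution_cycles(0x0101))  # 2: only chunks 0 and 2 are active
```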

Type: Application

Filed: December 21, 2012

Publication date: June 26, 2014

Inventors: Aniruddha S. Vaidya, Anahita Shayesteh, Dong Hyuk Woo, Saikat Saharoy, Mani Azimi

REDUCING ENERGY AND INCREASING SPEED BY AN INSTRUCTION SUBSTITUTING SUBSEQUENT INSTRUCTIONS WITH SPECIFIC FUNCTION INSTRUCTION

Publication number: 20140176584

Abstract: A data processing system is used to evaluate a data processing function by executing a sequence of program instructions including an intermediate value generating instruction Inst0 and an intermediate value consuming instruction Inst1. In dependence upon one or more input operands to the evaluation, an embedded opcode within the intermediate value passed between the intermediate value generating instruction and the intermediate value consuming instruction may be set to have a value indicating that a substitute instruction should be used in place of the intermediate value consuming instruction. The instructions may be floating point instructions, such as a floating point power instruction evaluating the data processing function a^b.

Type: Application

Filed: February 26, 2014

Publication date: June 26, 2014

Applicant: ARM Limited

Inventor: Jorn NYSTAD

PROCESSOR CONFIGURED FOR OPERATION WITH MULTIPLE OPERATION CODES PER INSTRUCTION

Publication number: 20140173256

Abstract: A method of associating operation codes with instructions for execution in a processor includes the steps of assigning the operation codes to the instructions in a manner that allows a given instruction to have multiple assigned operation codes and selecting a particular one of the multiple assigned operation codes for use in executing a program containing the given instruction. The assigning step may be implemented in conjunction with design of the processor, and may further comprise the steps of determining frequency of occurrence of adjacent pairs of instructions in one or more programs likely to be run on the processor, and assigning the operation codes to the instructions based at least in part on the determined frequency of occurrence of the adjacent pairs of instructions. The selecting step may be implemented in conjunction with code generation for the program containing the given instruction, for example, in a code assembler.

Type: Application

Filed: February 21, 2014

Publication date: June 19, 2014

Applicant: LSI Corporation

Inventors: Prasad AVSS, Jacob Matthews

INSTRUCTION SET FOR SUPPORTING WIDE SCALAR PATTERN MATCHES

Publication number: 20140173255

Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.
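
The shift-with-carry behavior can be sketched as follows: each element is shifted left, and the bits that overflow out of one element are moved into the next more significant element when the bitmask allows it. The carry direction, the association of mask bit i with the element receiving the carry, and the name `shift_left_with_carry` are all assumptions for illustration; the abstract only says the overflow is carried into "an adjacent data element based on a first bitmask".

```python
def shift_left_with_carry(elements, carry_mask, width=8, shift=1):
    """Shift each element left, optionally carrying overflow to its neighbor.

    elements[0] is the least significant element. Bit i of `carry_mask`
    allows element i to receive the overflow of element i - 1
    (assumed convention).
    """
    limit = (1 << width) - 1
    result = []
    carry_in = 0
    for i, value in enumerate(elements):
        shifted = value << shift
        overflow = shifted >> width   # bits pushed out of this element
        low = shifted & limit
        if (carry_mask >> i) & 1:
            low |= carry_in           # accept the neighbor's overflow
        result.append(low)
        carry_in = overflow
    return result


if __name__ == "__main__":
    # Two 8-bit elements treated as one 16-bit value 0x0180.
    print(shift_left_with_carry([0x80, 0x01], carry_mask=0b10))  # [0, 3]
    # With carries disabled, each element shifts independently.
    print(shift_left_with_carry([0x80, 0x01], carry_mask=0b00))  # [0, 2]
```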

Type: Application

Filed: December 18, 2012

Publication date: June 19, 2014

Inventors: Hariharan L. Thantry, Mani Azimi

System and method for branch function based obfuscation

Patent number: 8751823

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for obfuscating branches in computer code. A compiler or a post-compilation tool can obfuscate branches by receiving source code, and compiling the source code to yield computer-executable code. The compiler identifies branches in the computer-executable code, and determines a return address and a destination value for each branch. Then, based on the return address and the destination value for each branch, the compiler constructs a binary tree with nodes and leaf nodes, each node storing a balanced value, and each leaf node storing a destination value. The non-leaf nodes are arranged such that searching the binary tree by return address leads to a corresponding destination value. Then the compiler inserts the binary tree in the computer-executable code and replaces each branch with instructions in the computer-executable code for performing a branching operation based on the binary tree.
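
The lookup structure described in this abstract, where interior nodes guide a search keyed by return address and leaves hold destinations, might look like the sketch below. Reading the "balanced value" as a pivot address that splits the sorted return addresses in half is an interpretation, and the names `build_tree` and `lookup` and the tuple-based node layout are hypothetical.

```python
def build_tree(branches):
    """Build a balanced search tree from (return_address, destination) pairs.

    Interior nodes are ("node", pivot, left, right); leaves are
    ("leaf", destination).
    """
    items = sorted(branches)
    if not items:
        raise ValueError("at least one branch is required")
    if len(items) == 1:
        return ("leaf", items[0][1])
    mid = len(items) // 2
    pivot = items[mid][0]
    return ("node", pivot, build_tree(items[:mid]), build_tree(items[mid:]))


def lookup(tree, return_address):
    """Walk the tree by return address until a leaf destination is reached."""
    while tree[0] == "node":
        _, pivot, left, right = tree
        tree = left if return_address < pivot else right
    return tree[1]


if __name__ == "__main__":
    table = [(0x1000, 0x2000), (0x1010, 0x2400), (0x1020, 0x2800)]
    tree = build_tree(table)
    print(hex(lookup(tree, 0x1010)))  # 0x2400
```

In this sketch a return address that was never registered still resolves to some leaf, so a real branch function would need its own policy for unknown callers.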

Type: Grant

Filed: August 1, 2011

Date of Patent: June 10, 2014

Assignee: Apple Inc.

Inventors: Gideon M. Myles, Julien Lerouge, Jon McLachlan, Ganna Zaks, Augustin J. Farrugia

Control Transfer Termination Instructions Of An Instruction Set Architecture (ISA)

Publication number: 20140156972

Abstract: In an embodiment, the present invention includes a processor having an execution logic to execute instructions and a control transfer termination (CTT) logic coupled to the execution logic. This logic is to cause a CTT fault to be raised if a target instruction of a control transfer instruction is not a CTT instruction. Other embodiments are described and claimed.

Type: Application

Filed: November 30, 2012

Publication date: June 5, 2014

Inventors: Vedyvas Shanbhogue, Jason W. Brandt, Uday R. Savagaonkar, Ravi L. Sahita

Logical power throttling of instruction decode rate for successive time periods

Patent number: 8745419

Abstract: A processor includes a device providing a throttling power output signal. The throttling power output signal is used to determine when to logically throttle the power consumed by the processor. At least one core in the processor includes a pipeline having a decode pipe; and a logical power throttling unit coupled to the device to receive the output signal, and coupled to the decode pipe. Following the logical power throttling unit receiving the power throttling output signal satisfying a predetermined criterion, the logical power throttling unit causes the decode pipe to reduce an average number of instructions decoded per processor cycle without physically changing the processor cycle or any processor supply voltages.

Type: Grant

Filed: June 21, 2012

Date of Patent: June 3, 2014

Assignee: Oracle America, Inc.

Inventors: Shailender Chaudhry, Quinn A. Jacobson, Marc Tremblay

INSTRUCTION AND LOGIC TO PROVIDE PUSHING BUFFER COPY AND STORE FUNCTIONALITY

Publication number: 20140149718

Abstract: Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core.

Type: Application

Filed: November 28, 2012

Publication date: May 29, 2014

Inventors: Christopher J. Hughes, Changkyu Kim, Daehyun Kim, Victor W. Lee, Jong Soo Park

On-demand predicate registers

Patent number: 8707013

Abstract: In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. The register set includes a plurality of legacy predicate registers. Separate from the legacy predicate registers, a plurality of on-demand predicate registers are selectively signaled without changing the opcode space for the DSP.

Type: Grant

Filed: July 13, 2010

Date of Patent: April 22, 2014

Assignee: Texas Instruments Incorporated

Inventors: Jagadeesh Sankaran, Joseph R. Zbiciak, Steven D. Krueger

Processor configured for operation with multiple operation codes per instruction

Patent number: 8700886

Abstract: A processor configured to operate with multiple operation codes for each of a plurality of instructions comprises memory circuitry and processing circuitry coupled to the memory circuitry. The processing circuitry is configured to decode a first operation code to produce a given one of the instructions and to decode a second operation code different than the first operation code to also produce the given instruction. Thus, the same instruction is produced for execution by the processing circuitry regardless of whether the first operation code or the second operation code is decoded. The assignment of multiple operation codes to a given instruction may occur in conjunction with the design of the processor, and dynamic selection of a particular one of those operation codes may be performed in conjunction with assembly of code for execution by the processor.

Type: Grant

Filed: May 30, 2007

Date of Patent: April 15, 2014

Assignee: Agere Systems LLC

Inventors: Prasad Avss, Jacob Mathews

TECHNIQUE FOR TRANSLATING DEPENDENT INSTRUCTIONS

Publication number: 20140101414

Abstract: In response to determining an operation is a dependent operation, a mapper of a processor determines the source registers of the operation from which the dependent operation depends. The mapper translates the dependent operation to a new operation that uses as its source operands at least one of the determined source registers and a source register of the dependent operation. The new operation is independent of other pending operations and therefore can be executed without waiting for execution of other operations, thus reducing execution latency.

Type: Application

Filed: October 9, 2012

Publication date: April 10, 2014

Applicant: Advanced Micro Devices, Inc.

Inventor: Kevin A. Hurd

APPARATUS AND METHOD FOR EFFICIENT GATHER AND SCATTER OPERATIONS

Publication number: 20140095831

Abstract: An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.

Type: Application

Filed: September 28, 2012

Publication date: April 3, 2014

Inventors: Edward T. Grochowski, Dennis R. Bradford, George Z. Chrysos, Andrew T. Forsyth, Michael D. Upton, Lisa K. Wu

INSTRUCTION AND LOGIC FOR BOYER-MOORE SEARCH OF TEXT STRINGS

Publication number: 20140095834

Abstract: Instructions and logic provide extended vector suffix comparisons for Boyer-Moore searches. Some embodiments, responsive to an instruction specifying: a pattern source operand and a target source operand, compare each of m data elements of the pattern operand with each data element of the target operand. A first and second equal ordered aggregation operation are performed from the comparisons according to the m data elements of the pattern source operand. A result of the first and second aggregation operations indicating whether or not a possible match exists between the m data elements of the pattern source operand and d data element positions relative to data elements of the target source operand is stored. Ordering of the data elements of the pattern and the target operands may be reversed for the second aggregation operation, and d may be a sum of m-1 and the quantity of target operand elements in some embodiments.

Type: Application

Filed: September 30, 2012

Publication date: April 3, 2014

Inventor: Shih J. Kuo

Multi-Destination Instruction Handling

Publication number: 20140089638

Abstract: Various techniques for processing instructions that specify multiple destinations. A first portion of a processor pipeline is configured to split a multi-destination instruction into a plurality of single-destination operations. A second portion of the pipeline is configured to process the plurality of single-destination operations. A third portion of the pipeline is configured to merge the plurality of single-destination operations into one or more multi-destination operations. The one or more multi-destination operations may be performed. The first portion of the pipeline may include a decode unit. The second portion of the pipeline may include a map unit, which may in turn include circuitry configured to maintain a list of free architectural registers and a mapping table that maps physical registers to architectural registers. The third portion of the pipeline may comprise a dispatch unit. In some embodiments, this may provide certain advantages such as reduced area and/or power consumption.

Type: Application

Filed: September 26, 2012

Publication date: March 27, 2014

Applicant: APPLE INC.

Inventors: John H. Mylius, Gerard R. Williams III, James B. Keller, Fang Liu, Shyam Sundar

METHOD AND APPARATUS TO PROCESS 4-OPERAND SIMD INTEGER MULTIPLY-ACCUMULATE INSTRUCTION

Publication number: 20140082328

Abstract: According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.

Type: Application

Filed: September 14, 2012

Publication date: March 20, 2014

Applicant: INTEL CORPORATION

Inventors: Vinodh Gopal, Erdinc Ozturk, James D. Guilford, Gilbert M. Wolrich

CONTINUOUS RUN-TIME VALIDATION OF PROGRAM EXECUTION: A PRACTICAL APPROACH

Publication number: 20140082329

Abstract: Trustworthy systems require that code be validated as genuine. Most systems implement this requirement prior to execution by matching a cryptographic hash of the binary file against a reference hash value, leaving the code vulnerable to run time compromises, such as code injection, return and jump-oriented programming, and illegal linking of the code to compromised library functions. The Run-time Execution Validator (REV) validates, as the program executes, the control flow path and instructions executed along the control flow path. REV uses a signature cache integrated into the processor pipeline to perform live validation of executions, at basic block boundaries, and ensures that changes to the program state are not made by the instructions within a basic block until the control flow path into the basic block and the instructions within the basic block are both validated.

Type: Application

Filed: September 16, 2013

Publication date: March 20, 2014

Applicant: The Research Foundation of State University of New York

Inventor: Kanad Ghose

INSTRUCTION ADDRESS ENCODING AND DECODING BASED ON PROGRAM CONSTRUCT GROUPS

Publication number: 20140068229

Abstract: Coding circuitry comprises at least an encoder configured to encode an instruction address for transmission to a decoder. The encoder is operative to identify the instruction address as belonging to a particular one of a plurality of groups of instruction addresses associated with respective distinct program constructs, and to encode the instruction address based on the identified group. The decoder is operative to identify the encoded instruction address as belonging to the particular one of a plurality of groups of instruction addresses associated with respective distinct program constructs, and to decode the encoded instruction address based on the identified group. The coding circuitry may be implemented as part of an integrated circuit or other processing device that includes associated processor and memory elements. In such an arrangement, the processor may generate the instruction address for delivery over a bus to the memory.

Type: Application

Filed: August 28, 2012

Publication date: March 6, 2014

Applicant: LSI Corporation

Inventors: Prakash Krishnamoorthy, Ramesh C. Tekumalla, Parag Madhani

CALCULATION PROCESSING DEVICE AND CALCULATION PROCESSING DEVICE CONTROLLING METHOD

Publication number: 20140059326

Abstract: A calculation-processing-device includes: a decoder unit including, a first-counter to increment a first-count-value and to decrement the first-count-value, and a second-counter configured to increment a second-count-value and to decrement the second-count-value; a first-instruction-executing-unit to execute an instruction of the first-class; a second-instruction-executing-unit to execute an instruction of the second-class; a first-instruction-holding-unit including a plurality of first-entries, to input the instruction of the first-class held in one of the plurality of first-entries into the first-instruction-executing-unit; a second-instruction-holding-unit including a plurality of second-entries, to input the instruction of the second-class held in one of the plurality of second-entries into the second-instruction-executing-unit; and first-control-unit to output the second-release-notification, and to change the output timing of the second-release-notification when a predetermined relationship is establish

Type: Application

Filed: June 19, 2013

Publication date: February 27, 2014

Inventors: Sota SAKASHITA, Yasunobu Akizuki, Toshio Yoshida

TECHNIQUE TO PERFORM THREE-SOURCE OPERATIONS

Publication number: 20140052963

Abstract: A technique to perform three-source instructions. At least one embodiment of the invention relates to converting a three-source instruction into at least two instructions identifying no more than two source values.
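
A minimal sketch of the conversion: an operation with three sources is rewritten as two operations that each name at most two sources, linked by a temporary register. The `Op` record, the `muladd` mnemonic, and the temporary name `tmp0` are hypothetical; the abstract does not say which three-source instructions are covered or how the intermediate value is held.

```python
from typing import NamedTuple, Tuple


class Op(NamedTuple):
    name: str
    dest: str
    sources: Tuple[str, ...]


def split_three_source(op, temp="tmp0"):
    """Rewrite a three-source op as ops with at most two sources each.

    Only a hypothetical integer multiply-add (dest = a * b + c) is handled.
    """
    if len(op.sources) <= 2:
        return [op]
    if op.name != "muladd" or len(op.sources) != 3:
        raise ValueError(f"cannot split {op.name!r}")
    a, b, c = op.sources
    return [
        Op("mul", temp, (a, b)),        # temp = a * b
        Op("add", op.dest, (temp, c)),  # dest = temp + c
    ]


if __name__ == "__main__":
    for piece in split_three_source(Op("muladd", "r0", ("r1", "r2", "r3"))):
        print(piece)
```

For floating-point fused operations a split like this would generally change rounding behavior, which is one reason the sketch sticks to an integer example.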

Type: Application

Filed: October 25, 2013

Publication date: February 20, 2014

Inventors: Avinash Sodani, Stephan Jourdan, Alexandre Farcy, Per Hammarlund

CUSTOM CHAINING STUBS FOR INSTRUCTION CODE TRANSLATION

Publication number: 20140052962

Abstract: A processing system includes a microprocessor, a hardware decoder arranged within the microprocessor, and a translator operatively coupled to the microprocessor. The hardware decoder is configured to decode instruction code non-native to the microprocessor for execution in the microprocessor. The translator is configured to form a translation of the instruction code in an instruction set native to the microprocessor and to connect a branch instruction in the translation to a chaining stub. The chaining stub is configured to selectively cause additional instruction code at a target address of the branch instruction to be received in the hardware decoder without causing the processing system to search for a translation of additional instruction code at the target address.

Type: Application

Filed: August 15, 2012

Publication date: February 20, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ben Hertzberg, Nathan Tuck

Compiler for providing intrinsic supports for VLIW PAC processors with distributed register files and method thereof

Patent number: 8656376

Abstract: A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.

Type: Grant

Filed: September 1, 2011

Date of Patent: February 18, 2014

Assignee: National Tsing Hua University

Inventors: Jenq Kuen Lee, Chi Bang Kuan

Data processor including an operation unit to execute operations in parallel

Patent number: 8650386

Abstract: A data processor includes a first register file including registers, a second register file including registers, a number of which is larger than that of the registers of the first register file, an instruction decoder and an operation unit. The instruction decoder decodes an instruction described in first and second instruction formats. The first instruction format includes a first register-addressing field for designating the first register file. The second instruction format includes a second register-addressing field for designating the second register file, a size of which is larger than that of the first register-addressing field. The operation unit executes an instruction described in the first and second instruction formats using operand data stored in the first and second register files, respectively, based on the instruction decoder, and executes operations in parallel, a number of which is determined by a certain field included in the second instruction format.

Type: Grant

Filed: April 12, 2013

Date of Patent: February 11, 2014

Assignee: Panasonic Corporation

Inventors: Takeshi Kishida, Masaitsu Nakajima

PREDICATION IN A VECTOR PROCESSOR

Publication number: 20140040597

Abstract: Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
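
Mask-based predication can be pictured as a per-lane enable bit: a lane performs the operation only when its mask bit is set. In the sketch below, lanes whose bit is clear keep the previous destination value; that merge behavior, the choice of addition as the operation, and the name `predicated_add` are assumptions, since the abstract only states that mask bits predicate the operation of a unit.

```python
def predicated_add(dest, a, b, mask):
    """Add a and b lane by lane where the mask bit is set.

    Bit i of `mask` enables lane i; disabled lanes keep dest[i]
    (assumed merge semantics).
    """
    return [
        x + y if (mask >> i) & 1 else old
        for i, (old, x, y) in enumerate(zip(dest, a, b))
    ]


if __name__ == "__main__":
    print(predicated_add([0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40], 0b0101))
    # [11, 0, 33, 0]
```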

Type: Application

Filed: August 8, 2012

Publication date: February 6, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
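
The mask-based predication described in the abstract above can be illustrated with a minimal Python sketch. The function name `predicated_add`, the list-based model of vector lanes, and the policy that masked-off lanes keep their previous destination value are all assumptions made for illustration; the application's abstract does not specify them.

```python
from typing import List


def predicated_add(dest: List[int], a: List[int], b: List[int],
                   mask: List[bool]) -> List[int]:
    """Lane-wise add under a mask.

    Lanes whose mask bit is set receive a[i] + b[i]; lanes whose mask bit
    is clear keep their old destination value (an assumed policy).
    """
    if not (len(dest) == len(a) == len(b) == len(mask)):
        raise ValueError("all operands must have the same number of lanes")
    return [x + y if m else d for d, x, y, m in zip(dest, a, b, mask)]


# Hypothetical usage: only lanes 0 and 2 are enabled by the mask.
result = predicated_add([0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40],
                        [True, False, True, False])
```

In this sketch the mask plays the role of the bits read from a vector mask register, and the add stands in for whichever unit a sub-instruction would drive.
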
VECTOR PROCESSING IN AN ACTIVE MEMORY DEVICE

Publication number: 20140040598

Abstract: Embodiments relate to vector processing in an active memory device. An aspect includes a system for vector processing in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Execution of the sub-instructions is repeated in parallel for multiple iterations, by the processing element, based on the iteration count. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.

Type: Application

Filed: August 8, 2012

Publication date: February 6, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair, Daniel A. Prener
PREDICATION IN A VECTOR PROCESSOR

Publication number: 20140040601

Abstract: Embodiments relate to vector processor predication in an active memory device. An aspect includes a method for vector processor predication in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.

Type: Application

Filed: August 3, 2012

Publication date: February 6, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
Storage Method, Memory, and Storing System with Accumulated Write Feature

Publication number: 20140040602

Abstract: A storage method, a memory and a storage system that have an accumulated write feature are provided in which the OR and AND operations are shifted from CPU/ALU (controller) to the memory, and the frequency for switching data transmission lines between read and write instructions can be reduced. In the memory, the interface unit includes a write arithmetic instruction interface, a write instruction interface, and an address instruction interface; the instruction/address decoder is configured to decode a write arithmetic instruction, a write instruction and an address instruction; and the pFET has a higher driving capability than the data switches, and the nFET has a lower driving capability than the data switches. The storage method, memory and storage system can reduce work load of CPU/ALU, and enable continuous data writing to the memory.

Type: Application

Filed: December 30, 2011

Publication date: February 6, 2014

Applicant: Xi'an Sinochip Semiconductors Co., Ltd.

Inventor: Hoffmann Jochen
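
The idea of moving OR and AND operations from the controller into the memory, as the abstract above describes, might be modeled roughly as follows. The class `AccumulatedWriteMemory`, its method names, and the 8-bit word width are hypothetical; the sketch models only the logical behavior, not the interface unit or the transistor-level details mentioned in the abstract.

```python
class AccumulatedWriteMemory:
    """Toy memory whose write path can combine new data with the stored word."""

    def __init__(self, num_words: int, word_bits: int = 8) -> None:
        self.word_mask = (1 << word_bits) - 1  # assumed word width
        self.words = [0] * num_words

    def write(self, address: int, data: int) -> None:
        # Plain write instruction: overwrite the stored word.
        self.words[address] = data & self.word_mask

    def write_or(self, address: int, data: int) -> None:
        # Write arithmetic instruction (OR): no read-back to the controller.
        self.words[address] |= data & self.word_mask

    def write_and(self, address: int, data: int) -> None:
        # Write arithmetic instruction (AND): combined inside the memory.
        self.words[address] &= data & self.word_mask


# Hypothetical usage: the controller issues writes only, never a read.
memory = AccumulatedWriteMemory(num_words=4)
memory.write(0, 0b1100)
memory.write_or(0, 0b0011)
memory.write_and(0, 0b1010)
```

Because the controller never reads the old value back, a sequence like this needs no switch of the data lines from write to read, which is the saving the abstract points to.
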
SPACE EFFICIENT CHECKPOINT FACILITY AND TECHNIQUE FOR PROCESSOR WITH INTEGRALLY INDEXED REGISTER MAPPING AND FREE-LIST ARRAYS

Publication number: 20140040595

Abstract: A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing free list and architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration.

Type: Application

Filed: August 1, 2012

Publication date: February 6, 2014

Applicant: FREESCALE SEMICONDUCTOR, INC.

Inventor: Thang M. Tran
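
A simplified Python sketch of checkpointing by snapshotting the mapping table, rather than register contents, is given below. The class `RenameMap` and its methods are hypothetical, and the sketch keeps the free list and mapping table as two separate Python lists; it does not model the combined array storage with common pointer indexing that the abstract describes.

```python
from typing import List, Tuple


class RenameMap:
    """Toy architectural-to-physical register map with checkpoints."""

    def __init__(self, num_arch: int, num_phys: int) -> None:
        # Initially architectural register i maps to physical register i.
        self.mapping: List[int] = list(range(num_arch))
        self.free: List[int] = list(range(num_arch, num_phys))
        self.checkpoints: List[Tuple[List[int], List[int]]] = []

    def rename_dest(self, arch_reg: int) -> int:
        # Every destination operand gets a fresh physical register.
        phys = self.free.pop(0)
        self.mapping[arch_reg] = phys
        return phys

    def checkpoint(self) -> None:
        # Snapshot the mappings only; register values are never copied.
        self.checkpoints.append((list(self.mapping), list(self.free)))

    def restore(self) -> None:
        # Roll back to the most recent snapshot, e.g. after a misprediction.
        self.mapping, self.free = self.checkpoints.pop()


# Hypothetical usage: rename r2 speculatively, then undo it.
rmap = RenameMap(num_arch=4, num_phys=8)
rmap.checkpoint()
new_phys = rmap.rename_dest(2)
rmap.restore()
```

The snapshot is a few small integers per architectural register regardless of register width, which is why this approach suits architectures with many wide registers.
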
Branch and switch key instruction in a microprocessor that fetches and decrypts encrypted instructions

Patent number: 8639945

Abstract: A microprocessor includes a storage element that stores decryption key data and a fetch unit that fetches and decrypts program instructions using a value of the decryption key data stored in the storage element. The fetch unit fetches an instance of a branch and switch key instruction and decrypts it using a first value of the decryption key data stored in the storage element. If the branch is taken, the microprocessor loads the storage element with a second value of the decryption key data for subsequent use by the fetch unit to decrypt an instruction fetched at a target address specified by the branch and switch key instruction. If the branch is not taken, the microprocessor retains the first value of the decryption key data in the storage element for subsequent use by the fetch unit to decrypt an instruction sequentially following the branch and switch key instruction.

Type: Grant

Filed: April 21, 2011

Date of Patent: January 28, 2014

Assignee: VIA Technologies, Inc.

Inventors: G. Glenn Henry, Terry Parks, Brent Bean, Thomas A. Crispin
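
The taken and not-taken behavior of the branch and switch key instruction can be sketched in Python as follows. The names `FetchUnit` and `branch_and_switch_key` are hypothetical, and XOR is used only as a stand-in cipher; the abstract does not say how instructions are decrypted.

```python
from dataclasses import dataclass


@dataclass
class FetchUnit:
    key: int  # decryption key value currently held in the storage element

    def decrypt(self, encrypted_word: int) -> int:
        # XOR is an assumed placeholder for the real decryption logic.
        return encrypted_word ^ self.key


def branch_and_switch_key(fetch: FetchUnit, taken: bool, new_key: int,
                          target_address: int, next_address: int) -> int:
    """Return the next fetch address, switching keys only if the branch is taken."""
    if taken:
        fetch.key = new_key      # second key value is used at the target
        return target_address
    return next_address          # first key value is retained


# Hypothetical usage: a taken branch moves to code encrypted under 0xC3.
fetch_unit = FetchUnit(key=0x5A)
next_pc = branch_and_switch_key(fetch_unit, taken=True, new_key=0xC3,
                                target_address=0x100, next_address=0x8)
```
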
REPLAY REDUCTION BY WAKEUP SUPPRESSION USING EARLY MISS INDICATION

Publication number: 20140025933

Abstract: A method for reducing a number of operations replayed in a processor includes decoding an operation to determine a memory address and a command in the operation. If data is not in a way predictor based on the memory address, a suppress wakeup signal is sent to an operation scheduler, and the operation scheduler suppresses waking up other operations that are dependent on the data.

Type: Application

Filed: July 17, 2012

Publication date: January 23, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Ganesh Venkataramanan, Mike Butler, Krishnan V. Ramani
Method and apparatus to implement software to hardware thread priority

Patent number: 8635621

Abstract: The invention relates to a method and apparatus for execution scheduling of a program thread of an application program and executing the scheduled program thread on a data processing system. The method includes: providing an application program thread priority to a thread execution scheduler; selecting for execution the program thread from a plurality of program threads inserted into the thread execution queue, wherein the program thread is selected for execution using a round-robin selection scheme, and wherein the round-robin selection scheme selects the program thread based on an execution priority associated with the program thread bit; placing the program thread in a data processing execution queue within the data processing system; and removing the program thread from the thread execution queue after a successful execution of the program thread by the data processing system.

Type: Grant

Filed: August 22, 2008

Date of Patent: January 21, 2014

Assignee: International Business Machines Corporation

Inventors: David Stephen Levitan, Jeffrey Richard Summers
BINARY TRANSLATION IN ASYMMETRIC MULTIPROCESSOR SYSTEM

Publication number: 20140019723

Abstract: An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code for execution on the ASMP is analyzed and a determination is made as to whether to allow the program code, or a code segment thereof, to execute on a first core natively or to use binary translation on the code and execute the translated code on a second core which consumes less power than the first core during execution.

Type: Application

Filed: December 28, 2011

Publication date: January 16, 2014

Inventors: Koichi Yamada, Ronny Ronen, Wei Li, Boris Ginzburg, Gadi Haber, Konstantin Levit-Gurevich, Esfir Natanzon, Alon Naveh, Eliezer Weissmann, Michael Mishaeli
DEVICE AND METHOD FOR IMPLEMENTING ADDRESS BUFFER MANAGEMENT OF PROCESSOR

Publication number: 20140013084

Abstract: The disclosure provides a device for implementing address buffer management of a processor, including: an assembler configured to perform operations to obtain intermediate values when the assembler encodes a set instruction for an address automatic-increment value and boundary values, and to encapsulate the intermediate values into the set instruction for the address automatic-increment value and boundary values; and a processor configured to determine, according to the intermediate values, whether to perform the address automatic-increment operation or the address automatic-decrement operation, so as to achieve the address buffer management.

Type: Application

Filed: August 24, 2011

Publication date: January 9, 2014

Applicant: ZTE Corporation

Inventors: Lihuang Li, Chunyu Tian, Hui Ren
CACHE COPROCESSING UNIT

Publication number: 20140013083

Abstract: A cache coprocessing unit in a computing system includes a cache array to store data, a hardware decode unit to decode instructions that are offloaded from being executed by an execution cluster of the computing system to reduce load and store operations between the execution cluster and the cache coprocessing unit, and a set of one or more operation units to perform operations on the cache array according to the decoded instructions.

Type: Application

Filed: December 30, 2011

Publication date: January 9, 2014

Inventor: Ashish Jha
Optimizing Performance Of Instructions Based On Sequence Detection Or Information Associated With The Instructions

Publication number: 20130346728

Abstract: In one embodiment, the present invention includes an instruction decoder that can receive an incoming instruction and a path select signal and decode the incoming instruction into a first instruction code or a second instruction code responsive to the path select signal. The two different instruction codes, both representing the same incoming instruction, may be used by an execution unit to perform an operation optimized for different data lengths. Other embodiments are described and claimed.

Type: Application

Filed: August 28, 2013

Publication date: December 26, 2013

Inventors: Ohad Falik, Lihu Rappoport, Ron Gabor, Yulia Kurolap, Michael Mishaeli
SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING DELTA DECODING ON PACKED DATA ELEMENTS

Publication number: 20130339668

Abstract: Embodiments of systems, apparatuses, and methods for performing delta decoding on packed data elements of a source and storing the results in packed data elements of a destination using a single vector packed delta decode instruction are described.

Type: Application

Filed: December 28, 2011

Publication date: December 19, 2013

Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Tracy Garrett Drysdale
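
Assuming the usual meaning of delta decoding, where each output element is the running sum of the stored deltas, the operation on packed elements might look like the Python sketch below. The function name `delta_decode`, the wraparound at a fixed element width, and the 32-bit default are assumptions; the abstract does not give the instruction's exact semantics.

```python
from typing import List


def delta_decode(deltas: List[int], element_bits: int = 32) -> List[int]:
    """Turn a list of deltas into absolute values via a running sum.

    Each result is truncated to element_bits, mimicking a fixed-width
    packed data element (an assumed behavior).
    """
    mask = (1 << element_bits) - 1
    decoded: List[int] = []
    running = 0
    for delta in deltas:
        running = (running + delta) & mask
        decoded.append(running)
    return decoded


# Hypothetical usage: source holds deltas, destination holds absolute values.
destination = delta_decode([10, 2, 3, 5])
```

A hardware instruction would compute all elements of the destination at once; the sequential loop here only shows the arithmetic relationship between source and destination elements.
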
SPECIAL CASE REGISTER UPDATE WITHOUT EXECUTION

Publication number: 20130339667

Abstract: A method of changing a value associated with a logical address in a computing device. The method includes: receiving an instruction at an instruction decoder, the instruction including a target register expressed as a logical value; determining at an instruction decoder that a result of the instruction is to set the target register to a constant value, the target register being in a physical register file associated with an execution unit; and mapping, in a register mapper, the logical address to a location represented by a special register tag.

Type: Application

Filed: March 13, 2013

Publication date: December 19, 2013

Applicant: International Business Machines Corporation

Inventors: Gregory W. Alexander, Brian D. Barrick, Fadi Y. Busaba, Chung-Lung K. Shum
SPECIAL CASE REGISTER UPDATE WITHOUT EXECUTION

Publication number: 20130339666

Abstract: A method of changing a value associated with a logical address in a computing device. The method includes: receiving an instruction at an instruction decoder, the instruction including a target register expressed as a logical value; determining at an instruction decoder that a result of the instruction is to set the target register to a constant value, the target register being in a physical register file associated with an execution unit; and mapping, in a register mapper, the logical address to a location represented by a special register tag.

Type: Application

Filed: June 15, 2012

Publication date: December 19, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gregory W. Alexander, Brian D. Barrick, Fadi Y. Busaba, Bruce C. Giamei, Edward T. Malley, Chung-Lung K. Shum
Energy-focused compiler-assisted branch prediction

Patent number: 8607209

Abstract: A processor framework includes a compiler to add control information to an instruction sequence at compile time. The control information is added in the instruction sequence prior to a control-flow changing instruction. Microarchitecture is configured to use the control information at runtime to predict an outcome of the control-flow changing instruction prior to fetching the control-flow changing instruction.

Type: Grant

Filed: January 18, 2005

Date of Patent: December 10, 2013

Assignee: BlueRISC Inc.

Inventors: Saurabh Chheda, Kristopher Carver, Raksit Ashok
PREVENTING EXECUTION OF PARITY-ERROR-INDUCED UNPREDICTABLE INSTRUCTIONS, AND RELATED PROCESSOR SYSTEMS, METHODS, AND COMPUTER-READABLE MEDIA

Publication number: 20130326195

Abstract: Preventing execution of parity-error-induced unpredictable instructions, and related processor systems, methods, and computer-readable media are disclosed. In this regard, a method for processing instructions in a central processing unit (CPU) is provided. The method comprises decoding an instruction comprising a plurality of bits, and generating a parity error indicator indicating whether a parity error exists in the plurality of bits prior to execution of the instruction. If the parity error indicator indicates that the parity error exists in the plurality of bits, one or more of the plurality of bits are modified to indicate a no execution operation (NOP), without effecting a roll back of a program counter of the CPU and without re-decoding the instruction. In this manner, the possibility of the parity error causing an inadvertent execution of an unpredictable instruction is reduced.

Type: Application

Filed: March 7, 2013

Publication date: December 5, 2013

Applicant: QUALCOMM INCORPORATED

Inventors: Michael Scott McIlvaine, James Norris Dieffenderfer, Brian Michael Stempel, Leslie Mark DeBruyne, Melinda J. Brown
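
The parity guard described in the abstract above can be approximated by the following Python sketch. The names `has_parity_error` and `decode_with_parity_guard`, the even-parity convention, and the all-zero `NOP_ENCODING` are assumptions for illustration; the application does not state the parity scheme or the NOP bit pattern in its abstract.

```python
NOP_ENCODING = 0x00000000  # hypothetical bit pattern for a no-operation


def has_parity_error(instruction_bits: int, stored_parity: int) -> bool:
    """Check assumed even parity: stored bit should equal XOR of all bits."""
    computed_parity = bin(instruction_bits).count("1") & 1
    return computed_parity != stored_parity


def decode_with_parity_guard(instruction_bits: int, stored_parity: int) -> int:
    """Return the instruction unchanged, or a NOP if its parity check fails.

    The program counter is not touched and nothing is re-decoded; the
    corrupted instruction is simply replaced in place.
    """
    if has_parity_error(instruction_bits, stored_parity):
        return NOP_ENCODING
    return instruction_bits


# Hypothetical usage: 0b1011 has three set bits, so its parity bit is 1.
good = decode_with_parity_guard(0b1011, stored_parity=1)
bad = decode_with_parity_guard(0b1011, stored_parity=0)
```
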
METHOD, APPARATUS AND INSTRUCTIONS FOR PARALLEL DATA CONVERSIONS

Publication number: 20130326194

Abstract: Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits. The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register.

Type: Application

Filed: November 21, 2012

Publication date: December 5, 2013

Inventor: Gopalan Ramanujam
SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING VECTOR PACKED UNARY DECODING USING MASKS

Publication number: 20130326196

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed unary value decoding using masks in response to a single vector packed unary decoding using masks instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described.

Type: Application

Filed: December 23, 2011

Publication date: December 5, 2013

Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm
BROADCAST OPERATION ON MASK REGISTER

Publication number: 20130326192

Abstract: Embodiments of systems, apparatuses, and methods for performing a mask broadcast instruction in a computer processor are described. In some embodiments, the execution of a mask broadcast instruction causes a broadcast of a data element of the source operand to a destination register of the destination operand according to the broadcast size.

Type: Application

Filed: December 22, 2011

Publication date: December 5, 2013

Inventors: Elmoustapha Ould-Ahmed-Vall, Milind Baburao Girkar, Robert C. Valentine, Suleyman Sair, Jesus Corbal San Adrian
METHOD AND DEVICE (UNIVERSAL MULTIFUNCTION ACCELERATOR) FOR ACCELERATING COMPUTATIONS BY PARALLEL COMPUTATIONS OF MIDDLE STRATUM OPERATIONS

Publication number: 20130311753

Abstract: This invention constitutes a method and apparatus for enabling parallel computations of intermediate operations which are generic in many algorithms in given applications and also contain most of the computationally intensive operations. The method includes designing a set of intermediate level functions suitable for a predefined application, obtaining instructions corresponding to intermediate level operations from a processor, computing the addresses of the operands and the results, and performing computations involved in multiple intermediate level operations. In an exemplary embodiment, the apparatus consists of a local data address generator that computes the addresses of a plurality of operands and results, a programmable computational unit that performs parallel computations of the intermediate level operations, and a local memory interface that is interfaced to local memory organized in multiple blocks.

Type: Application

Filed: August 28, 2012

Publication date: November 21, 2013

Inventor: Venu Kandadai
