Instruction Decoding (e.g., By Microinstruction, Start Address Generator, Hardwired) Patents (Class 712/208)
  • Publication number: 20140189308
    Abstract: Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store an offset for a data element in a memory. A destination register has corresponding data fields, each of these data fields to store a variable second plurality of bits to store a conflict mask having a mask bit for each offset. Responsive to decoding a vector conflict instruction, execution units compare the offset in each data field with every less significant data field to determine if they hold a matching offset, and in corresponding conflict masks in the destination register, set any mask bits corresponding to a less significant data field with a matching offset. Vector address conflict detection can be used with variable sized elements and to generate conflict masks to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Application
    Filed: December 29, 2012
    Publication date: July 3, 2014
    Inventors: Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Brett L. Toll, Mark J. Charney, Milind B. Girkar
  • Publication number: 20140189309
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Application
    Filed: December 29, 2012
    Publication date: July 3, 2014
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Publication number: 20140189287
    Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20140189306
    Abstract: An enhanced loop streaming detection mechanism is provided in a processor to reduce power consumption. The processor includes a decoder to decode instructions in a loop into micro-operations, and a loop streaming detector to detect the presence of the loop in the micro-operations. The processor also includes a loop characteristic tracker unit to identify hardware components downstream from the decoder that are not to be used by the micro-operations in the loop, and to disable the identified hardware components. The processor also includes execution circuitry to execute the micro-operations in the loop with the identified hardware components disabled.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Inventors: Matthew C. Merten, Justin M. Deinlein, Yury N. Ilin, Alexandre J. Farcy, Tong Li, Srikanth T. Srinivasan
  • Patent number: 8769247
    Abstract: Methods and apparatuses are provided for increased efficiency in a processor via early instruction completion. An apparatus is provided for increased efficiency in a processor via early instruction completion. The apparatus comprises an execution unit for processing instructions and determining whether a later issued instruction is ready for completion or an earlier issued instruction is ready for completion and a retire unit for retiring the later issued instruction when the later instruction is ready for completion or to retire the earlier instruction when later instruction is not ready for completion and the earlier issued instruction has a known good completion status. A method is provided for increased efficiency in a processor via early instruction completion. The method comprises completing an earlier issued instruction having a known good completion status ahead of a later issued instruction when the later issued instruction is not ready for completion.
    Type: Grant
    Filed: April 15, 2011
    Date of Patent: July 1, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael D Estlick, Kevin Hurd, Jay Fleischman
  • Publication number: 20140181580
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode an instruction to read a plurality of data elements from memory, the instruction having a first operand specifying a storage location, a second operand specifying a bitmask having one or more bits, each bit corresponding to one of the data elements, and a third operand specifying a memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the instruction, to read one or more data elements speculatively, based on the bitmask specified by the second operand, from a memory location based on the memory address indicated by the third operand, and to store the one or more data elements in the storage location indicated by the first operand.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Inventors: Jayashankar BHARADWAJ, Nalini VASUDEVAN, Victor W. LEE, Sara S. BAGHSORKHI, Albert HARTONO, Daehyun KIM
  • Publication number: 20140181476
    Abstract: A scheduler implementing a dependency matrix having restricted entries is disclosed. A processing device of the disclosure includes a decode unit to decode an instruction and a scheduler communicably coupled to the decode unit. In one embodiment, the scheduler is configured to receive the decoded instruction, determine that the decoded instruction qualifies for allocation as a restricted reservation station (RS) entry type in a dependency matrix maintained by the scheduler, identify RS entries in the dependency matrix that are free for allocation, allocate one of the identified free RS entries with information of the decoded instruction in the dependency matrix, and update a row of the dependency matrix corresponding to the claimed RS entry with source dependency information of the decoded instruction.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Inventors: Srikanth T. Srinivasan, Matthew C. Merten, Bambang Sutanto, Rahul R. Kulkarni, Justin M. Deinlein, James D. Hadley
  • Publication number: 20140181477
    Abstract: In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of execution cycles consumed in execution of the vector instruction when at least some of the individual data elements are not to be operated on by the vector instruction. Other embodiments are described and claimed.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Inventors: Aniruddha S. Vaidya, Anahita Shayesteh, Dong Hyuk Woo, Saikat Saharoy, Mani Azimi
  • Publication number: 20140176584
    Abstract: A data processing system is used to evaluate a data processing function by executing a sequence of program instructions including an intermediate value generating instruction Inst0 and an intermediate value consuming instruction Inst1. In dependence upon one or more input operands to the evaluation, an embedded opcode within the intermediate value passed between the intermediate value generating instruction and the intermediate value consuming instruction may be set to have a value indicating that a substitute instruction should be used in place of the intermediate value consuming instruction. The instructions may be floating point instructions, such as a floating point power instruction evaluating the data processing function ab.
    Type: Application
    Filed: February 26, 2014
    Publication date: June 26, 2014
    Applicant: ARM Limited
    Inventor: Jorn NYSTAD
  • Publication number: 20140173256
    Abstract: A method of associating operation codes with instructions for execution in a processor includes the steps of assigning the operation codes to the instructions in a manner that allows a given instruction to have multiple assigned operation codes and selecting a particular one of the multiple assigned operation codes for use in executing a program containing the given instruction. The assigning step may be implemented in conjunction with design of the processor, and may further comprise the steps of determining frequency of occurrence of adjacent pairs of instructions in one or more programs likely to be run on the processor, and assigning the operation codes to the instructions based at least in part on the determined frequency of occurrence of the adjacent pairs of instructions. The selecting step may be implemented in conjunction with code generation for the program containing the given instruction, for example, in a code assembler.
    Type: Application
    Filed: February 21, 2014
    Publication date: June 19, 2014
    Applicant: LSI Corporation
    Inventors: Prasad AVSS, Jacob Matthews
  • Publication number: 20140173255
    Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.
    Type: Application
    Filed: December 18, 2012
    Publication date: June 19, 2014
    Inventors: Hariharan L. Thantry, Mani Azimi
  • Patent number: 8751823
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for obfuscating branches in computer code. A compiler or a post-compilation tool can obfuscate branches by receiving source code, and compiling the source code to yield computer-executable code. The compiler identifies branches in the computer-executable code, and determines a return address and a destination value for each branch. Then, based on the return address and the destination value for each branch, the compiler constructs a binary tree with nodes and leaf nodes, each node storing a balanced value, and each leaf node storing a destination value. The non-leaf nodes are arranged such that searching the binary tree by return address leads to a corresponding destination value. Then the compiler inserts the binary tree in the computer-executable code and replaces each branch with instructions in the computer-executable code for performing a branching operation based on the binary tree.
    Type: Grant
    Filed: August 1, 2011
    Date of Patent: June 10, 2014
    Assignee: Apple Inc.
    Inventors: Gideon M. Myles, Julien Lerouge, Jon McLachlan, Ganna Zaks, Augustin J. Farrugia
  • Publication number: 20140156972
    Abstract: In an embodiment, the present invention includes a processor having an execution logic to execute instructions and a control transfer termination (CTT) logic coupled to the execution logic. This logic is to cause a CTT fault to be raised if a target instruction of a control transfer instruction is not a CTT instruction. Other embodiments are described and claimed.
    Type: Application
    Filed: November 30, 2012
    Publication date: June 5, 2014
    Inventors: Vedyvas Shanbhogue, Jason W. Brandt, Uday R. Savagaonkar, Ravi L. Sahita
  • Patent number: 8745419
    Abstract: A processor includes a device providing a throttling power output signal. The throttling power output signal is used to determine when to logically throttle the power consumed by the processor. At least one core in the processor includes a pipeline having a decode pipe; and a logical power throttling unit coupled to the device to receive the output signal, and coupled to the decode pipe. Following the logical power throttling unit receiving the power throttling output signal satisfying a predetermined criterion, the logical power throttling unit causes the decode pipe to reduce an average number of instructions decoded per processor cycle without physically changing the processor cycle or any processor supply voltages.
    Type: Grant
    Filed: June 21, 2012
    Date of Patent: June 3, 2014
    Assignee: Oracle America, Inc.
    Inventors: Shailender Chaudhry, Quinn A. Jacobson, Marc Tremblay
  • Publication number: 20140149718
    Abstract: Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core.
    Type: Application
    Filed: November 28, 2012
    Publication date: May 29, 2014
    Inventors: Christopher J. Hughes, Changkyu Kim, Daehyun Kim, Victor W. Lee, Jong Soo Park
  • Patent number: 8707013
    Abstract: In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. The register set includes a plurality of legacy predicate registers. Separate from the legacy predicate registers, a plurality of on-demand predicate registers are selectively signaled without changing the opcode space for the DSP.
    Type: Grant
    Filed: July 13, 2010
    Date of Patent: April 22, 2014
    Assignee: Texas Instruments Incorporated
    Inventors: Jagadeesh Sankaran, Joseph R. Zbiciak, Steven D. Krueger
  • Patent number: 8700886
    Abstract: A processor configured to operate with multiple operation codes for each of a plurality of instructions comprises memory circuitry and processing circuitry coupled to the memory circuitry. The processing circuitry is configured to decode a first operation code to produce a given one of the instructions and to decode a second operation code different than the first operation code to also produce the given instruction. Thus, the same instruction is produced for execution by the processing circuitry regardless of whether the first operation code or the second operation code is decoded. The assignment of multiple operation codes to a given instruction may occur in conjunction with the design of the processor, and dynamic selection of a particular one of those operation codes may be performed in conjunction with assembly of code for execution by the processor.
    Type: Grant
    Filed: May 30, 2007
    Date of Patent: April 15, 2014
    Assignee: Agere Systems LLC
    Inventors: Prasad Avss, Jacob Mathews
  • Publication number: 20140101414
    Abstract: In response to determining an operation is a dependent operation, a mapper of a processor determines the source registers of the operation from which the dependent operation depends. The mapper translates the dependent operation to a new operation that uses as its source operands at least one of the determined source registers and a source register of the dependent operation. The new operation is independent of other pending operations and therefore can be executed without waiting for execution of other operations, thus reducing execution latency.
    Type: Application
    Filed: October 9, 2012
    Publication date: April 10, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Kevin A. Hurd
  • Publication number: 20140095831
    Abstract: An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Edward T. Grochowski, Dennis R. Bradford, George Z. Chrysos, Andrew T. Forsyth, Michael D. Upton, Lisa K. Wu
  • Publication number: 20140095834
    Abstract: Instructions and logic provide extended vector suffix comparisons for Boyer-Moore searches. Some embodiments, responsive to an instruction specifying: a pattern source operand and a target source operand, compare each of m data elements of the pattern operand with each data element of the target operand. A first and second equal ordered aggregation operation are performed from the comparisons according to the m data elements of the pattern source operand. A result of the first and second aggregation operations indicating whether or not a possible match exists between the m data elements of the pattern source operand and d data element positions relative to data elements of the target source operand is stored. Ordering of the data elements of the pattern and the target operands may be reversed for the second aggregation operation, and d may be a sum of m?1 and the quantity of target operand elements in some embodiments.
    Type: Application
    Filed: September 30, 2012
    Publication date: April 3, 2014
    Inventor: Shih J. Kuo
  • Publication number: 20140089638
    Abstract: Various techniques for processing instructions that specify multiple destinations. A first portion of a processor pipeline is configured to split a multi-destination instruction into a plurality of single-destination operations. A second portion of the pipeline is configured to process the plurality of single-destination operations. A third portion of the pipeline is configured to merge the plurality of single-destination operations into one or more multi-destination operations. The one or more multi-destination operations may be performed. The first portion of the pipeline may include a decode unit. The second portion of the pipeline may include a map unit, which may in turn include circuitry configured to maintain a list of free architectural registers and a mapping table that maps physical registers to architectural registers. The third portion of the pipeline may comprise a dispatch unit. In some embodiments, this may provide certain advantages such as reduced area and/or power consumption.
    Type: Application
    Filed: September 26, 2012
    Publication date: March 27, 2014
    Applicant: APPLE INC.
    Inventors: John H. Mylius, Gerard R. Williams III, James B. Keller, Fang Liu, Shyam Sundar
  • Publication number: 20140082328
    Abstract: According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.
    Type: Application
    Filed: September 14, 2012
    Publication date: March 20, 2014
    Applicant: INTEL CORPORATION
    Inventors: Vinodh Gopal, Erdinc Ozturk, James D. Guilford, Gilbert M. Wolrich
  • Publication number: 20140082329
    Abstract: Trustworthy systems require that code be validated as genuine. Most systems implement this requirement prior to execution by matching a cryptographic hash of the binary file against a reference hash value, leaving the code vulnerable to run time compromises, such as code injection, return and jump-oriented programming, and illegal linking of the code to compromised library functions. The Run-time Execution Validator (REV) validates, as the program executes, the control flow path and instructions executed along the control flow path. REV uses a signature cache integrated into the processor pipeline to perform live validation of executions, at basic block boundaries, and ensures that changes to the program state are not made by the instructions within a basic block until the control flow path into the basic block and the instructions within the basic block are both validated.
    Type: Application
    Filed: September 16, 2013
    Publication date: March 20, 2014
    Applicant: The Research Foundation of State University of New York
    Inventor: Kanad Ghose
  • Publication number: 20140068229
    Abstract: Coding circuitry comprises at least an encoder configured to encode an instruction address for transmission to a decoder. The encoder is operative to identify the instruction address as belonging to a particular one of a plurality of groups of instruction addresses associated with respective distinct program constructs, and to encode the instruction address based on the identified group. The decoder is operative to identify the encoded instruction address as belonging to the particular one of a plurality of groups of instruction addresses associated with respective distinct program constructs, and to decode the encoded instruction address based on the identified group. The coding circuitry may be implemented as part of an integrated circuit or other processing device that includes associated processor and memory elements. In such an arrangement, the processor may generate the instruction address for delivery over a bus to the memory.
    Type: Application
    Filed: August 28, 2012
    Publication date: March 6, 2014
    Applicant: LSI Corporation
    Inventors: Prakash Krishnamoorthy, Ramesh C. Tekumalla, Parag Madhani
  • Publication number: 20140059326
    Abstract: A calculation-processing-device includes: a decoder unit including, a first-counter to increment a first-count-value and to decrement the-first-count-value, and a second-counter configured to increment a second-count-value and to decrement the second-count-value; a first-instruction-executing-unit to execute an instruction of the first-class; a second-instruction-executing-unit to execute an instruction of the-second class; a first-instruction holding unit including a plurality of first-entries, to input the instruction of the first-class held in one of the plurality of first-entries into the first-instruction-executing-unit; a second-instruction-holding-unit including a plurality of second-entries, to input the instruction of the second-class held in one of the plurality of second-entries into the second-instruction-executing-unit; and first-control-unit to output the second-release-notification, and to change the output timing of the second-release-notification when a predetermined relationship is establish
    Type: Application
    Filed: June 19, 2013
    Publication date: February 27, 2014
    Inventors: Sota SAKASHITA, Yasunobu Akizuki, Toshio Yoshida
  • Publication number: 20140052963
    Abstract: A technique to perform three-source instructions. At least one embodiment of the invention relates to converting a three-source instruction into at least two instructions identifying no more than two source values.
    Type: Application
    Filed: October 25, 2013
    Publication date: February 20, 2014
    Inventors: Avinash Sodani, Stephan Jourdan, Alexandre Farcy, Per Hammarlund
  • Publication number: 20140052962
    Abstract: A processing system includes a microprocessor, a hardware decoder arranged within the microprocessor, and a translator operatively coupled to the microprocessor. The hardware decoder is configured to decode instruction code non-native to the microprocessor for execution in the microprocessor. The translator is configured to form a translation of the instruction code in an instruction set native to the microprocessor and to connect a branch instruction in the translation to a chaining stub. The chaining stub is configured to selectively cause additional instruction code at a target address of the branch instruction to be received in the hardware decoder without causing the processing system to search for a translation of additional instruction code at the target address.
    Type: Application
    Filed: August 15, 2012
    Publication date: February 20, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ben Hertzberg, Nathan Tuck
  • Patent number: 8656376
    Abstract: A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.
    Type: Grant
    Filed: September 1, 2011
    Date of Patent: February 18, 2014
    Assignee: National Tsing Hua University
    Inventors: Jenq Kuen Lee, Chi Bang Kuan
  • Patent number: 8650386
    Abstract: A data processor includes a first register file including registers, a second register file including registers, a number of which is larger than that of the registers of the first register file, an instruction decoder and an operation unit. The instruction decoder decodes an instruction described in first and second instruction formats. The first instruction format includes a first register-addressing field for designating the first register file. The second instruction format includes a second register-addressing field for designating the second register file, a size of which is larger than that of the first register-addressing field. The operation unit executes an instruction described in the first and second instruction formats using operand data stored in the first and second register files, respectively, based on the instruction decoder, and executes operations in parallel, a number of which is determined by a certain field included in the second instruction format.
    Type: Grant
    Filed: April 12, 2013
    Date of Patent: February 11, 2014
    Assignee: Panasonic Corporation
    Inventors: Takeshi Kishida, Masaitsu Nakajima
  • Publication number: 20140040597
    Abstract: Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
    Type: Application
    Filed: August 8, 2012
    Publication date: February 6, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
  • Publication number: 20140040598
    Abstract: Embodiments relate to vector processing in an active memory device. An aspect includes a system for vector processing in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Execution of the sub-instructions is repeated in parallel for multiple iterations, by the processing element, based on the iteration count. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.
    Type: Application
    Filed: August 8, 2012
    Publication date: February 6, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair, Daniel A. Prener
  • Publication number: 20140040601
    Abstract: Embodiments relate to vector processor predication in an active memory device. An aspect includes a method for vector processor predication in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
    Type: Application
    Filed: August 3, 2012
    Publication date: February 6, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
  • Publication number: 20140040602
    Abstract: A storage method, a memory and a storage system that have an accumulated write feature are provided in which the OR and AND operation are shifted from CPU/ALU (controller) to the memory, and the frequency for switching data transmission lines between read and write instructions can be reduced. In the memory, the interface unit includes a write arithmetic instruction interface, a write instruction interface, and an address instruction interface; the instruction/address decoder is configured to decode a write arithmetic instruction, a write instruction and an address instruction; and the pFET has a higher driving capability than the data switches, and the nFET has a lower driving capability than the data switches. The storage method, memory and storage system can reduce work load of CPU/ALU, and enable continuous data writing to the memory.
    Type: Application
    Filed: December 30, 2011
    Publication date: February 6, 2014
    Applicant: Xi'an Sinochip Semiconductors Co., Ltd.
    Inventor: Hoffmann Jochen
  • Publication number: 20140040595
    Abstract: A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing free list and architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iiii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration.
    Type: Application
    Filed: August 1, 2012
    Publication date: February 6, 2014
    Applicant: FREESCALE SEMICONDUCTOR, INC.
    Inventor: Thang M. Tran
  • Patent number: 8639945
    Abstract: A microprocessor includes a storage element that stores decryption key data and a fetch unit that fetches and decrypts program instructions using a value of the decryption key data stored in the storage element. The fetch unit fetches an instance of a branch and switch key instruction and decrypts it using a first value of the decryption key data stored in the storage element. If the branch is taken, the microprocessor loads the storage element with a second value of the decryption key data for subsequent use by the fetch unit to decrypt an instruction fetched at a target address specified by the branch and switch key instruction. If the branch is not taken, the microprocessor retains the first value of the decryption key data in the storage element for subsequent use by the fetch unit to decrypt an instruction sequentially following the branch and switch key instruction.
    Type: Grant
    Filed: April 21, 2011
    Date of Patent: January 28, 2014
    Assignee: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Terry Parks, Brent Bean, Thomas A. Crispin
  • Publication number: 20140025933
    Abstract: A method for reducing a number of operations replayed in a processor includes decoding an operation to determine a memory address and a command in the operation. If data is not in a way predictor based on the memory address, a suppress wakeup signal is sent to an operation scheduler, and the operation scheduler suppresses waking up other operations that are dependent on the data.
    Type: Application
    Filed: July 17, 2012
    Publication date: January 23, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Ganesh Venkataramanan, Mike Butler, Krishnan V. Ramani
  • Patent number: 8635621
    Abstract: The invention relates to a method and apparatus for execution scheduling of a program thread of an application program and executing the scheduled program thread on a data processing system. The method includes: providing an application program thread priority to a thread execution scheduler; selecting for execution the program thread from a plurality of program threads inserted into the thread execution queue, wherein the program thread is selected for execution using a round-robin selection scheme, and wherein the round-robin selection scheme selects the program thread based on an execution priority associated with the program thread bit; placing the program thread in a data processing execution queue within the data processing system; and removing the program thread from the thread execution queue after a successful execution of the program thread by the data processing system.
    Type: Grant
    Filed: August 22, 2008
    Date of Patent: January 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: David Stephen Levitan, Jeffrey Richard Summers
  • Publication number: 20140019723
    Abstract: An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code for execution on the ASMP is analyzed and a determination is made as to whether to allow the program code, or a code segment thereof to execute on a first core natively or to use binary translation on the code and execute the translated code on a second core which consumes less power than the first core during execution.
    Type: Application
    Filed: December 28, 2011
    Publication date: January 16, 2014
    Inventors: Koichi Yamada, Ronny Ronen, Wei Li, Boris Ginzburg, Gadi Haber, Konstantin Levit-Gurevich, Esfir Natanzon, Alon Naveh, Eliezer Weissmann, Michael Mishaeli
  • Publication number: 20140013084
    Abstract: The disclosure provides a device for implementing address buffer management of a processor, including: an assembler configured to perform operations to obtain intermediate values when the assembler encodes a set instruction for an address automatic-increment value and boundary values, and to encapsulate the intermediate values into the set instruction for the address automatic-increment value and boundary values; and a processor configured to determine, according to the intermediate values, whether to perform the address automatic-increment operation or the address automatic-decrement operation, so as to achieve the address buffer management.
    Type: Application
    Filed: August 24, 2011
    Publication date: January 9, 2014
    Applicant: ZTE Corporation
    Inventors: Lihuang Li, Chunyu Tian, Hui Ren
  • Publication number: 20140013083
    Abstract: A cache coprocessing unit in a computing system includes a cache array to store data, a hardware decode unit to decode instructions that are offloaded from being executed by an execution cluster of the computing system to reduce load and store operations between the execution cluster and the cache coprocessing unit, and a set of one or more operation units to perform operations on the cache array according to the decoded instructions.
    Type: Application
    Filed: December 30, 2011
    Publication date: January 9, 2014
    Inventor: Ashish Jha
  • Publication number: 20130346728
    Abstract: In one embodiment, the present invention includes an instruction decoder that can receive an incoming instruction and a path select signal and decode the incoming instruction into a first instruction code or a second instruction code responsive to the path select signal. The two different instruction codes, both representing the same incoming instruction may be used by an execution unit to perform an operation optimized for different data lengths. Other embodiments are described and claimed.
    Type: Application
    Filed: August 28, 2013
    Publication date: December 26, 2013
    Inventors: Ohad Falik, Lihu Rappoport, Ron Gabor, Yulia Kurolap, Michael Mishaeli
  • Publication number: 20130339668
    Abstract: Embodiments of systems, apparatuses, and methods for performing delta decoding on packed data elements of a source and storing the results in packed data elements of a destination using a single vector packed delta decode instruction are described.
    Type: Application
    Filed: December 28, 2011
    Publication date: December 19, 2013
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Tracy Garrett Drysdale
  • Publication number: 20130339667
    Abstract: A method of changing a value of associated with a logical address in a computing device. The method includes: receiving an instruction at an instruction decoder, the instruction including a target register expressed as a logical value; determining at an instruction decoder that a result of the instruction is to set the target register to a constant value, the target register being in a physical register file associated with an execution unit; and mapping, in a register mapper, the logical address to a location represented by a special register tag.
    Type: Application
    Filed: March 13, 2013
    Publication date: December 19, 2013
    Applicant: International Business Machines Corporation
    Inventors: Gregory W. Alexander, Brian D. Barrick, Fadi Y. Busaba, Chung-Lum K. Shum
  • Publication number: 20130339666
    Abstract: A method of changing a value of associated with a logical address in a computing device. The method includes: receiving an instruction at an instruction decoder, the instruction including a target register expressed as a logical value; determining at an instruction decoder that a result of the instruction is to set the target register to a constant value, the target register being in a physical register file associated with an execution unit; and mapping, in a register mapper, the logical address to a location represented by a special register tag.
    Type: Application
    Filed: June 15, 2012
    Publication date: December 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gregory W. Alexander, Brian D. Barrick, Fadi Y. Busaba, Bruce C. Giamei, Edward T. Malley, Chung-Lung K. Shum
  • Patent number: 8607209
    Abstract: A processor framework includes a compiler to add control information to an instruction sequence at compile time. The control information is added in the instruction sequence prior to a control-flow changing instruction. Microarchitecture is configured to use the control information at runtime to predict an outcome of the control-flow changing instruction prior to fetching the control-flow changing instruction.
    Type: Grant
    Filed: January 18, 2005
    Date of Patent: December 10, 2013
    Assignee: BlueRISC Inc.
    Inventors: Saurabh Chheda, Kristopher Carver, Raksit Ashok
  • Publication number: 20130326195
    Abstract: Preventing execution of parity-error-induced unpredictable instructions, and related processor systems, methods, and computer-readable media are disclosed. In this regard, a method for processing instructions in a central processing unit (CPU) is provided. The method comprises decoding an instruction comprising a plurality of bits, and generating a parity error indicator indicating whether a parity error exists in the plurality of bits prior to execution of the instruction. If the parity error indicator indicates that the parity error exists in the plurality of bits, one or more of the plurality of bits are modified to indicate a no execution operation (NOP), without effecting a roll back of a program counter of the CPU and without re-decoding the instruction. In this manner, the possibility of the parity error causing an inadvertent execution of an unpredictable instruction is reduced.
    Type: Application
    Filed: March 7, 2013
    Publication date: December 5, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Michael Scott McIlvaine, James Norris Dieffenderfer, Brian Michael Stempel, Leslie Mark DeBruyne, Melinda J. Brown
  • Publication number: 20130326194
    Abstract: Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register.
    Type: Application
    Filed: November 21, 2012
    Publication date: December 5, 2013
    Inventor: Gopalan Ramanujam
  • Publication number: 20130326196
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed unary value decoding using masks in response to a single vector packed unary decoding using masks instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described.
    Type: Application
    Filed: December 23, 2011
    Publication date: December 5, 2013
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm
  • Publication number: 20130326192
    Abstract: Embodiments of systems, apparatuses, and methods for performing a mask broadcast instruction in a computer processor are described. In some embodiments, the execution of a mask broadcast instruction causes a broadcast of a data element of the source operand to a destination register of the destination operand according to the broadcast size.
    Type: Application
    Filed: December 22, 2011
    Publication date: December 5, 2013
    Inventors: Elmoustapha Ould-Ahmed-Vall, Milind Baburao Girkar, Robert C. Valentine, Suleyman Sair, Jesus Corbal San Adrian
  • Publication number: 20130311753
    Abstract: This invention constitutes a method and apparatus for enabling parallel computations of intermediate operations which are generic in many algorithms in given applications and also contain most of the computationally intensive operations. The method includes designing a set of intermediate level functions suitable for predefined application, obtaining instructions corresponding to intermediate level operations from a processor, computing the addresses of the operands and the results, performing computations involved in multiple intermediate level operations. In an exemplary embodiment the apparatus consists of a local data address generator that computes the addresses of a plurality of operands and results, a programmable computational unit that performs parallels computations of the intermediate level operations and a local memory interface that is interfaced to local memory organized in multiple blocks.
    Type: Application
    Filed: August 28, 2012
    Publication date: November 21, 2013
    Inventor: VENU KANDADAI