Patents Examined by Andrew J Cromer
  • Patent number: 10552165
    Abstract: Within a processor, speculative finishes of load instructions only are tracked in a speculative finish table by maintaining an oldest load instruction of a thread in the speculative finish table after data is loaded for the oldest load instruction, wherein a particular queue index tag assigned to the oldest load instruction by an execution unit points to a particular entry in the speculative finish table, wherein the oldest load instruction is waiting to be finished dependent upon an error check code result. Responsive to a flow unit receiving the particular queue index tag with an indicator that the error check code result for data retrieved for the oldest load instruction is good, finishing the oldest load instruction in the particular entry pointed to by the queue index tag and writing an instruction tag stored in the entry for the oldest load instruction out of the speculative finish table for completion.
    Type: Grant
    Filed: October 19, 2015
    Date of Patent: February 4, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Susan E. Eisen, David A. Hrusecky, Christopher M. Mueller, Dung Q. Nguyen, A. James Van Norstrand, Jr., Kenneth L. Ward
  • Patent number: 10545763
    Abstract: Detecting data dependencies of instructions associated with threads in a simultaneous multithreading (SMT) scheme is disclosed, including: dividing a plurality of comparators of an SMT-enabled device into groups of comparators corresponding to respective ones of threads associated with the SMT-enabled device; simultaneously distributing a first set of instructions associated with a first thread of the plurality of threads to a corresponding first group of comparators from the plurality of groups of comparators and distributing a second set of instructions associated with a second thread of the plurality of threads to a corresponding second group of comparators from the plurality of groups of comparators; and simultaneously performing data dependency detection on the first set of instructions associated with the first thread using the corresponding first group of comparators and performing data dependency detection on the second set of instructions associated with the second thread using the corresponding second group of comparators.
    Type: Grant
    Filed: May 6, 2015
    Date of Patent: January 28, 2020
    Assignee: Alibaba Group Holding Limited
    Inventors: Ling Ma, Sihai Yao, Lei Zhang
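
Illustrative sketch (not part of the patent record): the per-thread dependency detection described in the abstract above can be modeled in software roughly as follows. The instruction format `(dest, sources)`, the function names, and the choice to report every earlier writer of a register are assumptions made for illustration. Each register comparison stands in for one hardware comparator, and the threads are processed one after another here rather than simultaneously.

```python
def raw_dependencies(instructions):
    """Return (producer, consumer) index pairs within one thread.

    `instructions` is a list of (dest, sources) tuples; this format is
    hypothetical. A pair (i, j) means instruction j reads a register
    that the earlier instruction i writes.
    """
    deps = []
    for j, (_, sources) in enumerate(instructions):
        for i in range(j):
            dest_i = instructions[i][0]
            # One comparison here corresponds to one comparator in hardware.
            if dest_i is not None and dest_i in sources:
                deps.append((i, j))
    return deps


def detect_per_thread(threads):
    """Run dependency detection separately for each thread's instruction set.

    Register names are never compared across threads, mirroring the idea
    of one comparator group per thread.
    """
    return {tid: raw_dependencies(instrs) for tid, instrs in threads.items()}


threads = {
    0: [("r1", ("r2", "r3")), ("r4", ("r1", "r5"))],
    1: [("r1", ("r2",)), ("r6", ("r7",))],
}
result = detect_per_thread(threads)
```

In this example thread 0 has one dependency (its second instruction reads `r1`), while thread 1 has none, even though it also writes `r1`.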
  • Patent number: 10534606
    Abstract: Approaches are described to improve database performance by implementing a RLE decompression function at a low level within a general-purpose processor or an external block. Specifically, embodiments of a hardware implementation of an instruction for RLE decompression are disclosed. The described approaches improve performance by supporting the RLE decompression function within a processor and/or external block. Specifically, a RLE decompression hardware implementation is disclosed that produces a 64-bit RLE decompression result, with an example embodiment performing the task in two pipelined execution stages with a throughput of one per cycle. According to embodiments, hardware organization of narrow-width shifters operating in parallel, controlled by computed shift counts, is used to perform the decompression.
    Type: Grant
    Filed: September 28, 2015
    Date of Patent: January 14, 2020
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Jeffrey S. Brooks, Robert Golla, Albert Danysh, Shasank Chavan, Prateek Agrawal, Andrew Ewoldt, David Weaver
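
Illustrative sketch (not part of the patent record): a software analogue of producing a 64-bit RLE decompression result with shifts. The input format (a list of `(bit, length)` runs), the least-significant-bit-first packing, and the function name are assumptions; the abstract does not specify the encoding, and the parallel narrow-width shifter organization is not modeled.

```python
def rle_decompress_bits(runs, width=64):
    """Expand (bit, length) runs into a single integer of `width` bits.

    Runs are packed starting at bit 0. The run format and bit order are
    hypothetical choices for this sketch.
    """
    result = 0
    pos = 0  # running shift count: where the next run starts
    for bit, length in runs:
        if bit not in (0, 1) or length < 0:
            raise ValueError("each run must be (0 or 1, non-negative length)")
        if pos + length > width:
            raise ValueError("runs exceed the output width")
        if bit:
            # Build a block of `length` ones and shift it into place.
            result |= ((1 << length) - 1) << pos
        pos += length
    return result


value = rle_decompress_bits([(1, 3), (0, 2), (1, 1)])
```

Here three ones, two zeros, and a single one expand to the bit pattern `0b100111`.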
  • Patent number: 10514911
    Abstract: Examples of techniques for designing processors are described herein. In one example, a design structure can be tangibly embodied in a machine readable medium for designing, manufacturing, or testing an integrated circuit. The design structure can include a logic to determine whether a received instruction is an updating fixed point instruction or a non-updating fixed point instruction. The design structure can include a first arithmetic logic unit (ALU) to execute the received instruction if the received instruction is determined to be an updating fixed point instruction and store an update value in a general register. The design structure can include a second arithmetic logic unit (ALU) to execute the received instruction if the received instruction is determined to be a non-updating fixed point instruction.
    Type: Grant
    Filed: November 26, 2014
    Date of Patent: December 24, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Avraham Ayzenfeld, Lee E. Eisen, Brian W. Curran, Christian Jacobi
  • Patent number: 10514925
    Abstract: Systems, apparatuses, and methods for managing dependencies between instruction operations when speculatively issuing load instruction operations. A processor may maintain dependency vectors for sources of instruction operations dispatched to the scheduler. The dependency vector may include a column for each cycle of the load recovery window and a row for each load execution pipeline. When a load speculatively issues, any instruction operation which is dependent on the load may have a bit set in the earliest bit position of its dependency vector to indicate the dependency. The bit may shift in the dependency vector toward the cancel bit position during each clock cycle as the load executes. If the load does not produce its data at the expected latency, an instruction operation may be canceled if there is a bit in the cancel bit position of the dependency vector row corresponding to the execution pipeline of the load.
    Type: Grant
    Filed: January 28, 2016
    Date of Patent: December 24, 2019
    Assignee: Apple Inc.
    Inventor: Sean M. Reynolds
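
Illustrative sketch (not part of the patent record): the shifting dependency vector described in the abstract above, modeled as one bitmask row per load execution pipeline. The class and method names, the bit layout (bit 0 as the earliest position, the highest bit as the cancel position), and the point at which the cancel check is made are assumptions.

```python
class DependencyVector:
    """One row per load pipeline, one bit per cycle of the recovery window."""

    def __init__(self, num_pipes, window):
        self.window = window
        self.rows = [0] * num_pipes

    def mark_dependency(self, pipe):
        # A dependent operation sets the earliest bit when the load issues.
        self.rows[pipe] |= 1

    def tick(self):
        # Each clock cycle the bits move one position toward the cancel bit;
        # bits shifted past the cancel position fall out of the window.
        mask = (1 << self.window) - 1
        self.rows = [(row << 1) & mask for row in self.rows]

    def must_cancel(self, pipe):
        # Called when the load on `pipe` misses its expected latency.
        return bool(self.rows[pipe] & (1 << (self.window - 1)))


dv = DependencyVector(num_pipes=2, window=3)
dv.mark_dependency(1)
dv.tick()
dv.tick()
cancel_pipe1 = dv.must_cancel(1)
cancel_pipe0 = dv.must_cancel(0)
```

After two cycles the bit for pipeline 1 sits in the cancel position, so a late load on pipeline 1 would cancel the dependent operation, while a late load on pipeline 0 would not.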
  • Patent number: 10503503
    Abstract: A method in a computer-aided design system for generating a functional design model of a processor, is described herein. The method comprises generating a functional representation of logic to determine whether an instruction is an updating instruction or a non-updating instruction. The method further comprises generating a functional representation of a first arithmetic logic unit (ALU) coupled to a general register in the processor, the first ALU to execute the instruction if the instruction is an updating instruction and store an update value in the general register, and generating a functional representation of a second ALU in the processor to execute the instruction if the instruction is a non-updating instruction.
    Type: Grant
    Filed: September 25, 2015
    Date of Patent: December 10, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Avraham Ayzenfeld, Lee E. Eisen, Brian W. Curran, Christian Jacobi
  • Patent number: 10503506
    Abstract: A mechanism is provided for improving performance when executing unaligned load instructions which load an unaligned block of data from a data store. In a first unaligned load handling mode, a final load operation of a series of load operations performed for the instruction loads a full data word extending beyond the end of the unaligned block of data to be loaded by that instruction. If an initial portion of the unaligned block of data to be loaded by a subsequent unaligned load instruction corresponds to the excess part in the stream buffer for the earlier instruction, then an initial load operation for the subsequent instruction can be suppressed. A mechanism is also described for allowing series of dependent data access operations triggered by a given instruction to be halted partway through when a stall condition arises, and resumed partway through later, by defining overlapping sequences of transactions.
    Type: Grant
    Filed: October 19, 2015
    Date of Patent: December 10, 2019
    Assignee: ARM Limited
    Inventor: Max John Batley
  • Patent number: 10459723
    Abstract: Systems and methods relate to performing data movement operations using single instruction multiple data (SIMD) instructions. A first SIMD instruction comprises a first input data vector having a number N of two or more data elements in corresponding N SIMD lanes and a control vector having N control elements in the corresponding N SIMD lanes. A first multi-stage cube network is controllable by the first SIMD instruction, and includes movement elements, with one movement element per SIMD lane, per stage. A movement element selects between one of two data elements based on a corresponding control element and moves the data elements across the stages of the first multi-stage cube network by a zero distance or power-of-two distance between adjacent stages to generate a first output data vector. A second multi-stage cube network can be used in conjunction to generate all possible data movement operations of the input data vector.
    Type: Grant
    Filed: July 20, 2015
    Date of Patent: October 29, 2019
    Assignee: QUALCOMM Incorporated
    Inventor: Eric Wayne Mahurin
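
Illustrative sketch (not part of the patent record): one way to model a multi-stage cube network in software. The assumptions are that stage `s` pairs lane `i` with lane `i XOR 2**s`, that bit `s` of a lane's control element selects between keeping its own element (zero distance) and taking the partner's element (power-of-two distance), and that the number of lanes is a power of two. The function name is hypothetical, and only a single network is modeled.

```python
def cube_network(data, control):
    """Move elements through log2(N) stages of per-lane two-way selection."""
    n = len(data)
    if n < 2 or n & (n - 1):
        raise ValueError("number of lanes must be a power of two >= 2")
    if len(control) != n:
        raise ValueError("one control element per lane is required")
    stages = n.bit_length() - 1
    current = list(data)
    for s in range(stages):
        distance = 1 << s
        # Each lane selects its own element or the element `distance` away.
        current = [
            current[i ^ distance] if (control[i] >> s) & 1 else current[i]
            for i in range(n)
        ]
    return current


reversed_lanes = cube_network([10, 20, 30, 40], [3, 3, 3, 3])
swapped_pair = cube_network([10, 20, 30, 40], [1, 1, 0, 0])
```

With every control bit set, the four lanes are reversed; with only the first two lanes selecting at stage 0, just those two elements swap.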
  • Patent number: 10445094
    Abstract: A data processing apparatus includes a multi-level memory system, one or more first processing units coupled to the memory system at a first level and one or more second processing units each coupled to the memory system at a second level. A first reorder buffer maintains data order during execution of instructions by the first and second processing units and a second reorder buffer maintains data order during execution of the instructions by an associated second processing unit. An entry in the first reorder buffer is configured, dependent upon an indicator bit, as an entry for a single instruction or a pointer to an entry in the second reorder buffer. An entry in the second reorder buffer includes instruction block start and end addresses and indicators of input and output registers. Instructions are released to a processing unit when all inputs, as indicated by the reorder buffers, are available.
    Type: Grant
    Filed: May 27, 2016
    Date of Patent: October 15, 2019
    Assignee: Arm Limited
    Inventors: Jonathan Curtis Beard, Wendy Elsasser, Shibo Wang
  • Patent number: 10430191
    Abstract: Methods, apparatus, systems, and articles of manufacture to compile instructions for a vector of instruction pointers (VIP) processor architecture are disclosed. An example method includes identifying a strand including a fork instruction introducing a first speculative assumption. A basing instruction is inserted to initialize a basing value of the strand before execution of a first instruction under the first speculative assumption. A determination of whether a second instruction under a second speculative assumption modifies a first memory address that is also modified by the first instruction under the first speculative assumption is made. The second instruction is not modified when the second instruction does not modify the first memory address. The second instruction is modified based on the basing value when the second instruction modifies the first memory address, the basing value to cause the second instruction to modify a second memory address different from the first memory address.
    Type: Grant
    Filed: July 20, 2015
    Date of Patent: October 1, 2019
    Assignee: Intel Corporation
    Inventors: Yevgeniy M. Astigeyevich, Dmitry M. Maslennikov, Sergey P. Scherbinin, Marat Zakirov, Pavel G. Matveyev, Andrey Rodchenko, Andrey Chudnovets, Boris V. Shurygin
  • Patent number: 10423423
    Abstract: Within a processor, speculative finishes of load instructions only are tracked in a speculative finish table by maintaining an oldest load instruction of a thread in the speculative finish table after data is loaded for the oldest load instruction, wherein a particular queue index tag assigned to the oldest load instruction by an execution unit points to a particular entry in the speculative finish table, wherein the oldest load instruction is waiting to be finished dependent upon an error check code result. Responsive to a flow unit receiving the particular queue index tag with an indicator that the error check code result for data retrieved for the oldest load instruction is good, finishing the oldest load instruction in the particular entry pointed to by the queue index tag and writing an instruction tag stored in the entry for the oldest load instruction out of the speculative finish table for completion.
    Type: Grant
    Filed: September 29, 2015
    Date of Patent: September 24, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Susan E. Eisen, David A. Hrusecky, Christopher M. Mueller, Dung Q. Nguyen, A. James Van Norstrand, Jr., Kenneth L. Ward
  • Patent number: 10402199
    Abstract: One embodiment of this invention provides two conditional execution auxiliary instructions directed to disparate subsets of the plural functional units. Depending on the conditional execution desired, only one of the two conditional execution auxiliary instructions may be required for a particular execute packet. Another embodiment of this invention employs only one of two possible register files for the condition registers. In a VLIW processor it may be advantageous to split the functional units into separate sets with corresponding register files. This limits the number of functional units that may simultaneously access the register files. In the preferred embodiment of this invention the functional units are divided into a scalar set which accesses scalar registers and a vector set which accesses vector registers. The data registers storing the conditions for both scalar and vector instructions are in the scalar data register file.
    Type: Grant
    Filed: October 22, 2015
    Date of Patent: September 3, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Timothy David Anderson, Duc Quang Bui, Joseph Raymond Michael Zbiciak
  • Patent number: 10394568
    Abstract: Managing exception handling. A plurality of instruction units of an instruction stream are selected to be decoded in parallel by a plurality of instruction decode units of a processor. The plurality of instruction units includes a prefix instruction and a prefixed instruction. The prefixed instruction is an instruction to be modified by the prefix instruction. An exception condition associated with the prefixed instruction is determined. Exception handling is performed for the prefixed instruction, in which the performing includes determining an address at which to restart execution of the instruction stream. The determining the address includes adjusting the address at which to restart execution based on the prefix instruction to be separately decoded by an instruction decode unit.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: August 27, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 10394569
    Abstract: Managing exception handling. A plurality of instruction units of an instruction stream are selected to be decoded in parallel by a plurality of instruction decode units of a processor. The plurality of instruction units includes a prefix instruction and a prefixed instruction. The prefixed instruction is an instruction to be modified by the prefix instruction. An exception condition associated with the prefixed instruction is determined. Exception handling is performed for the prefixed instruction, in which the performing includes determining an address at which to restart execution of the instruction stream. The determining the address includes adjusting the address at which to restart execution based on the prefix instruction to be separately decoded by an instruction decode unit.
    Type: Grant
    Filed: November 14, 2015
    Date of Patent: August 27, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 10387159
    Abstract: Methods and apparatuses relate to emulating architectural performance monitoring in a binary translation system. In one embodiment, a processor includes an architectural performance counter to maintain an architectural value associated with instruction execution, a register to store the architectural value of the architectural performance counter, binary translation logic to embed an architectural value from the architectural performance counter into a stream of translated instructions having a transactional code region and to store the architectural value into the register, and an execution unit to execute the transactional code region of the stream of translated instructions. The binary translation logic is configured to add the architectural value from the register to the architectural performance counter upon completion of the transactional code region of the stream of translated instructions.
    Type: Grant
    Filed: February 4, 2015
    Date of Patent: August 20, 2019
    Assignee: Intel Corporation
    Inventors: Jason M Agron, Polychronis Xekalakis, Paul Caprioli, Jiwei Oliver Lu, Koichi Yamada
  • Patent number: 10360153
    Abstract: Embodiments relate to a system operation queue for a transaction. An aspect includes determining whether a system operation is part of an in-progress transaction of a central processing unit (CPU). Another aspect includes based on determining that the system operation is part of the in-progress transaction, storing the system operation in a system operation queue corresponding to the in-progress transaction. Yet another aspect includes, based on the in-progress transaction ending, processing the system operation in the system operation queue.
    Type: Grant
    Filed: September 4, 2015
    Date of Patent: July 23, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz
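
Illustrative sketch (not part of the patent record): the queue-then-process behavior described in the abstract above, in simplified form. The class and method names are hypothetical, and the sketch assumes that any system operation arriving while a transaction is open belongs to that transaction, which is a simplification of the determination step in the abstract.

```python
from collections import deque


class TransactionalCPU:
    """Defers system operations issued during a transaction until it ends."""

    def __init__(self):
        self.in_transaction = False
        self.pending = deque()   # system operation queue for the transaction
        self.processed = []      # record of operations actually carried out

    def begin_transaction(self):
        self.in_transaction = True

    def system_operation(self, op):
        if self.in_transaction:
            # Part of the in-progress transaction: store it for later.
            self.pending.append(op)
        else:
            self.processed.append(op)

    def end_transaction(self):
        self.in_transaction = False
        # Process the queued operations in arrival order.
        while self.pending:
            self.processed.append(self.pending.popleft())


cpu = TransactionalCPU()
cpu.system_operation("op_a")
cpu.begin_transaction()
cpu.system_operation("op_b")
cpu.system_operation("op_c")
processed_during = list(cpu.processed)
cpu.end_transaction()
processed_after = list(cpu.processed)
```

While the transaction is open only `op_a` has been processed; `op_b` and `op_c` are processed, in order, once it ends.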
  • Patent number: 10346170
    Abstract: In one embodiment, a processor includes logic, responsive to a first instruction, to perform an operation on a first source operand and a second source operand associated with the first instruction and write a result of the operation to a destination location comprising a third source operand. The write may be a partial write of the destination location to maintain an unmodified portion of the third source operand. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 5, 2015
    Date of Patent: July 9, 2019
    Assignee: Intel Corporation
    Inventors: Jayesh Iyer, Jamison D. Collins, Sebastian Winkel
  • Patent number: 10318289
    Abstract: A compute instruction to be executed is to use a memory operand in a computation. An address associated with the memory operand is to be used to locate a portion of memory from which data is to be obtained and placed in the memory operand. A determination is made as to whether the portion of memory extends across a specified memory boundary. Based on the portion of memory extending across the specified memory boundary, the portion of memory includes a plurality of memory units and a check is made as to whether at least one specified memory unit is accessible and whether at least one specified memory unit is inaccessible.
    Type: Grant
    Filed: November 14, 2015
    Date of Patent: June 11, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Brett Olsson
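
Illustrative sketch (not part of the patent record): the boundary determination mentioned in the abstract above reduces to comparing which boundary-sized unit holds the first and last bytes of the operand. The function name and the 4096-byte default boundary are assumptions; the abstract refers only to "a specified memory boundary".

```python
def crosses_boundary(address, length, boundary=4096):
    """Return True if [address, address + length) spans a boundary.

    The range crosses a boundary when its first and last bytes fall in
    different boundary-sized units.
    """
    if length <= 0 or boundary <= 0:
        raise ValueError("length and boundary must be positive")
    first_unit = address // boundary
    last_unit = (address + length - 1) // boundary
    return first_unit != last_unit


spans = crosses_boundary(4090, 8)      # bytes 4090..4097
contained = crosses_boundary(4088, 8)  # bytes 4088..4095
```

When the check returns True, the operand covers more than one memory unit, and each unit's accessibility would then need to be checked separately.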
  • Patent number: 10318430
    Abstract: Embodiments relate to a system operation queue for a transaction. An aspect includes determining whether a system operation is part of an in-progress transaction of a central processing unit (CPU). Another aspect includes based on determining that the system operation is part of the in-progress transaction, storing the system operation in a system operation queue corresponding to the in-progress transaction. Yet another aspect includes, based on the in-progress transaction ending, processing the system operation in the system operation queue.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: June 11, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz
  • Patent number: 10318297
    Abstract: A self-timed parallelized multi-core processor has an instruction decoder unit for receiving a program code instruction, determining an operating code and latency for the instruction, and assigning a loop index to the instruction. An instruction decomposer creates a primitive by decomposing the instruction, replacing the loop index with a core index, and broadcasting the primitive. Self-timed processing cores each having a unique core index compare the core index to their unique processing core index. The processing cores act on the primitive when their processing core index is within a threshold of the core index.
    Type: Grant
    Filed: January 30, 2015
    Date of Patent: June 11, 2019
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Yiqun Ge, Wuxian Shi, Lan Hu