Patents Examined by Aimee Li
  • Patent number: 10564971
    Abstract: A processor includes: at least one operator; and at least one macro instruction processing unit configured to share the at least one operator, wherein the at least one macro instruction processing unit is configured to execute a macro instruction with respect to input data by using the at least one operator to output result data, and to control the at least one operator to perform an operation included in the macro instruction, and the at least one macro instruction processing unit comprises: a scheduler configured to manage schedules of the at least one operator and output input data and a control signal to the at least one operator; and a controller configured to control the scheduler to execute the macro instruction and to receive the result data from the scheduler.
    Type: Grant
    Filed: October 15, 2015
    Date of Patent: February 18, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Doo-hyun Kim, Jae-hyun Kim, Joon-ho Song
  • Patent number: 10545797
    Abstract: A method for dynamically providing a status of a hardware thread/hardware resource independent of the operation of the hardware thread/hardware resource using an inter-thread communication protocol. A master hardware thread may be configured to communicate status requests to associated slave hardware threads and/or hardware resources. Each slave hardware thread/hardware resource may be configured with hardware logic configured to automatically determine status information for the slave hardware thread/hardware resource and communicate a status response to the master hardware thread without interrupting processing of the slave hardware thread/hardware resource.
    Type: Grant
    Filed: February 8, 2016
    Date of Patent: January 28, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jamie R. Kuesel, Mark G. Kupferschmidt, Paul E. Schardt, Robert A. Shearer
  • Patent number: 10545763
    Abstract: Detecting data dependencies of instructions associated with threads in a simultaneous multithreading (SMT) scheme is disclosed, including: dividing a plurality of comparators of an SMT-enabled device into groups of comparators corresponding to respective ones of threads associated with the SMT-enabled device; simultaneously distributing a first set of instructions associated with a first thread of the plurality of threads to a corresponding first group of comparators from the plurality of groups of comparators and distributing a second set of instructions associated with a second thread of the plurality of threads to a corresponding second group of comparators from the plurality of groups of comparators; and simultaneously performing data dependency detection on the first set of instructions associated with the first thread using the corresponding first group of comparators and performing data dependency detection on the second set of instructions associated with the second thread using the corresponding seco
    Type: Grant
    Filed: May 6, 2015
    Date of Patent: January 28, 2020
    Assignee: Alibaba Group Holding Limited
    Inventors: Ling Ma, Sihai Yao, Lei Zhang
  • Patent number: 10534654
    Abstract: A circuit arrangement and program product for dynamically providing a status of a hardware thread/hardware resource independent of the operation of the hardware thread/hardware resource using an inter-thread communication protocol. A master hardware thread may be configured to communicate status requests to associated slave hardware threads and/or hardware resources. Each slave hardware thread/hardware resource may be configured with hardware logic configured to automatically determine status information for the slave hardware thread/hardware resource and communicate a status response to the master hardware thread without interrupting processing of the slave hardware thread/hardware resource.
    Type: Grant
    Filed: February 8, 2016
    Date of Patent: January 14, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jamie R. Kuesel, Mark G. Kupferschmidt, Paul E. Schardt, Robert A. Shearer
  • Patent number: 10534606
    Abstract: Approaches are described to improve database performance by implementing a RLE decompression function at a low level within a general-purpose processor or an external block. Specifically, embodiments of a hardware implementation of an instruction for RLE decompression are disclosed. The described approaches improve performance by supporting the RLE decompression function within a processor and/or external block. Specifically, a RLE decompression hardware implementation is disclosed that produces a 64-bit RLE decompression result, with an example embodiment performing the task in two pipelined execution stages with a throughput of one per cycle. According to embodiments, hardware organization of narrow-width shifters operating in parallel, controlled by computed shift counts, is used to perform the decompression.
    Type: Grant
    Filed: September 28, 2015
    Date of Patent: January 14, 2020
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Jeffrey S. Brooks, Robert Golla, Albert Danysh, Shasank Chavan, Prateek Agrawal, Andrew Ewoldt, David Weaver
  • Patent number: 10528352
    Abstract: Blocking instruction fetching in a computer processor, includes: receiving a non-branching instruction to be executed by the computer processor; determining whether executing the non-branching instruction will cause a flush; and responsive to determining that executing the non-branching instruction will cause a flush, disabling instruction fetching for the computer processor for a time, including recoding the instruction such that the recoded instruction will be interpreted by an instruction fetch unit as an unconditional branch instruction.
    Type: Grant
    Filed: March 8, 2016
    Date of Patent: January 7, 2020
    Assignee: International Business Machines Corporation
    Inventors: Bryan G. Hickerson, Sheldon Levenstein, David S. Levitan, Albert J. Van Norstrand, Jr.
  • Patent number: 10514925
    Abstract: Systems, apparatuses, and methods for managing dependencies between instruction operations when speculatively issuing load instruction operations. A processor may maintain dependency vectors for sources of instruction operations dispatched to the scheduler. The dependency vector may include a column for each cycle of the load recovery window and a row for each load execution pipeline. When a load speculatively issues, any instruction operation which is dependent on the load may have a bit set in the earliest bit position of its dependency vector to indicate the dependency. The bit may shift in the dependency vector toward the cancel bit position during each clock cycle as the load executes. If the load does not produce its data at the expected latency, an instruction operation may be canceled if there is a bit in the cancel bit position of the dependency vector row corresponding to the execution pipeline of the load.
    Type: Grant
    Filed: January 28, 2016
    Date of Patent: December 24, 2019
    Assignee: Apple Inc.
    Inventor: Sean M. Reynolds
  • Patent number: 10514911
    Abstract: Examples of techniques for designing processors are described herein. In one example, a design structure can be tangibly embodied in a machine readable medium for designing, manufacturing, or testing an integrated circuit. The design structure can include a logic to determine whether a received instruction is an updating fixed point instruction or a non-updating fixed point instruction. The design structure can include a first arithmetic logic unit (ALU) to execute the received instruction if the received instruction is determined to be an updating fixed point instruction and store an update value in a general register. The design structure can include a second arithmetic logic unit (ALU) to execute the received instruction if the received instruction is determined to be a non-updating fixed point instruction.
    Type: Grant
    Filed: November 26, 2014
    Date of Patent: December 24, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Avraham Ayzenfeld, Lee E. Eisen, Brian W. Curran, Christian Jacobi
  • Patent number: 10514919
    Abstract: A data processing apparatus has processing circuitry for processing vector operands from a vector register store in response to vector micro-operations, some of which have control information identifying which data elements of the vector operands are selected for processing. Control circuitry detects vector micro-operations for which the control information specifies that a portion of the vector operand to be processed has no selected elements. If this is the case, then the control circuitry controls the processing circuitry to process a lower latency replacement micro-operation instead of the original micro-operation. This provides better performance than if a branch instruction is used to bypass the micro-operation if there are no selected elements.
    Type: Grant
    Filed: January 21, 2015
    Date of Patent: December 24, 2019
    Assignee: ARM Limited
    Inventors: Matthias Boettcher, Mbou Eyole-Monono, Giacomo Gabrielli
  • Patent number: 10514920
    Abstract: A processor includes a processing core that detects a predetermined program is running on the processor and looks up a prefetch trait associated with the predetermined program running on the processor, wherein the prefetch trait is either exclusive or shared. The processor also includes a hardware data prefetcher that performs hardware prefetches for the predetermined program using the prefetch trait. Alternatively, the processing core loads each of one or more range registers of the processor with a respective address range in response to detecting that the predetermined program is running on the processor. Each of the one or more address ranges has an associated prefetch trait, wherein the prefetch trait is either exclusive or shared. The hardware data prefetcher performs hardware prefetches for the predetermined program using the prefetch traits associated with the address ranges loaded into the range registers.
    Type: Grant
    Filed: February 18, 2015
    Date of Patent: December 24, 2019
    Assignee: VIA TECHNOLOGIES, INC.
    Inventors: Rodney E. Hooker, Albert J. Loper, John Michael Greer
  • Patent number: 10514926
    Abstract: A microprocessor implemented method for performing early dependency resolution and data forwarding is disclosed. The method comprises mapping a plurality of instructions in a guest address space into a corresponding plurality of instructions in a native address space. For each current guest branch instruction in the native address space fetched during execution, performing (a) determining a youngest prior guest branch target stored in a guest branch target register, wherein the guest branch register is operable to speculatively store a plurality of prior guest branch targets corresponding to prior guest branch instructions; (b) determining a current branch target for a respective current guest branch instruction by adding an offset value for the respective current guest branch instruction to the youngest prior guest branch target; and (c) creating an entry in the guest branch target register for the current branch target.
    Type: Grant
    Filed: March 14, 2014
    Date of Patent: December 24, 2019
    Assignee: INTEL CORPORATION
    Inventor: Mohammad A. Abdallah
  • Patent number: 10509726
    Abstract: A processor includes an execution unit to execute instructions to load indices from an array of indices, optionally perform scatters, and prefetch (to a specified cache) contents of target locations for future scatters from arbitrary locations in memory. The execution unit includes logic to load, for each target location of a scatter or prefetch operation, an index value to be used in computing the address in memory for the operation. The index value may be retrieved from an array of indices identified for the instruction. The execution unit includes logic to compute the addresses based on the sum of a base address specified for the instruction, the index value retrieved for the location, and a prefetch offset (for prefetch operations), with optional scaling. The execution unit includes logic to retrieve data elements from contiguous locations in a source vector register specified for the instruction to be scattered to the memory.
    Type: Grant
    Filed: December 20, 2015
    Date of Patent: December 17, 2019
    Assignee: Intel Corporation
    Inventors: Indraneil M. Gokhale, Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Antonio C. Valles
  • Patent number: 10503503
    Abstract: A method in a computer-aided design system for generating a functional design model of a processor, is described herein. The method comprises generating a functional representation of logic to determine whether an instruction is an updating instruction or a non-updating instruction. The method further comprises generating a functional representation of a first arithmetic logic unit (ALU) coupled to a general register in the processor, the first ALU to execute the instruction if the instruction is an updating instruction and store an update value in the general register, and generating a functional representation of a second ALU in the processor to execute the instruction if the instruction is a non-updating instruction.
    Type: Grant
    Filed: September 25, 2015
    Date of Patent: December 10, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Avraham Ayzenfeld, Lee E. Eisen, Brian W. Curran, Christian Jacobi
  • Patent number: 10503506
    Abstract: A mechanism is provided for improving performance when executing unaligned load instructions which load an unaligned block of data from a data store. In a first unaligned load handling mode, a final load operation of a series of load operations performed for the instruction loads a full data word extending beyond the end of the unaligned block of data to be loaded by that instruction. If an initial portion of the unaligned block of data to be loaded by a subsequent unaligned load instruction corresponds to the excess part in the stream buffer for the earlier instruction, then an initial load operation for the subsequent instruction can be suppressed. A mechanism is also described for allowing series of dependent data access operations triggered by a given instruction to be halted partway through when a stall condition arises, and resumed partway through later, by defining overlapping sequences of transactions.
    Type: Grant
    Filed: October 19, 2015
    Date of Patent: December 10, 2019
    Assignee: ARM Limited
    Inventor: Max John Batley
  • Patent number: 10481912
    Abstract: Embodiments include method, systems and computer program products for variable branch target buffer line size for compression. In some embodiments, a branch target buffer (BTB) congruence class for a line of a first parent array of a BTB may be determined. A threshold indicative of a maximum number branches to be stored in the line may be set. A branch may be received to store in the line of the first parent array. A determination may be made that storing the branch in the line would exceed the threshold and the line can be responsively split into an even half line and an odd half line.
    Type: Grant
    Filed: June 24, 2016
    Date of Patent: November 19, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James J. Bonanno, Michael J. Cadigan, Jr., Brian R. Prasky
  • Patent number: 10467010
    Abstract: A method for performing memory disambiguation in an out-of-order microprocessor pipeline is disclosed. The method comprises storing a tag with a load operation, wherein the tag is an identification number representing a store instruction nearest to the load operation, wherein the store instruction is older with respect to the load operation and wherein the store has potential to result in a RAW violation in conjunction with the load operation. The method also comprises issuing the load operation from an instruction scheduling module. Further, the method comprises acquiring data for the load operation speculatively after the load operation has arrived at a load store queue module. Finally, the method comprises determining if an identification number associated with a last contiguous issued store with respect to the load operation is equal to or greater than the tag and gating a validation process for the load operation in response to the determination.
    Type: Grant
    Filed: March 13, 2014
    Date of Patent: November 5, 2019
    Assignee: Intel Corporation
    Inventors: Mohammad A. Abdallah, Mandeep Singh
  • Patent number: 10459723
    Abstract: Systems and methods relate to performing data movement operations using single instruction multiple data (SIMD) instructions. A first SIMD instruction comprises a first input data vector having a number N of two or more data elements in corresponding N SIMD lanes and a control vector having N control elements in the corresponding N SIMD lanes. A first multi-stage cube network is controllable by the first SIMD instruction, and includes movement elements, with one movement element per SIMD lane, per stage. A movement element selects between one of two data elements based on a corresponding control element and moves the data elements across the stages of the first multi-stage cube network by a zero distance or power-of-two distance between adjacent stages to generate a first output data vector. A second multi-stage cube network can be used in conjunction to generate all possible data movement operations of the input data vector.
    Type: Grant
    Filed: July 20, 2015
    Date of Patent: October 29, 2019
    Assignee: QUALCOMM Incorporated
    Inventor: Eric Wayne Mahurin
  • Patent number: 10430194
    Abstract: A method for managing tasks in a computer system comprising a processor and a memory, the method includes performing a first task by the processor, the first task comprising task-relating branch instructions and task-independent branch instructions and executing the branch prediction method, the execution resulting in task-relating branch prediction data in the branch prediction history table. In response to determining that the first task is to be interrupted or terminated, the method includes storing the task-relating branch prediction data of the first task in the task structure of the first task. In response to determining that a second task is to be continued, the method includes reading task-relating branch prediction data of the second task from the task structure of the second task, storing the task-relating branch prediction data of the second task in the branch prediction history table, and ensuring that task-independent branch prediction data is maintained.
    Type: Grant
    Filed: March 10, 2016
    Date of Patent: October 1, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Wolfgang Gellerich, Peter M. Held, Martin Schwidefsky, Chung-Lung K. Shum
  • Patent number: 10430190
    Abstract: Systems and methods which provide a modular processor framework and instruction set architecture designed to efficiently execute applications whose memory access patterns are irregular or non-unit stride are disclosed. A hybrid multithreading framework (HMTF) of embodiments provides a framework for constructing tightly coupled, chip-multithreading (CMT) processors that contain specific features well-suited to hiding latency to main memory and executing highly concurrent applications. The HMTF of embodiments includes an instruction set designed specifically to exploit the high degree of parallelism and concurrency control mechanisms present in the HMTF hardware modules. The instruction format implemented by a HMTF of embodiments is designed to give the architecture, the runtime libraries, and/or the application ultimate control over how and when concurrency between thread cache units is initiated.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: October 1, 2019
    Assignee: Micron Technology, Inc.
    Inventors: John D. Leidel, Kevin R. Wadleigh, Joe Bolding, Tony Brewer, Dean E. Walker
  • Patent number: 10423418
    Abstract: A method for managing tasks in a computer system comprising a processor and a memory, the method includes performing a first task by the processor, the first task comprising task-relating branch instructions and task-independent branch instructions and executing the branch prediction method, the execution resulting in task-relating branch prediction data in the branch prediction history table. In response to determining that the first task is to be interrupted or terminated, the method includes storing the task-relating branch prediction data of the first task in the task structure of the first task. In response to determining that a second task is to be continued, the method includes reading task-relating branch prediction data of the second task from the task structure of the second task, storing the task-relating branch prediction data of the second task in the branch prediction history table, and ensuring that task-independent branch prediction data is maintained.
    Type: Grant
    Filed: November 30, 2015
    Date of Patent: September 24, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Wolfgang Gellerich, Peter M. Held, Martin Schwidefsky, Chung-Lung K. Shum