Patents Examined by Shawn Doman
  • Patent number: 12210876
    Abstract: Instruction set architectures (ISAs) and apparatus and methods related thereto comprise an instruction set that includes one or more instructions which identify the global pointer (GP) register as an operand (e.g., base register or source register) of the instruction. Identification can be implicit. By implicitly identifying the GP register as an operand of the instruction, one or more bits of the instruction that were dedicated to explicitly identifying the operand (e.g., base register or source register) can be used to extend the size of one or more other operands, such as the offset or immediate, to provide longer offsets or immediates.
    Type: Grant
    Filed: August 31, 2018
    Date of Patent: January 28, 2025
    Assignee: MIPS Tech, LLC
    Inventors: James Hippisley Robinson, Morgyn Taylor, Matthew Fortune, Richard Fuhler, Sanjay Patel
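    Illustrative sketch (Python): a toy 32-bit encoding, with invented opcodes and field widths rather than the actual MIPS layout, showing how an implicit GP base frees the 5-bit base-register field to widen the offset from 16 to 21 bits.
      def encode_explicit(opcode, base_reg, dest_reg, offset):
          assert 0 <= offset < (1 << 16)     # explicit base: only 16 offset bits left
          return (opcode << 26) | (base_reg << 21) | (dest_reg << 16) | offset

      def encode_gp_implicit(opcode, dest_reg, offset):
          assert 0 <= offset < (1 << 21)     # implicit $gp base: 21 offset bits
          return (opcode << 26) | (dest_reg << 21) | offset

      # Same 32 bits, but the GP-relative form reaches 32x farther from $gp.
      print(hex(encode_explicit(0x23, 28, 2, 0xFFFF)))
      print(hex(encode_gp_implicit(0x3B, 2, 0x1FFFFF)))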
  • Patent number: 12204908
    Abstract: A branch predictor predicts a first outcome of a first branch in a first block of instructions. Fetch logic fetches instructions for speculative execution along a first path indicated by the first outcome. Information representing a remainder of the first block is stored in response to the first predicted outcome being taken. In response to the first branch instruction being not taken, the branch predictor is restarted based on the remainder block. In some cases, entries corresponding to second blocks along speculative paths from the first block are accessed using an address of the first block as an index into a branch prediction structure. Outcomes of branch instructions in the second blocks are concurrently predicted using a corresponding set of instances of branch conditional logic and the predicted outcomes are used in combination with the remainder block to restart the branch predictor in response to mispredictions.
    Type: Grant
    Filed: June 4, 2018
    Date of Patent: January 21, 2025
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Marius Evers, Douglas Williams, Ashok T. Venkatachar, Sudherssen Kalaiselvan
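    Illustrative sketch (Python, structure invented): saving the untaken remainder of a fetch block at a predicted-taken branch, then restarting from that remainder when the branch resolves not taken.
      def fetch_block(insns, branch_pos, target, fallthrough,
                      predicted_taken, actual_taken):
          fetched = insns[:branch_pos + 1]          # up to and including the branch
          if predicted_taken:
              remainder = (insns[branch_pos + 1:], fallthrough)  # saved remainder
              if actual_taken:
                  return fetched, target            # prediction was correct
              # Mispredicted: restart the predictor from the stored remainder.
              return fetched + remainder[0], remainder[1]
          return insns, fallthrough

      trace, next_pc = fetch_block(["i0", "i1", "beq", "i3"], 2, 0x200, 0x110,
                                   predicted_taken=True, actual_taken=False)
      print(trace, hex(next_pc))                    # remainder i3 replayed, pc=0x110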
  • Patent number: 12197308
    Abstract: On-circuit utilization monitoring may be performed for a systolic array. A current utilization measurement may be determined for processing elements of a systolic array and compared with a prior utilization measurement. Based on the comparison, a throttling recommendation may be provided to a management component to determine whether to perform the throttling recommendation.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: January 14, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Ron Diamant
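    Illustrative sketch (Python; thresholds and labels are invented): comparing a current utilization measurement against the prior one and emitting a throttling recommendation for a management component to accept or ignore.
      def utilization(active_pes, total_pes):
          return active_pes / total_pes             # fraction of busy PEs this sample

      def recommend(prev_util, curr_util, rise=0.10, fall=0.10):
          if curr_util - prev_util > rise:
              return "throttle"                     # sharp ramp-up: suggest throttling
          if prev_util - curr_util > fall:
              return "unthrottle"                   # activity fell: suggest releasing
          return "hold"                             # management component decides

      prev, curr = utilization(96, 128), utilization(120, 128)
      print(recommend(prev, curr))                  # -> "throttle"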
  • Patent number: 12198222
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
    Type: Grant
    Filed: December 7, 2023
    Date of Patent: January 14, 2025
    Assignee: Intel Corporation
    Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
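    Illustrative sketch (Python): a block-sparse dot product over a compressed representation in which a bitmask records which fixed-size weight blocks are nonzero; the block size and data are made up for illustration.
      def block_sparse_dot(mask, packed_blocks, dense, block=4):
          acc, nxt = 0.0, 0
          for i, present in enumerate(mask):
              if present:                           # only nonzero blocks are stored
                  w = packed_blocks[nxt]; nxt += 1
                  d = dense[i * block:(i + 1) * block]
                  acc += sum(wi * di for wi, di in zip(w, d))
          return acc                                # zero blocks are skipped entirely

      mask = [1, 0, 1]                              # blocks 0 and 2 are nonzero
      packed = [[1, 2, 3, 4], [5, 6, 7, 8]]
      print(block_sparse_dot(mask, packed, list(range(12))))   # -> 272.0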
  • Patent number: 12190115
    Abstract: A processor configured to: store prediction information obtained from a call instruction fetched into a RAS fetch; acquire the prediction information used by a return instruction from the RAS fetch and delete it; store the prediction information in a RAS complete after completion of execution of the call instruction and delete it from the RAS complete after completion of execution of the return instruction; specify a first entry, closest to the top of the queue, among the entries in which a branch instruction on which a branch prediction error is detected is re-executed; store and delete the prediction information in and from the RAS fetch according to the call and return instructions stored in entries closer to the top than the first entry; and cause the first entry to be re-executed.
    Type: Grant
    Filed: June 6, 2023
    Date of Patent: January 7, 2025
    Assignee: FUJITSU LIMITED
    Inventors: Chunye You, Ryohei Okazaki
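    Illustrative sketch (Python, heavily simplified): a speculative RAS updated at fetch alongside an architectural RAS updated at completion, with the fetch RAS rebuilt from the completed state after a misprediction; the per-entry replay logic of the abstract is omitted.
      class DualRAS:
          def __init__(self):
              self.fetch, self.complete = [], []

          def fetch_call(self, ret_addr):    self.fetch.append(ret_addr)
          def fetch_return(self):            return self.fetch.pop()
          def complete_call(self, ret_addr): self.complete.append(ret_addr)
          def complete_return(self):         self.complete.pop()

          def recover(self):
              # On a misprediction, rebuild the speculative stack from completed state.
              self.fetch = list(self.complete)

      ras = DualRAS()
      ras.fetch_call(0x400); ras.complete_call(0x400)
      ras.fetch_call(0x800)            # speculative call on a wrong path
      ras.recover()                    # misprediction: drop the wrong-path entry
      print(hex(ras.fetch_return()))   # -> 0x400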
  • Patent number: 12190114
    Abstract: In one embodiment, a processor includes a branch predictor to predict whether a branch instruction is to be taken and a branch target buffer (BTB) coupled to the branch predictor. The branch target buffer may be segmented into a first cache portion and a second cache portion, where, in response to an indication that the branch is to be taken, the BTB is to access an entry in one of the first cache portion and the second cache portion based at least in part on a type of the branch instruction, an occurrence frequency of the branch instruction, and spatial information regarding a distance between a target address of a target of the branch instruction and an address of the branch instruction. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: January 7, 2025
    Assignee: Intel Corporation
    Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, Sr Swamy Saranam Chongala
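    Illustrative sketch (Python; the partition roles and thresholds are invented, not Intel's): steering a taken branch into one of two BTB portions based on its type, occurrence frequency, and branch-to-target distance.
      def choose_btb_portion(branch_type, frequency, branch_addr, target_addr,
                             near=4096, hot=16):
          distance = abs(target_addr - branch_addr)
          if branch_type == "return":
              return "second"               # e.g., returns kept in the larger portion
          if frequency >= hot and distance < near:
              return "first"                # hot, short-distance: small fast portion
          return "second"                   # everything else: larger portion

      print(choose_btb_portion("cond", frequency=40,
                               branch_addr=0x1000, target_addr=0x1200))  # -> first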
  • Patent number: 12175252
    Abstract: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread architecture, the general-purpose graphics compute unit to concurrently execute the first instruction and the second instruction.
    Type: Grant
    Filed: June 14, 2022
    Date of Patent: December 24, 2024
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Tatiana Shpeisman, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma
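    Illustrative sketch (Python): a toy issue loop that pairs one floating-point and one integer instruction per cycle to model concurrent execution on separate pipes; it is not Intel's scheduler.
      def co_issue(window):
          # window holds only ("fp", name) and ("int", name) entries in this toy.
          cycles = []
          while window:
              fp = next((i for i in window if i[0] == "fp"), None)
              intg = next((i for i in window if i[0] == "int"), None)
              issued = [i for i in (fp, intg) if i]
              for i in issued:
                  window.remove(i)
              cycles.append(issued)
          return cycles

      insts = [("fp", "fma0"), ("fp", "fma1"), ("int", "add0"), ("int", "add1")]
      for c, issued in enumerate(co_issue(insts)):
          print(c, [name for _, name in issued])   # an fp/int pair each cycle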
  • Patent number: 12169718
    Abstract: An apparatus comprises decoder circuitry to decode an instruction that includes an opcode to indicate a protected load operation, a source field for source memory address information, and a destination field to identify a destination register. The apparatus also comprises memory to store an allocate load-protect (LP) data structure with an entry for the identified destination register. The entry comprises an IP field and a status field. The apparatus also comprises load elision circuitry to (a) use the allocate LP data structure to determine whether the identified destination register has active status for the IP; (b) in response to determining that the identified destination register has active status for the IP, cause the instruction to be elided; and (c) in response to determining that the identified destination register does not have active status for the IP, cause the instruction to be executed. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: December 17, 2024
    Assignee: Intel Corporation
    Inventors: Vineeth Thamarassery Mekkat, Sebastian Christoph Albert Winkel, Rangeen Basu Roy Chowdhury
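    Illustrative sketch (Python; the table layout and field names are invented): eliding a repeated protected load when its destination register still has active status for the same IP, and executing it otherwise.
      lp_table = {}   # dest_reg -> {"ip": ..., "active": ...}

      def protected_load(ip, dest_reg, load_fn):
          entry = lp_table.get(dest_reg)
          if entry and entry["active"] and entry["ip"] == ip:
              return "elided"                       # value already live in dest_reg
          load_fn()                                 # perform the real load
          lp_table[dest_reg] = {"ip": ip, "active": True}
          return "executed"

      print(protected_load(0x40, "r5", lambda: None))   # -> "executed"
      print(protected_load(0x40, "r5", lambda: None))   # -> "elided"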
  • Patent number: 12159142
    Abstract: Techniques are disclosed relating to predicting values for load operations. In some embodiments, front-end circuitry is configured to predict values of load operations using multiple value tagged geometric length predictor (VTAGE) prediction tables, indexed based on program counter information and branch history information. Training circuitry may adjust multiple VTAGE learning tables based on completed load operations. Control circuitry may pre-compute access information (e.g., an index) for a VTAGE learning table for a load based on branch history information that is available to the front-end circuitry but unavailable to the training circuitry, store the pre-computed access information in first storage circuitry, and provide it from the first storage circuitry to the training circuitry to access the VTAGE learning table when the load completes. This may facilitate VTAGE training without pipelining the branch history information.
    Type: Grant
    Filed: May 2, 2023
    Date of Patent: December 3, 2024
    Assignee: Apple Inc.
    Inventors: Yuan C. Chou, Chang Xu, Deepankar Duggal, Debasish Chandra
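    Illustrative sketch (Python; the hash and table size are made up): pre-computing a VTAGE learning-table index in the front end, where branch history is available, and replaying the stored index at load completion so the history itself never has to be pipelined to training.
      def vtage_index(pc, branch_history, table_bits=10):
          return (pc ^ branch_history ^ (branch_history >> 7)) & ((1 << table_bits) - 1)

      pending = {}                                  # load id -> precomputed index

      def front_end_issue(load_id, pc, branch_history):
          pending[load_id] = vtage_index(pc, branch_history)

      def train_on_completion(load_id, actual_value, learning_table):
          idx = pending.pop(load_id)                # history not needed here
          learning_table[idx] = actual_value

      table = [None] * 1024
      front_end_issue(1, pc=0x4000, branch_history=0b1011001)
      train_on_completion(1, actual_value=42, learning_table=table)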
  • Patent number: 12147836
    Abstract: Techniques and configurations for enhancing the performance of hardware (HW) accelerators are provided. A schedule-aware, dynamically reconfigurable, tree-based partial-sum accumulator architecture for HW accelerators is provided, in which the depth of an adder tree in the HW accelerator is set dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial-sum accumulation based on the dataflow schedule. By facilitating a variable-depth adder tree at runtime, the compiler can choose a compute-optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator.
    Type: Grant
    Filed: November 5, 2021
    Date of Patent: November 19, 2024
    Assignee: Intel Corporation
    Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
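    Illustrative sketch (Python): a partial-sum reduction whose tree depth comes from a per-layer configuration value standing in for the software-programmed registers; a tree of depth d combines 2**d partial sums per reduction step.
      def reduce_partial_sums(psums, depth):
          group = 1 << depth                        # depth d sums 2**d inputs at once
          while len(psums) > 1:
              psums = [sum(psums[i:i + group]) for i in range(0, len(psums), group)]
          return psums[0]

      layer_config = {"conv1": 2, "conv2": 3}       # per-layer depths from the compiler
      psums = [1.0] * 16
      print(reduce_partial_sums(psums, layer_config["conv1"]))   # -> 16.0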
  • Patent number: 12118397
    Abstract: The present disclosure describes techniques for accelerating data processing by offloading thread computation. An application may be started by a host creating and executing a process, the process being associated with a plurality of threads. Creating a plurality of computation threads on a storage device may be requested based on determining that the storage device is a computational storage device. The plurality of computation threads may be created based on preloading a plurality of libraries in the storage device. The plurality of libraries may comprise executable code associated with the plurality of threads. Data processing associated with the plurality of threads may be offloaded to the storage device using the plurality of computation threads. Activities associated with the plurality of computation threads may be managed by the process.
    Type: Grant
    Filed: September 15, 2022
    Date of Patent: October 15, 2024
    Assignees: Lemon Inc., Beijing Youzhuju Network Technology Co. Ltd.
    Inventors: Viacheslav Dubeyko, Jian Wang
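    Illustrative sketch (Python; every class and method here is invented): a toy computational-storage device that creates computation threads backed by preloaded "libraries" and lets the host process offload work to them.
      import threading, queue

      class ComputationalStorage:
          def __init__(self, libraries):
              self.libraries = libraries            # preloaded executable code
              self.tasks = queue.Queue()

          def create_computation_threads(self, n):
              for _ in range(n):
                  threading.Thread(target=self._worker, daemon=True).start()

          def _worker(self):
              while True:
                  lib, data, done = self.tasks.get()
                  done.append(self.libraries[lib](data))   # run device-side code
                  self.tasks.task_done()

          def offload(self, lib, data, done):
              self.tasks.put((lib, data, done))

      dev = ComputationalStorage({"sum": sum})
      dev.create_computation_threads(2)
      results = []
      dev.offload("sum", [1, 2, 3], results)
      dev.tasks.join()
      print(results)                                # -> [6]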
  • Patent number: 12112167
    Abstract: Embodiments for gathering and scattering matrix data by row are disclosed. In an embodiment, a processor includes a matrix storage, a decoder, and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode and a first operand field to specify a set of irregularly spaced memory locations. The execution circuitry is to, in response to the decoded instruction, calculate a set of addresses corresponding to the set of irregularly spaced memory locations and transfer a set of rows of data between the storage and the set of irregularly spaced memory locations.
    Type: Grant
    Filed: June 27, 2020
    Date of Patent: October 8, 2024
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Menachem Adelman, Evangelos Georganas, Mark J. Charney, Nikita A. Shustrov, Sara Baghsorkhi
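    Illustrative sketch (Python, with memory modeled as a flat list): gathering and scattering whole rows between matrix storage and a set of irregularly spaced addresses named by the operand.
      def gather_rows(memory, row_addrs, row_len):
          return [memory[a:a + row_len] for a in row_addrs]   # one row per address

      def scatter_rows(memory, row_addrs, rows):
          for a, row in zip(row_addrs, rows):
              memory[a:a + len(row)] = row

      mem = list(range(100))
      tile = gather_rows(mem, row_addrs=[5, 40, 77], row_len=4)
      print(tile)                       # rows gathered from irregular offsets
      scatter_rows(mem, [0, 10, 20], tile)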
  • Patent number: 12112141
    Abstract: A method for performing a convolution operation includes storing a convolution kernel in a first storage device, the convolution kernel having dimensions x by y; storing, in a second storage device, a first subset of element values of an input feature map having dimensions n by m; performing a first simultaneous multiplication of each value of the first subset of element values of the input feature map with a first element value from among the x*y elements of the convolution kernel; for each remaining value of the x*y elements of the convolution kernel, performing a simultaneous multiplication of the remaining value with a corresponding subset of element values of the input feature map; for each simultaneous multiplication, storing the result of the simultaneous multiplication in an accumulator; and outputting the values of the accumulator as a first row of an output feature map.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: October 8, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ali Shafiee Ardestani, Joseph Hassoun
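    Illustrative sketch (Python, reduced to one dimension for brevity): broadcasting one kernel element at a time across the corresponding input subset and accumulating into an output row, as the abstract's sequence of simultaneous multiplications describes; the hardware parallelism is modeled serially.
      def conv_row(input_row, kernel):
          out_len = len(input_row) - len(kernel) + 1
          acc = [0.0] * out_len                     # the accumulator row
          for k, w in enumerate(kernel):            # one kernel element per pass
              shifted = input_row[k:k + out_len]    # corresponding input subset
              for i in range(out_len):
                  acc[i] += w * shifted[i]          # "simultaneous" multiply, serialized
          return acc

      print(conv_row([1, 2, 3, 4, 5], [1, 0, -1]))  # -> [-2.0, -2.0, -2.0]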
  • Patent number: 12106111
    Abstract: A prediction unit includes a first predictor that provides an output comprising a hashed fetch address of a current fetch block in response to an input. The first predictor input comprises a hashed fetch address of a previous fetch block that immediately precedes the current fetch block in program execution order. A second predictor provides an output comprising a fetch address of a next fetch block that immediately succeeds the current fetch block in program execution order in response to an input. The second predictor input comprises the hashed fetch address of the current fetch block output by the first predictor.
    Type: Grant
    Filed: August 2, 2022
    Date of Patent: October 1, 2024
    Assignee: Ventana Micro Systems Inc.
    Inventors: John G. Favor, Michael N. Michael
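    Illustrative sketch (Python; the hash values are arbitrary): the two-stage chain in which the first predictor maps the previous block's hashed fetch address to the current block's hash, and the second maps that hash to the next block's full fetch address.
      first = {0xA3: 0x5C}          # hash(prev block) -> hash(current block)
      second = {0x5C: 0x4010}       # hash(current block) -> next fetch address

      def predict_next(prev_hash):
          cur_hash = first[prev_hash]       # stage 1: hashed current fetch address
          return second[cur_hash]           # stage 2: full next fetch address

      print(hex(predict_next(0xA3)))        # -> 0x4010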
  • Patent number: 12106099
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an asynchronous packet network having a plurality of data transmission lines forming a data path transmitting operand data; a synchronous mesh communication network; a plurality of configurable circuits arranged in an array, each configurable circuit of the plurality of configurable circuits coupled to the asynchronous packet network and to the synchronous mesh communication network, each configurable circuit of the plurality of configurable circuits adapted to perform a plurality of computations; each configurable circuit of the plurality of configurable circuits comprising: a memory storing operand data; and an execution or write mask generator adapted to generate an execution mask or a write mask identifying valid bits or bytes transmitted on the data path or stored in the memory for a current or next computation.
    Type: Grant
    Filed: October 8, 2023
    Date of Patent: October 1, 2024
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
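    Illustrative sketch (Python; widths are illustrative): a write-mask generator that marks valid byte lanes and a store that updates only the masked-in bytes.
      def write_mask(valid_bytes):
          mask = 0
          for b in valid_bytes:             # set one bit per valid byte lane
              mask |= 1 << b
          return mask

      def masked_store(memory, addr, data, mask):
          for b in range(len(data)):
              if mask & (1 << b):           # only masked-in bytes are written
                  memory[addr + b] = data[b]

      mem = bytearray(8)
      masked_store(mem, 0, bytes(range(8)), write_mask([0, 1, 4]))
      print(mem.hex())                      # only bytes 0, 1, and 4 updated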
  • Patent number: 12106107
    Abstract: A memory device includes a memory having a memory bank, a processor-in-memory (PIM) circuit, and control logic. The PIM circuit includes an instruction memory storing at least one instruction provided from a host, and is configured to process an operation using data provided by the host or data read from the memory bank. The control logic is configured to decode a command/address received from the host to generate a decoding result and to perform a control operation so that, based on the decoding result, either i) a memory operation on the memory bank is performed or ii) the PIM circuit performs a processing operation. A counting value of a program counter indicating a position in the instruction memory is controlled in response to the command/address instructing that the processing operation be performed.
    Type: Grant
    Filed: March 31, 2023
    Date of Patent: October 1, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Sukhan Lee, Shinhaeng Kang, Namsung Kim, Seongil O, Hak-Soo Yu
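    Illustrative sketch (Python; command encodings are invented): control logic that either performs a plain memory operation or, when the command/address requests a processing operation, runs the instruction at the PIM program counter and advances the counter.
      class PIMDevice:
          def __init__(self, instructions):
              self.bank = {}                    # the memory bank
              self.imem = instructions          # instructions preloaded by the host
              self.pc = 0                       # program counter into imem

          def handle(self, command, addr=None, data=None):
              if command == "write":
                  self.bank[addr] = data        # ordinary memory operation
              elif command == "read":
                  return self.bank.get(addr)
              elif command == "pim-exec":       # command/address requests a PIM op
                  op = self.imem[self.pc]
                  self.pc += 1                  # counting value advances per request
                  return op(self.bank)

      dev = PIMDevice([lambda bank: bank[0] + bank[1]])
      dev.handle("write", 0, 3); dev.handle("write", 1, 4)
      print(dev.handle("pim-exec"))             # -> 7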
  • Patent number: 12045615
    Abstract: A system, e.g., a system on a chip (SOC), may include one or more processors. A processor may execute an instruction synchronization barrier (ISB) instruction to enforce an ordering constraint on instructions. To execute the ISB instruction, the processor may determine whether contexts of the processor required for execution of instructions older than the ISB instruction are consumed for the older instructions. Responsive to determining that the contexts are consumed for the older instructions, the processor may initiate fetching of an instruction younger than the ISB instruction, without waiting for the older instructions to retire.
    Type: Grant
    Filed: September 16, 2022
    Date of Patent: July 23, 2024
    Assignee: Apple Inc.
    Inventors: Deepankar Duggal, Kulin N Kothari, Mridul Agarwal, Chang Xu, Yanran Yang, Richard F Russo, Yuan C Chou, Douglas C Holman
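    Illustrative sketch (Python; the instruction records are invented): the ISB condition under which fetching younger instructions may begin once all older instructions have consumed their required contexts, even though none has retired.
      def isb_can_fetch_younger(older_instructions):
          return all(i["context_consumed"] for i in older_instructions)

      older = [
          {"op": "msr", "context_consumed": True, "retired": False},
          {"op": "ldr", "context_consumed": True, "retired": False},
      ]
      # Contexts consumed, so fetch proceeds even though nothing has retired yet.
      print(isb_can_fetch_younger(older))       # -> True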
  • Patent number: 12038868
    Abstract: Devices and techniques for loading contexts in a coarse-grained reconfigurable array processor are described herein. A system or apparatus may include context load circuitry operable to load context for a coarse-grained reconfigurable array processor, where the context load circuitry is configured to: (a) receive a kernel identifier; (b) access a first registry to obtain a context mask base address; (c) determine a context mask address from the context mask base address and the kernel identifier; (d) access a second registry to obtain a context state base address; (e) determine a context state address from the context state base address and the kernel identifier; (f) use a context mask at the context mask address to determine corresponding active context state; and (g) load the corresponding active context state into the coarse-grained reconfigurable array processor.
    Type: Grant
    Filed: August 31, 2022
    Date of Patent: July 16, 2024
    Assignee: Micron Technology, Inc.
    Inventors: Bryan Hornung, Douglas Vanesko, David Patrick
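    Illustrative sketch (Python; addresses and layout are invented): the lettered steps (a)-(g) as straight-line code, using the kernel identifier to locate a context mask and context state and loading only the mask-selected state into the array.
      def load_context(kernel_id, registry1, registry2, memory, array, n_slots=4):
          mask_addr = registry1["context_mask_base"] + kernel_id     # steps (b)-(c)
          state_addr = registry2["context_state_base"] + kernel_id   # steps (d)-(e)
          mask, state = memory[mask_addr], memory[state_addr]
          for slot in range(n_slots):                                # steps (f)-(g)
              if mask & (1 << slot):          # mask marks the active context state
                  array[slot] = state[slot]

      mem = {100: 0b0101, 200: ["s0", "s1", "s2", "s3"]}
      arr = [None] * 4
      load_context(0, {"context_mask_base": 100}, {"context_state_base": 200}, mem, arr)
      print(arr)                              # -> ['s0', None, 's2', None]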
  • Patent number: 12026518
    Abstract: An apparatus for parallel processing includes a memory and one or more processors, at least one of which operates according to a single instruction, multiple data (SIMD) model, and each of which is coupled to the memory. The processors are configured to process data samples associated with one or multiple chains or graphs of data processors, which chains or graphs describe processing steps to be executed repeatedly on data samples that are a subset of temporally ordered samples. The processors are additionally configured to dynamically schedule one or multiple sets of the samples associated with the one or multiple chains or graphs of data processors to reduce the latency of processing the data samples, whether for a single chain or graph of data processors or for different chains and graphs of data processors.
    Type: Grant
    Filed: September 13, 2022
    Date of Patent: July 2, 2024
    Assignee: BRAINGINES SA
    Inventors: Markus Steinberger, Alexander Talashov, Aleksandrs Procopcuks, Vasilii Sumatokhin
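    Illustrative sketch (Python): a round-robin toy standing in for the dynamic scheduler, interleaving ready steps from several processing graphs so that a short graph is not delayed behind a long one.
      from collections import deque

      def schedule(graphs):
          order, queues = [], deque(deque(g) for g in graphs)
          while queues:
              q = queues.popleft()
              order.append(q.popleft())       # run the next step of this graph
              if q:
                  queues.append(q)            # re-queue graphs with work remaining
          return order

      print(schedule([["a1", "a2", "a3"], ["b1"]]))   # -> ['a1', 'b1', 'a2', 'a3']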
  • Patent number: 12001847
    Abstract: A processor may include an instruction pipeline that executes program instructions in-order according to a program order. During operation, the instruction pipeline may detect that data is missing for a first instruction. In response, the instruction pipeline may send a request to load the missing data for the first instruction. However, the instruction pipeline may not necessarily stall operation to wait for the missing data to be loaded. Instead, the instruction pipeline may continue executing instructions subsequent to the first instruction. During the continued execution, the instruction pipeline may detect that data is missing for a second instruction, and send a request to load the missing data for the second instruction. The instruction pipeline may continue such operation until it determines that a condition occurs that prevents the continued execution. When the condition occurs, the instruction pipeline may stop the continued execution, and then re-execute the first instruction.
    Type: Grant
    Filed: August 30, 2022
    Date of Patent: June 4, 2024
    Assignee: Apple Inc.
    Inventors: Justin M Deinlein, Michael L Karm, Brett S Feero, David E Kroesche
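    Illustrative sketch (Python; the stop condition is invented): recording each missing-data instruction and continuing execution instead of stalling, until a blocking condition forces a stop and re-execution from the first miss.
      def run_ahead(instructions, has_data, stop_condition):
          pending = []
          for i, insn in enumerate(instructions):
              if not has_data(insn):
                  pending.append(insn)        # request the load, but do not stall
              if stop_condition(insn, pending):
                  return pending, i           # stop; re-execution restarts at pending[0]
          return pending, None

      misses = {"ld_a", "ld_b"}
      pend, stop = run_ahead(["ld_a", "add", "ld_b", "store"],
                             has_data=lambda x: x not in misses,
                             stop_condition=lambda x, p: x == "store" and bool(p))
      print(pend)                             # both loads requested before the stall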