Patents Examined by Shawn Doman
  • Patent number: 12260221
    Abstract: Circuitry comprises processing circuitry configured to execute program instructions in dependence upon respective trigger conditions matching a current trigger state and to set a next trigger state in response to program instruction execution; the processing circuitry comprising: instruction storage configured to selectively provide a group of two or more program instructions for execution in parallel; and trigger circuitry responsive to the generation of a trigger state by execution of program instructions and to a trigger condition associated with a given group of program instructions, to control the instruction storage to provide program instructions of the given group of program instructions for execution.
    Type: Grant
    Filed: January 19, 2022
    Date of Patent: March 25, 2025
    Assignee: Arm Limited
    Inventors: Mbou Eyole, Giacomo Gabrielli, Balaji Venu
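A minimal software sketch of the trigger-driven execution model this abstract describes (not Arm's implementation; all names and the two-state program below are hypothetical): a group of instructions is provided for parallel execution when its trigger condition matches the current trigger state, and executing the group sets the next trigger state.

```python
# Triggered-instruction scheduler sketch: groups maps a trigger state to
# (list_of_ops, next_state). A group issues only when its trigger condition
# (here, its key) matches the current trigger state; issuing it sets the
# next trigger state.

def run_triggered(groups, initial_state, max_steps=100):
    state = initial_state
    trace = []
    for _ in range(max_steps):
        if state not in groups:
            break  # no group's trigger condition matches: halt
        ops, next_state = groups[state]
        trace.append(list(ops))  # the whole group is provided for execution in parallel
        state = next_state       # execution sets the next trigger state
    return trace

# Hypothetical two-state program: state "load" issues two loads in parallel,
# then state "mac" issues a multiply-accumulate and transitions to "done".
program = {
    "load": (["ld r0", "ld r1"], "mac"),
    "mac":  (["mac r2, r0, r1"], "done"),
}
trace = run_triggered(program, "load")
```

The trace shows each step issuing a whole group at once, which is the point of selecting a group (rather than a single instruction) per trigger match.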
  • Patent number: 12260218
    Abstract: There is provided an apparatus and method for data processing. The apparatus comprises post decode cracking circuitry responsive to receipt of decoded instructions from decode circuitry of a processing pipeline, to crack the decoded instructions into micro-operations to be processed by processing circuitry of the processing pipeline. The post decode cracking circuitry is responsive to receipt of a decoded instruction suitable for cracking into a plurality of micro-operations including at least one pair of micro-operations having a producer-consumer data dependency, to generate the plurality of micro-operations including a producer micro-operation and a consumer micro-operation, and to assign a transfer register to transfer data between the producer micro-operation and the consumer micro-operation.
    Type: Grant
    Filed: June 28, 2023
    Date of Patent: March 25, 2025
    Assignee: Arm Limited
    Inventors: Quentin Éric Nouvel, Luca Nassi, Nicola Piano, Albin Pierrick Tonnerre, Geoffray Matthieu Lacourba
  • Patent number: 12242851
    Abstract: Methods and apparatus relating to verifying a compressed stream fused with copy or transform operation(s) are described. In an embodiment, compression logic circuitry compresses input data and stores the compressed data in a temporary buffer. The compression logic circuitry determines a first checksum value corresponding to the compressed data stored in the temporary buffer. Decompression logic circuitry performs a decompress-verify operation and a copy operation. The decompress-verify operation decompresses the compressed data stored in the temporary buffer to determine a second checksum value corresponding to the decompressed data from the temporary buffer. The copy operation transfers the compressed data from the temporary buffer to a destination buffer in response to a match between the first checksum value and the second checksum value. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: September 9, 2021
    Date of Patent: March 4, 2025
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D. Guilford, Daniel F. Cutter
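The compress/decompress-verify/copy flow above can be sketched in software. This is an illustration only, using `zlib` and CRC32 as stand-ins for the hardware compression engine and checksum logic (the patent does not specify those algorithms), and interpreting the first checksum as taken over the source data that the compressed stream must reproduce.

```python
# Sketch of fused compress + decompress-verify + copy: the compressed data
# leaves the temporary buffer for the destination only after the round-trip
# checksum check passes.
import zlib

def compress_verify_copy(input_data: bytes) -> bytes:
    # Compress into a temporary buffer; record the checksum the
    # compressed stream must reproduce on decompression.
    temp_buffer = zlib.compress(input_data)
    first_checksum = zlib.crc32(input_data)

    # Decompress-verify: decompress the temporary buffer and re-checksum.
    second_checksum = zlib.crc32(zlib.decompress(temp_buffer))

    # Copy: transfer to the destination buffer only on a checksum match.
    if first_checksum != second_checksum:
        raise ValueError("compressed stream failed verification")
    return bytes(temp_buffer)

out = compress_verify_copy(b"abc" * 100)
```

Fusing the verify with the copy means a corrupted compressed stream is caught before it ever reaches the destination buffer.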
  • Patent number: 12242848
    Abstract: Examples of the present disclosure provide apparatuses and methods for determining a vector population count in a memory. An example method comprises determining, using sensing circuitry, a vector population count of a number of fixed length elements of a vector stored in a memory array.
    Type: Grant
    Filed: May 25, 2023
    Date of Patent: March 4, 2025
    Inventor: Sanjay Tiwari
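A plain-Python rendering of the operation the abstract names (the patent performs it with in-memory sensing circuitry; this is only the functional behavior): split a packed vector into fixed-length elements and count the set bits in each.

```python
# Vector population count over fixed-length elements: the vector is a flat
# bit pattern, sliced into equal-width lanes; each lane's set bits are counted.

def vector_popcount(value: int, element_bits: int, num_elements: int):
    mask = (1 << element_bits) - 1
    counts = []
    for i in range(num_elements):
        element = (value >> (i * element_bits)) & mask
        counts.append(bin(element).count("1"))  # per-element popcount
    return counts

# Four 8-bit elements packed into one integer, lowest element first.
packed = (0xFF << 24) | (0x0F << 16) | (0x03 << 8) | 0x01
counts = vector_popcount(packed, element_bits=8, num_elements=4)
```

`counts[0]` is the popcount of the lowest lane (0x01); the element width is a parameter, matching the abstract's "fixed length elements of a vector."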
  • Patent number: 12210876
    Abstract: Instruction set architectures (ISAs) and apparatus and methods related thereto comprise an instruction set that includes one or more instructions which identify the global pointer (GP) register as an operand (e.g., base register or source register) of the instruction. Identification can be implicit. By implicitly identifying the GP register as an operand of the instruction, one or more bits of the instruction that were dedicated to explicitly identifying the operand (e.g., base register or source register) can be used to extend the size of one or more other operands, such as the offset or immediate, to provide longer offsets or immediates.
    Type: Grant
    Filed: August 31, 2018
    Date of Patent: January 28, 2025
    Assignee: MIPS Tech, LLC
    Inventors: James Hippisley Robinson, Morgyn Taylor, Matthew Fortune, Richard Fuhler, Sanjay Patel
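The encoding arithmetic behind the abstract can be illustrated with hypothetical field widths, loosely modeled on a 32-bit MIPS-style load: if the GP register is implied by the opcode instead of named in a 5-bit base-register field, those 5 bits can widen the offset.

```python
# Offset reach gained by making the GP base register implicit.
# Field widths here are assumptions for illustration, not the patent's.

EXPLICIT_OFFSET_BITS = 16   # offset width when the base register is encoded
BASE_REG_FIELD_BITS = 5     # bits freed by identifying GP implicitly
IMPLICIT_OFFSET_BITS = EXPLICIT_OFFSET_BITS + BASE_REG_FIELD_BITS  # 21 bits

def max_signed_offset(bits: int) -> int:
    # Largest positive two's-complement value representable in `bits` bits.
    return (1 << (bits - 1)) - 1

explicit_reach = max_signed_offset(EXPLICIT_OFFSET_BITS)
implicit_reach = max_signed_offset(IMPLICIT_OFFSET_BITS)
```

Under these assumed widths, the implicit-GP form reaches 32x farther from the global pointer, which is the benefit the abstract claims for longer offsets or immediates.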
  • Patent number: 12204908
    Abstract: A branch predictor predicts a first outcome of a first branch in a first block of instructions. Fetch logic fetches instructions for speculative execution along a first path indicated by the first outcome. Information representing a remainder of the first block is stored in response to the first predicted outcome being taken. In response to the first branch instruction being not taken, the branch predictor is restarted based on the remainder block. In some cases, entries corresponding to second blocks along speculative paths from the first block are accessed using an address of the first block as an index into a branch prediction structure. Outcomes of branch instructions in the second blocks are concurrently predicted using a corresponding set of instances of branch conditional logic and the predicted outcomes are used in combination with the remainder block to restart the branch predictor in response to mispredictions.
    Type: Grant
    Filed: June 4, 2018
    Date of Patent: January 21, 2025
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Marius Evers, Douglas Williams, Ashok T. Venkatachar, Sudherssen Kalaiselvan
  • Patent number: 12197308
    Abstract: On-circuit utilization monitoring may be performed for a systolic array. A current utilization measurement may be determined for processing elements of a systolic array and compared with a prior utilization measurement. Based on the comparison, a throttling recommendation may be provided to a management component to determine whether to perform the throttling recommendation.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: January 14, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Ron Diamant
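The monitor-and-recommend loop in the abstract can be sketched as follows. The thresholds and recommendation vocabulary below are invented for illustration; the patent only specifies comparing a current utilization measurement against a prior one and passing a recommendation to a management component.

```python
# Compare current vs. prior utilization of a systolic array's processing
# elements and emit a throttling recommendation; the management component
# decides whether to act on it.

def utilization(busy_pes: int, total_pes: int) -> float:
    return busy_pes / total_pes

def throttle_recommendation(current: float, prior: float,
                            rise_threshold: float = 0.10) -> str:
    if current - prior > rise_threshold:
        return "throttle-down"   # utilization rising sharply
    if prior - current > rise_threshold:
        return "throttle-up"     # utilization falling sharply
    return "hold"

rec = throttle_recommendation(utilization(120, 128), utilization(64, 128))
```

Keeping the decision with a separate management component, as the abstract does, lets the same on-circuit monitor serve different power or thermal policies.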
  • Patent number: 12198222
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
    Type: Grant
    Filed: December 7, 2023
    Date of Patent: January 14, 2025
    Assignee: Intel Corporation
    Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
  • Patent number: 12190115
    Abstract: A processor configured to: store prediction information that is obtained from a call instruction that is fetched in a RAS fetch; acquire the prediction information that is used in a return instruction from the RAS fetch and delete the prediction information; store the prediction information in a RAS complete after completion of execution of the call instruction and delete the prediction information from the RAS complete after completion of execution of the return instruction; specify a first entry that is closest to the top of the queue among the entries in which the branch instruction on which the branch prediction error is detected is re-executed; store and delete the prediction information in and from the RAS fetch according to the call instruction and the return instruction that are stored in entries closer to the top than the first entry; and cause the first entry to be re-executed.
    Type: Grant
    Filed: June 6, 2023
    Date of Patent: January 7, 2025
    Assignee: FUJITSU LIMITED
    Inventors: Chunye You, Ryohei Okazaki
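The two-structure return-address-stack scheme above can be modeled simply: a speculative "RAS fetch" updated at fetch time and a "RAS complete" updated at completion, so the fetch-time stack can be repaired after a branch misprediction. This is a simplified toy model, not Fujitsu's design; the repair here rebuilds the whole speculative stack rather than replaying per-entry as the abstract describes.

```python
# Dual return-address stacks: speculative (fetch) and architectural (complete).

class DualRAS:
    def __init__(self):
        self.ras_fetch = []     # pushed/popped speculatively at fetch
        self.ras_complete = []  # pushed/popped at completion (known-good)

    def fetch_call(self, return_addr):
        self.ras_fetch.append(return_addr)

    def fetch_return(self):
        return self.ras_fetch.pop()   # predicted return target

    def complete_call(self, return_addr):
        self.ras_complete.append(return_addr)

    def complete_return(self):
        self.ras_complete.pop()

    def repair(self):
        # On a misprediction, rebuild the speculative stack from the
        # architecturally completed one.
        self.ras_fetch = list(self.ras_complete)

ras = DualRAS()
ras.fetch_call(0x100); ras.complete_call(0x100)
ras.fetch_call(0x200)           # speculative call on a wrong path
ras.repair()                    # misprediction: discard the wrong-path entry
predicted = ras.fetch_return()  # now predicts the correct return address
```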
  • Patent number: 12190114
    Abstract: In one embodiment, a processor includes a branch predictor to predict whether a branch instruction is to be taken and a branch target buffer (BTB) coupled to the branch predictor. The branch target buffer may be segmented into a first cache portion and a second cache portion, where, in response to an indication that the branch is to be taken, the BTB is to access an entry in one of the first cache portion and the second cache portion based at least in part on a type of the branch instruction, an occurrence frequency of the branch instruction, and spatial information regarding a distance between a target address of a target of the branch instruction and an address of the branch instruction. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: January 7, 2025
    Assignee: Intel Corporation
    Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, Sr Swamy Saranam Chongala
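The placement decision the abstract describes can be sketched as a simple policy function over the three inputs it names: branch type, occurrence frequency, and branch-to-target distance. The specific thresholds and portion names below are invented; the patent does not publish an exact policy.

```python
# Segmented-BTB placement sketch: pick which cache portion holds an entry
# based on branch type, how often the branch occurs, and the distance
# between the branch address and its target address.

def choose_btb_portion(branch_type: str, frequency: int,
                       branch_addr: int, target_addr: int) -> str:
    distance = abs(target_addr - branch_addr)
    # Hypothetical policy: hot, short-distance direct branches go to the
    # first (small, fast) portion; everything else to the second portion.
    if branch_type == "direct" and frequency >= 16 and distance < 4096:
        return "portion-1"
    return "portion-2"

p = choose_btb_portion("direct", frequency=64,
                       branch_addr=0x1000, target_addr=0x1200)
```

Segmenting by these properties lets short-distance entries use a compact target encoding in one portion while the other portion keeps full-width targets.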
  • Patent number: 12175252
    Abstract: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread architecture, the general-purpose graphics compute unit to concurrently execute the first instruction and the second instruction.
    Type: Grant
    Filed: June 14, 2022
    Date of Patent: December 24, 2024
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Tatiana Shpeisman, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma
  • Patent number: 12169718
    Abstract: An apparatus comprises decoder circuitry to decode an instruction that includes an opcode to indicate a protected load operation, a source field for source memory address information, and a destination field to identify a destination register. The apparatus also comprises memory to store an allocate load-protect (LP) data structure with an entry for the identified destination register. The entry comprises an IP field and a status field. The apparatus also comprises load elision circuitry to (a) use the allocate LP data structure to determine whether the identified destination register has active status for the IP; (b) in response to determining that the identified destination register has active status for the IP, cause the instruction to be elided; and (c) in response to determining that the identified destination register does not have active status for the IP, cause the instruction to be executed. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: December 17, 2024
    Assignee: Intel Corporation
    Inventors: Vineeth Thamarassery Mekkat, Sebastian Christoph Albert Winkel, Rangeen Basu Roy Chowdhury
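A toy model of the protected-load elision above: an allocate load-protect (LP) table keyed by destination register, each entry holding an IP field and a status field; a protected load is elided when its destination register already has active status for that IP. The structure names mirror the abstract, but the behavior is deliberately simplified (e.g., nothing here invalidates an entry on an intervening store).

```python
# Protected-load elision sketch: skip the load when the LP table says the
# destination register still holds the value for this instruction pointer.

def protected_load(lp_table, dest_reg, ip, memory, regs, addr):
    entry = lp_table.get(dest_reg)
    if entry == (ip, "active"):
        return "elided"            # register already holds the protected value
    regs[dest_reg] = memory[addr]  # otherwise execute the load
    lp_table[dest_reg] = (ip, "active")
    return "executed"

memory = {0x40: 7}
regs, lp = {}, {}
first = protected_load(lp, "r3", ip=0x10, memory=memory, regs=regs, addr=0x40)
second = protected_load(lp, "r3", ip=0x10, memory=memory, regs=regs, addr=0x40)
```

The second encounter of the same protected load at the same IP finds an active entry and is elided, which is the redundant-load removal the abstract targets.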
  • Patent number: 12159142
    Abstract: Techniques are disclosed relating to predicting values for load operations. In some embodiments, front-end circuitry is configured to predict values of load operations based on multiple value tagged geometric length predictor (VTAGE) prediction tables (based on program counter information and branch history information). Training circuitry may adjust multiple VTAGE learning tables based on completed load operations. Control circuitry may pre-compute access information (e.g., an index) for a VTAGE learning table for a load based on branch history information that is available to the front-end circuitry but that is unavailable to the training circuitry, store the pre-computed access information, and provide the pre-computed access information from the first storage circuitry to the training circuitry to access the VTAGE learning table based on completion of the load. This may facilitate VTAGE training without pipelining the branch history information.
    Type: Grant
    Filed: May 2, 2023
    Date of Patent: December 3, 2024
    Assignee: Apple Inc.
    Inventors: Yuan C. Chou, Chang Xu, Deepankar Duggal, Debasish Chandra
  • Patent number: 12147836
    Abstract: Techniques and configurations enhancing the performance of hardware (HW) accelerators are provided. A schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators is provided, where the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator.
    Type: Grant
    Filed: November 5, 2021
    Date of Patent: November 19, 2024
    Assignee: Intel Corporation
    Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick
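The variable-depth adder tree can be sketched functionally: a depth-d tree combines 2**d partial sums per reduction pass. In the patent the depth comes from a compiler-generated dataflow schedule via configuration registers; here it is simply a parameter, and the reduction is ordinary software.

```python
# Configurable-depth partial-sum reduction: each pass sums groups of
# 2**depth values, modeling an adder tree whose depth is set per layer.

def adder_tree_reduce(partial_sums, depth):
    group = 1 << depth  # inputs combined per tree pass
    while len(partial_sums) > 1:
        partial_sums = [sum(partial_sums[i:i + group])
                        for i in range(0, len(partial_sums), group)]
    return partial_sums[0]

psums = list(range(16))  # partial sums from 16 hypothetical PEs
total_deep = adder_tree_reduce(psums, depth=4)     # one 16-input pass
total_shallow = adder_tree_reduce(psums, depth=1)  # binary tree, four passes
```

Both depths give the same sum; what changes is the number of passes (compute cycles), which is exactly the quantity the compiler's schedule is minimizing.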
  • Patent number: 12118397
    Abstract: The present disclosure describes techniques for accelerating data processing by offloading thread computation. An application may be started based on creating and executing a process by a host, the process associated with a plurality of threads. Creating a plurality of computation threads on a storage device may be requested based on determining that the storage device represents a computational storage. The plurality of computation threads may be created based on preloading a plurality of libraries in the storage device. The plurality of libraries may comprise executable codes associated with the plurality of threads. Data processing associated with the plurality of threads may be offloaded to the storage device using the plurality of computation threads. Activities associated with the plurality of computation threads may be managed by the process.
    Type: Grant
    Filed: September 15, 2022
    Date of Patent: October 15, 2024
    Assignees: Lemon Inc., Beijing Youzhuju Network Technology Co. Ltd.
    Inventors: Viacheslav Dubeyko, Jian Wang
  • Patent number: 12112167
    Abstract: Embodiments for gathering and scattering matrix data by row are disclosed. In an embodiment, a processor includes a storage matrix, a decoder, and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode and a first operand field to specify a set of irregularly spaced memory locations. The execution circuitry is to, in response to the decoded instruction, calculate a set of addresses corresponding to the set of irregularly spaced memory locations and transfer a set of rows of data between the storage and the set of irregularly spaced memory locations.
    Type: Grant
    Filed: June 27, 2020
    Date of Patent: October 8, 2024
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Menachem Adelman, Evangelos Georganas, Mark J. Charney, Nikita A. Shustrov, Sara Baghsorkhi
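The row-wise gather/scatter above can be illustrated with a flat list standing in for memory: given a set of irregularly spaced row base addresses, copy one row per address between memory and a tile of register storage. The tile shape and operand names are illustrative, not the instruction's actual format.

```python
# Row gather/scatter at irregularly spaced addresses.

def gather_rows(memory, row_addresses, row_len):
    # One destination-tile row per irregular base address.
    return [memory[addr:addr + row_len] for addr in row_addresses]

def scatter_rows(memory, row_addresses, tile):
    for addr, row in zip(row_addresses, tile):
        memory[addr:addr + len(row)] = row

memory = list(range(100))
# Rows need not be evenly spaced: each address is computed independently.
tile = gather_rows(memory, row_addresses=[3, 40, 77], row_len=4)
scatter_rows(memory, [0, 8, 16], tile)
```

Moving a whole row per address (rather than one element, as a classic gather does) is what distinguishes the operation the abstract describes.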
  • Patent number: 12112141
    Abstract: A method for performing a convolution operation includes: storing a convolution kernel in a first storage device, the convolution kernel having dimensions x by y; storing, in a second storage device, a first subset of element values of an input feature map having dimensions n by m; performing a first simultaneous multiplication of each value of the first subset of element values of the input feature map with a first element value from among the x*y elements of the convolution kernel; for each remaining value of the x*y elements of the convolution kernel, performing a simultaneous multiplication of the remaining value with a corresponding subset of element values of the input feature map; for each simultaneous multiplication, storing the result of the simultaneous multiplication in an accumulator; and outputting the values of the accumulator as a first row of an output feature map.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: October 8, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ali Shafiee Ardestani, Joseph Hassoun
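The loop schedule in the abstract can be rendered directly in software: for each of the x*y kernel element values in turn, multiply it against the matching subset of the input feature map "simultaneously" (here, a plain loop) and accumulate. Nested lists stand in for the two storage devices and the accumulator.

```python
# Kernel-element-stationary convolution: one kernel value at a time is
# multiplied against its corresponding window positions of the input
# feature map, with results accumulated into the output.

def conv2d_kernel_stationary(ifmap, kernel):
    n, m = len(ifmap), len(ifmap[0])
    x, y = len(kernel), len(kernel[0])
    out_h, out_w = n - x + 1, m - y + 1
    acc = [[0] * out_w for _ in range(out_h)]
    for ki in range(x):              # one kernel element value at a time
        for kj in range(y):
            w = kernel[ki][kj]
            for i in range(out_h):   # the "simultaneous" multiplication
                for j in range(out_w):
                    acc[i][j] += w * ifmap[i + ki][j + kj]
    return acc

ifmap = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
out = conv2d_kernel_stationary(ifmap, kernel)
```

Reordering the loops so the kernel element is outermost is what lets hardware broadcast one weight to many multipliers at once, which is the parallelism the claim is describing.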
  • Patent number: 12106111
    Abstract: A prediction unit includes a first predictor that provides an output comprising a hashed fetch address of a current fetch block in response to an input. The first predictor input comprises a hashed fetch address of a previous fetch block that immediately precedes the current fetch block in program execution order. A second predictor provides an output comprising a fetch address of a next fetch block that immediately succeeds the current fetch block in program execution order in response to an input. The second predictor input comprises the hashed fetch address of the current fetch block output by the first predictor.
    Type: Grant
    Filed: August 2, 2022
    Date of Patent: October 1, 2024
    Assignee: Ventana Micro Systems Inc.
    Inventors: John G. Favor, Michael N. Michael
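A toy model of the two-predictor chain above: the first predictor maps the previous fetch block's hashed address to the current block's hashed address, and the second predictor maps that hash to the next fetch block's full address. Plain dicts stand in for the prediction tables, and the hash function is invented for illustration.

```python
# Chained fetch-address predictors keyed by hashed fetch addresses.

def hash_addr(addr: int) -> int:
    return (addr ^ (addr >> 7)) & 0xFFFF  # hypothetical fold-down hash

class PredictionUnit:
    def __init__(self):
        self.first = {}   # hashed previous fetch addr -> hashed current addr
        self.second = {}  # hashed current addr -> next fetch addr

    def train(self, prev_addr, current_addr, next_addr):
        self.first[hash_addr(prev_addr)] = hash_addr(current_addr)
        self.second[hash_addr(current_addr)] = next_addr

    def predict_next(self, prev_addr):
        hashed_current = self.first.get(hash_addr(prev_addr))
        return self.second.get(hashed_current)

pu = PredictionUnit()
pu.train(prev_addr=0x1000, current_addr=0x1040, next_addr=0x2000)
nxt = pu.predict_next(0x1000)
```

Passing only a short hash between the two predictors, as the abstract does, keeps the first-level table narrow while still steering the second predictor to a full next-fetch address.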
  • Patent number: 12106099
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an asynchronous packet network having a plurality of data transmission lines forming a data path transmitting operand data; a synchronous mesh communication network; a plurality of configurable circuits arranged in an array, each configurable circuit of the plurality of configurable circuits coupled to the asynchronous packet network and to the synchronous mesh communication network, each configurable circuit of the plurality of configurable circuits adapted to perform a plurality of computations; each configurable circuit of the plurality of configurable circuits comprising: a memory storing operand data; and an execution or write mask generator adapted to generate an execution mask or a write mask identifying valid bits or bytes transmitted on the data path or stored in the memory for a current or next computation.
    Type: Grant
    Filed: October 8, 2023
    Date of Patent: October 1, 2024
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 12106107
    Abstract: A memory device includes a memory having a memory bank, a processor in memory (PIM) circuit, and control logic. The PIM circuit includes instruction memory storing at least one instruction provided from a host. The PIM circuit is configured to process an operation using data provided by the host or data read from the memory bank and to store at least one instruction provided by the host. The control logic is configured to decode a command/address received from the host to generate a decoding result and to perform a control operation so that one of i) a memory operation on the memory bank is performed and ii) the PIM circuit performs a processing operation, based on the decoding result. A counting value of a program counter instructing a position of the instruction memory is controlled in response to the command/address instructing the processing operation be performed.
    Type: Grant
    Filed: March 31, 2023
    Date of Patent: October 1, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Sukhan Lee, Shinhaeng Kang, Namsung Kim, Seongil O, Hak-Soo Yu