Patents Examined by Corey S Faherty
  • Patent number: 11966739
    Abstract: There is provided an apparatus, method and medium for data processing. The apparatus comprises a register file comprising a plurality of data registers, and frontend circuitry responsive to an issued instruction, to control processing circuitry to perform a processing operation to process an input data item to generate an output data item. The processing circuitry is responsive to a first encoding of the issued instruction specifying a data register, to read the input data item from the data register, and/or write the output data item to the data register. The processing circuitry is responsive to a second encoding of the issued instruction specifying a buffer-region of the register file for storing a queue of data items, to perform the processing operation and to perform a dequeue operation to dequeue the input data item from the queue, and/or perform an enqueue operation to enqueue the output data item to the queue.
    Type: Grant
    Filed: September 9, 2022
    Date of Patent: April 23, 2024
    Assignee: Arm Limited
    Inventors: Matthew James Walker, Mbou Eyole, Giacomo Gabrielli, Balaji Venu
  • Patent number: 11960897
    Abstract: In some implementations, a processor includes a plurality of parallel instruction pipes, a register file includes at least one shared read port configured to be shared across multiple pipes of the plurality of parallel instruction pipes. Control logic controls multiple parallel instruction pipes to read from the at least one shared read port. In certain examples, the at least one shared register file read port is coupled as a single read port for one of the parallel instruction pipes and as a shared register file read port for a plurality of other parallel instruction pipes.
    Type: Grant
    Filed: July 30, 2021
    Date of Patent: April 16, 2024
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Michael Estlick, Erik Swanson, Eric Dixon, Todd Baumgartner
  • Patent number: 11960887
    Abstract: Techniques related to packing pieces of data having variable bit lengths to serial packed data using a graphics processing unit and a central processing unit are discussed. Such techniques include executing bit shift operations for the pieces of data in parallel via execution units of the graphics processing unit and packing the bit shifted pieces of data via the central processing unit.
    Type: Grant
    Filed: March 3, 2020
    Date of Patent: April 16, 2024
    Assignee: Intel Corporation
    Inventors: Bin Wang, Bo Peng
  • Patent number: 11960884
    Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.
    Type: Grant
    Filed: November 2, 2021
    Date of Patent: April 16, 2024
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
  • Patent number: 11960889
    Abstract: A method and system for moving data from a source memory to a destination memory by a processor is disclosed herein. The destination memory stores a sequence of instructions and the sequence of instructions comprises one or more load instructions and one or more store instructions. The processor initially moves the one or more store instructions from the destination memory to the source memory. The processor then executes the one or more load instructions from the destination memory. On executing the one or more load instructions, the data is loaded from the source memory to at least one register in the processor. The processor further initiates execution of the one or more store instructions stored in the source memory. On executing the one or more store instructions from the source memory, the processor stores the data from the at least one register to the destination memory.
    Type: Grant
    Filed: March 25, 2021
    Date of Patent: April 16, 2024
    Assignee: Nordic Semiconductor ASA
    Inventor: Chris Smith
  • Patent number: 11954497
    Abstract: A method and system for moving data from a source memory to a destination memory by a processor are disclosed. The processor has a plurality of registers and the source memory stores a sequence of instructions that include one or more load instructions and one or more store instructions. The processor moves the load instructions from the source memory to the destination memory. Then, the processor initiates execution of the load instructions from the destination memory in order to load the data from the source memory to one or more registers in the processor. Execution then returns to the sequence of instructions stored in the source memory, and the processor stores the data from the registers to the destination memory.
    Type: Grant
    Filed: March 25, 2021
    Date of Patent: April 9, 2024
    Assignee: Nordic Semiconductor ASA
    Inventor: Chris Smith
  • Patent number: 11947486
    Abstract: The computing efficiency of an electronic computing device is improved. HPCs 20 to 23 include arithmetic processing units HA0 to HA3, respectively. Each of the arithmetic processing units HA0 to HA3 executes arithmetic processing in parallel. LPCs 30 to 33 includes management processing units LB0 to LB3, respectively. Each of the management processing units LB0 to LB3 manages execution of specific processing by an accelerator 6 when each of the arithmetic processing units HA0 to HA3 causes the accelerator 6 to execute the specific processing, and performs a series of commands for causing the accelerator 6 to execute the specific processing on a DMA controller 5 and the accelerator 6.
    Type: Grant
    Filed: March 25, 2020
    Date of Patent: April 2, 2024
    Assignee: Hitachi Astemo, Ltd.
    Inventors: Tatsuya Horiguchi, Tasuku Ishigooka, Kazuyoshi Serizawa, Tsunamichi Tsukidate
  • Patent number: 11940945
    Abstract: An exemplary SIMD computing system comprises a SIMD processing element (SPE) configured to perform a selected operation on a portion of a processor input data word, with the operation selected by control signals read from a control memory location addressed by a decoded instruction. The SPE may comprise one or more adder, multiplier, or multiplexer coupled to the control signals. The control signals may comprise one or more bit read from the control memory. The control memory may be an M×N (M rows by N columns) memory having M possible SIMD operations and N control signals. Each instruction decoded may select an SPE operation from among N rows. A plurality of SPEs may receive the same control signals. The control memory may be rewritable, advantageously permitting customizable SIMD operations that are reconfigurable by storing in the control memory locations control signals designed to cause the SPE to perform selected operations.
    Type: Grant
    Filed: December 31, 2021
    Date of Patent: March 26, 2024
    Inventor: Heonchul Park
  • Patent number: 11934837
    Abstract: An SIMD instruction generation and processing method and a related device are provided. The method may include: obtaining a length of each loop dimension of a first tensor formula; selecting, from a plurality of groups of information about a first SIMD instruction model based on the length of each loop dimension of a first tensor formula, information about a second SIMD instruction model matching the first tensor formula; generating, based on a length of at least one loop dimension of the first tensor formula and the second SIMD instruction model, a first SIMD instruction obtained after the first tensor formula is converted. The information about a second SIMD instruction model is selected from the plurality of groups of information about a first SIMD instruction model based on the length of each loop dimension of the tensor formula.
    Type: Grant
    Filed: September 12, 2022
    Date of Patent: March 19, 2024
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Chen Wu, Yifan Lin, Xiaoqiang Dan
  • Patent number: 11928472
    Abstract: Methods and apparatus relating to branch prefetch mechanisms for mitigating front-end branch resteers are described. In an embodiment, predecodes an entry in a cache to generate a predecoded branch operation. The entry is associated with a cold branch operation, where the cold branch operation corresponds to an operation that is detected for a first time after storage in an instruction cache and wherein the cold branch operation remains undecoded since it is stored at a location in a cache line prior to a subsequent location of a branch operation in the cache line. The predecoded branch operation is stored in a Branch Prefetch Buffer (BPB) in response to a cache line fill operation of the cold branch operation in an instruction cache. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: September 26, 2020
    Date of Patent: March 12, 2024
    Assignee: Intel Corporation
    Inventors: Gilles Pokam, Jared Warner Stark, IV, Niranjan Kumar Soundararajan, Oleg Ladin
  • Patent number: 11928475
    Abstract: An exemplary fault-tolerant computing system comprises a secondary processor configured to execute in delayed lock step with a primary processor from a common program store, comparators in the store data and writeback paths to detect a fault based on comparing primary and secondary processor states, and a writeback path delay permitting aborting execution when a fault is detected, before writeback of invalid data. The secondary processor execution and the primary processor store data and writeback may be delayed a predetermined number of cycles, permitting fault detection before writing invalid data. Store data and writeback paths may include triple module redundancy configured to pass only majority data through the store data and writeback path delay stages. Some implementations may forward data from the store data path delay stages to the writeback stage or memory if the load data address matches the address of data in a store data path delay stage.
    Type: Grant
    Filed: November 5, 2021
    Date of Patent: March 12, 2024
    Assignee: Ceremorphic, Inc.
    Inventor: Heonchul Park
  • Patent number: 11922166
    Abstract: A Very Long Instruction Word (VLIW) digital signal processor particularly adapted for single instruction multiple data (SIMD) operation on various operand widths and data sizes. A vector compare instruction compares first and second operands and stores compare bits. A companion vector conditional instruction performs conditional operations based upon the state of a corresponding predicate data register bit. A predicate unit performs data processing operations on data in at least one predicate data register including unary operations and binary operations. The predicate unit may also transfer data between a general data register file and the predicate data register file.
    Type: Grant
    Filed: January 17, 2023
    Date of Patent: March 5, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Duc Quang Bui, Mujibur Rahman, Joseph Raymond Michael Zbiciak, Eric Biscondi, Peter Dent, Jelena Milanovic, Ashish Shrivastava
  • Patent number: 11922167
    Abstract: Disclosed herein is a method for managing of NOP instructions in a microcontroller, the method comprising duplicating all jump instructions causing a NOP instruction to form a new instruction set; inserting an internal NOP instruction into each of the jump instructions; when a jump instruction is executed, executing a subsequent instruction of the new instruction set; and executing the internal NOP instruction when an execution of the subsequent instruction is skipped.
    Type: Grant
    Filed: February 24, 2023
    Date of Patent: March 5, 2024
    Assignee: SK hynix Inc.
    Inventors: Giulio Martinozzi, Federica Arosio, Lorenzo Di Lalla
  • Patent number: 11922293
    Abstract: An apparatus for identification of an input data against one or more learned signals is provided. The apparatus comprising a number of computational cores, each core comprises properties having at least some statistical independency from other of the computational, the properties being set independently of each other core, each core being able to independently produce an output indicating recognition of a previously learned signal, the apparatus being further configured to process the produced outputs from the number of computational cores and determining an identification of the input data based the produced outputs.
    Type: Grant
    Filed: September 16, 2019
    Date of Patent: March 5, 2024
    Assignee: Cortica Ltd.
    Inventors: Igal Raichelgauz, Karina Odinaev, Yehoshua Y. Zeevi
  • Patent number: 11915004
    Abstract: A data processing apparatus is provided that includes bimodal control flow prediction circuitry for performing a prediction of whether a conditional control flow instruction will be taken. Storage circuitry stores, in association with the control flow instruction, a stored state of the data processing apparatus and reversal circuitry reverses the prediction in dependence on the stored state of the data processing apparatus corresponding with a current state of the data processing apparatus when execution of the control flow instruction is to be performed.
    Type: Grant
    Filed: December 20, 2021
    Date of Patent: February 27, 2024
    Assignee: Arm Limited
    Inventors: Houdhaifa Bouzguarrou, Thibaut Elie Lanois, Guillaume Bolbenes
  • Patent number: 11907722
    Abstract: Aspects of the present disclosure relate to an apparatus comprising processing circuitry, prefetch circuitry and prefetch metadata storage comprising a plurality of entries. Metadata items, each associated with a given stream of instructions, are stored in the prefetch metadata storage. Responsive to a given entry of the plurality of entries being associated with the given stream associated with a given metadata item, the given entry is updated. Responsive to no entry of the plurality of entries being associated with the given stream associated with a given metadata item, an entry is selected according to a default replacement policy, the given stream is allocated thereto, and the selected entry is updated based on the given metadata item. Responsive to a switch condition being met, the default selection policy is switched to an alternative selection policy comprising locking one or more entries by preventing allocation of streams to the locked entries.
    Type: Grant
    Filed: April 20, 2022
    Date of Patent: February 20, 2024
    Assignee: Arm Limited
    Inventors: Luca Maroncelli, Harvin Iriawan, Peter Raphael Eid, Cédric Denis Robert Airaud
  • Patent number: 11907725
    Abstract: A computer comprising a plurality of processors, each of which are configured to perform operations on data during a compute phase for the computer and, following a pre-compiled synchronisation barrier, exchange data with at least one other of the processors during an exchange phase for the computer, wherein of the processors in the computer is indexed and the data exchange operations carried out by each processor in the exchange phase depend upon its index value.
    Type: Grant
    Filed: February 3, 2023
    Date of Patent: February 20, 2024
    Assignee: GRAPHCORE LIMITED
    Inventors: Richard Osborne, Matthew Fyles
  • Patent number: 11907723
    Abstract: A data processing apparatus is provided. Rename circuitry performs a register rename stage of a pipeline by storing, in storage circuitry, mappings between registers. Each of the mappings is associated with an elimination field value. Operation elimination circuitry replaces an operation that indicates an action is to be performed on data from a source register and stored in a destination register, with a new mapping in the storage circuitry that references the destination register and has the elimination field value set. Operation circuitry responds to a subsequent operation that accesses the destination register when the elimination field value is set; by obtaining contents of the source register, performing the action on the contents to obtain a result, and returning the result.
    Type: Grant
    Filed: March 21, 2022
    Date of Patent: February 20, 2024
    Assignee: Arm Limited
    Inventors: Nicholas Andrew Plante, Joseph Michael Pusdesris, Jungsoo Kim
  • Patent number: 11907720
    Abstract: There is provided a data processing apparatus comprising a plurality of registers, each of the registers having data bits to store data and metadata bits to store metadata. Each of the registers is adapted to operate in a metadata mode in which the metadata bits and the data bits are valid, and a data mode in which the data bits are valid and the metadata bits are invalid. Mode bit storage circuitry indicates whether each of the registers is in the data mode or the metadata mode. Execution circuitry is responsive to a memory operation that is a store operation on one or more given registers.
    Type: Grant
    Filed: November 26, 2020
    Date of Patent: February 20, 2024
    Assignee: Arm Limited
    Inventors: Bradley John Smith, Thomas Christopher Grocutt
  • Patent number: 11907717
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: February 8, 2023
    Date of Patent: February 20, 2024
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz