Patents Examined by Corey S Faherty
-
Patent number: 11966739
Abstract: There is provided an apparatus, method and medium for data processing. The apparatus comprises a register file comprising a plurality of data registers, and frontend circuitry responsive to an issued instruction, to control processing circuitry to perform a processing operation to process an input data item to generate an output data item. The processing circuitry is responsive to a first encoding of the issued instruction specifying a data register, to read the input data item from the data register, and/or write the output data item to the data register. The processing circuitry is responsive to a second encoding of the issued instruction specifying a buffer-region of the register file for storing a queue of data items, to perform the processing operation and to perform a dequeue operation to dequeue the input data item from the queue, and/or perform an enqueue operation to enqueue the output data item to the queue.
Type: Grant
Filed: September 9, 2022
Date of Patent: April 23, 2024
Assignee: Arm Limited
Inventors: Matthew James Walker, Mbou Eyole, Giacomo Gabrielli, Balaji Venu
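The two encodings above can be illustrated with a minimal Python sketch. This is an assumed toy model, not Arm's implementation: a register file whose upper region doubles as a circular queue, and an `execute` helper whose first encoding reads and overwrites a named register while its second encoding dequeues its input and enqueues its output. All names and the buffer layout are illustrative.

```python
class RegisterFile:
    def __init__(self, num_regs=32, buf_start=16, buf_len=8):
        self.regs = [0] * num_regs
        # Buffer-region [buf_start, buf_start+buf_len) holds a circular queue.
        self.buf_start, self.buf_len = buf_start, buf_len
        self.head = self.tail = self.count = 0

    def dequeue(self):
        assert self.count > 0, "queue underflow"
        item = self.regs[self.buf_start + self.head]
        self.head = (self.head + 1) % self.buf_len
        self.count -= 1
        return item

    def enqueue(self, item):
        assert self.count < self.buf_len, "queue overflow"
        self.regs[self.buf_start + self.tail] = item
        self.tail = (self.tail + 1) % self.buf_len
        self.count += 1

def execute(rf, op, *, reg=None, use_queue=False):
    """First encoding: a named register is read and overwritten.
    Second encoding: the input is dequeued and the output enqueued."""
    if use_queue:
        rf.enqueue(op(rf.dequeue()))
    else:
        rf.regs[reg] = op(rf.regs[reg])
```

With this model, `execute(rf, f, reg=3)` behaves like a conventional register-to-register instruction, while `execute(rf, f, use_queue=True)` consumes from and produces to the queue without naming any register.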
-
Patent number: 11960897
Abstract: In some implementations, a processor includes a plurality of parallel instruction pipes and a register file that includes at least one shared read port configured to be shared across multiple pipes of the plurality of parallel instruction pipes. Control logic controls multiple parallel instruction pipes to read from the at least one shared read port. In certain examples, the at least one shared register file read port is coupled as a single read port for one of the parallel instruction pipes and as a shared register file read port for a plurality of other parallel instruction pipes.
Type: Grant
Filed: July 30, 2021
Date of Patent: April 16, 2024
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Michael Estlick, Erik Swanson, Eric Dixon, Todd Baumgartner
-
Patent number: 11960887
Abstract: Techniques related to packing pieces of data having variable bit lengths to serial packed data using a graphics processing unit and a central processing unit are discussed. Such techniques include executing bit shift operations for the pieces of data in parallel via execution units of the graphics processing unit and packing the bit shifted pieces of data via the central processing unit.
Type: Grant
Filed: March 3, 2020
Date of Patent: April 16, 2024
Assignee: Intel Corporation
Inventors: Bin Wang, Bo Peng
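The GPU/CPU split described above can be sketched in a few lines of Python. This is an illustrative model, not Intel's implementation: step 1 shifts each masked piece to its bit offset (each element is independent, so this map is what would parallelize across GPU execution units), and step 2 serially ORs the shifted pieces into one packed bitstream (the CPU step). The field layout (low bits first) is an assumption.

```python
def pack_variable_bits(values_and_widths):
    """values_and_widths: list of (value, bit_width) pairs."""
    # Step 1 ("GPU"): compute each piece's offset and shifted value.
    # Each piece is independent, so this loop parallelizes trivially.
    offsets, total = [], 0
    for _, width in values_and_widths:
        offsets.append(total)
        total += width
    shifted = [(v & ((1 << w) - 1)) << off
               for (v, w), off in zip(values_and_widths, offsets)]
    # Step 2 ("CPU"): serially OR the shifted pieces into one integer.
    packed = 0
    for piece in shifted:
        packed |= piece
    return packed, total  # packed bits and total bit length
```

For example, packing a 3-bit field `0b101` followed by a 2-bit field `0b11` yields the 5-bit value `0b11101`.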
-
Patent number: 11960884
Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.
Type: Grant
Filed: November 2, 2021
Date of Patent: April 16, 2024
Assignee: Intel Corporation
Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
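The two-operation split follows directly from expanding (a + bi)(c + di) = (ac - bd) + (ad + bc)i: each operation contributes one term of the real component and one term of the imaginary component. A minimal Python sketch, where the particular term grouping (first operand's real part, then its imaginary part) is an assumption for illustration rather than the patented encoding:

```python
def complex_mul_two_ops(a, b):
    """a, b: (real, imag) tuples."""
    ar, ai = a
    br, bi = b
    # Operation 1: first term of the real part (ar*br) and of the
    # imaginary part (ar*bi).
    acc_re, acc_im = ar * br, ar * bi
    # Operation 2: second term of the real part (-ai*bi) and of the
    # imaginary part (ai*br), accumulated onto the first terms.
    acc_re += -ai * bi
    acc_im += ai * br
    return (acc_re, acc_im)
```

For example, (1 + 2i)(3 + 4i) = (3 - 8) + (4 + 6)i = -5 + 10i.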
-
Patent number: 11960889
Abstract: A method and system for moving data from a source memory to a destination memory by a processor is disclosed herein. The destination memory stores a sequence of instructions and the sequence of instructions comprises one or more load instructions and one or more store instructions. The processor initially moves the one or more store instructions from the destination memory to the source memory. The processor then executes the one or more load instructions from the destination memory. On executing the one or more load instructions, the data is loaded from the source memory to at least one register in the processor. The processor further initiates execution of the one or more store instructions stored in the source memory. On executing the one or more store instructions from the source memory, the processor stores the data from the at least one register to the destination memory.
Type: Grant
Filed: March 25, 2021
Date of Patent: April 16, 2024
Assignee: Nordic Semiconductor ASA
Inventor: Chris Smith
-
Patent number: 11954497
Abstract: A method and system for moving data from a source memory to a destination memory by a processor are disclosed. The processor has a plurality of registers and the source memory stores a sequence of instructions that include one or more load instructions and one or more store instructions. The processor moves the load instructions from the source memory to the destination memory. Then, the processor initiates execution of the load instructions from the destination memory in order to load the data from the source memory to one or more registers in the processor. Execution then returns to the sequence of instructions stored in the source memory, and the processor stores the data from the registers to the destination memory.
Type: Grant
Filed: March 25, 2021
Date of Patent: April 9, 2024
Assignee: Nordic Semiconductor ASA
Inventor: Chris Smith
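Both of the Nordic Semiconductor patents above (11960889 and 11954497) describe the same two-phase load/store copy pattern, differing in which memory the relocated instructions execute from. A toy Python model of the common data path, an assumed structure rather than either patented implementation: load instructions fill the processor's registers from source memory one register-file-sized chunk at a time, then store instructions drain the registers into destination memory.

```python
def two_phase_copy(src, dst, num_regs=4):
    """src, dst: equal-length lists standing in for the two memories;
    num_regs: how many registers the processor can fill per pass."""
    for base in range(0, len(src), num_regs):
        hi = min(base + num_regs, len(src))
        # Phase 1 -- the "load instructions": source memory -> registers.
        regs = [src[i] for i in range(base, hi)]
        # Phase 2 -- the "store instructions": registers -> destination.
        for r, i in zip(regs, range(base, hi)):
            dst[i] = r
    return dst
```

The point of the patented schemes is where each phase's code resides, so that the memory being read or written is never also being fetched from; this sketch models only the register-mediated data movement itself.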
-
Patent number: 11947486
Abstract: The computing efficiency of an electronic computing device is improved. HPCs 20 to 23 include arithmetic processing units HA0 to HA3, respectively. Each of the arithmetic processing units HA0 to HA3 executes arithmetic processing in parallel. LPCs 30 to 33 include management processing units LB0 to LB3, respectively. Each of the management processing units LB0 to LB3 manages execution of specific processing by an accelerator 6 when each of the arithmetic processing units HA0 to HA3 causes the accelerator 6 to execute the specific processing, and performs a series of commands for causing the accelerator 6 to execute the specific processing on a DMA controller 5 and the accelerator 6.
Type: Grant
Filed: March 25, 2020
Date of Patent: April 2, 2024
Assignee: Hitachi Astemo, Ltd.
Inventors: Tatsuya Horiguchi, Tasuku Ishigooka, Kazuyoshi Serizawa, Tsunamichi Tsukidate
-
Patent number: 11940945
Abstract: An exemplary SIMD computing system comprises a SIMD processing element (SPE) configured to perform a selected operation on a portion of a processor input data word, with the operation selected by control signals read from a control memory location addressed by a decoded instruction. The SPE may comprise one or more adders, multipliers, or multiplexers coupled to the control signals. The control signals may comprise one or more bits read from the control memory. The control memory may be an M×N (M rows by N columns) memory having M possible SIMD operations and N control signals. Each instruction decoded may select an SPE operation from among N rows. A plurality of SPEs may receive the same control signals. The control memory may be rewritable, advantageously permitting customizable SIMD operations that are reconfigurable by storing in the control memory locations control signals designed to cause the SPE to perform selected operations.
Type: Grant
Filed: December 31, 2021
Date of Patent: March 26, 2024
Inventor: Heonchul Park
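The control-memory indirection above can be sketched in Python. This is an assumed illustration, not the patented design: a decoded opcode addresses a row of a small control memory, the row's "control signals" select what every lane does, and because the control memory is an ordinary mutable table, rewriting a row reconfigures the operation without touching the decoder.

```python
# Control memory: M rows addressed by decoded opcode; each row stands in
# for the N control-signal bits that configure the processing elements.
CONTROL_MEM = {
    0b00: "add",
    0b01: "mul",
    0b10: "max",
}

def simd_execute(opcode, lanes_a, lanes_b):
    """All lanes receive the same control signals, as in the abstract."""
    op = CONTROL_MEM[opcode]  # control signals read by decoded instruction
    if op == "add":
        return [a + b for a, b in zip(lanes_a, lanes_b)]
    if op == "mul":
        return [a * b for a, b in zip(lanes_a, lanes_b)]
    return [max(a, b) for a, b in zip(lanes_a, lanes_b)]
```

Reconfigurability is then just `CONTROL_MEM[0b10] = "mul"`: the same opcode now selects a different lane operation, mirroring the rewritable control memory the abstract describes.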
-
Patent number: 11934837
Abstract: A SIMD instruction generation and processing method and a related device are provided. The method may include: obtaining a length of each loop dimension of a first tensor formula; selecting, from a plurality of groups of information about a first SIMD instruction model based on the length of each loop dimension of the first tensor formula, information about a second SIMD instruction model matching the first tensor formula; and generating, based on a length of at least one loop dimension of the first tensor formula and the second SIMD instruction model, a first SIMD instruction obtained after the first tensor formula is converted. The information about the second SIMD instruction model is selected from the plurality of groups of information about the first SIMD instruction model based on the length of each loop dimension of the tensor formula.
Type: Grant
Filed: September 12, 2022
Date of Patent: March 19, 2024
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Chen Wu, Yifan Lin, Xiaoqiang Dan
-
Patent number: 11928472
Abstract: Methods and apparatus relating to branch prefetch mechanisms for mitigating front-end branch resteers are described. In an embodiment, an entry in a cache is predecoded to generate a predecoded branch operation. The entry is associated with a cold branch operation, where the cold branch operation corresponds to an operation that is detected for a first time after storage in an instruction cache and wherein the cold branch operation remains undecoded since it is stored at a location in a cache line prior to a subsequent location of a branch operation in the cache line. The predecoded branch operation is stored in a Branch Prefetch Buffer (BPB) in response to a cache line fill operation of the cold branch operation in an instruction cache. Other embodiments are also disclosed and claimed.
Type: Grant
Filed: September 26, 2020
Date of Patent: March 12, 2024
Assignee: Intel Corporation
Inventors: Gilles Pokam, Jared Warner Stark, IV, Niranjan Kumar Soundararajan, Oleg Ladin
-
Patent number: 11928475
Abstract: An exemplary fault-tolerant computing system comprises a secondary processor configured to execute in delayed lock step with a primary processor from a common program store, comparators in the store data and writeback paths to detect a fault based on comparing primary and secondary processor states, and a writeback path delay permitting aborting execution when a fault is detected, before writeback of invalid data. The secondary processor execution and the primary processor store data and writeback may be delayed a predetermined number of cycles, permitting fault detection before writing invalid data. Store data and writeback paths may include triple module redundancy configured to pass only majority data through the store data and writeback path delay stages. Some implementations may forward data from the store data path delay stages to the writeback stage or memory if the load data address matches the address of data in a store data path delay stage.
Type: Grant
Filed: November 5, 2021
Date of Patent: March 12, 2024
Assignee: Ceremorphic, Inc.
Inventor: Heonchul Park
-
Patent number: 11922166
Abstract: A Very Long Instruction Word (VLIW) digital signal processor particularly adapted for single instruction multiple data (SIMD) operation on various operand widths and data sizes. A vector compare instruction compares first and second operands and stores compare bits. A companion vector conditional instruction performs conditional operations based upon the state of a corresponding predicate data register bit. A predicate unit performs data processing operations on data in at least one predicate data register including unary operations and binary operations. The predicate unit may also transfer data between a general data register file and the predicate data register file.
Type: Grant
Filed: January 17, 2023
Date of Patent: March 5, 2024
Assignee: Texas Instruments Incorporated
Inventors: Timothy David Anderson, Duc Quang Bui, Mujibur Rahman, Joseph Raymond Michael Zbiciak, Eric Biscondi, Peter Dent, Jelena Milanovic, Ashish Shrivastava
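The compare/conditional/predicate-unit trio above maps naturally onto a lanewise sketch. A minimal Python illustration, with lane layout and instruction names assumed for clarity (they are not the TI mnemonics): a vector compare produces one predicate bit per lane, a companion conditional selects per lane based on that bit, and a binary predicate-unit operation combines two predicate registers.

```python
def vcmp_gt(a, b):
    """Vector compare: stores one compare bit per lane."""
    return [int(x > y) for x, y in zip(a, b)]

def vsel(pred, a, b):
    """Companion conditional: per lane, take a where the predicate
    register bit is set, else b."""
    return [x if p else y for p, x, y in zip(pred, a, b)]

def pred_and(p, q):
    """A binary predicate-unit operation on two predicate registers."""
    return [x & y for x, y in zip(p, q)]
```

For example, comparing `[3, 1, 4]` against a broadcast `[2, 2, 2]` yields predicate `[1, 0, 1]`, which then gates a lanewise select.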
-
Patent number: 11922167
Abstract: Disclosed herein is a method for managing NOP instructions in a microcontroller, the method comprising duplicating all jump instructions causing a NOP instruction to form a new instruction set; inserting an internal NOP instruction into each of the jump instructions; when a jump instruction is executed, executing a subsequent instruction of the new instruction set; and executing the internal NOP instruction when an execution of the subsequent instruction is skipped.
Type: Grant
Filed: February 24, 2023
Date of Patent: March 5, 2024
Assignee: SK hynix Inc.
Inventors: Giulio Martinozzi, Federica Arosio, Lorenzo Di Lalla
-
Patent number: 11922293
Abstract: An apparatus for identification of input data against one or more learned signals is provided. The apparatus comprises a number of computational cores, each core comprising properties having at least some statistical independence from the other computational cores, the properties being set independently of each other core. Each core is able to independently produce an output indicating recognition of a previously learned signal. The apparatus is further configured to process the produced outputs from the number of computational cores and determine an identification of the input data based on the produced outputs.
Type: Grant
Filed: September 16, 2019
Date of Patent: March 5, 2024
Assignee: Cortica Ltd.
Inventors: Igal Raichelgauz, Karina Odinaev, Yehoshua Y. Zeevi
-
Patent number: 11915004
Abstract: A data processing apparatus is provided that includes bimodal control flow prediction circuitry for performing a prediction of whether a conditional control flow instruction will be taken. Storage circuitry stores, in association with the control flow instruction, a stored state of the data processing apparatus, and reversal circuitry reverses the prediction in dependence on the stored state of the data processing apparatus corresponding with a current state of the data processing apparatus when execution of the control flow instruction is to be performed.
Type: Grant
Filed: December 20, 2021
Date of Patent: February 27, 2024
Assignee: Arm Limited
Inventors: Houdhaifa Bouzguarrou, Thibaut Elie Lanois, Guillaume Bolbenes
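The reversal idea above can be sketched with a toy predictor. This is an assumed model, not Arm's circuitry: a 2-bit bimodal counter makes the base taken/not-taken prediction, a stored machine state records a context in which the base predictor mispredicted, and the prediction is inverted whenever the current state matches that stored state.

```python
class ReversingBimodal:
    def __init__(self):
        self.counter = 2           # 2-bit saturating counter; >= 2 => taken
        self.reversal_state = None  # stored apparatus state that triggers reversal

    def predict(self, current_state):
        base = self.counter >= 2
        if current_state == self.reversal_state:
            return not base        # reversal circuitry: invert the prediction
        return base

    def update(self, current_state, taken):
        base = self.counter >= 2
        if base != taken:
            # Base predictor was wrong in this context: remember the state
            # and let the reversal handle it, rather than retraining.
            self.reversal_state = current_state
        else:
            self.counter = min(3, self.counter + 1) if taken else max(0, self.counter - 1)
```

After one misprediction in state "A", the predictor keeps its bimodal behaviour elsewhere but reverses its answer whenever state "A" recurs.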
-
Patent number: 11907722
Abstract: Aspects of the present disclosure relate to an apparatus comprising processing circuitry, prefetch circuitry and prefetch metadata storage comprising a plurality of entries. Metadata items, each associated with a given stream of instructions, are stored in the prefetch metadata storage. Responsive to a given entry of the plurality of entries being associated with the given stream associated with a given metadata item, the given entry is updated. Responsive to no entry of the plurality of entries being associated with the given stream associated with a given metadata item, an entry is selected according to a default replacement policy, the given stream is allocated thereto, and the selected entry is updated based on the given metadata item. Responsive to a switch condition being met, the default selection policy is switched to an alternative selection policy comprising locking one or more entries by preventing allocation of streams to the locked entries.
Type: Grant
Filed: April 20, 2022
Date of Patent: February 20, 2024
Assignee: Arm Limited
Inventors: Luca Maroncelli, Harvin Iriawan, Peter Raphael Eid, Cédric Denis Robert Airaud
-
Patent number: 11907725
Abstract: A computer comprising a plurality of processors, each of which is configured to perform operations on data during a compute phase for the computer and, following a pre-compiled synchronisation barrier, exchange data with at least one other of the processors during an exchange phase for the computer, wherein each of the processors in the computer is indexed and the data exchange operations carried out by each processor in the exchange phase depend upon its index value.
Type: Grant
Filed: February 3, 2023
Date of Patent: February 20, 2024
Assignee: GRAPHCORE LIMITED
Inventors: Richard Osborne, Matthew Fyles
-
Patent number: 11907723
Abstract: A data processing apparatus is provided. Rename circuitry performs a register rename stage of a pipeline by storing, in storage circuitry, mappings between registers. Each of the mappings is associated with an elimination field value. Operation elimination circuitry replaces an operation that indicates an action is to be performed on data from a source register and stored in a destination register, with a new mapping in the storage circuitry that references the destination register and has the elimination field value set. Operation circuitry responds to a subsequent operation that accesses the destination register when the elimination field value is set by obtaining contents of the source register, performing the action on the contents to obtain a result, and returning the result.
Type: Grant
Filed: March 21, 2022
Date of Patent: February 20, 2024
Assignee: Arm Limited
Inventors: Nicholas Andrew Plante, Joseph Michael Pusdesris, Jungsoo Kim
-
Patent number: 11907720
Abstract: There is provided a data processing apparatus comprising a plurality of registers, each of the registers having data bits to store data and metadata bits to store metadata. Each of the registers is adapted to operate in a metadata mode in which the metadata bits and the data bits are valid, and a data mode in which the data bits are valid and the metadata bits are invalid. Mode bit storage circuitry indicates whether each of the registers is in the data mode or the metadata mode. Execution circuitry is responsive to a memory operation that is a store operation on one or more given registers.
Type: Grant
Filed: November 26, 2020
Date of Patent: February 20, 2024
Assignee: Arm Limited
Inventors: Bradley John Smith, Thomas Christopher Grocutt
-
Patent number: 11907717
Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
Type: Grant
Filed: February 8, 2023
Date of Patent: February 20, 2024
Assignee: NVIDIA Corporation
Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz