Patents Examined by Corey S Faherty
  • Patent number: 12001887
    Abstract: Systems, methods, and apparatuses relating to one or more instructions for element aligning of a tile of a matrix operations accelerator are described.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: June 4, 2024
    Assignee: Intel Corporation
    Inventor: Elmoustapha Ould-Ahmed-Vall
  • Patent number: 12001385
    Abstract: Systems, methods, and apparatuses relating to one or more instructions for loading a tile of a matrix operations accelerator are described.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: June 4, 2024
    Assignee: Intel Corporation
    Inventor: Elmoustapha Ould-Ahmed-Vall
  • Patent number: 12001383
    Abstract: Embodiments are directed to a deterministic streaming system with one or more deterministic streaming processors each having an array of processing elements and a first deterministic memory coupled to the processing elements. The deterministic streaming system further includes a second deterministic memory with multiple data banks having a global memory address space, and a controller. The controller initiates retrieval of first data from the data banks of the second deterministic memory as a first plurality of streams, each stream of the first plurality of streams streaming toward a respective group of processing elements of the array of processing elements. The controller further initiates writing of second data to the data banks of the second deterministic memory as a second plurality of streams, each stream of the second plurality of streams streaming from the respective group of processing elements toward a respective data bank of the second deterministic memory.
    Type: Grant
    Filed: July 6, 2022
    Date of Patent: June 4, 2024
    Assignee: Groq, Inc.
    Inventor: Dennis Charles Abts
  • Patent number: 12001842
    Abstract: Methods and apparatuses relating to switching of a shadow stack pointer are described. In one embodiment, a hardware processor includes a hardware decode unit to decode an instruction, and a hardware execution unit to execute the instruction to: pop a token for a thread from a shadow stack, wherein the token includes a shadow stack pointer for the thread with at least one least significant bit (LSB) of the shadow stack pointer overwritten with a bit value of an operating mode of the hardware processor for the thread, remove the bit value in the at least one LSB from the token to generate the shadow stack pointer, and set a current shadow stack pointer to the shadow stack pointer from the token when the operating mode from the token matches a current operating mode of the hardware processor.
    Type: Grant
    Filed: May 26, 2023
    Date of Patent: June 4, 2024
    Assignee: Intel Corporation
    Inventors: Vedvyas Shanbhogue, Jason W. Brandt, Ravi L. Sahita, Barry E. Huntley, Baiju V. Patel, Deepak K. Gupta
  • Patent number: 11995445
    Abstract: Provided is a method for assigning register tags to instructions at issue time. The method comprises receiving an instruction for execution by a microprocessor. The method further comprises dispatching the instruction to an issue queue without assigning a register tag to the instruction. The method further comprises determining that the instruction is ready to issue. In response to determining that the instruction is ready to issue, the method comprises assigning an available register tag to the instruction. The method further comprises issuing the instruction.
    Type: Grant
    Filed: October 31, 2022
    Date of Patent: May 28, 2024
    Assignee: International Business Machines Corporation
    Inventors: Steven J. Battle, Jentje Leenstra, Brian D. Barrick, Dung Q. Nguyen, Brian W. Thompto
  • Patent number: 11989594
    Abstract: A method may include creating an association identifier based on an association between a computational device function and a compute engine of a computational device, and invoking an execute command to perform an execution of the computational device function using the compute engine, wherein the execute command uses the association identifier. The compute engine may be a first compute engine, and the association may be further between the computational device function and a second compute engine of the computational device. The execute command may perform an execution of the computational device function using the second compute engine. The execution of the computational device function using the first compute engine and the execution of the computational device function using the second compute engine may overlap. The execute command may include the association identifier. The creating the association identifier may include invoking a create association command.
    Type: Grant
    Filed: April 26, 2022
    Date of Patent: May 21, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: William Martin, Oscar P. Pinto
  • Patent number: 11983534
    Abstract: The present disclosure provides a computing method that is applied to a computing device. The computing device includes: a memory, a register unit, and a matrix computing unit. The method includes the following steps: controlling, by the computing device, the matrix computing unit to obtain a first operation instruction, where the first operation instruction includes a matrix reading instruction for a matrix required for executing the instruction; controlling, by the computing device, an operating unit to send a reading command to the memory according to the matrix reading instruction; and controlling, by the computing device, the operating unit to read a matrix corresponding to the matrix reading instruction in a batch reading manner, and executing the first operation instruction on the matrix. The technical solutions in the present disclosure have the advantages of fast computing speed and high efficiency.
    Type: Grant
    Filed: September 5, 2022
    Date of Patent: May 14, 2024
    Assignee: CAMBRICON (XI'AN) SEMICONDUCTOR CO., LTD.
    Inventors: Tianshi Chen, Shaoli Liu, Zai Wang, Shuai Hu
  • Patent number: 11977888
    Abstract: A method, computer readable medium, and processor are described herein for inline data inspection by using a decoder to decode a load instruction, including a signal to cause a circuit in a processor to indicate whether data loaded by a load instruction exceeds a threshold value. Moreover, an indication of whether data loaded by a load instruction exceeds a threshold value may be stored.
    Type: Grant
    Filed: February 22, 2023
    Date of Patent: May 7, 2024
    Assignee: NVIDIA Corporation
    Inventors: Jeffrey Michael Pool, Andrew Kerr, John Tran, Ming Y. Siu, Stuart Oberman
  • Patent number: 11977889
    Abstract: Herein is innovative control flow integrity (CFI) based on code generation techniques that instrument data protection for access control of subroutines invoked across module boundaries. This approach is counterintuitive because, even though code is stored separately from data, access control to the data is used to provide access control to the code. In an embodiment, an instrumentation computer generates, at the beginning of a subroutine that is implemented in machine instructions, a prologue that contains: a first instruction of the subroutine that indicates that the first instruction is a target of a control flow branch and a second instruction of the subroutine that verifies that a memory address is accessible. Generated in the machine instructions are instruction(s) that, when executed by a processor, cause the memory address to have limited accessibility. Some code generation may be performed at the start of runtime by a loader or a dynamic linker.
    Type: Grant
    Filed: August 5, 2022
    Date of Patent: May 7, 2024
    Assignee: Oracle International Corporation
    Inventors: Matthias Neugschwandtner, William Blair
  • Patent number: 11971846
    Abstract: A logic unit in an array of processing units is configurable to consume source tokens and a status signal and to produce barrier tokens and an enable signal based on the source tokens and the status signal.
    Type: Grant
    Filed: February 14, 2023
    Date of Patent: April 30, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Manish K. Shah, Ram Sivaramakrishnan, Pramod Nataraja, David Brian Jackson, Gregory Frederick Grohoski
  • Patent number: 11966739
    Abstract: There is provided an apparatus, method and medium for data processing. The apparatus comprises a register file comprising a plurality of data registers, and frontend circuitry responsive to an issued instruction, to control processing circuitry to perform a processing operation to process an input data item to generate an output data item. The processing circuitry is responsive to a first encoding of the issued instruction specifying a data register, to read the input data item from the data register, and/or write the output data item to the data register. The processing circuitry is responsive to a second encoding of the issued instruction specifying a buffer-region of the register file for storing a queue of data items, to perform the processing operation and to perform a dequeue operation to dequeue the input data item from the queue, and/or perform an enqueue operation to enqueue the output data item to the queue.
    Type: Grant
    Filed: September 9, 2022
    Date of Patent: April 23, 2024
    Assignee: Arm Limited
    Inventors: Matthew James Walker, Mbou Eyole, Giacomo Gabrielli, Balaji Venu
  • Patent number: 11960884
    Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.
    Type: Grant
    Filed: November 2, 2021
    Date of Patent: April 16, 2024
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
  • Patent number: 11960897
    Abstract: In some implementations, a processor includes a plurality of parallel instruction pipes, a register file includes at least one shared read port configured to be shared across multiple pipes of the plurality of parallel instruction pipes. Control logic controls multiple parallel instruction pipes to read from the at least one shared read port. In certain examples, the at least one shared register file read port is coupled as a single read port for one of the parallel instruction pipes and as a shared register file read port for a plurality of other parallel instruction pipes.
    Type: Grant
    Filed: July 30, 2021
    Date of Patent: April 16, 2024
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Michael Estlick, Erik Swanson, Eric Dixon, Todd Baumgartner
  • Patent number: 11960887
    Abstract: Techniques related to packing pieces of data having variable bit lengths to serial packed data using a graphics processing unit and a central processing unit are discussed. Such techniques include executing bit shift operations for the pieces of data in parallel via execution units of the graphics processing unit and packing the bit shifted pieces of data via the central processing unit.
    Type: Grant
    Filed: March 3, 2020
    Date of Patent: April 16, 2024
    Assignee: Intel Corporation
    Inventors: Bin Wang, Bo Peng
  • Patent number: 11960889
    Abstract: A method and system for moving data from a source memory to a destination memory by a processor is disclosed herein. The destination memory stores a sequence of instructions and the sequence of instructions comprises one or more load instructions and one or more store instructions. The processor initially moves the one or more store instructions from the destination memory to the source memory. The processor then executes the one or more load instructions from the destination memory. On executing the one or more load instructions, the data is loaded from the source memory to at least one register in the processor. The processor further initiates execution of the one or more store instructions stored in the source memory. On executing the one or more store instructions from the source memory, the processor stores the data from the at least one register to the destination memory.
    Type: Grant
    Filed: March 25, 2021
    Date of Patent: April 16, 2024
    Assignee: Nordic Semiconductor ASA
    Inventor: Chris Smith
  • Patent number: 11954497
    Abstract: A method and system for moving data from a source memory to a destination memory by a processor are disclosed. The processor has a plurality of registers and the source memory stores a sequence of instructions that include one or more load instructions and one or more store instructions. The processor moves the load instructions from the source memory to the destination memory. Then, the processor initiates execution of the load instructions from the destination memory in order to load the data from the source memory to one or more registers in the processor. Execution then returns to the sequence of instructions stored in the source memory, and the processor stores the data from the registers to the destination memory.
    Type: Grant
    Filed: March 25, 2021
    Date of Patent: April 9, 2024
    Assignee: Nordic Semiconductor ASA
    Inventor: Chris Smith
  • Patent number: 11947486
    Abstract: The computing efficiency of an electronic computing device is improved. HPCs 20 to 23 include arithmetic processing units HA0 to HA3, respectively. Each of the arithmetic processing units HA0 to HA3 executes arithmetic processing in parallel. LPCs 30 to 33 includes management processing units LB0 to LB3, respectively. Each of the management processing units LB0 to LB3 manages execution of specific processing by an accelerator 6 when each of the arithmetic processing units HA0 to HA3 causes the accelerator 6 to execute the specific processing, and performs a series of commands for causing the accelerator 6 to execute the specific processing on a DMA controller 5 and the accelerator 6.
    Type: Grant
    Filed: March 25, 2020
    Date of Patent: April 2, 2024
    Assignee: Hitachi Astemo, Ltd.
    Inventors: Tatsuya Horiguchi, Tasuku Ishigooka, Kazuyoshi Serizawa, Tsunamichi Tsukidate
  • Patent number: 11940945
    Abstract: An exemplary SIMD computing system comprises a SIMD processing element (SPE) configured to perform a selected operation on a portion of a processor input data word, with the operation selected by control signals read from a control memory location addressed by a decoded instruction. The SPE may comprise one or more adder, multiplier, or multiplexer coupled to the control signals. The control signals may comprise one or more bit read from the control memory. The control memory may be an M×N (M rows by N columns) memory having M possible SIMD operations and N control signals. Each instruction decoded may select an SPE operation from among N rows. A plurality of SPEs may receive the same control signals. The control memory may be rewritable, advantageously permitting customizable SIMD operations that are reconfigurable by storing in the control memory locations control signals designed to cause the SPE to perform selected operations.
    Type: Grant
    Filed: December 31, 2021
    Date of Patent: March 26, 2024
    Inventor: Heonchul Park
  • Patent number: 11934837
    Abstract: An SIMD instruction generation and processing method and a related device are provided. The method may include: obtaining a length of each loop dimension of a first tensor formula; selecting, from a plurality of groups of information about a first SIMD instruction model based on the length of each loop dimension of a first tensor formula, information about a second SIMD instruction model matching the first tensor formula; generating, based on a length of at least one loop dimension of the first tensor formula and the second SIMD instruction model, a first SIMD instruction obtained after the first tensor formula is converted. The information about a second SIMD instruction model is selected from the plurality of groups of information about a first SIMD instruction model based on the length of each loop dimension of the tensor formula.
    Type: Grant
    Filed: September 12, 2022
    Date of Patent: March 19, 2024
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Chen Wu, Yifan Lin, Xiaoqiang Dan
  • Patent number: 11928475
    Abstract: An exemplary fault-tolerant computing system comprises a secondary processor configured to execute in delayed lock step with a primary processor from a common program store, comparators in the store data and writeback paths to detect a fault based on comparing primary and secondary processor states, and a writeback path delay permitting aborting execution when a fault is detected, before writeback of invalid data. The secondary processor execution and the primary processor store data and writeback may be delayed a predetermined number of cycles, permitting fault detection before writing invalid data. Store data and writeback paths may include triple module redundancy configured to pass only majority data through the store data and writeback path delay stages. Some implementations may forward data from the store data path delay stages to the writeback stage or memory if the load data address matches the address of data in a store data path delay stage.
    Type: Grant
    Filed: November 5, 2021
    Date of Patent: March 12, 2024
    Assignee: Ceremorphic, Inc.
    Inventor: Heonchul Park