Patents Examined by Corey S Faherty

Apparatuses, methods, and systems for instructions for aligning tiles of a matrix operations accelerator

Patent number: 12001887

Abstract: Systems, methods, and apparatuses relating to one or more instructions for element aligning of a tile of a matrix operations accelerator are described.

Type: Grant

Filed: December 24, 2020

Date of Patent: June 4, 2024

Assignee: Intel Corporation

Inventor: Elmoustapha Ould-Ahmed-Vall

Apparatuses, methods, and systems for instructions for loading a tile of a matrix operations accelerator

Patent number: 12001385

Abstract: Systems, methods, and apparatuses relating to one or more instructions for loading a tile of a matrix operations accelerator are described.

Type: Grant

Filed: December 24, 2020

Date of Patent: June 4, 2024

Assignee: Intel Corporation

Inventor: Elmoustapha Ould-Ahmed-Vall

Deterministic memory for tensor streaming processors

Patent number: 12001383

Abstract: Embodiments are directed to a deterministic streaming system with one or more deterministic streaming processors each having an array of processing elements and a first deterministic memory coupled to the processing elements. The deterministic streaming system further includes a second deterministic memory with multiple data banks having a global memory address space, and a controller. The controller initiates retrieval of first data from the data banks of the second deterministic memory as a first plurality of streams, each stream of the first plurality of streams streaming toward a respective group of processing elements of the array of processing elements. The controller further initiates writing of second data to the data banks of the second deterministic memory as a second plurality of streams, each stream of the second plurality of streams streaming from the respective group of processing elements toward a respective data bank of the second deterministic memory.

Type: Grant

Filed: July 6, 2022

Date of Patent: June 4, 2024

Assignee: Groq, Inc.

Inventor: Dennis Charles Abts

Hardware apparatuses and methods to switch shadow stack pointers

Patent number: 12001842

Abstract: Methods and apparatuses relating to switching of a shadow stack pointer are described. In one embodiment, a hardware processor includes a hardware decode unit to decode an instruction, and a hardware execution unit to execute the instruction to: pop a token for a thread from a shadow stack, wherein the token includes a shadow stack pointer for the thread with at least one least significant bit (LSB) of the shadow stack pointer overwritten with a bit value of an operating mode of the hardware processor for the thread, remove the bit value in the at least one LSB from the token to generate the shadow stack pointer, and set a current shadow stack pointer to the shadow stack pointer from the token when the operating mode from the token matches a current operating mode of the hardware processor.

Type: Grant

Filed: May 26, 2023

Date of Patent: June 4, 2024

Assignee: Intel Corporation

Inventors: Vedvyas Shanbhogue, Jason W. Brandt, Ravi L. Sahita, Barry E. Huntley, Baiju V. Patel, Deepak K. Gupta
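
The token handling described in the abstract of patent 12001842 can be illustrated with a minimal Python sketch. It assumes a single mode bit stored in the least significant bit of an aligned pointer; the names `make_token` and `restore_ssp`, the one-bit mask, and the choice to raise an error on a mode mismatch are hypothetical and are not taken from the patent.

```python
MODE_MASK = 0b1  # assumption: exactly one LSB carries the operating-mode bit


def make_token(ssp: int, mode: int) -> int:
    """Overwrite the LSB of an aligned shadow stack pointer with the mode bit."""
    assert ssp & MODE_MASK == 0, "pointer must be aligned so its LSB is free"
    return ssp | (mode & MODE_MASK)


def restore_ssp(shadow_stack: list[int], current_mode: int) -> int:
    """Pop a token and return the shadow stack pointer it encodes.

    The pointer is returned (to become the current shadow stack pointer)
    only when the mode recorded in the token matches the current mode.
    Raising on a mismatch is an assumption of this sketch.
    """
    token = shadow_stack.pop()
    token_mode = token & MODE_MASK   # mode bit stored in the LSB
    ssp = token & ~MODE_MASK         # remove the mode bit to recover the pointer
    if token_mode != current_mode:
        raise PermissionError("operating mode in token does not match")
    return ssp
```

Calling `restore_ssp([make_token(0x1000, 1)], current_mode=1)` returns `0x1000`; the same token popped while the current mode is 0 raises instead of switching stacks.
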
Assignment of microprocessor register tags at issue time

Patent number: 11995445

Abstract: Provided is a method for assigning register tags to instructions at issue time. The method comprises receiving an instruction for execution by a microprocessor. The method further comprises dispatching the instruction to an issue queue without assigning a register tag to the instruction. The method further comprises determining that the instruction is ready to issue. In response to determining that the instruction is ready to issue, the method comprises assigning an available register tag to the instruction. The method further comprises issuing the instruction.

Type: Grant

Filed: October 31, 2022

Date of Patent: May 28, 2024

Assignee: International Business Machines Corporation

Inventors: Steven J. Battle, Jentje Leenstra, Brian D. Barrick, Dung Q. Nguyen, Brian W. Thompto
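
The flow in the abstract of patent 11995445 (dispatch without a tag, assign a tag only once the instruction is ready to issue) can be sketched as a small software model. The `IssueQueue` class, its dictionary entries, and the free-list policy are illustrative assumptions, not details from the patent.

```python
from collections import deque


class IssueQueue:
    """Toy issue queue that assigns register tags at issue time."""

    def __init__(self, num_tags: int):
        self.free_tags = deque(range(num_tags))  # tags currently available
        self.waiting = []                        # dispatched, not yet issued

    def dispatch(self, name: str) -> dict:
        """Place an instruction in the queue without assigning a tag."""
        entry = {"name": name, "ready": False, "tag": None}
        self.waiting.append(entry)
        return entry

    def issue_ready(self) -> list:
        """Issue every ready instruction for which a free tag exists."""
        issued, still_waiting = [], []
        for entry in self.waiting:
            if entry["ready"] and self.free_tags:
                entry["tag"] = self.free_tags.popleft()  # tag assigned here
                issued.append(entry)
            else:
                still_waiting.append(entry)
        self.waiting = still_waiting
        return issued
```

In this model an instruction that waits in the queue holds no tag, so a small tag pool is consumed only by instructions that are actually issuing.
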
Systems, methods, and apparatus for associating computational device functions with compute engines

Patent number: 11989594

Abstract: A method may include creating an association identifier based on an association between a computational device function and a compute engine of a computational device, and invoking an execute command to perform an execution of the computational device function using the compute engine, wherein the execute command uses the association identifier. The compute engine may be a first compute engine, and the association may be further between the computational device function and a second compute engine of the computational device. The execute command may perform an execution of the computational device function using the second compute engine. The execution of the computational device function using the first compute engine and the execution of the computational device function using the second compute engine may overlap. The execute command may include the association identifier. The creating the association identifier may include invoking a create association command.

Type: Grant

Filed: April 26, 2022

Date of Patent: May 21, 2024

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: William Martin, Oscar P. Pinto

Calculation method and related product

Patent number: 11983534

Abstract: The present disclosure provides a computing method that is applied to a computing device. The computing device includes: a memory, a register unit, and a matrix computing unit. The method includes the following steps: controlling, by the computing device, the matrix computing unit to obtain a first operation instruction, where the first operation instruction includes a matrix reading instruction for a matrix required for executing the instruction; controlling, by the computing device, an operating unit to send a reading command to the memory according to the matrix reading instruction; and controlling, by the computing device, the operating unit to read a matrix corresponding to the matrix reading instruction in a batch reading manner, and executing the first operation instruction on the matrix. The technical solutions in the present disclosure have the advantages of fast computing speed and high efficiency.

Type: Grant

Filed: September 5, 2022

Date of Patent: May 14, 2024

Assignee: CAMBRICON (XI'AN) SEMICONDUCTOR CO., LTD.

Inventors: Tianshi Chen, Shaoli Liu, Zai Wang, Shuai Hu

Inline data inspection for workload simplification

Patent number: 11977888

Abstract: A method, computer readable medium, and processor are described herein for inline data inspection by using a decoder to decode a load instruction, including a signal to cause a circuit in a processor to indicate whether data loaded by a load instruction exceeds a threshold value. Moreover, an indication of whether data loaded by a load instruction exceeds a threshold value may be stored.

Type: Grant

Filed: February 22, 2023

Date of Patent: May 7, 2024

Assignee: NVIDIA Corporation

Inventors: Jeffrey Michael Pool, Andrew Kerr, John Tran, Ming Y. Siu, Stuart Oberman

Method for control flow isolation with protection keys and indirect branch tracking

Patent number: 11977889

Abstract: Herein is innovative control flow integrity (CFI) based on code generation techniques that instrument data protection for access control of subroutines invoked across module boundaries. This approach is counterintuitive because, even though code is stored separately from data, access control to the data is used to provide access control to the code. In an embodiment, an instrumentation computer generates, at the beginning of a subroutine that is implemented in machine instructions, a prologue that contains: a first instruction of the subroutine that indicates that the first instruction is a target of a control flow branch and a second instruction of the subroutine that verifies that a memory address is accessible. Generated in the machine instructions are instruction(s) that, when executed by a processor, cause the memory address to have limited accessibility. Some code generation may be performed at the start of runtime by a loader or a dynamic linker.

Type: Grant

Filed: August 5, 2022

Date of Patent: May 7, 2024

Assignee: Oracle International Corporation

Inventors: Matthias Neugschwandtner, William Blair

Logic unit for a reconfigurable processor

Patent number: 11971846

Abstract: A logic unit in an array of processing units is configurable to consume source tokens and a status signal and to produce barrier tokens and an enable signal based on the source tokens and the status signal.

Type: Grant

Filed: February 14, 2023

Date of Patent: April 30, 2024

Assignee: SambaNova Systems, Inc.

Inventors: Raghu Prabhakar, Manish K. Shah, Ram Sivaramakrishnan, Pramod Nataraja, David Brian Jackson, Gregory Frederick Grohoski

Processing of issued instructions

Patent number: 11966739

Abstract: There is provided an apparatus, method and medium for data processing. The apparatus comprises a register file comprising a plurality of data registers, and frontend circuitry responsive to an issued instruction, to control processing circuitry to perform a processing operation to process an input data item to generate an output data item. The processing circuitry is responsive to a first encoding of the issued instruction specifying a data register, to read the input data item from the data register, and/or write the output data item to the data register. The processing circuitry is responsive to a second encoding of the issued instruction specifying a buffer-region of the register file for storing a queue of data items, to perform the processing operation and to perform a dequeue operation to dequeue the input data item from the queue, and/or perform an enqueue operation to enqueue the output data item to the queue.

Type: Grant

Filed: September 9, 2022

Date of Patent: April 23, 2024

Assignee: Arm Limited

Inventors: Matthew James Walker, Mbou Eyole, Giacomo Gabrielli, Balaji Venu

Apparatus and method for complex multiplication

Patent number: 11960884

Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.

Type: Grant

Filed: November 2, 2021

Date of Patent: April 16, 2024

Assignee: Intel Corporation

Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
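
The two-operation split in the abstract of patent 11960884 follows from the identity (a + bi)(c + di) = (ac - bd) + (ad + bc)i: each component of the result is a sum of two terms. The sketch below assumes the first operation produces the terms ac and ad and the second accumulates -bd and bc; the abstract does not say which terms belong to which operation, and the function names are hypothetical.

```python
def cmul_first_op(x, y):
    """First operation: first term of the real and imaginary components.

    x and y are (real, imag) pairs. Assumption: the first terms are the
    products involving the real part of x.
    """
    a, _ = x
    c, d = y
    return (a * c, a * d)


def cmul_second_op(x, y, partial):
    """Second operation: add the second term of each component."""
    _, b = x
    c, d = y
    real, imag = partial
    return (real - b * d, imag + b * c)


def cmul(x, y):
    """Full complex product built from the two operations."""
    return cmul_second_op(x, y, cmul_first_op(x, y))
```

For example, `cmul((1, 2), (3, 4))` returns `(-5, 10)`, matching (1 + 2i)(3 + 4i) = -5 + 10i.
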
Apparatus and methods employing a shared read post register file

Patent number: 11960897

Abstract: In some implementations, a processor includes a plurality of parallel instruction pipes and a register file that includes at least one shared read port configured to be shared across multiple pipes of the plurality of parallel instruction pipes. Control logic controls multiple parallel instruction pipes to read from the at least one shared read port. In certain examples, the at least one shared register file read port is coupled as a single read port for one of the parallel instruction pipes and as a shared register file read port for a plurality of other parallel instruction pipes.

Type: Grant

Filed: July 30, 2021

Date of Patent: April 16, 2024

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Michael Estlick, Erik Swanson, Eric Dixon, Todd Baumgartner

Graphics processing unit and central processing unit cooperative variable length data bit packing

Patent number: 11960887

Abstract: Techniques related to packing pieces of data having variable bit lengths to serial packed data using a graphics processing unit and a central processing unit are discussed. Such techniques include executing bit shift operations for the pieces of data in parallel via execution units of the graphics processing unit and packing the bit shifted pieces of data via the central processing unit.

Type: Grant

Filed: March 3, 2020

Date of Patent: April 16, 2024

Assignee: Intel Corporation

Inventors: Bin Wang, Bo Peng
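
The division of labor in the abstract of patent 11960887 can be sketched in two stages: a shift stage in which every piece is moved to its bit position independently of the others (the part the abstract assigns to GPU execution units), and a serial stage that merges the shifted pieces (the part assigned to the CPU). Both stages run on the CPU in this plain Python sketch; the MSB-first layout, the zero padding of the final byte, and the function names are assumptions.

```python
from itertools import accumulate


def shift_pieces(pieces):
    """Shift each (value, bit_length) piece to its slot in an MSB-first stream.

    Each shift depends only on a prefix sum of the lengths, so the shifts
    could be executed in parallel. Returns (shifted_values, total_bits).
    """
    lengths = [n for _, n in pieces]
    total = sum(lengths)
    ends = accumulate(lengths)  # end offset of each piece from the stream start
    shifted = [
        (value & ((1 << n) - 1)) << (total - end)
        for (value, n), end in zip(pieces, ends)
    ]
    return shifted, total


def pack(shifted, total_bits):
    """Merge the shifted pieces serially and emit zero-padded bytes."""
    word = 0
    for piece in shifted:
        word |= piece
    nbytes = (total_bits + 7) // 8
    pad = nbytes * 8 - total_bits  # unused low bits of the last byte
    return (word << pad).to_bytes(nbytes, "big")
```

Packing the pieces `[(0b101, 3), (0b1, 1), (0b0011, 4)]` with `pack(*shift_pieces(...))` yields the single byte `0xB3`.
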
Method and system for optimizing data transfer from one memory to another memory

Patent number: 11960889

Abstract: A method and system for moving data from a source memory to a destination memory by a processor is disclosed herein. The destination memory stores a sequence of instructions and the sequence of instructions comprises one or more load instructions and one or more store instructions. The processor initially moves the one or more store instructions from the destination memory to the source memory. The processor then executes the one or more load instructions from the destination memory. On executing the one or more load instructions, the data is loaded from the source memory to at least one register in the processor. The processor further initiates execution of the one or more store instructions stored in the source memory. On executing the one or more store instructions from the source memory, the processor stores the data from the at least one register to the destination memory.

Type: Grant

Filed: March 25, 2021

Date of Patent: April 16, 2024

Assignee: Nordic Semiconductor ASA

Inventor: Chris Smith

Method and system for optimizing data transfer from one memory to another memory

Patent number: 11954497

Abstract: A method and system for moving data from a source memory to a destination memory by a processor are disclosed. The processor has a plurality of registers and the source memory stores a sequence of instructions that include one or more load instructions and one or more store instructions. The processor moves the load instructions from the source memory to the destination memory. Then, the processor initiates execution of the load instructions from the destination memory in order to load the data from the source memory to one or more registers in the processor. Execution then returns to the sequence of instructions stored in the source memory, and the processor stores the data from the registers to the destination memory.

Type: Grant

Filed: March 25, 2021

Date of Patent: April 9, 2024

Assignee: Nordic Semiconductor ASA

Inventor: Chris Smith

Electronic computing device having improved computing efficiency

Patent number: 11947486

Abstract: The computing efficiency of an electronic computing device is improved. HPCs 20 to 23 include arithmetic processing units HA0 to HA3, respectively. Each of the arithmetic processing units HA0 to HA3 executes arithmetic processing in parallel. LPCs 30 to 33 include management processing units LB0 to LB3, respectively. Each of the management processing units LB0 to LB3 manages execution of specific processing by an accelerator 6 when each of the arithmetic processing units HA0 to HA3 causes the accelerator 6 to execute the specific processing, and performs a series of commands for causing the accelerator 6 to execute the specific processing on a DMA controller 5 and the accelerator 6.

Type: Grant

Filed: March 25, 2020

Date of Patent: April 2, 2024

Assignee: Hitachi Astemo, Ltd.

Inventors: Tatsuya Horiguchi, Tasuku Ishigooka, Kazuyoshi Serizawa, Tsunamichi Tsukidate

Reconfigurable SIMD engine

Patent number: 11940945

Abstract: An exemplary SIMD computing system comprises a SIMD processing element (SPE) configured to perform a selected operation on a portion of a processor input data word, with the operation selected by control signals read from a control memory location addressed by a decoded instruction. The SPE may comprise one or more adders, multipliers, or multiplexers coupled to the control signals. The control signals may comprise one or more bits read from the control memory. The control memory may be an M×N (M rows by N columns) memory having M possible SIMD operations and N control signals. Each instruction decoded may select an SPE operation from among the M rows. A plurality of SPEs may receive the same control signals. The control memory may be rewritable, advantageously permitting customizable SIMD operations that are reconfigurable by storing in the control memory locations control signals designed to cause the SPE to perform selected operations.

Type: Grant

Filed: December 31, 2021

Date of Patent: March 26, 2024

Inventor: Heonchul Park
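
The control-memory scheme in the abstract of patent 11940945 can be modeled as a lookup table: the decoded opcode selects a row, and the row's control signals steer every lane's processing element identically. The sketch below assumes 8-bit lanes and two control signals (`use_mul`, `negate_b`); these signal names, the lane width, and the three initial rows are hypothetical.

```python
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1

# Rewritable control memory: row index = decoded opcode,
# columns = control signals (use_mul, negate_b). Contents are assumptions.
control_memory = [
    (0, 0),  # row 0: a + b
    (0, 1),  # row 1: a - b
    (1, 0),  # row 2: a * b
]


def spe(a, b, use_mul, negate_b):
    """One SIMD processing element acting on a single lane."""
    if negate_b:
        b = -b
    result = a * b if use_mul else a + b
    return result & LANE_MASK  # wrap within the lane, no carry to neighbors


def simd_execute(opcode, word_a, word_b, lanes=4):
    """Apply the operation in row `opcode` to every lane of the input words."""
    signals = control_memory[opcode]  # same control signals for all SPEs
    out = 0
    for i in range(lanes):
        shift = i * LANE_BITS
        a = (word_a >> shift) & LANE_MASK
        b = (word_b >> shift) & LANE_MASK
        out |= spe(a, b, *signals) << shift
    return out
```

Overwriting a row, for example `control_memory[0] = (1, 1)`, changes what opcode 0 does without touching `spe` or the decoder, which is the reconfigurability the abstract describes.
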
Single instruction multiple data SIMD instruction generation and processing method and related device

Patent number: 11934837

Abstract: An SIMD instruction generation and processing method and a related device are provided. The method may include: obtaining a length of each loop dimension of a first tensor formula; selecting, from a plurality of groups of information about a first SIMD instruction model based on the length of each loop dimension of the first tensor formula, information about a second SIMD instruction model matching the first tensor formula; and generating, based on a length of at least one loop dimension of the first tensor formula and the second SIMD instruction model, a first SIMD instruction obtained after the first tensor formula is converted. The information about the second SIMD instruction model is selected from the plurality of groups of information about the first SIMD instruction model based on the length of each loop dimension of the first tensor formula.

Type: Grant

Filed: September 12, 2022

Date of Patent: March 19, 2024

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Chen Wu, Yifan Lin, Xiaoqiang Dan

Fast recovery for dual core lock step

Patent number: 11928475

Abstract: An exemplary fault-tolerant computing system comprises a secondary processor configured to execute in delayed lock step with a primary processor from a common program store, comparators in the store data and writeback paths to detect a fault based on comparing primary and secondary processor states, and a writeback path delay permitting aborting execution when a fault is detected, before writeback of invalid data. The secondary processor execution and the primary processor store data and writeback may be delayed a predetermined number of cycles, permitting fault detection before writing invalid data. Store data and writeback paths may include triple module redundancy configured to pass only majority data through the store data and writeback path delay stages. Some implementations may forward data from the store data path delay stages to the writeback stage or memory if the load data address matches the address of data in a store data path delay stage.

Type: Grant

Filed: November 5, 2021

Date of Patent: March 12, 2024

Assignee: Ceremorphic, Inc.

Inventor: Heonchul Park
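
Two mechanisms from the abstract of patent 11928475 lend themselves to a short software model: a bitwise majority vote, as used in triple module redundancy, and a delayed writeback in which the primary processor's results are held for a fixed number of cycles so that they can be compared with the delayed secondary processor before being committed. The function names, the list-based results, and the default delay of 2 cycles are assumptions of this sketch.

```python
from collections import deque


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of a value."""
    return (a & b) | (a & c) | (b & c)


def run_lockstep(primary_results, secondary_results, delay=2):
    """Commit primary results only after the delayed secondary agrees.

    Both result lists must have the same length. The primary's values sit
    in `delay` writeback stages; the secondary produces the matching value
    `delay` cycles later. Returns (committed_values, fault_detected).
    """
    pending = deque()   # models the writeback path delay stages
    committed = []
    for cycle in range(len(primary_results) + delay):
        if cycle < len(primary_results):
            pending.append(primary_results[cycle])
        if cycle >= delay:
            held = pending.popleft()
            if held != secondary_results[cycle - delay]:
                return committed, True   # abort before writing invalid data
            committed.append(held)
    return committed, False
```

With matching inputs every value is committed; with `run_lockstep([1, 9, 3], [1, 2, 3])` only the first value is committed and the fault flag is set, so the mismatched value never reaches writeback.
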
