Patents Examined by Michael J Metzger

Thread commencement and completion using work descriptor packets in a system having a self-scheduling processor and a hybrid threading fabric

Patent number: 11513837

Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.

Type: Grant

Filed: April 30, 2019

Date of Patent: November 29, 2022

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer

Thread creation on local or remote compute elements by a multi-threaded, self-scheduling processor

Patent number: 11513840

Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.

Type: Grant

Filed: April 30, 2019

Date of Patent: November 29, 2022

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer

Method and device for dynamically adjusting decimal point positions in neural network computations

Patent number: 11507370

Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Type: Grant

Filed: December 16, 2019

Date of Patent: November 22, 2022

Assignee: Cambricon (Xi'an) Semiconductor Co., Ltd.

Inventors: Yao Zhang, Bingrui Wang
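
The fixed-point representation mentioned in the abstract above can be illustrated with a small sketch. This is not the patented method: the helper names (`to_fixed`, `from_fixed`, `choose_frac_bits`), the 16-bit width, and the rule for picking the decimal point position are all assumptions made for illustration.

```python
def to_fixed(value: float, frac_bits: int, total_bits: int = 16) -> int:
    """Quantize a float to a signed fixed-point integer with `frac_bits` fractional bits."""
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))  # saturate instead of wrapping around


def from_fixed(raw: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)


def choose_frac_bits(values, total_bits: int = 16) -> int:
    """Hypothetical rule: use as many fractional bits as possible
    while the largest magnitude still fits without saturating."""
    peak = max(abs(v) for v in values)
    hi = (1 << (total_bits - 1)) - 1
    frac_bits = total_bits - 1
    while frac_bits > 0 and round(peak * (1 << frac_bits)) > hi:
        frac_bits -= 1
    return frac_bits
```

Moving the decimal point trades range for precision: a tensor of small values can use more fractional bits than one with large values, which is presumably why the position is adjusted per data set rather than fixed globally.
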
Processing of instructions fetched from memory

Patent number: 11507372

Abstract: An apparatus and method are provided for processing instructions fetched from memory. Decode circuitry is used to decode the fetched instructions in order to produce decoded instructions, and downstream circuitry then processes the decoded instructions in order to perform the operations specified by those decoded instructions. Dispatch circuitry is arranged to dispatch to the downstream circuitry up to N decoded instructions per dispatch cycle, and is arranged to determine, based on a given candidate sequence of decoded instructions being considered for dispatch in a given dispatch cycle, whether at least one resource conflict within the downstream circuitry would occur in the event that the given candidate sequence of decoded instructions is dispatched in the given dispatch cycle.

Type: Grant

Filed: October 7, 2020

Date of Patent: November 22, 2022

Assignee: Arm Limited

Inventors: Michael Brian Schinzler, Yasuo Ishii, Muhammad Umar Farooq, Jason Lee Setter

Processor device for executing SIMD instructions

Patent number: 11500632

Abstract: In a processor device according to the present invention, a memory access unit reads data to be processed from an external memory and writes the data to a first register group that a plurality of processors does not access among a plurality of register groups. A control unit sequentially makes each of the plurality of processors implement a same instruction, in parallel with changing an address of a register group that stores the data to be processed. A scheduler, based on specified scenario information, specifies an instruction to be implemented and a register group to be accessed for the plurality of processors, and specifies a register group to be written to among the plurality of register groups and data to be processed that is to be written for the memory access unit.

Type: Grant

Filed: April 23, 2019

Date of Patent: November 15, 2022

Assignee: ArchiTek Corporation

Inventor: Shuichi Takada

Synchronizing scheduling tasks with atomic ALU

Patent number: 11500677

Abstract: A method of synchronizing a group of scheduled tasks within a parallel processing unit into a known state is described. The method uses a synchronization instruction in a scheduled task which triggers, in response to decoding of the instruction, an instruction decoder to place the scheduled task into a non-active state and forward the decoded synchronization instruction to an atomic ALU for execution. When the atomic ALU executes the decoded synchronization instruction, the atomic ALU performs an operation and check on data assigned to the group ID of the scheduled task and if the check is passed, all scheduled tasks having the particular group ID are removed from the non-active state.

Type: Grant

Filed: November 3, 2020

Date of Patent: November 15, 2022

Assignee: Imagination Technologies Limited

Inventors: Ollie Mower, Yoong-Chert Foo
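
The synchronization flow in the abstract above can be modelled in software as a rough sketch. The abstract only says the atomic ALU "performs an operation and check on data assigned to the group ID"; the counter increment and the comparison against a group size used here are assumptions, as is the class name `AtomicSyncModel`.

```python
class AtomicSyncModel:
    """Toy model: tasks that hit a sync instruction wait until their whole group arrives."""

    def __init__(self, group_sizes):
        self.group_sizes = dict(group_sizes)          # group_id -> number of tasks in the group
        self.counters = {g: 0 for g in self.group_sizes}
        self.non_active = set()                       # (task_id, group_id) pairs that are waiting

    def sync(self, task_id, group_id):
        """Handle a decoded sync instruction; return the task ids released (possibly none)."""
        # Decoder side: the issuing task is placed into the non-active state.
        self.non_active.add((task_id, group_id))
        # Atomic ALU side (assumed operation): increment the per-group counter.
        self.counters[group_id] += 1
        # Atomic ALU side (assumed check): has every task in the group arrived?
        if self.counters[group_id] != self.group_sizes[group_id]:
            return []
        released = sorted(t for t, g in self.non_active if g == group_id)
        self.non_active = {(t, g) for t, g in self.non_active if g != group_id}
        self.counters[group_id] = 0                   # reset so the group can sync again
        return released
```

With a group of three tasks, the first two calls to `sync` return an empty list and the third returns all three task ids, at which point the group is in a known state.
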
Devices, methods, and media for efficient data dependency management for in-order issue processors

Patent number: 11500641

Abstract: Methods, devices and media for efficient data dependency management for in-order issue processors are described. In various embodiments described herein, methods, devices and media are disclosed that provide techniques for managing RAW data dependencies between instructions in a constrained hardware environment. The described techniques include initial wait station allocation of write instructions, followed by wait station allocation conflict resolution methods that use a greedy algorithm to optimize a cost function based on the estimated latency of a single instruction. Efficient compilation and reduced execution time may be achieved in some embodiments. Methods and devices for compiling source code are described, as well as devices for executing the compiled machine code and media for storing compiled machine code.

Type: Grant

Filed: October 7, 2020

Date of Patent: November 15, 2022

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Hazem A. Abdelhafez, Ning Xie, Ahmed Mohammed ElShafiey Mohammed Eltantawy

System and method enabling one-hot neural networks on a machine learning compute platform

Patent number: 11481218

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising instruction decode logic to decode a single instruction including multiple operands into a single decoded instruction, the multiple operands including a first operand and a second operand, the first operand including vector of one-hot coded weights and the second operand including a vector of input data; and a general-purpose graphics compute unit including a first logic unit, the general-purpose graphics compute unit to execute the single decoded instruction, wherein to execute the single decoded instruction includes to perform multiple operations on the first set of operands and the second set of operands.

Type: Grant

Filed: August 2, 2017

Date of Patent: October 25, 2022

Assignee: Intel Corporation

Inventors: Jianguo Li, Yurong Chen

Microthreading for accelerated deep learning

Patent number: 11475282

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of compute elements and routers performs flow-based computations on wavelets of data. Some instructions are performed in iterations, such as one iteration per element of a fabric vector or FIFO. When sources for an iteration of an instruction are unavailable, and/or there is insufficient space to store results of the iteration, indicators associated with operands of the instruction are checked to determine whether other work can be performed. In some scenarios, other work cannot be performed and processing stalls. Alternatively, information about the instruction is saved, the other work is performed, and sometime after the sources become available and/or sufficient space to store the results becomes available, the iteration is performed using the saved information.

Type: Grant

Filed: April 17, 2018

Date of Patent: October 18, 2022

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Michael Edwin James, Gary R. Lauterbach, Srikanth Arekapudi

Look-up table read

Patent number: 11455169

Abstract: A digital data processor includes an instruction memory storing instructions specifying data processing operations and a data operand field, an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand, and an operational unit coupled to a data register file and an instruction decoder to perform an operation upon an operand corresponding to an instruction decoded by the instruction decoder and storing results of the operation. The operational unit is configured to perform a table recall in response to a look up table read instruction by recalling data elements from a specified location and adjacent location to the specified location, in a specified number of at least one table and storing the recalled data elements in successive slots in a destination register. Recalled data elements include at least one interpolated data element in the adjacent location.

Type: Grant

Filed: September 13, 2019

Date of Patent: September 27, 2022

Assignee: Texas Instruments Incorporated

Inventors: Naveen Bhoria, Dheera Balasubramanian Samudrala, Duc Bui, Alan Davis

High-level programming language which utilizes virtual memory

Patent number: 11429390

Abstract: Systems and methods for utilizing virtual memory with a high-level programming language are provided. Multiple address spaces are created in virtual memory, wherein each of the multiple address spaces include data entries, each of which have a value. A machine executable software program is operated which utilizes each of said multiple address spaces. At least a first one of the address spaces is independent from at least a second one of said address spaces, and at least a third one of the address spaces is electronically associated with at least a fourth one of the address spaces.

Type: Grant

Filed: December 4, 2020

Date of Patent: August 30, 2022

Assignee: Rankin Labs, LLC

Inventor: John Rankin

Multi-channel data path circuitry

Patent number: 11422822

Abstract: Techniques are disclosed relating to sharing datapath circuitry among multiple SIMD groups. In some embodiments, pipeline circuitry is configured to perform operations specified by instructions of first and second assigned SIMD groups. The pipeline circuitry may include first and second front-end circuitry configured to decode instructions of the respective SIMD groups. The pipeline circuitry may include shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle. The arbitration circuitry may select an instruction based on one or more of: stall counts, whether available instructions are being speculatively executed, whether ones of available instructions target a particular portion of the shared execution circuitry, numbers of execution cycles, and SIMD group ages.

Type: Grant

Filed: May 8, 2020

Date of Patent: August 23, 2022

Assignee: Apple Inc.

Inventors: Robert D. Kenney, Jason N. Dale

Execution synchronization and tracking

Patent number: 11416749

Abstract: An integrated circuit includes a processing engine configured to execute instructions that are synchronized using a set of events. The integrated circuit also includes a set of event registers and an age bit register. Each event in the set of events corresponds to a respective event register in the set of event registers. The age bit register includes a set of age bits, where each age bit in the age bit register corresponds to a respective event register in the set of event registers. Each age bit in the age bit register is configured to be set by an external circuit and to be cleared in response to a value change in a corresponding event register in the set of event registers. Executing the instructions by the processing engine changes a value of an event register in the set of event registers.

Type: Grant

Filed: December 11, 2018

Date of Patent: August 16, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Nafea Bshara, Thomas A. Volpe
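
The relationship between event registers and age bits in the abstract above can be sketched as a small software model. The class and method names are hypothetical, and the use of age bits to spot events that have not changed since they were last marked is one reading of the abstract, not a statement of the patent's claims.

```python
class EventRegisters:
    """Toy model of a set of event registers with one age bit per register."""

    def __init__(self, count: int):
        self.events = [0] * count
        self.age_bits = [False] * count

    def set_age_bit(self, index: int) -> None:
        """Called by the external circuit: mark the event as 'not changed since now'."""
        self.age_bits[index] = True

    def write_event(self, index: int, value: int) -> None:
        """Called by the processing engine: a value change clears the matching age bit."""
        if value != self.events[index]:
            self.events[index] = value
            self.age_bits[index] = False

    def unchanged_events(self):
        """Indices whose age bit is still set, i.e. no value change since it was set."""
        return [i for i, bit in enumerate(self.age_bits) if bit]
```

An external monitor could set every age bit, wait, and then call `unchanged_events()` to see which events have made no progress in the meantime.
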
Systems and methods for implementing chained tile operations

Patent number: 11416260

Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.

Type: Grant

Filed: April 30, 2020

Date of Patent: August 16, 2022

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall

Computer-implemented systems and methods for serialisation of arithmetic circuits

Patent number: 11416249

Abstract: Techniques described herein may be utilized to serialise and de-serialise arithmetic circuits that are utilized in the execution of computer programs. The arithmetic circuit may be utilized to build a Quadratic Arithmetic Problem (QAP) that is compiled into a set of cryptographic routines for a client and a prover. The client and prover may utilize a protocol to delegate execution of a program to the prover in a manner that allows the client to efficiently verify the prover correctly executed the program. The arithmetic circuit may comprise a set of symbols (e.g., arithmetic gates and values) that is compressed to produce a serialised circuit comprising a set of codes, wherein the set of symbols is derivable from the set of codes in a lossless manner. Serialisation and de-serialisation techniques may be utilized by nodes of a blockchain network.

Type: Grant

Filed: March 15, 2019

Date of Patent: August 16, 2022

Assignee: nChain Licensing AG

Inventors: Alexandra Covaci, Patrick Motylinski, Simone Madeo, Stephane Vincent, Craig Steven Wright

Instruction execution method and instruction execution device

Patent number: 11416255

Abstract: An instruction execution method suitable for being executed by a processor is provided. The first processor comprises a register alias table (RAT) and a reservation station. The instruction execution method includes: a register alias table receives a first micro-instruction and a second micro-instruction and issues the first micro-instruction and the second micro-instruction to the reservation station; and the reservation station assigns one of a plurality of execution units to execute the first micro-instruction, according to the first specific message of the first micro-instruction; and the reservation station assigns one of the execution units to execute the second micro-instruction, according to the second specific message of the second micro-instruction.

Type: Grant

Filed: March 10, 2020

Date of Patent: August 16, 2022

Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.

Inventors: Penghao Zou, Chen-Chen Song, Kang-Kang Zhang, Jianbin Wang

Systems and methods to skip inconsequential matrix operations

Patent number: 11403097

Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results, scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instructions as per the opcode.

Type: Grant

Filed: June 26, 2019

Date of Patent: August 2, 2022

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, William Rash, Subramaniam Maiyuran, Varghese George, Rajesh Sankaran
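
The multiply-accumulate described in the abstract above, C[M][N] += A[M][K] * B[K][N] with inconsequential products skipped, can be sketched in plain Python. Treating a product as inconsequential when either multiplicand is zero is an assumption made for this sketch; the function name is hypothetical.

```python
def matmul_accumulate_skip_zeros(a, b, c):
    """Accumulate a @ b into c in place, skipping products with a zero multiplicand.

    a is rows x inner, b is inner x cols, c is rows x cols (lists of lists).
    Returns the number of multiplications that were skipped.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    skipped = 0
    for m in range(rows):
        for n in range(cols):
            for k in range(inner):
                x, y = a[m][k], b[k][n]
                if x == 0 or y == 0:
                    # The product would be zero and leave c[m][n] unchanged.
                    skipped += 1
                    continue
                c[m][n] += x * y
    return skipped
```

The result matches an ordinary multiply-accumulate; only the work performed differs, which is where sparse inputs would save time or energy in hardware.
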
Tracking speculative data caching

Patent number: 11397584

Abstract: An apparatus and method of operating a data processing apparatus are disclosed. The apparatus comprises data processing circuitry to perform data processing operations in response to a sequence of instructions, wherein the data processing circuitry is capable of performing speculative execution of at least some of the sequence of instructions. A cache structure comprising entries stores temporary copies of data items which are subjected to the data processing operations and speculative execution tracking circuitry monitors correctness of the speculative execution and responsive to indication of incorrect speculative execution to cause entries in the cache structure allocated by the incorrect speculative execution to be evicted from the cache structure.

Type: Grant

Filed: March 21, 2019

Date of Patent: July 26, 2022

Assignee: Arm Limited

Inventors: Ian Michael Caulfield, Peter Richard Greenhalgh, Frederic Claude Marie Piry, Albin Pierrick Tonnerre

Operand pool instruction reservation clustering in a scheduler circuit in a processor

Patent number: 11392410

Abstract: Operand pool instruction reservation clustering in a scheduler circuit in a processor is disclosed. The scheduler circuit includes a plurality of operand pool reservation circuits each having an assigned number of source operands for an instruction stored that must be ready before the instruction is issued. Instructions having the same number of source operands that are not yet ready for its issuance can be stored in an operand pool reservation circuit having the same assigned number of source operands. In this manner, the number of reservation entries and associated comparator circuits in the clustered scheduler circuit is distributed among the plurality of operand pool reservation circuits to avoid or reduce an increase in the number of scheduling path connections and complexity in each reservation circuit. This can avoid or reduce an increase in scheduling latency for a given number of reservation entries in the clustered scheduler circuit.

Type: Grant

Filed: April 8, 2020

Date of Patent: July 19, 2022

Assignee: Microsoft Technology Licensing, LLC

Inventors: Shivam Priyadarshi, Yusuf Cagatay Tekmen, Rodney Wayne Smith, Vignyan Reddy Kothinti Naresh
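
The clustering idea in the abstract above, grouping instructions by how many of their source operands are not yet ready, can be sketched as follows. This is a software analogy only: the function name, the tuple format for instructions, and the register names are hypothetical, and real reservation circuits would also track wake-up and issue.

```python
from collections import defaultdict


def cluster_by_pending_operands(instructions, ready_registers):
    """Group instruction names into pools keyed by their count of not-ready source operands.

    instructions: iterable of (name, [source_register, ...]) tuples.
    ready_registers: set of registers whose values are already available.
    """
    pools = defaultdict(list)
    for name, sources in instructions:
        pending = sum(1 for reg in sources if reg not in ready_registers)
        pools[pending].append(name)
    return dict(pools)
```

An entry in the pool for one pending operand only needs to watch for one operand to become ready, which hints at why splitting entries this way can limit the comparator count per pool.
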
System-on-chip, data processing method thereof, and neural network device

Patent number: 11392377

Abstract: A System-on-Chip (SoC) includes a first memory configured to store first data, a second memory, and a data processing circuit configured to divide the first data obtained from the first memory into a plurality of pieces of division data, assign a plurality of tags to the plurality of pieces of division data, each of the plurality of tags including a coordinate value for a corresponding piece of division data, obtain second data based on at least one of the first data and the plurality of tags for the plurality of pieces of division data, and provide an address and the second data to the second memory. The address and the second data are obtained based on the plurality of tags.

Type: Grant

Filed: August 31, 2020

Date of Patent: July 19, 2022

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventor: Jonghyup Lee
