Patents Examined by Michael J Metzger
-
Patent number: 11513837Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.Type: GrantFiled: April 30, 2019Date of Patent: November 29, 2022Assignee: Micron Technology, Inc.Inventor: Tony M. Brewer
-
Patent number: 11513840Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.Type: GrantFiled: April 30, 2019Date of Patent: November 29, 2022Assignee: Micron Technology, Inc.Inventor: Tony M. Brewer
-
Patent number: 11507370Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.Type: GrantFiled: December 16, 2019Date of Patent: November 22, 2022Assignee: Cambricon (Xi'an) Semiconductor Co., Ltd.Inventors: Yao Zhang, Bingrui Wang
-
Patent number: 11507372Abstract: An apparatus and method are provided for processing instructions fetched from memory. Decode circuitry is used to decode the fetched instructions in order to produce decoded instructions, and downstream circuitry then processes the decoded instructions in order to perform the operations specified by those decoded instructions. Dispatch circuitry is arranged to dispatch to the downstream circuitry up to N decoded instructions per dispatch cycle, and is arranged to determine, based on a given candidate sequence of decoded instructions being considered for dispatch in a given dispatch cycle, whether at least one resource conflict within the downstream circuitry would occur in the event that the given candidate sequence of decoded instructions is dispatched in the given dispatch cycle.Type: GrantFiled: October 7, 2020Date of Patent: November 22, 2022Assignee: Arm LimitedInventors: Michael Brian Schinzler, Yasuo Ishii, Muhammad Umar Farooq, Jason Lee Setter
-
Patent number: 11500632Abstract: In a processor device according to the present invention, a memory access unit reads data to be processed from an external memory and writes the data to a first register group that a plurality of processors does not access among a plurality of register groups. A control unit sequentially makes each of the plurality of processors implement a same instruction, in parallel with changing an address of a register group that stores the data to be processed. A scheduler, based on specified scenario information, specifies an instruction to be implemented and a register group to be accessed for the plurality of processors, and specifies a register group to be written to among the plurality of register groups and data to be processed that is to be written for the memory access unit.Type: GrantFiled: April 23, 2019Date of Patent: November 15, 2022Assignee: ArchiTek CorporationInventor: Shuichi Takada
-
Patent number: 11500677Abstract: A method of synchronizing a group of scheduled tasks within a parallel processing unit into a known state is described. The method uses a synchronization instruction in a scheduled task which triggers, in response to decoding of the instruction, an instruction decoder to place the scheduled task into a non-active state and forward the decoded synchronization instruction to an atomic ALU for execution. When the atomic ALU executes the decoded synchronization instruction, the atomic ALU performs an operation and check on data assigned to the group ID of the scheduled task and if the check is passed, all scheduled tasks having the particular group ID are removed from the non-active state.Type: GrantFiled: November 3, 2020Date of Patent: November 15, 2022Assignee: Imagination Technologies LimitedInventors: Ollie Mower, Yoong-Chert Foo
-
Patent number: 11500641Abstract: Methods, devices and media for efficient data dependency management for in-order issue processors are described. In various embodiments described herein, methods, devices and media are disclosed that provide techniques for managing RAW data dependencies between instructions in a constrained hardware environment. The described techniques include initial wait station allocation of write instructions, followed by wait station allocation conflict resolution methods that use a greedy algorithm to optimize a cost function based on the estimated latency of a single instruction. Efficient compilation and reduced execution time may be achieved in some embodiments. Methods and devices for compiling source code are described, as well as devices for executing the compiled machine code and media for storing compiled machine code.Type: GrantFiled: October 7, 2020Date of Patent: November 15, 2022Assignee: HUAWEI TECHNOLOGIES CO., LTD.Inventors: Hazem A. Abdelhafez, Ning Xie, Ahmed Mohammed ElShafiey Mohammed Eltantawy
-
Patent number: 11481218Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising instruction decode logic to decode a single instruction including multiple operands into a single decoded instruction, the multiple operands including a first operand and a second operand, the first operand including vector of one-hot coded weights and the second operand including a vector of input data; and a general-purpose graphics compute unit including a first logic unit, the general-purpose graphics compute unit to execute the single decoded instruction, wherein to execute the single decoded instruction includes to perform multiple operations on the first set of operands and the second set of operands.Type: GrantFiled: August 2, 2017Date of Patent: October 25, 2022Assignee: Intel CorporationInventors: Jianguo Li, Yurong Chen
-
Patent number: 11475282Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of compute elements and routers performs flow-based computations on wavelets of data. Some instructions are performed in iterations, such as one iteration per element of a fabric vector or FIFO. When sources for an iteration of an instruction are unavailable, and/or there is insufficient space to store results of the iteration, indicators associated with operands of the instruction are checked to determine whether other work can be performed. In some scenarios, other work cannot be performed and processing stalls. Alternatively, information about the instruction is saved, the other work is performed, and sometime after the sources become available and/or sufficient space to store the results becomes available, the iteration is performed using the saved information.Type: GrantFiled: April 17, 2018Date of Patent: October 18, 2022Assignee: Cerebras Systems Inc.Inventors: Sean Lie, Michael Morrison, Michael Edwin James, Gary R. Lauterbach, Srikanth Arekapudi
-
Patent number: 11455169Abstract: A digital data processor includes an instruction memory storing instructions specifying data processing operations and a data operand field, an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand, and an operational unit coupled to a data register file and an instruction decoder to perform an operation upon an operand corresponding to an instruction decoded by the instruction decoder and storing results of the operation. The operational unit is configured to perform a table recall in response to a look up table read instruction by recalling data elements from a specified location and adjacent location to the specified location, in a specified number of at least one table and storing the recalled data elements in successive slots in a destination register. Recalled data elements include at least one interpolated data element in the adjacent location.Type: GrantFiled: September 13, 2019Date of Patent: September 27, 2022Assignee: Texas Instruments IncorporatedInventors: Naveen Bhoria, Dheera Balasubramanian Samudrala, Duc Bui, Alan Davis
-
Patent number: 11429390Abstract: Systems and methods for utilizing virtual memory with a high-level programming language are provided. Multiple address spaces are created in virtual memory, wherein each of the multiple address spaces include data entries, each of which have a value. A machine executable software program is operated which utilizes each of said multiple address spaces. At least a first one of the address spaces is independent from at least a second one of said address spaces, and at least a third one of the address spaces is electronically associated with at least a fourth one of the address spaces.Type: GrantFiled: December 4, 2020Date of Patent: August 30, 2022Assignee: Rankin Labs, LLCInventor: John Rankin
-
Patent number: 11422822Abstract: Techniques are disclosed relating to sharing datapath circuitry among multiple SIMD groups. In some embodiments, pipeline circuitry is configured to perform operations specified by instructions of first and second assigned SIMD groups. The pipeline circuitry may include first and second front-end circuitry configured to decode instructions of the respective SIMD groups. The pipeline circuitry may include shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle. The arbitration circuitry may select an instruction based on one or more of: stall counts, whether available instructions are being speculatively executed, whether ones of available instructions target a particular portion of the shared execution circuitry, numbers of execution cycles, and SIMD group ages.Type: GrantFiled: May 8, 2020Date of Patent: August 23, 2022Assignee: Apple Inc.Inventors: Robert D. Kenney, Jason N. Dale
-
Patent number: 11416749Abstract: An integrated circuit includes a processing engine configured to execute instructions that are synchronized using a set of events. The integrated circuit also includes a set of event registers and an age bit register. Each event in the set of events corresponds to a respective event register in the set of event registers. The age bit register includes a set of age bits, where each age bit in the age bit register corresponds to a respective event register in the set of event registers. Each age bit in the age bit register is configured to be set by an external circuit and to be cleared in response to a value change in a corresponding event register in the set of event registers. Executing the instructions by the processing engine changes a value of an event register in the set of event registers.Type: GrantFiled: December 11, 2018Date of Patent: August 16, 2022Assignee: Amazon Technologies, Inc.Inventors: Nafea Bshara, Thomas A. Volpe
-
Patent number: 11416260Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.Type: GrantFiled: April 30, 2020Date of Patent: August 16, 2022Assignee: Intel CorporationInventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 11416249Abstract: Techniques described herein may be utilized to serialise and de-serialise arithmetic circuits that are utilized in the execution of computer programs. The arithmetic circuit may be utilized to build a Quadratic Arithmetic Problem (QAP) that is compiled into a set of cryptographic routines for a client and a prover. The client and prover may utilize a protocol to delegate execution of a program to the prover in a manner that allows the client to efficiently verify the prover correctly executed the program. The arithmetic circuit may comprise a set of symbols (e.g., arithmetic gates and values) that is compressed to produce a serialised circuit comprising a set of codes, wherein the set of symbols is derivable from the set of codes in a lossless manner. Serialisation and de-serialisation techniques may be utilized by nodes of a blockchain network.Type: GrantFiled: March 15, 2019Date of Patent: August 16, 2022Assignee: nChain Licensing AGInventors: Alexandra Covaci, Patrick Motylinski, Simone Madeo, Stephane Vincent, Craig Steven Wright
-
Patent number: 11416255Abstract: An instruction execution method suitable for being executed by a processor is provided. The first processor comprises a register alias table (RAT) and a reservation station. The instruction execution method includes: a register alias table receives a first micro-instruction and a second micro-instruction and issues the first micro-instruction and the second micro-instruction to the reservation station; and the reservation station assigns one of a plurality of execution units to execute the first micro-instruction, according to the first specific message of the first micro-instruction; and the reservation station assigns one of the execution units to execute the second micro-instruction, according to the second specific message of the second micro-instruction.Type: GrantFiled: March 10, 2020Date of Patent: August 16, 2022Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.Inventors: Penghao Zou, Chen-Chen Song, Kang-Kang Zhang, Jianbin Wang
-
Patent number: 11403097Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results, scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instructions as per the opcode.Type: GrantFiled: June 26, 2019Date of Patent: August 2, 2022Assignee: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, William Rash, Subramaniam Maiyuran, Varghese George, Rajesh Sankaran
-
Patent number: 11397584Abstract: An apparatus and method of operating a data processing apparatus are disclosed. The apparatus comprises data processing circuitry to perform data processing operations in response to a sequence of instructions, wherein the data processing circuitry is capable of performing speculative execution of at least some of the sequence of instructions. A cache structure comprising entries stores temporary copies of data items which are subjected to the data processing operations and speculative execution tracking circuitry monitors correctness of the speculative execution and responsive to indication of incorrect speculative execution to cause entries in the cache structure allocated by the incorrect speculative execution to be evicted from the cache structure.Type: GrantFiled: March 21, 2019Date of Patent: July 26, 2022Assignee: Arm LimitedInventors: Ian Michael Caulfield, Peter Richard Greenhalgh, Frederic Claude Marie Piry, Albin Pierrick Tonnerre
-
Patent number: 11392410Abstract: Operand pool instruction reservation clustering in a scheduler circuit in a processor is disclosed. The scheduler circuit includes a plurality of operand pool reservation circuits each having an assigned number of source operands for an instruction stored that must be ready before the instruction is issued. Instructions having the same number of source operands that are not yet ready for its issuance can be stored in an operand pool reservation circuit having the same assigned number of source operands. In this manner, the number of reservation entries and associated comparator circuits in the clustered scheduler circuit is distributed among the plurality of operand pool reservation circuits to avoid or reduce an increase in the number of scheduling path connections and complexity in each reservation circuit. This can avoid or reduce an increase in scheduling latency for a given number of reservation entries in the clustered scheduler circuit.Type: GrantFiled: April 8, 2020Date of Patent: July 19, 2022Assignee: Microsoft Technology Licensing, LLCInventors: Shivam Priyadarshi, Yusuf Cagatay Tekmen, Rodney Wayne Smith, Vignyan Reddy Kothinti Naresh
-
Patent number: 11392377Abstract: A System-on-Chip (SoC) includes a first memory configured to store first data, a second memory, and a data processing circuit configured to divide the first data obtained from the first memory into a plurality of pieces of division data, assign a plurality of tags to the plurality of pieces of division data, each of the plurality of tags including a coordinate value for a corresponding piece of division data, obtain second data based on at least one of the first data and the plurality of tags for the plurality of pieces of division data, and provide an address and the second data to the second memory. The address and the second data are obtained based on the plurality of tags.Type: GrantFiled: August 31, 2020Date of Patent: July 19, 2022Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventor: Jonghyup Lee