Patents Examined by Michael J Metzger

System and method for instruction unwinding in an out-of-order processor

Patent number: 11593116

Abstract: A system and corresponding method unwind instructions in an out-of-order (OoO) processor. The system comprises a mapper. In response to a restart event causing at least one instruction to be unwound, the mapper restores a present integer mapper state and present floating-point (FP) mapper state, used for mapping instructions, to a former integer mapper state and former FP mapper state, respectively. The mapper stores integer snapshots and FP snapshots of the present integer and FP mapper state, respectively, to expedite restoration to the former integer and FP mapper state, respectively. Access to the FP snapshots is blocked, intermittently, as a function of at least one FP present indicator used by the mapper to record presence of FP registers used as destinations in the instructions. Blocking the access, intermittently, improves power efficiency of the OoO processor.

Type: Grant

Filed: April 30, 2021

Date of Patent: February 28, 2023

Assignee: Marvell Asia Pte, Ltd.

Inventor: David A. Carlson

Processor, device, and method for executing instructions

Patent number: 11593115

Abstract: The present disclosure discloses an instruction execution device, a processor including the instruction execution device, a system on chip, and a method for executing a data storage instruction in the processor. The method includes: splitting the data storage instruction into a first split instruction and a second split instruction, wherein the first split instruction is associated with an address operand of the data storage instruction, and the second split instruction is associated with a data operand of the data storage instruction; executing the first split instruction to determine a data storage address corresponding to the address operand; executing the second split instruction to acquire data content corresponding to the data operand; and storing the acquired data content to the determined data storage address in a data storage region. The present disclosure further discloses a corresponding instruction execution device, a processor including the execution device and a system on chip.

Type: Grant

Filed: March 20, 2020

Date of Patent: February 28, 2023

Assignee: Alibaba Group Holding Limited

Inventors: Yimin Lu, Xiaoyan Xiang
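
The split described in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the `Store` encoding, the `STA`/`STD` micro-op names, and the dictionary-based register file and memory are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Store:
    """Hypothetical store: memory[regs[base] + offset] = regs[src]."""
    base: str
    offset: int
    src: str


def split(store):
    """Split one store into an address part and a data part.

    "STA" and "STD" are illustrative names, not taken from the patent.
    """
    addr_op = ("STA", store.base, store.offset)  # tied to the address operand
    data_op = ("STD", store.src)                 # tied to the data operand
    return addr_op, data_op


def execute_store(store, regs, memory):
    addr_op, data_op = split(store)

    # First split instruction: determine the data storage address.
    _, base, offset = addr_op
    address = regs[base] + offset

    # Second split instruction: acquire the data content.
    _, src = data_op
    data = regs[src]

    # Store the acquired data at the determined address.
    memory[address] = data
    return address, data


regs = {"r1": 0x100, "r2": 42}
memory = {}
execute_store(Store(base="r1", offset=8, src="r2"), regs, memory)
```

In this model the two parts touch disjoint operands, which is the property that would let hardware handle them independently; the sketch simply runs them one after the other.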

Scalable hardware thread scheduler

Patent number: 11586465

Abstract: A device includes a hardware data processing node configured to execute a respective task, and a hardware thread scheduler including a hardware task scheduler. The hardware task scheduler is coupled to the hardware data processing node and has a producer socket, a consumer socket, and a spare socket. The spare socket is configured to provide data control signals also provided by a first socket of the producer and consumer sockets responsive to a memory-mapped register being a first value. The spare socket is configured to provide data control signals also provided by a second socket of the producer and consumer sockets responsive to the memory-mapped register being a second value.

Type: Grant

Filed: December 30, 2020

Date of Patent: February 21, 2023

Assignee: Texas Instruments Incorporated

Inventors: Niraj Nandan, Mihir Mody

System and method for convolving image with sparse kernels

Patent number: 11586442

Abstract: An image processing system for convolving an image includes processing circuitry that is configured to retrieve the image including a set of rows, a merged kernel, multiple skip values, and a pixel base address. The merged kernel includes all non-zero coefficients of a set of kernels. Each skip value corresponds to a location offset of a non-zero coefficient with respect to the previous non-zero coefficient. Further, the processing circuitry is configured to execute a multiply-accumulate (MAC) instruction and a load instruction in parallel in one clock cycle, multiple times, on the set of rows and the merged kernel to convolve the image with the merged kernel. Each row on which the MAC and load instructions are executed is associated with a corresponding non-zero coefficient and a corresponding skip value. The load instruction is executed based on the pixel base address, the corresponding skip value, and a width of each row.

Type: Grant

Filed: August 6, 2020

Date of Patent: February 21, 2023

Assignee: NXP USA, Inc.

Inventors: Atul Gupta, Amit Goel
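
The merged-kernel and skip-value idea in the abstract above can be sketched in software. The following is a minimal illustration, not the patented circuitry, and it makes several assumptions: a single kernel rather than a set of kernels, an image stored as a flat row-major list, skip values expressed as address deltas in that flat layout, and a sliding-window sum without kernel flipping (cross-correlation). All function names are hypothetical.

```python
def merge_kernel(kernel, row_width):
    """Collect the non-zero coefficients and, for each, the address offset
    from the previous non-zero coefficient (the first is relative to 0)."""
    coeffs, skips = [], []
    prev = 0
    for r, row in enumerate(kernel):
        for c, k in enumerate(row):
            if k != 0:
                addr = r * row_width + c  # position in flat image addressing
                coeffs.append(k)
                skips.append(addr - prev)
                prev = addr
    return coeffs, skips


def sparse_correlate(image, width, height, kernel):
    """Sliding-window sum that visits only non-zero kernel coefficients."""
    kh, kw = len(kernel), len(kernel[0])
    coeffs, skips = merge_kernel(kernel, width)
    out = []
    for y in range(height - kh + 1):
        out_row = []
        for x in range(width - kw + 1):
            addr = y * width + x              # pixel base address
            acc = 0
            for k, s in zip(coeffs, skips):
                addr += s                     # "load": advance by the skip value
                acc += k * image[addr]        # "MAC": multiply-accumulate
            out_row.append(acc)
        out.append(out_row)
    return out


def dense_correlate(image, width, height, kernel):
    """Reference version that also multiplies by the zero coefficients."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[r][c] * image[(y + r) * width + (x + c)]
                for r in range(kh) for c in range(kw))
            for x in range(width - kw + 1)
        ]
        for y in range(height - kh + 1)
    ]


image = list(range(16))                       # 4x4 image, row-major
kernel = [[1, 0, 0], [0, 0, 2], [0, -1, 0]]   # three non-zero coefficients
result = sparse_correlate(image, 4, 4, kernel)
```

For this 3x3 kernel the inner loop runs three times per output pixel instead of nine, and `result` matches `dense_correlate(image, 4, 4, kernel)`.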

Dynamic graphical processing unit register allocation

Patent number: 11579922

Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.

Type: Grant

Filed: December 29, 2020

Date of Patent: February 14, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle

Handling load-exclusive instructions in apparatus having support for transactional memory

Patent number: 11579873

Abstract: An apparatus is described with support for transactional memory and load/store-exclusive instructions using an exclusive monitor indication to track exclusive access to a given address. In response to a predetermined type of load instruction specifying a load target address, which is executed within a given transaction, any exclusive monitor indication previously set for the load target address is cleared. In response to a load-exclusive instruction, an abort is triggered for a transaction for which the given address is specified as one of its working set of addresses. This helps to maintain mutual exclusion between transactional and non-transactional threads even if there is load speculation in the non-transactional thread.

Type: Grant

Filed: May 9, 2019

Date of Patent: February 14, 2023

Assignee: Arm Limited

Inventors: Matthew James Horsnell, Grigorios Magklis, Richard Roy Grisenthwaite, Nathan Yong Seng Chong

Human-machine-interface system comprising a convolutional neural network hardware accelerator

Patent number: 11567770

Abstract: A human-machine-interface system comprising: register-file-memory, configured to store input-data; a first-processing-element-slice, a second-processing-element-slice, and a controller. Each of the processing-slices comprises: a register configured to store register-data; and a processing-element configured to apply an arithmetic and logic operation on the register-data in order to provide convolution-output-data. The controller is configured to: load input-data from the register-file-memory into the first-register as the first-register-data; and load: (i) input-data from the register-file-memory, or (ii) the first-register-data from the first-register, into the second-register as the second-register-data.

Type: Grant

Filed: April 3, 2018

Date of Patent: January 31, 2023

Assignee: NXP B.V.

Inventors: Jose de Jesus Pineda de Gyvez, Hamed Fatemi, Gonzalo Moro Pérez, Hendrik Corporaal

System and method for low latency node local scheduling in distributed resource management

Patent number: 11561830

Abstract: A system for allocation of resources and processing jobs within a distributed system includes a processor and a memory coupled to the processor. The memory includes at least one process and at least one resource allocator. The process is adapted for processing jobs within a distributed system which receives jobs to be processed. The resource allocator is communicably coupled with at least one process, and is adapted to generate one or more sub-processes within a limit of one or more resources allocated to the process for processing jobs.

Type: Grant

Filed: October 21, 2019

Date of Patent: January 24, 2023

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Naganarasimha Ramesh Garla, Varun Saxena, Guilin Sun

Neural network activation compression with non-uniform mantissas

Patent number: 11562247

Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.

Type: Grant

Filed: January 24, 2019

Date of Patent: January 24, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao

Mixed-precision computation unit

Patent number: 11561767

Abstract: The present disclosure advantageously provides a mixed precision computation (MPC) unit for executing one or more mixed-precision layers of an artificial neural network (ANN). The MPC unit includes a multiplier circuit configured to input a pair of operands and output a product, a first adder circuit coupled to the multiplier circuit, a second adder circuit, coupled to the first adder circuit, configured to input a pair of operands, an accumulator circuit, coupled to the multiplier circuit and the first adder circuit, configured to output an accumulated value, and a controller, coupled to the multiplier circuit, the first adder circuit, the second adder circuit and the accumulator circuit, configured to input a mode control signal. The controller has a plurality of operating modes including a high precision mode, a low precision add mode and a low precision multiply mode.

Type: Grant

Filed: March 31, 2020

Date of Patent: January 24, 2023

Assignee: Arm Limited

Inventors: Dibakar Gope, Jesse Garrett Beu, Paul Nicholas Whatmough, Matthew Mattina

Security enhancement in hierarchical protection domains

Patent number: 11556346

Abstract: Methods and systems for allowing software components that operate at a specific exception level (e.g., EL-3 to EL-1, etc.) to repeatedly or continuously observe or evaluate the integrity of software components operating at a lower exception level (e.g., EL-2 to EL-0) to ensure that the software components have not been corrupted or compromised (e.g., subjected to malware, cyberattacks, etc.) include a computing device that identifies, by a component operating at a higher exception level (“HEL component”), at least one of a current vector base address (VBA), an exception raising instruction (ERI) address, or a control and system register value associated with a component operating at a lower exception level (“LEL component”). The computing device may perform a responsive action in response to determining that the current VBA, the ERI address, or the control and system register value does not match the corresponding reference data.

Type: Grant

Filed: June 10, 2020

Date of Patent: January 17, 2023

Assignee: QUALCOMM Incorporated

Inventors: Naresh Kumar Sharma, Saurabh Gorecha, Pravin Kumar

Method and apparatus to process SHA-2 secure hashing algorithm

Patent number: 11550582

Abstract: A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.

Type: Grant

Filed: June 17, 2020

Date of Patent: January 10, 2023

Assignee: Intel Corporation

Inventors: Kirk S. Yap, Gilbert M. Wolrich, James D. Guilford, Vinodh Gopal, Erdinc Ozturk, Sean M. Gulley, Wajdi K. Feghali, Martin G. Dixon

Computing device and method

Patent number: 11544542

Abstract: A computing device, comprising: a computing module, comprising one or more computing units; and a control module, comprising a computing control unit, and used for controlling shutdown of the computing unit of the computing module according to a determining condition. Also provided is a computing method. The computing device and method have the advantages of low power consumption and high flexibility, and can be combined with the upgrading mode of software, thereby further increasing the computing speed, reducing the computing amount, and reducing the computing power consumption of an accelerator.

Type: Grant

Filed: November 28, 2019

Date of Patent: January 3, 2023

Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.

Inventors: Zai Wang, Shengyuan Zhou, Zidong Du, Tianshi Chen

Accelerating AI training by an all-reduce process with compression over a distributed system

Patent number: 11544067

Abstract: According to various embodiments, methods and systems are provided to accelerate artificial intelligence (AI) model training with advanced interconnect communication technologies and systematic zero-value compression over a distributed training system. According to an exemplary method, during each iteration of a Scatter-Reduce process performed on a cluster of processors arranged in a logical ring to train a neural network model, a processor receives a compressed data block from a prior processor in the logical ring, performs an operation on the received compressed data block and a compressed data block generated on the processor to obtain a calculated data block, and sends the calculated data block to a following processor in the logical ring. A compressed data block calculated from corresponding data blocks from the processors can be identified on each processor and distributed to each other processor and decompressed therein for use in the AI model training.

Type: Grant

Filed: October 12, 2019

Date of Patent: January 3, 2023

Assignees: BAIDU USA LLC, BAIDU.COM TIMES TECHNOLOGY (BEIJING) CO., LTD., KUNLUNXIN TECHNOLOGY (BEIJING) COMPANY LIMITED

Inventors: Zhibiao Zhao, Jian Ouyang, Hefei Zhu, Qingshu Chen, Wei Qi
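
The ring Scatter-Reduce with zero-value compression described in the abstract above can be simulated in a few lines. This is a single-process sketch under stated assumptions, not the patented system: the compressed format (block length plus a dictionary of non-zero entries), element-wise addition as the reduction operation, and every function name are hypothetical. Only the Scatter-Reduce phase is modelled; the later distribution of reduced blocks to the other processors is omitted.

```python
def compress(block):
    """Zero-value compression: keep the length and only the non-zero entries."""
    return (len(block), {i: v for i, v in enumerate(block) if v != 0})


def decompress(cblock):
    length, nonzero = cblock
    return [nonzero.get(i, 0) for i in range(length)]


def add_compressed(a, b):
    """Element-wise sum of two compressed blocks without decompressing them."""
    length, nonzero_a = a
    out = dict(nonzero_a)
    for i, v in b[1].items():
        s = out.get(i, 0) + v
        if s != 0:
            out[i] = s
        else:
            out.pop(i, None)  # the sum cancelled to zero, so drop the entry
    return (length, out)


def ring_scatter_reduce(per_proc_blocks):
    """per_proc_blocks[i][c] is block c held by processor i (p processors, p blocks).

    Returns {block_index: reduced compressed block}; after p - 1 steps,
    processor i holds the fully reduced block (i + 1) % p.
    """
    p = len(per_proc_blocks)
    state = [[compress(b) for b in blocks] for blocks in per_proc_blocks]
    for step in range(p - 1):
        # Snapshot what every processor sends in this step.
        outgoing = [state[i][(i - step) % p] for i in range(p)]
        for i in range(p):
            dst = (i + 1) % p        # following processor in the logical ring
            idx = (i - step) % p     # index of the block being passed along
            state[dst][idx] = add_compressed(outgoing[i], state[dst][idx])
    return {(i + 1) % p: state[i][(i + 1) % p] for i in range(p)}


blocks = [
    [[1, 0], [0, 2], [0, 0]],    # processor 0
    [[0, 0], [3, 0], [0, 4]],    # processor 1
    [[5, 0], [0, -2], [1, 1]],   # processor 2
]
reduced = {c: decompress(cb) for c, cb in ring_scatter_reduce(blocks).items()}
```

Each step moves one compressed block per processor to its ring neighbour, so in this model sparse gradients shrink the data passed between processors as well as the number of additions performed.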

Computing device and method

Patent number: 11537858

Abstract: A computing device, comprising: a computing module, comprising one or more computing units; and a control module, comprising a computing control unit, and used for controlling shutdown of the computing unit of the computing module according to a determining condition. Also provided is a computing method. The computing device and method have the advantages of low power consumption and high flexibility, and can be combined with the upgrading mode of software, thereby further increasing the computing speed, reducing the computing amount, and reducing the computing power consumption of an accelerator.

Type: Grant

Filed: November 28, 2019

Date of Patent: December 27, 2022

Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.

Inventors: Tianshi Chen, Xuda Zhou, Shaoli Liu, Zidong Du

PLC device that transmits an instruction to a control device

Patent number: 11531551

Abstract: To preferentially execute an instruction with higher priority when the CNC is unable to respond, for example because of unresponsive timing or load on the bus, a PLC device includes: a special instruction control unit that sets a priority degree indicating a degree of priority for executing predetermined processing to a special instruction for performing the predetermined processing in a control device that controls an industrial machine, and transmits the special instruction in which the priority degree is set to the control device; an instruction storage determining unit that determines whether or not to queue the special instruction according to an operation state of the control device; and an instruction storage unit that sequentially stores the special instruction received, on the basis of a determination result of the instruction storage determining unit.

Type: Grant

Filed: June 11, 2020

Date of Patent: December 20, 2022

Assignee: FANUC CORPORATION

Inventor: Nao Onose

System and method for instruction mapping in an out-of-order processor

Patent number: 11531549

Abstract: A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one FP present indicator, presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state, intermittently, power efficiency of the OoO processor is improved.

Type: Grant

Filed: March 31, 2021

Date of Patent: December 20, 2022

Assignee: Marvell Asia Pte, Ltd.

Inventor: David A. Carlson

Prefetch mechanism for a cache structure

Patent number: 11526356

Abstract: An apparatus and method are provided, the apparatus comprising a processor pipeline to execute instructions; a cache structure to store information for reference by the processor pipeline when executing said instructions; and prefetch circuitry to issue prefetch requests to the cache structure to cause the cache structure to prefetch information into the cache structure in anticipation of a demand request for that information being issued to the cache structure by the processor pipeline. The processor pipeline is arranged to issue a trigger to the prefetch circuitry on detection of a given event that will result in a reduced level of demand requests being issued by the processor pipeline, and the prefetch circuitry is configured to control issuing of prefetch requests in dependence on reception of the trigger.

Type: Grant

Filed: May 29, 2020

Date of Patent: December 13, 2022

Assignee: Arm Limited

Inventors: Lingzhe Cai, Krishnendra Nathella, Jaekyu Lee, Dam Sunwoo

Thread state monitoring in a system having a multi-threaded, self-scheduling processor

Patent number: 11513838

Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.

Type: Grant

Filed: April 30, 2019

Date of Patent: November 29, 2022

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer

Memory request size management in a multi-threaded, self-scheduling processor

Patent number: 11513839

Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.

Type: Grant

Filed: April 30, 2019

Date of Patent: November 29, 2022

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer
