Patents Examined by Michael J Metzger
  • Patent number: 11593116
    Abstract: A system and corresponding method unwind instructions in an out-of-order (OoO) processor. The system comprises a mapper. In response to a restart event causing at least one instruction to be unwound, the mapper restores a present integer mapper state and present floating-point (FP) mapper state, used for mapping instructions, to a former integer mapper state and former FP mapper state, respectively. The mapper stores integer snapshots and FP snapshots of the present integer and FP mapper state, respectively, to expedite restoration to the former integer and FP mapper state, respectively. Access to the FP snapshots is blocked, intermittently, as a function of at least one FP present indicator used by the mapper to record presence of FP registers used as destinations in the instructions. Blocking the access, intermittently, improves power efficiency of the OoO processor.
    Type: Grant
    Filed: April 30, 2021
    Date of Patent: February 28, 2023
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: David A. Carlson
  • Patent number: 11593115
    Abstract: The present disclosure discloses an instruction execution device, a processor including the instruction execution device, a system on chip, and a method for executing a data storage instruction in the processor. The method includes: splitting the data storage instruction into a first split instruction and a second split instruction, wherein the first split instruction is associated with an address operand of the data storage instruction, and the second split instruction is associated with a data operand of the data storage instruction; executing the first split instruction to determine a data storage address corresponding to the address operand; executing the second split instruction to acquire data content corresponding to the data operand; and storing the acquired data content to the determined data storage address in a data storage region. The present disclosure further discloses a corresponding instruction execution device, a processor including the execution device and a system on chip.
    Type: Grant
    Filed: March 20, 2020
    Date of Patent: February 28, 2023
    Assignee: Alibaba Group Holding Limited
    Inventors: Yimin Lu, Xiaoyan Xiang
  • Patent number: 11586465
    Abstract: A device includes a hardware data processing node configured to execute a respective task, and a hardware thread scheduler including a hardware task scheduler. The hardware task scheduler is coupled to the hardware data processing node and has a producer socket, a consumer socket, and a spare socket. The spare socket is configured to provide data control signals also provided by a first socket of the producer and consumer sockets responsive to a memory-mapped register being a first value. The spare socket is configured to provide data control signals also provided by a second socket of the producer and consumer sockets responsive to the memory-mapped register being a second value.
    Type: Grant
    Filed: December 30, 2020
    Date of Patent: February 21, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Niraj Nandan, Mihir Mody
  • Patent number: 11586442
    Abstract: An image processing system for convolving an image includes processing circuitry that is configured to retrieve the image including a set of rows, a merged kernel, multiple skip values and a pixel base address. The merged kernel includes all non-zero coefficients of a set of kernels. Each skip value corresponds to a location offset of each non-zero coefficient with respect to a previous non-zero coefficient. Further, the processing circuitry is configured to execute a multiply-accumulate (MAC) instruction and a load instruction parallelly in one clock cycle for multiple times, on the set of rows and the merged kernel to convolve the image with the merged kernel. Each row on which the MAC and load instructions are executed is associated with a corresponding non-zero coefficient and a corresponding skip value. The load instruction is executed based on the pixel base address, the corresponding skip value, and a width of each row.
    Type: Grant
    Filed: August 6, 2020
    Date of Patent: February 21, 2023
    Assignee: NXP USA, Inc.
    Inventors: Atul Gupta, Amit Goel
  • Patent number: 11579922
    Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: February 14, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Patent number: 11579873
    Abstract: An apparatus is described with support for transactional memory and load/store-exclusive instructions using an exclusive monitor indication to track exclusive access to a given address. In response to a predetermined type of load instruction specifying a load target address, which is executed within a given transaction, any exclusive monitor indication previously set for the load target address is cleared. In response to a load-exclusive instruction, an abort is triggered for a transaction for which the given address is specified as one of its working set of addresses. This helps to maintain mutual exclusion between transactional and non-transactional threads even if there is load speculation in the non-transactional thread.
    Type: Grant
    Filed: May 9, 2019
    Date of Patent: February 14, 2023
    Assignee: Arm Limited
    Inventors: Matthew James Horsnell, Grigorios Magklis, Richard Roy Grisenthwaite, Nathan Yong Seng Chong
  • Patent number: 11567770
    Abstract: A human-machine-interface system comprising: register-file-memory, configured to store input-data; a first-processing-element-slice, a second-processing-element-slice, and a controller. Each of the processing-slices comprise: a register configured to store register-data; and a processing-element configured to apply an arithmetic and logic operation on the register-data in order to provide convolution-output-data. The controller is configured to: load input-data from the register-file-memory into the first-register as the first-register-data; and load: (i) input-data from the register-file-memory, or (ii) the first-register-data from the first-register, into the second-register as the second-register-data.
    Type: Grant
    Filed: April 3, 2018
    Date of Patent: January 31, 2023
    Assignee: NXP B.V.
    Inventors: Jose de Jesus Pineda de Gyvez, Hamed Fatemi, Gonzalo Moro Pérez, Hendrik Corporaal
  • Patent number: 11561830
    Abstract: A system for allocation of resources and processing jobs within a distributed system includes a processor and a memory coupled to the processor. The memory includes at least one process and at least one resource allocator. The process is adapted for processing jobs within a distributed system which receives jobs to be processed. The resource allocator is communicably coupled with at least one process, and is adapted to generate one or more sub-processes within a limit of one or more resources allocated to the process for processing jobs.
    Type: Grant
    Filed: October 21, 2019
    Date of Patent: January 24, 2023
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Naganarasimha Ramesh Garla, Varun Saxena, Guilin Sun
  • Patent number: 11562247
    Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produced first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
    Type: Grant
    Filed: January 24, 2019
    Date of Patent: January 24, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Daniel Lo, Amar Phanishayee, Eric S. Chung, Yiren Zhao
  • Patent number: 11561767
    Abstract: The present disclosure advantageously provides a mixed precision computation (MPC) unit for executing one or more mixed-precision layers of an artificial neural network (ANN). The MPC unit includes a multiplier circuit configured to input a pair of operands and output a product, a first adder circuit coupled to the multiplier circuit, a second adder circuit, coupled to the first adder circuit, configured to input a pair of operands, an accumulator circuit, coupled to the multiplier circuit and the first adder circuit, configured to output an accumulated value, and a controller, coupled to the multiplier circuit, the first adder circuit, the second adder circuit and the accumulator circuit, configured to input a mode control signal. The controller has a plurality of operating modes including a high precision mode, a low precision add mode and a low precision multiply mode.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: January 24, 2023
    Assignee: Arm Limited
    Inventors: Dibakar Gope, Jesse Garrett Beu, Paul Nicholas Whatmough, Matthew Mattina
  • Patent number: 11556346
    Abstract: Methods and systems for allowing software components that operate at a specific exception level (e.g., EL-3 to EL-1, etc.) to repeatedly or continuously observe or evaluate the integrity of software components operating at a lower exception level (e.g., EL-2 to EL-0) to ensure that the software components have not been corrupted or compromised (e.g., subjected to malware, cyberattacks, etc.) include a computing device that identifies, by a component operating at a higher exception level (“HEL component”), at least one of a current vector base address (VBA), an exception raising instruction (ERI) address, or a control and system register value associated with a component operating at a lower exception level (“LEL component”). The computing device may perform a responsive action in response to determining that the current VBA, the ERT address, or control and system register value do not match the corresponding reference data.
    Type: Grant
    Filed: June 10, 2020
    Date of Patent: January 17, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Naresh Kumar Sharma, Saurabh Gorecha, Pravin Kumar
  • Patent number: 11550582
    Abstract: A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
    Type: Grant
    Filed: June 17, 2020
    Date of Patent: January 10, 2023
    Assignee: Intel Corporation
    Inventors: Kirk S. Yap, Gilbert M. Wolrich, James D. Guilford, Vinodh Gopal, Erdinc Ozturk, Sean M. Gulley, Wajdi K. Feghali, Martin G. Dixon
  • Patent number: 11544542
    Abstract: A computing device, comprising: a computing module, comprising one or more computing units; and a control module, comprising a computing control unit, and used for controlling shutdown of the computing unit of the computing module according to a determining condition. Also provided is a computing method. The computing device and method have the advantages of low power consumption and high flexibility, and can be combined with the upgrading mode of software, thereby further increasing the computing speed, reducing the computing amount, and reducing the computing power consumption of an accelerator.
    Type: Grant
    Filed: November 28, 2019
    Date of Patent: January 3, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
    Inventors: Zai Wang, Shengyuan Zhou, Zidong Du, Tianshi Chen
  • Patent number: 11544067
    Abstract: According to various embodiments, methods and systems are provided to accelerate artificial intelligence (AI) model training with advanced interconnect communication technologies and systematic zero-value compression over a distributed training system. According to an exemplary method, during each iteration of a Scatter-Reduce process performed on a cluster of processors arranged in a logical ring to train a neural network model, a processor receives a compressed data block from a prior processor in the logical ring, performs an operation on the received compressed data block and a compressed data block generated on the processor to obtain a calculated data block, and sends the calculated data block to a following processor in the logical ring. A compressed data block calculated from corresponding data blocks from the processors can be identified on each processor and distributed to each other processor and decompressed therein for use in the AI model training.
    Type: Grant
    Filed: October 12, 2019
    Date of Patent: January 3, 2023
    Assignees: BAIDU USA LLC, BAIDU.COM TIMES TECHNOLOGY (BEIJING) CO., LTD., KUNLUNXIN TECHNOLOGY (BEIJING) COMPANY LIMITED
    Inventors: Zhibiao Zhao, Jian Ouyang, Hefei Zhu, Qingshu Chen, Wei Qi
  • Patent number: 11537858
    Abstract: A computing device, comprising: a computing module, comprising one or more computing units; and a control module, comprising a computing control unit, and used for controlling shutdown of the computing unit of the computing module according to a determining condition. Also provided is a computing method. The computing device and method have the advantages of low power consumption and high flexibility, and can be combined with the upgrading mode of software, thereby further increasing the computing speed, reducing the computing amount, and reducing the computing power consumption of an accelerator.
    Type: Grant
    Filed: November 28, 2019
    Date of Patent: December 27, 2022
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
    Inventors: Tianshi Chen, Xuda Zhou, Shaoli Liu, Zidong Du
  • Patent number: 11531551
    Abstract: To preferentially execute an instruction with higher priority in a case of the CNC being unable to respond due to being an unresponsive timing, load on the bus or the like. A PLC device includes: a special instruction control unit that sets a priority degree indicating a degree of priority for executing predetermined processing to a special instruction for performing the predetermined processing in a control device that controls an industrial machine, and transmits the special instruction in which the priority degree is set to the control device; an instruction storage determining unit that determines whether or not to queue the special instruction according to an operation state of the control device; and an instruction storage unit that sequentially stores the special instruction received, on the basis of a determination result of the instruction storage determining unit.
    Type: Grant
    Filed: June 11, 2020
    Date of Patent: December 20, 2022
    Assignee: FANUC CORPORATION
    Inventor: Nao Onose
  • Patent number: 11531549
    Abstract: A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one present FP indicator, presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state, intermittently, power efficiency of the OoO processor is improved.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: December 20, 2022
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: David A. Carlson
  • Patent number: 11526356
    Abstract: An apparatus and method is provided, the apparatus comprising a processor pipeline to execute instructions, a cache structure to store information for reference by the processor pipeline when executing said instructions; and prefetch circuitry to issue prefetch requests to the cache structure to cause the cache structure to prefetch information into the cache structure in anticipation of a demand request for that information being issued to the cache structure by the processor pipeline. The processor pipeline is arranged to issue a trigger to the prefetch circuitry on detection of a given event that will result in a reduced level of demand requests being issued by the processor pipeline, and the prefetch circuitry is configured to control issuing of prefetch requests in dependence on reception of the trigger.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: December 13, 2022
    Assignee: Arm Limited
    Inventors: Lingzhe Cai, Krishnendra Nathella, Jaekyu Lee, Dam Sunwoo
  • Patent number: 11513838
    Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: November 29, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11513839
    Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: November 29, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer