Patents Examined by Michael J Metzger
  • Patent number: 11734013
    Abstract: An exception summary is provided for an invalid value detected during instruction execution. An indication that a value determined to be invalid was included in input data to a computation of one or more computations or in output data resulting from the one or more computations is obtained. The value is determined to be invalid due to one exception of a plurality of exceptions. Based on obtaining the indication that the value is determined to be invalid, a summary indicator is set. The summary indicator represents the plurality of exceptions collectively.
    Type: Grant
    Filed: June 17, 2021
    Date of Patent: August 22, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Laith M. AlBarakat, Jonathan D. Bradbury, Timothy Slegel, Cedric Lichtenau, Joachim von Buttlar
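The summary-indicator idea in patent 11734013 can be illustrated with a minimal Python sketch. It assumes two hypothetical exception kinds (NaN and infinity); the abstract does not enumerate the actual exceptions, and the function name `summarize_exceptions` is illustrative only.

```python
import math

# Hypothetical exception kinds, chosen for illustration only.
EXCEPTION_CHECKS = {
    "nan": math.isnan,
    "infinity": math.isinf,
}


def summarize_exceptions(values):
    """Return (summary_flag, per_exception_flags) for a sequence of floats.

    The summary flag is set when any individual exception fires, so a
    caller can test one bit instead of inspecting every exception.
    """
    detail = {name: False for name in EXCEPTION_CHECKS}
    for v in values:
        for name, check in EXCEPTION_CHECKS.items():
            if check(v):
                detail[name] = True
    summary = any(detail.values())
    return summary, detail
```

A caller would typically check the single summary flag first and consult the per-exception flags only when it is set.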
  • Patent number: 11726844
    Abstract: The present disclosure provides a processing device for running a generative adversarial network and a method for machine creation that applies the processing device. The processing device includes a memory configured to receive input data including random noise and reference data, and to store a discriminator neural network parameter and a generator neural network parameter. The processing device further includes a computation device configured to feed the random noise into a generator neural network to obtain a noise generation result, to input both the noise generation result and the reference data into a discriminator neural network to obtain a discrimination result, and to update the discriminator neural network parameter and the generator neural network parameter according to the discrimination result.
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: August 15, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
    Inventors: Tianshi Chen, Shuai Hu, Yifan Hao, Yufeng Gao
  • Patent number: 11727268
    Abstract: A computing device, comprising: a computing module, comprising one or more computing units; and a control module, comprising a computing control unit, and used for controlling shutdown of the computing unit of the computing module according to a determining condition. Also provided is a computing method. The computing device and method have the advantages of low power consumption and high flexibility, and can be combined with the upgrading mode of software, thereby further increasing the computing speed, reducing the computing amount, and reducing the computing power consumption of an accelerator.
    Type: Grant
    Filed: November 28, 2019
    Date of Patent: August 15, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
    Inventors: Zai Wang, Shengyuan Zhou, Shuai Hu, Tianshi Chen
  • Patent number: 11720405
    Abstract: A processor-implemented accelerator method includes: reading, from a memory, an instruction to be executed in an accelerator; reading, from the memory, input data based on the instruction; and performing, on the input data and a parameter value included in the instruction, an inference task corresponding to the instruction.
    Type: Grant
    Filed: January 11, 2021
    Date of Patent: August 8, 2023
    Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
    Inventors: Wookeun Jung, Jaejin Lee, Seung Wook Lee
  • Patent number: 11720353
    Abstract: The present disclosure provides a processing device and method. The device includes: an input/output module, a controller module, a computing module, and a storage module. The input/output module is configured to store and transmit input and output data; the controller module is configured to decode a computation instruction into a control signal that directs the other modules to perform operations; the computing module is configured to perform the four arithmetic operations, logical operations, shift operations, and complement operations on data; and the storage module is configured to temporarily store instructions and data. The present disclosure can execute a composite scalar instruction accurately and efficiently.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: August 8, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
    Inventors: Shaoli Liu, Yuzhe Luo, Qi Guo, Tianshi Chen
  • Patent number: 11720783
    Abstract: Aspects of a neural network operation device are described herein. The aspects may include a matrix element storage module configured to receive a first matrix that includes one or more first values, each of the first values being represented in a sequence that includes one or more bits. The matrix element storage module may be further configured to respectively store the one or more bits in one or more storage spaces in accordance with positions of the bits in the sequence. The aspects may further include a numeric operation module configured to calculate an intermediate result for each storage space based on one or more second values in a second matrix and an accumulation module configured to sum the intermediate results to generate an output value.
    Type: Grant
    Filed: October 21, 2019
    Date of Patent: August 8, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
    Inventors: Tianshi Chen, Yimin Zhuang, Qi Guo, Shaoli Liu, Yunji Chen
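The bit-position scheme in patent 11720783 can be sketched for a single output value as follows. This is a minimal illustration, not the patented design: it assumes the first operand holds unsigned integers of a fixed width, and it assumes each per-position intermediate result is weighted by a left shift before accumulation. The name `bit_plane_dot` and the `bits` parameter are hypothetical.

```python
def bit_plane_dot(first, second, bits=8):
    """Dot product computed one bit position of `first` at a time."""
    if len(first) != len(second):
        raise ValueError("operands must have the same length")
    if any(v < 0 or v >= (1 << bits) for v in first):
        raise ValueError("first operand must fit in `bits` unsigned bits")

    # One "storage space" per bit position: plane k holds bit k of every value.
    planes = [[(v >> k) & 1 for v in first] for k in range(bits)]

    intermediates = []
    for k, plane in enumerate(planes):
        # Sum the second-operand values selected by the set bits in this plane.
        partial = sum(s for bit, s in zip(plane, second) if bit)
        # Weight the partial sum by the bit position (assumed shift weighting).
        intermediates.append(partial << k)

    # Accumulation step: sum the per-plane intermediate results.
    return sum(intermediates)
```

In this form the inner step needs only additions and shifts, since every multiplication by a bit is a select-or-skip.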
  • Patent number: 11709681
    Abstract: A coprocessor such as a floating-point unit includes a pipeline that is partitioned into a first portion and a second portion. A controller is configured to provide control signals to the first portion and the second portion of the pipeline. A first physical distance traversed by control signals propagating from the controller to the first portion of the pipeline is shorter than a second physical distance traversed by control signals propagating from the controller to the second portion of the pipeline. A scheduler is configured to cause a physical register file to provide a first subset of bits of an instruction to the first portion at a first time. The physical register file provides a second subset of the bits of the instruction to the second portion at a second time subsequent to the first time.
    Type: Grant
    Filed: December 11, 2017
    Date of Patent: July 25, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jay Fleischman, Michael Estlick, Michael Christopher Sedmak, Erik Swanson, Sneha V. Desai
  • Patent number: 11675906
    Abstract: Infection by viruses and rootkits from data memory devices, data messages and data operations is rendered impossible by construction for the Simultaneous Multi-Processor (SiMulPro) cores, core modules, Programmable Execution Modules (PEM), PEM Arrays, STAR messaging protocol implementations, integrated circuits (referred to as chips herein), and systems composed of these components. Greatly improved energy efficiency is disclosed. A system implementation of an Application Specific Integrated Circuit (ASIC) communicating with a DRAM controller interacting with a DRAM array is presented with this resistance to virus and rootkit infection, and simultaneously capable of 1 Teraflop (Tflop) FP16, 1 Tflop FP32 and 1 Tflop FP64 performance while accessing 1 Tbyte of DRAM with a power budget comparable to today's desktop or notebook computers accessing 8 Gbytes of DRAM.
    Type: Grant
    Filed: November 12, 2019
    Date of Patent: June 13, 2023
    Assignee: QSIGMA, INC.
    Inventor: Earle Jennings
  • Patent number: 11669418
    Abstract: Apparatus adapted for exascale computers are disclosed. The apparatus includes, but is not limited to at least one of: a system, data processor chip (DPC), Landing module (LM), chips including LM, anticipator chips, simultaneous multi-processor (SMP) cores, SMP channel (SMPC) cores, channels, bundles of channels, printed circuit boards (PCB) including bundles, floating point adders, accumulation managers, QUAD Link Anticipating Memory (QUADLAM), communication networks extended by coupling links of QUADLAM, log2 calculators, exp2 calculators, logALU, Non-Linear Accelerator (NLA), and stairways. Methods of algorithm and program development, verification and debugging are also disclosed. Collectively, embodiments of these elements disclose a class of supercomputers that obsolete Amdahl's Law, providing cabinets of petaflop performance and systems that may meet or exceed an exaflop of performance for Block LU Decomposition (Linpack).
    Type: Grant
    Filed: November 5, 2019
    Date of Patent: June 6, 2023
    Assignee: QSIGMA, INC.
    Inventors: Earle Jennings, George Landers
  • Patent number: 11663011
    Abstract: Very long instruction word (VLIW) instruction processing using a reduced-width processor is disclosed. In a particular embodiment, a VLIW processor includes a control circuit configured to receive a VLIW packet that includes a first number of instructions and to distribute the instructions to a second number of instruction execution paths. The first number is greater than the second number. The VLIW processor also includes physical registers configured to store results of executing the instructions and a register renaming circuit that is coupled to the control circuit.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: May 30, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Peter Sassone, Christopher Koob, Suresh Kumar Venkumahanti
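One way to picture the reduced-width idea in patent 11663011 is to split a wide packet into issue groups no larger than the number of execution paths. The sketch below is an assumption-laden illustration: the abstract does not state that distribution happens in sequential groups, and the sketch ignores dependencies and register renaming entirely. The name `distribute_packet` is hypothetical.

```python
def distribute_packet(packet, num_paths):
    """Split a packet of instructions into groups of at most `num_paths`.

    Each group stands for the instructions handed to the execution paths
    together; a packet wider than the machine needs more than one group.
    """
    if num_paths <= 0:
        raise ValueError("num_paths must be positive")
    return [packet[i:i + num_paths] for i in range(0, len(packet), num_paths)]
```

For a four-instruction packet on a two-path machine this yields two groups of two instructions each.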
  • Patent number: 11663107
    Abstract: A computer implemented method, performed in a data processing system comprising a performance monitoring unit. The method comprises receiving a set of computer-readable instructions to be executed by the data processing system to implement at least a portion of a neural network, wherein one or more of the instructions is labeled with one or more performance monitoring labels based upon one or more features of the neural network. The method further comprises configuring the performance monitoring unit to count one or more events occurring in one or more components of the data processing system based on the one or more performance monitoring labels.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: May 30, 2023
    Assignee: ARM LIMITED
    Inventors: Elliot Maurice Simon Rosemarine, Rachel Jean Trimble
  • Patent number: 11657261
    Abstract: A neural processing device is provided. The neural processing device comprises a plurality of neural processors, a shared memory shared by the plurality of neural processors, a plurality of semaphore memories, and global interconnection. The plurality of neural processors generates a plurality of L3 sync targets, respectively. Each semaphore memory is associated with a respective one of the plurality of neural processors, and the plurality of semaphore memories receive and store the plurality of L3 sync targets, respectively. Synchronization of the plurality of neural processors is performed according to the plurality of L3 sync targets. The global interconnection connects the plurality of neural processors with the shared memory, and comprises an L3 sync channel through which an L3 synchronization signal corresponding to at least one L3 sync target is transmitted.
    Type: Grant
    Filed: April 29, 2022
    Date of Patent: May 23, 2023
    Assignee: Rebellions Inc.
    Inventors: Jinwook Oh, Jinseok Kim, Kyeongryeol Bong, Wongyu Shin, Chang-Hyo Yu
  • Patent number: 11656876
    Abstract: Techniques are disclosed relating to an apparatus, including a data storage circuit having a plurality of entries, and a load-store pipeline configured to allocate an entry in the data storage circuit in response to a determination that a first instruction includes an access to an external memory circuit. The apparatus further includes an execution pipeline configured to make a determination, while performing a second instruction and using the entry in the data storage circuit, that the second instruction uses a result of the first instruction, and cease performance of the second instruction in response to the determination.
    Type: Grant
    Filed: February 10, 2021
    Date of Patent: May 23, 2023
    Assignee: Cadence Design Systems, Inc.
    Inventors: Robert T. Golla, Deepak Panwar
  • Patent number: 11630997
    Abstract: A processor-implemented data processing method includes encoding a plurality of weights of a filter of a neural network using an inverted two's complement fixed-point format; generating weight data based on values of the encoded weights corresponding to same filter positions of a plurality of filters; and performing an operation on the weight data and input activation data using a bit-serial scheme to control when to perform an activation function with respect to the weight data and input activation data.
    Type: Grant
    Filed: January 23, 2019
    Date of Patent: April 18, 2023
    Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
    Inventors: Seungwon Lee, Dongwoo Lee, Kiyoung Choi, Sungbum Kang
  • Patent number: 11630991
    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations on two tensors of different sizes and ranks. The planar engine circuit may perform a broadcasting operation that duplicates one or more values across one or more channels so that a smaller tensor matches the size of the larger tensor.
    Type: Grant
    Filed: February 4, 2020
    Date of Patent: April 18, 2023
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
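The elementwise mode with broadcasting described for patent 11630991 can be sketched in plain Python. This is an illustration under stated assumptions: tensors are modeled as lists of channels (each channel a list of numbers), and only a single-channel operand is broadcast. The names `broadcast_channels` and `elementwise` are hypothetical.

```python
import operator


def broadcast_channels(small, num_channels):
    """Duplicate a single-channel tensor across `num_channels` channels."""
    if len(small) == num_channels:
        return [list(ch) for ch in small]
    if len(small) != 1:
        raise ValueError("can only broadcast a single-channel tensor")
    return [list(small[0]) for _ in range(num_channels)]


def elementwise(a, b, op=operator.add):
    """Combine two tensors element by element, broadcasting `b` if needed."""
    b_full = broadcast_channels(b, len(a))
    if any(len(ch_a) != len(ch_b) for ch_a, ch_b in zip(a, b_full)):
        raise ValueError("channel widths do not match")
    return [
        [op(x, y) for x, y in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b_full)
    ]
```

Passing `operator.mul` instead of the default gives an elementwise product with the same broadcasting behavior.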
  • Patent number: 11630667
    Abstract: A processor includes a plurality of vector sub-processors (VSPs) and a plurality of memory banks dedicated to respective VSPs. A first memory bank corresponding to a first VSP includes a first plurality of high vector general purpose register (VGPR) banks and a first plurality of low VGPR banks corresponding to the first plurality of high VGPR banks. The first memory bank further includes a plurality of operand gathering components that store operands from respective high VGPR banks and low VGPR banks. The operand gathering components are assigned to individual threads while the threads are executed by the first VSP.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: April 18, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jiasheng Chen, Bin He, Jian Huang, Michael Mantor
  • Patent number: 11625250
    Abstract: The disclosed systems, structures, and methods are directed to parallel processing of tasks in a multiple thread computing system. Execution of an instruction sequence of a thread allocated to a first task proceeds until an exit point of the instruction sequence is reached. The execution of the instruction sequence of the thread for the first task is terminated at a convergence point of the instruction sequence. The thread is selectively reallocated to process a second task.
    Type: Grant
    Filed: January 29, 2021
    Date of Patent: April 11, 2023
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Ahmed Mohammed ElShafiey Mohammed Eltantawy, Yan Luo, Tyler Bryce Nowicki
  • Patent number: 11625249
    Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: April 11, 2023
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Jagadish B. Kotra, John Kalamatianos
  • Patent number: 11614936
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify an M by N destination matrix, a first source identifier to identify an M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (m, k) of the specified first source matrix by a corresponding nibble of a doubleword element (k, n) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: March 28, 2023
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
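The per-element flow in patent 11614936 can be sketched in Python. The abstract does not specify signedness, so this sketch assumes unsigned nibbles and unsigned 32-bit saturation of the accumulator; those choices, and the names `nibbles` and `tile_nibble_dot`, are assumptions made for illustration.

```python
U32_MAX = 0xFFFFFFFF  # assumed saturation limit (unsigned doubleword)


def nibbles(dword):
    """Split a 32-bit value into its eight 4-bit nibbles, low nibble first."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def tile_nibble_dot(dst, src1, src2):
    """Accumulate nibble dot products of src1 (M x K) and src2 (K x N) into dst."""
    m_rows, k_dim, n_cols = len(src1), len(src2), len(src2[0])
    for m in range(m_rows):
        for n in range(n_cols):
            acc = dst[m][n]  # previous contents of the destination element
            for k in range(k_dim):
                # Eight products: nibble i of src1[m][k] times nibble i of src2[k][n].
                products = [
                    a * b
                    for a, b in zip(nibbles(src1[m][k]), nibbles(src2[k][n]))
                ]
                # Accumulate, then clamp to the assumed unsigned 32-bit range.
                acc = min(acc + sum(products), U32_MAX)
            dst[m][n] = acc
    return dst
```

The destination is updated in place and also returned, so the previous contents act as the starting accumulator for each element.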
  • Patent number: 11609786
    Abstract: The embodiments provide a register file device that increases energy efficiency by using a spin transfer torque-random access memory for the register file of a general-purpose graphics processing device, and that hierarchically uses a register cache and a buffer together with the spin transfer torque-random access memory to minimize leakage current, reduce write operation power, and address the write delay.
    Type: Grant
    Filed: February 5, 2020
    Date of Patent: March 21, 2023
    Assignee: INDUSTRY-ACADEMIC COOPERATION FOUNDATION, YONSEI UNIVERSITY
    Inventors: Won Woo Ro, Jun Hyun Park