Patents Examined by Michael J Metzger
-
Patent number: 11734013
Abstract: An exception summary is provided for an invalid value detected during instruction execution. An indication that a value determined to be invalid was included in input data to a computation of one or more computations, or in output data resulting from the one or more computations, is obtained. The value is determined to be invalid due to one exception of a plurality of exceptions. Based on obtaining the indication that the value is determined to be invalid, a summary indicator is set. The summary indicator represents the plurality of exceptions collectively.
Type: Grant
Filed: June 17, 2021
Date of Patent: August 22, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Laith M. AlBarakat, Jonathan D. Bradbury, Timothy Slegel, Cedric Lichtenau, Joachim von Buttlar
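The mechanism this abstract describes, collapsing many distinct exceptions into one collectively representative indicator, can be sketched in software as follows. All names here are illustrative, not taken from the patent:

```python
from enum import Flag, auto

class FpException(Flag):
    """Illustrative set of exceptions; the patent's plurality may differ."""
    NONE = 0
    INVALID_OP = auto()
    OVERFLOW = auto()
    UNDERFLOW = auto()
    INEXACT = auto()

class ExceptionSummary:
    """Records individual exceptions but exposes a single summary indicator."""
    def __init__(self):
        self.raised = FpException.NONE

    def record(self, exc: FpException):
        # An invalid value was detected due to one exception of the plurality.
        self.raised |= exc

    @property
    def summary(self) -> bool:
        # One indicator representing the plurality of exceptions collectively.
        return self.raised != FpException.NONE
```

A consumer of the computation can then test one summary bit instead of polling each exception flag individually.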
-
Patent number: 11726844
Abstract: The present disclosure provides a processing device for implementing a generative adversarial network, and a method for machine creation that applies the processing device. The processing device includes a memory configured to receive input data including a random noise and reference data, and to store a discriminator neural network parameter and a generator neural network parameter. The processing device further includes a computation device configured to feed the random noise input data into a generator neural network and perform an operation to obtain a noise generation result, to input both the noise generation result and the reference data into a discriminator neural network and perform an operation to obtain a discrimination result, and to update the discriminator neural network parameter and the generator neural network parameter according to the discrimination result.
Type: Grant
Filed: November 25, 2019
Date of Patent: August 15, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
Inventors: Tianshi Chen, Shuai Hu, Yifan Hao, Yufeng Gao
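The data flow the abstract describes (noise into the generator, the generation result plus reference data into the discriminator, then both parameter sets updated from the discrimination result) can be sketched with a toy scalar model. The update rule below is a stand-in for real backpropagation, and every name is illustrative:

```python
import math
import random

def generator(noise, g_param):
    # Noise generation result from the generator neural network (toy: linear).
    return g_param * noise

def discriminator(x, d_param):
    # Discrimination result in (0, 1) (toy: logistic score).
    return 1.0 / (1.0 + math.exp(-d_param * x))

def train_step(g_param, d_param, reference, lr=0.1):
    noise = random.gauss(0.0, 1.0)
    fake = generator(noise, g_param)
    d_fake = discriminator(fake, d_param)
    d_real = discriminator(reference, d_param)
    # Stand-in updates: push the discriminator's real score up and fake score
    # down, and push the generator toward fooling the discriminator.
    d_param += lr * ((1.0 - d_real) - d_fake)
    g_param += lr * (1.0 - d_fake) * noise
    return g_param, d_param
```

The patent's contribution is a hardware device realizing this loop; the sketch only shows the order in which data and parameters move through it.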
-
Patent number: 11727268
Abstract: A computing device, comprising: a computing module comprising one or more computing units; and a control module comprising a computing control unit, configured to control shutdown of the computing units of the computing module according to a determining condition. A computing method is also provided. The computing device and method offer low power consumption and high flexibility, and can be combined with software upgrade mechanisms, thereby further increasing computing speed, reducing the amount of computation, and reducing the computing power consumption of an accelerator.
Type: Grant
Filed: November 28, 2019
Date of Patent: August 15, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
Inventors: Zai Wang, Shengyuan Zhou, Shuai Hu, Tianshi Chen
-
Patent number: 11720405
Abstract: A processor-implemented accelerator method includes: reading, from a memory, an instruction to be executed in an accelerator; reading, from the memory, input data based on the instruction; and performing, on the input data and a parameter value included in the instruction, an inference task corresponding to the instruction.
Type: Grant
Filed: January 11, 2021
Date of Patent: August 8, 2023
Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
Inventors: Wookeun Jung, Jaejin Lee, Seung Wook Lee
-
Patent number: 11720353
Abstract: The present disclosure provides a processing device and method. The device includes: an input/output module, a controller module, a computing module, and a storage module. The input/output module is configured to store and transmit input and output data; the controller module is configured to decode a computation instruction into a control signal that controls the other modules; the computing module is configured to perform the four arithmetic operations, logical operations, shift operations, and complement operations on data; and the storage module is configured to temporarily store instructions and data. The present disclosure can execute a composite scalar instruction accurately and efficiently.
Type: Grant
Filed: November 27, 2019
Date of Patent: August 8, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
Inventors: Shaoli Liu, Yuzhe Luo, Qi Guo, Tianshi Chen
-
Patent number: 11720783
Abstract: Aspects of a neural network operation device are described herein. The aspects may include a matrix element storage module configured to receive a first matrix that includes one or more first values, each of the first values being represented in a sequence that includes one or more bits. The matrix element storage module may be further configured to respectively store the one or more bits in one or more storage spaces in accordance with positions of the bits in the sequence. The aspects may further include a numeric operation module configured to calculate an intermediate result for each storage space based on one or more second values in a second matrix, and an accumulation module configured to sum the intermediate results to generate an output value.
Type: Grant
Filed: October 21, 2019
Date of Patent: August 8, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
Inventors: Tianshi Chen, Yimin Zhuang, Qi Guo, Shaoli Liu, Yunji Chen
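The bit-position trick this abstract describes can be illustrated with a dot product: group the first-matrix values by bit position (one "storage space" per position), compute an intermediate result per bit plane against the second-matrix values, then sum the intermediates with their bit weights. A minimal sketch, with illustrative names:

```python
def bitplane_dot(a_row, b_col, bits=8):
    """Dot product computed bit plane by bit plane, as the device's modules
    are described: per-plane intermediate results, then weighted accumulation."""
    total = 0
    for pos in range(bits):                       # one "storage space" per bit position
        # Intermediate result for this plane: sum b where a's bit at pos is set.
        plane = sum(b for a, b in zip(a_row, b_col) if (a >> pos) & 1)
        total += plane << pos                     # accumulate with the bit's weight
    return total
```

The per-plane work needs only additions, which is why hardware of this shape can avoid full multipliers.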
-
Patent number: 11709681
Abstract: A coprocessor such as a floating-point unit includes a pipeline that is partitioned into a first portion and a second portion. A controller is configured to provide control signals to the first portion and the second portion of the pipeline. A first physical distance traversed by control signals propagating from the controller to the first portion of the pipeline is shorter than a second physical distance traversed by control signals propagating from the controller to the second portion of the pipeline. A scheduler is configured to cause a physical register file to provide a first subset of bits of an instruction to the first portion at a first time. The physical register file provides a second subset of the bits of the instruction to the second portion at a second time subsequent to the first time.
Type: Grant
Filed: December 11, 2017
Date of Patent: July 25, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Jay Fleischman, Michael Estlick, Michael Christopher Sedmak, Erik Swanson, Sneha V. Desai
-
Patent number: 11675906
Abstract: Infection by viruses and rootkits from data memory devices, data messages, and data operations is rendered impossible by construction for the Simultaneous Multi-Processor (SiMulPro) cores, core modules, Programmable Execution Modules (PEM), PEM arrays, STAR messaging protocol implementations, integrated circuits (referred to as chips herein), and systems composed of these components. Greatly improved energy efficiency is also disclosed. A system implementation of an Application Specific Integrated Circuit (ASIC) communicating with a DRAM controller interacting with a DRAM array is presented with this resistance to virus and rootkit infection, and is simultaneously capable of 1 Tflop FP16, 1 Tflop FP32, and 1 Tflop FP64 performance while accessing 1 Tbyte of DRAM, with a power budget comparable to today's desktop or notebook computers accessing 8 Gbytes of DRAM.
Type: Grant
Filed: November 12, 2019
Date of Patent: June 13, 2023
Assignee: QSIGMA, INC.
Inventor: Earle Jennings
-
Patent number: 11669418
Abstract: Apparatus adapted for exascale computers are disclosed. The apparatus includes, but is not limited to, at least one of: a system, data processor chip (DPC), Landing Module (LM), chips including LMs, anticipator chips, simultaneous multi-processor (SMP) cores, SMP channel (SMPC) cores, channels, bundles of channels, printed circuit boards (PCBs) including bundles, floating-point adders, accumulation managers, QUAD Link Anticipating Memory (QUADLAM), communication networks extended by coupling links of QUADLAM, log2 calculators, exp2 calculators, logALU, Non-Linear Accelerator (NLA), and stairways. Methods of algorithm and program development, verification, and debugging are also disclosed. Collectively, embodiments of these elements disclose a class of supercomputers that render Amdahl's Law obsolete, providing cabinets of petaflop performance and systems that may meet or exceed an exaflop of performance for Block LU Decomposition (Linpack).
Type: Grant
Filed: November 5, 2019
Date of Patent: June 6, 2023
Assignee: QSIGMA, INC.
Inventors: Earle Jennings, George Landers
-
Patent number: 11663011
Abstract: Very long instruction word (VLIW) instruction processing using a reduced-width processor is disclosed. In a particular embodiment, a VLIW processor includes a control circuit configured to receive a VLIW packet that includes a first number of instructions and to distribute the instructions to a second number of instruction execution paths. The first number is greater than the second number. The VLIW processor also includes physical registers configured to store results of executing the instructions and a register renaming circuit that is coupled to the control circuit.
Type: Grant
Filed: July 7, 2020
Date of Patent: May 30, 2023
Assignee: Qualcomm Incorporated
Inventors: Peter Sassone, Christopher Koob, Suresh Kumar Venkumahanti
-
Patent number: 11663107
Abstract: A computer-implemented method, performed in a data processing system comprising a performance monitoring unit. The method comprises receiving a set of computer-readable instructions to be executed by the data processing system to implement at least a portion of a neural network, wherein one or more of the instructions is labeled with one or more performance monitoring labels based upon one or more features of the neural network. The method further comprises configuring the performance monitoring unit to count one or more events occurring in one or more components of the data processing system based on the one or more performance monitoring labels.
Type: Grant
Filed: February 21, 2020
Date of Patent: May 30, 2023
Assignee: ARM LIMITED
Inventors: Elliot Maurice Simon Rosemarine, Rachel Jean Trimble
-
Patent number: 11657261
Abstract: A neural processing device is provided. The neural processing device comprises a plurality of neural processors, a shared memory shared by the plurality of neural processors, a plurality of semaphore memories, and a global interconnection. The plurality of neural processors generate a plurality of L3 sync targets, respectively. Each semaphore memory is associated with a respective one of the plurality of neural processors, and the plurality of semaphore memories receive and store the plurality of L3 sync targets, respectively. Synchronization of the plurality of neural processors is performed according to the plurality of L3 sync targets. The global interconnection connects the plurality of neural processors with the shared memory, and comprises an L3 sync channel through which an L3 synchronization signal corresponding to at least one L3 sync target is transmitted.
Type: Grant
Filed: April 29, 2022
Date of Patent: May 23, 2023
Assignee: Rebellions Inc.
Inventors: Jinwook Oh, Jinseok Kim, Kyeongryeol Bong, Wongyu Shin, Chang-Hyo Yu
-
Patent number: 11656876
Abstract: Techniques are disclosed relating to an apparatus, including a data storage circuit having a plurality of entries, and a load-store pipeline configured to allocate an entry in the data storage circuit in response to a determination that a first instruction includes an access to an external memory circuit. The apparatus further includes an execution pipeline configured to make a determination, while performing a second instruction and using the entry in the data storage circuit, that the second instruction uses a result of the first instruction, and cease performance of the second instruction in response to the determination.
Type: Grant
Filed: February 10, 2021
Date of Patent: May 23, 2023
Assignee: Cadence Design Systems, Inc.
Inventors: Robert T. Golla, Deepak Panwar
-
Patent number: 11630997
Abstract: A processor-implemented data processing method includes encoding a plurality of weights of a filter of a neural network using an inverted two's complement fixed-point format; generating weight data based on values of the encoded weights corresponding to same filter positions of a plurality of filters; and performing an operation on the weight data and input activation data using a bit-serial scheme to control when to perform an activation function with respect to the weight data and input activation data.
Type: Grant
Filed: January 23, 2019
Date of Patent: April 18, 2023
Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
Inventors: Seungwon Lee, Dongwoo Lee, Kiyoung Choi, Sungbum Kang
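The abstract's "inverted two's complement fixed-point format" can be sketched as follows; the exact format is not spelled out here, so this sketch assumes it means the bitwise complement of the ordinary two's complement representation, and all names are illustrative:

```python
def twos_complement(value, bits=8):
    """Ordinary two's complement bit pattern of a signed integer."""
    return value & ((1 << bits) - 1)

def inverted_twos_complement(value, bits=8):
    """Assumed encoding: bitwise complement of the two's complement pattern."""
    return (~twos_complement(value, bits)) & ((1 << bits) - 1)

def decode(encoded, bits=8):
    """Round-trip check: undo the inversion, then sign-extend."""
    raw = (~encoded) & ((1 << bits) - 1)
    if raw & (1 << (bits - 1)):
        raw -= 1 << bits
    return raw
```

A round trip such as `decode(inverted_twos_complement(w))` recovers the original weight, which is the property any such encoding must satisfy before a bit-serial scheme can consume it.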
-
Patent number: 11630991
Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations for two tensors that differ in size and rank. The planar engine circuit may perform a broadcasting operation to duplicate one or more values across one or more channels so that a smaller tensor matches the size of the larger tensor.
Type: Grant
Filed: February 4, 2020
Date of Patent: April 18, 2023
Assignee: Apple Inc.
Inventors: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
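The broadcasting step the abstract mentions, duplicating a smaller tensor's values so it matches the larger tensor before the elementwise combine, can be sketched in plain Python. Shapes and names are illustrative, not the hardware's:

```python
def broadcast_rows(small, n_rows):
    # Duplicate a 1-row tensor across n_rows "channels".
    return [list(small[0]) for _ in range(n_rows)]

def elementwise_add(a, b):
    """Elementwise combine of two 2-D tensors; broadcast b up if ranks differ."""
    if len(b) == 1 and len(a) > 1:
        b = broadcast_rows(b, len(a))
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

This is the same idea NumPy calls broadcasting, here done explicitly so the duplication step is visible.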
-
Patent number: 11630667
Abstract: A processor includes a plurality of vector sub-processors (VSPs) and a plurality of memory banks dedicated to respective VSPs. A first memory bank corresponding to a first VSP includes a first plurality of high vector general purpose register (VGPR) banks and a first plurality of low VGPR banks corresponding to the first plurality of high VGPR banks. The first memory bank further includes a plurality of operand gathering components that store operands from respective high VGPR banks and low VGPR banks. The operand gathering components are assigned to individual threads while the threads are executed by the first VSP.
Type: Grant
Filed: November 27, 2019
Date of Patent: April 18, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Jiasheng Chen, Bin He, Jian Huang, Michael Mantor
-
Patent number: 11625250
Abstract: The disclosed systems, structures, and methods are directed to parallel processing of tasks in a multiple thread computing system. Execution of an instruction sequence of a thread allocated to a first task proceeds until an exit point of the instruction sequence is reached. The execution of the instruction sequence of the thread for the first task is terminated at a convergence point of the instruction sequence. The thread is selectively reallocated to process a second task.
Type: Grant
Filed: January 29, 2021
Date of Patent: April 11, 2023
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Ahmed Mohammed ElShafiey Mohammed Eltantawy, Yan Luo, Tyler Bryce Nowicki
-
Patent number: 11625249
Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded to a remote device is processed, and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
Type: Grant
Filed: December 29, 2020
Date of Patent: April 11, 2023
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Jagadish B. Kotra, John Kalamatianos
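The per-address locking the abstract describes can be sketched as a tiny state machine: lock the address when the offload instruction is processed, release it when the cache operation targeting that address completes, and hold back younger non-offload accesses in between. Names are illustrative:

```python
locked_addrs = set()

def issue_offload(addr):
    # Lock placed on the memory address when the offload instruction is processed.
    locked_addrs.add(addr)

def cache_op_complete(addr):
    # Lock removed when the cache operation targeting the address completes.
    locked_addrs.discard(addr)

def may_execute_non_offload(addr):
    # Younger non-offload instructions touching a locked address must wait.
    return addr not in locked_addrs
```

The real mechanism lives in hardware ordering logic; the sketch only shows why the lock's lifetime (issue to cache-operation completion) is what preserves ordering.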
-
Patent number: 11614936
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
Type: Grant
Filed: March 29, 2021
Date of Patent: March 28, 2023
Assignee: Intel Corporation
Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
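The per-element flow described here, eight nibble products per doubleword pair, accumulated into the previous contents with saturation, can be sketched as follows. This treats nibbles as unsigned; the real instruction's signedness and saturation bounds may differ, and the names are illustrative:

```python
INT32_MAX, INT32_MIN = 2**31 - 1, -2**31

def nibbles(dword):
    # Eight 4-bit nibbles of a 32-bit doubleword, least significant first.
    return [(dword >> (4 * i)) & 0xF for i in range(8)]

def dot_nibbles_saturate(acc, a_dword, b_dword):
    """Multiply corresponding nibbles (eight products), accumulate into the
    previous contents, and saturate to the int32 range."""
    total = acc + sum(x * y for x, y in zip(nibbles(a_dword), nibbles(b_dword)))
    return max(INT32_MIN, min(INT32_MAX, total))
```

Running this K times per destination element (m, n) reproduces the instruction's overall dataflow in scalar form.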
-
Patent number: 11609786
Abstract: The embodiments provide a register file device that increases energy efficiency by using a spin-transfer-torque random access memory for the register file of a general-purpose graphics processing device, and that hierarchically uses a register cache and a buffer together with the spin-transfer-torque random access memory to minimize leakage current, reduce write operation power, and mitigate write delay.
Type: Grant
Filed: February 5, 2020
Date of Patent: March 21, 2023
Assignee: INDUSTRY-ACADEMIC COOPERATION FOUNDATION, YONSEI UNIVERSITY
Inventors: Won Woo Ro, Jun Hyun Park