Patents Examined by Michael J Metzger
-
Patent number: 11734013
Abstract: An exception summary is provided for an invalid value detected during instruction execution. An indication that a value determined to be invalid was included in input data to a computation of one or more computations, or in output data resulting from the one or more computations, is obtained. The value is determined to be invalid due to one exception of a plurality of exceptions. Based on obtaining the indication that the value is determined to be invalid, a summary indicator is set. The summary indicator represents the plurality of exceptions collectively.
Type: Grant
Filed: June 17, 2021
Date of Patent: August 22, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Laith M. AlBarakat, Jonathan D. Bradbury, Timothy Slegel, Cedric Lichtenau, Joachim von Buttlar
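The mechanism this abstract describes, collapsing many distinct exceptions into one collectively representative indicator, can be sketched in software as follows. All names here are illustrative, not taken from the patent:

```python
from enum import Flag, auto

class FpException(Flag):
    """Illustrative set of exceptions; the patent's plurality may differ."""
    NONE = 0
    INVALID_OP = auto()
    OVERFLOW = auto()
    UNDERFLOW = auto()
    INEXACT = auto()

class ExceptionSummary:
    """Records individual exceptions but exposes a single summary indicator."""
    def __init__(self):
        self.raised = FpException.NONE

    def record(self, exc: FpException):
        # An invalid value was detected due to one exception of the plurality.
        self.raised |= exc

    @property
    def summary(self) -> bool:
        # One indicator representing the plurality of exceptions collectively.
        return self.raised != FpException.NONE
```

A consumer of the computation can then test one summary bit instead of polling each exception flag individually.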
-
Patent number: 11726844
Abstract: The present disclosure provides a processing device for implementing a generative adversarial network, and a method for machine creation that applies the processing device. The processing device includes a memory configured to receive input data including a random noise and reference data, and to store a discriminator neural network parameter and a generator neural network parameter. The processing device further includes a computation device configured to feed the random noise input data into a generator neural network and perform an operation to obtain a noise generation result, to input both the noise generation result and the reference data into a discriminator neural network and perform an operation to obtain a discrimination result, and to update the discriminator neural network parameter and the generator neural network parameter according to the discrimination result.
Type: Grant
Filed: November 25, 2019
Date of Patent: August 15, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
Inventors: Tianshi Chen, Shuai Hu, Yifan Hao, Yufeng Gao
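The data flow the abstract describes (noise into the generator, the generation result plus reference data into the discriminator, then both parameter sets updated from the discrimination result) can be sketched with a toy scalar model. The update rule below is a stand-in for real backpropagation, and every name is illustrative:

```python
import math
import random

def generator(noise, g_param):
    # Noise generation result from the generator neural network (toy: linear).
    return g_param * noise

def discriminator(x, d_param):
    # Discrimination result in (0, 1) (toy: logistic score).
    return 1.0 / (1.0 + math.exp(-d_param * x))

def train_step(g_param, d_param, reference, lr=0.1):
    noise = random.gauss(0.0, 1.0)
    fake = generator(noise, g_param)
    d_fake = discriminator(fake, d_param)
    d_real = discriminator(reference, d_param)
    # Stand-in updates: push the discriminator's real score up and fake score
    # down, and push the generator toward fooling the discriminator.
    d_param += lr * ((1.0 - d_real) - d_fake)
    g_param += lr * (1.0 - d_fake) * noise
    return g_param, d_param
```

The patent's contribution is a hardware device realizing this loop; the sketch only shows the order in which data and parameters move through it.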
-
Patent number: 11727268
Abstract: A computing device, comprising: a computing module comprising one or more computing units; and a control module comprising a computing control unit, configured to control shutdown of the computing units of the computing module according to a determining condition. A computing method is also provided. The computing device and method offer low power consumption and high flexibility, and can be combined with software upgrade mechanisms, thereby further increasing computing speed, reducing the amount of computation, and reducing the computing power consumption of an accelerator.
Type: Grant
Filed: November 28, 2019
Date of Patent: August 15, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
Inventors: Zai Wang, Shengyuan Zhou, Shuai Hu, Tianshi Chen
-
Patent number: 11720405
Abstract: A processor-implemented accelerator method includes: reading, from a memory, an instruction to be executed in an accelerator; reading, from the memory, input data based on the instruction; and performing, on the input data and a parameter value included in the instruction, an inference task corresponding to the instruction.
Type: Grant
Filed: January 11, 2021
Date of Patent: August 8, 2023
Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
Inventors: Wookeun Jung, Jaejin Lee, Seung Wook Lee
-
Patent number: 11720353
Abstract: The present disclosure provides a processing device and method. The device includes: an input/output module, a controller module, a computing module, and a storage module. The input/output module is configured to store and transmit input and output data; the controller module is configured to decode a computation instruction into a control signal that controls the other modules; the computing module is configured to perform the four arithmetic operations, logical operations, shift operations, and complement operations on data; and the storage module is configured to temporarily store instructions and data. The present disclosure can execute a composite scalar instruction accurately and efficiently.
Type: Grant
Filed: November 27, 2019
Date of Patent: August 8, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
Inventors: Shaoli Liu, Yuzhe Luo, Qi Guo, Tianshi Chen
-
Patent number: 11720783
Abstract: Aspects of a neural network operation device are described herein. The aspects may include a matrix element storage module configured to receive a first matrix that includes one or more first values, each of the first values being represented in a sequence that includes one or more bits. The matrix element storage module may be further configured to respectively store the one or more bits in one or more storage spaces in accordance with positions of the bits in the sequence. The aspects may further include a numeric operation module configured to calculate an intermediate result for each storage space based on one or more second values in a second matrix, and an accumulation module configured to sum the intermediate results to generate an output value.
Type: Grant
Filed: October 21, 2019
Date of Patent: August 8, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
Inventors: Tianshi Chen, Yimin Zhuang, Qi Guo, Shaoli Liu, Yunji Chen
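The bit-position trick this abstract describes can be illustrated with a dot product: group the first-matrix values by bit position (one "storage space" per position), compute an intermediate result per bit plane against the second-matrix values, then sum the intermediates with their bit weights. A minimal sketch, with illustrative names:

```python
def bitplane_dot(a_row, b_col, bits=8):
    """Dot product computed bit plane by bit plane, as the device's modules
    are described: per-plane intermediate results, then weighted accumulation."""
    total = 0
    for pos in range(bits):                       # one "storage space" per bit position
        # Intermediate result for this plane: sum b where a's bit at pos is set.
        plane = sum(b for a, b in zip(a_row, b_col) if (a >> pos) & 1)
        total += plane << pos                     # accumulate with the bit's weight
    return total
```

The per-plane work needs only additions, which is why hardware of this shape can avoid full multipliers.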
-
Patent number: 11709681
Abstract: A coprocessor such as a floating-point unit includes a pipeline that is partitioned into a first portion and a second portion. A controller is configured to provide control signals to the first portion and the second portion of the pipeline. A first physical distance traversed by control signals propagating from the controller to the first portion of the pipeline is shorter than a second physical distance traversed by control signals propagating from the controller to the second portion of the pipeline. A scheduler is configured to cause a physical register file to provide a first subset of bits of an instruction to the first portion at a first time. The physical register file provides a second subset of the bits of the instruction to the second portion at a second time subsequent to the first time.
Type: Grant
Filed: December 11, 2017
Date of Patent: July 25, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Jay Fleischman, Michael Estlick, Michael Christopher Sedmak, Erik Swanson, Sneha V. Desai
-
Patent number: 11675906
Abstract: Infection by viruses and rootkits from data memory devices, data messages, and data operations is rendered impossible by construction for the Simultaneous Multi-Processor (SiMulPro) cores, core modules, Programmable Execution Modules (PEM), PEM arrays, STAR messaging protocol implementations, integrated circuits (referred to as chips herein), and systems composed of these components. Greatly improved energy efficiency is also disclosed. A system implementation of an Application Specific Integrated Circuit (ASIC) communicating with a DRAM controller interacting with a DRAM array is presented with this resistance to virus and rootkit infection, and is simultaneously capable of 1 Tflop FP16, 1 Tflop FP32, and 1 Tflop FP64 performance while accessing 1 Tbyte of DRAM, with a power budget comparable to today's desktop or notebook computers accessing 8 Gbytes of DRAM.
Type: Grant
Filed: November 12, 2019
Date of Patent: June 13, 2023
Assignee: QSIGMA, INC.
Inventor: Earle Jennings
-
Patent number: 11669418
Abstract: Apparatus adapted for exascale computers are disclosed. The apparatus includes, but is not limited to, at least one of: a system, data processor chip (DPC), Landing Module (LM), chips including LMs, anticipator chips, simultaneous multi-processor (SMP) cores, SMP channel (SMPC) cores, channels, bundles of channels, printed circuit boards (PCBs) including bundles, floating-point adders, accumulation managers, QUAD Link Anticipating Memory (QUADLAM), communication networks extended by coupling links of QUADLAM, log2 calculators, exp2 calculators, logALU, Non-Linear Accelerator (NLA), and stairways. Methods of algorithm and program development, verification, and debugging are also disclosed. Collectively, embodiments of these elements disclose a class of supercomputers that render Amdahl's Law obsolete, providing cabinets of petaflop performance and systems that may meet or exceed an exaflop of performance for Block LU Decomposition (Linpack).
Type: Grant
Filed: November 5, 2019
Date of Patent: June 6, 2023
Assignee: QSIGMA, INC.
Inventors: Earle Jennings, George Landers
-
Patent number: 11663011
Abstract: Very long instruction word (VLIW) instruction processing using a reduced-width processor is disclosed. In a particular embodiment, a VLIW processor includes a control circuit configured to receive a VLIW packet that includes a first number of instructions and to distribute the instructions to a second number of instruction execution paths. The first number is greater than the second number. The VLIW processor also includes physical registers configured to store results of executing the instructions and a register renaming circuit that is coupled to the control circuit.
Type: Grant
Filed: July 7, 2020
Date of Patent: May 30, 2023
Assignee: Qualcomm Incorporated
Inventors: Peter Sassone, Christopher Koob, Suresh Kumar Venkumahanti
-
Patent number: 11663107
Abstract: A computer-implemented method, performed in a data processing system comprising a performance monitoring unit. The method comprises receiving a set of computer-readable instructions to be executed by the data processing system to implement at least a portion of a neural network, wherein one or more of the instructions is labeled with one or more performance monitoring labels based upon one or more features of the neural network. The method further comprises configuring the performance monitoring unit to count one or more events occurring in one or more components of the data processing system based on the one or more performance monitoring labels.
Type: Grant
Filed: February 21, 2020
Date of Patent: May 30, 2023
Assignee: ARM LIMITED
Inventors: Elliot Maurice Simon Rosemarine, Rachel Jean Trimble
-
Patent number: 11657261
Abstract: A neural processing device is provided. The neural processing device comprises a plurality of neural processors, a shared memory shared by the plurality of neural processors, a plurality of semaphore memories, and a global interconnection. The plurality of neural processors generate a plurality of L3 sync targets, respectively. Each semaphore memory is associated with a respective one of the plurality of neural processors, and the plurality of semaphore memories receive and store the plurality of L3 sync targets, respectively. Synchronization of the plurality of neural processors is performed according to the plurality of L3 sync targets. The global interconnection connects the plurality of neural processors with the shared memory, and comprises an L3 sync channel through which an L3 synchronization signal corresponding to at least one L3 sync target is transmitted.
Type: Grant
Filed: April 29, 2022
Date of Patent: May 23, 2023
Assignee: Rebellions Inc.
Inventors: Jinwook Oh, Jinseok Kim, Kyeongryeol Bong, Wongyu Shin, Chang-Hyo Yu
-
Patent number: 11656876
Abstract: Techniques are disclosed relating to an apparatus, including a data storage circuit having a plurality of entries, and a load-store pipeline configured to allocate an entry in the data storage circuit in response to a determination that a first instruction includes an access to an external memory circuit. The apparatus further includes an execution pipeline configured to make a determination, while performing a second instruction and using the entry in the data storage circuit, that the second instruction uses a result of the first instruction, and cease performance of the second instruction in response to the determination.
Type: Grant
Filed: February 10, 2021
Date of Patent: May 23, 2023
Assignee: Cadence Design Systems, Inc.
Inventors: Robert T. Golla, Deepak Panwar
-
Patent number: 11630997
Abstract: A processor-implemented data processing method includes encoding a plurality of weights of a filter of a neural network using an inverted two's complement fixed-point format; generating weight data based on values of the encoded weights corresponding to same filter positions of a plurality of filters; and performing an operation on the weight data and input activation data using a bit-serial scheme to control when to perform an activation function with respect to the weight data and input activation data.
Type: Grant
Filed: January 23, 2019
Date of Patent: April 18, 2023
Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
Inventors: Seungwon Lee, Dongwoo Lee, Kiyoung Choi, Sungbum Kang
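The abstract's "inverted two's complement fixed-point format" can be sketched as follows; the exact format is not spelled out here, so this sketch assumes it means the bitwise complement of the ordinary two's complement representation, and all names are illustrative:

```python
def twos_complement(value, bits=8):
    """Ordinary two's complement bit pattern of a signed integer."""
    return value & ((1 << bits) - 1)

def inverted_twos_complement(value, bits=8):
    """Assumed encoding: bitwise complement of the two's complement pattern."""
    return (~twos_complement(value, bits)) & ((1 << bits) - 1)

def decode(encoded, bits=8):
    """Round-trip check: undo the inversion, then sign-extend."""
    raw = (~encoded) & ((1 << bits) - 1)
    if raw & (1 << (bits - 1)):
        raw -= 1 << bits
    return raw
```

A round trip such as `decode(inverted_twos_complement(w))` recovers the original weight, which is the property any such encoding must satisfy before a bit-serial scheme can consume it.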
-
Patent number: 11630991
Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations for two tensors that differ in size and rank. The planar engine circuit may perform a broadcasting operation to duplicate one or more values across one or more channels so that a smaller tensor matches the size of the larger tensor.
Type: Grant
Filed: February 4, 2020
Date of Patent: April 18, 2023
Assignee: Apple Inc.
Inventors: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
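The broadcasting step the abstract mentions, duplicating a smaller tensor's values so it matches the larger tensor before the elementwise combine, can be sketched in plain Python. Shapes and names are illustrative, not the hardware's:

```python
def broadcast_rows(small, n_rows):
    # Duplicate a 1-row tensor across n_rows "channels".
    return [list(small[0]) for _ in range(n_rows)]

def elementwise_add(a, b):
    """Elementwise combine of two 2-D tensors; broadcast b up if ranks differ."""
    if len(b) == 1 and len(a) > 1:
        b = broadcast_rows(b, len(a))
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

This is the same idea NumPy calls broadcasting, here done explicitly so the duplication step is visible.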
-
Patent number: 11630667
Abstract: A processor includes a plurality of vector sub-processors (VSPs) and a plurality of memory banks dedicated to respective VSPs. A first memory bank corresponding to a first VSP includes a first plurality of high vector general purpose register (VGPR) banks and a first plurality of low VGPR banks corresponding to the first plurality of high VGPR banks. The first memory bank further includes a plurality of operand gathering components that store operands from respective high VGPR banks and low VGPR banks. The operand gathering components are assigned to individual threads while the threads are executed by the first VSP.
Type: Grant
Filed: November 27, 2019
Date of Patent: April 18, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Jiasheng Chen, Bin He, Jian Huang, Michael Mantor
-
Patent number: 11625250
Abstract: The disclosed systems, structures, and methods are directed to parallel processing of tasks in a multiple thread computing system. Execution of an instruction sequence of a thread allocated to a first task proceeds until an exit point of the instruction sequence is reached. The execution of the instruction sequence of the thread for the first task is terminated at a convergence point of the instruction sequence. The thread is selectively reallocated to process a second task.
Type: Grant
Filed: January 29, 2021
Date of Patent: April 11, 2023
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Ahmed Mohammed ElShafiey Mohammed Eltantawy, Yan Luo, Tyler Bryce Nowicki
-
Patent number: 11625249
Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded to a remote device is processed, and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
Type: Grant
Filed: December 29, 2020
Date of Patent: April 11, 2023
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Jagadish B. Kotra, John Kalamatianos
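The per-address locking the abstract describes can be sketched as a tiny state machine: lock the address when the offload instruction is processed, release it when the cache operation targeting that address completes, and hold back younger non-offload accesses in between. Names are illustrative:

```python
locked_addrs = set()

def issue_offload(addr):
    # Lock placed on the memory address when the offload instruction is processed.
    locked_addrs.add(addr)

def cache_op_complete(addr):
    # Lock removed when the cache operation targeting the address completes.
    locked_addrs.discard(addr)

def may_execute_non_offload(addr):
    # Younger non-offload instructions touching a locked address must wait.
    return addr not in locked_addrs
```

The real mechanism lives in hardware ordering logic; the sketch only shows why the lock's lifetime (issue to cache-operation completion) is what preserves ordering.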
-
Patent number: 11614936
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
Type: Grant
Filed: March 29, 2021
Date of Patent: March 28, 2023
Assignee: Intel Corporation
Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
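The per-element flow described here, eight nibble products per doubleword pair, accumulated into the previous contents with saturation, can be sketched as follows. This treats nibbles as unsigned; the real instruction's signedness and saturation bounds may differ, and the names are illustrative:

```python
INT32_MAX, INT32_MIN = 2**31 - 1, -2**31

def nibbles(dword):
    # Eight 4-bit nibbles of a 32-bit doubleword, least significant first.
    return [(dword >> (4 * i)) & 0xF for i in range(8)]

def dot_nibbles_saturate(acc, a_dword, b_dword):
    """Multiply corresponding nibbles (eight products), accumulate into the
    previous contents, and saturate to the int32 range."""
    total = acc + sum(x * y for x, y in zip(nibbles(a_dword), nibbles(b_dword)))
    return max(INT32_MIN, min(INT32_MAX, total))
```

Running this K times per destination element (m, n) reproduces the instruction's overall dataflow in scalar form.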
-
Patent number: 11609786
Abstract: The embodiments provide a register file device that increases energy efficiency by using a spin-transfer-torque random access memory for the register file of a general-purpose graphics processing device, and that hierarchically uses a register cache and a buffer together with the spin-transfer-torque random access memory to minimize leakage current, reduce write operation power, and mitigate write delay.
Type: Grant
Filed: February 5, 2020
Date of Patent: March 21, 2023
Assignee: INDUSTRY-ACADEMIC COOPERATION FOUNDATION, YONSEI UNIVERSITY
Inventors: Won Woo Ro, Jun Hyun Park