Patents by Inventor Suyog Gupta

Suyog Gupta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

NEURAL NETWORK ARCHITECTURE FOR IMPLEMENTING GROUP CONVOLUTIONS

Publication number: 20250124700

Abstract: Methods, systems, and apparatus, including computer-readable media, are described for processing an input image using a convolutional neural network (CNN). The CNN includes a sequence of layer blocks. Each of a first subset of the layer blocks in the sequence is configured to perform operations that include: i) receiving an input feature map for the layer block, ii) generating an expanded feature map from the input feature map using a group convolution, and iii) generating a reduced feature map from the expanded feature map. The input feature map is an h × w feature map with c1 channels. The expanded feature map is an h × w feature map with c2 channels, whereas the reduced feature map is an h × w feature map with c1 channels. c2 is greater than c1. An output feature map is generated for the layer block from the reduced feature map.

Type: Application

Filed: October 8, 2021

Publication date: April 17, 2025

Inventors: Berkin Akin, Suyog Gupta, Cao Gao, Ping Zhou, Gabriel Mintzer Bender, Hanxiao Liu
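
As a rough illustration of the expand-then-reduce block described in the abstract above, the following Python sketch widens a feature map from c1 to c2 channels with a group convolution and then projects it back to c1 channels. The 1 × 1 kernel size, the ungrouped reduction step, and the names `grouped_pointwise_conv` and `layer_block` are assumptions made for illustration; the abstract does not specify them.

```python
def grouped_pointwise_conv(fmap, weights, groups):
    """Apply a 1x1 group convolution to an h x w x c_in feature map.

    fmap is a nested list indexed [row][col][channel].
    weights[g] is a (c_out / groups) x (c_in / groups) matrix for group g.
    Assumes c_in is evenly divisible by groups.
    """
    c_in = len(fmap[0][0])
    in_per_group = c_in // groups
    out = []
    for row in fmap:
        out_row = []
        for pixel in row:
            out_pixel = []
            for g in range(groups):
                # Each group only sees its own slice of the input channels.
                chunk = pixel[g * in_per_group:(g + 1) * in_per_group]
                for w_row in weights[g]:
                    out_pixel.append(sum(w * x for w, x in zip(w_row, chunk)))
            out_row.append(out_pixel)
        out.append(out_row)
    return out


def layer_block(fmap, expand_weights, reduce_weights, groups):
    """Expand c1 -> c2 with a group convolution, then reduce c2 -> c1."""
    expanded = grouped_pointwise_conv(fmap, expand_weights, groups)
    # Assumption: the reduction is an ordinary (single-group) 1x1 projection.
    reduced = grouped_pointwise_conv(expanded, reduce_weights, 1)
    return expanded, reduced
```

With c1 = 4, c2 = 8 and two groups, `expand_weights` would hold two 4 × 2 matrices and `reduce_weights` a single 4 × 8 matrix; the spatial size h × w is unchanged at every step.
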
Preemption in a machine learning hardware accelerator

Patent number: 12197959

Abstract: The present disclosure describes a system and method for preempting a long-running process with a higher priority process in a machine learning system, such as a hardware accelerator. The machine learning hardware accelerator can be a multi-chip system including semiconductor chips that can be application-specific integrated circuits (ASIC) designed to perform machine learning operations. An ASIC is an integrated circuit (IC) that is customized for a particular use.

Type: Grant

Filed: December 21, 2020

Date of Patent: January 14, 2025

Assignee: Google LLC

Inventors: Temitayo Fadelu, Ravi Narayanaswami, JiHong Min, Dongdong Li, Suyog Gupta, Jason Jong Kyu Park
HARDWARE ACCELERATOR OPTIMIZED GROUP CONVOLUTION BASED NEURAL NETWORK MODELS

Publication number: 20240386260

Abstract: Methods, systems, and apparatus, including computer-readable media, are described for processing an input image using an integrated circuit that implements a convolutional neural network with a group convolution layer. The processing includes determining a mapping of partitions along a channel dimension of an input feature map to multiply-accumulate cells (MACs) in a computational unit of the circuit and applying a group convolution to the input feature map. Applying the group convolution includes, for each partition: providing weights for the group convolution layer to a subset of MACs based on the mapping; providing, via an input bus of the circuit, an input of the feature map to each MAC in the subset; and computing, at each MAC in the subset, a product using the input and a weight for the group convolution layer. An output feature map is generated for the group convolution layer based on an accumulation of products.

Type: Application

Filed: October 8, 2021

Publication date: November 21, 2024

Inventors: Berkin Akin, Suyog Gupta, Cao Gao, Ping Zhou, Gabriel Mintzer Bender, Hanxiao Liu
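
The partition-to-MAC mapping in the abstract above can be sketched in software as follows. This is only a model of the data flow, not of the circuit: it assumes the channel count and the MAC count divide evenly by the number of partitions, that each MAC produces one output value for a single spatial position, and the names `map_partitions_to_macs` and `group_conv_at_pixel` are hypothetical.

```python
def map_partitions_to_macs(num_channels, num_partitions, num_macs):
    """Assign each channel partition a disjoint subset of MAC indices."""
    channels_per_part = num_channels // num_partitions
    macs_per_part = num_macs // num_partitions
    mapping = []
    for p in range(num_partitions):
        mapping.append({
            "channels": list(range(p * channels_per_part, (p + 1) * channels_per_part)),
            "macs": list(range(p * macs_per_part, (p + 1) * macs_per_part)),
        })
    return mapping


def group_conv_at_pixel(pixel, mac_weights, mapping):
    """Accumulate products per MAC for one spatial position.

    pixel is the list of input channel values at that position.
    mac_weights[m][k] is the weight MAC m holds for the k-th channel
    of the partition it is mapped to.
    """
    outputs = [0] * len(mac_weights)
    for part in mapping:
        for mac in part["macs"]:
            for k, ch in enumerate(part["channels"]):
                # The same input value is offered to every MAC in the subset,
                # standing in for the broadcast over the input bus.
                outputs[mac] += pixel[ch] * mac_weights[mac][k]
    return outputs
```

Because each MAC only ever multiplies inputs from its own partition, no weights are needed for channels outside that partition, which is the property that distinguishes a group convolution from a full one.
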
Multi-partition memory sharing with multiple components

Patent number: 12013780

Abstract: Components on an IC chip may operate faster or provide higher performance relative to power consumption if allowed access to sufficient memory resources. If every component is provided its own memory, however, the chip becomes expensive. In described implementations, memory is shared between two or more components. For example, a processing component can include computational circuitry and a memory coupled thereto. A multi-component cache controller is coupled to the memory. Logic circuitry is coupled to the cache controller and the memory. The logic circuitry selectively separates the memory into multiple memory partitions. A first memory partition can be allocated to the computational circuitry and provide storage to the computational circuitry. A second memory partition can be allocated to the cache controller and provide storage to multiple components.

Type: Grant

Filed: August 19, 2020

Date of Patent: June 18, 2024

Assignee: Google LLC

Inventors: Suyog Gupta, Ravi Narayanaswami, Uday Kumar Dasari, Ali Iranli, Pavan Thirunagari, Vinu Vijay Kumar, Sunitha R. Kosireddy
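
A minimal sketch of the partitioning idea in the abstract above, assuming a flat byte-addressed memory with a single movable boundary between a private compute partition and a shared cache partition. The class `PartitionedMemory` and its methods are hypothetical; the abstract does not describe how the logic circuitry chooses or represents the boundary.

```python
class PartitionedMemory:
    """A memory split into a private compute partition and a shared cache partition."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        # Assumption: the memory starts out entirely private to the compute circuitry.
        self.compute_bytes = total_bytes
        self.cache_bytes = 0

    def repartition(self, cache_bytes):
        """Hand the top cache_bytes of the memory to the multi-component cache controller."""
        if not 0 <= cache_bytes <= self.total_bytes:
            raise ValueError("cache partition must fit inside the memory")
        self.cache_bytes = cache_bytes
        self.compute_bytes = self.total_bytes - cache_bytes

    def owner_of(self, address):
        """Return which partition an address currently belongs to."""
        if not 0 <= address < self.total_bytes:
            raise IndexError("address outside the memory")
        return "compute" if address < self.compute_bytes else "shared_cache"
```
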
Compression of fully connected / recurrent layers of deep network(s) through enforcing spatial locality to weight matrices and effecting frequency compression

Patent number: 11977974

Abstract: A system, having a memory that stores computer executable components, and a processor that executes the computer executable components, reduces data size in connection with training a neural network by exploiting spatial locality to weight matrices and effecting frequency transformation and compression. A receiving component receives neural network data in the form of a compressed frequency-domain weight matrix. A segmentation component segments the initial weight matrix into original sub-components, wherein respective original sub-components have spatial weights. A sampling component applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components. A transform component applies a transform to the respective normalized sub-components.

Type: Grant

Filed: November 30, 2017

Date of Patent: May 7, 2024

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Suyog Gupta, Pritish Narayanan
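
The segmentation and transform steps named in the abstract above can be illustrated with a short sketch. The abstract says only that "a transform" is applied; the discrete cosine transform (DCT-II) used here is an assumption chosen because it is a common frequency transform, and the sampling/normalization step is omitted because the abstract does not describe it in enough detail. The names `segment` and `dct_2d` are hypothetical.

```python
import math


def segment(matrix, block):
    """Split a weight matrix into block x block sub-matrices in row-major order."""
    blocks = []
    for r in range(0, len(matrix), block):
        for c in range(0, len(matrix[0]), block):
            blocks.append([row[c:c + block] for row in matrix[r:r + block]])
    return blocks


def dct_2d(block):
    """Unnormalized 2-D DCT-II of a square block (assumed transform)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            out[u][v] = sum(
                block[x][y]
                * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                for x in range(n)
                for y in range(n)
            )
    return out
```

When neighbouring weights in a sub-component are similar, most of the energy lands in the low-frequency coefficients (for a constant block, only the `[0][0]` coefficient is non-zero), which is what makes frequency-domain compression of spatially local weights plausible.
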
PREEMPTION IN A MACHINE LEARNING HARDWARE ACCELERATOR

Publication number: 20230418677

Abstract: The present disclosure describes a system and method for preempting a long-running process with a higher priority process in a machine learning system, such as a hardware accelerator. The machine learning hardware accelerator can be a multi-chip system including semiconductor chips that can be application-specific integrated circuits (ASIC) designed to perform machine learning operations. An ASIC is an integrated circuit (IC) that is customized for a particular use.

Type: Application

Filed: December 21, 2020

Publication date: December 28, 2023

Inventors: Temitayo Fadelu, Ravi Narayanaswami, JiHong Min, Dongdong Li, Suyog Gupta, Jason Jong Kyu Park
Memory Sharing

Publication number: 20220300421

Abstract: Components on an IC chip may operate faster or provide higher performance relative to power consumption if allowed access to sufficient memory resources. If every component is provided its own memory, however, the chip becomes expensive. In described implementations, memory is shared between two or more components. For example, a processing component can include computational circuitry and a memory coupled thereto. A multi-component cache controller is coupled to the memory. Logic circuitry is coupled to the cache controller and the memory. The logic circuitry selectively separates the memory into multiple memory partitions. A first memory partition can be allocated to the computational circuitry and provide storage to the computational circuitry. A second memory partition can be allocated to the cache controller and provide storage to multiple components.

Type: Application

Filed: August 19, 2020

Publication date: September 22, 2022

Applicant: Google LLC

Inventors: Suyog Gupta, Ravi Narayanaswami, Uday Kumar Dasari, Ali Iranli, Pavan Thirunagari, Vinu Vijay Kumar, Sunitha R. Kosireddy
HARDWARE CIRCUIT FOR ACCELERATING NEURAL NETWORK COMPUTATIONS

Publication number: 20210326683

Abstract: Methods, systems, and apparatus, including computer-readable media, are described for a hardware circuit configured to implement a neural network. The circuit includes multiple super tiles. Each super tile includes a unified memory for storing inputs to a neural network layer and weights for the layer. Each super tile includes multiple compute tiles. Each compute tile executes a compute thread that is used to perform the computations to generate an output for the neural network layer. Each super tile includes arbitration logic coupled to the unified memory and each compute tile. The arbitration logic is configured to: pass inputs stored in the unified memory to the compute tiles; pass weights stored in the unified memory to the compute tiles; and pass, to the unified memory, the output generated for the layer based on computations performed at the compute tiles using the inputs and the weights for the layer.

Type: Application

Filed: December 19, 2019

Publication date: October 21, 2021

Inventors: Ravi Narayanaswami, Dong Hyuk Woo, Suyog Gupta, Uday Kumar Dasari
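
The data flow through one super tile, as described in the abstract above, might be modelled like this. The sketch runs the compute tiles one after another rather than as concurrent threads, treats the layer as a plain matrix-vector product, and distributes output rows round-robin across tiles; all three are assumptions, as is the class name `SuperTile`.

```python
class SuperTile:
    """Software model of a super tile: one unified memory feeding several compute tiles."""

    def __init__(self, num_compute_tiles):
        self.num_compute_tiles = num_compute_tiles
        # Unified memory holds the layer inputs, the layer weights, and the output.
        self.unified_memory = {}

    def run_layer(self):
        # Arbitration step: read inputs and weights out of the unified memory.
        inputs = self.unified_memory["inputs"]
        weights = self.unified_memory["weights"]  # one row per output value
        outputs = [0] * len(weights)
        for tile in range(self.num_compute_tiles):
            # Assumption: tile t handles output rows t, t + n, t + 2n, ...
            for row in range(tile, len(weights), self.num_compute_tiles):
                outputs[row] = sum(w * x for w, x in zip(weights[row], inputs))
        # Arbitration step: write the layer output back to the unified memory.
        self.unified_memory["outputs"] = outputs
        return outputs
```
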
Acceleration of convolutional neural network training using stochastic perforation

Patent number: 10540583

Abstract: Technical solutions are described to accelerate training of a multi-layer convolutional neural network. According to one aspect, a computer implemented method is described. A convolutional layer includes input maps, convolutional kernels, and output maps. The method includes a forward pass, a backward pass, and an update pass that each include convolution calculations. The described method performs the convolutional operations involved in the forward, the backward, and the update passes based on a first, a second, and a third perforation map respectively. The perforation maps are stochastically generated, and distinct from each other. The method further includes interpolating results of the selective convolution operations to obtain remaining results. The method includes iteratively repeating the forward pass, the backward pass, and the update pass until the convolutional neural network is trained. Other aspects such as a system, apparatus, and computer program product are also described.

Type: Grant

Filed: November 30, 2015

Date of Patent: January 21, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Leland Chang, Suyog Gupta
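
To make the perforation idea in the abstract above concrete, the sketch below draws a random perforation map, evaluates an expensive per-position computation only where the map is set, and fills the remaining positions by interpolation. The uniform sampling scheme, the nearest-neighbour interpolation, and the names `make_perforation_map` and `perforated_eval` are assumptions; the abstract does not say how the maps are drawn or how results are interpolated.

```python
import random


def make_perforation_map(height, width, keep_fraction, rng):
    """Randomly mark which output positions are actually computed."""
    positions = [(i, j) for i in range(height) for j in range(width)]
    keep = max(1, round(keep_fraction * len(positions)))
    kept = set(rng.sample(positions, keep))
    return [[(i, j) in kept for j in range(width)] for i in range(height)]


def perforated_eval(mask, compute):
    """Call compute(i, j) only where mask is True; copy the nearest computed value elsewhere."""
    kept = [(i, j) for i, row in enumerate(mask) for j, k in enumerate(row) if k]
    values = {pos: compute(*pos) for pos in kept}
    out = []
    for i, row in enumerate(mask):
        out_row = []
        for j, _ in enumerate(row):
            # Nearest computed position by Manhattan distance (assumed interpolation).
            nearest = min(kept, key=lambda p: abs(p[0] - i) + abs(p[1] - j))
            out_row.append(values[nearest])
        out.append(out_row)
    return out
```

In a training loop following the abstract, three such maps would be drawn per iteration, one each for the forward, backward, and update passes, for example by calling `make_perforation_map` three times with the same `random.Random` instance.
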
Random telegraph noise native device for true random number generator and noise injection

Patent number: 10416965

Abstract: A method (and system) for generating random numbers includes setting a drain voltage Vd on a MOSFET (metal oxide semiconductor field effect transistor) device and a gate voltage Vg of the MOSFET device so that the MOSFET device comprises a noise source configured in a manner such as to tune as desired a random number statistical distribution of an output of the MOSFET device. An output voltage of the MOSFET is provided as an input signal into a low noise amplifier and an output voltage of the low noise amplifier provides values for a random number generator.

Type: Grant

Filed: July 18, 2018

Date of Patent: September 17, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-yu Chen, Damon Farmer, Suyog Gupta, Shu-jen Han
Acceleration of convolutional neural network training using stochastic perforation

Patent number: 10380479

Abstract: Technical solutions are described to accelerate training of a multi-layer convolutional neural network. According to one aspect, a computer implemented method is described. A convolutional layer includes input maps, convolutional kernels, and output maps. The method includes a forward pass, a backward pass, and an update pass that each include convolution calculations. The described method performs the convolutional operations involved in the forward, the backward, and the update passes based on a first, a second, and a third perforation map respectively. The perforation maps are stochastically generated, and distinct from each other. The method further includes interpolating results of the selective convolution operations to obtain remaining results. The method includes iteratively repeating the forward pass, the backward pass, and the update pass until the convolutional neural network is trained. Other aspects such as a system, apparatus, and computer program product are also described.

Type: Grant

Filed: October 8, 2015

Date of Patent: August 13, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Leland Chang, Suyog Gupta
Approximate synchronization for parallel deep learning

Patent number: 10338931

Abstract: Techniques facilitating synchronization of processing engines for parallel deep learning are provided. In one example, a first processing component associated with a processor and processing components can: generate first output data based on input data associated with a machine learning process, wherein the processing components are communicatively coupled with an assignment component via a network; transmit the first output data to a second processing component of the processing components, wherein the first processing component and the second processing component comprise a first group of the processing components and the first group of the processing components is determined by the assignment component based on a first defined criterion; receive communication data generated by the second processing component; and generate second output data based on the communication data, wherein the second output data is an updated version of the first output data stored in the memory of the first processing component.

Type: Grant

Filed: April 29, 2016

Date of Patent: July 2, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Suyog Gupta, Ravi Nair
COMPRESSION OF FULLY CONNECTED / RECURRENT LAYERS OF DEEP NETWORK(S) THROUGH ENFORCING SPATIAL LOCALITY TO WEIGHT MATRICES AND EFFECTING FREQUENCY COMPRESSION

Publication number: 20190164050

Abstract: A system, having a memory that stores computer executable components, and a processor that executes the computer executable components, reduces data size in connection with training a neural network by exploiting spatial locality to weight matrices and effecting frequency transformation and compression. A receiving component receives neural network data in the form of a compressed frequency-domain weight matrix. A segmentation component segments the initial weight matrix into original sub-components, wherein respective original sub-components have spatial weights. A sampling component applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components. A transform component applies a transform to the respective normalized sub-components.

Type: Application

Filed: November 30, 2017

Publication date: May 30, 2019

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Suyog Gupta, Pritish Narayanan
Random Telegraph Noise Native Device for True Random Number Generator and Noise Injection

Publication number: 20180341462

Abstract: A method (and system) for generating random numbers includes setting a drain voltage Vd on a MOSFET (metal oxide semiconductor field effect transistor) device and a gate voltage Vg of the MOSFET device so that the MOSFET device comprises a noise source configured in a manner such as to tune as desired a random number statistical distribution of an output of the MOSFET device. An output voltage of the MOSFET is provided as an input signal into a low noise amplifier and an output voltage of the low noise amplifier provides values for a random number generator.

Type: Application

Filed: July 18, 2018

Publication date: November 29, 2018

Inventors: Chia-yu Chen, Damon Farmer, Suyog Gupta, Shu-jen Han
Random telegraph noise native device for true random number generator and noise injection

Patent number: 10095476

Abstract: A method (and system) for generating random numbers includes setting a drain voltage Vd on a MOSFET device to maximize a transconductance of the MOSFET device and setting a gate voltage Vg of the MOSFET device to tune as desired a random number statistical distribution of an output of the MOSFET device. The MOSFET device includes a gate structure with an oxide layer including at least one artificial trapping layer in which carrier traps are designed to occupy a predetermined distance from conduction and valence bands of material of the artificial trapping layer.

Type: Grant

Filed: December 2, 2015

Date of Patent: October 9, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-yu Chen, Damon Farmer, Suyog Gupta, Shu-jen Han
Magnetic tunnel junction (MTJ) based true random number generators (TRNG)

Patent number: 10078496

Abstract: An apparatus is presented for generating a true random number generator (TRNG). The apparatus includes a magnetic tunnel junction (MTJ) device including a first layer, a second layer, and a third layer, as well as a bias circuit to bias the MTJ device along with a pulse height discriminator and a time-to-amplitude convertor to generate random bit-streams. The second layer is a barrier layer with an energy barrier height on the order of 20 kT, where k is the Boltzmann constant and T is the absolute temperature. Random flipping of an orientation of magnetization of the third layer is induced by thermal fluctuations in the MTJ device.

Type: Grant

Filed: February 23, 2017

Date of Patent: September 18, 2018

Assignee: International Business Machines Corporation

Inventors: Suyog Gupta, Chandrasekharan Kothandaraman, Jonathan Z. Sun
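
The abstract above does not state a switching model, but the significance of a barrier near 20 kT can be estimated under the standard Néel-Arrhenius relation, τ = τ0 · exp(Eb / kT). The attempt time τ0 of about 1 ns used below is a commonly quoted order of magnitude and is an assumption, not a value from the source; `mean_flip_time` is a hypothetical helper.

```python
import math


def mean_flip_time(barrier_in_kt, attempt_time_s=1e-9):
    """Estimate the mean time between thermally induced flips.

    barrier_in_kt is the energy barrier expressed in units of kT.
    attempt_time_s is the assumed attempt time tau_0 (default 1 ns).
    """
    return attempt_time_s * math.exp(barrier_in_kt)
```

Under these assumptions a 20 kT barrier gives a mean dwell time of roughly half a second, so thermal fluctuations alone flip the magnetization often enough to serve as an entropy source, whereas doubling the barrier to 40 kT would stretch the dwell time by a factor of exp(20).
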
MAGNETIC TUNNEL JUNCTION (MTJ) BASED TRUE RANDOM NUMBER GENERATORS (TRNG)

Publication number: 20180239590

Abstract: An apparatus is presented for generating a true random number generator (TRNG). The apparatus includes a magnetic tunnel junction (MTJ) device including a first layer, a second layer, and a third layer, as well as a bias circuit to bias the MTJ device along with a pulse height discriminator and a time-to-amplitude convertor to generate random bit-streams. The second layer is a barrier layer with an energy barrier height on the order of 20 kT, where k is the Boltzmann constant and T is the absolute temperature. Random flipping of an orientation of magnetization of the third layer is induced by thermal fluctuations in the MTJ device.

Type: Application

Filed: February 23, 2017

Publication date: August 23, 2018

Inventors: Suyog Gupta, Chandrasekharan Kothandaraman, Jonathan Z. Sun
Spatially decoupled floating gate semiconductor device

Patent number: 10043875

Abstract: A method includes forming a tunneling dielectric layer on a semiconductor substrate, where a first portion of the tunneling dielectric layer is directly above a channel region in the semiconductor substrate, a second portion of the tunneling dielectric layer is directly above source-drain regions located on opposing sides of the channel region, and the second portion of the tunneling dielectric layer is thicker than the first portion of the tunneling dielectric layer; forming a floating gate directly above the first portion of the tunneling dielectric layer and the second portion of the tunneling dielectric layer; and forming a control dielectric layer directly above the floating gate.

Type: Grant

Filed: January 5, 2018

Date of Patent: August 7, 2018

Assignee: International Business Machines Corporation

Inventors: Suyog Gupta, Bahman Hekmatshoartabari
Spatially decoupled floating gate semiconductor device

Patent number: 10038067

Abstract: A method includes forming a tunneling dielectric layer on a semiconductor substrate, where a first portion of the tunneling dielectric layer is directly above a channel region in the semiconductor substrate, a second portion of the tunneling dielectric layer is directly above source-drain regions located on opposing sides of the channel region, and the second portion of the tunneling dielectric layer is thicker than the first portion of the tunneling dielectric layer; forming a floating gate directly above the first portion of the tunneling dielectric layer and the second portion of the tunneling dielectric layer; and forming a control dielectric layer directly above the floating gate.

Type: Grant

Filed: January 5, 2018

Date of Patent: July 31, 2018

Assignee: International Business Machines Corporation

Inventors: Suyog Gupta, Bahman Hekmatshoartabari
SPATIALLY DECOUPLED FLOATING GATE SEMICONDUCTOR DEVICE

Publication number: 20180145141

Abstract: A method includes forming a tunneling dielectric layer on a semiconductor substrate, where a first portion of the tunneling dielectric layer is directly above a channel region in the semiconductor substrate, a second portion of the tunneling dielectric layer is directly above source-drain regions located on opposing sides of the channel region, and the second portion of the tunneling dielectric layer is thicker than the first portion of the tunneling dielectric layer; forming a floating gate directly above the first portion of the tunneling dielectric layer and the second portion of the tunneling dielectric layer; and forming a control dielectric layer directly above the floating gate.

Type: Application

Filed: January 5, 2018

Publication date: May 24, 2018

Inventors: Suyog Gupta, Bahman Hekmatshoartabari
