Patents by Inventor Tung D. Le
Tung D. Le has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240385882
Abstract: A method improves inference performance of an artificial intelligence model by reducing pre-processing overhead. The method includes receiving a plurality of operations associated with the artificial intelligence model. A computational graph for the artificial intelligence model is generated. Each of the operations is categorized into one of three categories: accelerator-designated operations, central processing unit (CPU)-designated operations, and undetermined-processing-designated operations. An estimated processing time is determined for the operations. The operations are inserted into the computational graph. The computational graph is divided into sub-graphs. Edges of the sub-graphs where pre-processing steps will be performed are determined.
Type: Application
Filed: May 20, 2023
Publication date: November 21, 2024
Inventors: Haruki Imai, Yasushi Negishi, Tung D. Le, Kiyokuni Kawachiya
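For illustration, here is a minimal sketch of the categorize-and-split idea the abstract describes, assuming a linear chain of operations; the operation names, supported-op tables, and grouping rule are hypothetical stand-ins, not the patent's actual implementation:

```python
# Hypothetical sketch: categorize each operation by target device, then
# group a linear chain of ops into maximal same-device sub-graphs.
from enum import Enum

class Device(Enum):
    ACCELERATOR = "accelerator"
    CPU = "cpu"
    UNDETERMINED = "undetermined"

# Assumed lookup tables; a real system would query device capabilities.
ACCELERATOR_OPS = {"matmul", "conv2d", "relu"}
CPU_ONLY_OPS = {"string_lookup", "custom_py_op"}

def categorize(op_name: str) -> Device:
    if op_name in ACCELERATOR_OPS:
        return Device.ACCELERATOR
    if op_name in CPU_ONLY_OPS:
        return Device.CPU
    return Device.UNDETERMINED

def split_into_subgraphs(ops: list[str]) -> list[tuple[Device, list[str]]]:
    """Group ops into maximal same-device sub-graphs. Each boundary
    between two sub-graphs is an edge where a pre-processing step
    (e.g., a data-layout conversion) would be performed."""
    subgraphs: list[tuple[Device, list[str]]] = []
    for op in ops:
        dev = categorize(op)
        if subgraphs and subgraphs[-1][0] == dev:
            subgraphs[-1][1].append(op)
        else:
            subgraphs.append((dev, [op]))
    return subgraphs

print(split_into_subgraphs(["conv2d", "relu", "string_lookup", "matmul"]))
```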
-
Patent number: 11880762
Abstract: A computer-implemented method, a computer program product, and a computer processing system are provided for selecting from among multiple Graphics Processing Unit (GPU) execution modes for a Neural Network (NN) having a size greater than a threshold size. The multiple GPU execution modes include a normal memory mode, an Out-of-Core (OoC) execution mode, and a Unified Memory (UM) mode. The method includes starting an execution of the NN with the UM mode and measuring the memory usage for each of the layers of the NN. The method further includes selecting an execution mode based on the memory usage of all of the layers.
Type: Grant
Filed: June 26, 2018
Date of Patent: January 23, 2024
Assignee: International Business Machines Corporation
Inventors: Yasushi Negishi, Haruki Imai, Taro Sekiyama, Tung D. Le, Kiyokuni Kawachiya
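A minimal sketch of how such a mode selection could work, assuming per-layer memory measurements collected during a first run under UM mode; the capacity constant and decision rule below are illustrative assumptions, not the patented logic:

```python
# Hypothetical sketch of GPU execution-mode selection from per-layer
# memory measurements gathered under Unified Memory (UM) mode.
GPU_MEMORY_BYTES = 16 * 2**30  # assumed device capacity

def select_execution_mode(per_layer_usage: list[int]) -> str:
    total = sum(per_layer_usage)
    peak = max(per_layer_usage)
    if total <= GPU_MEMORY_BYTES:
        return "normal"          # whole network fits on the device
    if peak <= GPU_MEMORY_BYTES:
        return "out-of-core"     # swap layer by layer; each layer fits
    return "unified-memory"      # fall back to demand paging

print(select_execution_mode([4 * 2**30, 6 * 2**30, 3 * 2**30]))
```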
-
Patent number: 11836613
Abstract: Methods and systems for generating a program include parameterizing a high-order function to replace data with primitive functions. A neural programmer interpreter (NPI) model is trained for the high-order function. Respective neural network models are trained for each primitive function. The neural network models generate data for the NPI model when called.
Type: Grant
Filed: July 17, 2019
Date of Patent: December 5, 2023
Assignee: International Business Machines Corporation
Inventor: Tung D. Le
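To make the idea concrete, a hypothetical illustration of parameterizing a high-order function: the high-order function (here, map) is fixed, and the primitive functions it applies are passed in as parameters, each standing in for a small trained model. All names below are illustrative assumptions:

```python
# Hypothetical sketch: the high-order function takes its primitive as a
# parameter, so per-primitive models can be trained and swapped in.
from typing import Callable

def high_order_map(primitive: Callable[[float], float],
                   xs: list[float]) -> list[float]:
    """The high-order function an NPI model would learn to drive;
    the primitive it applies is supplied as a parameter."""
    return [primitive(x) for x in xs]

# Stand-ins for per-primitive neural network models.
def learned_double(x: float) -> float:
    return 2.0 * x

def learned_square(x: float) -> float:
    return x * x

print(high_order_map(learned_double, [1.0, 2.0, 3.0]))
print(high_order_map(learned_square, [1.0, 2.0, 3.0]))
```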
-
Publication number: 20230153318
Abstract: A method for converting a shape and a format of tensor data to meet a specific data format of a hardware accelerator is provided. The method receives input constants L1 and L2 for a data format of <X x Y x Z>, and an input tensor having an n-dimension shape of <Xn x Xn-1 x Xn-2 x ... x X1>. The method stores the input tensor shape. The method calculates an n-dimension modified shape of the input tensor by (a) setting the largest divisor of (Xn x Xn-1 x ... x X1) that is ≤ L1 to S1, (b) setting the largest divisor of ((Xn x Xn-1 x ... x X1) / S1) that is ≤ L2 to S2, (c) setting ((Xn x Xn-1 x ... x X1) / (S1 x S2)) to S3, and (d) returning the n-dimension modified shape as <S3 x S2 x S1>.
Type: Application
Filed: November 16, 2021
Publication date: May 18, 2023
Inventors: Yasushi Negishi, Tung D. Le, Haruki Imai, Kiyokuni Kawachiya
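The divisor arithmetic in steps (a) through (d) can be sketched directly; the function names and example tensor below are illustrative, and L1 and L2 are assumed to be the target sizes of the accelerator's two fastest-varying dimensions:

```python
# Minimal sketch of the <S3 x S2 x S1> shape computation described above.
from math import prod

def largest_divisor_at_most(n: int, limit: int) -> int:
    """Largest divisor of n that is <= limit (d = 1 always qualifies)."""
    return max(d for d in range(1, min(n, limit) + 1) if n % d == 0)

def modified_shape(shape: list[int], l1: int, l2: int) -> tuple[int, int, int]:
    n = prod(shape)                      # Xn * Xn-1 * ... * X1
    s1 = largest_divisor_at_most(n, l1)
    s2 = largest_divisor_at_most(n // s1, l2)
    s3 = n // (s1 * s2)
    return (s3, s2, s1)                  # <S3 x S2 x S1>

# Example: fit a <2 x 3 x 4 x 5> tensor (120 elements) to 8-wide dims.
print(modified_shape([2, 3, 4, 5], 8, 8))  # -> (3, 5, 8)
```

Note that the element count is preserved: S3 x S2 x S1 always equals the product of the original dimensions.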
-
Patent number: 11521062
Abstract: Processing a neural network data flow graph having a set of nodes and a set of edges. An insertion point is determined for a memory reduction or memory restoration operation. The determination is based on computing tensor timing slacks (TTS) for a set of input tensors; compiling a candidate list (SI) of input tensors, from the set of input tensors, using input tensors having corresponding TTS values larger than a threshold value (thTTS); filtering the SI to retain input tensors whose size meets a threshold value (thS); and determining an insertion point for the operation using the SI based on the filtering. A new data flow graph is generated or an existing one is modified using this process.
Type: Grant
Filed: December 5, 2019
Date of Patent: December 6, 2022
Assignee: International Business Machines Corporation
Inventors: Gradus Janssen, Vladimir Zolotov, Tung D. Le
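A minimal sketch of the candidate-selection step (building SI with the thTTS filter, then applying the thS size filter), assuming a simple record per input tensor; the dataclass fields and threshold values are illustrative assumptions:

```python
# Hypothetical sketch of the TTS-based candidate filter described above.
from dataclasses import dataclass

@dataclass
class InputTensor:
    name: str
    tts: float        # tensor timing slack (assumed: idle time between uses)
    size_bytes: int

def select_candidates(tensors: list[InputTensor],
                      th_tts: float, th_s: int) -> list[InputTensor]:
    """Keep tensors idle long enough (tts > th_tts) and large enough
    (size >= th_s) to be worth a memory reduction/restoration pair."""
    si = [t for t in tensors if t.tts > th_tts]      # candidate list SI
    return [t for t in si if t.size_bytes >= th_s]   # size filter thS

ts = [InputTensor("act1", tts=9.0, size_bytes=64 << 20),
      InputTensor("act2", tts=0.5, size_bytes=128 << 20)]
print([t.name for t in select_candidates(ts, th_tts=1.0, th_s=32 << 20)])
```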
-
Patent number: 11461637
Abstract: A generated algorithm used by a neural network is captured during execution of an iteration of the neural network. A candidate algorithm is identified based on the generated algorithm. A determination is made that the candidate algorithm utilizes less memory than the generated algorithm. Based on the determination, the neural network is updated by replacing the generated algorithm with the candidate algorithm.
Type: Grant
Filed: March 6, 2019
Date of Patent: October 4, 2022
Assignee: International Business Machines Corporation
Inventors: Taro Sekiyama, Kiyokuni Kawachiya, Tung D. Le, Yasushi Negishi
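For illustration, a sketch of the replace-if-cheaper idea, loosely modeled on choosing among functionally equivalent convolution algorithms; the registry and workspace sizes below are hypothetical, not the patent's mechanism:

```python
# Hypothetical sketch: swap a captured algorithm for an equivalent
# candidate when the candidate needs less memory.
captured = {"name": "conv_fft", "workspace_bytes": 512 << 20}

# Assumed registry of functionally equivalent candidate algorithms.
CANDIDATES = [
    {"name": "conv_implicit_gemm", "workspace_bytes": 64 << 20},
    {"name": "conv_winograd", "workspace_bytes": 256 << 20},
]

def maybe_replace(generated: dict) -> dict:
    """Replace the captured algorithm if a candidate uses less memory."""
    best = min(CANDIDATES, key=lambda c: c["workspace_bytes"])
    if best["workspace_bytes"] < generated["workspace_bytes"]:
        return best
    return generated

print(maybe_replace(captured)["name"])  # -> conv_implicit_gemm
```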
-
Patent number: 11362670
Abstract: A method is presented for compressing data of a Rectified Linear Unit (ReLU) function on a graphical processing unit (GPU) employed in a learning process of a deep neural network. The method includes converting an initial data structure including nonzero data and zero data into a compressed data structure including only the nonzero data of the initial data structure as compressed data by generating a nonzero data bitmap region; generating a nonzero data number table region by employing a parallel reduction algorithm; calculating a nonzero data array index per block region of all blocks from the nonzero data number table region by employing a parallel prefix sum scan algorithm; allocating a buffer for the compressed data; and copying the nonzero data from the initial data structure into a nonzero data array region in a compressed data format in parallel.
Type: Grant
Filed: October 30, 2020
Date of Patent: June 14, 2022
Assignee: International Business Machines Corporation
Inventors: Yasushi Negishi, Tung D. Le, Haruki Imai, Kiyokuni Kawachiya
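A sequential Python analogue of the pipeline the abstract lists (bitmap, per-block counts via reduction, exclusive prefix sum for block offsets, then a compacting copy); on a GPU the marked steps would run in parallel, and the block size here is an arbitrary illustrative choice:

```python
# Sequential sketch of the GPU compression pipeline described above.
from itertools import accumulate

def compress_relu_output(data: list[float], block: int = 4):
    bitmap = [1 if x != 0.0 else 0 for x in data]  # nonzero bitmap region
    counts = [sum(bitmap[i:i + block])             # per-block nonzero counts
              for i in range(0, len(data), block)] # (parallel reduction on GPU)
    offsets = [0] + list(accumulate(counts))[:-1]  # exclusive prefix sum scan
    out = [0.0] * sum(counts)                      # buffer for compressed data
    for b, base in enumerate(offsets):             # compacting copy
        j = base                                   # (done in parallel on GPU)
        for x in data[b * block:(b + 1) * block]:
            if x != 0.0:
                out[j] = x
                j += 1
    return bitmap, out

bm, packed = compress_relu_output([0.0, 1.5, 0.0, 2.0, 0.0, 0.0, 3.5, 0.0])
print(packed)  # [1.5, 2.0, 3.5]
```

ReLU outputs are a natural target for this scheme because roughly half of the activations are exactly zero.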
-
Publication number: 20220138580
Abstract: Methods and systems for training a neural network include identifying units within a neural network, including a first unit for memory swapping and a second unit for re-computation, to balance memory efficiency with computational efficiency. Each unit includes at least one layer of the neural network. Each unit has a first layer that is a checkpoint operation. During a feed-forward training stage, feature maps are stored in a first memory. The feature maps are output by the at least one layer of the first unit. The feature maps are swapped from the first memory to a second memory. During a backpropagation stage, the feature maps for the first unit are swapped from the second memory to the first memory. Feature maps for the second unit are re-computed.
Type: Application
Filed: November 4, 2020
Publication date: May 5, 2022
Inventors: Haruki Imai, Tung D. Le, Yasushi Negishi, Kiyokuni Kawachiya
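A minimal sketch of dividing checkpointed units between swapping and re-computation, assuming a simple per-unit cost model; the fields and the cheaper-option rule are illustrative assumptions rather than the published method:

```python
# Hypothetical sketch: choose swap vs. recompute per checkpointed unit.
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    feature_map_bytes: int
    recompute_cost_ms: float
    swap_cost_ms: float      # time to move feature maps GPU <-> host

def assign_strategy(units: list[Unit]) -> dict[str, str]:
    """Swap a unit's feature maps out when that is cheaper than
    re-computing them during backpropagation, and vice versa."""
    return {u.name: ("swap" if u.swap_cost_ms < u.recompute_cost_ms
                     else "recompute")
            for u in units}

units = [Unit("block1", 256 << 20, recompute_cost_ms=12.0, swap_cost_ms=30.0),
         Unit("block2", 64 << 20, recompute_cost_ms=40.0, swap_cost_ms=8.0)]
print(assign_strategy(units))  # block1 -> recompute, block2 -> swap
```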
-
Publication number: 20220140841
Abstract: A method is presented for compressing data of a Rectified Linear Unit (ReLU) function on a graphical processing unit (GPU) employed in a learning process of a deep neural network. The method includes converting an initial data structure including nonzero data and zero data into a compressed data structure including only the nonzero data of the initial data structure as compressed data by generating a nonzero data bitmap region; generating a nonzero data number table region by employing a parallel reduction algorithm; calculating a nonzero data array index per block region of all blocks from the nonzero data number table region by employing a parallel prefix sum scan algorithm; allocating a buffer for the compressed data; and copying the nonzero data from the initial data structure into a nonzero data array region in a compressed data format in parallel.
Type: Application
Filed: October 30, 2020
Publication date: May 5, 2022
Inventors: Yasushi Negishi, Tung D. Le, Haruki Imai, Kiyokuni Kawachiya
-
Patent number: 11164079
Abstract: A computer-implemented method, computer program product, and computer processing system are provided for accelerating neural network data parallel training in multiple graphics processing units (GPUs) using at least one central processing unit (CPU). The method includes forming a set of chunks. Each of the chunks includes a respective group of neural network layers other than a last layer. The method further includes performing one or more chunk-wise synchronization operations during a backward phase of the neural network data parallel training, by each of the multiple GPUs and the at least one CPU.
Type: Grant
Filed: December 15, 2017
Date of Patent: November 2, 2021
Assignee: International Business Machines Corporation
Inventors: Tung D. Le, Haruki Imai, Taro Sekiyama, Yasushi Negishi
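To illustrate the scheduling idea, a toy sketch of chunk-wise synchronization in which each chunk's gradients are synchronized as soon as its backward pass completes, overlapping communication with the remaining backward computation; the chunking scheme and print placeholders are assumptions, not the patented protocol:

```python
# Hypothetical sketch: group layers into chunks and synchronize each
# chunk's gradients as soon as its backward pass finishes.
def make_chunks(layers: list[str], chunk_size: int) -> list[list[str]]:
    return [layers[i:i + chunk_size]
            for i in range(0, len(layers), chunk_size)]

def backward_with_chunk_sync(layers: list[str], chunk_size: int) -> None:
    for chunk in reversed(make_chunks(layers, chunk_size)):
        for layer in reversed(chunk):
            print(f"backward {layer}")    # compute gradients on the GPUs
        print(f"sync {chunk} on CPU")     # synchronize this chunk now,
                                          # overlapping with the backward
                                          # pass of earlier layers

backward_with_chunk_sync(["l1", "l2", "l3", "l4", "l5"], chunk_size=2)
```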
-
Patent number: 11106970
Abstract: In an approach to localizing tree-based convolutional neural networks, a method includes creating a first tree-based convolution layer (TBCL) corresponding to a tree, where the tree includes a first plurality of nodes and a node that has been indicated to be a first pivotal node. The first TBCL includes a second plurality of nodes and a second pivotal node having a feature vector based on node data from the first pivotal node. The method also includes creating a second TBCL corresponding to the tree. The second TBCL may include a third plurality of nodes. The method further includes determining a feature vector for a third pivotal node in the third plurality of nodes based on the feature vectors from: (i) the second pivotal node, (ii) a parent node of the second pivotal node, and (iii) a child node of the second pivotal node.
Type: Grant
Filed: November 17, 2017
Date of Patent: August 31, 2021
Assignee: International Business Machines Corporation
Inventors: Tung D. Le, Taro Sekiyama
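A minimal sketch of the pivotal-node update in steps (i) through (iii): the next layer's pivotal feature vector combines the current pivotal node's vector with its parent's and child's. The fixed mixing weights below stand in for learned convolution weights and are purely illustrative:

```python
# Hypothetical sketch of combining self, parent, and child feature
# vectors into the next layer's pivotal-node feature vector.
def next_pivotal_feature(pivot: list[float],
                         parent: list[float],
                         child: list[float]) -> list[float]:
    # Assumed mixing weights for (self, parent, child) contributions;
    # a trained TBCL would learn these.
    w_self, w_parent, w_child = 0.5, 0.25, 0.25
    return [w_self * p + w_parent * a + w_child * c
            for p, a, c in zip(pivot, parent, child)]

print(next_pivotal_feature([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))
```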
-
Publication number: 20210174190
Abstract: Processing a neural network data flow graph having a set of nodes and a set of edges. An insertion point is determined for a memory reduction or memory restoration operation. The determination is based on computing tensor timing slacks (TTS) for a set of input tensors; compiling a candidate list (SI) of input tensors, from the set of input tensors, using input tensors having corresponding TTS values larger than a threshold value (thTTS); filtering the SI to retain input tensors whose size meets a threshold value (thS); and determining an insertion point for the operation using the SI based on the filtering. A new data flow graph is generated or an existing one is modified using this process.
Type: Application
Filed: December 5, 2019
Publication date: June 10, 2021
Inventors: Gradus Janssen, Vladimir Zolotov, Tung D. Le
-
Publication number: 20210019613
Abstract: Methods and systems for generating a program include parameterizing a high-order function to replace data with primitive functions. A neural programmer interpreter (NPI) model is trained for the high-order function. Respective neural network models are trained for each primitive function. The neural network models generate data for the NPI model when called.
Type: Application
Filed: July 17, 2019
Publication date: January 21, 2021
Inventor: Tung D. Le
-
Patent number: 10884755
Abstract: A computer-implemented method is provided for managing GPU memory consumption by computational graph rewriting. The method includes constructing, by a hardware processor, a categorized topological ordering of a computational graph. The categorized topological ordering includes multiple computational nodes arranged in multiple levels. The method further includes estimating, by the hardware processor, the GPU memory consumption responsive to a level including two or more computational nodes from among the multiple computational nodes. The method also includes rewriting, by the hardware processor, the computational graph by linearizing the two or more computational nodes in the level to avoid overlapping of the GPU memory consumption by the two or more computational nodes responsive to the GPU memory consumption exceeding a threshold. The method additionally includes managing the GPU memory consumption in accordance with the rewritten computational graph.
Type: Grant
Filed: July 31, 2019
Date of Patent: January 5, 2021
Assignee: International Business Machines Corporation
Inventors: Tung D. Le, Haruki Imai, Yasushi Negishi, Kiyokuni Kawachiya
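For illustration, a sketch of the linearization rule: when the nodes in one level of the topological ordering would run concurrently and their combined memory exceeds a threshold, they are rewritten to run sequentially so their allocations no longer overlap. The graph encoding, memory estimates, and threshold are illustrative assumptions:

```python
# Hypothetical sketch of level-wise linearization in a computational graph.
MEM_THRESHOLD = 8 << 30  # assumed GPU memory budget in bytes

def rewrite_levels(levels: list[list[tuple[str, int]]]):
    """Each level is a list of (node, estimated_bytes) that may run in
    parallel; return per-level execution groups after linearization."""
    plan = []
    for level in levels:
        total = sum(bytes_ for _, bytes_ in level)
        if len(level) > 1 and total > MEM_THRESHOLD:
            # Linearize: run the level's nodes one after another so
            # their memory consumption never overlaps.
            plan.append([[node] for node, _ in level])
        else:
            # Keep the level as a single concurrent group.
            plan.append([[node for node, _ in level]])
    return plan

levels = [[("a", 6 << 30), ("b", 5 << 30)], [("c", 2 << 30)]]
print(rewrite_levels(levels))  # [[['a'], ['b']], [['c']]]
```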
-
Patent number: 10558914
Abstract: A generated algorithm used by a neural network is captured during execution of an iteration of the neural network. A candidate algorithm is identified based on the generated algorithm. A determination is made that the candidate algorithm utilizes less memory than the generated algorithm. Based on the determination, the neural network is updated by replacing the generated algorithm with the candidate algorithm.
Type: Grant
Filed: April 16, 2019
Date of Patent: February 11, 2020
Assignee: International Business Machines Corporation
Inventors: Taro Sekiyama, Kiyokuni Kawachiya, Tung D. Le, Yasushi Negishi
-
Publication number: 20190392306
Abstract: A computer-implemented method, a computer program product, and a computer processing system are provided for selecting from among multiple Graphics Processing Unit (GPU) execution modes for a Neural Network (NN) having a size greater than a threshold size. The multiple GPU execution modes include a normal memory mode, an Out-of-Core (OoC) execution mode, and a Unified Memory (UM) mode. The method includes starting an execution of the NN with the UM mode and measuring the memory usage for each of the layers of the NN. The method further includes selecting an execution mode based on the memory usage of all of the layers.
Type: Application
Filed: June 26, 2018
Publication date: December 26, 2019
Inventors: Yasushi Negishi, Haruki Imai, Taro Sekiyama, Tung D. Le, Kiyokuni Kawachiya
-
Publication number: 20190266488
Abstract: A generated algorithm used by a neural network is captured during execution of an iteration of the neural network. A candidate algorithm is identified based on the generated algorithm. A determination is made that the candidate algorithm utilizes less memory than the generated algorithm. Based on the determination, the neural network is updated by replacing the generated algorithm with the candidate algorithm.
Type: Application
Filed: April 16, 2019
Publication date: August 29, 2019
Inventors: Taro Sekiyama, Kiyokuni Kawachiya, Tung D. Le, Yasushi Negishi
-
Publication number: 20190205755
Abstract: A generated algorithm used by a neural network is captured during execution of an iteration of the neural network. A candidate algorithm is identified based on the generated algorithm. A determination is made that the candidate algorithm utilizes less memory than the generated algorithm. Based on the determination, the neural network is updated by replacing the generated algorithm with the candidate algorithm.
Type: Application
Filed: March 6, 2019
Publication date: July 4, 2019
Inventors: Taro Sekiyama, Kiyokuni Kawachiya, Tung D. Le, Yasushi Negishi
-
Publication number: 20190188560
Abstract: A computer-implemented method, computer program product, and computer processing system are provided for accelerating neural network data parallel training in multiple graphics processing units (GPUs) using at least one central processing unit (CPU). The method includes forming a set of chunks. Each of the chunks includes a respective group of neural network layers other than a last layer. The method further includes performing one or more chunk-wise synchronization operations during a backward phase of the neural network data parallel training, by each of the multiple GPUs and the at least one CPU.
Type: Application
Filed: December 15, 2017
Publication date: June 20, 2019
Inventors: Tung D. Le, Haruki Imai, Taro Sekiyama, Yasushi Negishi
-
Publication number: 20190156184
Abstract: In an approach to localizing tree-based convolutional neural networks, a method includes creating a first tree-based convolution layer (TBCL) corresponding to a tree, where the tree includes a first plurality of nodes and a node that has been indicated to be a first pivotal node. The first TBCL includes a second plurality of nodes and a second pivotal node having a feature vector based on node data from the first pivotal node. The method also includes creating a second TBCL corresponding to the tree. The second TBCL may include a third plurality of nodes. The method further includes determining a feature vector for a third pivotal node in the third plurality of nodes based on the feature vectors from: (i) the second pivotal node, (ii) a parent node of the second pivotal node, and (iii) a child node of the second pivotal node.
Type: Application
Filed: November 17, 2017
Publication date: May 23, 2019
Inventors: Tung D. Le, Taro Sekiyama