Patents by Inventor Yasushi Negishi

Yasushi Negishi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Choosing execution mode of a neural network based on total memory usage

Patent number: 11880762

Abstract: A computer-implemented method, a computer program product, and a computer processing system are provided for selecting from among multiple Graphics Processing Unit (GPU) execution modes for a Neural Network (NN) having a size greater than a threshold size. The multiple GPU execution modes include a normal memory mode, an Out-of-Core (OoC) execution mode, and a Unified Memory (UM) mode. The method includes starting an execution on the NN with the UM mode and measuring the memory usage for each of the layers of the NN. The method further includes selecting an execution mode based on the memory usage of all of the layers.

Type: Grant

Filed: June 26, 2018

Date of Patent: January 23, 2024

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yasushi Negishi, Haruki Imai, Taro Sekiyama, Tung D. Le, Kiyokuni Kawachiya
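The selection step in the abstract above (patent 11880762) could look roughly like the following sketch. The abstract does not state the decision rule, so the rule here is an assumption: use normal mode if the total fits in GPU memory, out-of-core mode if every single layer fits, and unified memory otherwise. The names `Mode`, `select_mode`, `layer_bytes`, and `gpu_bytes` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    OUT_OF_CORE = "out-of-core"
    UNIFIED_MEMORY = "unified-memory"


def select_mode(layer_bytes, gpu_bytes):
    """Pick an execution mode from per-layer memory usage.

    layer_bytes: usage of each layer, as measured during a first run in
    unified memory mode. The thresholds below are an assumed rule, not
    the rule claimed in the patent.
    """
    if sum(layer_bytes) <= gpu_bytes:
        return Mode.NORMAL            # the whole network fits on the GPU
    if max(layer_bytes) <= gpu_bytes:
        return Mode.OUT_OF_CORE       # each layer fits, so swap layer by layer
    return Mode.UNIFIED_MEMORY        # at least one layer alone is too large
```

For example, `select_mode([4, 5, 6], 10)` picks out-of-core mode under this rule, because the total of 15 exceeds the budget of 10 while no single layer does.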
SHAPE AND DATA FORMAT CONVERSION FOR ACCELERATORS

Publication number: 20230153318

Abstract: A method for converting a shape and a format of tensor data to meet a specific data format of a hardware accelerator is provided. The method receives input tensors L1 and L2, each being constants having a data format of <X x Y x Z>, and each further having an n-dimension input tensor shape as <Xn x Xn-1 x Xn-2 x ... x X1>. The method stores the input tensor shape. The method calculates an n-dimension modified shape of the input tensors by (a) setting a largest divisor of (Xn x Xn-1 x ... x X1) ≤ L1 to S1, (b) setting a largest divisor of ((Xn x Xn-1 x ... x X1) / S1) ≤ L2 to S2, (c) setting ((Xn x Xn-1 x ... x X1) / (S1 x S2)) to S3, and (d) returning the n-dimension modified shape as <S3 x S2 x S1>.

Type: Application

Filed: November 16, 2021

Publication date: May 18, 2023

Inventors: YASUSHI NEGISHI, Tung D. Le, HARUKI IMAI, KIYOKUNI KAWACHIYA
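Steps (a) through (d) of the abstract above (publication 20230153318) can be sketched as follows. This reads the comparison in steps (a) and (b) as "largest divisor that is at most L1 (or L2)", which is an assumption, and treats L1 and L2 as integer limits. The function names are hypothetical.

```python
from math import prod


def largest_divisor_at_most(n, limit):
    """Return the largest divisor of n that does not exceed limit."""
    for d in range(min(n, limit), 0, -1):
        if n % d == 0:
            return d
    raise ValueError("limit must be at least 1")


def modified_shape(shape, l1, l2):
    """Collapse an n-dimension shape into <S3 x S2 x S1>."""
    total = prod(shape)                               # Xn x ... x X1
    s1 = largest_divisor_at_most(total, l1)           # step (a)
    s2 = largest_divisor_at_most(total // s1, l2)     # step (b)
    s3 = total // (s1 * s2)                           # step (c)
    return (s3, s2, s1)                               # step (d)
```

The element count is preserved by construction, since S3 x S2 x S1 equals the product of the original dimensions.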
Real-time resource usage reduction in artificial neural networks

Patent number: 11461637

Abstract: A generated algorithm used by a neural network is captured during execution of an iteration of the neural network. A candidate algorithm is identified based on the generated algorithm. A determination is made that the candidate algorithm utilizes less memory than the generated algorithm. Based on the determination the neural network is updated by replacing the generated algorithm with the candidate algorithm.

Type: Grant

Filed: March 6, 2019

Date of Patent: October 4, 2022

Assignee: International Business Machines Corporation

Inventors: Taro Sekiyama, Kiyokuni Kawachiya, Tung D. Le, Yasushi Negishi
ReLU compression to reduce GPU memory

Patent number: 11362670

Abstract: A method is presented for compressing data of a Rectified Linear Unit (ReLU) function on a graphical processing unit (GPU) employed in a learning process of a deep neural network. The method includes converting an initial data structure including nonzero data and zero data into a compressed data structure including only the nonzero data of the initial data structure as compressed data by generating a nonzero data bitmap region, generating a nonzero data number table region by employing a parallel reduction algorithm, calculating a nonzero data array index per block region of all blocks from the nonzero data number table region by employing a parallel prefix sum scan algorithm, allocating a buffer for the compressed data; and copying the nonzero data from the initial data structure into a nonzero data array region in a compressed data format in parallel.

Type: Grant

Filed: October 30, 2020

Date of Patent: June 14, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yasushi Negishi, Tung D. Le, Haruki Imai, Kiyokuni Kawachiya
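The compression pipeline in the abstract above (patent 11362670) can be illustrated with a serial sketch. The patent performs these steps in parallel on a GPU; the version below runs them one after another in plain Python only to show how the bitmap, the per-block counts, and the prefix sum fit together. The names `compress`, `decompress`, and `block_size` are hypothetical.

```python
from itertools import accumulate


def compress(values, block_size=4):
    """Pack the nonzero entries of values; return (bitmap, offsets, packed)."""
    # Nonzero data bitmap: one flag per element.
    bitmap = [v != 0 for v in values]
    # Nonzero count per block (a parallel reduction on the GPU).
    starts = range(0, len(values), block_size)
    counts = [sum(bitmap[s:s + block_size]) for s in starts]
    # Exclusive prefix sum: where each block begins writing in the output.
    offsets = list(accumulate(counts, initial=0))[:-1]
    # Allocate the output buffer, then copy nonzeros block by block.
    packed = [0] * sum(counts)
    for start, offset in zip(starts, offsets):
        k = offset
        for i in range(start, min(start + block_size, len(values))):
            if bitmap[i]:
                packed[k] = values[i]
                k += 1
    return bitmap, offsets, packed


def decompress(bitmap, packed):
    """Rebuild the original sequence from the bitmap and the packed data."""
    it = iter(packed)
    return [next(it) if bit else 0 for bit in bitmap]
```

Because each block knows its own output offset from the prefix sum, the copy step for different blocks touches disjoint parts of the buffer, which is what makes the parallel version possible.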
DATA SWAPPING FOR NEURAL NETWORK MEMORY CONSERVATION

Publication number: 20220138580

Abstract: Methods and systems for training a neural network include identifying units within a neural network, including a first unit for memory swapping and a second unit for re-computation to balance memory efficiency with computational efficiency. Each unit includes at least one layer of the neural network. Each unit has a first layer that is a checkpoint operation. During a feed-forward training stage, feature maps are stored in a first memory. The feature maps are output by the at least one layer of the first unit. The feature maps are swapped from the first memory to a second memory. During a backpropagation stage, the feature maps for the first unit are swapped from the second memory to the first memory. Feature maps for the second unit are re-computed.

Type: Application

Filed: November 4, 2020

Publication date: May 5, 2022

Inventors: Haruki Imai, Tung D. Le, Yasushi Negishi, Kiyokuni Kawachiya
ReLU COMPRESSION TO REDUCE GPU MEMORY

Publication number: 20220140841

Abstract: A method is presented for compressing data of a Rectified Linear Unit (ReLU) function on a graphical processing unit (GPU) employed in a learning process of a deep neural network. The method includes converting an initial data structure including nonzero data and zero data into a compressed data structure including only the nonzero data of the initial data structure as compressed data by generating a nonzero data bitmap region, generating a nonzero data number table region by employing a parallel reduction algorithm, calculating a nonzero data array index per block region of all blocks from the nonzero data number table region by employing a parallel prefix sum scan algorithm, allocating a buffer for the compressed data; and copying the nonzero data from the initial data structure into a nonzero data array region in a compressed data format in parallel.

Type: Application

Filed: October 30, 2020

Publication date: May 5, 2022

Inventors: Yasushi Negishi, Tung D. Le, Haruki Imai, Kiyokuni Kawachiya
File system for genomic data

Patent number: 11176096

Abstract: Methods and systems for managing data redundancy include registering certified commands, input files, output files, and arguments in an execution history list after execution of said certified commands. An existing output file is provided in response to execution of a first certified command that matches an entry in the execution history list. A file is deleted if the file is reproducible from another file using a second certified command. The deleted file is registered in a reproducible file list. The deleted file is reproduced upon request using the second certified command.

Type: Grant

Filed: August 24, 2015

Date of Patent: November 16, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Megumi Ito, Yasushi Negishi, Takeshi Ogasawara
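The bookkeeping described in the abstract above (patent 11176096) can be sketched with an in-memory store. This is a toy model under stated assumptions: "certified commands" are plain Python functions, files are dictionary entries, and the class and method names are hypothetical.

```python
class ReproducibleStore:
    def __init__(self):
        self.files = {}            # file name -> contents
        self.history = {}          # (command, inputs, args) -> output name
        self.recipes = {}          # output name -> (command, inputs, args)
        self.reproducible = set()  # deleted files that can be regenerated

    def run(self, command, inputs, args, output):
        key = (command, tuple(inputs), tuple(args))
        if key in self.history:
            return self.history[key]   # same command seen before: reuse output
        self.files[output] = command(*[self.read(i) for i in inputs], *args)
        self.history[key] = output     # register in the execution history
        self.recipes[output] = key
        return output

    def delete(self, name):
        # Only delete files that a recorded command can reproduce.
        if name not in self.recipes or name not in self.files:
            return False
        del self.files[name]
        self.reproducible.add(name)
        return True

    def read(self, name):
        if name in self.reproducible:  # regenerate on request
            command, inputs, args = self.recipes[name]
            self.files[name] = command(*[self.read(i) for i in inputs], *args)
            self.reproducible.discard(name)
        return self.files[name]
```

In this sketch a source file that no recorded command produced is never deleted, since nothing could bring it back.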
Multi-GPU deep learning using CPUs

Patent number: 11164079

Abstract: A computer-implemented method, computer program product, and computer processing system are provided for accelerating neural network data parallel training in multiple graphics processing units (GPUs) using at least one central processing unit (CPU). The method includes forming a set of chunks. Each of the chunks includes a respective group of neural network layers other than a last layer. The method further includes performing one or more chunk-wise synchronization operations during a backward phase of the neural network data parallel training, by each of the multiple GPUs and the at least one CPU.

Type: Grant

Filed: December 15, 2017

Date of Patent: November 2, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Tung D. Le, Haruki Imai, Taro Sekiyama, Yasushi Negishi
Integrating multiple distributed data processing servers with different data partitioning and routing mechanisms, resource sharing policies and lifecycles into a single process

Patent number: 10984014

Abstract: A method is provided for consistent data processing by first and second distributed processing systems having different data partitioning and routing mechanisms such that the first system is without states and the second system is with states. The method includes dividing data in each system into a same number of partitions based on a same key and a same hash function. The method includes mapping partitions between the systems in a one-to-one mapping. The mapping step includes calculating a partition ID based on the hash function and a total number of partitions, and dynamically mapping a partition in the first system to a partition in the second system, responsive to the partition in the first system being unmapped to the partition in the second system.

Type: Grant

Filed: February 4, 2020

Date of Patent: April 20, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kiyokuni Kawachiya, Yasushi Negishi, Mikio Takeuchi, Gaku Yamamoto
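The partitioning idea in the abstract above (patent 10984014) can be sketched as follows. Both systems derive the partition ID from the same key, the same hash function, and the same partition count, and the mapping between them is created the first time a partition is used. Using `zlib.crc32` as the shared hash is an assumption made here for a stable result; `partition_id` and `PartitionMapper` are hypothetical names.

```python
import zlib


def partition_id(key, num_partitions):
    """Partition ID from a shared hash function and the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionMapper:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.mapping = {}  # partition in first system -> partition in second

    def route(self, key):
        pid = partition_id(key, self.num_partitions)
        if pid not in self.mapping:
            # Unmapped so far: create the one-to-one mapping on first use.
            # Mapping to the same ID is an assumption that follows from both
            # systems sharing the key, hash function, and partition count.
            self.mapping[pid] = pid
        return pid, self.mapping[pid]
```

Routing the same key twice returns the same pair of partitions, so the stateless system and the stateful system always agree on where a key lives.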
Efficient parallel training of a network model on multiple graphics processing units

Patent number: 10949746

Abstract: A system and method provides efficient parallel training of a neural network model on multiple graphics processing units. A training module reduces the time and communication overhead of gradient accumulation and parameter updating of the network model in a neural network by overlapping processes in an advantageous way. In a described embodiment, a training module overlaps backpropagation, gradient transfer and accumulation in a Synchronous Stochastic Gradient Descent algorithm on a convolutional neural network. The training module collects gradients of multiple layers during backpropagation of training from a plurality of graphics processing units (GPUs), accumulates the gradients on at least one processor and then delivers the gradients of the layers to the plurality of GPUs during the backpropagation of the training. The whole model parameters can then be updated on the GPUs after receipt of the gradient of the last layer.

Type: Grant

Filed: February 3, 2017

Date of Patent: March 16, 2021

Assignee: International Business Machines Corporation

Inventors: Imai Haruki, Tung Duc Le, Yasushi Negishi
Graph rewriting for large model support using categorized topological sort

Patent number: 10884755

Abstract: A computer-implemented method is provided for managing GPU memory consumption by computational graph rewriting. The method includes constructing, by a hardware processor, a categorized topological ordering of a computational graph. The categorized topological ordering includes multiple computational nodes arranged in multiple levels. The method further includes estimating, by the hardware processor, the GPU memory consumption responsive to a level including two or more computational nodes from among the multiple computational nodes. The method also includes rewriting, by the hardware processor, the computational graph by linearizing the two or more computational nodes in the level to avoid overlapping of the GPU memory consumption by the two or more computational nodes responsive to the GPU memory consumption exceeding a threshold. The method additionally includes managing the GPU memory consumption in accordance with the rewritten computational graph.

Type: Grant

Filed: July 31, 2019

Date of Patent: January 5, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Tung D. Le, Haruki Imai, Yasushi Negishi, Kiyokuni Kawachiya
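The rewriting step in the abstract above (patent 10884755) can be sketched on a small graph. The sketch groups nodes into levels with a level-by-level topological sort, sums a per-node memory estimate for each level, and chains the nodes of any level that exceeds the threshold so they no longer run side by side. The per-node estimates in `mem`, the use of a plain sum as the level estimate, and the function names are assumptions.

```python
def levels(graph):
    """Group nodes into levels; graph maps every node to its successors."""
    indegree = {n: 0 for n in graph}
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1
    current = [n for n in graph if indegree[n] == 0]
    result = []
    while current:
        result.append(current)
        following = []
        for n in current:
            for s in graph[n]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    following.append(s)
        current = following
    return result


def linearize(graph, mem, threshold):
    """Return a copy of graph with over-budget levels turned into chains."""
    rewritten = {n: list(s) for n, s in graph.items()}
    for level in levels(graph):
        if len(level) >= 2 and sum(mem[n] for n in level) > threshold:
            for a, b in zip(level, level[1:]):
                rewritten[a].append(b)  # control edge: b runs after a
    return rewritten
```

Nodes in one level have no path between them, so the added edges cannot introduce a cycle.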
Memory reduction for neural networks with fixed structures

Patent number: 10782897

Abstract: A method is provided for reducing consumption of a memory in a propagation process for a neural network (NN) having fixed structures for computation order and node data dependency. The memory includes memory segments for allocating to nodes. The method collects, in a NN training iteration, information for each node relating to an allocation, size, and lifetime thereof. The method chooses, responsive to the information, a first node having a maximum memory size relative to remaining nodes, and a second node non-overlapped with the first node lifetime. The method chooses another node non-overlapped with the first node lifetime, responsive to a sum of memory sizes of the second node and the other node not exceeding a first node memory size. The method reallocates a memory segment allocated to the first node to the second node and the other node to be reused by the second node and the other node.

Type: Grant

Filed: April 2, 2018

Date of Patent: September 22, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Taro Sekiyama, Haruki Imai, Jun Doi, Yasushi Negishi
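The node selection in the abstract above (patent 10782897) can be sketched as follows. The largest node is picked first, and other nodes whose lifetimes do not overlap with it are then packed into its memory segment as long as their combined size fits. Visiting candidates in descending size order is an assumption made for this sketch, and `Node`, `overlaps`, and `plan_reuse` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    size: int   # bytes needed by the node
    start: int  # first step at which the node is live
    end: int    # last step at which the node is live (inclusive)


def overlaps(a, b):
    """True if the lifetimes of a and b share at least one step."""
    return a.start <= b.end and b.start <= a.end


def plan_reuse(nodes):
    """Return the largest node and the nodes that can reuse its segment."""
    first = max(nodes, key=lambda n: n.size)
    reused, used = [], 0
    for n in sorted(nodes, key=lambda n: n.size, reverse=True):
        if n is first or overlaps(n, first):
            continue
        if used + n.size <= first.size:  # combined size must fit the segment
            reused.append(n)
            used += n.size
    return first, reused
```

The size check matters because the reusing nodes may be live at the same time as each other, so they sit side by side inside the first node's segment.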
INTEGRATING MULTIPLE DISTRIBUTED DATA PROCESSING SERVERS WITH DIFFERENT DATA PARTITIONING AND ROUTING MECHANISMS, RESOURCE SHARING POLICIES AND LIFECYCLES INTO A SINGLE PROCESS

Publication number: 20200174848

Abstract: A method is provided for consistent data processing by first and second distributed processing systems having different data partitioning and routing mechanisms such that the first system is without states and the second system is with states. The method includes dividing data in each system into a same number of partitions based on a same key and a same hash function. The method includes mapping partitions between the systems in a one-to-one mapping. The mapping step includes calculating a partition ID based on the hash function and a total number of partitions, and dynamically mapping a partition in the first system to a partition in the second system, responsive to the partition in the first system being unmapped to the partition in the second system.

Type: Application

Filed: February 4, 2020

Publication date: June 4, 2020

Inventors: Kiyokuni Kawachiya, Yasushi Negishi, Mikio Takeuchi, Gaku Yamamoto
Integrating multiple distributed data processing servers with different data partitioning and routing mechanisms, resource sharing policies and lifecycles into a single process

Patent number: 10613911

Abstract: A method is provided for consistent data processing by first and second distributed processing systems having different data partitioning and routing mechanisms such that the first system is without states and the second system is with states. The method includes dividing data in each system into a same number of partitions based on a same key and a same hash function. The method includes mapping partitions between the systems in a one-to-one mapping. The mapping step includes (i) checking if a partition in the first system is mapped to a partition in the second system; and (ii) calculating a partition ID based on the hash function and a total number of partitions, and dynamically mapping the partition in the first system to the partition in the second system, responsive to the partition in the first system being unmapped to the partition in the second system.

Type: Grant

Filed: January 9, 2018

Date of Patent: April 7, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kiyokuni Kawachiya, Yasushi Negishi, Mikio Takeuchi, Gaku Yamamoto
APPLICATION PERFORMANCE SIMULATOR

Publication number: 20200065214

Abstract: A computer-implemented method, system, and computer program product are provided to simulate a target system. The method includes determining system performance metrics for a target system and an execution system. The method also includes generating a ratio of estimation between the system performance metrics for the target system and the execution system. The method additionally includes throttling components in the execution system to adjust all of the system performance metrics of the execution system responsive to the ratio of estimation to create a throttled execution system. The method further includes measuring a throttled execution time while running an application on the throttled execution system. The method also includes estimating a target execution time for the application on the target system responsive to the throttled execution time.

Type: Application

Filed: August 23, 2018

Publication date: February 27, 2020

Inventors: Yasushi Negishi, Kiyokuni Kawachiya, Jun Doi
Real-time resource usage reduction in artificial neural networks

Patent number: 10558914

Abstract: A generated algorithm used by a neural network is captured during execution of an iteration of the neural network. A candidate algorithm is identified based on the generated algorithm. A determination is made that the candidate algorithm utilizes less memory than the generated algorithm. Based on the determination the neural network is updated by replacing the generated algorithm with the candidate algorithm.

Type: Grant

Filed: April 16, 2019

Date of Patent: February 11, 2020

Assignee: International Business Machines Corporation

Inventors: Taro Sekiyama, Kiyokuni Kawachiya, Tung D. Le, Yasushi Negishi
MECHANISM FOR CHOOSING EXECUTION MODE FOR LARGE NEURAL NETWORK

Publication number: 20190392306

Abstract: A computer-implemented method, a computer program product, and a computer processing system are provided for selecting from among multiple Graphics Processing Unit (GPU) execution modes for a Neural Network (NN) having a size greater than a threshold size. The multiple GPU execution modes include a normal memory mode, an Out-of-Core (OoC) execution mode, and a Unified Memory (UM) mode. The method includes starting an execution on the NN with the UM mode and measuring the memory usage for each of the layers of the NN. The method further includes selecting an execution mode based on the memory usage of all of the layers.

Type: Application

Filed: June 26, 2018

Publication date: December 26, 2019

Inventors: Yasushi Negishi, Haruki Imai, Taro Sekiyama, Tung D. Le, Kiyokuni Kawachiya
ESTIMATING PERFORMANCE OF GPU APPLICATION FOR DIFFERENT GPU-LINK PERFORMANCE RATIO

Publication number: 20190325549

Abstract: A computer-implemented method is provided for estimating the performance of a GPU application on a new computing machine having an increased GPU-link performance ratio relative to a current computing machine having a current GPU-link performance ratio. The method includes adding a delay to CPU-GPU communication on the current computing machine to simulate a delayed-communication environment on the current computing machine. The method further includes executing the target GPU application in the delayed-communication environment. The method also includes measuring the performance of the target GPU application in the delayed-communication environment. The method additionally includes estimating the performance of the new computing machine having the increased GPU-link performance ratio, based on the measured performance of the target GPU application in the delayed-communication environment.

Type: Application

Filed: April 18, 2018

Publication date: October 24, 2019

Inventors: Kiyokuni Kawachiya, Yasushi Negishi, Jun Doi
Estimating performance of GPU application for different GPU-link performance ratio

Patent number: 10453167

Abstract: A computer-implemented method is provided for estimating the performance of a GPU application on a new computing machine having an increased GPU-link performance ratio relative to a current computing machine having a current GPU-link performance ratio. The method includes adding a delay to CPU-GPU communication on the current computing machine to simulate a delayed-communication environment on the current computing machine. The method further includes executing the target GPU application in the delayed-communication environment. The method also includes measuring the performance of the target GPU application in the delayed-communication environment. The method additionally includes estimating the performance of the new computing machine having the increased GPU-link performance ratio, based on the measured performance of the target GPU application in the delayed-communication environment.

Type: Grant

Filed: April 18, 2018

Date of Patent: October 22, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kiyokuni Kawachiya, Yasushi Negishi, Jun Doi
MEMORY REDUCTION FOR NEURAL NETWORKS WITH FIXED STRUCTURES

Publication number: 20190303025

Abstract: A method is provided for reducing consumption of a memory in a propagation process for a neural network (NN) having fixed structures for computation order and node data dependency. The memory includes memory segments for allocating to nodes. The method collects, in a NN training iteration, information for each node relating to an allocation, size, and lifetime thereof. The method chooses, responsive to the information, a first node having a maximum memory size relative to remaining nodes, and a second node non-overlapped with the first node lifetime. The method chooses another node non-overlapped with the first node lifetime, responsive to a sum of memory sizes of the second node and the other node not exceeding a first node memory size. The method reallocates a memory segment allocated to the first node to the second node and the other node to be reused by the second node and the other node.

Type: Application

Filed: April 2, 2018

Publication date: October 3, 2019

Inventors: Taro Sekiyama, Haruki Imai, Jun Doi, Yasushi Negishi
