Patents by Inventor Gyeongin YU

Gyeongin YU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11934930
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Grant
    Filed: October 19, 2022
    Date of Patent: March 19, 2024
    Assignee: FriendliAI Inc.
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
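
A minimal sketch of the selective batching idea described in the abstract above, using PyTorch on a toy single-layer decoder (the shapes, layer sizes, and variable names are illustrative, not the patented implementation):

```python
# Hypothetical sketch of selective batching. Non-attention operations
# (the projections) run once over the tokens of all requests flattened
# into a single batch; attention runs per request, because each request
# has a different sequence length.
import torch

hidden = 64  # illustrative hidden size

# Three requests with variable token counts: 5, 3, and 8 tokens.
requests = [torch.randn(n, hidden) for n in (5, 3, 8)]

qkv_proj = torch.nn.Linear(hidden, 3 * hidden)
out_proj = torch.nn.Linear(hidden, hidden)

# Batched part: concatenate every request's tokens into one
# (total_tokens, hidden) tensor so the projection is a single matmul.
flat = torch.cat(requests, dim=0)            # (16, hidden)
q, k, v = qkv_proj(flat).chunk(3, dim=-1)

# Unbatched part: attention is computed request by request, since the
# (seq_len, seq_len) attention shape differs across requests and padding
# all requests to one length would waste computation.
outputs, offset = [], 0
for r in requests:
    n = r.shape[0]
    qi, ki, vi = q[offset:offset + n], k[offset:offset + n], v[offset:offset + n]
    scores = qi @ ki.T / hidden ** 0.5
    # Causal mask: each token attends only to itself and earlier tokens.
    mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    outputs.append(torch.softmax(scores, dim=-1) @ vi)
    offset += n

# Batched again: the output projection runs over all tokens at once.
result = out_proj(torch.cat(outputs, dim=0))  # (16, hidden)
```

The workaround the abstract contrasts against would pad all three requests to length 8 so that every operation batches uniformly, spending compute on padding tokens; the split above keeps the large matrix multiplications batched while letting the attention shapes vary per request. The subsequent FriendliAI grants and publications in this listing share this abstract and describe the same technique.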
  • Patent number: 11922282
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Grant
    Filed: September 19, 2022
    Date of Patent: March 5, 2024
    Assignee: FriendliAI Inc.
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Patent number: 11922315
    Abstract: Solutions for adapting machine learning (ML) models to neural networks (NNs) include receiving an ML pipeline comprising a plurality of operators; determining operator dependencies within the ML pipeline; determining recognized operators; for each of at least two recognized operators, selecting a corresponding NN module from a translation dictionary; and wiring the selected NN modules in accordance with the operator dependencies to generate a translated NN. Some examples determine a starting operator for translation, which is the earliest recognized operator having parameters. Some examples connect inputs of the translated NN to upstream operators of the ML pipeline that had not been translated. Some examples further tune the translated NN using backpropagation. Some examples determine whether an operator is trainable or non-trainable and flag related parameters accordingly for later training.
    Type: Grant
    Filed: August 26, 2019
    Date of Patent: March 5, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Matteo Interlandi, Byung-Gon Chun, Markus Weimer, Gyeongin Yu, Saeed Amizadeh
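
A minimal sketch of the operator-translation idea from this abstract, assuming a toy two-operator pipeline and PyTorch as the target neural network framework (the operator records, translation dictionary, and module classes are illustrative, not the patent's actual design):

```python
# Hypothetical sketch: each recognized ML operator is looked up in a
# translation dictionary and replaced by an equivalent NN module; the
# selected modules are then wired in the pipeline's dependency order.
import numpy as np
import torch

class ScalerAsNN(torch.nn.Module):
    """A fitted standard scaler becomes the affine map (x - mean) / scale."""
    def __init__(self, mean, scale):
        super().__init__()
        # Buffers, not parameters: this operator is flagged non-trainable.
        self.register_buffer("mean", torch.as_tensor(mean, dtype=torch.float32))
        self.register_buffer("scale", torch.as_tensor(scale, dtype=torch.float32))

    def forward(self, x):
        return (x - self.mean) / self.scale

class LinearModelAsNN(torch.nn.Module):
    """A fitted linear model's coefficients become a Linear layer's weights."""
    def __init__(self, coef, intercept):
        super().__init__()
        out_dim, in_dim = coef.shape
        self.linear = torch.nn.Linear(in_dim, out_dim)
        with torch.no_grad():
            self.linear.weight.copy_(torch.as_tensor(coef, dtype=torch.float32))
            self.linear.bias.copy_(torch.as_tensor(intercept, dtype=torch.float32))

    def forward(self, x):
        return self.linear(x)

# Translation dictionary: recognized operator name -> NN module builder.
TRANSLATION_DICT = {
    "StandardScaler": lambda op: ScalerAsNN(op["mean"], op["scale"]),
    "LinearRegression": lambda op: LinearModelAsNN(op["coef"], op["intercept"]),
}

def translate(pipeline):
    """Wire the translated modules in (here, linear) dependency order."""
    return torch.nn.Sequential(*(TRANSLATION_DICT[op["name"]](op) for op in pipeline))

# Toy pipeline whose parameters are assumed to come from a prior fit.
pipeline = [
    {"name": "StandardScaler", "mean": np.zeros(4), "scale": np.ones(4)},
    {"name": "LinearRegression", "coef": np.ones((1, 4)), "intercept": np.zeros(1)},
]
translated = translate(pipeline)
print(translated(torch.randn(2, 4)))
```

Because the result is an ordinary differentiable network, the further tuning via backpropagation mentioned in the abstract follows naturally: parameters of trainable operators stay gradient-enabled (the Linear weights here), while non-trainable operators are frozen as buffers (the scaler's mean and scale).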
  • Patent number: 11836520
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Grant
    Filed: August 4, 2022
    Date of Patent: December 5, 2023
    Assignee: FriendliAI Inc.
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Publication number: 20230177401
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Application
    Filed: October 19, 2022
    Publication date: June 8, 2023
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Publication number: 20230177399
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Application
    Filed: September 19, 2022
    Publication date: June 8, 2023
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Publication number: 20230176903
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Application
    Filed: August 4, 2022
    Publication date: June 8, 2023
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Patent number: 11514370
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Grant
    Filed: December 3, 2021
    Date of Patent: November 29, 2022
    Assignee: FriendliAI Inc.
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Patent number: 11442775
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Grant
    Filed: December 3, 2021
    Date of Patent: September 13, 2022
    Assignee: FriendliAI Inc.
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Publication number: 20220147398
    Abstract: Disclosed are a method and an electronic apparatus including an accelerator for lightweight and parallel accelerator task scheduling. The method includes pre-running a deep learning model with sample input data having a preset data form and generating a scheduling result through the pre-running.
    Type: Application
    Filed: November 12, 2021
    Publication date: May 12, 2022
    Inventors: Byung-Gon Chun, Gyeongin Yu, Woosuk Kwon
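
A minimal sketch of the pre-run idea from this abstract, with plain Python callables standing in for accelerator tasks (the class and function names are illustrative, not the patent's actual API):

```python
# Hypothetical sketch of "pre-run then replay" scheduling: the model is
# run once on sample input of a preset data form, the sequence of tasks
# it launches is recorded, and later runs replay that schedule without
# repeating the scheduling work.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class RecordedSchedule:
    tasks: List[Callable] = field(default_factory=list)

    def replay(self, x):
        # Replay the pre-computed task sequence; no per-run scheduling.
        for task in self.tasks:
            x = task(x)
        return x

def pre_run(model_ops, sample_input):
    """Run the model once on sample input, recording each task launched."""
    schedule = RecordedSchedule()
    x = sample_input
    for op in model_ops:
        schedule.tasks.append(op)  # record the task for later replay
        x = op(x)                  # execute it as part of the pre-run
    return schedule

# Toy "model": a list of ops standing in for accelerator kernels.
model_ops = [lambda x: x * 2, lambda x: x + 1, lambda x: x ** 2]
schedule = pre_run(model_ops, sample_input=3)
print(schedule.replay(3))  # prints 49
```

The premise is that inputs sharing the preset data form trigger an identical task sequence, so the scheduling cost is paid once during the pre-run and every subsequent run only replays the recorded result.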
  • Publication number: 20220051104
    Abstract: Methods, systems, and computer program products are provided for generating a neural network model. A ML pipeline parser is configured to identify a set of ML operators for a previously trained ML pipeline, and map the set of ML operators to a set of neural network operators. The ML pipeline parser generates a first neural network representation using the set of neural network operators. A neural network optimizer is configured to perform an optimization on the first neural network representation to generate a second neural network representation. A tensor set provider outputs a set of tensor operations based on the second neural network representation for execution on a neural network framework. In this manner, a traditional ML pipeline can be converted into a neural network pipeline that may be executed on an appropriate framework, such as one that utilizes specialized hardware accelerators.
    Type: Application
    Filed: August 14, 2020
    Publication date: February 17, 2022
    Inventors: Matteo Interlandi, Markus Weimer, Saeed Amizadeh, Konstantinos Karanasos, Supun Chathuranga Nakandala, Karla J. Saur, Carlo Aldo Curino, Gyeongin Yu
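
The optimization step this abstract mentions, rewriting the first neural network representation into a second one, can be illustrated by fusing two consecutive affine operators into a single tensor operation (a standard algebraic identity, shown here for illustration only, not the patent's actual optimizer):

```python
# Hypothetical sketch: two chained affine ops in the first representation,
# y = a2 @ (a1 @ x + b1) + b2, are fused into one op, y = A @ x + B,
# in the second representation, halving the matmuls at execution time.
import numpy as np

def fuse_affine(a1, b1, a2, b2):
    """Return (A, B) such that A @ x + B == a2 @ (a1 @ x + b1) + b2."""
    return a2 @ a1, a2 @ b1 + b2

a1, b1 = np.array([[2.0, 0.0], [0.0, 3.0]]), np.array([1.0, 1.0])
a2, b2 = np.array([[1.0, 1.0]]), np.array([0.5])

A, B = fuse_affine(a1, b1, a2, b2)
x = np.array([1.0, 2.0])
assert np.allclose(A @ x + B, a2 @ (a1 @ x + b1) + b2)
print(A, B)  # the fused tensor operation's parameters
```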
  • Publication number: 20210065007
    Abstract: Solutions for adapting machine learning (ML) models to neural networks (NNs) include receiving an ML pipeline comprising a plurality of operators; determining operator dependencies within the ML pipeline; determining recognized operators; for each of at least two recognized operators, selecting a corresponding NN module from a translation dictionary; and wiring the selected NN modules in accordance with the operator dependencies to generate a translated NN. Some examples determine a starting operator for translation, which is the earliest recognized operator having parameters. Some examples connect inputs of the translated NN to upstream operators of the ML pipeline that had not been translated. Some examples further tune the translated NN using backpropagation. Some examples determine whether an operator is trainable or non-trainable and flag related parameters accordingly for later training.
    Type: Application
    Filed: August 26, 2019
    Publication date: March 4, 2021
    Inventors: Matteo Interlandi, Byung-Gon Chun, Markus Weimer, Gyeongin Yu, Saeed Amizadeh