Patents by Inventor Gyeongin YU

Gyeongin YU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11934930
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Grant
    Filed: October 19, 2022
    Date of Patent: March 19, 2024
    Assignee: FriendliAI Inc.
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
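
A minimal sketch of the selective batching idea described in the abstract above, using PyTorch on a toy single-layer decoder (the shapes, layer sizes, and variable names are illustrative, not the patented implementation):

```python
# Hypothetical sketch of selective batching. Non-attention operations
# (the projections) run once over the tokens of all requests flattened
# into a single batch; attention runs per request, because each request
# has a different sequence length.
import torch

hidden = 64  # illustrative hidden size

# Three requests with variable token counts: 5, 3, and 8 tokens.
requests = [torch.randn(n, hidden) for n in (5, 3, 8)]

qkv_proj = torch.nn.Linear(hidden, 3 * hidden)
out_proj = torch.nn.Linear(hidden, hidden)

# Batched part: concatenate every request's tokens into one
# (total_tokens, hidden) tensor so the projection is a single matmul.
flat = torch.cat(requests, dim=0)            # (16, hidden)
q, k, v = qkv_proj(flat).chunk(3, dim=-1)

# Unbatched part: attention is computed request by request, since the
# (seq_len, seq_len) attention shape differs across requests and padding
# all requests to one length would waste computation.
outputs, offset = [], 0
for r in requests:
    n = r.shape[0]
    qi, ki, vi = q[offset:offset + n], k[offset:offset + n], v[offset:offset + n]
    scores = qi @ ki.T / hidden ** 0.5
    # Causal mask: each token attends only to itself and earlier tokens.
    mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    outputs.append(torch.softmax(scores, dim=-1) @ vi)
    offset += n

# Batched again: the output projection runs over all tokens at once.
result = out_proj(torch.cat(outputs, dim=0))  # (16, hidden)
```

The workaround the abstract contrasts against would pad all three requests to length 8 so that every operation batches uniformly, spending compute on padding tokens; the split above keeps the large matrix multiplications batched while letting the attention shapes vary per request. The subsequent FriendliAI grants and publications in this listing share this abstract and describe the same technique.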
  • Patent number: 11922282
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Grant
    Filed: September 19, 2022
    Date of Patent: March 5, 2024
    Assignee: FriendliAI Inc.
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Patent number: 11922315
    Abstract: Solutions for adapting machine learning (ML) models to neural networks (NNs) include receiving an ML pipeline comprising a plurality of operators; determining operator dependencies within the ML pipeline; determining recognized operators; for each of at least two recognized operators, selecting a corresponding NN module from a translation dictionary; and wiring the selected NN modules in accordance with the operator dependencies to generate a translated NN. Some examples determine a starting operator for translation, which is the earliest recognized operator having parameters. Some examples connect inputs of the translated NN to upstream operators of the ML pipeline that had not been translated. Some examples further tune the translated NN using backpropagation. Some examples determine whether an operator is trainable or non-trainable and flag related parameters accordingly for later training.
    Type: Grant
    Filed: August 26, 2019
    Date of Patent: March 5, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Matteo Interlandi, Byung-Gon Chun, Markus Weimer, Gyeongin Yu, Saeed Amizadeh
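
A minimal sketch of the operator-translation idea from this abstract, assuming a toy two-operator pipeline and PyTorch as the target neural network framework (the operator records, translation dictionary, and module classes are illustrative, not the patent's actual design):

```python
# Hypothetical sketch: each recognized ML operator is looked up in a
# translation dictionary and replaced by an equivalent NN module; the
# selected modules are then wired in the pipeline's dependency order.
import numpy as np
import torch

class ScalerAsNN(torch.nn.Module):
    """A fitted standard scaler becomes the affine map (x - mean) / scale."""
    def __init__(self, mean, scale):
        super().__init__()
        # Buffers, not parameters: this operator is flagged non-trainable.
        self.register_buffer("mean", torch.as_tensor(mean, dtype=torch.float32))
        self.register_buffer("scale", torch.as_tensor(scale, dtype=torch.float32))

    def forward(self, x):
        return (x - self.mean) / self.scale

class LinearModelAsNN(torch.nn.Module):
    """A fitted linear model's coefficients become a Linear layer's weights."""
    def __init__(self, coef, intercept):
        super().__init__()
        out_dim, in_dim = coef.shape
        self.linear = torch.nn.Linear(in_dim, out_dim)
        with torch.no_grad():
            self.linear.weight.copy_(torch.as_tensor(coef, dtype=torch.float32))
            self.linear.bias.copy_(torch.as_tensor(intercept, dtype=torch.float32))

    def forward(self, x):
        return self.linear(x)

# Translation dictionary: recognized operator name -> NN module builder.
TRANSLATION_DICT = {
    "StandardScaler": lambda op: ScalerAsNN(op["mean"], op["scale"]),
    "LinearRegression": lambda op: LinearModelAsNN(op["coef"], op["intercept"]),
}

def translate(pipeline):
    """Wire the translated modules in (here, linear) dependency order."""
    return torch.nn.Sequential(*(TRANSLATION_DICT[op["name"]](op) for op in pipeline))

# Toy pipeline whose parameters are assumed to come from a prior fit.
pipeline = [
    {"name": "StandardScaler", "mean": np.zeros(4), "scale": np.ones(4)},
    {"name": "LinearRegression", "coef": np.ones((1, 4)), "intercept": np.zeros(1)},
]
translated = translate(pipeline)
print(translated(torch.randn(2, 4)))
```

Because the result is an ordinary differentiable network, the further tuning via backpropagation mentioned in the abstract follows naturally: parameters of trainable operators stay gradient-enabled (the Linear weights here), while non-trainable operators are frozen as buffers (the scaler's mean and scale).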
  • Patent number: 11836520
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Grant
    Filed: August 4, 2022
    Date of Patent: December 5, 2023
    Assignee: FriendliAI Inc.
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Publication number: 20230177401
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Application
    Filed: October 19, 2022
    Publication date: June 8, 2023
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Publication number: 20230177399
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Application
    Filed: September 19, 2022
    Publication date: June 8, 2023
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Publication number: 20230176903
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Application
    Filed: August 4, 2022
    Publication date: June 8, 2023
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Patent number: 11514370
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Grant
    Filed: December 3, 2021
    Date of Patent: November 29, 2022
    Assignee: FriendliAI Inc.
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Patent number: 11442775
    Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.
    Type: Grant
    Filed: December 3, 2021
    Date of Patent: September 13, 2022
    Assignee: FriendliAI Inc.
    Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
  • Publication number: 20220147398
    Abstract: Disclosed are a method and an electronic apparatus including an accelerator for lightweight and parallel accelerator task scheduling. The method includes pre-running a deep learning model with sample input data having a preset data form and generating a scheduling result through the pre-running.
    Type: Application
    Filed: November 12, 2021
    Publication date: May 12, 2022
    Inventors: Byung-Gon Chun, Gyeongin Yu, Woosuk Kwon
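
A minimal sketch of the pre-run idea from this abstract, with plain Python callables standing in for accelerator tasks (the class and function names are illustrative, not the patent's actual API):

```python
# Hypothetical sketch of "pre-run then replay" scheduling: the model is
# run once on sample input of a preset data form, the sequence of tasks
# it launches is recorded, and later runs replay that schedule without
# repeating the scheduling work.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class RecordedSchedule:
    tasks: List[Callable] = field(default_factory=list)

    def replay(self, x):
        # Replay the pre-computed task sequence; no per-run scheduling.
        for task in self.tasks:
            x = task(x)
        return x

def pre_run(model_ops, sample_input):
    """Run the model once on sample input, recording each task launched."""
    schedule = RecordedSchedule()
    x = sample_input
    for op in model_ops:
        schedule.tasks.append(op)  # record the task for later replay
        x = op(x)                  # execute it as part of the pre-run
    return schedule

# Toy "model": a list of ops standing in for accelerator kernels.
model_ops = [lambda x: x * 2, lambda x: x + 1, lambda x: x ** 2]
schedule = pre_run(model_ops, sample_input=3)
print(schedule.replay(3))  # prints 49
```

The premise is that inputs sharing the preset data form trigger an identical task sequence, so the scheduling cost is paid once during the pre-run and every subsequent run only replays the recorded result.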
  • Publication number: 20220051104
    Abstract: Methods, systems, and computer program products are provided for generating a neural network model. A ML pipeline parser is configured to identify a set of ML operators for a previously trained ML pipeline, and map the set of ML operators to a set of neural network operators. The ML pipeline parser generates a first neural network representation using the set of neural network operators. A neural network optimizer is configured to perform an optimization on the first neural network representation to generate a second neural network representation. A tensor set provider outputs a set of tensor operations based on the second neural network representation for execution on a neural network framework. In this manner, a traditional ML pipeline can be converted into a neural network pipeline that may be executed on an appropriate framework, such as one that utilizes specialized hardware accelerators.
    Type: Application
    Filed: August 14, 2020
    Publication date: February 17, 2022
    Inventors: Matteo Interlandi, Markus Weimer, Saeed Amizadeh, Konstantinos Karanasos, Supun Chathuranga Nakandala, Karla J. Saur, Carlo Aldo Curino, Gyeongin Yu
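
The optimization step this abstract mentions, rewriting the first neural network representation into a second one, can be illustrated by fusing two consecutive affine operators into a single tensor operation (a standard algebraic identity, shown here for illustration only, not the patent's actual optimizer):

```python
# Hypothetical sketch: two chained affine ops in the first representation,
# y = a2 @ (a1 @ x + b1) + b2, are fused into one op, y = A @ x + B,
# in the second representation, halving the matmuls at execution time.
import numpy as np

def fuse_affine(a1, b1, a2, b2):
    """Return (A, B) such that A @ x + B == a2 @ (a1 @ x + b1) + b2."""
    return a2 @ a1, a2 @ b1 + b2

a1, b1 = np.array([[2.0, 0.0], [0.0, 3.0]]), np.array([1.0, 1.0])
a2, b2 = np.array([[1.0, 1.0]]), np.array([0.5])

A, B = fuse_affine(a1, b1, a2, b2)
x = np.array([1.0, 2.0])
assert np.allclose(A @ x + B, a2 @ (a1 @ x + b1) + b2)
print(A, B)  # the fused tensor operation's parameters
```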
  • Publication number: 20210065007
    Abstract: Solutions for adapting machine learning (ML) models to neural networks (NNs) include receiving an ML pipeline comprising a plurality of operators; determining operator dependencies within the ML pipeline; determining recognized operators; for each of at least two recognized operators, selecting a corresponding NN module from a translation dictionary; and wiring the selected NN modules in accordance with the operator dependencies to generate a translated NN. Some examples determine a starting operator for translation, which is the earliest recognized operator having parameters. Some examples connect inputs of the translated NN to upstream operators of the ML pipeline that had not been translated. Some examples further tune the translated NN using backpropagation. Some examples determine whether an operator is trainable or non-trainable and flag related parameters accordingly for later training.
    Type: Application
    Filed: August 26, 2019
    Publication date: March 4, 2021
    Inventors: Matteo Interlandi, Byung-Gon Chun, Markus Weimer, Gyeongin Yu, Saeed Amizadeh