Patents Assigned to FriendliAI Inc.

Selective batching for inference system for transformer-based generation tasks

Patent number: 11934930

Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.

Type: Grant

Filed: October 19, 2022

Date of Patent: March 19, 2024

Assignee: FRIENDLIAI INC.

Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
Selective batching for inference system for transformer-based generation tasks

Patent number: 11922282

Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal sate length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.

Type: Grant

Filed: September 19, 2022

Date of Patent: March 5, 2024

Assignee: FRIENDLIAI INC.

Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
Dynamic batching for inference system for transformer-based generation tasks

Patent number: 11836520

Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal sate length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.

Type: Grant

Filed: August 4, 2022

Date of Patent: December 5, 2023

Assignee: FRIENDLIAI INC.

Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
Selective batching for inference system for transformer-based generation tasks

Patent number: 11514370

Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.

Type: Grant

Filed: December 3, 2021

Date of Patent: November 29, 2022

Assignee: FriendliAI Inc.

Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun
Dynamic batching for inference system for transformer-based generation tasks

Patent number: 11442775

Abstract: An inference system applies a machine-learning transformer model to a batch of requests with variable input length or variable target length or variable internal state length by selectively batching a subset of operations in the transformer model but processing requests in the batch individually for a subset of operations in the transformer model. In one embodiment, the operation to be processed individually is an attention operation of an encoder or a decoder of the transformer model. By selective batching, the inference system can allow batching operations to be performed for a batch of requests with variable input or target length or internal state length to utilize the parallel computation capabilities of hardware accelerators while preventing unnecessary computations that occur for workarounds that restrain the data of a batch of requests to a same length.

Type: Grant

Filed: December 3, 2021

Date of Patent: September 13, 2022

Assignee: FriendliAI Inc.

Inventors: Gyeongin Yu, Geon-Woo Kim, Joo Seong Jeong, Soojeong Kim, Byung-Gon Chun

Selective batching for inference system for transformer-based generation tasks

Selective batching for inference system for transformer-based generation tasks

Dynamic batching for inference system for transformer-based generation tasks

Selective batching for inference system for transformer-based generation tasks

Dynamic batching for inference system for transformer-based generation tasks