Patents by Inventor Pei-Hsuan HSIEH

Pei-Hsuan HSIEH has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ARTIFICIAL INTELLIGENCE INFERENCING VIA DELTA MODELS

Publication number: 20250117626

Abstract: A computing device is provided, including a processor and a storage device holding instructions that are executable by the processor to implement a base artificial intelligence (AI) model and two or more delta AI models, each delta AI model having lower dimensionality than the base AI model. An inference request including an input prompt is received, the inference request specifying a selected delta AI model of the two or more delta AI models. The input prompt is input to the base AI model to thereby generate a base model result vector. The input prompt is input to the selected delta AI model to thereby generate a delta model result vector. An output vector is generated by combining the base model result vector and the delta model result vector via a combination operation. The output vector is output.
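
The flow described in this abstract can be illustrated with a minimal sketch. It is not taken from the application: the toy `embed` function, the identity base weights, the low-rank form of the delta model, the delta model name `"legal"`, and the choice of element-wise addition as the combination operation are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Vector = List[float]
Model = Callable[[str], Vector]


def embed(prompt: str, dim: int) -> Vector:
    """Hypothetical toy embedding: fold character codes into `dim` slots."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def make_base_model(weights: List[Vector]) -> Model:
    """Base model: a full d x d projection of the embedded prompt."""
    dim = len(weights[0])
    return lambda prompt: matvec(weights, embed(prompt, dim))


def make_delta_model(down: List[Vector], up: List[Vector]) -> Model:
    """Delta model (assumed low-rank): project d -> r, then r -> d, with r < d."""
    dim = len(down[0])
    return lambda prompt: matvec(up, matvec(down, embed(prompt, dim)))


def infer(prompt: str, selected: str, base: Model, deltas: Dict[str, Model]) -> Vector:
    """Run the base model and the selected delta model, then combine the results."""
    base_vec = base(prompt)
    delta_vec = deltas[selected](prompt)
    # Assumed combination operation: element-wise addition.
    return [b + d for b, d in zip(base_vec, delta_vec)]


# Illustrative setup: a 4-dimensional base model and one rank-1 delta model.
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
base_model = make_base_model(IDENTITY)
delta_models = {
    "legal": make_delta_model(down=[[1.0, 0.0, 0.0, 0.0]],
                              up=[[0.5], [0.0], [0.0], [0.0]]),
}

output = infer("ab", "legal", base_model, delta_models)
print(output)
```

In this sketch the request names the delta model by a dictionary key, so several delta models can share one base model and be swapped per request.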

Type: Application

Filed: October 9, 2023

Publication date: April 10, 2025

Applicant: Microsoft Technology Licensing, LLC

Inventors: Sanjay RAMANUJAN, Ciprian CHISALITA, Pei-Hsuan HSIEH, Derek Edward HYATT, Rakesh KELKAR, Karthik RAMAN

System and Method for Token-based Graphics Processing Unit (GPU) Utilization

Publication number: 20240419493

Abstract: A method, computer program product, and computing system for processing workload data associated with processing a plurality of requests for an artificial intelligence (AI) model on a processing unit. A maximum number of key-value (KV) cache blocks available for the workload data is determined by simulating the workload data using a simulation engine. A token utilization for the workload data is determined based upon, at least in part, the maximum number of KV cache blocks available for the workload data. Processing unit resources are allocated for the processing unit based upon, at least in part, the token utilization.
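
As a rough illustration of the three steps named in this abstract (block count, token utilization, allocation), the sketch below uses simple arithmetic in place of a real simulation engine. None of it comes from the application: the `Request` type, the memory-budget model, the block size, the utilization formula (tokens demanded divided by token capacity), and the rounding rule for allocation are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """Hypothetical request shape: tokens in the prompt and tokens generated."""
    prompt_tokens: int
    output_tokens: int


def simulate_max_kv_blocks(memory_budget_bytes: int,
                           bytes_per_token: int,
                           block_size_tokens: int) -> int:
    """Stand-in for a simulation engine: count the KV cache blocks that fit."""
    bytes_per_block = bytes_per_token * block_size_tokens
    return memory_budget_bytes // bytes_per_block


def token_utilization(requests: List[Request],
                      max_blocks: int,
                      block_size_tokens: int) -> float:
    """Assumed definition: tokens demanded divided by token capacity of the cache."""
    if max_blocks <= 0:
        raise ValueError("max_blocks must be positive")
    capacity = max_blocks * block_size_tokens
    demand = sum(r.prompt_tokens + r.output_tokens for r in requests)
    return demand / capacity


def units_to_allocate(utilization: float) -> int:
    """Assumed policy: one processing unit per full unit of utilization, minimum one."""
    return max(1, math.ceil(utilization))


# Illustrative numbers only.
workload = [Request(500, 250), Request(1000, 500), Request(2000, 750)]
max_blocks = simulate_max_kv_blocks(memory_budget_bytes=1_000_000,
                                    bytes_per_token=100,
                                    block_size_tokens=16)
utilization = token_utilization(workload, max_blocks, block_size_tokens=16)
units = units_to_allocate(utilization)
print(max_blocks, utilization, units)
```

A real system would presumably replay the workload over time instead of summing it, since requests arrive and finish at different moments; the sum here only keeps the example short.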

Type: Application

Filed: June 14, 2023

Publication date: December 19, 2024

Inventors: Sanjay Ramanujan, Karthik Raman, Rakesh Kelkar, Kalyan Kumar Bhukya, Archit Shukla, Pei-Hsuan Hsieh

Large Artificial Intelligence Model Prediction and Capacity

Publication number: 20240411658

Abstract: This document relates to predicting performance of large artificial intelligence (LAI) models that are too large to be handled by a single computing device. One example can receive a sample workload for a trained LAI model and identify multiple nodes functioning as a cluster to instantiate an instance of the trained LAI model. The example can predict performance characteristics for accomplishing the sample workload on the cluster and can cause at least some of the predicted performance characteristics to be presented on a user interface.

Type: Application

Filed: June 9, 2023

Publication date: December 12, 2024

Applicant: Microsoft Technology Licensing, LLC

Inventors: Sanjay RAMANUJAN, Karthik RAMAN, Rakesh KELKAR, Pei-Hsuan HSIEH
