Patents by Inventor Karthik Raman

Karthik Raman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250117626
Abstract: A computing device is provided, including a processor and a storage device holding instructions that are executable by the processor to implement a base artificial intelligence (AI) model and two or more delta AI models, each delta AI model having lower dimensionality than the base AI model. An inference request including an input prompt is received, the inference request specifying a selected delta AI model of the two or more delta AI models. The input prompt is input to the base AI model to thereby generate a base model result vector. The input prompt is input to the selected delta AI model to thereby generate a delta model result vector. An output vector is generated by combining the base model result vector and the delta model result vector via a combination operation. The output vector is output.
    Type: Application
    Filed: October 9, 2023
    Publication date: April 10, 2025
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Sanjay RAMANUJAN, Ciprian CHISALITA, Pei-Hsuan HSIEH, Derek Edward HYATT, Rakesh KELKAR, Karthik RAMAN
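One plausible reading of the abstract above, sketched in Python. All names are illustrative, and addition is assumed as the combination operation (the abstract leaves the operation unspecified); the low-rank matrices stand in for the delta models' lower dimensionality:

```python
import numpy as np

def combined_inference(base_model, delta_models, selected, prompt_vec):
    """Run the prompt through the base model and the selected delta model,
    then combine the two result vectors (here, by addition)."""
    return base_model(prompt_vec) + delta_models[selected](prompt_vec)

# Toy stand-ins: the base model is a dense matrix; each delta model is a
# low-rank update of the form A @ B, hence "lower dimensionality".
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
A = rng.normal(size=(8, 2))
B = rng.normal(size=(2, 8))
x = rng.normal(size=8)
base = lambda v: W @ v
deltas = {"sales": lambda v: A @ (B @ v)}
out = combined_inference(base, deltas, "sales", x)
```

Because each delta is small relative to the base, many specialized deltas can share one expensive base model.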
  • Publication number: 20250094237
Abstract: A system provides capacity-based load balancing across model endpoints of a cloud-based artificial intelligence (AI) model. The system includes a consumption determination engine executable to determine a net resource consumption for processing tasks in a workload generated by a client application for input to the trained AI model. The system also includes a load balancer that determines a distribution of available resource capacity in a shared resource pool comprising compute resources at each of the multiple model endpoints. The load balancer allocates parallelizable tasks of the workload among the compute resources at the multiple model endpoints based on the net resource consumption of the tasks and on the distribution of available resource capacity in the shared resource pool.
    Type: Application
    Filed: September 20, 2023
    Publication date: March 20, 2025
    Inventors: Wenbin MENG, Hemant KUMAR, Rakesh KELKAR, Karthik RAMAN, Sanjay RAMANUJAN, Kevin Joseph RIEHM, Theodore Dragov TODOROV
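A minimal sketch of capacity-based allocation in this spirit, assuming a simple greedy policy (the abstract does not specify the algorithm; task and endpoint names are illustrative):

```python
def allocate(tasks, capacity):
    """Greedily place each parallelizable task (with an estimated net
    resource cost) on the endpoint with the most remaining capacity,
    largest tasks first. Returns the placement and leftover capacity."""
    remaining = dict(capacity)
    placement = {}
    for task, cost in sorted(tasks.items(), key=lambda kv: -kv[1]):
        endpoint = max(remaining, key=remaining.get)
        placement[task] = endpoint
        remaining[endpoint] -= cost
    return placement, remaining

plan, left = allocate(
    tasks={"t1": 4, "t2": 3, "t3": 1},
    capacity={"endpoint-a": 5, "endpoint-b": 5},
)
```

Here the largest task lands on endpoint-a, and the next two fill endpoint-b, leaving one unit free on each.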
  • Publication number: 20250094240
Abstract: A disclosed method facilitates an increase in utilization with respect to a resource quota allocated to a tenant from a shared resource pool. The method includes transmitting a lease request to a quota service on behalf of the tenant, where the lease request identifies a processing task and specifies a quantity of cloud-based resources requested from the shared resource pool for execution of the processing task. The method further provides for determining, based on a feedback signal received from the quota service, whether grant of the lease request would cause the tenant to exceed a resource quota allocated to the tenant and dynamically decreasing parallelism of active tasks being processed by the cloud-based resources on behalf of the tenant in response to determining that grant of the lease request would cause the tenant to exceed the resource quota.
    Type: Application
    Filed: September 20, 2023
    Publication date: March 20, 2025
    Inventors: Wenbin MENG, Hemant KUMAR, Rakesh KELKAR, Karthik RAMAN, Sanjay RAMANUJAN, Kevin Joseph RIEHM, Theodore Dragov TODOROV
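The lease-versus-quota decision above can be sketched as follows. Halving parallelism is one simple back-off policy assumed here; the abstract only says parallelism is dynamically decreased:

```python
def review_lease(active_parallelism, requested, in_use, quota):
    """Decide a lease request against the tenant's quota: grant it when it
    fits; otherwise deny it and shrink the tenant's task parallelism so
    future demand stays under the quota."""
    if in_use + requested > quota:
        return max(1, active_parallelism // 2), False   # back off, deny
    return active_parallelism, True                      # keep going, grant
```

For example, with 90 of 100 units in use, a request for 20 units is denied and parallelism drops from 8 to 4, while a request for 10 units is granted unchanged.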
  • Publication number: 20250094233
    Abstract: A disclosed method reduces memory consumption of a trained sequential model. The method includes receiving, from a client application, an initial processing request identifying an input sequence to be processed by the trained sequential model and an initial value for an output size parameter specifying a requested size of output from the trained sequential model. The method further includes sequentially transmitting, to the trained sequential model, multiple partial processing requests based on the initial processing request that each specify a fraction of the initial value as the output size parameter and receiving a sequence of output responses from the trained sequential model generated in response to processing the multiple partial processing requests. The method further provides for returning, to the client application, a final merged response that includes the sequence of output responses.
    Type: Application
    Filed: September 20, 2023
    Publication date: March 20, 2025
    Inventors: Wenbin MENG, Hemant KUMAR, Rakesh KELKAR, Karthik RAMAN, Sanjay RAMANUJAN, Kevin Joseph RIEHM, Theodore Dragov TODOROV
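The request-splitting idea can be illustrated with a toy model (the `model` callable and its `max_tokens` parameter are illustrative assumptions, not the claimed interface):

```python
def process_in_parts(model, prompt, total_tokens, parts=4):
    """Replace one large request with `parts` partial requests, each asking
    for a fraction of the output, then merge the responses. Peak memory then
    scales with the partial output size rather than the full output size."""
    per_part, remainder = divmod(total_tokens, parts)
    sizes = [per_part] * parts
    sizes[-1] += remainder          # last part absorbs the remainder
    pieces, context = [], prompt
    for size in sizes:
        piece = model(context, max_tokens=size)
        pieces.append(piece)
        context += piece            # later parts see earlier output
    return "".join(pieces)

# Toy "model": emits max_tokens characters regardless of context.
toy = lambda context, max_tokens: "x" * max_tokens
merged = process_in_parts(toy, "prompt: ", total_tokens=10, parts=3)
```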
  • Patent number: 12182509
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a tuple of respective input sequences to generate an output. In one aspect, one of the systems includes a neural network comprising a plurality of encoder neural networks and a head neural network, each encoder neural network configured to: receive a respective input sequence from the tuple; process the respective input sequence using one or more encoder network layers to generate an encoded representation comprising a sequence of tokens; and process each of some or all of the tokens in the sequence of tokens using a projection layer to generate a lower-dimensional representation, and the head neural network configured to: receive lower-dimensional representations of a respective proper subset of the sequence of tokens generated by the encoder neural network; and process the lower-dimensional representations to generate the output.
    Type: Grant
    Filed: June 1, 2021
    Date of Patent: December 31, 2024
    Assignee: Google LLC
    Inventors: Karthik Raman, Liu Yang, Mike Bendersky, Jiecao Chen, Marc Alexander Najork
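A toy linear sketch of the encode-then-project step described above (a real encoder would be a stack of transformer layers; all shapes and names here are illustrative):

```python
import numpy as np

def encode_and_project(tokens, encoder_weight, projection):
    """Encode a token sequence, then apply a shared projection to each token
    embedding so the head network receives lower-dimensional representations."""
    encoded = tokens @ encoder_weight        # (seq_len, d) token embeddings
    return encoded @ projection              # (seq_len, k) with k < d

rng = np.random.default_rng(1)
seq = rng.normal(size=(6, 16))               # 6 tokens, input dim 16
enc_w = rng.normal(size=(16, 32))            # encoder output dim d = 32
proj = rng.normal(size=(32, 4))              # projection dim k = 4
low_dim = encode_and_project(seq, enc_w, proj)
```

Shrinking each token from d to k dimensions is what lets the head network cheaply consume representations from several encoders at once.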
  • Publication number: 20240419493
    Abstract: A method, computer program product, and computing system for processing workload data associated with processing a plurality of requests for an artificial intelligence (AI) model on a processing unit. A maximum number of key-value (KV) cache blocks available for the workload data is determined by simulating the workload data using a simulation engine. A token utilization for the workload data is determined based upon, at least in part, the maximum number of KV cache blocks available for the workload data. Processing unit resources are allocated for the processing unit based upon, at least in part, the token utilization.
    Type: Application
    Filed: June 14, 2023
    Publication date: December 19, 2024
    Inventors: Sanjay Ramanujan, Karthik Raman, Rakesh Kelkar, Kalyan Kumar Bhukya, Archit Shukla, Pei-Hsuan Hsieh
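One hypothetical way to compute the block counts and token utilization mentioned above. The block size and the utilization definition (used blocks over maximum blocks) are assumptions for illustration, not the patented simulation:

```python
from math import ceil

def kv_blocks_needed(prompt_tokens, output_tokens, block_size=16):
    """KV-cache blocks one request occupies: one block per block_size
    key-value entries, rounded up."""
    return ceil((prompt_tokens + output_tokens) / block_size)

def token_utilization(requests, max_blocks, block_size=16):
    """Fraction of the simulated maximum KV-cache blocks that a workload
    of (prompt_tokens, output_tokens) requests consumes."""
    used = sum(kv_blocks_needed(p, o, block_size) for p, o in requests)
    return used / max_blocks

util = token_utilization([(100, 28), (10, 6)], max_blocks=16)
```

A request of 100 prompt plus 28 output tokens fills 8 blocks of 16 entries; together with a 16-token request the workload uses 9 of 16 blocks.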
  • Publication number: 20240411658
    Abstract: This document relates to predicting performance of large artificial intelligence (LAI) models that are too large to be handled by a single computing device. One example can receive a sample workload for a trained LAI model and identify multiple nodes functioning as a cluster to instantiate an instance of the trained LAI model. The example can predict performance characteristics for accomplishing the sample workload on the cluster and can cause at least some of the predicted performance characteristics to be presented on a user interface.
    Type: Application
    Filed: June 9, 2023
    Publication date: December 12, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Sanjay RAMANUJAN, Karthik RAMAN, Rakesh KELKAR, Pei-Hsuan HSIEH
  • Publication number: 20240307930
    Abstract: A method of cleaning a process chamber is provided including supplying a plasma from a remote plasma source to an interior volume of a rapid thermal processing chamber during a first time period, the rapid thermal processing chamber including a plurality of lamps configured to heat an interior volume of the rapid thermal processing chamber; and providing heat from the plurality of lamps to heat the interior volume of the rapid thermal processing chamber during the first time period when the plasma from the remote plasma source is provided to the interior volume of the rapid thermal processing chamber.
    Type: Application
    Filed: February 22, 2024
    Publication date: September 19, 2024
    Inventors: Wolfgang R. ADERHOLD, Karthik Raman SHARMA, Yi WANG
  • Patent number: 12079628
    Abstract: An apparatus and method for loop flattening and reduction in a SIMD pipeline including broadcast, move, and reduction instructions.
    Type: Grant
    Filed: October 4, 2021
    Date of Patent: September 3, 2024
    Assignee: Intel Corporation
    Inventors: William M. Brown, Roland Schulz, Karthik Raman
  • Publication number: 20240232637
    Abstract: Provided are computing systems, methods, and platforms that train query processing models, such as large language models, to perform query intent classification tasks by using retrieval augmentation and multi-stage distillation. Unlabeled training examples of queries may be obtained, and a set of the training examples may be augmented with additional feature annotations to generate augmented training examples. A first query processing model may annotate the retrieval augmented queries to generate inferred labels for the augmented training examples. A second query processing model may be trained on the inferred labels, distilling the query processing model that was trained with retrieval augmentation into a non-retrieval augmented query processing model. The second query processing model may annotate the entire set of unlabeled training examples. Another stage of distillation may train a third query processing model using the entire set of unlabeled training examples without retrieval augmentation.
    Type: Application
    Filed: October 23, 2023
    Publication date: July 11, 2024
    Inventors: Krishna Pragash Srinivasan, Michael Bendersky, Anupam Samanta, Lingrui Liao, Luca Bertelli, Ming-Wei Chang, Iftekhar Naim, Siddhartha Brahma, Siamak Shakeri, Hongkun Yu, John Nham, Karthik Raman, Raphael Dominik Hoffmann
  • Publication number: 20240143414
    Abstract: The techniques disclosed herein enable systems to perform repeatable and iterative load testing and performance benchmarking for artificial intelligence models deployed in a cloud computing environment. This is achieved by utilizing load profiles and representative workloads generated based on the load profiles to evaluate an artificial intelligence model under various workload contexts. The representative workload is then executed by the artificial intelligence model utilizing available computing infrastructure. Performance metrics are extracted from the execution and analyzed to provide insight into various performance dynamics such as the relationship between latency and data throughput. In addition, load profiles and input datasets are dynamically adjusted to evaluate different scenarios and use cases enabling the system to automatically test the artificial intelligence model across diverse applications.
    Type: Application
    Filed: October 27, 2022
    Publication date: May 2, 2024
    Inventors: Sanjay RAMANUJAN, Rakesh KELKAR, Hari Krishnan SRINIVASAN, Karthik RAMAN, Hema Vishnu POLA, Sagar TANEJA, Mradul KARMODIYA
  • Publication number: 20240135187
    Abstract: Provided are computing systems, methods, and platforms that train query processing models, such as large language models, to perform query intent classification tasks by using retrieval augmentation and multi-stage distillation. Unlabeled training examples of queries may be obtained, and a set of the training examples may be augmented with additional feature annotations to generate augmented training examples. A first query processing model may annotate the retrieval augmented queries to generate inferred labels for the augmented training examples. A second query processing model may be trained on the inferred labels, distilling the query processing model that was trained with retrieval augmentation into a non-retrieval augmented query processing model. The second query processing model may annotate the entire set of unlabeled training examples. Another stage of distillation may train a third query processing model using the entire set of unlabeled training examples without retrieval augmentation.
    Type: Application
    Filed: October 22, 2023
    Publication date: April 25, 2024
    Inventors: Krishna Pragash Srinivasan, Michael Bendersky, Anupam Samanta, Lingrui Liao, Luca Bertelli, Ming-Wei Chang, Iftekhar Naim, Siddhartha Brahma, Siamak Shakeri, Hongkun Yu, John Nham, Karthik Raman, Raphael Dominik Hoffmann
  • Publication number: 20240137388
    Abstract: Customers of a software platform, such as a unified communications as a service platform, are enabled to control their own encryption keys used to encrypt and decrypt data from various communication services in the software platform. A key broker server is employed to map encryption and decryption requests from servers in the platform to key management servers of customers based on user identifiers. Examples of data encrypted may include conference recordings, webinar recordings, phone call recordings, voicemails, emails, and calendar tokens.
    Type: Application
    Filed: January 18, 2023
    Publication date: April 25, 2024
    Inventors: John Carl Kennedy, Prasanna Kumar Malaiyandi, Martin Josef Pagel, Karthik Raman, Jan Zila
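The core of the key-broker idea above is a mapping from user identifiers to customer-controlled key management servers. A minimal sketch, with all class and endpoint names illustrative rather than the platform's actual API:

```python
class KeyBroker:
    """Route encryption/decryption requests to the customer-controlled key
    management server (KMS) registered for the requesting user identifier."""

    def __init__(self):
        self._kms_by_user = {}

    def register(self, user_id, kms_endpoint):
        """Associate a user identifier with that customer's own KMS."""
        self._kms_by_user[user_id] = kms_endpoint

    def resolve(self, user_id):
        """Return the KMS that must serve this user's key operations."""
        try:
            return self._kms_by_user[user_id]
        except KeyError:
            raise LookupError(f"no customer KMS registered for {user_id!r}")

broker = KeyBroker()
broker.register("user-42", "https://kms.customer-a.example/keys")
```

Because the platform only brokers the lookup, the customer's KMS, not the platform, holds the keys that protect recordings, voicemails, and similar data.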
  • Publication number: 20240137211
    Abstract: Customers of a software platform, such as a unified communications as a service platform, are enabled to control their own encryption keys used to encrypt and decrypt data from various communication services in the software platform. A key broker server is employed to map encryption and decryption requests from servers in the platform to key management servers of customers based on user identifiers. Examples of data encrypted may include conference recordings, webinar recordings, phone call recordings, voicemails, emails, and calendar tokens.
    Type: Application
    Filed: January 18, 2023
    Publication date: April 25, 2024
    Inventors: John Carl Kennedy, Prasanna Kumar Malaiyandi, Martin Josef Pagel, Karthik Raman, Jan Zila
  • Publication number: 20240070456
    Abstract: Provided are systems and methods for corrective reward optimization for generative sequential labeling. In particular, example aspects of the present disclosure are directed to an effective framework for generative reward optimization of text (or other) data sequences, certain example implementations of which can be referred to as “GROOT”. Example implementations of the proposed framework work by training a generative sequential labeling model to match the decoder output distribution with that of the (possibly black-box) reward function. Using an iterative training regime, the framework can first generate prediction candidates and then correct errors in the candidate. Finally, a loss function can be used that contrasts those candidates based on their reward values (e.g., as measured by a reward function that encodes the specific objectives for a particular setting or application).
    Type: Application
    Filed: August 31, 2023
    Publication date: February 29, 2024
    Inventors: Karthik Raman, Kazuma Hashimoto
  • Publication number: 20230078187
Abstract: Customers of a software platform, such as a unified communications as a service platform, are enabled to control their own encryption keys used to encrypt and decrypt data from various communication services in the software platform. A key broker server is employed to map encryption and decryption requests from servers in the platform to key management servers of customers based on user identifiers. Examples of data encrypted may include conference recordings, webinar recordings, phone call recordings, voicemails, emails, and calendar tokens.
    Type: Application
    Filed: September 12, 2021
    Publication date: March 16, 2023
    Inventors: John Kennedy, Prasanna Kumar Malaiyandi, Karthik Raman, Jan Zila
  • Publication number: 20220335034
Abstract: Data services for workloads are often provided with a service level agreement specifying various performance guarantees (e.g., latency, availability, scalability, and consistency). Single-master architectures, in which updates to the data set are constrained to a single server, may limit the fulfillment of some performance guarantees. Presented herein are multi-master architectures, in which the server set is partitioned into at least two masters that are permitted to update the data set and at least one non-master that is not permitted to update the data set. Non-masters that receive a request to update the data set forward the request to a master server for application to the data set. A master that receives the request applies it to the data set and propagates the update to other master and non-master servers. Conflicting updates may be resolved through a variety of conflict resolution techniques, optionally designating one master server as a conflict resolution server.
    Type: Application
    Filed: June 30, 2022
    Publication date: October 20, 2022
    Inventors: Karthik RAMAN, Momin Mahmoud AL-GHOSHIEN, Bhalakumaaran ERODE RANGANATHAN, Madhan GAJENDRAN, Ji HUANG, Atul KATIYAR, Mikhail Mikhailovich KOLTACHEV, Sujit Vattathil KURUVILLA, Digvijaysinh Govindbhai MAKWANA, Subramanyam PATTIPAKA, Ovidiu Constantin PLATON, Ankur Savailal SHAH, Pankaj SHARMA, Dharma SHUKLA, Shreshth SINGHAL, Shireesh Kumar THOTA
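The forward-and-propagate flow described above can be sketched as follows (conflict resolution omitted; node names and the choice of which master receives a forwarded write are illustrative):

```python
class Node:
    def __init__(self, name, is_master):
        self.name, self.is_master, self.data = name, is_master, {}

def handle_write(entry_node, nodes, key, value):
    """A non-master forwards the write to a master; the master applies the
    update and propagates it to every other node in the server set."""
    master = entry_node if entry_node.is_master else next(
        n for n in nodes if n.is_master)
    for n in nodes:                  # apply locally, then propagate
        n.data[key] = value
    return master.name               # which master actually applied it

m1, m2 = Node("m1", True), Node("m2", True)
reader = Node("r1", False)
cluster = [m1, m2, reader]
applied_by = handle_write(reader, cluster, "k", "v")
```

A write arriving at the non-master is handled by a master, yet every node, master or not, ends up with the propagated value.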
  • Patent number: 11475290
    Abstract: The present disclosure provides systems and methods that use machine learning to improve whole-structure relevance of hierarchical informational displays. In particular, the present disclosure provides systems and methods that employ a supervised, discriminative machine learning approach to jointly optimize the ranking of items and their display attributes. One example system includes a machine-learned display selection model that has been trained to jointly select a plurality of items and one or more attributes for each item for inclusion in an informational display. For example, the machine-learned display selection model can optimize a nested submodular objective function to jointly select the items and attributes.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: October 18, 2022
    Assignee: GOOGLE LLC
    Inventors: Jeffrey Jon Dalton, Karthik Raman, Tobias Schnabel, Evgeniy Gabrilovich
  • Patent number: 11474846
    Abstract: A method of bridging a first database and a second database. The method includes maintaining a state machine representing a state of a virtual node in the first database, wherein the state of the virtual node conforms to a native protocol for native nodes of the first database, said native protocol of the first database differing from a foreign protocol of the second database. The method further includes receiving an incoming message for the virtual node from one of the native nodes according to the native protocol, and based on the incoming message, accessing the second database. The method further includes updating the state of the virtual node based on the incoming message according to the native protocol, and based on the state of the virtual node as updated, sending an outgoing message to one or more of the native nodes according to the native protocol.
    Type: Grant
    Filed: July 11, 2019
    Date of Patent: October 18, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Willis Lang, Karthik Raman
  • Patent number: 11397721
Abstract: A server set for a data set may designate a subset of “master” servers that update the data set in order to reduce data version conflicts involving mutually exclusive updates of the data set. Multi-master configurations may fulfill performance constraints, and the subset of masters may detect and resolve data version conflicts. However, if multiple masters perform conflict resolution for a particular data version conflict, the resolution may produce inefficiency and redundancy (if the masters reach the same outcome) or additional data version conflicts (if the masters reach different outcomes). Instead, among the masters, a merge master may be identified that applies conflict resolution techniques to data version conflicts and forwards the conflict resolution outcome to the other masters for application to the data set to resolve the data version conflict. The other masters may temporarily store updates in a tentative update set until data version conflicts are resolved.
    Type: Grant
    Filed: December 4, 2018
    Date of Patent: July 26, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Karthik Raman, Momin Mahmoud Al-Ghosien, Bhalakumaaran Erode Ranganathan, Madhan Gajendran, Ji Huang, Atul Katiyar, Mikhail Mikhailovich Koltachev, Sujit Vattathil Kuruvilla, Digvijaysinh Govindbhai Makwana, Subramanyam Pattipaka, Ovidiu Constantin Platon, Ankur Savailal Shah, Pankaj Sharma, Dharma Shukla, Shreshth Singhal, Shireesh Kumar Thota
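The merge-master pattern in the last abstract can be sketched as follows. Choosing `masters[0]` as the merge master and `max` as the resolver are illustrative assumptions; the point is that exactly one master resolves, and the rest only apply its outcome:

```python
class Master:
    def __init__(self, name):
        self.name, self.data = name, {}
        self.tentative = {}              # updates parked until conflicts resolve

    def apply_outcomes(self, outcomes):
        self.data.update(outcomes)
        for key in outcomes:
            self.tentative.pop(key, None)   # conflict settled, unpark

def resolve_conflicts(conflicts, masters, resolve_fn):
    """Only the designated merge master resolves each data version conflict;
    every master then applies that single outcome, so no two masters can
    reach different (and newly conflicting) resolutions."""
    merge_master = masters[0]
    outcomes = {k: resolve_fn(versions) for k, versions in conflicts.items()}
    for m in masters:
        m.apply_outcomes(outcomes)
    return merge_master.name, outcomes

a, b = Master("a"), Master("b")
a.tentative["x"] = 3                     # both masters parked conflicting writes
b.tentative["x"] = 5
who, result = resolve_conflicts({"x": [3, 5]}, [a, b], resolve_fn=max)
```

After resolution both masters hold the same value for "x" and their tentative sets are empty, which is the redundancy-free outcome the abstract describes.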