Patents by Inventor Karthik Raman

Karthik Raman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250117626
Abstract: A computing device is provided, including a processor and a storage device holding instructions that are executable by the processor to implement a base artificial intelligence (AI) model and two or more delta AI models, each delta AI model having lower dimensionality than the base AI model. An inference request including an input prompt is received, the inference request specifying a selected delta AI model of the two or more delta AI models. The input prompt is input to the base AI model to thereby generate a base model result vector. The input prompt is input to the selected delta AI model to thereby generate a delta model result vector. An output vector is generated by combining the base model result vector and the delta model result vector via a combination operation. The output vector is output.
    Type: Application
    Filed: October 9, 2023
    Publication date: April 10, 2025
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Sanjay RAMANUJAN, Ciprian CHISALITA, Pei-Hsuan HSIEH, Derek Edward HYATT, Rakesh KELKAR, Karthik RAMAN
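One plausible reading of the abstract above, sketched in Python. All names are illustrative, and addition is assumed as the combination operation (the abstract leaves the operation unspecified); the low-rank matrices stand in for the delta models' lower dimensionality:

```python
import numpy as np

def combined_inference(base_model, delta_models, selected, prompt_vec):
    """Run the prompt through the base model and the selected delta model,
    then combine the two result vectors (here, by addition)."""
    return base_model(prompt_vec) + delta_models[selected](prompt_vec)

# Toy stand-ins: the base model is a dense matrix; each delta model is a
# low-rank update of the form A @ B, hence "lower dimensionality".
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
A = rng.normal(size=(8, 2))
B = rng.normal(size=(2, 8))
x = rng.normal(size=8)
base = lambda v: W @ v
deltas = {"sales": lambda v: A @ (B @ v)}
out = combined_inference(base, deltas, "sales", x)
```

Because each delta is small relative to the base, many specialized deltas can share one expensive base model.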
  • Publication number: 20250094237
Abstract: A system provides capacity-based load balancing across model endpoints of a cloud-based artificial intelligence (AI) model. The system includes a consumption determination engine executable to determine a net resource consumption for processing tasks in a workload generated by a client application for input to the trained AI model. The system also includes a load balancer that determines a distribution of available resource capacity in a shared resource pool comprising compute resources at each of the multiple model endpoints. The load balancer allocates parallelizable tasks of the workload among the compute resources at the multiple model endpoints based on the net resource consumption of the tasks and on the distribution of available resource capacity in the shared resource pool.
    Type: Application
    Filed: September 20, 2023
    Publication date: March 20, 2025
    Inventors: Wenbin MENG, Hemant KUMAR, Rakesh KELKAR, Karthik RAMAN, Sanjay RAMANUJAN, Kevin Joseph RIEHM, Theodore Dragov TODOROV
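A minimal sketch of capacity-based allocation in this spirit, assuming a simple greedy policy (the abstract does not specify the algorithm; task and endpoint names are illustrative):

```python
def allocate(tasks, capacity):
    """Greedily place each parallelizable task (with an estimated net
    resource cost) on the endpoint with the most remaining capacity,
    largest tasks first. Returns the placement and leftover capacity."""
    remaining = dict(capacity)
    placement = {}
    for task, cost in sorted(tasks.items(), key=lambda kv: -kv[1]):
        endpoint = max(remaining, key=remaining.get)
        placement[task] = endpoint
        remaining[endpoint] -= cost
    return placement, remaining

plan, left = allocate(
    tasks={"t1": 4, "t2": 3, "t3": 1},
    capacity={"endpoint-a": 5, "endpoint-b": 5},
)
```

Here the largest task lands on endpoint-a, and the next two fill endpoint-b, leaving one unit free on each.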
  • Publication number: 20250094240
Abstract: A disclosed method facilitates an increase in utilization with respect to a resource quota allocated to a tenant from a shared resource pool. The method includes transmitting a lease request to a quota service on behalf of the tenant, where the lease request identifies a processing task and specifies a quantity of cloud-based resources requested from the shared resource pool for execution of the processing task. The method further provides for determining, based on a feedback signal received from the quota service, whether grant of the lease request would cause the tenant to exceed a resource quota allocated to the tenant and dynamically decreasing parallelism of active tasks being processed by the cloud-based resources on behalf of the tenant in response to determining that grant of the lease request would cause the tenant to exceed the resource quota.
    Type: Application
    Filed: September 20, 2023
    Publication date: March 20, 2025
    Inventors: Wenbin MENG, Hemant KUMAR, Rakesh KELKAR, Karthik RAMAN, Sanjay RAMANUJAN, Kevin Joseph RIEHM, Theodore Dragov TODOROV
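The lease-versus-quota decision above can be sketched as follows. Halving parallelism is one simple back-off policy assumed here; the abstract only says parallelism is dynamically decreased:

```python
def review_lease(active_parallelism, requested, in_use, quota):
    """Decide a lease request against the tenant's quota: grant it when it
    fits; otherwise deny it and shrink the tenant's task parallelism so
    future demand stays under the quota."""
    if in_use + requested > quota:
        return max(1, active_parallelism // 2), False   # back off, deny
    return active_parallelism, True                      # keep going, grant
```

For example, with 90 of 100 units in use, a request for 20 units is denied and parallelism drops from 8 to 4, while a request for 10 units is granted unchanged.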
  • Publication number: 20250094233
    Abstract: A disclosed method reduces memory consumption of a trained sequential model. The method includes receiving, from a client application, an initial processing request identifying an input sequence to be processed by the trained sequential model and an initial value for an output size parameter specifying a requested size of output from the trained sequential model. The method further includes sequentially transmitting, to the trained sequential model, multiple partial processing requests based on the initial processing request that each specify a fraction of the initial value as the output size parameter and receiving a sequence of output responses from the trained sequential model generated in response to processing the multiple partial processing requests. The method further provides for returning, to the client application, a final merged response that includes the sequence of output responses.
    Type: Application
    Filed: September 20, 2023
    Publication date: March 20, 2025
    Inventors: Wenbin MENG, Hemant KUMAR, Rakesh KELKAR, Karthik RAMAN, Sanjay RAMANUJAN, Kevin Joseph RIEHM, Theodore Dragov TODOROV
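The request-splitting idea can be illustrated with a toy model (the `model` callable and its `max_tokens` parameter are illustrative assumptions, not the claimed interface):

```python
def process_in_parts(model, prompt, total_tokens, parts=4):
    """Replace one large request with `parts` partial requests, each asking
    for a fraction of the output, then merge the responses. Peak memory then
    scales with the partial output size rather than the full output size."""
    per_part, remainder = divmod(total_tokens, parts)
    sizes = [per_part] * parts
    sizes[-1] += remainder          # last part absorbs the remainder
    pieces, context = [], prompt
    for size in sizes:
        piece = model(context, max_tokens=size)
        pieces.append(piece)
        context += piece            # later parts see earlier output
    return "".join(pieces)

# Toy "model": emits max_tokens characters regardless of context.
toy = lambda context, max_tokens: "x" * max_tokens
merged = process_in_parts(toy, "prompt: ", total_tokens=10, parts=3)
```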
  • Patent number: 12182509
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a tuple of respective input sequences to generate an output. In one aspect, one of the systems includes a neural network comprising a plurality of encoder neural networks and a head neural network, each encoder neural network configured to: receive a respective input sequence from the tuple; process the respective input sequence using one or more encoder network layers to generate an encoded representation comprising a sequence of tokens; and process each of some or all of the tokens in the sequence of tokens using a projection layer to generate a lower-dimensional representation, and the head neural network configured to: receive lower-dimensional representations of a respective proper subset of the sequence of tokens generated by the encoder neural network; and process the lower-dimensional representations to generate the output.
    Type: Grant
    Filed: June 1, 2021
    Date of Patent: December 31, 2024
    Assignee: Google LLC
    Inventors: Karthik Raman, Liu Yang, Mike Bendersky, Jiecao Chen, Marc Alexander Najork
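A toy linear sketch of the encode-then-project step described above (a real encoder would be a stack of transformer layers; all shapes and names here are illustrative):

```python
import numpy as np

def encode_and_project(tokens, encoder_weight, projection):
    """Encode a token sequence, then apply a shared projection to each token
    embedding so the head network receives lower-dimensional representations."""
    encoded = tokens @ encoder_weight        # (seq_len, d) token embeddings
    return encoded @ projection              # (seq_len, k) with k < d

rng = np.random.default_rng(1)
seq = rng.normal(size=(6, 16))               # 6 tokens, input dim 16
enc_w = rng.normal(size=(16, 32))            # encoder output dim d = 32
proj = rng.normal(size=(32, 4))              # projection dim k = 4
low_dim = encode_and_project(seq, enc_w, proj)
```

Shrinking each token from d to k dimensions is what lets the head network cheaply consume representations from several encoders at once.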
  • Publication number: 20240419493
    Abstract: A method, computer program product, and computing system for processing workload data associated with processing a plurality of requests for an artificial intelligence (AI) model on a processing unit. A maximum number of key-value (KV) cache blocks available for the workload data is determined by simulating the workload data using a simulation engine. A token utilization for the workload data is determined based upon, at least in part, the maximum number of KV cache blocks available for the workload data. Processing unit resources are allocated for the processing unit based upon, at least in part, the token utilization.
    Type: Application
    Filed: June 14, 2023
    Publication date: December 19, 2024
    Inventors: Sanjay Ramanujan, Karthik Raman, Rakesh Kelkar, Kalyan Kumar Bhukya, Archit Shukla, Pei-Hsuan Hsieh
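One hypothetical way to compute the block counts and token utilization mentioned above. The block size and the utilization definition (used blocks over maximum blocks) are assumptions for illustration, not the patented simulation:

```python
from math import ceil

def kv_blocks_needed(prompt_tokens, output_tokens, block_size=16):
    """KV-cache blocks one request occupies: one block per block_size
    key-value entries, rounded up."""
    return ceil((prompt_tokens + output_tokens) / block_size)

def token_utilization(requests, max_blocks, block_size=16):
    """Fraction of the simulated maximum KV-cache blocks that a workload
    of (prompt_tokens, output_tokens) requests consumes."""
    used = sum(kv_blocks_needed(p, o, block_size) for p, o in requests)
    return used / max_blocks

util = token_utilization([(100, 28), (10, 6)], max_blocks=16)
```

A request of 100 prompt plus 28 output tokens fills 8 blocks of 16 entries; together with a 16-token request the workload uses 9 of 16 blocks.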
  • Publication number: 20240411658
    Abstract: This document relates to predicting performance of large artificial intelligence (LAI) models that are too large to be handled by a single computing device. One example can receive a sample workload for a trained LAI model and identify multiple nodes functioning as a cluster to instantiate an instance of the trained LAI model. The example can predict performance characteristics for accomplishing the sample workload on the cluster and can cause at least some of the predicted performance characteristics to be presented on a user interface.
    Type: Application
    Filed: June 9, 2023
    Publication date: December 12, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Sanjay RAMANUJAN, Karthik RAMAN, Rakesh KELKAR, Pei-Hsuan HSIEH
  • Publication number: 20240307930
    Abstract: A method of cleaning a process chamber is provided including supplying a plasma from a remote plasma source to an interior volume of a rapid thermal processing chamber during a first time period, the rapid thermal processing chamber including a plurality of lamps configured to heat an interior volume of the rapid thermal processing chamber; and providing heat from the plurality of lamps to heat the interior volume of the rapid thermal processing chamber during the first time period when the plasma from the remote plasma source is provided to the interior volume of the rapid thermal processing chamber.
    Type: Application
    Filed: February 22, 2024
    Publication date: September 19, 2024
    Inventors: Wolfgang R. ADERHOLD, Karthik Raman SHARMA, Yi WANG
  • Patent number: 12079628
    Abstract: An apparatus and method for loop flattening and reduction in a SIMD pipeline including broadcast, move, and reduction instructions.
    Type: Grant
    Filed: October 4, 2021
    Date of Patent: September 3, 2024
    Assignee: Intel Corporation
    Inventors: William M. Brown, Roland Schulz, Karthik Raman
  • Publication number: 20240232637
    Abstract: Provided are computing systems, methods, and platforms that train query processing models, such as large language models, to perform query intent classification tasks by using retrieval augmentation and multi-stage distillation. Unlabeled training examples of queries may be obtained, and a set of the training examples may be augmented with additional feature annotations to generate augmented training examples. A first query processing model may annotate the retrieval augmented queries to generate inferred labels for the augmented training examples. A second query processing model may be trained on the inferred labels, distilling the query processing model that was trained with retrieval augmentation into a non-retrieval augmented query processing model. The second query processing model may annotate the entire set of unlabeled training examples. Another stage of distillation may train a third query processing model using the entire set of unlabeled training examples without retrieval augmentation.
    Type: Application
    Filed: October 23, 2023
    Publication date: July 11, 2024
    Inventors: Krishna Pragash Srinivasan, Michael Bendersky, Anupam Samanta, Lingrui Liao, Luca Bertelli, Ming-Wei Chang, Iftekhar Naim, Siddhartha Brahma, Siamak Shakeri, Hongkun Yu, John Nham, Karthik Raman, Raphael Dominik Hoffmann
  • Publication number: 20240143414
    Abstract: The techniques disclosed herein enable systems to perform repeatable and iterative load testing and performance benchmarking for artificial intelligence models deployed in a cloud computing environment. This is achieved by utilizing load profiles and representative workloads generated based on the load profiles to evaluate an artificial intelligence model under various workload contexts. The representative workload is then executed by the artificial intelligence model utilizing available computing infrastructure. Performance metrics are extracted from the execution and analyzed to provide insight into various performance dynamics such as the relationship between latency and data throughput. In addition, load profiles and input datasets are dynamically adjusted to evaluate different scenarios and use cases enabling the system to automatically test the artificial intelligence model across diverse applications.
    Type: Application
    Filed: October 27, 2022
    Publication date: May 2, 2024
    Inventors: Sanjay RAMANUJAN, Rakesh KELKAR, Hari Krishnan SRINIVASAN, Karthik RAMAN, Hema Vishnu POLA, Sagar TANEJA, Mradul KARMODIYA
  • Publication number: 20240135187
    Abstract: Provided are computing systems, methods, and platforms that train query processing models, such as large language models, to perform query intent classification tasks by using retrieval augmentation and multi-stage distillation. Unlabeled training examples of queries may be obtained, and a set of the training examples may be augmented with additional feature annotations to generate augmented training examples. A first query processing model may annotate the retrieval augmented queries to generate inferred labels for the augmented training examples. A second query processing model may be trained on the inferred labels, distilling the query processing model that was trained with retrieval augmentation into a non-retrieval augmented query processing model. The second query processing model may annotate the entire set of unlabeled training examples. Another stage of distillation may train a third query processing model using the entire set of unlabeled training examples without retrieval augmentation.
    Type: Application
    Filed: October 22, 2023
    Publication date: April 25, 2024
    Inventors: Krishna Pragash Srinivasan, Michael Bendersky, Anupam Samanta, Lingrui Liao, Luca Bertelli, Ming-Wei Chang, Iftekhar Naim, Siddhartha Brahma, Siamak Shakeri, Hongkun Yu, John Nham, Karthik Raman, Raphael Dominik Hoffmann
  • Publication number: 20240137388
    Abstract: Customers of a software platform, such as a unified communications as a service platform, are enabled to control their own encryption keys used to encrypt and decrypt data from various communication services in the software platform. A key broker server is employed to map encryption and decryption requests from servers in the platform to key management servers of customers based on user identifiers. Examples of data encrypted may include conference recordings, webinar recordings, phone call recordings, voicemails, emails, and calendar tokens.
    Type: Application
    Filed: January 18, 2023
    Publication date: April 25, 2024
    Inventors: John Carl Kennedy, Prasanna Kumar Malaiyandi, Martin Josef Pagel, Karthik Raman, Jan Zila
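The core of the key-broker idea above is a mapping from user identifiers to customer-controlled key management servers. A minimal sketch, with all class and endpoint names illustrative rather than the platform's actual API:

```python
class KeyBroker:
    """Route encryption/decryption requests to the customer-controlled key
    management server (KMS) registered for the requesting user identifier."""

    def __init__(self):
        self._kms_by_user = {}

    def register(self, user_id, kms_endpoint):
        """Associate a user identifier with that customer's own KMS."""
        self._kms_by_user[user_id] = kms_endpoint

    def resolve(self, user_id):
        """Return the KMS that must serve this user's key operations."""
        try:
            return self._kms_by_user[user_id]
        except KeyError:
            raise LookupError(f"no customer KMS registered for {user_id!r}")

broker = KeyBroker()
broker.register("user-42", "https://kms.customer-a.example/keys")
```

Because the platform only brokers the lookup, the customer's KMS, not the platform, holds the keys that protect recordings, voicemails, and similar data.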
  • Publication number: 20240137211
    Abstract: Customers of a software platform, such as a unified communications as a service platform, are enabled to control their own encryption keys used to encrypt and decrypt data from various communication services in the software platform. A key broker server is employed to map encryption and decryption requests from servers in the platform to key management servers of customers based on user identifiers. Examples of data encrypted may include conference recordings, webinar recordings, phone call recordings, voicemails, emails, and calendar tokens.
    Type: Application
    Filed: January 18, 2023
    Publication date: April 25, 2024
    Inventors: John Carl Kennedy, Prasanna Kumar Malaiyandi, Martin Josef Pagel, Karthik Raman, Jan Zila
  • Publication number: 20240070456
    Abstract: Provided are systems and methods for corrective reward optimization for generative sequential labeling. In particular, example aspects of the present disclosure are directed to an effective framework for generative reward optimization of text (or other) data sequences, certain example implementations of which can be referred to as “GROOT”. Example implementations of the proposed framework work by training a generative sequential labeling model to match the decoder output distribution with that of the (possibly black-box) reward function. Using an iterative training regime, the framework can first generate prediction candidates and then correct errors in the candidate. Finally, a loss function can be used that contrasts those candidates based on their reward values (e.g., as measured by a reward function that encodes the specific objectives for a particular setting or application).
    Type: Application
    Filed: August 31, 2023
    Publication date: February 29, 2024
    Inventors: Karthik Raman, Kazuma Hashimoto
  • Publication number: 20230078187
Abstract: Customers of a software platform, such as a unified communications as a service platform, are enabled to control their own encryption keys used to encrypt and decrypt data from various communication services in the software platform. A key broker server is employed to map encryption and decryption requests from servers in the platform to key management servers of customers based on user identifiers. Examples of data encrypted may include conference recordings, webinar recordings, phone call recordings, voicemails, emails, and calendar tokens.
    Type: Application
    Filed: September 12, 2021
    Publication date: March 16, 2023
    Inventors: John Kennedy, Prasanna Kumar Malaiyandi, Karthik Raman, Jan Zila
  • Publication number: 20220335034
Abstract: Data services for workloads are often provided with a service level agreement specifying various performance guarantees (e.g., latency, availability, scalability, and consistency). Single-master architectures, in which updates to the data set are constrained to a single server, may limit the fulfillment of some performance guarantees. Presented herein are multi-master architectures, in which the server set is partitioned into at least two masters that are permitted to update the data set and at least one non-master that is not permitted to update the data set. Non-masters that receive a request to update the data set forward the request to a master server for application to the data set. A master that receives the request applies it to the data set and propagates the update to other master and non-master servers. Conflicting updates may be resolved through a variety of conflict resolution techniques, optionally designating one master server as a conflict resolution server.
    Type: Application
    Filed: June 30, 2022
    Publication date: October 20, 2022
    Inventors: Karthik RAMAN, Momin Mahmoud AL-GHOSHIEN, Bhalakumaaran ERODE RANGANATHAN, Madhan GAJENDRAN, Ji HUANG, Atul KATIYAR, Mikhail Mikhailovich KOLTACHEV, Sujit Vattathil KURUVILLA, Digvijaysinh Govindbhai MAKWANA, Subramanyam PATTIPAKA, Ovidiu Constantin PLATON, Ankur Savailal SHAH, Pankaj SHARMA, Dharma SHUKLA, Shreshth SINGHAL, Shireesh Kumar THOTA
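The forward-and-propagate flow described above can be sketched as follows (conflict resolution omitted; node names and the choice of which master receives a forwarded write are illustrative):

```python
class Node:
    def __init__(self, name, is_master):
        self.name, self.is_master, self.data = name, is_master, {}

def handle_write(entry_node, nodes, key, value):
    """A non-master forwards the write to a master; the master applies the
    update and propagates it to every other node in the server set."""
    master = entry_node if entry_node.is_master else next(
        n for n in nodes if n.is_master)
    for n in nodes:                  # apply locally, then propagate
        n.data[key] = value
    return master.name               # which master actually applied it

m1, m2 = Node("m1", True), Node("m2", True)
reader = Node("r1", False)
cluster = [m1, m2, reader]
applied_by = handle_write(reader, cluster, "k", "v")
```

A write arriving at the non-master is handled by a master, yet every node, master or not, ends up with the propagated value.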
  • Patent number: 11475290
    Abstract: The present disclosure provides systems and methods that use machine learning to improve whole-structure relevance of hierarchical informational displays. In particular, the present disclosure provides systems and methods that employ a supervised, discriminative machine learning approach to jointly optimize the ranking of items and their display attributes. One example system includes a machine-learned display selection model that has been trained to jointly select a plurality of items and one or more attributes for each item for inclusion in an informational display. For example, the machine-learned display selection model can optimize a nested submodular objective function to jointly select the items and attributes.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: October 18, 2022
    Assignee: GOOGLE LLC
    Inventors: Jeffrey Jon Dalton, Karthik Raman, Tobias Schnabel, Evgeniy Gabrilovich
  • Patent number: 11474846
    Abstract: A method of bridging a first database and a second database. The method includes maintaining a state machine representing a state of a virtual node in the first database, wherein the state of the virtual node conforms to a native protocol for native nodes of the first database, said native protocol of the first database differing from a foreign protocol of the second database. The method further includes receiving an incoming message for the virtual node from one of the native nodes according to the native protocol, and based on the incoming message, accessing the second database. The method further includes updating the state of the virtual node based on the incoming message according to the native protocol, and based on the state of the virtual node as updated, sending an outgoing message to one or more of the native nodes according to the native protocol.
    Type: Grant
    Filed: July 11, 2019
    Date of Patent: October 18, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Willis Lang, Karthik Raman
  • Patent number: 11397721
Abstract: A server set for a data set may designate a subset of “master” servers that update the data set in order to reduce data version conflicts involving mutually exclusive updates of the data set. Multi-master configurations may fulfill performance constraints, and the subset of masters may detect and resolve data version conflicts. However, if multiple masters perform conflict resolution for a particular data version conflict, the resolution may produce inefficiency and redundancy (if the masters reach the same outcome) or additional data version conflicts (if the masters reach different outcomes). Instead, among the masters, a merge master may be identified that applies conflict resolution techniques to data version conflicts and forwards the conflict resolution outcome to the other masters for application to the data set to resolve the data version conflict. The other masters may temporarily store updates in a tentative update set until data version conflicts are resolved.
    Type: Grant
    Filed: December 4, 2018
    Date of Patent: July 26, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Karthik Raman, Momin Mahmoud Al-Ghosien, Bhalakumaaran Erode Ranganathan, Madhan Gajendran, Ji Huang, Atul Katiyar, Mikhail Mikhailovich Koltachev, Sujit Vattathil Kuruvilla, Digvijaysinh Govindbhai Makwana, Subramanyam Pattipaka, Ovidiu Constantin Platon, Ankur Savailal Shah, Pankaj Sharma, Dharma Shukla, Shreshth Singhal, Shireesh Kumar Thota
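The merge-master pattern in the last abstract can be sketched as follows. Choosing `masters[0]` as the merge master and `max` as the resolver are illustrative assumptions; the point is that exactly one master resolves, and the rest only apply its outcome:

```python
class Master:
    def __init__(self, name):
        self.name, self.data = name, {}
        self.tentative = {}              # updates parked until conflicts resolve

    def apply_outcomes(self, outcomes):
        self.data.update(outcomes)
        for key in outcomes:
            self.tentative.pop(key, None)   # conflict settled, unpark

def resolve_conflicts(conflicts, masters, resolve_fn):
    """Only the designated merge master resolves each data version conflict;
    every master then applies that single outcome, so no two masters can
    reach different (and newly conflicting) resolutions."""
    merge_master = masters[0]
    outcomes = {k: resolve_fn(versions) for k, versions in conflicts.items()}
    for m in masters:
        m.apply_outcomes(outcomes)
    return merge_master.name, outcomes

a, b = Master("a"), Master("b")
a.tentative["x"] = 3                     # both masters parked conflicting writes
b.tentative["x"] = 5
who, result = resolve_conflicts({"x": [3, 5]}, [a, b], resolve_fn=max)
```

After resolution both masters hold the same value for "x" and their tentative sets are empty, which is the redundancy-free outcome the abstract describes.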