Patents by Inventor KARTHIK MOHAN KUMAR

KARTHIK MOHAN KUMAR has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250111578
    Abstract: Methods and systems are provided for generating a stylized representation of a non-player character (NPC) in a virtual environment. A multimodal plurality of inputs regarding characteristics of the NPC is received and processed to generate visual data representing the NPC's appearance and behavior data representing the NPC's actions. The generated visual data and behavior data are adapted to a selected character model to create an adapted configuration model, which is used to generate rendering information for the NPC.
    Type: Application
    Filed: September 27, 2024
    Publication date: April 3, 2025
    Inventors: Karthik Mohan Kumar, Archana Ramalingam, Michael Mantor, Pedro Antonio Pena
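The pipeline described in the abstract above (multimodal inputs → visual data + behavior data → adaptation to a selected character model → rendering information) can be sketched as a minimal Python skeleton. All names, fields, and values here are hypothetical illustrations, not the patented implementation; the stub functions stand in for the generative models the patent describes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NPCInputs:
    """Multimodal description of the character (all fields hypothetical)."""
    text_description: str
    reference_image: Optional[bytes] = None
    voice_sample: Optional[bytes] = None

def generate_visual_data(inputs: NPCInputs) -> dict:
    # Stand-in for a generative vision model producing appearance data.
    return {"appearance": inputs.text_description}

def generate_behavior_data(inputs: NPCInputs) -> dict:
    # Stand-in for a model inferring the NPC's action repertoire.
    return {"actions": ["idle", "greet"]}

def adapt_to_character_model(visual: dict, behavior: dict, base_model: str) -> dict:
    # Retarget the generated data onto the selected character model,
    # yielding the adapted configuration model.
    return {"base_model": base_model, **visual, **behavior}

def rendering_info(adapted: dict) -> dict:
    # The adapted configuration model drives rendering of the NPC.
    return {"render_target": adapted["base_model"], "config": adapted}

npc = NPCInputs(text_description="a gruff dwarven blacksmith")
adapted = adapt_to_character_model(
    generate_visual_data(npc), generate_behavior_data(npc), "humanoid_rig_v1")
info = rendering_info(adapted)
```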
  • Publication number: 20240424398
    Abstract: Systems and techniques for generating and animating non-player characters (NPCs) within virtual digital environments are provided. Multimodal input data is received that comprises a plurality of input modalities for interaction with an NPC having a set of body features and a set of facial features. The multimodal input data is processed through one or more neural networks to generate animation sequences for both the body features and facial features of the NPC. Generating such animation sequences includes disentangling the multimodal input data to generate substantially disentangled latent representations, combining these representations with the multimodal input data, and using a large-language model (LLM) to generate speech data for the NPC. Further processing using reverse diffusion generates face vertex displacement data and joint trajectory data based on the combined representation and generated speech data.
    Type: Application
    Filed: June 20, 2024
    Publication date: December 26, 2024
    Inventors: Karthik Mohan Kumar, Michael Mantor, Pedro Antonio Pena, Archana Ramalingam
  • Publication number: 20240428494
    Abstract: Systems and techniques for generating and animating non-player characters (NPCs) within virtual digital environments are provided. Multimodal input data is received that comprises a plurality of input modalities for interaction with an NPC having a set of body features and a set of facial features. The multimodal input data is processed through one or more neural networks to generate animation sequences for both the body features and facial features of the NPC. Generating such animation sequences includes disentangling the multimodal input data to generate substantially disentangled latent representations, combining these representations with the multimodal input data, and using a large-language model (LLM) to generate speech data for the NPC. Further processing using reverse diffusion generates face vertex displacement data and joint trajectory data based on the combined representation and generated speech data.
    Type: Application
    Filed: June 20, 2024
    Publication date: December 26, 2024
    Inventors: Karthik Mohan Kumar, Michael Mantor, Pedro Antonio Pena, Archana Ramalingam
  • Publication number: 20240424407
    Abstract: Systems and techniques for generating and animating non-player characters (NPCs) within virtual digital environments are provided. Multimodal input data is received that comprises a plurality of input modalities for interaction with an NPC having a set of body features and a set of facial features. The multimodal input data is processed through one or more neural networks to generate animation sequences for both the body features and facial features of the NPC. Generating such animation sequences includes disentangling the multimodal input data to generate substantially disentangled latent representations, combining these representations with the multimodal input data, and using a large-language model (LLM) to generate speech data for the NPC. Further processing using reverse diffusion generates face vertex displacement data and joint trajectory data based on the combined representation and generated speech data.
    Type: Application
    Filed: June 20, 2024
    Publication date: December 26, 2024
    Inventors: Karthik Mohan Kumar, Michael Mantor, Pedro Antonio Pena, Archana Ramalingam
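The three related applications above share one pipeline: disentangle multimodal input into latent representations, combine those latents with the raw input, generate speech data, then run reverse diffusion to produce face vertex displacements and joint trajectories. A toy Python sketch of the reverse-diffusion step follows; the denoising rule, step count, and noise scale are arbitrary stand-ins for the patent's learned networks, not its actual method.

```python
import random

def reverse_diffusion(cond, steps=10, dim=4, seed=0):
    """Toy reverse-diffusion loop: start from Gaussian noise and iteratively
    denoise toward a conditioning vector (standing in for the combined
    latent representation plus generated speech features)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(steps, 0, -1):
        alpha = t / steps
        # Step-dependent move toward the conditioning signal, with shrinking
        # noise -- a stand-in for a learned denoising network.
        x = [xi + (1 - alpha) * (ci - xi) + rng.gauss(0.0, 0.01 * alpha)
             for xi, ci in zip(x, cond)]
    return x

def combined_representation(multimodal, latents):
    # Combine raw multimodal input with the (substantially) disentangled latents.
    return multimodal + latents

speech = [0.2, 0.4, 0.1, 0.3]                    # stand-in LLM speech features
combined = combined_representation([0.5, 0.5], [0.1, 0.9])
face_vertices = reverse_diffusion(speech, dim=4)  # face vertex displacement data
joint_traj = reverse_diffusion(combined, dim=4)   # joint trajectory data
```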
  • Patent number: 11094324
    Abstract: A method includes detecting a keyword within an audio stream. The keyword is one of multiple keywords in a database, in which each of the multiple keywords relates to at least one of multiple domains in the database. The database stores a first confidence weight for each of the multiple keywords that are related to a first domain among the multiple domains. Each first confidence weight indicates a probability that a corresponding keyword relates to the first domain. The method includes determining whether a first confidence weight of the keyword is at least equal to an activation threshold value associated with the first domain. The method includes, in response to the first confidence weight of the keyword meeting the activation threshold value, activating a DS-ASR engine corresponding with the first domain to perform speech-to-text conversion on the audio stream.
    Type: Grant
    Filed: May 14, 2019
    Date of Patent: August 17, 2021
    Assignee: Motorola Mobility LLC
    Inventors: Zhengping Ji, Leo S. Woiceshyn, Karthik Mohan Kumar, Yi Wu
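The keyword-to-domain activation logic in the abstract above can be illustrated with a small Python sketch. The keywords, weights, and thresholds are invented for illustration; a real system would populate the database from training data and actually launch the matching domain-specific ASR (DS-ASR) engine.

```python
# Hypothetical keyword-to-domain confidence weights and per-domain thresholds.
KEYWORD_WEIGHTS = {
    "navigation": {"route": 0.9, "traffic": 0.7, "play": 0.2},
    "music":      {"play": 0.8, "shuffle": 0.9, "route": 0.1},
}
ACTIVATION_THRESHOLDS = {"navigation": 0.6, "music": 0.6}

def domains_to_activate(keyword: str) -> list:
    """Return each domain whose confidence weight for the detected keyword
    is at least that domain's activation threshold; the caller would then
    activate the corresponding DS-ASR engine(s) on the audio stream."""
    active = []
    for domain, weights in KEYWORD_WEIGHTS.items():
        weight = weights.get(keyword)
        if weight is not None and weight >= ACTIVATION_THRESHOLDS[domain]:
            active.append(domain)
    return active
```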
  • Patent number: 11030994
    Abstract: A method and data processing device for detecting a communication between a first and second entity. The method includes identifying whether a previous communication between the first and second entity has been detected. In response to identifying that the previous communication between the first and second entity has been detected, the method determines an elapsed time since detection of the previous communication. The method predicts a topic of the communication, in part based on the determined elapsed time. The topic corresponds to a specific domain from among a plurality of available domains for automatic speech recognition (ASR) processing. The method triggers selection and activation of a first domain specific (DS) ASR engine from among a plurality of available DS ASR engines to utilize a smaller resource footprint than a general ASR engine and facilitate recognition of specific vocabulary and context, in part, based on the elapsed time since the previous communication.
    Type: Grant
    Filed: April 24, 2019
    Date of Patent: June 8, 2021
    Assignee: Motorola Mobility LLC
    Inventors: Zhengping Ji, Leo S. Woiceshyn, Karthik Mohan Kumar, Yi Wu, Thomas Y. Merrell
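The elapsed-time heuristic in the abstract above can be sketched as follows: if the same pair of entities communicated recently, predict the same topic domain (and, in a real system, activate the matching DS-ASR engine); otherwise fall back to the general engine. The class name, 300-second window, and domain labels are illustrative assumptions, not values from the patent.

```python
import time

class DomainPredictor:
    """Toy elapsed-time topic predictor (hypothetical names and values)."""

    def __init__(self, continuation_window_s=300.0):
        self.window = continuation_window_s
        self.last_seen = {}  # (entity, entity) -> (timestamp, domain)

    def record(self, a, b, domain, now=None):
        # Remember when the two entities last communicated, and about what.
        now = time.time() if now is None else now
        self.last_seen[tuple(sorted((a, b)))] = (now, domain)

    def predict_domain(self, a, b, now=None):
        now = time.time() if now is None else now
        prev = self.last_seen.get(tuple(sorted((a, b))))
        if prev is not None and now - prev[0] <= self.window:
            return prev[1]   # recent contact: likely a continuation of the topic
        return "general"     # no recent contact: use the general ASR engine

p = DomainPredictor()
p.record("alice", "bob", "sports", now=1000.0)
```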
  • Publication number: 20210021706
    Abstract: A method, a communication device, and a computer program product for identifying a live phone call. The method includes receiving, at a first communication device, an activation of a verification mode for a phone call. The method includes receiving, from a second communication device on the phone call, first audio data associated with the phone call. The method further includes determining, via a processor of the first communication device, if the first audio data contains machine originated audio, and in response to determining that the first audio data does not contain machine originated audio, generating and outputting an alert that the phone call is live.
    Type: Application
    Filed: July 17, 2019
    Publication date: January 21, 2021
    Inventors: Jarrett K. Simerson, Leo S. Woiceshyn, Karthik Mohan Kumar, Yi Wu, Thomas Y. Merrell
  • Patent number: 10887459
    Abstract: A method, a communication device, and a computer program product for identifying a live phone call. The method includes receiving, at a first communication device, an activation of a verification mode for a phone call. The method includes receiving, from a second communication device on the phone call, first audio data associated with the phone call. The method further includes determining, via a processor of the first communication device, if the first audio data contains machine originated audio, and in response to determining that the first audio data does not contain machine originated audio, generating and outputting an alert that the phone call is live.
    Type: Grant
    Filed: July 17, 2019
    Date of Patent: January 5, 2021
    Assignee: Motorola Mobility LLC
    Inventors: Jarrett K. Simerson, Leo S. Woiceshyn, Karthik Mohan Kumar, Yi Wu, Thomas Y. Merrell
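The live-call verification flow in the two entries above can be illustrated with a toy Python sketch. The detection heuristic here (treating a flat, low-variance frame-energy contour as machine-originated audio) and its threshold are invented purely for illustration; the patent does not specify this method, and a real detector would use a trained classifier over audio features.

```python
def is_machine_originated(frame_energies, variance_threshold=0.05):
    """Toy stand-in for a machine-audio detector: a flat, low-variance
    energy contour is treated as machine-originated (hypothetical heuristic)."""
    if not frame_energies:
        return True
    mean = sum(frame_energies) / len(frame_energies)
    variance = sum((e - mean) ** 2 for e in frame_energies) / len(frame_energies)
    return variance < variance_threshold

def verify_call(frame_energies, alert=print):
    """In verification mode, analyze the far end's audio and alert the user
    when the call appears live; returns True when an alert was raised."""
    if is_machine_originated(frame_energies):
        return False
    alert("Call is live: a person appears to have answered.")
    return True
```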
  • Publication number: 20200365148
    Abstract: A method includes detecting a keyword within an audio stream. The keyword is one of multiple keywords in a database, in which each of the multiple keywords relates to at least one of multiple domains in the database. The database stores a first confidence weight for each of the multiple keywords that are related to a first domain among the multiple domains. Each first confidence weight indicates a probability that a corresponding keyword relates to the first domain. The method includes determining whether a first confidence weight of the keyword is at least equal to an activation threshold value associated with the first domain. The method includes, in response to the first confidence weight of the keyword meeting the activation threshold value, activating a DS-ASR engine corresponding with the first domain to perform speech-to-text conversion on the audio stream.
    Type: Application
    Filed: May 14, 2019
    Publication date: November 19, 2020
    Inventors: Zhengping Ji, Leo S. Woiceshyn, Karthik Mohan Kumar, Yi Wu
  • Publication number: 20200342853
    Abstract: A method and data processing device for detecting a communication between a first and second entity. The method includes identifying whether a previous communication between the first and second entity has been detected. In response to identifying that the previous communication between the first and second entity has been detected, the method determines an elapsed time since detection of the previous communication. The method predicts a topic of the communication, in part based on the determined elapsed time. The topic corresponds to a specific domain from among a plurality of available domains for automatic speech recognition (ASR) processing. The method triggers selection and activation of a first domain specific (DS) ASR engine from among a plurality of available DS ASR engines to utilize a smaller resource footprint than a general ASR engine and facilitate recognition of specific vocabulary and context, in part, based on the elapsed time since the previous communication.
    Type: Application
    Filed: April 24, 2019
    Publication date: October 29, 2020
    Inventors: Zhengping Ji, Leo S. Woiceshyn, Karthik Mohan Kumar, Yi Wu, Thomas Y. Merrell