Patents Examined by Jesse S Pullias
-
Patent number: 12229527
Abstract: Systems and methods are described for providing subtitles for a media content item. Subtitles are obtained, using control circuitry, for the media content item. Control circuitry determines whether a character component of the subtitles should be replaced by an image component. In response to determining that the character component of the subtitles should be replaced by an image component, control circuitry selects, from memory, an image component corresponding to the character component. Control circuitry replaces the character component of the subtitles by the image component to generate modified subtitles.
Type: Grant
Filed: November 22, 2023
Date of Patent: February 18, 2025
Assignee: Adeia Guides Inc.
Inventors: Ankur Anil Aher, Charishma Chundi
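A minimal sketch of the replacement step the abstract describes: a character component (here a word) in a subtitle line is looked up in an in-memory store and swapped for an image component. The store contents, function names, and token-level matching rule are illustrative assumptions, not details from the patent.

```python
# Toy store mapping character components to image components held in memory.
IMAGE_STORE = {
    "heart": "img/heart.png",
    "fire": "img/fire.png",
}

def should_replace(word: str) -> bool:
    """Decide whether a subtitle word has an image equivalent in the store."""
    return word.lower().strip(".,!?") in IMAGE_STORE

def replace_with_images(subtitle_line: str) -> list:
    """Return modified subtitles as a list of text and image components."""
    modified = []
    for word in subtitle_line.split():
        key = word.lower().strip(".,!?")
        if should_replace(word):
            modified.append({"image": IMAGE_STORE[key]})  # replaced component
        else:
            modified.append({"text": word})               # unchanged text
    return modified

print(replace_with_images("My heart is on fire"))
```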
-
Patent number: 12229524
Abstract: Methods and systems are described herein for efficiently labeling user utterances, which may encompass any communication received from a user within a conversational interaction, and identifying novel user intents for large amounts of data. A machine learning model may be used, which is trained on embeddings of utterance data, and which may employ methods like prototypical networks and hierarchical local binary classification for hierarchical multi-label multi-class classification.
Type: Grant
Filed: August 9, 2022
Date of Patent: February 18, 2025
Assignee: Capital One Services, LLC
Inventor: Isha Chaturvedi
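The abstract mentions prototypical networks over utterance embeddings. A minimal nearest-prototype classification sketch, assuming mean-pooled prototypes and Euclidean distance over toy two-dimensional embeddings; it does not reproduce the patent's hierarchical multi-label scheme or novel-intent detection.

```python
import numpy as np

def build_prototypes(embeddings: np.ndarray, labels: list) -> dict:
    """Average the embeddings of each intent label into a prototype vector."""
    protos = {}
    for intent in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == intent]
        protos[intent] = embeddings[idx].mean(axis=0)
    return protos

def classify(utterance_emb: np.ndarray, protos: dict) -> str:
    """Assign the intent whose prototype is nearest in Euclidean distance."""
    return min(protos, key=lambda k: np.linalg.norm(utterance_emb - protos[k]))

# toy 2-D embeddings for two intents
embs = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
labels = ["greeting", "greeting", "refund", "refund"]
protos = build_prototypes(embs, labels)
print(classify(np.array([0.95, 0.05]), protos))  # -> "refund"
```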
-
Patent number: 12230279
Abstract: Techniques for authenticating users at devices that interact with the users via voice input. For instance, the described techniques may allow a voice-input device to safely verify the identity of a user by engaging in a back-and-forth conversation. The device or another device coupled thereto may then verify the accuracy of the responses from the user during the conversation, as well as compare an audio signature associated with the user's responses to a pre-stored audio signature associated with the user. By utilizing multiple checks, the described techniques are able to accurately and safely authenticate the user based solely on an audible conversation between the user and the voice-input device.
Type: Grant
Filed: August 6, 2021
Date of Patent: February 18, 2025
Assignee: Amazon Technologies, Inc.
Inventor: Preethi Parasseri Narayanan
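A hedged sketch of the "multiple checks" idea: an answer to a challenge question must match, and the speaker's audio signature must be close to a pre-stored one. Representing voiceprints as fixed-length vectors compared by cosine similarity against an arbitrary 0.8 threshold is an assumption for illustration only.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def authenticate(answer: str, expected_answer: str,
                 voiceprint: np.ndarray, stored_voiceprint: np.ndarray,
                 threshold: float = 0.8) -> bool:
    """Pass only if both the response content and the audio signature match."""
    content_ok = answer.strip().lower() == expected_answer.strip().lower()
    voice_ok = cosine_similarity(voiceprint, stored_voiceprint) >= threshold
    return content_ok and voice_ok

print(authenticate("blue", "Blue",
                   np.array([0.9, 0.1, 0.4]), np.array([0.85, 0.15, 0.42])))
```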
-
Patent number: 12229208
Abstract: A method for determining a category responsive to a user query is disclosed. The method includes receiving a training data set comprising a plurality of data pairs, each data pair including: (i) a query; and (ii) an associated one or more categories that are responsive to the query, wherein the one or more categories in the training data set defines a plurality of categories. The method includes training a machine learning algorithm, according to the training data set, to create a trained model, wherein training the machine learning algorithm includes: creating a first co-occurrence data structure defining co-occurrence of respective word representations of the queries with the plurality of categories, and creating a second co-occurrence data structure defining co-occurrence of respective categories in respective data pairs. The method also includes deploying the trained model to return one or more categories in response to a new query input.
Type: Grant
Filed: September 28, 2021
Date of Patent: February 18, 2025
Assignee: Home Depot Product Authority, LLC
Inventors: Ali Ahmadvand, Surya Kallumadi, Faizan Javed
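A small sketch of the two co-occurrence structures the abstract names: one counts how often query words appear with categories, the other how often categories appear together within the same data pair. The toy retail-style data, counting scheme, and scoring function are assumptions, not the patented training procedure.

```python
from collections import Counter
from itertools import combinations

training_pairs = [
    ("cordless drill", ["Power Tools"]),
    ("drill bits", ["Power Tools", "Tool Accessories"]),
    ("garden hose", ["Outdoors"]),
]

word_category = Counter()      # first structure: word x category counts
category_category = Counter()  # second structure: category x category counts

for query, categories in training_pairs:
    for word in query.split():
        for cat in categories:
            word_category[(word, cat)] += 1
    for a, b in combinations(sorted(categories), 2):
        category_category[(a, b)] += 1

def score_categories(query: str) -> Counter:
    """Score candidate categories for a new query from word co-occurrence."""
    scores = Counter()
    for word in query.split():
        for (w, cat), count in word_category.items():
            if w == word:
                scores[cat] += count
    return scores

print(score_categories("drill"))  # "Power Tools" ranks highest
```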
-
Patent number: 12229496
Abstract: A computer-implemented method for counterfactual conversation simulation is disclosed. The computer-implemented method includes generating a system output based, at least in part, on a user input. The computer-implemented method further includes determining that a system output/user input pair is not satisfactory based, at least in part, on a system output/user input score being below a predetermined threshold. The computer-implemented method further includes generating, in response to determining the system output/user input pair is not satisfactory, a counterfactual simulation of the user input based, at least in part, on a target intent of the user input.
Type: Grant
Filed: December 3, 2021
Date of Patent: February 18, 2025
Assignee: International Business Machines Corporation
Inventors: Vera Liao, Yunfeng Zhang, Stephanie Houde
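A sketch of the described flow under toy assumptions: the pair score is simple word overlap, the threshold of 0.2 is arbitrary, and the counterfactual user input is drawn from templates keyed by target intent. The patent's actual scoring and generation models are not reproduced.

```python
TEMPLATES = {
    "check_balance": "What is my current account balance?",
    "transfer_funds": "Please move money between my accounts.",
}

def pair_score(system_output: str, user_input: str) -> float:
    """Toy satisfaction score: word overlap between the two turns."""
    a, b = set(system_output.lower().split()), set(user_input.lower().split())
    return len(a & b) / max(len(a | b), 1)

def simulate_counterfactual(system_output: str, user_input: str,
                            target_intent: str, threshold: float = 0.2):
    if pair_score(system_output, user_input) >= threshold:
        return None  # pair judged satisfactory, no simulation needed
    return TEMPLATES.get(target_intent)  # counterfactual user input

print(simulate_counterfactual("I can help with transfers.",
                              "Um, the thing with the money?",
                              "transfer_funds"))
```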
-
Patent number: 12223015
Abstract: A computer-implemented method includes receiving a document insight request that requests document insights for a corpus of documents. The document insight request includes the corpus of documents, a set of entities contained within each document of the corpus of documents, and document insight request parameters that includes a confidence value threshold. The method also includes generating the document insights for the corpus of documents based on the confidence value threshold. Here, the document insights include an accuracy target and a user review rate target. The method also includes transmitting the document insights to the user device causing a graphical user interface to display the document insights on the user device.
Type: Grant
Filed: February 16, 2022
Date of Patent: February 11, 2025
Assignee: GOOGLE LLC
Inventors: Emmanouil Koukoumidis, Nikolaos Kofinas, Evan Huang, Kiran Bellare, Xiao Liu, Michael Lanning, Lukas Rutishauser
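A simplified sketch of how a confidence value threshold could yield the two insight figures the abstract names: an accuracy target for extractions kept automatically and a user review rate for those routed to a human. The per-field confidences, correctness flags, and formulas are illustrative assumptions, not the patented computation.

```python
def document_insights(confidences, correct_flags, threshold):
    kept = [c >= threshold for c in confidences]
    auto = [ok for k, ok in zip(kept, correct_flags) if k]
    accuracy_target = sum(auto) / len(auto) if auto else 1.0
    review_rate_target = 1.0 - sum(kept) / len(kept)
    return {"accuracy_target": accuracy_target,
            "user_review_rate_target": review_rate_target}

# toy per-entity confidences and whether each extraction was actually correct
confs = [0.95, 0.91, 0.62, 0.40, 0.88]
correct = [True, True, False, False, True]
print(document_insights(confs, correct, threshold=0.8))
```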
-
Patent number: 12223963
Abstract: A method of a local recognition system controlling a host device to perform one or more operations is provided. The method includes receiving, by the local recognition system, a query, performing speech recognition on the received query by implementing, by the local recognition system, a local language context comprising a set of words comprising descriptions in terms of components smaller than the words, and performing speech recognition, using the local language context, to create a transcribed query. Further, the method includes controlling the host device in dependence upon the speech recognition performed on the transcribed query.
Type: Grant
Filed: June 12, 2020
Date of Patent: February 11, 2025
Assignee: SoundHound AI IP, LLC
Inventors: Keyvan Mohajer, Timothy Stonehocker, Bernard Mont-Reynaud
-
Patent number: 12223945
Abstract: A speech recognition method is provided. The method may include: obtaining speech data and a speech recognition result of the speech data, the speech data including speech of a plurality of speakers, and the speech recognition result including a plurality of words; determining speaking time of each of the plurality of speakers by processing the speech data; determining, based on the speaking times of the plurality of speakers and the speech recognition result, a corresponding relationship between the plurality of words and the plurality of speakers; determining, based on the corresponding relationship, at least one conversion word from the plurality of words, each of the at least one conversion word corresponding to at least two of the plurality of speakers; and re-determining the corresponding relationship between the plurality of words and the plurality of speakers based on the at least one conversion word.
Type: Grant
Filed: April 23, 2022
Date of Patent: February 11, 2025
Assignee: HITHINK ROYALFLUSH INFORMATION NETWORK CO., LTD.
Inventors: Jinlong Wang, Xinkang Xu, Xinhui Hu
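A sketch of the word-to-speaker assignment and conversion-word detection steps, assuming word-level timestamps and interval overlap as the matching rule; the speaking times and words are toy data, and the re-determination step is only indicated by collecting the conversion words.

```python
speaker_times = {                 # speaking intervals per speaker (seconds)
    "A": [(0.0, 2.0), (4.0, 6.0)],
    "B": [(1.8, 4.2)],
}
words = [("hello", 0.2, 0.6), ("there", 1.9, 2.1), ("friend", 4.5, 4.9)]

def speakers_for(start, end):
    """Return every speaker whose intervals overlap the word's time span."""
    hit = []
    for spk, spans in speaker_times.items():
        if any(s < end and e > start for s, e in spans):
            hit.append(spk)
    return hit

assignment, conversion_words = {}, []
for word, start, end in words:
    spks = speakers_for(start, end)
    if len(spks) == 1:
        assignment[word] = spks[0]
    else:
        conversion_words.append(word)   # corresponds to two or more speakers

print(assignment)         # {'hello': 'A', 'friend': 'A'}
print(conversion_words)   # ['there'] would be revisited in the second pass
```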
-
Patent number: 12217747
Abstract: Disclosed is an electronic device including a communication interface, a memory, a microphone, a speaker, a display, a main processor, and a sub-processor activating the main processor by recognizing a wake-up word included in a voice input. The at least one memory stores instructions that, when executed, cause the main processor to receive a first voice input to register the wake-up word, when the first voice input does not include a specified word, to receive a second voice input including a word identical to the first voice input, through the microphone, to generate a wake-up word recognition model for recognizing the wake-up word, and to store the generated wake-up word recognition model in the at least one memory, and when the first voice input includes the specified word, to output information for requesting a third voice input, through the speaker or the display.
Type: Grant
Filed: August 23, 2019
Date of Patent: February 4, 2025
Assignee: Samsung Electronics Co., Ltd.
Inventors: Euisuk Chung, Sangki Kang, Sunghwan Baek, Seokyeong Jung, Kyungtae Kim
-
Patent number: 12217760
Abstract: A method for audio processing includes receiving a recording of a teleconference among multiple participants over a network, including an audio stream containing speech uttered by the participants and information outside the audio stream. The method further includes processing the audio stream to identify speech segments interspersed with intervals of silence, extracting speaker identifications from the information outside the audio stream in the received recording, labeling a first set of the identified speech segments from the audio stream with the speaker identifications, extracting acoustic features from the speech segments in the first set, learning a correlation between the speaker identifications labelled to the segments in the first set and the extracted acoustic features, and labeling a second set of the identified speech segments using the learned correlation, to indicate the participants who spoke during the speech segments in the second set.
Type: Grant
Filed: January 30, 2022
Date of Patent: February 4, 2025
Assignee: GONGIO Ltd.
Inventors: Eilon Reshef, Hanan Shteingart, Zohar Shay, Shlomi Medalion
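A hedged sketch of the two-pass idea: segments labeled from out-of-band speaker identifications train a simple model over acoustic features, which then labels the remaining segments. The two-dimensional "features" and the nearest-centroid model are stand-ins for the learned correlation, not the patented method.

```python
import numpy as np

# first set: segments with speaker IDs taken from information outside the audio
labeled = [
    (np.array([0.9, 0.1]), "alice"),
    (np.array([0.8, 0.2]), "alice"),
    (np.array([0.1, 0.9]), "bob"),
]
# second set: segments lacking a speaker identification
unlabeled = [np.array([0.85, 0.15]), np.array([0.2, 0.8])]

# learn a per-speaker centroid from the labeled acoustic features
centroids = {}
for spk in {s for _, s in labeled}:
    feats = np.array([f for f, s in labeled if s == spk])
    centroids[spk] = feats.mean(axis=0)

# label the second set by nearest centroid
labels = [min(centroids, key=lambda k: np.linalg.norm(f - centroids[k]))
          for f in unlabeled]
print(labels)  # ['alice', 'bob']
```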
-
Patent number: 12210838
Abstract: A computer-implemented method is provided for estimating output confidence of a black box Application Programming Interface (API). The method includes generating paraphrases for an input text. The method further includes calculating a distance between the input text and each respective one of the paraphrases. The method also includes sorting the paraphrases in ascending order of the distance. The method additionally includes selecting a top predetermined number of the paraphrases. The method further includes inputting the input text and the selected paraphrases into the API to obtain an output confidence score for each of the input text and the selected paraphrases. The method also includes estimating, by a hardware processor, the output confidence of the input text from a robustness of output scores of the input text and the selected paraphrases.
Type: Grant
Filed: August 15, 2023
Date of Patent: January 28, 2025
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Yohei Ikawa, Issei Yoshida, Sachiko Yoshihama, Miki Ishikawa, Kohichi Kamijoh
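A sketch of the estimation loop under stated assumptions: the distance is a toy Jaccard-based lexical measure, the black-box API is simulated, and "robustness" is computed as the mean score minus its spread. All three are placeholders for the unspecified components in the abstract.

```python
import statistics

def distance(a: str, b: str) -> float:
    """Toy lexical distance: 1 minus the Jaccard overlap of the word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(wa & wb) / len(wa | wb)

def black_box_api(text: str) -> float:
    """Stand-in for the external API's output confidence score."""
    return 0.9 if "refund" in text.lower() else 0.6

def estimate_confidence(input_text: str, paraphrases: list, top_k: int = 2):
    ranked = sorted(paraphrases, key=lambda p: distance(input_text, p))
    selected = ranked[:top_k]                       # closest paraphrases
    scores = [black_box_api(t) for t in [input_text] + selected]
    # one possible robustness measure: mean score penalized by its spread
    return statistics.mean(scores) - statistics.pstdev(scores)

paraphrases = ["I want my money back", "Please refund my order", "Cancel it"]
print(estimate_confidence("I would like a refund", paraphrases))
```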
-
Patent number: 12204866
Abstract: Techniques for conversational-based searching are described. A system may receive a first spoken user input, and may determine that the first spoken user input corresponds to a request for information associated with an entity. The system may retrieve item results corresponding to the entity. The system may determine a suggested user input based on the retrieved item results, and may determine output data corresponding to the suggested user input. The system may send output data to a user device, where the output data includes the item results and the suggested user input. The system may receive a second spoken user input, and may determine that the second spoken user input corresponds to the suggested user input. In response, the system may send the previously determined output data to the device.
Type: Grant
Filed: September 10, 2021
Date of Patent: January 21, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Srinivasa Sandeep Atluri, Constantin Daniel Marcu, Kevin Small, Kemal Oral Cansizlar, Vijit Singh, Li Zhou, Aritra Biswas, Bhanu Pratap Jain
-
Patent number: 12204856
Abstract: Data such as unstructured text is received that includes a sequence of sentences. This received data is then tokenized into a plurality of tokens. The received data is segmented using a hierarchical transformer network model including a token transformer, a sentence transformer, and a segmentation classifier. The token transformer contextualizes tokens within sentences and yields sentence embeddings. The sentence transformer contextualizes sentence representations based on the sentence embeddings. The segmentation classifier predicts segments of the received data based on the contextualized sentence representations. Data can be provided which characterizes the segmentation of the received data. Related apparatus, systems, techniques and articles are also described.
Type: Grant
Filed: September 23, 2021
Date of Patent: January 21, 2025
Assignee: Educational Testing Service
Inventors: Swapna Somasundaran, Goran Glavaš
-
Patent number: 12204862
Abstract: Systems and methods are provided for generating and training a relation extraction model configured to extract document-level relations. Systems obtain a knowledge database that comprises a plurality of entity tuples and a plurality of relation types, use the knowledge database to generate annotated relation instances based on relation instances that are identified in a set of unlabeled text, generate a training dataset comprising the annotated relation instances and the set of unlabeled text, and generate the machine learning model via modular self-supervision. Systems and methods are also provided for using a relation extraction model to extract document-level relations in specific use scenarios, such as for extracting drug response relations from full-text medical research articles.
Type: Grant
Filed: July 16, 2021
Date of Patent: January 21, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventors: Sheng Zhang, Cliff Richard Wong, Naoto Usuyama, Sarthak Jain, Tristan Josef Naumann, Hoifung Poon
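A toy annotation step in the spirit of the abstract: entity tuples from a knowledge database are used to label sentences in unlabeled text that mention both entities, producing annotated relation instances for training. The example tuple and the simple string-matching rule are illustrative, and the modular self-supervision itself is not shown.

```python
# hypothetical knowledge-base tuples: (drug, gene, relation type)
knowledge_base = [("gefitinib", "EGFR", "sensitizes")]
unlabeled_sentences = [
    "Patients with EGFR mutations responded to gefitinib.",
    "The weather was mild in the trial region.",
]

annotated = []
for drug, gene, relation in knowledge_base:
    for sent in unlabeled_sentences:
        # annotate a relation instance when both entities co-occur
        if drug.lower() in sent.lower() and gene.lower() in sent.lower():
            annotated.append({"sentence": sent, "drug": drug,
                              "gene": gene, "relation": relation})

print(annotated)  # training data pairing the sentence with the relation label
```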
-
Patent number: 12198671
Abstract: In some implementations, a language proficiency of a user of a client device is determined by one or more computers. The one or more computers then determines a text segment for output by a text-to-speech module based on the determined language proficiency of the user. After determining the text segment for output, the one or more computers generates audio data including a synthesized utterance of the text segment. The audio data including the synthesized utterance of the text segment is then provided to the client device for output.
Type: Grant
Filed: April 28, 2023
Date of Patent: January 14, 2025
Assignee: Google LLC
Inventors: Matthew Sharifi, Jakob Nicolaus Foerster
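A toy sketch of the text-selection step only: the same information is held in several phrasings, and the one matching the user's determined proficiency is handed to a text-to-speech module. The proficiency levels and phrasings are invented for illustration, and the synthesis step itself is not shown.

```python
# candidate phrasings of the same content, keyed by assumed proficiency levels
RESPONSES = {
    "beginner": "Your bus comes at 5 pm.",
    "advanced": "The next bus on your route is scheduled to depart at 5 pm.",
}

def select_text_segment(proficiency: str) -> str:
    """Pick the phrasing suited to the user's language proficiency."""
    return RESPONSES.get(proficiency, RESPONSES["beginner"])

text = select_text_segment("advanced")
print(text)  # this segment would then be synthesized and sent to the client
```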
-
Patent number: 12190863
Abstract: Automated systems and methods are provided for processing speech, comprising obtaining a trained machine learning model that has been trained using a cumulative historical data structure corresponding to at least one digitally-encoded speech representation for a plurality of telecommunications interactions conducted by a plurality of agent-side participants, which includes a first data corresponding to a score variable and a second data corresponding to a plurality of driver variables; applying the trained machine learning model: to a subset of data in the cumulative historical data structure that corresponds to a first agent-side participant of the plurality of agent-side participants, to generate a performance classification score and/or a performance direction classification score, to identify an intervention-target agent-side participant from among the plurality of agent-side participants, and to the cumulative historical data structure to identify an intervention training plan; and conducting at least on
Type: Grant
Filed: May 23, 2022
Date of Patent: January 7, 2025
Assignee: Conduent Business Services, LLC
Inventors: Dennis F. Quebe, Jian Feng, Ambrish Gupta, Ashwin Subramanyam
-
Patent number: 12182526
Abstract: Implementations relate to effectively localizing system responses, that include dynamic information, to target language(s), such that the system responses are grammatical and/or natural in the target language(s). Some of those implementations relate to various techniques for resource efficient generation of templates for a target language. Some versions of those implementations relate to resource efficient generation of target language natural language generation (NLG) templates and, more particularly, to techniques that enable a human user to generate a target language NLG template more efficiently and/or with greater accuracy. The more efficient target language NLG template generation enables less utilization of various client device resources and/or can mitigate the risk of flawed NLG templates being provided for live use in one or more systems.
Type: Grant
Filed: May 12, 2021
Date of Patent: December 31, 2024
Assignee: GOOGLE LLC
Inventors: Katherine Vadella, Joshua Andrews, Max Copperman, Gabrielle Gayles, Shanjian Li, Jieyu Lu, Luchuan Xu
-
Patent number: 12182183
Abstract: The present application provides a robot response method, apparatus, device and storage medium. The method includes: obtaining, by a robot, current query voice; extracting semantic information of the current query voice; matching the semantic information of the current query voice with multiple semantic information clusters stored in advance to get a matched target semantic information cluster, where each semantic information cluster includes: at least one Q&A instance, and each Q&A instance includes: semantic information corresponding to a historical query voice and a query question selected in a query list corresponding to the historical query voice; and obtaining, by the robot, the number of times each query question was selected in the target semantic information cluster, determining, according to the number of times each query question was selected, a target query question corresponding to the current query voice, and outputting a query response corresponding to the target query question.
Type: Grant
Filed: April 20, 2020
Date of Patent: December 31, 2024
Assignee: JINGDONG TECHNOLOGY HOLDING CO., LTD.
Inventor: Yuyu Zheng
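A simplified sketch of the matching-and-counting logic: the current query's semantic information is matched to a stored cluster of Q&A instances, and the query question selected most often in that cluster drives the response. The word-overlap similarity, cluster contents, and response format are toy stand-ins for the robot's actual pipeline.

```python
from collections import Counter

semantic_clusters = {
    "billing": [
        {"semantics": "pay my bill", "selected_question": "How do I pay my bill?"},
        {"semantics": "bill payment", "selected_question": "How do I pay my bill?"},
        {"semantics": "wrong charge", "selected_question": "Why was I overcharged?"},
    ],
}

def match_cluster(query_semantics: str) -> str:
    """Pick the cluster whose instances share the most words with the query."""
    def overlap(cluster):
        words = set(query_semantics.split())
        return sum(len(words & set(i["semantics"].split())) for i in cluster)
    return max(semantic_clusters, key=lambda k: overlap(semantic_clusters[k]))

def respond(query_semantics: str) -> str:
    cluster = semantic_clusters[match_cluster(query_semantics)]
    counts = Counter(i["selected_question"] for i in cluster)
    target_question, _ = counts.most_common(1)[0]   # most-selected question
    return f"Answering: {target_question}"

print(respond("how can I pay the bill"))
```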
-
Patent number: 12183320
Abstract: A method for generating synthetic speech for text through a user interface is provided. The method may include receiving one or more sentences, determining a speech style characteristic for the received one or more sentences, and outputting a synthetic speech for the one or more sentences that reflects the determined speech style characteristic. The one or more sentences and the determined speech style characteristic may be inputted to an artificial neural network text-to-speech synthesis model and the synthetic speech may be generated based on the speech data outputted from the artificial neural network text-to-speech synthesis model.
Type: Grant
Filed: January 20, 2021
Date of Patent: December 31, 2024
Assignee: NEOSAPIENCE, INC.
Inventors: Taesu Kim, Younggun Lee
-
Patent number: 12177379
Abstract: A centralized and robust threat assessment tool is disclosed to perform comprehensive analysis of previously-stored and subsequent communication data, activity data, and other relevant information relating to inmates within a controlled environment facility. As part of the analysis, the system detects certain keywords and key interactions within the dataset in order to identify particular criminal proclivities of the inmate. Based on the identified proclivities, the system assigns threat scores to the inmate that represent a relative likelihood that the inmate will carry out or be drawn to certain threats and/or criminal activities. This analysis provides a predictive tool for assessing an inmate's ability to rehabilitate. Based on the analysis, remedial measures can be taken in order to correct an inmate's trajectory within the controlled environment and increase the likelihood of successful rehabilitation, as well as to prevent potential criminal acts.
Type: Grant
Filed: January 5, 2023
Date of Patent: December 24, 2024
Assignee: Global Tel*Link Corporation
Inventor: Mitch Volkart
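A minimal keyword-count sketch of the scoring idea only: stored communications are scanned for category keywords and the counts are normalized into relative scores. The keyword lists, categories, and normalization are illustrative assumptions and do not reflect the patented analysis or its other data sources.

```python
# assumed keyword categories; the real system's lexicons are not public here
KEYWORDS = {
    "contraband": {"smuggle", "package", "drop"},
    "violence": {"fight", "weapon"},
}

def threat_scores(messages: list) -> dict:
    """Count category keywords across messages and normalize into scores."""
    counts = {category: 0 for category in KEYWORDS}
    for msg in messages:
        words = set(msg.lower().split())
        for category, keys in KEYWORDS.items():
            counts[category] += len(words & keys)
    total = sum(counts.values()) or 1
    return {category: counts[category] / total for category in counts}

sample = ["they will drop the package tomorrow", "no fight this week"]
print(threat_scores(sample))  # relative score per category, summing to 1
```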