Patents Examined by Bhavesh M. Mehta
-
Patent number: 11842159
Abstract: Techniques for interpreting a text classifier model are described. An exemplary method includes receiving a request to interpret the text classifier; receiving input text to be used to interpret the text classifier; interpreting the text classifier using the input text and masked input text to determine, as requested, two or more of a counterfactual score, an importance score, and a bias score for the received input text or an aspect thereof; and providing the determined scores to a requester.
Type: Grant
Filed: March 16, 2021
Date of Patent: December 12, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Sawan Kumar, Kalpit Dixit, Syed Kashif Hussain Shah
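A minimal sketch of the masked-input idea behind importance scoring: mask one token at a time and measure how far the classifier's score drops. The bag-of-words `toy_classifier` is a hypothetical stand-in, not the patented model.

```python
def toy_classifier(tokens):
    # Stand-in classifier: fraction of tokens that are "positive" keywords.
    positive = {"great", "good", "excellent"}
    return sum(t in positive for t in tokens) / max(len(tokens), 1)

def importance_scores(tokens, classify, mask="[MASK]"):
    # Importance of a token = drop in the classifier's score when that
    # token is replaced with a mask.
    base = classify(tokens)
    scores = {}
    for i, tok in enumerate(tokens):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        scores[tok] = base - classify(masked)  # larger drop => more important
    return scores

scores = importance_scores(["the", "movie", "was", "great"], toy_classifier)
```

Counterfactual and bias scores would follow the same mask-and-compare pattern with different substitutions.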
-
Patent number: 11837252
Abstract: The present invention discloses a speech emotion recognition method and system based on fused population information. The method includes the following steps: S1: acquiring a user's audio data; S2: preprocessing the audio data and obtaining a Mel spectrogram feature; S3: cutting off the front and rear mute segments of the Mel spectrogram feature; S4: obtaining population depth feature information through a population classification network; S5: obtaining Mel spectrogram depth feature information through a Mel spectrogram preprocessing network; S6: fusing the population depth feature information and the Mel spectrogram depth feature information through SENet to obtain fused information; and S7: obtaining an emotion recognition result from the fused information through a classification network.
Type: Grant
Filed: June 21, 2022
Date of Patent: December 5, 2023
Assignee: Zhejiang Lab
Inventors: Taihao Li, Shukai Zheng, Yulong Liu, Guanxiong Pei, Shijie Ma
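Step S3 (trimming mute segments) can be sketched with an energy threshold over spectrogram frames; the -40 dB threshold is an assumed value, not taken from the patent.

```python
import numpy as np

def trim_mute_segments(mel, threshold_db=-40.0):
    # Drop leading and trailing frames whose mean energy falls below the
    # threshold, keeping everything between the first and last active frame.
    frame_db = 10.0 * np.log10(np.maximum(mel.mean(axis=0), 1e-12))
    active = np.flatnonzero(frame_db > threshold_db)
    if active.size == 0:
        return mel[:, :0]  # all silence
    return mel[:, active[0]:active[-1] + 1]
```

Interior quiet frames are kept on purpose: pauses inside an utterance carry emotional information, only the edges are cut.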
-
Patent number: 11830478
Abstract: A learning device calculates a feature of each data included in a pair of datasets in which two modalities among a plurality of modalities are combined, using a model that receives data on a corresponding modality among the modalities and outputs a feature obtained by mapping the received data into an embedding space. The learning device then selects similar data similar to each target data that is data on a first modality in a first dataset of the datasets, from data on a second modality included in a second dataset of the datasets. The learning device further updates a parameter of the model such that the features of the data in the pair included in the first and the second datasets are similar to one another, and the feature of data paired with the target data is similar to the feature of data paired with the similar data.
Type: Grant
Filed: April 1, 2021
Date of Patent: November 28, 2023
Assignees: NIPPON TELEGRAPH AND TELEPHONE CORPORATION, MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Inventors: Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, James R. Glass, David Harwath
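The similar-data selection step amounts to a nearest-neighbor lookup in the shared embedding space. A sketch under that assumption, using cosine similarity:

```python
import numpy as np

def most_similar_index(target_emb, candidate_embs):
    # Pick the second-modality candidate whose embedding is closest to the
    # target's in the shared embedding space (cosine similarity argmax).
    t = target_emb / np.linalg.norm(target_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    return int(np.argmax(c @ t))
```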
-
Patent number: 11823669
Abstract: According to one embodiment, an information processing apparatus includes the following units. The first acquisition unit acquires speech data including frames. The second acquisition unit acquires a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of the likelihood of each of a plurality of classes including a component of a keyword and a component of background noise. The first calculation unit calculates a keyword score indicative of the occurrence probability of the component of the keyword. The second calculation unit calculates a background noise score indicative of the occurrence probability of the component of the background noise. The determination unit determines whether or not the speech data includes the keyword.
Type: Grant
Filed: February 28, 2020
Date of Patent: November 21, 2023
Assignee: KABUSHIKI KAISHA TOSHIBA
Inventors: Ning Ding, Hiroshi Fujimura
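The two score calculations can be sketched from the model's per-frame class posteriors; pooling by mean over frames and the zero margin are assumptions for illustration.

```python
import numpy as np

def group_score(posteriors, class_ids):
    # posteriors: (frames, classes) per-frame softmax outputs.
    # Score = mean over frames of the summed probability of the class group.
    return float(posteriors[:, class_ids].sum(axis=1).mean())

def contains_keyword(posteriors, keyword_ids, noise_ids, margin=0.0):
    kw = group_score(posteriors, keyword_ids)  # keyword score
    bg = group_score(posteriors, noise_ids)    # background noise score
    return kw - bg > margin
```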
-
Patent number: 11816432
Abstract: Disclosed embodiments may include a method that includes setting an influence level for each index that a neural network can accept in one sample to a same level for a neural network, receiving a training corpus including training input samples and corresponding correct training prediction samples, generating, using the neural network, prediction samples, identifying an accuracy for each index by comparing the prediction samples with the corresponding correct training prediction samples, adjusting the influence level for each index based on the accuracy for each index, identifying one or more poorly accurate indexes for the neural network, receiving a first input sample including one or more characters, generating one or more normalized first input samples by applying one or more buffers to the one or more poorly accurate indexes, and generating, using the neural network, a categorization of each character in the one or more normalized first input samples.
Type: Grant
Filed: February 9, 2021
Date of Patent: November 14, 2023
Assignee: CAPITAL ONE SERVICES, LLC
Inventors: Jeremy Edward Goodsitt, Galen Rafferty, Anh Truong, Austin Walters
-
Patent number: 11816442
Abstract: Machine classifiers in accordance with embodiments of the invention capture long-term temporal dependencies in the dialogue data better than the existing RNN-based architectures. Additionally, machine classifiers may model the joint distribution of the context and response as opposed to the conditional distribution of the response given the context as employed in sequence-to-sequence frameworks. Machine classifiers in accordance with embodiments further append random paddings before and/or after the input data to reduce the syntactic redundancy in the input data, thereby improving the performance of the machine classifiers for a variety of dialogue-related tasks. The random padding of the input data may further provide regularization during the training of the machine classifier and/or reduce exposure bias. In a variety of embodiments, the input data may be encoded based on subword tokenization.
Type: Grant
Filed: March 1, 2023
Date of Patent: November 14, 2023
Assignee: Capital One Services, LLC
Inventors: Oluwatobi Olabiyi, Erik T. Mueller, Rui Zhang
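The random-padding trick is simple to sketch: draw an independent pad length for each side of the sequence. The `max_pad` bound and pad token are assumed values.

```python
import random

def randomly_pad(token_ids, pad_id=0, max_pad=5, rng=None):
    # Prepend and append a random number of padding tokens, a regularization
    # trick to break up syntactic redundancy in the input during training.
    rng = rng or random.Random()
    return ([pad_id] * rng.randint(0, max_pad)
            + list(token_ids)
            + [pad_id] * rng.randint(0, max_pad))
```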
-
Patent number: 11817087
Abstract: Systems and methods for distributing cloud-based language processing services to partially execute in a local device to reduce latency perceived by the user. For example, a local device may receive a request via audio input that requires a cloud-based service to process the request and generate a response. A partial response may be generated locally and played back while a more complete response is generated remotely.
Type: Grant
Filed: August 28, 2020
Date of Patent: November 14, 2023
Assignee: Micron Technology, Inc.
Inventor: Ameen D. Akel
-
Patent number: 11810568
Abstract: A computer-implemented method for transcribing an utterance includes receiving, at a computing system, speech data that characterizes an utterance of a user. A first set of candidate transcriptions of the utterance can be generated using a static class-based language model that includes a plurality of classes that are each populated with class-based terms selected independently of the utterance or the user. The computing system can then determine whether the first set of candidate transcriptions includes class-based terms. Based on whether the first set of candidate transcriptions includes class-based terms, the computing system can determine whether to generate a dynamic class-based language model that includes at least one class that is populated with class-based terms selected based on a context associated with at least one of the utterance and the user.
Type: Grant
Filed: December 10, 2020
Date of Patent: November 7, 2023
Assignee: Google LLC
Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
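The static/dynamic distinction comes down to where the class's term list comes from. A toy sketch of expanding a class tag in a candidate transcription (the tag syntax and term lists are illustrative, not from the patent):

```python
def expand_class_terms(hypothesis, class_terms):
    # Expand a class tag (e.g. "<contact>") in a candidate transcription into
    # one concrete hypothesis per class-based term. A static model ships a
    # fixed term list; a dynamic model populates it from the user's context.
    for tag, terms in class_terms.items():
        if tag in hypothesis:
            return [hypothesis.replace(tag, term) for term in terms]
    return [hypothesis]

# e.g. populated on the fly from the user's address book:
dynamic_classes = {"<contact>": ["Alice", "Bob"]}
```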
-
Patent number: 11810557
Abstract: Techniques are described herein for enabling the use of "dynamic" or "context-specific" hot words to invoke an automated assistant. In various implementations, an automated assistant may be executed in a default listening state at least in part on a user's computing device(s). While in the default listening state, audio data captured by microphone(s) may be monitored for default hot words. Detection of the default hot word(s) transitions the automated assistant into a speech recognition state. Sensor signal(s) generated by hardware sensor(s) integral with the computing device(s) may be detected and analyzed to determine an attribute of the user. Based on the analysis, the automated assistant may transition into an enhanced listening state in which the audio data may be monitored for enhanced hot word(s). Detection of enhanced hot word(s) triggers the automated assistant to perform a responsive action without requiring detection of default hot word(s).
Type: Grant
Filed: February 19, 2022
Date of Patent: November 7, 2023
Assignee: GOOGLE LLC
Inventor: Diego Melendo Casado
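The listening states form a small state machine. A sketch with hypothetical state and event names (not from the patent):

```python
def next_state(state, event):
    # Illustrative transitions between the assistant's listening states.
    if state == "default" and event == "default_hot_word":
        return "speech_recognition"
    if state == "default" and event == "user_attribute_detected":
        return "enhanced"           # sensor analysis suggests engagement
    if state == "enhanced" and event == "enhanced_hot_word":
        return "responsive_action"  # no default hot word required
    return state
```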
-
Patent number: 11804214
Abstract: A system for generating compressed product titles that can be used in conversational transactions includes a computing device configured to obtain product title data characterizing descriptive product titles of products available on an ecommerce marketplace and to determine compressed product titles based on the product title data using a machine learning model that is pre-trained using a replaced-token detection task. The computing device also stores the compressed product titles for use during conversational transactions.
Type: Grant
Filed: February 26, 2021
Date of Patent: October 31, 2023
Assignee: Walmart Apollo, LLC
Inventors: Snehasish Mukherjee, Phani Ram Sayapaneni, Shankara Bhargava
-
Patent number: 11804231
Abstract: In some implementations, a user device may receive input that triggers transmission of information via sound. The user device may select an audio clip based on a setting associated with the device, and may modify a digital representation of the selected audio clip using an encoding algorithm and based on data associated with a user of the device. The user device may transmit, to a remote server, an indication of the selected audio clip, an indication of the encoding algorithm, and the data associated with the user. The user device may use a speaker to play audio, based on the modified digital representation, for recording by other devices. Accordingly, the user device may receive, from the remote server and based on the speaker playing the audio, a confirmation that users associated with the other devices have performed an action based on the data associated with the user of the device.
Type: Grant
Filed: July 2, 2021
Date of Patent: October 31, 2023
Assignee: Capital One Services, LLC
Inventor: Ian Fitzgerald
-
Patent number: 11804233
Abstract: A device includes one or more processors configured to perform signal processing including a linear transformation and a non-linear transformation of an input signal to generate a reference target signal. The reference target signal has a linear component associated with the linear transformation and a non-linear component associated with the non-linear transformation. The one or more processors are also configured to perform linear filtering of the input signal by controlling adaptation of the linear filtering to generate an output signal that substantially matches the linear component of the reference target signal.
Type: Grant
Filed: November 15, 2019
Date of Patent: October 31, 2023
Assignee: QUALCOMM Incorporated
Inventors: Lae-Hoon Kim, Dongmei Wang, Cheng-Yu Hung, Erik Visser
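Controlled adaptation of a linear filter toward a target can be sketched with the classic least-mean-squares (LMS) update; this is a generic adaptive-filter example, not the patented control scheme, and the filter length, step size, and data are assumed.

```python
import numpy as np

def lms_fit(x, d, taps=4, mu=0.05, iters=2000, seed=0):
    # Adapt FIR weights w so that w @ x[n-taps:n] tracks the target d[n]
    # (standard LMS: nudge w along the instantaneous error gradient).
    rng = np.random.default_rng(seed)
    w = np.zeros(taps)
    for _ in range(iters):
        n = int(rng.integers(taps, len(x)))
        window = x[n - taps:n][::-1]   # most recent sample first
        err = d[n] - w @ window
        w += mu * err * window
    return w

# Synthetic target: d is x passed through a known linear filter true_h,
# so the adapted weights should converge toward true_h.
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
true_h = np.array([0.5, -0.3, 0.1, 0.0])
d = np.array([true_h @ x[n - 4:n][::-1] if n >= 4 else 0.0 for n in range(500)])
w = lms_fit(x, d)
```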
-
Patent number: 11804211
Abstract: Implementations are directed to providing a voice bot development platform that enables a third-party developer to train a voice bot based on training instance(s). The training instance(s) can each include training input and training output. The training input can include a portion of a corresponding conversation and a prior context of the corresponding conversation. The training output can include a corresponding ground truth response to the portion of the corresponding conversation. Subsequent to training, the voice bot can be deployed for conducting conversations on behalf of a third-party. In some implementations, the voice bot is further trained based on a corresponding feature emphasis input that directs the voice bot's attention to a particular feature of the portion of the corresponding conversation. In some additional or alternative implementations, the voice bot is further trained to interact with third-party system(s) via remote procedure calls (RPCs).
Type: Grant
Filed: December 4, 2020
Date of Patent: October 31, 2023
Assignee: GOOGLE LLC
Inventors: Asaf Aharoni, Yaniv Leviathan, Eyal Segalis, Gal Elidan, Sasha Goldshtein, Tomer Amiaz, Deborah Cohen
-
Patent number: 11798562
Abstract: A speaker verification method includes receiving audio data corresponding to an utterance and processing the audio data to generate an evaluation attentive d-vector (ad-vector) representing voice characteristics of the utterance, the evaluation ad-vector including a number of style classes each including a respective value vector concatenated with a corresponding routing vector. The method also includes generating, using a self-attention mechanism, at least one multi-condition attention score that indicates a likelihood that the evaluation ad-vector matches a respective reference ad-vector associated with a respective user. The method also includes identifying the speaker of the utterance as the respective user associated with the respective reference ad-vector based on the multi-condition attention score.
Type: Grant
Filed: May 16, 2021
Date of Patent: October 24, 2023
Assignee: Google LLC
Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Yiling Huang, Mert Saglam
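The identification rule can be illustrated with plain cosine scoring between an evaluation vector and per-user reference vectors; this is a simplified stand-in for the patent's multi-condition attention, with the same argmax decision.

```python
import numpy as np

def match_scores(evaluation, references):
    # Softmax over cosine similarities between the evaluation d-vector and
    # each enrolled user's reference d-vector; the speaker is the argmax user.
    e = evaluation / np.linalg.norm(evaluation)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    sims = r @ e
    w = np.exp(sims - sims.max())  # numerically stable softmax
    return w / w.sum()
```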
-
Patent number: 11798538
Abstract: This disclosure relates to answer prediction in a speech processing system. The system may disambiguate entities spoken or implied in a request to initiate an action with respect to a target user. To initiate the action, the system may determine one or more parameters; for example, the target (e.g., a contact/recipient), a source (e.g., a caller/requesting user), and a network (voice over internet protocol (VOIP), cellular, video chat, etc.). Due to the privacy implications of initiating actions involving data transfers between parties, the system may apply a high threshold for a confidence associated with each parameter. Rather than ask multiple follow-up questions, which may frustrate the requesting user, the system may attempt to disambiguate or determine a parameter, and skip a question regarding the parameter if it can predict an answer with high confidence. The system can improve the customer experience while maintaining security for actions involving, for example, communications.
Type: Grant
Filed: September 21, 2020
Date of Patent: October 24, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Christopher Geiger Parker, Piyush Bhargava, Aparna Nandyal, Rajagopalan Ranganathan, Mugunthan Govindaraju, Vidya Narasimhan
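The skip-the-question logic reduces to a confidence gate per parameter. A minimal sketch, assuming candidate values arrive with confidence scores (the 0.9 threshold is an assumed value):

```python
def resolve_parameters(candidates, threshold=0.9):
    # For each action parameter (target, source, network, ...), accept the
    # top candidate only if its confidence clears a high threshold;
    # otherwise queue a follow-up question instead of guessing.
    resolved, follow_ups = {}, []
    for param, options in candidates.items():
        value, conf = max(options, key=lambda vc: vc[1])
        if conf >= threshold:
            resolved[param] = value
        else:
            follow_ups.append(param)
    return resolved, follow_ups
```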
-
Patent number: 11798534
Abstract: Embodiments described herein provide an Adapt-and-Adjust (A2) mechanism for a multilingual speech recognition model that combines both adaptation and adjustment methods as an integrated end-to-end training to improve the model's generalization and mitigate the long-tailed issue. Specifically, a multilingual language model, mBERT, is utilized and converted into an autoregressive transformer decoder. In addition, a cross-attention module is added to the encoder on top of mBERT's self-attention layer in order to explore the acoustic space in addition to the text space. The joint training of the encoder and mBERT decoder can bridge the semantic gap between the speech and the text.
Type: Grant
Filed: January 29, 2021
Date of Patent: October 24, 2023
Assignee: salesforce.com, inc.
Inventors: Guangsen Wang, Chu Hong Hoi, Genta Indra Winata
-
Patent number: 11798533
Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
Type: Grant
Filed: April 2, 2021
Date of Patent: October 24, 2023
Assignee: GOOGLE LLC
Inventors: Joseph Caroselli, Jr., Yiteng Huang, Arun Narayanan
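A spatial correlation matrix for a multichannel segment is the channel-by-channel covariance of the samples. A minimal sketch (mean removal is an assumed preprocessing step):

```python
import numpy as np

def spatial_correlation(frames):
    # frames: (channels, samples) multichannel audio segment.
    # Returns the channels x channels matrix R = X X^H / N that a
    # beamformer initialization would consume.
    x = frames - frames.mean(axis=1, keepdims=True)
    return (x @ x.conj().T) / x.shape[1]
```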
-
Patent number: 11790929
Abstract: According to an aspect, a WPE-based dereverberation apparatus using virtual acoustic channel expansion based on a deep neural network includes a signal reception unit for receiving as input a first speech signal through a single-channel microphone, a signal generation unit for generating a second speech signal by applying a virtual acoustic channel expansion algorithm based on a deep neural network to the first speech signal, and a dereverberation unit for removing reverberation of the first speech signal and generating a dereverberated signal from which the reverberation has been removed by applying a dual-channel weighted prediction error (WPE) algorithm based on a deep neural network to the first speech signal and the second speech signal.
Type: Grant
Filed: August 4, 2021
Date of Patent: October 17, 2023
Assignee: IUCF-HYU (INDUSTRY-UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY)
Inventors: Joon Hyuk Chang, Joon Young Yang
-
Patent number: 11790892
Abstract: A method includes capturing an event, analyzing the event to generate graphs, receiving a natural language utterance, identifying an entity and a command, modifying the graphs, and emitting an application prototype. An application prototyping server includes a processor and a memory storing instructions that, when executed by the processor, cause the server to capture an event, analyze the captured event to generate graphs, receive a natural language utterance, identify an entity and a command, modify the graphs, and emit an application prototype. A non-transitory computer readable medium contains program instructions that, when executed, cause a computer to: capture an event, analyze the captured event to generate graphs, receive a natural language utterance, identify an entity and a command, modify the graphs, and emit an application prototype.
Type: Grant
Filed: May 27, 2020
Date of Patent: October 17, 2023
Assignee: CDW LLC
Inventor: Joseph Kessler
-
Patent number: 11783810
Abstract: Illustrative embodiments provide a method and system for communicating air traffic control information. An audio signal comprising voice activity is received. Air traffic control information in the voice activity is identified using an artificial intelligence algorithm. A text transcript of the air traffic control information is generated and displayed on a confirmation display. Voice activity in the audio signal may be detected by identifying portions of the audio signal that comprise speech based on a comparison between the power spectrum of the audio signal and the power spectrum of noise, and forming speech segments comprising the portions of the audio signal that comprise speech.
Type: Grant
Filed: July 17, 2020
Date of Patent: October 10, 2023
Assignee: The Boeing Company
Inventors: Stephen Dame, Yu Qiao, Taylor A. Riccetti, David J. Ross, Joshua Welshmeyer, Matthew Sheridan-Smith, Su Ying Li, Zarrin Khiang-Huey Chua, Jose A. Medina, Michelle D. Warren, Simran Pabla, Jasper P. Corleis
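The power-spectrum comparison for voice activity detection can be sketched per frame; the 6 dB margin and frame length are assumed values, not taken from the patent.

```python
import numpy as np

def is_speech_frame(frame, noise_psd, snr_threshold_db=6.0):
    # Flag a frame as speech when its average power spectrum exceeds the
    # noise power-spectrum estimate by a dB margin. Consecutive flagged
    # frames would then be merged into speech segments.
    psd = np.abs(np.fft.rfft(frame)) ** 2
    ratio_db = 10.0 * np.log10(psd.mean() / max(noise_psd.mean(), 1e-12))
    return ratio_db > snr_threshold_db
```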