Patents Examined by Bhavesh M. Mehta
-
Patent number: 11842159
Abstract: Techniques for interpreting a text classifier model are described. An exemplary method includes receiving a request to interpret the text classifier; receiving input text to be used to interpret the text classifier; interpreting the text classifier using the input text and masked input text to determine, as requested, two or more of a counterfactual score, an importance score, and a bias score for the received input text or an aspect thereof; and providing the determined scores to a requester.
Type: Grant
Filed: March 16, 2021
Date of Patent: December 12, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Sawan Kumar, Kalpit Dixit, Syed Kashif Hussain Shah
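A minimal sketch of the masked-input idea behind importance scoring: mask one token at a time and measure how far the classifier's score drops. The bag-of-words `toy_classifier` is a hypothetical stand-in, not the patented model.

```python
def toy_classifier(tokens):
    # Stand-in classifier: fraction of tokens that are "positive" keywords.
    positive = {"great", "good", "excellent"}
    return sum(t in positive for t in tokens) / max(len(tokens), 1)

def importance_scores(tokens, classify, mask="[MASK]"):
    # Importance of a token = drop in the classifier's score when that
    # token is replaced with a mask.
    base = classify(tokens)
    scores = {}
    for i, tok in enumerate(tokens):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        scores[tok] = base - classify(masked)  # larger drop => more important
    return scores

scores = importance_scores(["the", "movie", "was", "great"], toy_classifier)
```

Counterfactual and bias scores would follow the same mask-and-compare pattern with different substitutions.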
-
Patent number: 11837252
Abstract: The present invention discloses a speech emotion recognition method and system based on fused population information. The method includes the following steps: S1: acquiring a user's audio data; S2: preprocessing the audio data and obtaining a Mel spectrogram feature; S3: cutting off the front and rear mute segments of the Mel spectrogram feature; S4: obtaining population depth feature information through a population classification network; S5: obtaining Mel spectrogram depth feature information through a Mel spectrogram preprocessing network; S6: fusing the population depth feature information and the Mel spectrogram depth feature information through SENet to obtain fused information; and S7: obtaining an emotion recognition result from the fused information through a classification network.
Type: Grant
Filed: June 21, 2022
Date of Patent: December 5, 2023
Assignee: Zhejiang Lab
Inventors: Taihao Li, Shukai Zheng, Yulong Liu, Guanxiong Pei, Shijie Ma
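Step S3 (trimming mute segments) can be sketched with an energy threshold over spectrogram frames; the -40 dB threshold is an assumed value, not taken from the patent.

```python
import numpy as np

def trim_mute_segments(mel, threshold_db=-40.0):
    # Drop leading and trailing frames whose mean energy falls below the
    # threshold, keeping everything between the first and last active frame.
    frame_db = 10.0 * np.log10(np.maximum(mel.mean(axis=0), 1e-12))
    active = np.flatnonzero(frame_db > threshold_db)
    if active.size == 0:
        return mel[:, :0]  # all silence
    return mel[:, active[0]:active[-1] + 1]
```

Interior quiet frames are kept on purpose: pauses inside an utterance carry emotional information, only the edges are cut.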
-
Patent number: 11830478
Abstract: A learning device calculates a feature of each data included in a pair of datasets in which two modalities among a plurality of modalities are combined, using a model that receives data on a corresponding modality among the modalities and outputs a feature obtained by mapping the received data into an embedding space. The learning device then selects similar data similar to each target data that is data on a first modality in a first dataset of the datasets, from data on a second modality included in a second dataset of the datasets. The learning device further updates a parameter of the model such that the features of the data in the pair included in the first and the second datasets are similar to one another, and the feature of data paired with the target data is similar to the feature of data paired with the similar data.
Type: Grant
Filed: April 1, 2021
Date of Patent: November 28, 2023
Assignees: NIPPON TELEGRAPH AND TELEPHONE CORPORATION, MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Inventors: Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, James R. Glass, David Harwath
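The similar-data selection step amounts to a nearest-neighbor lookup in the shared embedding space. A sketch under that assumption, using cosine similarity:

```python
import numpy as np

def most_similar_index(target_emb, candidate_embs):
    # Pick the second-modality candidate whose embedding is closest to the
    # target's in the shared embedding space (cosine similarity argmax).
    t = target_emb / np.linalg.norm(target_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    return int(np.argmax(c @ t))
```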
-
Patent number: 11823669
Abstract: According to one embodiment, an information processing apparatus includes the following units. The first acquisition unit acquires speech data including frames. The second acquisition unit acquires a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of the likelihood of each of a plurality of classes including a component of a keyword and a component of background noise. The first calculation unit calculates a keyword score indicative of the occurrence probability of the component of the keyword. The second calculation unit calculates a background noise score indicative of the occurrence probability of the component of the background noise. The determination unit determines whether or not the speech data includes the keyword.
Type: Grant
Filed: February 28, 2020
Date of Patent: November 21, 2023
Assignee: KABUSHIKI KAISHA TOSHIBA
Inventors: Ning Ding, Hiroshi Fujimura
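The two score calculations can be sketched from the model's per-frame class posteriors; pooling by mean over frames and the zero margin are assumptions for illustration.

```python
import numpy as np

def group_score(posteriors, class_ids):
    # posteriors: (frames, classes) per-frame softmax outputs.
    # Score = mean over frames of the summed probability of the class group.
    return float(posteriors[:, class_ids].sum(axis=1).mean())

def contains_keyword(posteriors, keyword_ids, noise_ids, margin=0.0):
    kw = group_score(posteriors, keyword_ids)  # keyword score
    bg = group_score(posteriors, noise_ids)    # background noise score
    return kw - bg > margin
```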
-
Patent number: 11816432
Abstract: Disclosed embodiments may include a method that includes setting an influence level for each index that a neural network can accept in one sample to a same level for a neural network, receiving a training corpus including training input samples and corresponding correct training prediction samples, generating, using the neural network, prediction samples, identifying an accuracy for each index by comparing the prediction samples with the corresponding correct training prediction samples, adjusting the influence level for each index based on the accuracy for each index, identifying one or more poorly accurate indexes for the neural network, receiving a first input sample including one or more characters, generating one or more normalized first input samples by applying one or more buffers to the one or more poorly accurate indexes, and generating, using the neural network, a categorization of each character in the one or more normalized first input samples.
Type: Grant
Filed: February 9, 2021
Date of Patent: November 14, 2023
Assignee: CAPITAL ONE SERVICES, LLC
Inventors: Jeremy Edward Goodsitt, Galen Rafferty, Anh Truong, Austin Walters
-
Patent number: 11816442
Abstract: Machine classifiers in accordance with embodiments of the invention capture long-term temporal dependencies in the dialogue data better than the existing RNN-based architectures. Additionally, machine classifiers may model the joint distribution of the context and response as opposed to the conditional distribution of the response given the context as employed in sequence-to-sequence frameworks. Machine classifiers in accordance with embodiments further append random paddings before and/or after the input data to reduce the syntactic redundancy in the input data, thereby improving the performance of the machine classifiers for a variety of dialogue-related tasks. The random padding of the input data may further provide regularization during the training of the machine classifier and/or reduce exposure bias. In a variety of embodiments, the input data may be encoded based on subword tokenization.
Type: Grant
Filed: March 1, 2023
Date of Patent: November 14, 2023
Assignee: Capital One Services, LLC
Inventors: Oluwatobi Olabiyi, Erik T. Mueller, Rui Zhang
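The random-padding trick is simple to sketch: draw an independent pad length for each side of the sequence. The `max_pad` bound and pad token are assumed values.

```python
import random

def randomly_pad(token_ids, pad_id=0, max_pad=5, rng=None):
    # Prepend and append a random number of padding tokens, a regularization
    # trick to break up syntactic redundancy in the input during training.
    rng = rng or random.Random()
    return ([pad_id] * rng.randint(0, max_pad)
            + list(token_ids)
            + [pad_id] * rng.randint(0, max_pad))
```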
-
Patent number: 11817087
Abstract: Systems and methods for distributing cloud-based language processing services to partially execute in a local device to reduce latency perceived by the user. For example, a local device may receive a request via audio input that requires a cloud-based service to process the request and generate a response. A partial response may be generated locally and played back while a more complete response is generated remotely.
Type: Grant
Filed: August 28, 2020
Date of Patent: November 14, 2023
Assignee: Micron Technology, Inc.
Inventor: Ameen D. Akel
-
Patent number: 11810568
Abstract: A computer-implemented method for transcribing an utterance includes receiving, at a computing system, speech data that characterizes an utterance of a user. A first set of candidate transcriptions of the utterance can be generated using a static class-based language model that includes a plurality of classes that are each populated with class-based terms selected independently of the utterance or the user. The computing system can then determine whether the first set of candidate transcriptions includes class-based terms. Based on whether the first set of candidate transcriptions includes class-based terms, the computing system can determine whether to generate a dynamic class-based language model that includes at least one class that is populated with class-based terms selected based on a context associated with at least one of the utterance and the user.
Type: Grant
Filed: December 10, 2020
Date of Patent: November 7, 2023
Assignee: Google LLC
Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
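The static/dynamic distinction comes down to where the class's term list comes from. A toy sketch of expanding a class tag in a candidate transcription (the tag syntax and term lists are illustrative, not from the patent):

```python
def expand_class_terms(hypothesis, class_terms):
    # Expand a class tag (e.g. "<contact>") in a candidate transcription into
    # one concrete hypothesis per class-based term. A static model ships a
    # fixed term list; a dynamic model populates it from the user's context.
    for tag, terms in class_terms.items():
        if tag in hypothesis:
            return [hypothesis.replace(tag, term) for term in terms]
    return [hypothesis]

# e.g. populated on the fly from the user's address book:
dynamic_classes = {"<contact>": ["Alice", "Bob"]}
```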
-
Patent number: 11810557
Abstract: Techniques are described herein for enabling the use of "dynamic" or "context-specific" hot words to invoke an automated assistant. In various implementations, an automated assistant may be executed in a default listening state at least in part on a user's computing device(s). While in the default listening state, audio data captured by microphone(s) may be monitored for default hot words. Detection of the default hot word(s) transitions the automated assistant into a speech recognition state. Sensor signal(s) generated by hardware sensor(s) integral with the computing device(s) may be detected and analyzed to determine an attribute of the user. Based on the analysis, the automated assistant may transition into an enhanced listening state in which the audio data may be monitored for enhanced hot word(s). Detection of enhanced hot word(s) triggers the automated assistant to perform a responsive action without requiring detection of default hot word(s).
Type: Grant
Filed: February 19, 2022
Date of Patent: November 7, 2023
Assignee: GOOGLE LLC
Inventor: Diego Melendo Casado
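The listening states form a small state machine. A sketch with hypothetical state and event names (not from the patent):

```python
def next_state(state, event):
    # Illustrative transitions between the assistant's listening states.
    if state == "default" and event == "default_hot_word":
        return "speech_recognition"
    if state == "default" and event == "user_attribute_detected":
        return "enhanced"           # sensor analysis suggests engagement
    if state == "enhanced" and event == "enhanced_hot_word":
        return "responsive_action"  # no default hot word required
    return state
```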
-
Patent number: 11804214
Abstract: A system for generating compressed product titles that can be used in conversational transactions includes a computing device configured to obtain product title data characterizing descriptive product titles of products available on an ecommerce marketplace and to determine compressed product titles based on the product title data using a machine learning model that is pre-trained using a replaced-token detection task. The computing device also stores the compressed product titles for use during conversational transactions.
Type: Grant
Filed: February 26, 2021
Date of Patent: October 31, 2023
Assignee: Walmart Apollo, LLC
Inventors: Snehasish Mukherjee, Phani Ram Sayapaneni, Shankara Bhargava
-
Patent number: 11804231
Abstract: In some implementations, a user device may receive input that triggers transmission of information via sound. The user device may select an audio clip based on a setting associated with the device, and may modify a digital representation of the selected audio clip using an encoding algorithm and based on data associated with a user of the device. The user device may transmit, to a remote server, an indication of the selected audio clip, an indication of the encoding algorithm, and the data associated with the user. The user device may use a speaker to play audio, based on the modified digital representation, for recording by other devices. Accordingly, the user device may receive, from the remote server and based on the speaker playing the audio, a confirmation that users associated with the other devices have performed an action based on the data associated with the user of the device.
Type: Grant
Filed: July 2, 2021
Date of Patent: October 31, 2023
Assignee: Capital One Services, LLC
Inventor: Ian Fitzgerald
-
Patent number: 11804233
Abstract: A device includes one or more processors configured to perform signal processing including a linear transformation and a non-linear transformation of an input signal to generate a reference target signal. The reference target signal has a linear component associated with the linear transformation and a non-linear component associated with the non-linear transformation. The one or more processors are also configured to perform linear filtering of the input signal by controlling adaptation of the linear filtering to generate an output signal that substantially matches the linear component of the reference target signal.
Type: Grant
Filed: November 15, 2019
Date of Patent: October 31, 2023
Assignee: QUALCOMM Incorporated
Inventors: Lae-Hoon Kim, Dongmei Wang, Cheng-Yu Hung, Erik Visser
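Controlled adaptation of a linear filter toward a target can be sketched with the classic least-mean-squares (LMS) update; this is a generic adaptive-filter example, not the patented control scheme, and the filter length, step size, and data are assumed.

```python
import numpy as np

def lms_fit(x, d, taps=4, mu=0.05, iters=2000, seed=0):
    # Adapt FIR weights w so that w @ x[n-taps:n] tracks the target d[n]
    # (standard LMS: nudge w along the instantaneous error gradient).
    rng = np.random.default_rng(seed)
    w = np.zeros(taps)
    for _ in range(iters):
        n = int(rng.integers(taps, len(x)))
        window = x[n - taps:n][::-1]   # most recent sample first
        err = d[n] - w @ window
        w += mu * err * window
    return w

# Synthetic target: d is x passed through a known linear filter true_h,
# so the adapted weights should converge toward true_h.
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
true_h = np.array([0.5, -0.3, 0.1, 0.0])
d = np.array([true_h @ x[n - 4:n][::-1] if n >= 4 else 0.0 for n in range(500)])
w = lms_fit(x, d)
```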
-
Patent number: 11804211
Abstract: Implementations are directed to providing a voice bot development platform that enables a third-party developer to train a voice bot based on training instance(s). The training instance(s) can each include training input and training output. The training input can include a portion of a corresponding conversation and a prior context of the corresponding conversation. The training output can include a corresponding ground truth response to the portion of the corresponding conversation. Subsequent to training, the voice bot can be deployed for conducting conversations on behalf of a third-party. In some implementations, the voice bot is further trained based on a corresponding feature emphasis input that directs the voice bot's attention to a particular feature of the portion of the corresponding conversation. In some additional or alternative implementations, the voice bot is further trained to interact with third-party system(s) via remote procedure calls (RPCs).
Type: Grant
Filed: December 4, 2020
Date of Patent: October 31, 2023
Assignee: GOOGLE LLC
Inventors: Asaf Aharoni, Yaniv Leviathan, Eyal Segalis, Gal Elidan, Sasha Goldshtein, Tomer Amiaz, Deborah Cohen
-
Patent number: 11798562
Abstract: A speaker verification method includes receiving audio data corresponding to an utterance and processing the audio data to generate an evaluation attentive d-vector (ad-vector) representing voice characteristics of the utterance, the evaluation ad-vector including a number of style classes each including a respective value vector concatenated with a corresponding routing vector. The method also includes generating, using a self-attention mechanism, at least one multi-condition attention score that indicates a likelihood that the evaluation ad-vector matches a respective reference ad-vector associated with a respective user. The method also includes identifying the speaker of the utterance as the respective user associated with the respective reference ad-vector based on the multi-condition attention score.
Type: Grant
Filed: May 16, 2021
Date of Patent: October 24, 2023
Assignee: Google LLC
Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Yiling Huang, Mert Saglam
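The identification rule can be illustrated with plain cosine scoring between an evaluation vector and per-user reference vectors; this is a simplified stand-in for the patent's multi-condition attention, with the same argmax decision.

```python
import numpy as np

def match_scores(evaluation, references):
    # Softmax over cosine similarities between the evaluation d-vector and
    # each enrolled user's reference d-vector; the speaker is the argmax user.
    e = evaluation / np.linalg.norm(evaluation)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    sims = r @ e
    w = np.exp(sims - sims.max())  # numerically stable softmax
    return w / w.sum()
```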
-
Patent number: 11798538
Abstract: This disclosure relates to answer prediction in a speech processing system. The system may disambiguate entities spoken or implied in a request to initiate an action with respect to a target user. To initiate the action, the system may determine one or more parameters; for example, the target (e.g., a contact/recipient), a source (e.g., a caller/requesting user), and a network (voice over internet protocol (VOIP), cellular, video chat, etc.). Due to the privacy implications of initiating actions involving data transfers between parties, the system may apply a high threshold for a confidence associated with each parameter. Rather than ask multiple follow-up questions, which may frustrate the requesting user, the system may attempt to disambiguate or determine a parameter, and skip a question regarding the parameter if it can predict an answer with high confidence. The system can improve the customer experience while maintaining security for actions involving, for example, communications.
Type: Grant
Filed: September 21, 2020
Date of Patent: October 24, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Christopher Geiger Parker, Piyush Bhargava, Aparna Nandyal, Rajagopalan Ranganathan, Mugunthan Govindaraju, Vidya Narasimhan
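The skip-the-question logic reduces to a confidence gate per parameter. A minimal sketch, assuming candidate values arrive with confidence scores (the 0.9 threshold is an assumed value):

```python
def resolve_parameters(candidates, threshold=0.9):
    # For each action parameter (target, source, network, ...), accept the
    # top candidate only if its confidence clears a high threshold;
    # otherwise queue a follow-up question instead of guessing.
    resolved, follow_ups = {}, []
    for param, options in candidates.items():
        value, conf = max(options, key=lambda vc: vc[1])
        if conf >= threshold:
            resolved[param] = value
        else:
            follow_ups.append(param)
    return resolved, follow_ups
```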
-
Patent number: 11798534
Abstract: Embodiments described herein provide an Adapt-and-Adjust (A2) mechanism for a multilingual speech recognition model that combines both adaptation and adjustment methods as an integrated end-to-end training to improve the model's generalization and mitigate the long-tailed issue. Specifically, a multilingual language model, mBERT, is utilized and converted into an autoregressive transformer decoder. In addition, a cross-attention module is added to the encoder on top of mBERT's self-attention layer in order to explore the acoustic space in addition to the text space. The joint training of the encoder and mBERT decoder can bridge the semantic gap between the speech and the text.
Type: Grant
Filed: January 29, 2021
Date of Patent: October 24, 2023
Assignee: salesforce.com, inc.
Inventors: Guangsen Wang, Chu Hong Hoi, Genta Indra Winata
-
Patent number: 11798533
Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
Type: Grant
Filed: April 2, 2021
Date of Patent: October 24, 2023
Assignee: GOOGLE LLC
Inventors: Joseph Caroselli, Jr., Yiteng Huang, Arun Narayanan
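A spatial correlation matrix for a multichannel segment is the channel-by-channel covariance of the samples. A minimal sketch (mean removal is an assumed preprocessing step):

```python
import numpy as np

def spatial_correlation(frames):
    # frames: (channels, samples) multichannel audio segment.
    # Returns the channels x channels matrix R = X X^H / N that a
    # beamformer initialization would consume.
    x = frames - frames.mean(axis=1, keepdims=True)
    return (x @ x.conj().T) / x.shape[1]
```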
-
Patent number: 11790929
Abstract: According to an aspect, a WPE-based dereverberation apparatus using virtual acoustic channel expansion based on a deep neural network includes a signal reception unit for receiving as input a first speech signal through a single-channel microphone, a signal generation unit for generating a second speech signal by applying a virtual acoustic channel expansion algorithm based on a deep neural network to the first speech signal, and a dereverberation unit for removing reverberation of the first speech signal and generating a dereverberated signal from which the reverberation has been removed by applying a dual-channel weighted prediction error (WPE) algorithm based on a deep neural network to the first speech signal and the second speech signal.
Type: Grant
Filed: August 4, 2021
Date of Patent: October 17, 2023
Assignee: IUCF-HYU (INDUSTRY-UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY)
Inventors: Joon Hyuk Chang, Joon Young Yang
-
Patent number: 11790892
Abstract: A method includes capturing an event, analyzing the event to generate graphs, receiving a natural language utterance, identifying an entity and a command, modifying the graphs, and emitting an application prototype. An application prototyping server includes a processor and a memory storing instructions that, when executed by the processor, cause the server to capture an event, analyze the captured event to generate graphs, receive a natural language utterance, identify an entity and a command, modify the graphs, and emit an application prototype. A non-transitory computer readable medium contains program instructions that, when executed, cause a computer to: capture an event, analyze the captured event to generate graphs, receive a natural language utterance, identify an entity and a command, modify the graphs, and emit an application prototype.
Type: Grant
Filed: May 27, 2020
Date of Patent: October 17, 2023
Assignee: CDW LLC
Inventor: Joseph Kessler
-
Patent number: 11783810
Abstract: Illustrative embodiments provide a method and system for communicating air traffic control information. An audio signal comprising voice activity is received. Air traffic control information in the voice activity is identified using an artificial intelligence algorithm. A text transcript of the air traffic control information is generated and displayed on a confirmation display. Voice activity in the audio signal may be detected by identifying portions of the audio signal that comprise speech based on a comparison between the power spectrum of the audio signal and the power spectrum of noise, and forming speech segments comprising the portions of the audio signal that comprise speech.
Type: Grant
Filed: July 17, 2020
Date of Patent: October 10, 2023
Assignee: The Boeing Company
Inventors: Stephen Dame, Yu Qiao, Taylor A. Riccetti, David J. Ross, Joshua Welshmeyer, Matthew Sheridan-Smith, Su Ying Li, Zarrin Khiang-Huey Chua, Jose A. Medina, Michelle D. Warren, Simran Pabla, Jasper P. Corleis
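The power-spectrum comparison for voice activity detection can be sketched per frame; the 6 dB margin and frame length are assumed values, not taken from the patent.

```python
import numpy as np

def is_speech_frame(frame, noise_psd, snr_threshold_db=6.0):
    # Flag a frame as speech when its average power spectrum exceeds the
    # noise power-spectrum estimate by a dB margin. Consecutive flagged
    # frames would then be merged into speech segments.
    psd = np.abs(np.fft.rfft(frame)) ** 2
    ratio_db = 10.0 * np.log10(psd.mean() / max(noise_psd.mean(), 1e-12))
    return ratio_db > snr_threshold_db
```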