Patents Examined by Shaun A. Roberts
  • Patent number: 11699439
    Abstract: Method for voice-based interactive communication using a digital assistant, wherein the method comprises: an attention detection step, in which the digital assistant detects a user attention and as a result is set into a listening mode; a speaker detection step, in which the digital assistant detects the user as a current speaker; a speech sound detection step, in which the digital assistant detects and records speech uttered by the current speaker, which speech sound detection step further comprises a lip movement detection step, in which the digital assistant detects a lip movement of the current speaker; a speech analysis step, in which the digital assistant parses said recorded speech and extracts speech-based verbal informational content from said recorded speech; and a subsequent response step, in which the digital assistant provides feedback to the user based on said recorded speech.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: July 11, 2023
    Assignee: Tobii AB
    Inventors: Erland George-Svahn, Sourabh Pateriya, Onur Kurt, Deepak Akkil
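The claimed sequence of steps amounts to a small state machine. Below is a minimal Python sketch of that flow; the attention, lip-movement, transcription, and response callables are hypothetical stand-ins for the real gaze/vision/speech models, not anything disclosed in the patent:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()       # waiting for user attention
    LISTENING = auto()  # attention detected, recording speech

class DigitalAssistant:
    """Toy state machine mirroring the claimed steps."""

    def __init__(self, detects_attention, detects_lips_moving,
                 transcribe, respond):
        self.state = State.IDLE
        self.detects_attention = detects_attention
        self.detects_lips_moving = detects_lips_moving
        self.transcribe = transcribe
        self.respond = respond
        self.recorded = []

    def process_frame(self, frame):
        # Attention detection step: switch into listening mode.
        if self.state is State.IDLE:
            if self.detects_attention(frame):
                self.state = State.LISTENING
            return None
        # Speech sound detection gated by lip movement detection:
        # record audio only while the current speaker's lips move.
        if self.detects_lips_moving(frame):
            self.recorded.append(frame["audio"])
            return None
        # Lips stopped: speech analysis step, then response step.
        if self.recorded:
            text = self.transcribe(self.recorded)
            self.recorded = []
            self.state = State.IDLE
            return self.respond(text)
        return None
```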
  • Patent number: 11694026
    Abstract: A computer-implemented method includes: receiving, by a computing device, an input file defining correct spellings of one or more transliterated words; generating, by the computing device, suffix outputs based on the one or more transliterated words; generating, by the computing device, a dictionary that maps the suffix outputs to the one or more transliterated words; recognizing, by the computing device, an alternatively spelled transliterated word included in a document as one of the one or more correctly spelled transliterated words using the dictionary; and outputting, by the computing device, information corresponding to the recognized transliterated word.
    Type: Grant
    Filed: September 23, 2021
    Date of Patent: July 4, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James E. Bostick, John M. Ganci, Jr., Martin G. Keen, Craig M. Trim
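A toy illustration of the suffix-dictionary idea: generate suffix-stripped forms ("suffix outputs") of each correctly spelled transliterated word, map them back to the canonical spelling, then recognize an alternative spelling by stripping its own suffixes. The specific stemming rule (every prefix down to a minimum stem length) is a simplifying assumption, not the patent's actual procedure:

```python
def build_suffix_dictionary(correct_spellings, min_stem=3):
    """Map suffix-stripped forms of each word to its canonical spelling."""
    dictionary = {}
    for word in correct_spellings:
        w = word.lower()
        # Every prefix of the word down to min_stem characters, i.e. the
        # word with progressively longer suffixes removed.
        for cut in range(min_stem, len(w) + 1):
            dictionary.setdefault(w[:cut], word)
    return dictionary

def recognize(token, dictionary, min_stem=3):
    """Match an alternatively spelled token by stripping its suffixes,
    longest remaining stem first."""
    t = token.lower()
    for cut in range(len(t), min_stem - 1, -1):
        if t[:cut] in dictionary:
            return dictionary[t[:cut]]
    return None
```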
  • Patent number: 11694707
    Abstract: A target speech signal extraction method for robust speech recognition includes: initializing a steering vector for a target speech source and an adaptive vector, setting a real output channel of the target speech source as an output by the adaptive vector, initializing adaptive vectors for a noise and setting a dummy channel as an output by the adaptive vectors for the noise; setting a cost function for minimizing dependency between a real output for the target speech source and a dummy output for the noise; setting an auxiliary function to the cost function, and updating the adaptive vector for the target speech source and the adaptive vectors for the noise by using the auxiliary function and the steering vector; estimating the target speech signal by using the adaptive vector thereby extracting the target speech signal from the input signals; and updating the steering vector for the target speech source.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: July 4, 2023
    Assignee: INDUSTRY-UNIVERSITY COOPERATION FOUNDATION SOGANG UNIVERSITY
    Inventors: Hyung Min Park, Uihyeop Shin
  • Patent number: 11694705
    Abstract: A sound signal processing system includes: a sound signal processing apparatus executing non-linear signal processing on a collected sound signal collected by a microphone, and transmitting, to an information processing apparatus, both a pre-execution sound signal before the non-linear signal processing is executed and a post-execution sound signal after the non-linear signal processing is executed; and the information processing apparatus receiving the pre-execution sound signal and the post-execution sound signal from the sound signal processing apparatus, and executing first processing on the pre-execution sound signal and executing second processing on the post-execution sound signal, the second processing being different from the first processing.
    Type: Grant
    Filed: July 16, 2019
    Date of Patent: July 4, 2023
    Assignee: Sony Interactive Entertainment Inc.
    Inventors: Yoshihiko Tamaru, Tetsuro Horikawa
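The dual-path routing in this claim can be sketched in a few lines. Here `nonlinear_fx`, `first_processing`, and `second_processing` are hypothetical placeholders (e.g. an echo suppressor, a recognition front end, and a playback chain); the patent does not name concrete algorithms for them:

```python
def process_sound(raw_frame, nonlinear_fx, first_processing, second_processing):
    """Send both copies of the collected signal and process each differently:
    the pre-execution signal (before the non-linear effect) gets the first
    processing, the post-execution signal gets the second."""
    pre = raw_frame                 # pre-execution sound signal
    post = nonlinear_fx(raw_frame)  # post-execution sound signal
    return first_processing(pre), second_processing(post)
```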
  • Patent number: 11676602
    Abstract: Implementations relate to generating and/or executing a customized interactive dialog application. The customized interactive dialog application may be generated from a state mapping tool that allows a user to generate custom states and custom transitions between the custom states. A customized configuration description is then generated based on the generated custom states and custom transitions. Further, a default configuration description is identified that includes additional or alternative states and transitions. In executing the customized interactive dialog application, dialog turns are generated based on the states and transition information, with the customized configuration description taking precedence and the default configuration description being utilized for any undefined states and/or transitions. Implementations additionally or alternatively relate to generating and/or executing a custom agent based on generated custom states and custom transitions, and a default configuration description.
    Type: Grant
    Filed: May 20, 2022
    Date of Patent: June 13, 2023
    Assignee: GOOGLE LLC
    Inventors: Uri First, Yang Sun
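The precedence rule described above (customized configuration first, default configuration for any undefined state or transition) reduces to an ordered lookup. A minimal sketch, with configurations represented as plain nested dicts purely for illustration:

```python
def next_turn(state, user_input, custom_config, default_config):
    """Resolve a dialog turn: consult the customized configuration
    description first, and fall back to the default configuration for
    any state/transition the custom description leaves undefined."""
    for config in (custom_config, default_config):
        transitions = config.get(state, {})
        if user_input in transitions:
            return transitions[user_input]  # (next_state, dialog_prompt)
    raise KeyError(f"no transition from {state!r} on {user_input!r}")
```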
  • Patent number: 11664040
    Abstract: An apparatus for processing an audio signal includes an audio signal analyzer and a filter. The audio signal analyzer is configured to analyze an audio signal to determine a plurality of noise suppression filter values for a plurality of bands of the audio signal, wherein the analyzer is configured to determine a noise suppression filter value so that a noise suppression filter value is greater than or equal to a minimum noise suppression filter value and so that the minimum noise suppression value depends on a characteristic of the audio signal. The filter is configured for filtering the audio signal, wherein the filter is adjusted based on the noise suppression filter values.
    Type: Grant
    Filed: March 23, 2021
    Date of Patent: May 30, 2023
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Anthony Lombard, Bernhard Birzer, Dirk Mahne, Edwin Mabande, Fabian Kuech, Emanuel Habets, Paolo Annibale
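The key claim element is a per-band gain floor that itself depends on a characteristic of the audio signal. A sketch of that idea, where the Wiener-style gain and the 0-to-1 tonality estimate are illustrative assumptions (the patent does not fix either the gain rule or the characteristic):

```python
def suppression_gains(band_snrs, tonality, floor_tonal=0.5, floor_noisy=0.1):
    """Per-band noise suppression gains clamped to a signal-dependent
    minimum: a more tonal signal gets a higher floor, keeping
    suppression gentle where artifacts would be audible."""
    floor = floor_noisy + (floor_tonal - floor_noisy) * tonality
    gains = []
    for snr in band_snrs:
        g = snr / (snr + 1.0)        # Wiener-style gain for this band
        gains.append(max(g, floor))  # enforce the minimum gain
    return gains
```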
  • Patent number: 11664021
    Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
    Type: Grant
    Filed: December 9, 2021
    Date of Patent: May 30, 2023
    Assignee: Google LLC
    Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier
  • Patent number: 11646047
    Abstract: The present document relates to audio source coding systems which make use of a harmonic transposition method for high frequency reconstruction (HFR), as well as to digital effect processors, e.g. exciters, where generation of harmonic distortion adds brightness to the processed signal, and to time stretchers where a signal duration is prolonged with maintained spectral content. A system and method configured to generate a time stretched and/or frequency transposed signal from an input signal is described. The system comprises an analysis filterbank configured to provide an analysis subband signal from the input signal; wherein the analysis subband signal comprises a plurality of complex valued analysis samples, each having a phase and a magnitude. Furthermore, the system comprises a subband processing unit configured to determine a synthesis subband signal from the analysis subband signal using a subband transposition factor Q and a subband stretch factor S.
    Type: Grant
    Filed: May 23, 2022
    Date of Patent: May 9, 2023
    Assignee: Dolby International AB
    Inventor: Lars Villemoes
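The core subband operation behind harmonic transposition is multiplying each complex analysis sample's phase by the transposition factor Q while preserving its magnitude. A bare-bones sketch of just that step (the full system also involves the stretch factor S and filterbank synthesis, omitted here):

```python
import cmath

def transpose_subband(samples, Q):
    """Scale the phase of each complex-valued analysis sample by Q,
    keeping its magnitude, to generate the transposed subband signal."""
    out = []
    for z in samples:
        mag, ph = abs(z), cmath.phase(z)
        out.append(cmath.rect(mag, ph * Q))
    return out
```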
  • Patent number: 11636858
    Abstract: A language proficiency analyzer automatically evaluates a person's language proficiency by analyzing that person's oral communications with another person. The analyzer first enhances the quality of an audio recording of a conversation between the two people using a neural network that automatically detects loss features in the audio and adds those loss features back into the audio. The analyzer then performs a textual and audio analysis on the improved audio. Through textual analysis, the analyzer uses a multi-attention network to determine how focused one person is on the other and/or how pleased one person is with the other. Through audio analysis, the analyzer uses a neural network to determine how well one person pronounced words during the conversation.
    Type: Grant
    Filed: October 12, 2021
    Date of Patent: April 25, 2023
    Assignee: Bank of America Corporation
    Inventors: Madhusudhanan Krishnamoorthy, Harikrishnan Rajeev
  • Patent number: 11626108
    Abstract: A method of operating a customer utterance analysis system includes obtaining a subset of utterances from among a first set of utterances. The method includes encoding, by a sentence encoder, the subset of utterances into multi-dimensional vectors. The method includes generating reduced-dimensionality vectors by reducing a dimensionality of the multi-dimensional vectors. Each vector of the reduced-dimensionality vectors corresponds to an utterance from among the subset of utterances. The method includes performing clustering on the reduced-dimensionality vectors. The method includes, based on the clustering performed on the reduced-dimensionality vectors, arranging the subset of utterances into clusters. The method includes obtaining labels for at least two clusters from among the clusters. The method includes generating training data based on the obtained labels. The method includes training a neural network model to predict an intent of an utterance based on the training data.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: April 11, 2023
    Assignee: TD Ameritrade IP Company, Inc.
    Inventors: Abhilash Krishnankutty Nair, Amaris Yuseon Sim, Dayanand Narregudem, Drew David Riassetto, Logan Sommers Ahlstrom, Nafiseh Saberian, Stephen Filios, Ravindra Reddy Tappeta Venkata
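The encode / reduce / cluster portion of this pipeline can be mimicked with deliberately tiny stand-ins: a bag-of-words "encoder", top-variance coordinate selection in place of a learned dimensionality reduction, and exact-match grouping in place of a real clustering algorithm. All three substitutions are assumptions for illustration only; the labeling and model-training steps are omitted:

```python
def embed(utterance, vocab):
    """Toy sentence encoder: bag-of-words counts over a fixed vocabulary."""
    words = utterance.lower().split()
    return [words.count(v) for v in vocab]

def reduce_dims(vectors, keep):
    """Toy dimensionality reduction: keep the highest-variance coordinates."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(dims)]
    var = [sum((v[i] - means[i]) ** 2 for v in vectors) for i in range(dims)]
    top = sorted(range(dims), key=lambda i: -var[i])[:keep]
    return [[v[i] for i in top] for v in vectors]

def cluster(vectors, utterances):
    """Group utterances whose reduced vectors coincide exactly."""
    groups = {}
    for vec, utt in zip(vectors, utterances):
        groups.setdefault(tuple(vec), []).append(utt)
    return groups
```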
  • Patent number: 11626101
    Abstract: Systems and methods are described for processing and interpreting audible commands spoken in one or more languages. Speech recognition systems disclosed herein may be used as a stand-alone speech recognition system or comprise a portion of another content consumption system. A requesting user may provide audio input (e.g., command data) to the speech recognition system via a computing device to request an entertainment system to perform one or more operational commands. The speech recognition system may analyze the audio input across a variety of linguistic models, and may parse the audio input to identify a plurality of phrases and corresponding action classifiers. In some embodiments, the speech recognition system may utilize the action classifiers and other information to determine the one or more identified phrases that appropriately match the desired intent and operational command associated with the user's spoken command.
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: April 11, 2023
    Assignee: Comcast Cable Communications, LLC
    Inventors: George Thomas Des Jardins, Vikrant Sagar
  • Patent number: 11626115
    Abstract: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.
    Type: Grant
    Filed: January 24, 2022
    Date of Patent: April 11, 2023
    Assignee: GOOGLE LLC
    Inventors: Barnaby James, Bo Wang, Sunil Vemuri, David Schairer, Ulas Kirazci, Ertan Dogrultan, Petar Aleksic
  • Patent number: 11620990
    Abstract: A method for optimizing speech recognition includes receiving a first acoustic segment characterizing a hotword detected by a hotword detector in streaming audio captured by a user device, extracting one or more hotword attributes from the first acoustic segment, and adjusting, based on the one or more hotword attributes extracted from the first acoustic segment, one or more speech recognition parameters of an automated speech recognition (ASR) model. After adjusting the speech recognition parameters of the ASR model, the method also includes processing, using the ASR model, a second acoustic segment to generate a speech recognition result. The second acoustic segment characterizes a spoken query/command that follows the first acoustic segment in the streaming audio captured by the user device.
    Type: Grant
    Filed: December 11, 2020
    Date of Patent: April 4, 2023
    Assignee: Google LLC
    Inventors: Matthew Sharifi, Aleksandar Kracun
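A sketch of the adjustment step: derive attributes from the hotword audio segment and adapt recognition parameters before decoding the query that follows. The specific attribute (mean absolute level as a loudness proxy) and the specific parameter changes are illustrative assumptions, not the patent's disclosed set:

```python
def adjust_asr_params(hotword_segment, defaults):
    """Extract a simple attribute from the hotword acoustic segment and
    tune ASR parameters for the second segment accordingly."""
    level = sum(abs(s) for s in hotword_segment) / len(hotword_segment)
    params = dict(defaults)
    if level < 0.1:  # softly spoken hotword suggests a quiet environment
        params["gain"] = defaults["gain"] * 2.0            # boost input gain
        params["beam_width"] = defaults["beam_width"] + 4  # widen search beam
    return params
```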
  • Patent number: 11621011
    Abstract: Described herein is a method of decoding an audio or speech signal, the method including the steps of: (a) receiving, by a decoder, a coded bitstream including the audio or speech signal and conditioning information; (b) providing, by a bitstream decoder, decoded conditioning information in a format associated with a first bitrate; (c) converting, by a converter, the decoded conditioning information from the format associated with the first bitrate to a format associated with a second bitrate; and (d) providing, by a generative neural network, a reconstruction of the audio or speech signal according to a probabilistic model conditioned by the conditioning information in the format associated with the second bitrate. Described are further an apparatus for decoding an audio or speech signal, a respective encoder, a system of the encoder and the apparatus for decoding an audio or speech signal as well as a respective computer program product.
    Type: Grant
    Filed: October 29, 2019
    Date of Patent: April 4, 2023
    Assignee: Dolby International AB
    Inventors: Janusz Klejsa, Per Hedelin
  • Patent number: 11600281
    Abstract: There is disclosed inter alia an apparatus for spatial audio signal encoding comprising means for receiving for each time frequency block of a sub band of an audio frame a spatial audio parameter comprising an azimuth and an elevation; determining a first distortion measure for the audio frame by determining a first distance measure for each time frequency block and summing the first distance measure for each time frequency block; determining a second distortion measure for the audio frame by determining a second distance measure for each time frequency block and summing the second distance measure for each time frequency block, and selecting either the first quantization scheme or the second quantization scheme for quantising the elevation and the azimuth for all time frequency blocks of the sub band of the audio frame, wherein the selecting is dependent on the first and second distortion measures.
    Type: Grant
    Filed: September 20, 2019
    Date of Patent: March 7, 2023
    Assignee: Nokia Technologies Oy
    Inventor: Adriana Vasilache
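The selection logic above is a per-frame comparison of two summed distortions. A compact sketch, where the squared angular error distance is a simplifying assumption and each scheme is modeled as a function quantizing one (elevation, azimuth) pair:

```python
def select_quantizer(blocks, scheme_a, scheme_b):
    """Quantize each time-frequency block's (elevation, azimuth) with both
    schemes, sum the per-block distances into per-frame distortion
    measures, and keep the scheme with the smaller total."""
    def distortion(scheme):
        total = 0.0
        for elev, azi in blocks:
            qe, qa = scheme(elev, azi)
            total += (elev - qe) ** 2 + (azi - qa) ** 2
        return total
    return scheme_a if distortion(scheme_a) <= distortion(scheme_b) else scheme_b
```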
  • Patent number: 11594230
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate language independent-speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.
    Type: Grant
    Filed: May 4, 2021
    Date of Patent: February 28, 2023
    Assignee: Google LLC
    Inventors: Ignacio Lopez Moreno, Li Wan, Quan Wang
  • Patent number: 11580981
    Abstract: An in-vehicle apparatus is connectable to a device that includes a voice assistant function. The in-vehicle apparatus includes: a voice detector that performs voice recognition of an audio signal input from a microphone and that controls functions of the in-vehicle apparatus based on a result of the voice recognition; and an interface that communicates with the device. When being informed of a detection of a predetermined word in the audio signal as the result of the voice recognition of the audio signal performed by the voice detector, the interface sends to the device, not via the voice detector, the audio signal input from the microphone. The predetermined word is for activating the voice assistant function of the device.
    Type: Grant
    Filed: March 3, 2021
    Date of Patent: February 14, 2023
    Assignee: DENSO TEN Limited
    Inventors: Katsuaki Hikima, Daisuke Yamasaki, Futoshi Kosuga
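The routing described in this claim (raw microphone audio forwarded to the connected device when the activation word is detected, bypassing the in-vehicle recognizer) can be sketched as follows; the detector, device interface, and local command table are hypothetical stand-ins:

```python
def route_audio(frame, voice_detector, device_interface, local_commands):
    """Forward raw audio to the device when the assistant's activation
    word is detected; otherwise let the in-vehicle voice control handle
    any locally recognized command."""
    word = voice_detector(frame)
    if word == "wake_word":
        device_interface.send(frame)  # raw audio, not via the detector
        return "forwarded"
    if word in local_commands:
        return local_commands[word]()
    return "ignored"
```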
  • Patent number: 11568878
    Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance.
    Type: Grant
    Filed: April 16, 2021
    Date of Patent: January 31, 2023
    Assignee: GOOGLE LLC
    Inventors: Rajeev Rikhye, Quan Wang, Yanzhang He, Qiao Liang, Ian C. McGraw
  • Patent number: 11568888
    Abstract: A terminal control method, a terminal and a non-transitory computer-readable storage medium are provided. The terminal control method includes: receiving, by a microphone, a detection audio signal emitted from a speaker and having a frequency within a pre-set detection frequency range; acquiring actual audio parameters of the detection audio signal when being received by the microphone, and original audio parameters of the detection audio signal when being emitted from the speaker; determining a relative state between the microphone and the speaker according to the actual audio parameters and the original audio parameters; determining a terminal control operation to be performed, according to the relative state and a pre-set correspondence between relative states and terminal control operations; and performing the determined terminal control operation on a terminal where the microphone is located.
    Type: Grant
    Filed: June 3, 2020
    Date of Patent: January 31, 2023
    Assignee: ZTE CORPORATION
    Inventors: Shaowu Shen, Liting Liu
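A sketch of the comparison step: the original (emitted) and actual (received) audio parameters of the detection tone are compared, with level loss suggesting distance and a frequency shift suggesting motion. The parameter names and thresholds here are illustrative assumptions, as is mapping the result directly to a state label:

```python
def relative_state(original, actual):
    """Infer the relative state between microphone and speaker from the
    emitted vs. received parameters of the detection audio signal."""
    loss_db = original["level_db"] - actual["level_db"]
    shift_hz = actual["freq_hz"] - original["freq_hz"]
    if shift_hz > 5:       # received pitch higher: closing distance
        return "approaching"
    if shift_hz < -5:      # received pitch lower: opening distance
        return "receding"
    return "near" if loss_db < 10 else "far"
```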
  • Patent number: 11568858
    Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low resource setting includes training a multilingual network on a set of training languages with an original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data output to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding one or more selected portions of the output transliterated data back to the original transcribed training data to update training data. The training of a new multilingual acoustic model through the multilingual network is performed using the updated training data.
    Type: Grant
    Filed: October 17, 2020
    Date of Patent: January 31, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Samuel Thomas, Kartik Audhkhasi, Brian E. D. Kingsbury