Patents Examined by Shaun Roberts
-
Patent number: 11664021
Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
Type: Grant
Filed: December 9, 2021
Date of Patent: May 30, 2023
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier
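The abstract describes feeding grapheme and phoneme data derived from contextual biasing phrases into the recognition model alongside acoustic features, but the listing gives no implementation detail. The sketch below is only an illustration of how a set of bias phrases might be turned into padded grapheme-ID and phoneme-ID matrices that a model could consume; the character inventory, the toy phoneme lexicon, and the padding scheme are assumptions, not the patented method.

```python
import numpy as np

# Toy phoneme lexicon; a real system would use a pronunciation dictionary or a G2P model.
PHONEME_LEXICON = {
    "play": ["P", "L", "EY"],
    "jazz": ["JH", "AE", "Z"],
    "navigate": ["N", "AE", "V", "AH", "G", "EY", "T"],
    "home": ["HH", "OW", "M"],
}

def encode_bias_phrases(phrases):
    """Convert bias phrases into padded grapheme-ID and phoneme-ID matrices."""
    grapheme_vocab = {c: i + 1 for i, c in enumerate(sorted({c for p in phrases for c in p}))}
    phoneme_vocab = {}
    grapheme_seqs, phoneme_seqs = [], []
    for phrase in phrases:
        grapheme_seqs.append([grapheme_vocab[c] for c in phrase])
        phones = []
        for word in phrase.split():
            for ph in PHONEME_LEXICON.get(word, []):
                phoneme_vocab.setdefault(ph, len(phoneme_vocab) + 1)
                phones.append(phoneme_vocab[ph])
        phoneme_seqs.append(phones)

    def pad(seqs):
        width = max(len(s) for s in seqs)
        out = np.zeros((len(seqs), width), dtype=np.int64)  # 0 = padding ID
        for row, seq in enumerate(seqs):
            out[row, : len(seq)] = seq
        return out

    return pad(grapheme_seqs), pad(phoneme_seqs)

if __name__ == "__main__":
    graphemes, phonemes = encode_bias_phrases(["play jazz", "navigate home"])
    print(graphemes.shape, phonemes.shape)  # one padded row per bias phrase
```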
-
Patent number: 11664040
Abstract: An apparatus for processing an audio signal includes an audio signal analyzer and a filter. The audio signal analyzer is configured to analyze an audio signal to determine a plurality of noise suppression filter values for a plurality of bands of the audio signal, wherein the analyzer is configured to determine a noise suppression filter value so that a noise suppression filter value is greater than or equal to a minimum noise suppression filter value and so that the minimum noise suppression value depends on a characteristic of the audio signal. The filter is configured for filtering the audio signal, wherein the filter is adjusted based on the noise suppression filter values.
Type: Grant
Filed: March 23, 2021
Date of Patent: May 30, 2023
Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors: Anthony Lombard, Bernhard Birzer, Dirk Mahne, Edwin Mabande, Fabian Kuech, Emanuel Habets, Paolo Annibale
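A minimal NumPy sketch of the general idea of a per-band suppression gain that is clipped to a minimum value, where the minimum itself depends on a characteristic of the signal. Here the characteristic is assumed to be the band's estimated SNR, and the Wiener-style gain and the SNR-to-floor mapping are illustrative choices, not the patented method.

```python
import numpy as np

def suppression_gains(noisy_power, noise_power):
    """Per-band noise-suppression gains with a signal-dependent lower bound.

    noisy_power, noise_power: per-band power estimates (1-D arrays).
    """
    snr = np.maximum(noisy_power - noise_power, 1e-12) / np.maximum(noise_power, 1e-12)
    gain = snr / (1.0 + snr)                # Wiener-style attenuation per band

    # Signal-dependent floor: suppress less (higher minimum gain) in bands
    # where the signal clearly dominates the noise.  The mapping is illustrative.
    min_gain = np.where(snr > 10.0, 0.5, 0.1)
    return np.maximum(gain, min_gain)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.uniform(0.5, 4.0, size=16)
    noise = rng.uniform(0.1, 1.0, size=16)
    gains = suppression_gains(noisy, noise)
    filtered_bands = gains * noisy          # apply the adjusted filter per band
    print(np.round(gains, 2))
```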
-
Patent number: 11646047
Abstract: The present document relates to audio source coding systems which make use of a harmonic transposition method for high frequency reconstruction (HFR), as well as to digital effect processors, e.g. exciters, where the generation of harmonic distortion adds brightness to the processed signal, and to time stretchers where a signal duration is prolonged with maintained spectral content. A system and method configured to generate a time stretched and/or frequency transposed signal from an input signal is described. The system comprises an analysis filterbank configured to provide an analysis subband signal from the input signal; wherein the analysis subband signal comprises a plurality of complex valued analysis samples, each having a phase and a magnitude. Furthermore, the system comprises a subband processing unit configured to determine a synthesis subband signal from the analysis subband signal using a subband transposition factor Q and a subband stretch factor S.
Type: Grant
Filed: May 23, 2022
Date of Patent: May 9, 2023
Assignee: Dolby International AB
Inventor: Lars Villemoes
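The abstract mentions complex-valued analysis samples with a phase and a magnitude, and a subband transposition factor Q. A rough sketch of the basic phase-multiplication operation behind harmonic transposition is shown below: each synthesis sample keeps the analysis sample's magnitude and has its phase scaled by Q. Time stretching (factor S), the filterbank itself, and all cross-product refinements are omitted; this is an assumption-laden illustration, not the claimed system.

```python
import numpy as np

def transpose_subband(analysis_samples, Q=2.0):
    """Phase-multiplication transposition of complex subband samples.

    Keeps each sample's magnitude and multiplies its phase by the
    transposition factor Q; the stretch factor S is not handled here.
    """
    magnitude = np.abs(analysis_samples)
    phase = np.angle(analysis_samples)
    return magnitude * np.exp(1j * Q * phase)

if __name__ == "__main__":
    t = np.arange(64)
    subband = np.exp(1j * 0.2 * t)            # toy complex analysis subband signal
    synthesis = transpose_subband(subband, Q=2.0)
    # Phase of each synthesis sample is twice the analysis phase.
    print(np.round(np.angle(synthesis[1:5]) / np.angle(subband[1:5]), 2))
```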
-
Patent number: 11636858
Abstract: A language proficiency analyzer automatically evaluates a person's language proficiency by analyzing that person's oral communications with another person. The analyzer first enhances the quality of an audio recording of a conversation between the two people using a neural network that automatically detects loss features in the audio and adds those loss features back into the audio. The analyzer then performs a textual and audio analysis on the improved audio. Through textual analysis, the analyzer uses a multi-attention network to determine how focused one person is on the other and/or how pleased one person is with the other. Through audio analysis, the analyzer uses a neural network to determine how well one person pronounced words during the conversation.
Type: Grant
Filed: October 12, 2021
Date of Patent: April 25, 2023
Assignee: Bank of America Corporation
Inventors: Madhusudhanan Krishnamoorthy, Harikrishnan Rajeev
-
Patent number: 11626115
Abstract: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.
Type: Grant
Filed: January 24, 2022
Date of Patent: April 11, 2023
Assignee: GOOGLE LLC
Inventors: Barnaby James, Bo Wang, Sunil Vemuri, David Schairer, Ulas Kirazci, Ertan Dogrultan, Petar Aleksic
-
Patent number: 11626101
Abstract: Systems and methods are described for processing and interpreting audible commands spoken in one or more languages. Speech recognition systems disclosed herein may be used as a stand-alone speech recognition system or comprise a portion of another content consumption system. A requesting user may provide audio input (e.g., command data) to the speech recognition system via a computing device to request an entertainment system to perform one or more operational commands. The speech recognition system may analyze the audio input across a variety of linguistic models, and may parse the audio input to identify a plurality of phrases and corresponding action classifiers. In some embodiments, the speech recognition system may utilize the action classifiers and other information to determine the one or more identified phrases that appropriately match the desired intent and operational command associated with the user's spoken command.
Type: Grant
Filed: October 28, 2021
Date of Patent: April 11, 2023
Assignee: Comcast Cable Communications, LLC
Inventors: George Thomas Des Jardins, Vikrant Sagar
-
Patent number: 11626108
Abstract: A method of operating a customer utterance analysis system includes obtaining a subset of utterances from among a first set of utterances. The method includes encoding, by a sentence encoder, the subset of utterances into multi-dimensional vectors. The method includes generating reduced-dimensionality vectors by reducing a dimensionality of the multi-dimensional vectors. Each vector of the reduced-dimensionality vectors corresponds to an utterance from among the subset of utterances. The method includes performing clustering on the reduced-dimensionality vectors. The method includes, based on the clustering performed on the reduced-dimensionality vectors, arranging the subset of utterances into clusters. The method includes obtaining labels for at least two clusters from among the clusters. The method includes generating training data based on the obtained labels. The method includes training a neural network model to predict an intent of an utterance based on the training data.
Type: Grant
Filed: September 25, 2020
Date of Patent: April 11, 2023
Assignee: TD Ameritrade IP Company, Inc.
Inventors: Abhilash Krishnankutty Nair, Amaris Yuseon Sim, Dayanand Narregudem, Drew David Riassetto, Logan Sommers Ahlstrom, Nafiseh Saberian, Stephen Filios, Ravindra Reddy Tappeta Venkata
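The pipeline above (encode, reduce dimensionality, cluster, label, train) can be sketched end to end with scikit-learn. In this illustration TF-IDF stands in for the sentence encoder, PCA for the dimensionality reduction, and KMeans for the clustering; the real system's encoder, reduction method, and parameters are not disclosed in this listing.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

utterances = [
    "I forgot my password", "reset my password please",
    "what is my account balance", "show me my balance",
    "transfer money to savings", "move funds to my savings account",
]

# Stand-in for the sentence encoder: one multi-dimensional vector per utterance.
vectors = TfidfVectorizer().fit_transform(utterances).toarray()

# Reduce dimensionality, then cluster the reduced-dimensionality vectors.
reduced = PCA(n_components=3).fit_transform(vectors)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

# Arrange utterances into clusters; a human would then label at least two of
# them (e.g. "password_reset", "check_balance") to build intent training data.
clusters = {}
for utterance, label in zip(utterances, labels):
    clusters.setdefault(label, []).append(utterance)
for label, members in clusters.items():
    print(label, members)
```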
-
Patent number: 11621011
Abstract: Described herein is a method of decoding an audio or speech signal, the method including the steps of: (a) receiving, by a decoder, a coded bitstream including the audio or speech signal and conditioning information; (b) providing, by a bitstream decoder, decoded conditioning information in a format associated with a first bitrate; (c) converting, by a converter, the decoded conditioning information from the format associated with the first bitrate to a format associated with a second bitrate; and (d) providing, by a generative neural network, a reconstruction of the audio or speech signal according to a probabilistic model conditioned by the conditioning information in the format associated with the second bitrate. Further described are an apparatus for decoding an audio or speech signal, a respective encoder, a system of the encoder and the apparatus for decoding an audio or speech signal, as well as a respective computer program product.
Type: Grant
Filed: October 29, 2019
Date of Patent: April 4, 2023
Assignee: Dolby International AB
Inventors: Janusz Klejsa, Per Hedelin
-
Patent number: 11620990
Abstract: A method for optimizing speech recognition includes receiving a first acoustic segment characterizing a hotword detected by a hotword detector in streaming audio captured by a user device, extracting one or more hotword attributes from the first acoustic segment, and adjusting, based on the one or more hotword attributes extracted from the first acoustic segment, one or more speech recognition parameters of an automated speech recognition (ASR) model. After adjusting the speech recognition parameters of the ASR model, the method also includes processing, using the ASR model, a second acoustic segment to generate a speech recognition result. The second acoustic segment characterizes a spoken query/command that follows the first acoustic segment in the streaming audio captured by the user device.
Type: Grant
Filed: December 11, 2020
Date of Patent: April 4, 2023
Assignee: Google LLC
Inventors: Matthew Sharifi, Aleksandar Kracun
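A small sketch of the adjustment step described above: extract simple attributes from the acoustic segment containing the hotword and map them to recognition parameters for the query that follows. The attributes chosen here (RMS level, segment duration) and the parameter mapping (beam size, endpointing timeout) are illustrative assumptions, not the attributes or parameters named in the patent.

```python
import numpy as np

def hotword_attributes(segment, sample_rate=16000):
    """Extract simple attributes from the acoustic segment that contained the hotword."""
    rms = float(np.sqrt(np.mean(segment ** 2)))
    duration = len(segment) / sample_rate
    return {"rms": rms, "duration_s": duration}

def adjust_asr_parameters(attrs):
    """Map hotword attributes to speech-recognition parameters (illustrative mapping)."""
    params = {"beam_size": 8, "endpoint_silence_ms": 500}
    if attrs["rms"] < 0.01:            # quiet / far-field speaker: search harder
        params["beam_size"] = 16
    if attrs["duration_s"] > 0.8:      # slow speaker: wait longer before endpointing
        params["endpoint_silence_ms"] = 900
    return params

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    segment = 0.005 * rng.standard_normal(16000)   # 1 s of quiet audio
    print(adjust_asr_parameters(hotword_attributes(segment)))
```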
-
Patent number: 11600281
Abstract: There is disclosed inter alia an apparatus for spatial audio signal encoding comprising means for receiving for each time frequency block of a sub band of an audio frame a spatial audio parameter comprising an azimuth and an elevation; determining a first distortion measure for the audio frame by determining a first distance measure for each time frequency block and summing the first distance measure for each time frequency block; determining a second distortion measure for the audio frame by determining a second distance measure for each time frequency block and summing the second distance measure for each time frequency block, and selecting either the first quantization scheme or the second quantization scheme for quantising the elevation and the azimuth for all time frequency blocks of the sub band of the audio frame, wherein the selecting is dependent on the first and second distortion measures.
Type: Grant
Filed: September 20, 2019
Date of Patent: March 7, 2023
Assignee: Nokia Technologies Oy
Inventor: Adriana Vasilache
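The selection logic can be illustrated with a short sketch: compute a per-block distance for each candidate quantization scheme, sum the distances over the frame, and keep the scheme with the lower total distortion for the whole sub band. The uniform quantizers, the step sizes, and the absolute-difference distance used below are placeholders, not the schemes or measures defined in the patent.

```python
import numpy as np

def quantize(values, step):
    """Uniform quantizer with the given step size (degrees)."""
    return np.round(values / step) * step

def frame_distortion(az, el, az_step, el_step):
    """Sum of per-time-frequency-block distances for one quantization scheme."""
    d_az = np.abs(az - quantize(az, az_step))
    d_el = np.abs(el - quantize(el, el_step))
    return float(np.sum(d_az + d_el))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    azimuth = rng.uniform(-180, 180, size=20)     # one value per time frequency block
    elevation = rng.uniform(-90, 90, size=20)

    d1 = frame_distortion(azimuth, elevation, az_step=10.0, el_step=10.0)  # scheme 1
    d2 = frame_distortion(azimuth, elevation, az_step=5.0, el_step=22.5)   # scheme 2
    chosen = 1 if d1 <= d2 else 2
    print(f"scheme 1: {d1:.1f}, scheme 2: {d2:.1f} -> use scheme {chosen} for the sub band")
```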
-
Patent number: 11594230
Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate language independent-speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.
Type: Grant
Filed: May 4, 2021
Date of Patent: February 28, 2023
Assignee: Google LLC
Inventors: Ignacio Lopez Moreno, Li Wan, Quan Wang
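A minimal sketch of the verification step: derive a speaker representation from the utterance plus a language identifier, then compare it against an enrolled representation. The random projection standing in for the trained neural network, the one-hot language conditioning, and the cosine-similarity threshold are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
PROJECTION = rng.standard_normal((45, 64))    # stand-in for the trained neural network

def speaker_representation(features, language_id, num_languages=5):
    """Map audio-derived features plus a language identifier to a speaker embedding."""
    one_hot = np.zeros(num_languages)
    one_hot[language_id] = 1.0
    x = np.concatenate([features, one_hot])    # 40 audio dims + 5 language dims
    emb = x @ PROJECTION
    return emb / np.linalg.norm(emb)

def verify(utterance_features, enrolled_embedding, language_id, threshold=0.7):
    """Accept the speaker if cosine similarity to the enrolled embedding is high enough."""
    candidate = speaker_representation(utterance_features, language_id)
    return float(candidate @ enrolled_embedding) >= threshold

if __name__ == "__main__":
    enrolled = speaker_representation(rng.standard_normal(40), language_id=2)
    test_features = rng.standard_normal(40)
    print(verify(test_features, enrolled, language_id=2))
```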
-
Patent number: 11580981
Abstract: An in-vehicle apparatus is connectable to a device that includes a voice assistant function. The in-vehicle apparatus includes: a voice detector that performs voice recognition of an audio signal input from a microphone and that controls functions of the in-vehicle apparatus based on a result of the voice recognition; and an interface that communicates with the device. When being informed of a detection of a predetermined word in the audio signal as the result of the voice recognition of the audio signal performed by the voice detector, the interface sends to the device, not via the voice detector, the audio signal input from the microphone. The predetermined word is for activating the voice assistant function of the device.
Type: Grant
Filed: March 3, 2021
Date of Patent: February 14, 2023
Assignee: DENSO TEN Limited
Inventors: Katsuaki Hikima, Daisuke Yamasaki, Futoshi Kosuga
-
Patent number: 11568858
Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low resource setting includes training a multilingual network on a set of training languages with an original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data output to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding one or more selected portions of the output transliterated data back to the original transcribed training data to update training data. The training of a new multilingual acoustic model through the multilingual network is performed using the updated training data.
Type: Grant
Filed: October 17, 2020
Date of Patent: January 31, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Samuel Thomas, Kartik Audhkhasi, Brian E. D. Kingsbury
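The filter-then-augment step reads naturally as a small data-selection loop. In the sketch below the filtering metric is assumed to be a per-utterance decoding confidence threshold; the pool contents, the confidence values, and the threshold are hypothetical, and the actual metric used by the patented method is not given in this listing.

```python
# Hypothetical pool of transliterated hypotheses: (utterance_id, transliteration, confidence).
transliterated_pool = [
    ("utt01", "namaste duniya", 0.92),
    ("utt02", "bonjour le monde", 0.41),
    ("utt03", "hola mundo", 0.87),
]

original_training_data = [
    ("utt00", "hello world"),
]

def filter_transliterations(pool, min_confidence=0.8):
    """Filtering metric: keep only transliterations the model decoded confidently."""
    return [(uid, text) for uid, text, conf in pool if conf >= min_confidence]

# Data augmentation: add the selected transliterated portion back to the original data,
# then retrain the multilingual acoustic model on the updated training data.
updated_training_data = original_training_data + filter_transliterations(transliterated_pool)
print(updated_training_data)
```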
-
Patent number: 11568888
Abstract: A terminal control method, a terminal and a non-transitory computer-readable storage medium are provided. The terminal control method includes: receiving, by a microphone, a detection audio signal emitted from a speaker and having a frequency within a pre-set detection frequency range; acquiring actual audio parameters of the detection audio signal when being received by the microphone, and original audio parameters of the detection audio signal when being emitted from the speaker; determining a relative state between the microphone and the speaker according to the actual audio parameters and the original audio parameters; determining a terminal control operation to be performed, according to the relative state and a pre-set correspondence between relative states and terminal control operations; and performing the determined terminal control operation on a terminal where the microphone is located.
Type: Grant
Filed: June 3, 2020
Date of Patent: January 31, 2023
Assignee: ZTE CORPORATION
Inventors: Shaowu Shen, Liting Liu
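The decision chain (compare actual versus original parameters, classify a relative state, look up an operation) can be illustrated with a few lines of Python. The level-attenuation criterion, the state names, and the operations table below are assumptions chosen for the example; the patent does not specify them in this listing.

```python
def relative_state(original_level_db, actual_level_db, attenuation_threshold_db=20.0):
    """Classify the microphone/speaker relative state from the detection-signal levels."""
    attenuation = original_level_db - actual_level_db
    return "covered" if attenuation >= attenuation_threshold_db else "unobstructed"

# Pre-set correspondence between relative states and terminal control operations.
CONTROL_OPERATIONS = {
    "covered": "mute_ringer",
    "unobstructed": "keep_ringing",
}

if __name__ == "__main__":
    state = relative_state(original_level_db=70.0, actual_level_db=42.0)
    print(state, "->", CONTROL_OPERATIONS[state])
```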
-
Patent number: 11568878
Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance.
Type: Grant
Filed: April 16, 2021
Date of Patent: January 31, 2023
Assignee: GOOGLE LLC
Inventors: Rajeev Rikhye, Quan Wang, Yanzhang He, Qiao Liang, Ian C. McGraw
-
Patent number: 11562758
Abstract: An encoder operable to filter audio signals into a plurality of frequency band components, generate quantized digital components for each band, identify a potential for pre-echo events within the generated quantized digital components, generate an approximate signal by decoding the quantized digital components using inverse pulse code modulation, generate an error signal by comparing the approximate signal with the sampled audio signal, and process the error signal and quantized digital components. The encoder is operable to process the error signal by processing delayed audio signals and Q band values, determining the potential for pre-echo events from the Q band values, and determining scale factors and MDCT block sizes for the potential for pre-echo events.
Type: Grant
Filed: March 29, 2022
Date of Patent: January 24, 2023
Assignee: IMMERSION NETWORKS, INC.
Inventors: James David Johnston, Stephen Daniel White, King Wei Hor, Barry M. Genova
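Pre-echo handling in transform coders generally comes down to detecting a sharp energy rise inside a block and switching to shorter transforms so quantization error stays confined in time. The sketch below shows that general idea only; the sub-block energy-ratio criterion and the thresholds are illustrative assumptions, not the Q-band-value procedure the abstract refers to.

```python
import numpy as np

def choose_mdct_block_size(block, num_subblocks=8, transient_ratio=8.0):
    """Pick long or short MDCT blocks based on a simple pre-echo (transient) check."""
    sub = block.reshape(num_subblocks, -1)
    energy = np.sum(sub ** 2, axis=1) + 1e-12
    # A large jump in sub-block energy signals a transient where pre-echo could occur.
    if np.max(energy[1:] / energy[:-1]) > transient_ratio:
        return "short"     # several short transforms confine the quantization error in time
    return "long"

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    quiet = 0.01 * rng.standard_normal(1024)
    attack = quiet.copy()
    attack[768:] += 0.5 * rng.standard_normal(256)   # sharp onset late in the block
    print(choose_mdct_block_size(quiet), choose_mdct_block_size(attack))
```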
-
Patent number: 11562764
Abstract: An apparatus for generating a bandwidth enhanced audio signal from an input audio signal having an input audio signal frequency range includes: a raw signal generator configured for generating a raw signal having an enhancement frequency range, wherein the enhancement frequency range is not included in the input audio signal frequency range; a neural network processor configured for generating a parametric representation for the enhancement frequency range using the input audio frequency range of the input audio signal and a trained neural network; and a raw signal processor for processing the raw signal using the parametric representation for the enhancement frequency range to obtain a processed raw signal having frequency components in the enhancement frequency range, wherein the processed raw signal or the processed raw signal and the input audio signal frequency range of the input audio signal represent the bandwidth enhanced audio signal.
Type: Grant
Filed: April 17, 2020
Date of Patent: January 24, 2023
Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors: Konstantin Schmidt, Christian Uhle, Bernd Edler
-
Patent number: 11562737
Abstract: Speech recognition may be improved by generating and using a topic specific language model. A topic specific language model may be created by performing an initial pass on an audio signal using a generic or basis language model. A speech recognition device may then determine topics relating to the audio signal based on the words identified in the initial pass and retrieve a corpus of text relating to those topics. Using the retrieved corpus of text, the speech recognition device may create a topic specific language model. In one example, the speech recognition device may adapt or otherwise modify the generic language model based on the retrieved corpus of text.
Type: Grant
Filed: December 27, 2019
Date of Patent: January 24, 2023
Assignee: TIVO CORPORATION
Inventors: David F. Houghton, Seth Michael Murray, Sibley Verbeck Simon
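The adaptation idea (mix a generic language model with statistics from a topic corpus retrieved after the first pass) can be shown with a toy unigram model. Real systems would use n-gram or neural language models and a real retrieval step; the interpolation weight, the corpora, and the unigram formulation here are assumptions for illustration only.

```python
from collections import Counter

def unigram_model(corpus):
    """Maximum-likelihood unigram probabilities from a list of sentences."""
    counts = Counter(word for sentence in corpus for word in sentence.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def adapt(generic_lm, topic_corpus, weight=0.5):
    """Interpolate the generic model with a topic-specific model (illustrative adaptation)."""
    topic_lm = unigram_model(topic_corpus)
    vocab = set(generic_lm) | set(topic_lm)
    return {w: (1 - weight) * generic_lm.get(w, 0.0) + weight * topic_lm.get(w, 0.0)
            for w in vocab}

if __name__ == "__main__":
    generic = unigram_model(["the game was long", "the weather is nice"])
    # First-pass transcript suggested a baseball topic, so a baseball corpus was retrieved.
    topic_specific = adapt(generic, ["the pitcher threw a strike", "the batter hit a home run"])
    print(sorted(topic_specific.items(), key=lambda kv: -kv[1])[:5])
```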
-
Patent number: 11562757
Abstract: An audio signal encoding method performed by an encoder includes identifying a time-domain audio signal in a unit of blocks, quantizing a linear prediction coefficient extracted from a combined block in which a current original block of the audio signal and a previous original block chronologically adjacent to the current original block are combined using frequency-domain linear predictive coding (LPC), generating a temporal envelope by dequantizing the quantized linear prediction coefficient, extracting a residual signal from the combined block based on the temporal envelope, quantizing the residual signal by one of time-domain quantization and frequency-domain quantization, and transforming the quantized residual signal and the quantized linear prediction coefficient into a bitstream.
Type: Grant
Filed: July 15, 2021
Date of Patent: January 24, 2023
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Seung Kwon Beack, Jongmo Sung, Mi Suk Lee, Tae Jin Lee, Woo-taek Lim, Inseon Jang, Jin Soo Choi
-
Patent number: 11557287
Abstract: Provided is a system which allows a learner who is a non-native speaker of a given language to intuitively improve pronunciation of the language. A pronunciation conversion apparatus includes a conversion section which converts a first feature value corresponding to a first speech signal obtained when a first speaker who speaks a given language as his/her native language speaks another language such that the first feature value approaches a second feature value corresponding to a second speech signal obtained when a second speaker who speaks the other language as his/her native language speaks the other language. Each of the first feature value and the second feature value is a feature value capable of representing a difference in pronunciation, and a speech signal obtained from the first feature value after the conversion is presented to the first speaker.
Type: Grant
Filed: April 9, 2019
Date of Patent: January 17, 2023
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Sadao Hiroya