Normalizing Patents (Class 704/234)
-
Patent number: 11948599
Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
Type: Grant
Filed: January 6, 2022
Date of Patent: April 2, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Lihi Ahuva Shiloh Perl, Ben Fishman, Gilad Pundak, Yonit Hoffman
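The post-processing described above (smoothing per-segment scores, then deriving a per-class confidence for each candidate window size) can be sketched as follows. The moving-average smoother, the max-over-segments confidence criterion, and the toy scores are illustrative assumptions, not the claimed method:

```python
def smooth_scores(scores, window):
    """Moving-average smoothing of one class's per-segment scores."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

def window_confidences(scores, window_sizes):
    """Confidence per candidate window size: max of the smoothed scores (assumed criterion)."""
    return {w: max(smooth_scores(scores, w)) for w in window_sizes}

# Toy per-segment scores for a single audio-event class.
conf = window_confidences([0.1, 0.9, 0.8, 0.1], window_sizes=[1, 3])
```

A larger window trades peak sharpness for robustness to single-segment spikes, which is presumably why several candidate window sizes are scored.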
-
Patent number: 11935537
Abstract: Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for selecting a digital assistant from among multiple digital assistants. An embodiment operates by receiving a voice input containing a trigger word at a first voice adapter associated with a digital assistant that generates a first confidence score for the trigger word. The embodiment further receives the voice input at a second voice adapter that generates a second confidence score for the trigger word. The embodiment determines the first confidence score is higher than the second confidence score. The embodiment selects the digital assistant based on the determining.
Type: Grant
Filed: April 19, 2023
Date of Patent: March 19, 2024
Assignee: Roku, Inc.
Inventors: Frank Maker, Andrey Eltsov, Robert Curtis, Gregory Medding
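The selection step in this abstract reduces to an argmax over the adapters' trigger-word confidence scores. A minimal sketch, assuming each voice adapter has already scored the same voice input (the assistant names and scores are hypothetical):

```python
def select_assistant(confidences):
    """Pick the digital assistant whose voice adapter reported the
    highest trigger-word confidence for the shared voice input."""
    return max(confidences, key=confidences.get)

# Hypothetical confidence scores from two voice adapters.
chosen = select_assistant({"assistant_a": 0.42, "assistant_b": 0.87})
```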
-
Patent number: 11694697
Abstract: A system and method are presented for the correction of packet loss in audio in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and acoustic models may be used to replace the noisy signal.
Type: Grant
Filed: June 29, 2020
Date of Patent: July 4, 2023
Inventors: Srinath Cheluvaraja, Ananth Nagaraja Iyer, Aravind Ganapathiraju, Felix Immanuel Wyss
-
Patent number: 11664026
Abstract: Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for selecting a digital assistant from among multiple digital assistants. An embodiment operates by receiving a voice input containing a trigger word at a first voice adapter associated with a digital assistant that generates a first confidence score for the trigger word. The embodiment further receives the voice input at a second voice adapter that generates a second confidence score for the trigger word. The embodiment determines the first confidence score is higher than the second confidence score. The embodiment selects the digital assistant based on the determining.
Type: Grant
Filed: August 26, 2021
Date of Patent: May 30, 2023
Assignee: Roku, Inc.
Inventors: Frank Maker, Andrey Eltsov, Robert Curtis, Gregory Medding
-
Patent number: 11574642
Abstract: A system and method are presented for the correction of packet loss in audio in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and acoustic models may be used to replace the noisy signal.
Type: Grant
Filed: June 29, 2020
Date of Patent: February 7, 2023
Inventors: Srinath Cheluvaraja, Ananth Nagaraja Iyer, Aravind Ganapathiraju, Felix Immanuel Wyss
-
Patent number: 11468900
Abstract: A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing a respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of slices, generating a set of candidate acoustic embeddings where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the remaining candidate acoustic embeddings in the set of candidate acoustic embeddings after removing the subset of the candidate acoustic embeddings.
Type: Grant
Filed: October 15, 2020
Date of Patent: October 11, 2022
Assignee: Google LLC
Inventors: Yeming Fang, Quan Wang, Pedro Jose Moreno Mengibar, Ignacio Lopez Moreno, Gang Feng, Fang Chu, Jin Shi, Jason William Pelecanos
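The abstract does not say which subset of candidate embeddings is removed; one plausible reading is an outlier filter. The sketch below assumes a farthest-from-centroid removal rule and a plain average as the aggregate, both of which are illustrative choices rather than the patented criteria:

```python
def aggregate_embedding(candidates, drop=1):
    """Drop the `drop` candidates farthest from the centroid, average the rest."""
    dim = len(candidates[0])
    centroid = [sum(v[d] for v in candidates) / len(candidates) for d in range(dim)]

    def dist(v):
        # Euclidean distance to the centroid of all candidates.
        return sum((a - b) ** 2 for a, b in zip(v, centroid)) ** 0.5

    kept = sorted(candidates, key=dist)[: len(candidates) - drop]
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]

# Two consistent slices plus one outlier slice (toy 2-D embeddings).
agg = aggregate_embedding([[0.0, 0.0], [0.2, 0.0], [10.0, 10.0]])
```

Removing inconsistent slices before averaging keeps a single noisy slice from dominating the speaker representation.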
-
Patent number: 11380333
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: December 4, 2019
Date of Patent: July 5, 2022
Assignee: Verint Systems Inc.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 11373655
Abstract: An apparatus includes processor(s) to: perform preprocessing operations of a segmentation technique including divide a speech data set into data chunks representing chunks of speech audio, use an acoustic model with each data chunk to identify pauses in the speech audio, and analyze a length of time of each identified pause to identify a candidate set of likely sentence pauses in the speech audio; and perform speech-to-text operations including divide the speech data set into data segments each representing segments of the speech audio based on the candidate set of likely sentence pauses, use the acoustic model with each data segment to identify likely speech sounds in the speech audio, analyze the identified likely speech sounds to identify candidate sets of words likely spoken in the speech audio, and generate a transcript of the speech data set based at least on the candidate sets of words likely spoken.
Type: Grant
Filed: October 12, 2021
Date of Patent: June 28, 2022
Assignee: SAS INSTITUTE INC.
Inventors: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
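The preprocessing stage above (keep only pauses long enough to plausibly be sentence boundaries, then cut the audio at those pauses) can be sketched as follows; the 0.5 s threshold and the `(start, duration)` pause representation are assumptions, not values from the patent:

```python
def likely_sentence_pauses(pauses, min_duration=0.5):
    """Keep pauses long enough to plausibly mark sentence boundaries."""
    return [p for p in pauses if p[1] >= min_duration]

def split_into_segments(total_duration, sentence_pauses):
    """Cut [0, total_duration] at the start of each selected pause."""
    cuts = [start for start, _ in sentence_pauses]
    bounds = [0.0] + cuts + [total_duration]
    return list(zip(bounds[:-1], bounds[1:]))

# Toy (start_seconds, duration_seconds) pauses from an acoustic model.
pauses = likely_sentence_pauses([(1.0, 0.2), (3.0, 0.8)])
segments = split_into_segments(5.0, pauses)
```

Segmenting at likely sentence pauses keeps each recognition unit short without cutting words mid-stream.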
-
Patent number: 11367450
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: December 4, 2019
Date of Patent: June 21, 2022
Assignee: Verint Systems Inc.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 11344225
Abstract: A method of determining a value for an apnea-hypopnea index (AHI) for a person, the method comprising: recording a voice track of a person; extracting features from the voice track that characterize the voice track; and processing the features to determine an AHI.
Type: Grant
Filed: January 24, 2014
Date of Patent: May 31, 2022
Assignees: B. G. NEGEV TECHNOLOGIES AND APPLICATIONS LTD., MOR RESEARCH APPLICATIONS LTD.
Inventors: Yaniv Zigel, Ariel Tarasiuk, Oren Elisha
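The pipeline in this abstract is feature extraction followed by a trained estimator. As a stand-in, the sketch below uses two toy statistics as "features" and a fixed linear model clamped at zero; the real features and estimator are not specified in the abstract, so every name and number here is an assumption:

```python
def extract_features(frames):
    """Toy acoustic features: mean and variance of per-frame energies."""
    n = len(frames)
    mean = sum(frames) / n
    var = sum((x - mean) ** 2 for x in frames) / n
    return [mean, var]

def predict_ahi(features, weights, bias):
    """Linear stand-in for a trained AHI estimator (AHI cannot be negative)."""
    return max(0.0, bias + sum(w * f for w, f in zip(weights, features)))

feats = extract_features([1.0, 3.0])                 # toy frame energies
ahi = predict_ahi(feats, weights=[2.0, 3.0], bias=1.0)  # hypothetical trained weights
```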
-
Patent number: 11322154
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcribed customer service interaction.
Type: Grant
Filed: December 4, 2019
Date of Patent: May 3, 2022
Assignee: Verint Systems Inc.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 11257503
Abstract: Receiving a raw speech signal from a human speaker; providing an acoustic representation of the raw speech signal if the raw speech signal is determined to be within one of a plurality of pre-defined acoustic domains; augmenting the raw speech signal with the acoustic representation to provide a plurality of augmented speech signals; determining a set of a plurality of Mel frequency cepstral coefficients for each of the plurality of augmented speech signals, wherein each set of the plurality of Mel frequency cepstral coefficients is transformed using domain-dependent transformations to obtain an acoustic reference vector, such that there are a plurality of acoustic reference vectors, for each one of the plurality of augmented speech signals; stacking the plurality of acoustic reference vectors corresponding to each augmented speech signal to form a super acoustic reference vector; and processing the super acoustic reference vector through a neural network which has been previously trained on data from a plurality […]
Type: Grant
Filed: March 10, 2021
Date of Patent: February 22, 2022
Inventors: Vikram Ramesh Lakkavalli, Sunderrajan Vivekkumar
-
Patent number: 11227603
Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
Type: Grant
Filed: April 14, 2020
Date of Patent: January 18, 2022
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 11082510
Abstract: A method for identifying a push communication pattern includes creating clusters from a communication entity's response buffers. Clusters that meet a first criterion are detected. The communication entity is identified as having a push communication pattern upon a determination that the detected clusters meet a second criterion.
Type: Grant
Filed: January 26, 2012
Date of Patent: August 3, 2021
Assignee: MICRO FOCUS LLC
Inventors: Ofer Eliassaf, Amir Kessner, Meidan Zemer, Oded Keret, Moshe Eran Kraus
-
Method of providing service based on location of sound source and speech recognition device therefor
Patent number: 10984790
Abstract: A speech recognition device is provided. The speech recognition device includes at least one microphone configured to receive a sound signal from a first sound source, and at least one processor configured to determine a direction of the first sound source based on the sound signal, determine whether the direction of the first sound source is in a registered direction, and based on whether the direction of the first sound source is in the registered direction, recognize a speech from the sound signal regardless of whether the sound signal comprises a wake-up keyword.
Type: Grant
Filed: November 28, 2018
Date of Patent: April 20, 2021
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyeon-Taek Lim, Sang-Yoon Kim, Kyung-Min Lee, Chang-Woo Han, Nam-Hoon Kim, Jong-Youb Ryu, Chi-Youn Park, Jae-Won Lee
-
Patent number: 10896624
Abstract: A computer operable method is described for transforming phonemes, graphemes, and other language structures into interactive elements. The method may comprise: receiving a word, wherein the word consists of a group of phonemes; forming a group of graphemes, wherein the group of graphemes is constructed using information relating to the group of phonemes; and forming a group of manipulatives, wherein the group of manipulatives is constructed using information relating to the group of phonemes or the group of graphemes.
Type: Grant
Filed: June 18, 2018
Date of Patent: January 19, 2021
Assignee: KNOTBIRD LLC
Inventor: Richard Daniel Telep
-
Patent number: 10726830
Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
Type: Grant
Filed: September 27, 2018
Date of Patent: July 28, 2020
Assignee: Amazon Technologies, Inc.
Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
-
Patent number: 10720164
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: December 4, 2019
Date of Patent: July 21, 2020
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10692501
Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
Type: Grant
Filed: October 7, 2019
Date of Patent: June 23, 2020
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10692500
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: September 30, 2019
Date of Patent: June 23, 2020
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10650826
Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
Type: Grant
Filed: October 7, 2019
Date of Patent: May 12, 2020
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10593332
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: September 11, 2019
Date of Patent: March 17, 2020
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10522153
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: October 25, 2018
Date of Patent: December 31, 2019
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10446156
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: October 25, 2018
Date of Patent: October 15, 2019
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10438592
Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
Type: Grant
Filed: October 25, 2018
Date of Patent: October 8, 2019
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10418030
Abstract: An acoustic model training device includes: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of: generating, based on feature vectors obtained by analyzing utterance data items of a plurality of speakers, a training data item of each speaker by subtracting, for each speaker, a mean vector of all the feature vectors of the speaker from each of the feature vectors of the speaker; generating a training data item of all the speakers by subtracting a mean vector of all the feature vectors of all the speakers from each of the feature vectors of all the speakers; and training an acoustic model using the training data item of each speaker and the training data item of all the speakers.
Type: Grant
Filed: May 20, 2016
Date of Patent: September 17, 2019
Assignee: MITSUBISHI ELECTRIC CORPORATION
Inventor: Toshiyuki Hanazawa
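The two normalization passes above are plain mean-vector subtraction, applied once per speaker and once over the pooled data. A minimal sketch with toy 2-D feature vectors:

```python
def mean_vector(vectors):
    """Componentwise mean of a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def subtract_mean(vectors, mean):
    """Remove a mean vector from every feature vector."""
    return [[x - m for x, m in zip(v, mean)] for v in vectors]

speaker_a = [[1.0, 2.0], [3.0, 4.0]]
speaker_b = [[5.0, 6.0]]

# Per-speaker training data: each speaker's own mean removed.
per_speaker = subtract_mean(speaker_a, mean_vector(speaker_a))

# Pooled training data: the global mean removed from everything.
pooled = speaker_a + speaker_b
global_normalized = subtract_mean(pooled, mean_vector(pooled))
```

Training on both variants, as the abstract describes, exposes the model to features normalized at two different scopes.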
-
Patent number: 10210864
Abstract: Methods and computing systems for enabling a voice command for communication between related devices are described. A training voice command of a user is processed to generate a voice command signature including a content characteristic and a sound characteristic. When the user wishes to transfer an on-going packet data session from a current device to a related device, the user inputs the same voice command. The voice command will be analyzed with the voice command signature to determine a correspondence before being executed.
Type: Grant
Filed: December 29, 2016
Date of Patent: February 19, 2019
Assignee: T-Mobile USA, Inc.
Inventors: Yasmin Karimli, Gunjan Nimbavikar
-
Patent number: 10134401
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: November 20, 2013
Date of Patent: November 20, 2018
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10134400
Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
Type: Grant
Filed: November 20, 2013
Date of Patent: November 20, 2018
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10044533
Abstract: A method (200) of bias cancellation for a radio channel sequence includes: receiving (201) a radio signal, the radio signal comprising a radio channel sequence coded by a first signature, the first signature belonging to a set of orthogonal signatures; decoding (202) the radio channel sequence based on the first signature to generate a decoded radio channel sequence; decoding (203) the radio channel sequence based on a second signature, wherein the second signature is orthogonal to the signatures of the set of orthogonal signatures, to generate a bias of the radio channel sequence; and canceling (204) the bias of the radio channel sequence from the decoded radio channel sequence.
Type: Grant
Filed: January 12, 2016
Date of Patent: August 7, 2018
Assignee: Intel IP Corporation
Inventors: Thomas Esch, Edgar Bolinth, Markus Jordan, Tobias Scholand, Michael Speth
-
Patent number: 9886948
Abstract: Features are disclosed for improving the robustness of a neural network by using multiple (e.g., two or more) feature streams, combining data from the feature streams, and comparing the combined data to data from a subset of the feature streams (e.g., comparing values from the combined feature stream to values from one of the component feature streams of the combined feature stream). The neural network can include a component or layer that selects the data with the highest value, which can suppress or exclude some or all corrupted data from the combined feature stream. Subsequent layers of the neural network can restrict connections from the combined feature stream to a component feature stream to reduce the possibility that a corrupted combined feature stream will corrupt the component feature stream.
Type: Grant
Filed: January 5, 2015
Date of Patent: February 6, 2018
Assignee: Amazon Technologies, Inc.
Inventors: Sri Venkata Surya Siva Rama Krishna Garimella, Bjorn Hoffmeister
-
Patent number: 9876985
Abstract: A system, apparatus, and computer program product for monitoring a subject person's environment while the person is isolated from the environment. The system can use a microphone and/or a digital camera or imager to detect and capture sounds, voices, objects, symbols, and faces in the subject person's environment, for example. The captured items can be analyzed, identified, and provided in an events log. The subject person can later review the events log to understand what happened while isolated. In various instances, the subject person can select an event from the log and review the underlying detected sounds, voices, objects, symbols, and faces.
Type: Grant
Filed: September 3, 2014
Date of Patent: January 23, 2018
Assignee: HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED
Inventors: Davide Di Censo, Stefan Marti
-
Patent number: 9418679
Abstract: A method for processing a received set of speech data, wherein the received set of speech data comprises an utterance, is provided. The method executes a process to generate a plurality of confidence scores, wherein each of the plurality of confidence scores is associated with one of a plurality of candidate utterances; determines a plurality of difference values, each of the plurality of difference values comprising a difference between two of the plurality of confidence scores; and compares the plurality of difference values to determine at least one disparity.
Type: Grant
Filed: August 12, 2014
Date of Patent: August 16, 2016
Assignee: HONEYWELL INTERNATIONAL INC.
Inventor: Erik T. Nelson
-
Patent number: 9251783
Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
Type: Grant
Filed: June 17, 2014
Date of Patent: February 2, 2016
Assignee: Sony Computer Entertainment Inc.
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
-
Patent number: 9137611
Abstract: In response to a signal failing to exceed an estimated level of noise by more than a predetermined amount for more than a predetermined continuous duration, the estimated level of noise is adjusted according to a first time constant in response to the signal rising and a second time constant in response to the signal falling, so that the estimated level of noise falls more quickly than it rises. In response to the signal exceeding the estimated level of noise by more than the predetermined amount for more than the predetermined continuous duration, a speed of adjusting the estimated level of noise is accelerated.
Type: Grant
Filed: August 24, 2012
Date of Patent: September 15, 2015
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Takahiro Unno, Nitish Krishna Murthy
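The asymmetric smoothing described above is a first-order tracker whose coefficient depends on the direction of the update, so the noise estimate falls faster than it rises. The sketch below covers only that branch (the accelerated adaptation when the signal exceeds the estimate for a sustained duration is omitted), and the coefficients are illustrative:

```python
def track_noise_floor(samples, rise=0.01, fall=0.2, init=0.0):
    """First-order noise-floor tracker: slow upward, fast downward adaptation."""
    estimate = init
    trace = []
    for x in samples:
        # Small coefficient when the sample is above the estimate (slow rise),
        # large coefficient when it is below (fast fall).
        coeff = rise if x > estimate else fall
        estimate += coeff * (x - estimate)
        trace.append(estimate)
    return trace

trace = track_noise_floor([1.0, 0.0])
```

Biasing the tracker downward keeps brief speech bursts from inflating the noise estimate, while dips in level are absorbed quickly.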
-
Patent number: 8996368
Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
Type: Grant
Filed: February 22, 2010
Date of Patent: March 31, 2015
Assignee: Nuance Communications, Inc.
Inventor: Daniel Willett
-
Patent number: 8996373
Abstract: A state detection device includes: a first model generation unit to generate a first specific speaker model obtained by modeling speech features of a specific speaker in an undepressed state; a second model generation unit to generate a second specific speaker model obtained by modeling speech features of the specific speaker in the depressed state; a likelihood calculation unit to calculate a first likelihood as a likelihood of the first specific speaker model with respect to input voice, and a second likelihood as a likelihood of the second specific speaker model with respect to the input voice; and a state determination unit to determine a state of the speaker of the input voice using the first likelihood and the second likelihood.
Type: Grant
Filed: October 5, 2011
Date of Patent: March 31, 2015
Assignee: Fujitsu Limited
Inventors: Shoji Hayakawa, Naoshi Matsuo
-
Publication number: 20150088498
Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
Type: Application
Filed: November 26, 2014
Publication date: March 26, 2015
Inventors: Vincent GOFFIN, Andrej LJOLJE, Murat Saraclar
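The iterative loop in this application is a grid search over warping factors, keeping the factor whose warped spectrum best matches the speech model. The sketch below substitutes a crude multiplicative "warp" and a negative-squared-error "model score" for the real spectral warping and model likelihood, so every detail is an assumption:

```python
def model_score(warped, model):
    """Toy match score: negative squared error against a reference vector."""
    return -sum((w - m) ** 2 for w, m in zip(warped, model))

def best_warping_factor(spectrum, model, factors):
    """Grid search: keep the warping factor whose warped spectrum scores best."""
    best_factor, best = None, float("-inf")
    for alpha in factors:
        warped = [x * alpha for x in spectrum]  # crude stand-in for frequency warping
        s = model_score(warped, model)
        if s > best:
            best_factor, best = alpha, s
    return best_factor

alpha = best_warping_factor([1.0, 2.0], model=[1.1, 2.2], factors=[0.9, 1.0, 1.1])
```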
-
Patent number: 8990080
Abstract: Techniques to normalize names for name-based speech recognition grammars are described. Some embodiments are particularly directed to techniques to normalize names for name-based speech recognition grammars more efficiently by caching, and on a per-culture basis. A technique may comprise receiving a name for normalization, during name processing for a name-based speech grammar generating process. A normalization cache may be examined to determine if the name is already in the cache in a normalized form. When the name is not already in the cache, the name may be normalized and added to the cache. When the name is in the cache, the normalization result may be retrieved and passed to the next processing step. Other embodiments are described and claimed.
Type: Grant. Filed: January 27, 2012. Date of Patent: March 24, 2015. Assignee: Microsoft Corporation. Inventors: Mini Varkey, Bernardo Sana, Victor Boctor, Diego Carlomagno
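The cache-then-normalize flow above is ordinary memoization keyed per culture. A minimal sketch under that assumption (the wrapper name and the `culture` default are hypothetical):

```python
def make_cached_normalizer(normalize_fn):
    """Wrap an expensive name-normalization step with a per-culture cache,
    so a name seen again in the same culture skips normalization."""
    cache = {}
    def normalize(name, culture="en-US"):
        key = (culture, name)
        if key not in cache:               # cache miss: normalize and store
            cache[key] = normalize_fn(name)
        return cache[key]                  # cache hit: reuse prior result
    return normalize
```

Keying on (culture, name) matters because the same written name can normalize differently under different culture rules.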
-
Patent number: 8990086
Abstract: A recognition confidence measurement method, medium, and system are provided that can more accurately determine whether an input speech signal is an in-vocabulary word, by extracting an optimum number of candidates that match a phoneme string extracted from the input speech signal and estimating a lexical distance between the extracted candidates. A recognition confidence measurement method includes: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string against phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is an in-vocabulary word, based on the lexical distance.
Type: Grant. Filed: July 31, 2006. Date of Patent: March 24, 2015. Assignee: Samsung Electronics Co., Ltd. Inventors: Sang-Bae Jeong, Nam Hoon Kim, Ick Sang Han, In Jeong Choi, Gil Jin Jang, Jae-Hoon Jeong
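The abstract does not specify which lexical distance is used; one plausible, commonly used choice is the Levenshtein (edit) distance between the candidates' phoneme strings, sketched here for illustration:

```python
def phoneme_distance(a, b):
    """Levenshtein distance between two phoneme sequences: the minimum
    number of insertions, deletions, and substitutions to turn a into b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # delete all of a's first i phonemes
    for j in range(n + 1):
        d[0][j] = j                       # insert all of b's first j phonemes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[m][n]
```

Intuitively, if the top candidates are lexically far apart yet all matched the input, the recognizer's evidence is ambiguous, which is the kind of signal a confidence measure can exploit.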
-
Patent number: 8949128
Abstract: Techniques for providing speech output for speech-enabled applications. A synthesis system receives from a speech-enabled application a text input including a text transcription of a desired speech output. The synthesis system selects one or more audio recordings corresponding to one or more portions of the text input. In one aspect, the synthesis system selects from audio recordings provided by a developer of the speech-enabled application. In another aspect, the synthesis system selects an audio recording of a speaker speaking a plurality of words. The synthesis system forms a speech output including the one or more selected audio recordings and provides the speech output for the speech-enabled application.
Type: Grant. Filed: February 12, 2010. Date of Patent: February 3, 2015. Assignee: Nuance Communications, Inc. Inventors: Darren C. Meyer, Corinne Bos-Plachez, Martine Marguerite Staessen
-
Patent number: 8942975
Abstract: Techniques are described herein that suppress noise in a Mel-filtered spectral domain. For example, a window may be applied to a representation of a speech signal in a time domain. The windowed representation in the time domain may be converted to a subsequent representation of the speech signal in the Mel-filtered spectral domain. A noise suppression operation may be performed with respect to the subsequent representation to provide noise-suppressed Mel coefficients.
Type: Grant. Filed: March 22, 2011. Date of Patent: January 27, 2015. Assignee: Broadcom Corporation. Inventor: Jonas Borgstrom
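The abstract leaves the noise suppression operation unspecified; a simple illustrative choice (not necessarily the patented one) is spectral subtraction applied to Mel filterbank energies, with a floor to keep the result positive. Names and the floor value are hypothetical.

```python
import numpy as np

def suppress_noise_mel(mel_frames, noise_frames, floor=0.01):
    """Spectral subtraction in a Mel-filtered domain: estimate the noise
    level from noise-only frames, subtract it from each frame's Mel
    energies, and clamp to a small fraction of the original energy so
    no coefficient goes negative."""
    noise_est = noise_frames.mean(axis=0)          # per-band noise estimate
    cleaned = mel_frames - noise_est               # subtract in Mel domain
    return np.maximum(cleaned, floor * mel_frames) # spectral floor
```

Working in the Mel domain rather than the full FFT domain reduces both the number of bands to process and the variance of the noise estimate, since each Mel band averages several FFT bins.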
-
Patent number: 8930188
Abstract: An error concealment method and apparatus for an audio signal, and a decoding method and apparatus for an audio signal using the error concealment method and apparatus, are provided. The error concealment method includes selecting one of an error concealment in a frequency domain and an error concealment in a time domain as an error concealment scheme for a current frame based on predetermined criteria when an error occurs in the current frame, selecting one of a repetition scheme and an interpolation scheme in the frequency domain as the error concealment scheme for the current frame based on predetermined criteria when the error concealment in the frequency domain is selected, and concealing the error of the current frame using the selected scheme.
Type: Grant. Filed: July 2, 2013. Date of Patent: January 6, 2015. Assignee: Samsung Electronics Co., Ltd. Inventors: Eun-mi Oh, Ki-hyun Choo, Ho-sang Sung, Chang-yong Son, Jung-hoe Kim, Kang eun Lee
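The repetition-versus-interpolation choice in the frequency domain can be sketched in a few lines. This is an illustrative stand-in, not the patented selection criteria: here the interpolation scheme is used whenever the frame after the lost one is available, and repetition otherwise.

```python
def conceal_frame(prev_frame, next_frame=None):
    """Conceal a lost frame's frequency-domain coefficients.
    Interpolation scheme: average the neighboring good frames when the
    next frame is available. Repetition scheme: otherwise, repeat the
    previous good frame."""
    if next_frame is not None:
        return [(p + n) / 2.0 for p, n in zip(prev_frame, next_frame)]
    return list(prev_frame)
```

A real decoder would apply frame-dependent criteria (e.g. signal stationarity and how many consecutive frames were lost) when choosing between the schemes.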
-
Patent number: 8914291
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Type: Grant. Filed: September 24, 2013. Date of Patent: December 16, 2014. Assignee: Nuance Communications, Inc. Inventors: Darren C. Meyer, Stephen R. Springer
-
Patent number: 8909527
Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
Type: Grant. Filed: June 24, 2009. Date of Patent: December 9, 2014. Assignee: AT&T Intellectual Property II, L.P. Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
-
Patent number: 8838455
Abstract: A system and method for facilitating user interaction with a voice application are provided. A VoiceXML browser runs locally on a mobile device. Supporting components, such as a Resource Manager, a Call Data Manager, and an MRCP Gateway Client, support operation of the VoiceXML browser. The Resource Manager serves either files stored locally on the mobile device or files accessible via a network connection using the wireless or mobile broadband capabilities of the mobile device. The Call Data Manager communicates call-specific data back to the application's system of origin or another configured target system. The MRCP Gateway Client provides the VoiceXML browser with access to media resources via an MRCP Gateway.
Type: Grant. Filed: June 13, 2008. Date of Patent: September 16, 2014. Assignee: West Corporation. Inventor: Chad Daniel Fox
-
Patent number: 8825486
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Type: Grant. Filed: January 22, 2014. Date of Patent: September 2, 2014. Assignee: Nuance Communications, Inc. Inventors: Darren C. Meyer, Stephen R. Springer
-
Publication number: 20140222423
Abstract: Most speaker recognition systems use i-vectors, which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computations and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification comprise determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of an orthogonal operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition.
Type: Application. Filed: February 7, 2013. Publication date: August 7, 2014. Applicant: NUANCE COMMUNICATIONS, INC. Inventors: Sandro Cumani, Pietro Laface
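The core idea of a shared orthogonal operator can be illustrated with a generic low-rank factorization, not the specific construction claimed above: each tall component matrix T_c is represented as Q @ W_c, where the orthogonal basis Q is computed once and shared, so only Q plus the small per-component coordinate matrices need to be stored. Function and variable names are hypothetical.

```python
import numpy as np

def common_basis_representation(components, rank):
    """Represent each tall component matrix T_c as Q @ W_c with a single
    orthogonal basis Q shared by all components (computed here via an
    SVD of the stacked components). Storing Q once plus the small W_c
    matrices can be cheaper than storing every T_c in full."""
    stacked = np.hstack(components)             # (dim, total component width)
    Q, _, _ = np.linalg.svd(stacked, full_matrices=False)
    Q = Q[:, :rank]                             # shared orthogonal operator
    weights = [Q.T @ T for T in components]     # per-component coordinates
    return Q, weights
```

When the components genuinely share a low-dimensional column space, the factorization is exact; otherwise it is the best rank-limited approximation in the least-squares sense.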
-
Patent number: 8798994
Abstract: The present invention discloses a solution for conserving computing resources when implementing transformation based adaptation techniques. The disclosed solution limits the amount of speech data used by real-time adaptation algorithms to compute a transformation, which results in substantial computational savings. Appreciably, application of a transform is a relatively low memory and computationally cheap process compared to the memory and resource requirements for computing the transform to be applied.
Type: Grant. Filed: February 6, 2008. Date of Patent: August 5, 2014. Assignee: International Business Machines Corporation. Inventors: John W. Eckhart, Michael Florio, Radek Hampl, Pavel Krbec, Jonathan Palgon
-
Patent number: 8798992
Abstract: A signal processing apparatus, system, and software product for audio modification/substitution of a background noise generated during an event, including, but not limited to, substituting or partially substituting a noise signal from one or more microphones with a pre-recorded noise, and/or selecting one or more noise signals from a plurality of microphones for further processing in real-time or near real-time broadcasting.
Type: Grant. Filed: May 18, 2011. Date of Patent: August 5, 2014. Assignee: Disney Enterprises, Inc. Inventors: Michael Gay, Jed Drake, Anthony Bailey