Normalizing Patents (Class 704/234)
  • Patent number: 11948599
    Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
    Type: Grant
    Filed: January 6, 2022
    Date of Patent: April 2, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Lihi Ahuva Shiloh Perl, Ben Fishman, Gilad Pundak, Yonit Hoffman
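The post-processing stage described above can be sketched in miniature: smooth the per-frame classification scores with each candidate window size, then take the peak smoothed score as that window's class confidence. This is a toy illustration under assumed details (moving-average smoothing, max-pooling for confidence); the function names are not from the patent.

```python
def smooth_scores(scores, window):
    """Moving-average smoothing of per-frame classification scores."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

def window_confidences(scores, candidate_windows=(3, 5)):
    """For each candidate window size, use the peak smoothed score
    as that window's class confidence value."""
    return {w: max(smooth_scores(scores, w)) for w in candidate_windows}
```

A larger window trades responsiveness for stability; comparing confidences across candidate window sizes is one plausible way to pick a window per class.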
  • Patent number: 11935537
    Abstract: Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for selecting a digital assistant from among multiple digital assistants. An embodiment operates by receiving a voice input containing a trigger word at a first voice adapter associated with a digital assistant that generates a first confidence score for the trigger word. The embodiment further receives the voice input at a second voice adapter that generates a second confidence score for the trigger word. The embodiment determines the first confidence score is higher than the second confidence score. The embodiment selects the digital assistant based on the determining.
    Type: Grant
    Filed: April 19, 2023
    Date of Patent: March 19, 2024
    Assignee: Roku, Inc.
    Inventors: Frank Maker, Andrey Eltsov, Robert Curtis, Gregory Medding
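The selection logic in the abstract reduces to an argmax over the per-adapter trigger-word confidences. A minimal sketch (the dictionary-based interface is an assumption, not the patent's API):

```python
def select_assistant(confidences):
    """confidences maps each digital assistant to the confidence
    score its voice adapter produced for the trigger word; the
    assistant with the highest score is selected."""
    return max(confidences, key=confidences.get)
```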
  • Patent number: 11694697
    Abstract: A system and method are presented for the correction of packet loss in audio in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and acoustic models may be used to replace the noisy signal.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: July 4, 2023
    Inventors: Srinath Cheluvaraja, Ananth Nagaraja Iyer, Aravind Ganapathiraju, Felix Immanuel Wyss
  • Patent number: 11664026
    Abstract: Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for selecting a digital assistant from among multiple digital assistants. An embodiment operates by receiving a voice input containing a trigger word at a first voice adapter associated with a digital assistant that generates a first confidence score for the trigger word. The embodiment further receives the voice input at a second voice adapter that generates a second confidence score for the trigger word. The embodiment determines the first confidence score is higher than the second confidence score. The embodiment selects the digital assistant based on the determining.
    Type: Grant
    Filed: August 26, 2021
    Date of Patent: May 30, 2023
    Assignee: Roku, Inc.
    Inventors: Frank Maker, Andrey Eltsov, Robert Curtis, Gregory Medding
  • Patent number: 11574642
    Abstract: A system and method are presented for the correction of packet loss in audio in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and acoustic models may be used to replace the noisy signal.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: February 7, 2023
    Inventors: Srinath Cheluvaraja, Ananth Nagaraja Iyer, Aravind Ganapathiraju, Felix Immanuel Wyss
  • Patent number: 11468900
    Abstract: A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing a respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of slices, generating a set of candidate acoustic embeddings where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the remaining candidate acoustic embeddings in the set of candidate acoustic embeddings after removing the subset of the candidate acoustic embeddings.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: October 11, 2022
    Assignee: Google LLC
    Inventors: Yeming Fang, Quan Wang, Pedro Jose Moreno Mengibar, Ignacio Lopez Moreno, Gang Feng, Fang Chu, Jin Shi, Jason William Pelecanos
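The aggregation step above (drop a subset of candidate embeddings, average the rest) can be sketched as centroid-based outlier removal. The drop criterion here (distance from the centroid) is an assumed stand-in for whatever criterion the patent uses:

```python
def aggregate_embedding(candidates, n_drop=1):
    """Average the candidate acoustic embeddings after discarding the
    n_drop candidates farthest from the centroid (likely outliers)."""
    dim = len(candidates[0])
    centroid = [sum(c[d] for c in candidates) / len(candidates) for d in range(dim)]
    def sq_dist(c):
        # Squared Euclidean distance to the centroid.
        return sum((c[d] - centroid[d]) ** 2 for d in range(dim))
    kept = sorted(candidates, key=sq_dist)[: len(candidates) - n_drop]
    return [sum(c[d] for c in kept) / len(kept) for d in range(dim)]
```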
  • Patent number: 11380333
    Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
    Type: Grant
    Filed: December 4, 2019
    Date of Patent: July 5, 2022
    Assignee: Verint Systems Inc.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
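The linguistic-labeling pipeline that this family of patents describes (heuristic transcript selection, linguistic model, labeling) can be sketched as follows. Everything here is illustrative: the phrase heuristic, the unigram model, and the threshold are assumptions, not the patented method.

```python
from collections import Counter

# Hypothetical agent-side phrases used as the selection heuristic.
AGENT_PHRASES = ("thank you for calling", "how may i help")

def select_agent_transcripts(transcripts):
    """Heuristic: keep transcripts containing an agent-style phrase."""
    return [t for t in transcripts if any(p in t.lower() for p in AGENT_PHRASES)]

def build_linguistic_model(selected):
    """A simple unigram count model over the selected transcripts."""
    model = Counter()
    for t in selected:
        model.update(t.lower().split())
    return model

def label_utterance(model, utterance, threshold=1.0):
    """Label an utterance as agent speech when its average per-word
    count under the model reaches the threshold."""
    words = utterance.lower().split()
    score = sum(model[w] for w in words) / max(1, len(words))
    return "agent" if score >= threshold else "customer"
```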
  • Patent number: 11373655
    Abstract: An apparatus includes processor(s) to: perform preprocessing operations of a segmentation technique including divide speech data set into data chunks representing chunks of speech audio, use an acoustic model with each data chunk to identify pauses in the speech audio, and analyze a length of time of each identified pause to identify a candidate set of likely sentence pauses in the speech audio; and perform speech-to-text operations including divide the speech data set into data segments that each represent a segment of the speech audio based on the candidate set of likely sentence pauses, use the acoustic model with each data segment to identify likely speech sounds in the speech audio, analyze the identified likely speech sounds to identify candidate sets of words likely spoken in the speech audio, and generate a transcript of the speech data set based at least on the candidate sets of words likely spoken.
    Type: Grant
    Filed: October 12, 2021
    Date of Patent: June 28, 2022
    Assignee: SAS INSTITUTE INC.
    Inventors: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
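The pause-analysis step above amounts to treating sufficiently long pauses as likely sentence boundaries and cutting the audio there. A minimal sketch (the tuple-based interface and the single length threshold are assumptions):

```python
def segment_on_pauses(pauses, min_pause, total_duration):
    """pauses: (start_time, length) for each pause the acoustic model
    found. Pauses at least min_pause seconds long become likely
    sentence boundaries, yielding (start, end) audio segments."""
    cuts = [start for start, length in pauses if length >= min_pause]
    bounds = [0.0] + sorted(cuts) + [total_duration]
    return list(zip(bounds[:-1], bounds[1:]))
```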
  • Patent number: 11367450
    Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
    Type: Grant
    Filed: December 4, 2019
    Date of Patent: June 21, 2022
    Assignee: Verint Systems Inc.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 11344225
    Abstract: A method of determining a value for an apnea-hypopnea index (AHI) for a person, the method comprising: recording a voice track of a person; extracting features from the voice track that characterize the voice track; and processing the features to determine an AHI.
    Type: Grant
    Filed: January 24, 2014
    Date of Patent: May 31, 2022
    Assignees: B. G. NEGEV TECHNOLOGIES AND APPLICATIONS LTD., MOR RESEARCH APPLICATIONS LTD.
    Inventors: Yaniv Zigel, Ariel Tarasiuk, Oren Elisha
  • Patent number: 11322154
    Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcribed customer service interaction.
    Type: Grant
    Filed: December 4, 2019
    Date of Patent: May 3, 2022
    Assignee: Verint Systems Inc.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 11257503
    Abstract: Receiving a raw speech signal from a human speaker; providing an acoustic representation of the raw speech signal if the raw speech signal is determined to be within one of a plurality of pre-defined acoustic domains; augmenting the raw speech signal with the acoustic representation to provide a plurality of augmented speech signals; determining a set of a plurality of Mel frequency cepstral coefficients for each of the plurality of augmented speech signals, wherein each set of the plurality of Mel frequency cepstral coefficients is transformed using domain-dependent transformations to obtain an acoustic reference vector, such that there are a plurality of acoustic reference vectors for each one of the plurality of augmented speech signals; stacking the plurality of acoustic reference vectors corresponding to each augmented speech signal to form a super acoustic reference vector; and processing the super acoustic reference vector through a neural network which has been previously trained on data from a plurality …
    Type: Grant
    Filed: March 10, 2021
    Date of Patent: February 22, 2022
    Inventors: Vikram Ramesh Lakkavalli, Sunderrajan Vivekkumar
  • Patent number: 11227603
    Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
    Type: Grant
    Filed: April 14, 2020
    Date of Patent: January 18, 2022
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 11082510
    Abstract: A method for identifying a push communication pattern includes creating clusters from a communication entity's response buffers. Clusters that meet a first criterion are detected. The communication entity is identified as having a push communication pattern upon a determination that the detected clusters meet a second criterion.
    Type: Grant
    Filed: January 26, 2012
    Date of Patent: August 3, 2021
    Assignee: MICRO FOCUS LLC
    Inventors: Ofer Eliassaf, Amir Kessner, Meidan Zemer, Oded Keret, Moshe Eran Kraus
  • Patent number: 10984790
    Abstract: A speech recognition device is provided. The speech recognition device includes at least one microphone configured to receive a sound signal from a first sound source, and at least one processor configured to determine a direction of the first sound source based on the sound signal, determine whether the direction of the first sound source is in a registered direction, and based on whether the direction of the first sound source is in the registered direction, recognize a speech from the sound signal regardless of whether the sound signal comprises a wake-up keyword.
    Type: Grant
    Filed: November 28, 2018
    Date of Patent: April 20, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hyeon-Taek Lim, Sang-Yoon Kim, Kyung-Min Lee, Chang-Woo Han, Nam-Hoon Kim, Jong-Youb Ryu, Chi-Youn Park, Jae-Won Lee
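The direction check in the abstract above can be sketched as an angular comparison with wrap-around at 0/360 degrees; when the source lies within a registered direction, recognition proceeds without requiring the wake-up keyword. The tolerance parameter and function name are assumptions:

```python
def should_skip_wake_word(source_deg, registered_deg, tolerance_deg=15.0):
    """True when the estimated sound-source direction falls within
    the registered direction's tolerance (handles the 0/360 wrap)."""
    diff = abs((source_deg - registered_deg + 180.0) % 360.0 - 180.0)
    return diff <= tolerance_deg
```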
  • Patent number: 10896624
    Abstract: A computer operable method is described for transforming phonemes, graphemes, and other language structures into interactive elements. The method may comprise receiving a word, wherein the word consists of a group of phonemes; forming a group of graphemes, wherein the group of graphemes is constructed using information relating to the group of phonemes; and forming a group of manipulatives, wherein the group of manipulatives is constructed using information relating to the group of phonemes or the group of graphemes.
    Type: Grant
    Filed: June 18, 2018
    Date of Patent: January 19, 2021
    Assignee: KNOTBIRD LLC
    Inventor: Richard Daniel Telep
  • Patent number: 10726830
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: July 28, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
  • Patent number: 10720164
    Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
    Type: Grant
    Filed: December 4, 2019
    Date of Patent: July 21, 2020
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 10692501
    Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
    Type: Grant
    Filed: October 7, 2019
    Date of Patent: June 23, 2020
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 10692500
    Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: June 23, 2020
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 10650826
    Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
    Type: Grant
    Filed: October 7, 2019
    Date of Patent: May 12, 2020
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 10593332
    Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
    Type: Grant
    Filed: September 11, 2019
    Date of Patent: March 17, 2020
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 10522153
    Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: December 31, 2019
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 10446156
    Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: October 15, 2019
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 10438592
    Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: October 8, 2019
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 10418030
    Abstract: An acoustic model training device includes: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of: generating, based on feature vectors obtained by analyzing utterance data items of a plurality of speakers, a training data item of each speaker by subtracting, for each speaker, a mean vector of all the feature vectors of the speaker from each of the feature vectors of the speaker; generating a training data item of all the speakers by subtracting a mean vector of all the feature vectors of all the speakers from each of the feature vectors of all the speakers; and training an acoustic model using the training data item of each speaker and the training data item of all the speakers.
    Type: Grant
    Filed: May 20, 2016
    Date of Patent: September 17, 2019
    Assignee: MITSUBISHI ELECTRIC CORPORATION
    Inventor: Toshiyuki Hanazawa
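The two normalizations the abstract above describes, per-speaker mean subtraction and global mean subtraction, can be sketched directly (a speaker-level cepstral mean normalization; the dict-of-lists data layout is an assumption):

```python
def per_speaker_normalized(features_by_speaker):
    """Training data item of each speaker: subtract the speaker's own
    mean vector from each of that speaker's feature vectors."""
    out = {}
    for spk, feats in features_by_speaker.items():
        dim = len(feats[0])
        mean = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
        out[spk] = [[f[d] - mean[d] for d in range(dim)] for f in feats]
    return out

def globally_normalized(features_by_speaker):
    """Training data item of all speakers: subtract the mean vector
    computed over every speaker's feature vectors."""
    feats = [f for fs in features_by_speaker.values() for f in fs]
    dim = len(feats[0])
    mean = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in feats]
```

Training on both variants, as the abstract suggests, exposes the acoustic model to features with and without speaker-specific offsets removed.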
  • Patent number: 10210864
    Abstract: Methods and computing systems for enabling a voice command for communication between related devices are described. A training voice command of a user is processed to generate a voice command signature including a content characteristic and a sound characteristic. When the user wishes to transfer an on-going packet data session from a current device to a related device, the user inputs the same voice command. The voice command will be analyzed with the voice command signature to determine a correspondence before being executed.
    Type: Grant
    Filed: December 29, 2016
    Date of Patent: February 19, 2019
    Assignee: T-Mobile USA, Inc.
    Inventors: Yasmin Karimli, Gunjan Nimbavikar
  • Patent number: 10134401
    Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
    Type: Grant
    Filed: November 20, 2013
    Date of Patent: November 20, 2018
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 10134400
    Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
    Type: Grant
    Filed: November 20, 2013
    Date of Patent: November 20, 2018
    Assignee: Verint Systems Ltd.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
  • Patent number: 10044533
    Abstract: A method (200) of bias cancellation for a radio channel sequence includes: receiving (201) a radio signal, the radio signal comprising a radio channel sequence coded by a first signature, the first signature belonging to a set of orthogonal signatures; decoding (202) the radio channel sequence based on the first signature to generate a decoded radio channel sequence; decoding (203) the radio channel sequence based on a second signature, wherein the second signature is orthogonal to the signatures of the set of orthogonal signatures, to generate a bias of the radio channel sequence; and canceling (204) the bias of the radio channel sequence from the decoded radio channel sequence.
    Type: Grant
    Filed: January 12, 2016
    Date of Patent: August 7, 2018
    Assignee: Intel IP Corporation
    Inventors: Thomas Esch, Edgar Bolinth, Markus Jordan, Tobias Scholand, Michael Speth
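The decode/estimate/cancel steps above can be illustrated with a toy code-multiplexing model: decode with the in-use signature, estimate the bias by decoding with a signature orthogonal to the whole signature set, and subtract. This is a simplified sketch under the assumption that the distortion has a component along the unused signature; it is not the patented receiver.

```python
def correlate(rx, signature):
    """Normalized correlation used as the decoder."""
    return sum(r * s for r, s in zip(rx, signature)) / len(signature)

def decode_with_bias_cancellation(rx, signature, orthogonal_signature):
    """Decode with the first signature, estimate the bias via the
    orthogonal (unused) signature, and cancel it from the result."""
    decoded = correlate(rx, signature)
    bias = correlate(rx, orthogonal_signature)
    return decoded - bias
```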
  • Patent number: 9886948
    Abstract: Features are disclosed for improving the robustness of a neural network by using multiple (e.g., two or more) feature streams, combining data from the feature streams, and comparing the combined data to data from a subset of the feature streams (e.g., comparing values from the combined feature stream to values from one of the component feature streams of the combined feature stream). The neural network can include a component or layer that selects the data with the highest value, which can suppress or exclude some or all corrupted data from the combined feature stream. Subsequent layers of the neural network can restrict connections from the combined feature stream to a component feature stream to reduce the possibility that a corrupted combined feature stream will corrupt the component feature stream.
    Type: Grant
    Filed: January 5, 2015
    Date of Patent: February 6, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Sri Venkata Surya Siva Rama Krishna Garimella, Bjorn Hoffmeister
  • Patent number: 9876985
    Abstract: A system, apparatus, and computer program product for monitoring a subject person's environment while the person is isolated from the environment. The system can use a microphone and/or a digital camera or imager to detect and capture sounds, voices, objects, symbols, and faces in the subject person's environment, for example. The captured items can be analyzed, identified, and provided in an events log. The subject person can later review the events log to understand what happened while isolated. In various instances, the subject person can select an event from the log and review the underlying detected sounds, voices, objects, symbols, and faces.
    Type: Grant
    Filed: September 3, 2014
    Date of Patent: January 23, 2018
    Assignee: HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED
    Inventors: Davide Di Censo, Stefan Marti
  • Patent number: 9418679
    Abstract: A method for processing a received set of speech data, wherein the received set of speech data comprises an utterance, is provided. The method executes a process to generate a plurality of confidence scores, wherein each of the plurality of confidence scores is associated with one of a plurality of candidate utterances; determines a plurality of difference values, each of the plurality of difference values comprising a difference between two of the plurality of confidence scores; and compares the plurality of difference values to determine at least one disparity.
    Type: Grant
    Filed: August 12, 2014
    Date of Patent: August 16, 2016
    Assignee: HONEYWELL INTERNATIONAL INC.
    Inventor: Erik T. Nelson
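The difference-value computation above can be sketched as the gaps between successively ranked candidate confidences; a small leading gap is one natural disparity signal for an ambiguous recognition. Function name and ranking choice are assumptions:

```python
def disparities(confidence_scores):
    """Differences between successively ranked candidate-utterance
    confidence scores, highest first."""
    ranked = sorted(confidence_scores, reverse=True)
    return [a - b for a, b in zip(ranked, ranked[1:])]
```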
  • Patent number: 9251783
    Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
    Type: Grant
    Filed: June 17, 2014
    Date of Patent: February 2, 2016
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
  • Patent number: 9137611
    Abstract: In response to a signal failing to exceed an estimated level of noise by more than a predetermined amount for more than a predetermined continuous duration, the estimated level of noise is adjusted according to a first time constant in response to the signal rising and a second time constant in response to the signal falling, so that the estimated level of noise falls more quickly than it rises. In response to the signal exceeding the estimated level of noise by more than the predetermined amount for more than the predetermined continuous duration, a speed of adjusting the estimated level of noise is accelerated.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: September 15, 2015
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Takahiro Unno, Nitish Krishna Murthy
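The asymmetric rise/fall behavior above maps naturally onto a one-pole tracker whose smoothing coefficient depends on the direction of movement. The specific coefficients are illustrative, not from the patent:

```python
def update_noise_estimate(estimate, sample, rise_coeff=0.99, fall_coeff=0.9):
    """One-pole noise-level tracker with asymmetric time constants:
    a coefficient closer to 1 means slower movement, so the estimate
    rises slowly (rise_coeff) and falls quickly (fall_coeff)."""
    coeff = rise_coeff if sample > estimate else fall_coeff
    return coeff * estimate + (1.0 - coeff) * sample
```

Accelerating the adjustment when the signal hugs the estimate for a long stretch, as the abstract describes, would amount to temporarily shrinking both coefficients.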
  • Patent number: 8996368
    Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
    Type: Grant
    Filed: February 22, 2010
    Date of Patent: March 31, 2015
    Assignee: Nuance Communications, Inc.
    Inventor: Daniel Willett
  • Patent number: 8996373
    Abstract: A state detection device includes: a first model generation unit to generate a first specific speaker model obtained by modeling speech features of a specific speaker in an undepressed state; a second model generation unit to generate a second specific speaker model obtained by modeling speech features of the specific speaker in the depressed state; a likelihood calculation unit to calculate a first likelihood as a likelihood of the first specific speaker model with respect to input voice, and a second likelihood as a likelihood of the second specific speaker model with respect to the input voice; and a state determination unit to determine a state of the speaker of the input voice using the first likelihood and the second likelihood.
    Type: Grant
    Filed: October 5, 2011
    Date of Patent: March 31, 2015
    Assignee: Fujitsu Limited
    Inventors: Shoji Hayakawa, Naoshi Matsuo
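    The two-model likelihood comparison above can be reduced to a toy sketch in which each "specific speaker model" is a single Gaussian over one voice feature (say, mean pitch). This is an illustrative simplification, not the patent's actual acoustic modeling, and all names are hypothetical.

    ```python
    import math

    def gaussian_loglik(x, mean, var):
        """Log-likelihood of x under a univariate Gaussian."""
        return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

    def detect_state(features, normal_model, depressed_model):
        """Compare total log-likelihood of the input under each speaker model.

        Each model is a (mean, variance) pair for one scalar voice feature.
        """
        l1 = sum(gaussian_loglik(x, *normal_model) for x in features)
        l2 = sum(gaussian_loglik(x, *depressed_model) for x in features)
        return "normal" if l1 >= l2 else "depressed"
    ```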
  • Publication number: 20150088498
    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker-specific segments and, for each speaker-specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively selecting another warping factor, generating further warped spectral data, comparing the further warped spectral data with the speech model, and, if the other warping factor produces a closer match to the speech model, saving it as the best warping factor for the speaker-specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Application
    Filed: November 26, 2014
    Publication date: March 26, 2015
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
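    The per-segment warping-factor search described above can be sketched as a simple grid search. The piecewise-linear warp, the squared-error match score, the candidate factors, and all names are illustrative assumptions, not details from the application.

    ```python
    def warp_spectrum(spectrum, alpha):
        """Resample the frequency axis by factor alpha (linear interpolation)."""
        n = len(spectrum)
        warped = []
        for i in range(n):
            pos = min(i * alpha, n - 1)
            lo = int(pos)
            hi = min(lo + 1, n - 1)
            frac = pos - lo
            warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
        return warped

    def match_score(spectrum, model):
        """Closer match to the model spectrum -> higher (less negative) score."""
        return -sum((s - m) ** 2 for s, m in zip(spectrum, model))

    def best_warping_factor(spectrum, model, factors):
        """Try each candidate factor, keeping the one that matches best."""
        best_alpha, best_score = None, float("-inf")
        for alpha in factors:
            score = match_score(warp_spectrum(spectrum, alpha), model)
            if score > best_score:
                best_alpha, best_score = alpha, score
        return best_alpha
    ```

    In a real system the score would come from the recognizer's acoustic model rather than a direct spectral distance, but the iterate-compare-keep-best loop is the same shape the abstract describes.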
  • Patent number: 8990080
    Abstract: Techniques to normalize names for name-based speech recognition grammars are described. Some embodiments are particularly directed to techniques to normalize names for name-based speech recognition grammars more efficiently by caching, and on a per-culture basis. A technique may comprise receiving a name for normalization, during name processing for a name-based speech grammar generating process. A normalization cache may be examined to determine if the name is already in the cache in a normalized form. When the name is not already in the cache, the name may be normalized and added to the cache. When the name is in the cache, the normalization result may be retrieved and passed to the next processing step. Other embodiments are described and claimed.
    Type: Grant
    Filed: January 27, 2012
    Date of Patent: March 24, 2015
    Assignee: Microsoft Corporation
    Inventors: Mini Varkey, Bernardo Sana, Victor Boctor, Diego Carlomagno
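    The per-culture caching pattern above is straightforward to sketch. The normalization rule here (lowercasing and stripping punctuation) is a toy stand-in for the real grammar-oriented processing, and the class and method names are hypothetical.

    ```python
    import string

    class NameNormalizer:
        def __init__(self):
            self._cache = {}   # (culture, name) -> normalized form

        def _normalize(self, name):
            """Placeholder normalization: lowercase, strip punctuation."""
            return name.strip().lower().translate(
                str.maketrans("", "", string.punctuation))

        def normalized(self, name, culture="en-US"):
            key = (culture, name)          # cache is keyed per culture
            if key not in self._cache:     # miss: normalize and store
                self._cache[key] = self._normalize(name)
            return self._cache[key]        # hit: reuse the prior result
    ```

    Repeated lookups of the same name in the same culture then pay the normalization cost only once, which is the efficiency gain the abstract claims.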
  • Patent number: 8990086
    Abstract: A recognition confidence measurement method, medium, and system are provided that can more accurately determine whether an input speech signal is an in-vocabulary word, by extracting an optimum number of candidates that match a phoneme string extracted from the input speech signal and estimating a lexical distance between the extracted candidates. A recognition confidence measurement method includes: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string against phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is an in-vocabulary word, based on the lexical distance.
    Type: Grant
    Filed: July 31, 2006
    Date of Patent: March 24, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Sang-Bae Jeong, Nam Hoon Kim, Ick Sang Han, In Jeong Choi, Gil Jin Jang, Jae-Hoon Jeong
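    The lexical-distance idea above might be sketched with plain Levenshtein distance between candidate phoneme strings; the spread threshold and the accept/reject rule are hypothetical simplifications of the patent's method.

    ```python
    def edit_distance(a, b):
        """Levenshtein distance between two phoneme sequences."""
        prev = list(range(len(b) + 1))
        for i, pa in enumerate(a, 1):
            cur = [i]
            for j, pb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                  # deletion
                               cur[j - 1] + 1,               # insertion
                               prev[j - 1] + (pa != pb)))    # substitution
            prev = cur
        return prev[-1]

    def in_vocabulary(candidates, max_spread=2):
        """Accept only if the matched candidates are lexically close to one
        another: a wide spread suggests the utterance matched nothing well."""
        spread = max(edit_distance(a, b)
                     for a in candidates for b in candidates)
        return spread <= max_spread
    ```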
  • Patent number: 8949128
    Abstract: Techniques for providing speech output for speech-enabled applications. A synthesis system receives from a speech-enabled application a text input including a text transcription of a desired speech output. The synthesis system selects one or more audio recordings corresponding to one or more portions of the text input. In one aspect, the synthesis system selects from audio recordings provided by a developer of the speech-enabled application. In another aspect, the synthesis system selects an audio recording of a speaker speaking a plurality of words. The synthesis system forms a speech output including the one or more selected audio recordings and provides the speech output for the speech-enabled application.
    Type: Grant
    Filed: February 12, 2010
    Date of Patent: February 3, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Corinne Bos-Plachez, Martine Marguerite Staessen
  • Patent number: 8942975
    Abstract: Techniques are described herein that suppress noise in a Mel-filtered spectral domain. For example, a window may be applied to a representation of a speech signal in a time domain. The windowed representation in the time domain may be converted to a subsequent representation of the speech signal in the Mel-filtered spectral domain. A noise suppression operation may be performed with respect to the subsequent representation to provide noise-suppressed Mel coefficients.
    Type: Grant
    Filed: March 22, 2011
    Date of Patent: January 27, 2015
    Assignee: Broadcom Corporation
    Inventor: Jonas Borgstrom
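    The Mel-domain noise suppression above can be stripped down to spectral subtraction on filterbank energies. The per-band noise estimate from noise-only frames, the subtraction rule, and the spectral floor are illustrative placeholders, not the patent's actual operation.

    ```python
    def suppress(mel_frames, noise_frames, floor=0.01):
        """Spectral subtraction applied to Mel filterbank energies.

        mel_frames:   list of frames, each a list of Mel-band energies
        noise_frames: noise-only frames used to estimate per-band noise
        floor:        fraction of the original energy kept as a minimum
        """
        bands = len(mel_frames[0])
        noise = [sum(f[b] for f in noise_frames) / len(noise_frames)
                 for b in range(bands)]                 # per-band noise mean
        return [[max(e - noise[b], floor * e)           # subtract, keep a floor
                 for b, e in enumerate(frame)]
                for frame in mel_frames]
    ```

    Operating directly on Mel energies (rather than on the full linear spectrum) is the point of the abstract: the noise-suppressed coefficients feed straight into MFCC-style front-end processing.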
  • Patent number: 8930188
    Abstract: An error concealment method and apparatus for an audio signal and a decoding method and apparatus for an audio signal using the error concealment method and apparatus. The error concealment method includes selecting one of an error concealment in a frequency domain and an error concealment in a time domain as an error concealment scheme for a current frame based on a predetermined criterion when an error occurs in the current frame, selecting one of a repetition scheme and an interpolation scheme in the frequency domain as the error concealment scheme for the current frame based on a predetermined criterion when the error concealment in the frequency domain is selected, and concealing the error of the current frame using the selected scheme.
    Type: Grant
    Filed: July 2, 2013
    Date of Patent: January 6, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Eun-mi Oh, Ki-hyun Choo, Ho-sang Sung, Chang-yong Son, Jung-hoe Kim, Kang eun Lee
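    The repetition-versus-interpolation choice above might be sketched as follows, with a hypothetical stationarity criterion standing in for the patent's predetermined criterion, and spectral coefficients reduced to plain lists.

    ```python
    def conceal(prev_frame, next_frame, max_change=0.5):
        """Return a replacement for a lost frame of spectral coefficients.

        If the neighbors are nearly stationary, repeat the previous frame;
        otherwise interpolate between the two neighbors.
        """
        change = sum(abs(p - n) for p, n in zip(prev_frame, next_frame))
        change /= len(prev_frame)
        if change <= max_change:
            return list(prev_frame)                              # repetition
        return [(p + n) / 2 for p, n in zip(prev_frame, next_frame)]  # interpolation
    ```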
  • Patent number: 8914291
    Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
    Type: Grant
    Filed: September 24, 2013
    Date of Patent: December 16, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Stephen R. Springer
  • Patent number: 8909527
    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker-specific segments and, for each speaker-specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively selecting another warping factor, generating further warped spectral data, comparing the further warped spectral data with the speech model, and, if the other warping factor produces a closer match to the speech model, saving it as the best warping factor for the speaker-specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Grant
    Filed: June 24, 2009
    Date of Patent: December 9, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
  • Patent number: 8838455
    Abstract: A system and method for facilitating user interaction with a voice application. A VoiceXML browser runs locally on a mobile device. Supporting components, such as a Resource Manager, a Call Data Manager, and an MRCP Gateway Client, support operation of the VoiceXML browser. The Resource Manager serves either files stored locally on the mobile device or files accessible via a network connection using the wireless or mobile broadband capabilities of the mobile device. The Call Data Manager communicates call-specific data back to the application's system of origin or another configured target system. The MRCP Gateway Client provides the VoiceXML browser with access to media resources.
    Type: Grant
    Filed: June 13, 2008
    Date of Patent: September 16, 2014
    Assignee: West Corporation
    Inventor: Chad Daniel Fox
  • Patent number: 8825486
    Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
    Type: Grant
    Filed: January 22, 2014
    Date of Patent: September 2, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Stephen R. Springer
  • Publication number: 20140222423
    Abstract: Most speaker recognition systems use i-vectors, which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computation and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification comprise: determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of an orthogonal operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition.
    Type: Application
    Filed: February 7, 2013
    Publication date: August 7, 2014
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Sandro Cumani, Pietro Laface
  • Patent number: 8798994
    Abstract: The present invention discloses a solution for conserving computing resources when implementing transformation-based adaptation techniques. The disclosed solution limits the amount of speech data used by real-time adaptation algorithms to compute a transformation, which results in substantial computational savings. Appreciably, applying a transform is a relatively cheap process in memory and computation compared to the requirements for computing the transform to be applied.
    Type: Grant
    Filed: February 6, 2008
    Date of Patent: August 5, 2014
    Assignee: International Business Machines Corporation
    Inventors: John W. Eckhart, Michael Florio, Radek Hampl, Pavel Krbec, Jonathan Palgon
  • Patent number: 8798992
    Abstract: A signal processing apparatus, system, and software product for audio modification/substitution of background noise generated during an event, including, but not limited to, substituting or partially substituting a noise signal from one or more microphones with a pre-recorded noise, and/or selecting one or more noise signals from a plurality of microphones for further processing in real-time or near-real-time broadcasting.
    Type: Grant
    Filed: May 18, 2011
    Date of Patent: August 5, 2014
    Assignee: Disney Enterprises, Inc.
    Inventors: Michael Gay, Jed Drake, Anthony Bailey