Normalizing Patents (Class 704/234)
-
Patent number: 11948599
Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
Type: Grant
Filed: January 6, 2022
Date of Patent: April 2, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Lihi Ahuva Shiloh Perl, Ben Fishman, Gilad Pundak, Yonit Hoffman
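The post-processing described above (smoothing per-segment scores, then deriving a per-class confidence for each candidate window size) can be sketched as follows. The moving-average smoother, the max-over-segments confidence criterion, and the toy scores are illustrative assumptions, not the claimed method:

```python
def smooth_scores(scores, window):
    """Moving-average smoothing of one class's per-segment scores."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

def window_confidences(scores, window_sizes):
    """Confidence per candidate window size: max of the smoothed scores (assumed criterion)."""
    return {w: max(smooth_scores(scores, w)) for w in window_sizes}

# Toy per-segment scores for a single audio-event class.
conf = window_confidences([0.1, 0.9, 0.8, 0.1], window_sizes=[1, 3])
```

A larger window trades peak sharpness for robustness to single-segment spikes, which is presumably why several candidate window sizes are scored.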
-
Patent number: 11935537
Abstract: Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for selecting a digital assistant from among multiple digital assistants. An embodiment operates by receiving a voice input containing a trigger word at a first voice adapter associated with a digital assistant that generates a first confidence score for the trigger word. The embodiment further receives the voice input at a second voice adapter that generates a second confidence score for the trigger word. The embodiment determines the first confidence score is higher than the second confidence score. The embodiment selects the digital assistant based on the determining.
Type: Grant
Filed: April 19, 2023
Date of Patent: March 19, 2024
Assignee: Roku, Inc.
Inventors: Frank Maker, Andrey Eltsov, Robert Curtis, Gregory Medding
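The selection step in this abstract reduces to an argmax over the adapters' trigger-word confidence scores. A minimal sketch, assuming each voice adapter has already scored the same voice input (the assistant names and scores are hypothetical):

```python
def select_assistant(confidences):
    """Pick the digital assistant whose voice adapter reported the
    highest trigger-word confidence for the shared voice input."""
    return max(confidences, key=confidences.get)

# Hypothetical confidence scores from two voice adapters.
chosen = select_assistant({"assistant_a": 0.42, "assistant_b": 0.87})
```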
-
Patent number: 11694697
Abstract: A system and method are presented for the correction of packet loss in audio in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and acoustic models may be used to replace the noisy signal.
Type: Grant
Filed: June 29, 2020
Date of Patent: July 4, 2023
Inventors: Srinath Cheluvaraja, Ananth Nagaraja Iyer, Aravind Ganapathiraju, Felix Immanuel Wyss
-
Patent number: 11664026
Abstract: Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for selecting a digital assistant from among multiple digital assistants. An embodiment operates by receiving a voice input containing a trigger word at a first voice adapter associated with a digital assistant that generates a first confidence score for the trigger word. The embodiment further receives the voice input at a second voice adapter that generates a second confidence score for the trigger word. The embodiment determines the first confidence score is higher than the second confidence score. The embodiment selects the digital assistant based on the determining.
Type: Grant
Filed: August 26, 2021
Date of Patent: May 30, 2023
Assignee: Roku, Inc.
Inventors: Frank Maker, Andrey Eltsov, Robert Curtis, Gregory Medding
-
Patent number: 11574642
Abstract: A system and method are presented for the correction of packet loss in audio in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and acoustic models may be used to replace the noisy signal.
Type: Grant
Filed: June 29, 2020
Date of Patent: February 7, 2023
Inventors: Srinath Cheluvaraja, Ananth Nagaraja Iyer, Aravind Ganapathiraju, Felix Immanuel Wyss
-
Patent number: 11468900
Abstract: A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing a respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of slices, generating a set of candidate acoustic embeddings where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the remaining candidate acoustic embeddings in the set of candidate acoustic embeddings after removing the subset of the candidate acoustic embeddings.
Type: Grant
Filed: October 15, 2020
Date of Patent: October 11, 2022
Assignee: Google LLC
Inventors: Yeming Fang, Quan Wang, Pedro Jose Moreno Mengibar, Ignacio Lopez Moreno, Gang Feng, Fang Chu, Jin Shi, Jason William Pelecanos
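The abstract does not say which subset of candidate embeddings is removed; one plausible reading is an outlier filter. The sketch below assumes a farthest-from-centroid removal rule and a plain average as the aggregate, both of which are illustrative choices rather than the patented criteria:

```python
def aggregate_embedding(candidates, drop=1):
    """Drop the `drop` candidates farthest from the centroid, average the rest."""
    dim = len(candidates[0])
    centroid = [sum(v[d] for v in candidates) / len(candidates) for d in range(dim)]

    def dist(v):
        # Euclidean distance to the centroid of all candidates.
        return sum((a - b) ** 2 for a, b in zip(v, centroid)) ** 0.5

    kept = sorted(candidates, key=dist)[: len(candidates) - drop]
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]

# Two consistent slices plus one outlier slice (toy 2-D embeddings).
agg = aggregate_embedding([[0.0, 0.0], [0.2, 0.0], [10.0, 10.0]])
```

Removing inconsistent slices before averaging keeps a single noisy slice from dominating the speaker representation.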
-
Patent number: 11380333
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: December 4, 2019
Date of Patent: July 5, 2022
Assignee: Verint Systems Inc.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 11373655
Abstract: An apparatus includes processor(s) to: perform preprocessing operations of a segmentation technique including divide a speech data set into data chunks representing chunks of speech audio, use an acoustic model with each data chunk to identify pauses in the speech audio, and analyze a length of time of each identified pause to identify a candidate set of likely sentence pauses in the speech audio; and perform speech-to-text operations including divide the speech data set into data segments each representing segments of the speech audio based on the candidate set of likely sentence pauses, use the acoustic model with each data segment to identify likely speech sounds in the speech audio, analyze the identified likely speech sounds to identify candidate sets of words likely spoken in the speech audio, and generate a transcript of the speech data set based at least on the candidate sets of words likely spoken.
Type: Grant
Filed: October 12, 2021
Date of Patent: June 28, 2022
Assignee: SAS INSTITUTE INC.
Inventors: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
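The preprocessing stage above (keep only pauses long enough to plausibly be sentence boundaries, then cut the audio at those pauses) can be sketched as follows; the 0.5 s threshold and the `(start, duration)` pause representation are assumptions, not values from the patent:

```python
def likely_sentence_pauses(pauses, min_duration=0.5):
    """Keep pauses long enough to plausibly mark sentence boundaries."""
    return [p for p in pauses if p[1] >= min_duration]

def split_into_segments(total_duration, sentence_pauses):
    """Cut [0, total_duration] at the start of each selected pause."""
    cuts = [start for start, _ in sentence_pauses]
    bounds = [0.0] + cuts + [total_duration]
    return list(zip(bounds[:-1], bounds[1:]))

# Toy (start_seconds, duration_seconds) pauses from an acoustic model.
pauses = likely_sentence_pauses([(1.0, 0.2), (3.0, 0.8)])
segments = split_into_segments(5.0, pauses)
```

Segmenting at likely sentence pauses keeps each recognition unit short without cutting words mid-stream.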
-
Patent number: 11367450
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: December 4, 2019
Date of Patent: June 21, 2022
Assignee: Verint Systems Inc.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 11344225
Abstract: A method of determining a value for an apnea-hypopnea index (AHI) for a person, the method comprising: recording a voice track of a person; extracting features from the voice track that characterize the voice track; and processing the features to determine an AHI.
Type: Grant
Filed: January 24, 2014
Date of Patent: May 31, 2022
Assignees: B. G. NEGEV TECHNOLOGIES AND APPLICATIONS LTD., MOR RESEARCH APPLICATIONS LTD.
Inventors: Yaniv Zigel, Ariel Tarasiuk, Oren Elisha
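The pipeline in this abstract is feature extraction followed by a trained estimator. As a stand-in, the sketch below uses two toy statistics as "features" and a fixed linear model clamped at zero; the real features and estimator are not specified in the abstract, so every name and number here is an assumption:

```python
def extract_features(frames):
    """Toy acoustic features: mean and variance of per-frame energies."""
    n = len(frames)
    mean = sum(frames) / n
    var = sum((x - mean) ** 2 for x in frames) / n
    return [mean, var]

def predict_ahi(features, weights, bias):
    """Linear stand-in for a trained AHI estimator (AHI cannot be negative)."""
    return max(0.0, bias + sum(w * f for w, f in zip(weights, features)))

feats = extract_features([1.0, 3.0])                 # toy frame energies
ahi = predict_ahi(feats, weights=[2.0, 3.0], bias=1.0)  # hypothetical trained weights
```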
-
Patent number: 11322154
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcribed customer service interaction.
Type: Grant
Filed: December 4, 2019
Date of Patent: May 3, 2022
Assignee: Verint Systems Inc.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 11257503
Abstract: Receiving a raw speech signal from a human speaker; providing an acoustic representation of the raw speech signal if the raw speech signal is determined to be within one of a plurality of pre-defined acoustic domains; augmenting the raw speech signal with the acoustic representation to provide a plurality of augmented speech signals; determining a set of a plurality of Mel frequency cepstral coefficients for each of the plurality of augmented speech signals, wherein each set of the plurality of Mel frequency cepstral coefficients is transformed using domain-dependent transformations to obtain an acoustic reference vector, such that there are a plurality of acoustic reference vectors, for each one of the plurality of augmented speech signals; stacking the plurality of acoustic reference vectors corresponding to each augmented speech signal to form a super acoustic reference vector; and processing the super acoustic reference vector through a neural network which has been previously trained on data from a plurality […]
Type: Grant
Filed: March 10, 2021
Date of Patent: February 22, 2022
Inventors: Vikram Ramesh Lakkavalli, Sunderrajan Vivekkumar
-
Patent number: 11227603
Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
Type: Grant
Filed: April 14, 2020
Date of Patent: January 18, 2022
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 11082510
Abstract: A method for identifying a push communication pattern includes creating clusters from a communication entity's response buffers. Clusters that meet a first criterion are detected. The communication entity is identified as having a push communication pattern upon a determination that the detected clusters meet a second criterion.
Type: Grant
Filed: January 26, 2012
Date of Patent: August 3, 2021
Assignee: MICRO FOCUS LLC
Inventors: Ofer Eliassaf, Amir Kessner, Meidan Zemer, Oded Keret, Moshe Eran Kraus
-
Method of providing service based on location of sound source and speech recognition device therefor
Patent number: 10984790
Abstract: A speech recognition device is provided. The speech recognition device includes at least one microphone configured to receive a sound signal from a first sound source, and at least one processor configured to determine a direction of the first sound source based on the sound signal, determine whether the direction of the first sound source is in a registered direction, and based on whether the direction of the first sound source is in the registered direction, recognize a speech from the sound signal regardless of whether the sound signal comprises a wake-up keyword.
Type: Grant
Filed: November 28, 2018
Date of Patent: April 20, 2021
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyeon-Taek Lim, Sang-Yoon Kim, Kyung-Min Lee, Chang-Woo Han, Nam-Hoon Kim, Jong-Youb Ryu, Chi-Youn Park, Jae-Won Lee
-
Patent number: 10896624
Abstract: A computer operable method is described for transforming phonemes, graphemes, and other language structures into interactive elements. The method may comprise: receiving a word, wherein the word consists of a group of phonemes; forming a group of graphemes, wherein the group of graphemes is constructed using information relating to the group of phonemes; and forming a group of manipulatives, wherein the group of manipulatives is constructed using information relating to the group of phonemes or the group of graphemes.
Type: Grant
Filed: June 18, 2018
Date of Patent: January 19, 2021
Assignee: KNOTBIRD LLC
Inventor: Richard Daniel Telep
-
Patent number: 10726830
Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
Type: Grant
Filed: September 27, 2018
Date of Patent: July 28, 2020
Assignee: Amazon Technologies, Inc.
Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
-
Patent number: 10720164
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: December 4, 2019
Date of Patent: July 21, 2020
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10692501
Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
Type: Grant
Filed: October 7, 2019
Date of Patent: June 23, 2020
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10692500
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: September 30, 2019
Date of Patent: June 23, 2020
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10650826
Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
Type: Grant
Filed: October 7, 2019
Date of Patent: May 12, 2020
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10593332
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: September 11, 2019
Date of Patent: March 17, 2020
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10522153
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: October 25, 2018
Date of Patent: December 31, 2019
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10446156
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: October 25, 2018
Date of Patent: October 15, 2019
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10438592
Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
Type: Grant
Filed: October 25, 2018
Date of Patent: October 8, 2019
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10418030
Abstract: An acoustic model training device includes: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of: generating, based on feature vectors obtained by analyzing utterance data items of a plurality of speakers, a training data item of each speaker by subtracting, for each speaker, a mean vector of all the feature vectors of the speaker from each of the feature vectors of the speaker; generating a training data item of all the speakers by subtracting a mean vector of all the feature vectors of all the speakers from each of the feature vectors of all the speakers; and training an acoustic model using the training data item of each speaker and the training data item of all the speakers.
Type: Grant
Filed: May 20, 2016
Date of Patent: September 17, 2019
Assignee: MITSUBISHI ELECTRIC CORPORATION
Inventor: Toshiyuki Hanazawa
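The two normalization passes above are plain mean-vector subtraction, applied once per speaker and once over the pooled data. A minimal sketch with toy 2-D feature vectors:

```python
def mean_vector(vectors):
    """Componentwise mean of a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def subtract_mean(vectors, mean):
    """Remove a mean vector from every feature vector."""
    return [[x - m for x, m in zip(v, mean)] for v in vectors]

speaker_a = [[1.0, 2.0], [3.0, 4.0]]
speaker_b = [[5.0, 6.0]]

# Per-speaker training data: each speaker's own mean removed.
per_speaker = subtract_mean(speaker_a, mean_vector(speaker_a))

# Pooled training data: the global mean removed from everything.
pooled = speaker_a + speaker_b
global_normalized = subtract_mean(pooled, mean_vector(pooled))
```

Training on both variants, as the abstract describes, exposes the model to features normalized at two different scopes.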
-
Patent number: 10210864
Abstract: Methods and computing systems for enabling a voice command for communication between related devices are described. A training voice command of a user is processed to generate a voice command signature including a content characteristic and a sound characteristic. When the user wishes to transfer an on-going packet data session from a current device to a related device, the user inputs the same voice command. The voice command will be analyzed with the voice command signature to determine a correspondence before being executed.
Type: Grant
Filed: December 29, 2016
Date of Patent: February 19, 2019
Assignee: T-Mobile USA, Inc.
Inventors: Yasmin Karimli, Gunjan Nimbavikar
-
Patent number: 10134401
Abstract: Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Type: Grant
Filed: November 20, 2013
Date of Patent: November 20, 2018
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10134400
Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.
Type: Grant
Filed: November 20, 2013
Date of Patent: November 20, 2018
Assignee: Verint Systems Ltd.
Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
-
Patent number: 10044533
Abstract: A method (200) of bias cancellation for a radio channel sequence includes: receiving (201) a radio signal, the radio signal comprising a radio channel sequence coded by a first signature, the first signature belonging to a set of orthogonal signatures; decoding (202) the radio channel sequence based on the first signature to generate a decoded radio channel sequence; decoding (203) the radio channel sequence based on a second signature, wherein the second signature is orthogonal to the signatures of the set of orthogonal signatures, to generate a bias of the radio channel sequence; and canceling (204) the bias of the radio channel sequence from the decoded radio channel sequence.
Type: Grant
Filed: January 12, 2016
Date of Patent: August 7, 2018
Assignee: Intel IP Corporation
Inventors: Thomas Esch, Edgar Bolinth, Markus Jordan, Tobias Scholand, Michael Speth
-
Patent number: 9886948
Abstract: Features are disclosed for improving the robustness of a neural network by using multiple (e.g., two or more) feature streams, combining data from the feature streams, and comparing the combined data to data from a subset of the feature streams (e.g., comparing values from the combined feature stream to values from one of the component feature streams of the combined feature stream). The neural network can include a component or layer that selects the data with the highest value, which can suppress or exclude some or all corrupted data from the combined feature stream. Subsequent layers of the neural network can restrict connections from the combined feature stream to a component feature stream to reduce the possibility that a corrupted combined feature stream will corrupt the component feature stream.
Type: Grant
Filed: January 5, 2015
Date of Patent: February 6, 2018
Assignee: Amazon Technologies, Inc.
Inventors: Sri Venkata Surya Siva Rama Krishna Garimella, Bjorn Hoffmeister
-
Patent number: 9876985
Abstract: A system, apparatus, and computer program product for monitoring a subject person's environment while the person is isolated from the environment. The system can use a microphone and/or a digital camera or imager to detect and capture sounds, voices, objects, symbols, and faces in the subject person's environment, for example. The captured items can be analyzed, identified, and provided in an events log. The subject person can later review the events log to understand what happened while isolated. In various instances, the subject person can select an event from the log and review the underlying detected sounds, voices, objects, symbols, and faces.
Type: Grant
Filed: September 3, 2014
Date of Patent: January 23, 2018
Assignee: HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED
Inventors: Davide Di Censo, Stefan Marti
-
Patent number: 9418679
Abstract: A method for processing a received set of speech data, wherein the received set of speech data comprises an utterance, is provided. The method executes a process to generate a plurality of confidence scores, wherein each of the plurality of confidence scores is associated with one of a plurality of candidate utterances; determines a plurality of difference values, each of the plurality of difference values comprising a difference between two of the plurality of confidence scores; and compares the plurality of difference values to determine at least one disparity.
Type: Grant
Filed: August 12, 2014
Date of Patent: August 16, 2016
Assignee: HONEYWELL INTERNATIONAL INC.
Inventor: Erik T. Nelson
-
Patent number: 9251783
Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
Type: Grant
Filed: June 17, 2014
Date of Patent: February 2, 2016
Assignee: Sony Computer Entertainment Inc.
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
-
Patent number: 9137611
Abstract: In response to a signal failing to exceed an estimated level of noise by more than a predetermined amount for more than a predetermined continuous duration, the estimated level of noise is adjusted according to a first time constant in response to the signal rising and a second time constant in response to the signal falling, so that the estimated level of noise falls more quickly than it rises. In response to the signal exceeding the estimated level of noise by more than the predetermined amount for more than the predetermined continuous duration, a speed of adjusting the estimated level of noise is accelerated.
Type: Grant
Filed: August 24, 2012
Date of Patent: September 15, 2015
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Takahiro Unno, Nitish Krishna Murthy
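The asymmetric smoothing described above is a first-order tracker whose coefficient depends on the direction of the update, so the noise estimate falls faster than it rises. The sketch below covers only that branch (the accelerated adaptation when the signal exceeds the estimate for a sustained duration is omitted), and the coefficients are illustrative:

```python
def track_noise_floor(samples, rise=0.01, fall=0.2, init=0.0):
    """First-order noise-floor tracker: slow upward, fast downward adaptation."""
    estimate = init
    trace = []
    for x in samples:
        # Small coefficient when the sample is above the estimate (slow rise),
        # large coefficient when it is below (fast fall).
        coeff = rise if x > estimate else fall
        estimate += coeff * (x - estimate)
        trace.append(estimate)
    return trace

trace = track_noise_floor([1.0, 0.0])
```

Biasing the tracker downward keeps brief speech bursts from inflating the noise estimate, while dips in level are absorbed quickly.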
-
Patent number: 8996368
Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
Type: Grant
Filed: February 22, 2010
Date of Patent: March 31, 2015
Assignee: Nuance Communications, Inc.
Inventor: Daniel Willett
-
Patent number: 8996373
Abstract: A state detection device includes: a first model generation unit to generate a first specific speaker model obtained by modeling speech features of a specific speaker in an undepressed state; a second model generation unit to generate a second specific speaker model obtained by modeling speech features of the specific speaker in the depressed state; a likelihood calculation unit to calculate a first likelihood as a likelihood of the first specific speaker model with respect to input voice, and a second likelihood as a likelihood of the second specific speaker model with respect to the input voice; and a state determination unit to determine a state of the speaker of the input voice using the first likelihood and the second likelihood.
Type: Grant
Filed: October 5, 2011
Date of Patent: March 31, 2015
Assignee: Fujitsu Limited
Inventors: Shoji Hayakawa, Naoshi Matsuo
-
Publication number: 20150088498
Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
Type: Application
Filed: November 26, 2014
Publication date: March 26, 2015
Inventors: Vincent GOFFIN, Andrej LJOLJE, Murat Saraclar
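The iterative loop in this application is a grid search over warping factors, keeping the factor whose warped spectrum best matches the speech model. The sketch below substitutes a crude multiplicative "warp" and a negative-squared-error "model score" for the real spectral warping and model likelihood, so every detail is an assumption:

```python
def model_score(warped, model):
    """Toy match score: negative squared error against a reference vector."""
    return -sum((w - m) ** 2 for w, m in zip(warped, model))

def best_warping_factor(spectrum, model, factors):
    """Grid search: keep the warping factor whose warped spectrum scores best."""
    best_factor, best = None, float("-inf")
    for alpha in factors:
        warped = [x * alpha for x in spectrum]  # crude stand-in for frequency warping
        s = model_score(warped, model)
        if s > best:
            best_factor, best = alpha, s
    return best_factor

alpha = best_warping_factor([1.0, 2.0], model=[1.1, 2.2], factors=[0.9, 1.0, 1.1])
```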
-
Patent number: 8990080
Abstract: Techniques to normalize names for name-based speech recognition grammars are described. Some embodiments are particularly directed to techniques to normalize names for name-based speech recognition grammars more efficiently by caching, and on a per-culture basis. A technique may comprise receiving a name for normalization, during name processing for a name-based speech grammar generating process. A normalization cache may be examined to determine if the name is already in the cache in a normalized form. When the name is not already in the cache, the name may be normalized and added to the cache. When the name is in the cache, the normalization result may be retrieved and passed to the next processing step. Other embodiments are described and claimed.
Type: Grant. Filed: January 27, 2012. Date of Patent: March 24, 2015. Assignee: Microsoft Corporation. Inventors: Mini Varkey, Bernardo Sana, Victor Boctor, Diego Carlomagno
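The cache-then-normalize flow above is ordinary memoization keyed per culture. A minimal sketch under that assumption (the wrapper name and the `culture` default are hypothetical):

```python
def make_cached_normalizer(normalize_fn):
    """Wrap an expensive name-normalization step with a per-culture cache,
    so a name seen again in the same culture skips normalization."""
    cache = {}
    def normalize(name, culture="en-US"):
        key = (culture, name)
        if key not in cache:               # cache miss: normalize and store
            cache[key] = normalize_fn(name)
        return cache[key]                  # cache hit: reuse prior result
    return normalize
```

Keying on (culture, name) matters because the same written name can normalize differently under different culture rules.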
-
Patent number: 8990086
Abstract: A recognition confidence measurement method, medium, and system are provided that can more accurately determine whether an input speech signal is an in-vocabulary word, by extracting an optimum number of candidates that match a phoneme string extracted from the input speech signal and estimating a lexical distance between the extracted candidates. A recognition confidence measurement method includes: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string against phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is an in-vocabulary word, based on the lexical distance.
Type: Grant. Filed: July 31, 2006. Date of Patent: March 24, 2015. Assignee: Samsung Electronics Co., Ltd. Inventors: Sang-Bae Jeong, Nam Hoon Kim, Ick Sang Han, In Jeong Choi, Gil Jin Jang, Jae-Hoon Jeong
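The abstract does not specify which lexical distance is used; one plausible, commonly used choice is the Levenshtein (edit) distance between the candidates' phoneme strings, sketched here for illustration:

```python
def phoneme_distance(a, b):
    """Levenshtein distance between two phoneme sequences: the minimum
    number of insertions, deletions, and substitutions to turn a into b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # delete all of a's first i phonemes
    for j in range(n + 1):
        d[0][j] = j                       # insert all of b's first j phonemes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[m][n]
```

Intuitively, if the top candidates are lexically far apart yet all matched the input, the recognizer's evidence is ambiguous, which is the kind of signal a confidence measure can exploit.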
-
Patent number: 8949128
Abstract: Techniques for providing speech output for speech-enabled applications. A synthesis system receives from a speech-enabled application a text input including a text transcription of a desired speech output. The synthesis system selects one or more audio recordings corresponding to one or more portions of the text input. In one aspect, the synthesis system selects from audio recordings provided by a developer of the speech-enabled application. In another aspect, the synthesis system selects an audio recording of a speaker speaking a plurality of words. The synthesis system forms a speech output including the one or more selected audio recordings and provides the speech output for the speech-enabled application.
Type: Grant. Filed: February 12, 2010. Date of Patent: February 3, 2015. Assignee: Nuance Communications, Inc. Inventors: Darren C. Meyer, Corinne Bos-Plachez, Martine Marguerite Staessen
-
Patent number: 8942975
Abstract: Techniques are described herein that suppress noise in a Mel-filtered spectral domain. For example, a window may be applied to a representation of a speech signal in a time domain. The windowed representation in the time domain may be converted to a subsequent representation of the speech signal in the Mel-filtered spectral domain. A noise suppression operation may be performed with respect to the subsequent representation to provide noise-suppressed Mel coefficients.
Type: Grant. Filed: March 22, 2011. Date of Patent: January 27, 2015. Assignee: Broadcom Corporation. Inventor: Jonas Borgstrom
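The abstract leaves the noise suppression operation unspecified; a simple illustrative choice (not necessarily the patented one) is spectral subtraction applied to Mel filterbank energies, with a floor to keep the result positive. Names and the floor value are hypothetical.

```python
import numpy as np

def suppress_noise_mel(mel_frames, noise_frames, floor=0.01):
    """Spectral subtraction in a Mel-filtered domain: estimate the noise
    level from noise-only frames, subtract it from each frame's Mel
    energies, and clamp to a small fraction of the original energy so
    no coefficient goes negative."""
    noise_est = noise_frames.mean(axis=0)          # per-band noise estimate
    cleaned = mel_frames - noise_est               # subtract in Mel domain
    return np.maximum(cleaned, floor * mel_frames) # spectral floor
```

Working in the Mel domain rather than the full FFT domain reduces both the number of bands to process and the variance of the noise estimate, since each Mel band averages several FFT bins.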
-
Patent number: 8930188
Abstract: An error concealment method and apparatus for an audio signal, and a decoding method and apparatus for an audio signal using the error concealment method and apparatus, are provided. The error concealment method includes selecting one of an error concealment in a frequency domain and an error concealment in a time domain as an error concealment scheme for a current frame based on predetermined criteria when an error occurs in the current frame, selecting one of a repetition scheme and an interpolation scheme in the frequency domain as the error concealment scheme for the current frame based on predetermined criteria when the error concealment in the frequency domain is selected, and concealing the error of the current frame using the selected scheme.
Type: Grant. Filed: July 2, 2013. Date of Patent: January 6, 2015. Assignee: Samsung Electronics Co., Ltd. Inventors: Eun-mi Oh, Ki-hyun Choo, Ho-sang Sung, Chang-yong Son, Jung-hoe Kim, Kang eun Lee
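The repetition-versus-interpolation choice in the frequency domain can be sketched in a few lines. This is an illustrative stand-in, not the patented selection criteria: here the interpolation scheme is used whenever the frame after the lost one is available, and repetition otherwise.

```python
def conceal_frame(prev_frame, next_frame=None):
    """Conceal a lost frame's frequency-domain coefficients.
    Interpolation scheme: average the neighboring good frames when the
    next frame is available. Repetition scheme: otherwise, repeat the
    previous good frame."""
    if next_frame is not None:
        return [(p + n) / 2.0 for p, n in zip(prev_frame, next_frame)]
    return list(prev_frame)
```

A real decoder would apply frame-dependent criteria (e.g. signal stationarity and how many consecutive frames were lost) when choosing between the schemes.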
-
Patent number: 8914291
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Type: Grant. Filed: September 24, 2013. Date of Patent: December 16, 2014. Assignee: Nuance Communications, Inc. Inventors: Darren C. Meyer, Stephen R. Springer
-
Patent number: 8909527
Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
Type: Grant. Filed: June 24, 2009. Date of Patent: December 9, 2014. Assignee: AT&T Intellectual Property II, L.P. Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
-
Patent number: 8838455
Abstract: A system and method for facilitating user interaction with a voice application are provided. A VoiceXML browser runs locally on a mobile device. Supporting components, such as a Resource Manager, a Call Data Manager, and an MRCP Gateway Client, support operation of the VoiceXML browser. The Resource Manager serves either files stored locally on the mobile device or files accessible via a network connection using the wireless or mobile broadband capabilities of the mobile device. The Call Data Manager communicates call-specific data back to the application's system of origin or another configured target system. The MRCP Gateway Client provides the VoiceXML browser with access to media resources via an MRCP Gateway.
Type: Grant. Filed: June 13, 2008. Date of Patent: September 16, 2014. Assignee: West Corporation. Inventor: Chad Daniel Fox
-
Patent number: 8825486
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Type: Grant. Filed: January 22, 2014. Date of Patent: September 2, 2014. Assignee: Nuance Communications, Inc. Inventors: Darren C. Meyer, Stephen R. Springer
-
Publication number: 20140222423
Abstract: Most speaker recognition systems use i-vectors, which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computations and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification comprise determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of an orthogonal operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition.
Type: Application. Filed: February 7, 2013. Publication date: August 7, 2014. Applicant: NUANCE COMMUNICATIONS, INC. Inventors: Sandro Cumani, Pietro Laface
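The core idea of a shared orthogonal operator can be illustrated with a generic low-rank factorization, not the specific construction claimed above: each tall component matrix T_c is represented as Q @ W_c, where the orthogonal basis Q is computed once and shared, so only Q plus the small per-component coordinate matrices need to be stored. Function and variable names are hypothetical.

```python
import numpy as np

def common_basis_representation(components, rank):
    """Represent each tall component matrix T_c as Q @ W_c with a single
    orthogonal basis Q shared by all components (computed here via an
    SVD of the stacked components). Storing Q once plus the small W_c
    matrices can be cheaper than storing every T_c in full."""
    stacked = np.hstack(components)             # (dim, total component width)
    Q, _, _ = np.linalg.svd(stacked, full_matrices=False)
    Q = Q[:, :rank]                             # shared orthogonal operator
    weights = [Q.T @ T for T in components]     # per-component coordinates
    return Q, weights
```

When the components genuinely share a low-dimensional column space, the factorization is exact; otherwise it is the best rank-limited approximation in the least-squares sense.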
-
Patent number: 8798994
Abstract: The present invention discloses a solution for conserving computing resources when implementing transformation based adaptation techniques. The disclosed solution limits the amount of speech data used by real-time adaptation algorithms to compute a transformation, which results in substantial computational savings. Appreciably, application of a transform is a relatively low memory and computationally cheap process compared to the memory and resource requirements for computing the transform to be applied.
Type: Grant. Filed: February 6, 2008. Date of Patent: August 5, 2014. Assignee: International Business Machines Corporation. Inventors: John W. Eckhart, Michael Florio, Radek Hampl, Pavel Krbec, Jonathan Palgon
-
Patent number: 8798992
Abstract: A signal processing apparatus, system, and software product for audio modification/substitution of a background noise generated during an event, including, but not limited to, substituting or partially substituting a noise signal from one or more microphones with a pre-recorded noise, and/or selecting one or more noise signals from a plurality of microphones for further processing in real-time or near real-time broadcasting.
Type: Grant. Filed: May 18, 2011. Date of Patent: August 5, 2014. Assignee: Disney Enterprises, Inc. Inventors: Michael Gay, Jed Drake, Anthony Bailey