Detect Speech In Noise Patents (Class 704/233)
  • Patent number: 8725508
    Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
    Type: Grant
    Filed: March 27, 2012
    Date of Patent: May 13, 2014
    Assignee: Novospeech
    Inventor: Yossef Ben-Ezra
  • Patent number: 8725506
    Abstract: A speech processing engine is provided that, in some embodiments, employs Kalman filtering with a particular speaker's glottal information to clean up an audio speech signal for more efficient automatic speech recognition.
    Type: Grant
    Filed: June 30, 2010
    Date of Patent: May 13, 2014
    Assignee: Intel Corporation
    Inventors: Willem M. Beltman, Matias Zanartu, Arijit Raychowdhury, Anand P. Rangarajan, Michael E. Deisher
  • Publication number: 20140122068
    Abstract: According to an embodiment, a signal processing apparatus includes an ambient sound estimating unit, a representative component estimating unit, a voice estimating unit, and a filter generating unit. The ambient sound estimating unit is configured to estimate, from the feature, an ambient sound component that is non-stationary among ambient sound components having a feature. The representative component estimating unit is configured to estimate a representative component representing ambient sound components estimated from one or more features for a time period, based on a largest value among the ambient sound components within the time period. The voice estimating unit is configured to estimate, from the feature, a voice component having the feature. The filter generating unit is configured to generate a filter for extracting a voice component and an ambient sound component from the feature, based on the voice component and the representative component.
    Type: Application
    Filed: October 21, 2013
    Publication date: May 1, 2014
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Makoto Hirohata, Masashi Nishiyama
  • Patent number: 8712762
    Abstract: During speech processing, a filter adjusts a spectral envelope of an input speech signal with a frequency dependent adjustment factor. The adjustment factors for respective spectral components are selected dependent on the input speech signal. The factor is set to a first or a second non-zero value, the second value being smaller than the first, when a strength average for the spectral component is above or below a threshold value, respectively.
    Type: Grant
    Filed: July 27, 2007
    Date of Patent: April 29, 2014
    Assignee: Vereniging voor Christelijk Hoger Onderwijs, Wetenschappelijk Onderzoek en Patiëntenzorg
    Inventors: Finn Dubbelboer, Tammo Houtgast
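    A minimal numpy sketch of the two-level, frequency-dependent gain selection the entry above describes; the function name, threshold, and the two non-zero factor values are illustrative assumptions rather than values from the patent.

      import numpy as np

      def adjust_envelope(envelope, strength_avg, threshold=1.0,
                          factor_high=1.0, factor_low=0.3):
          """Scale each spectral component by a factor chosen per component:
          factor_high where the strength average exceeds the threshold, the
          smaller (but still non-zero) factor_low elsewhere."""
          factors = np.where(strength_avg > threshold, factor_high, factor_low)
          return envelope * factors

      # Illustrative use on random data.
      rng = np.random.default_rng(0)
      env = rng.random(256)              # spectral envelope of one frame
      strength = 2.0 * rng.random(256)   # per-component strength averages
      adjusted = adjust_envelope(env, strength)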
  • Patent number: 8712771
    Abstract: The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.
    Type: Grant
    Filed: October 31, 2013
    Date of Patent: April 29, 2014
    Inventor: Alon Konchitsky
  • Patent number: 8712770
    Abstract: The present invention relates to a method, preprocessor, speech recognition system, and program product for extracting a target speech by removing noise. In an embodiment of the invention, target speech is extracted from two input speeches obtained through at least two speech input devices installed in different places in a space. A spectrum subtraction process is applied, using a noise power spectrum estimated by one or both of the two speech input devices and an arbitrary subtraction constant, to obtain a subtracted power spectrum. The invention further applies a gain control based on the two speech input devices to the subtracted power spectrum to obtain a gain-controlled power spectrum, and then applies a flooring process to the gain-controlled power spectrum, on the basis of an arbitrary flooring factor, to obtain a power spectrum for speech recognition.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: April 29, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
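    A short numpy sketch of the subtraction-plus-flooring idea in the entry above, assuming single-channel power spectra; the gain-control step driven by the second microphone is omitted, and the constants alpha (subtraction) and beta (flooring) are illustrative.

      import numpy as np

      def spectral_subtract(noisy_power, noise_power, alpha=2.0, beta=0.01):
          """Subtract a scaled noise-power estimate from the noisy power
          spectrum, then floor the result at a small fraction of the noisy
          power so no bin becomes zero or negative."""
          subtracted = noisy_power - alpha * noise_power   # subtraction step
          floor = beta * noisy_power                       # flooring step
          return np.maximum(subtracted, floor)

      # Illustrative use with synthetic spectra.
      rng = np.random.default_rng(1)
      noisy = rng.random(129) + 0.5        # noisy power spectrum of one frame
      noise = 0.3 * np.ones(129)           # estimated noise power spectrum
      cleaned = spectral_subtract(noisy, noise)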
  • Patent number: 8706482
    Abstract: The present invention provides a voice coder for voice communication that employs a multi-microphone system as part of an improved approach to enhancing signal quality and improving the signal to noise ratio for such voice communications, where there is a special relationship between the position of a first microphone and a second microphone to provide the communication device with certain advantageous physical and acoustic properties. In addition, the communication device can have certain physical characteristics and design features. In a two microphone arrangement, the first microphone is positioned toward the speech source, while the second microphone is positioned so that it provides a voice signal with a significantly lower signal-to-noise ratio (SNR).
    Type: Grant
    Filed: June 10, 2010
    Date of Patent: April 22, 2014
    Assignee: Nth Data Processing L.L.C.
    Inventor: Alon Konchitsky
  • Patent number: 8700406
    Abstract: Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.
    Type: Grant
    Filed: August 19, 2011
    Date of Patent: April 15, 2014
    Assignee: Qualcomm Incorporated
    Inventors: Leonard H. Grokop, Vidya Narayanan, James W. Dolter, Sanjiv Nanda
  • Patent number: 8700398
    Abstract: An interactive user interface is described for setting confidence score thresholds in a language processing system. There is a display of a first system confidence score curve characterizing system recognition performance associated with a high confidence threshold, a first user control for adjusting the high confidence threshold and an associated visual display highlighting a point on the first system confidence score curve representing the selected high confidence threshold, a display of a second system confidence score curve characterizing system recognition performance associated with a low confidence threshold, and a second user control for adjusting the low confidence threshold and an associated visual display highlighting a point on the second system confidence score curve representing the selected low confidence threshold. The operation of the second user control is constrained to require that the low confidence threshold must be less than or equal to the high confidence threshold.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: April 15, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Jeffrey N. Marcus, Amy E. Ulug, William Bridges Smith, Jr.
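    A small Python sketch of the constraint the entry above places on the two controls (the low confidence threshold may never exceed the high one); the class, method names, and 0-to-1 score scale are illustrative assumptions.

      class ConfidenceThresholds:
          """Keeps a low and a high confidence threshold, re-applying the
          constraint low <= high whenever either value is changed."""

          def __init__(self, low=0.3, high=0.7):
              self.low, self.high = low, high
              self._enforce()

          def set_low(self, value):
              self.low = value
              self._enforce()

          def set_high(self, value):
              self.high = value
              self._enforce()

          def _enforce(self):
              # The low threshold is clamped so it never exceeds the high one.
              self.low = min(self.low, self.high)

      thresholds = ConfidenceThresholds()
      thresholds.set_low(0.9)   # clamped back to 0.7 so low <= high still holds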
  • Patent number: 8700394
    Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.
    Type: Grant
    Filed: March 24, 2010
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, Kaustubh Prakash Kalgaonkar, Alejandro Acero
  • Patent number: 8700390
    Abstract: A Voice Activity Detection/Silence Suppression (VAD/SS) system is connected to a channel of a transmission pipe. The channel provides a pathway for the transmission of energy. A method for operating a VAD/SS system includes detecting the energy on the channel, and activating or suppressing activation of the VAD/SS system depending upon the nature of the energy detected on the channel.
    Type: Grant
    Filed: October 7, 2013
    Date of Patent: April 15, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Bing Chen, James H. James
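    A minimal numpy sketch of gating a VAD/SS system on the energy detected on the channel, as the entry above describes; the frame length, sample rate, and energy threshold are illustrative assumptions.

      import numpy as np

      def frame_energies(signal, frame_len=160):
          """Mean-square energy of consecutive frames (e.g. 20 ms at 8 kHz)."""
          n = len(signal) // frame_len
          frames = signal[: n * frame_len].reshape(n, frame_len)
          return np.mean(frames ** 2, axis=1)

      def vad_active(signal, energy_threshold=1e-3):
          """Activate VAD/SS only for frames whose channel energy rises above
          the floor; suppress activation otherwise."""
          return frame_energies(signal) > energy_threshold

      rng = np.random.default_rng(2)
      channel = np.concatenate([0.001 * rng.standard_normal(800),  # near-silent
                                0.5 * rng.standard_normal(800)])   # energetic
      decisions = vad_active(channel)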
  • Patent number: 8699721
    Abstract: Systems and methods are described by which microphones comprising a mechanical filter can be accurately calibrated to each other in both amplitude and phase.
    Type: Grant
    Filed: June 29, 2010
    Date of Patent: April 15, 2014
    Assignee: AliphCom
    Inventor: Gregory C. Burnett
  • Patent number: 8693703
    Abstract: A method of combining at least two audio signals for generating an enhanced system output signal is described.
    Type: Grant
    Filed: May 2, 2008
    Date of Patent: April 8, 2014
    Assignee: GN Netcom A/S
    Inventor: Martin Rung
  • Patent number: 8694314
    Abstract: In a voice authentication apparatus, a characteristics analyzer analyzes characteristics of a sample noise which is generated around a subject while the subject generates a sample voice for authentication of the subject. A setting part sets a correction value according to the characteristics of the sample noise analyzed by the characteristics analyzer. A correction part corrects an index value, which indicates a degree of similarity between a feature quantity of a reference voice which has been previously registered and a feature quantity of the sample voice obtained from the subject, based on the set correction value. A determinator determines authenticity of the subject by comparing the corrected index value with a predetermined threshold value.
    Type: Grant
    Filed: September 5, 2007
    Date of Patent: April 8, 2014
    Assignee: Yamaha Corporation
    Inventors: Yasuo Yoshioka, Takehiko Kawahara
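    A toy Python sketch of the score-correction step in the entry above: a correction derived from the sample-noise characteristics (reduced here to a single noise level) is applied to the similarity index before the threshold comparison. The linear correction and every constant are illustrative assumptions.

      def authenticate(similarity, noise_level, threshold=0.8):
          """Correct the similarity index according to how noisy the sample
          was, then compare the corrected value with a fixed threshold."""
          correction = 0.1 * noise_level     # noisier sample -> more leniency
          corrected = similarity + correction
          return corrected >= threshold

      print(authenticate(similarity=0.75, noise_level=0.6))  # True: 0.81 >= 0.8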
  • Patent number: 8692862
    Abstract: A method is provided in one embodiment and includes establishing a communication session involving a first endpoint and a second endpoint that are associated with a video conference in a network environment. The first endpoint is associated with a first identifier and the second endpoint is associated with a second identifier. The method also includes evaluating first audio data for the first endpoint, and determining a vocative parameter associated with the first audio data, where image data can be rendered on a user interface at the first endpoint based on the detection of the vocative parameter.
    Type: Grant
    Filed: February 28, 2011
    Date of Patent: April 8, 2014
    Assignee: Cisco Technology, Inc.
    Inventor: Sylvia Olayinka Aya Manfa N'guessan
  • Publication number: 20140095157
    Abstract: At least one exemplary embodiment is directed to a method and device for voice operated control with learning. The method can include measuring a first sound received from a first microphone, measuring a second sound received from a second microphone, detecting a spoken voice based on an analysis of measurements taken at the first and second microphone, learning from the analysis when the user is speaking and a speaking level in noisy environments, training a decision unit from the learning to be robust to a detection of the spoken voice in the noisy environments, mixing the first sound and the second sound to produce a mixed signal, and controlling the production of the mixed signal based on the learning of one or more aspects of the spoken voice and ambient sounds in the noisy environments.
    Type: Application
    Filed: December 3, 2013
    Publication date: April 3, 2014
    Applicant: Personics Holdings, Inc.
    Inventors: John Usher, Steve Goldstein, Marc Boillot
  • Patent number: 8689105
    Abstract: The present invention is directed to a system and method for monitoring perceived quality of a packet-switched voice service in a network. The method includes the step of receiving a packetized voice communication via the packet-switched voice service. At least one objective measurement is obtained from the received packetized voice communication. User perceived quality of voice data is derived from the at least one objective measurement. The user perceived quality of voice data is provided to a user. The steps of receiving, obtaining, deriving, and providing are performed in real-time.
    Type: Grant
    Filed: December 31, 2008
    Date of Patent: April 1, 2014
    Assignee: Tekla Pehr LLC
    Inventors: William Christopher Hardy, Frank A. McKiel, Jr.
  • Patent number: 8682658
    Abstract: The equipment comprises two microphones, sampling means, and de-noising means. The de-noising means are non-frequency noise reduction means comprising a combiner having an adaptive filter performing an iterative search seeking to cancel the noise picked up by one of the microphones on the basis of a noise reference given by the other microphone sensor. The adaptive filter is a fractional delay filter modeling a delay that is shorter than the sampling period. The equipment also has voice activity detector means delivering a signal representative of the presence or the absence of speech from the user of the equipment. The adaptive filter receives this signal as input so as to enable it to act selectively: i) either to perform an adaptive search for the parameters of the filter in the absence of speech; ii) or else to “freeze” those parameters of the filter in the presence of speech.
    Type: Grant
    Filed: May 18, 2012
    Date of Patent: March 25, 2014
    Assignee: Parrot
    Inventors: Guillaume Vitte, Michael Herve
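    A compact numpy sketch of the adapt-or-freeze behaviour in the entry above, using a plain integer-delay NLMS canceller rather than the patent's fractional-delay filter; the tap count, step size, and voice-activity flags are illustrative assumptions.

      import numpy as np

      def nlms_cancel(primary, noise_ref, speech_flags, taps=8, mu=0.1, eps=1e-8):
          """Sample-by-sample noise canceller: the weights adapt only when the
          voice-activity flag reports no speech, and are frozen otherwise."""
          w = np.zeros(taps)
          out = np.zeros_like(primary)
          for n in range(taps, len(primary)):
              x = noise_ref[n - taps:n][::-1]     # recent reference samples
              e = primary[n] - w @ x              # error = de-noised sample
              out[n] = e
              if not speech_flags[n]:             # adapt only in noise-only parts
                  w += mu * e * x / (x @ x + eps)
          return out

      rng = np.random.default_rng(3)
      ref = rng.standard_normal(2000)
      prim = 0.7 * np.roll(ref, 2) + 0.01 * rng.standard_normal(2000)
      flags = np.zeros(2000, dtype=bool)          # assume no speech anywhere
      cleaned = nlms_cancel(prim, ref, flags)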
  • Patent number: 8682662
    Abstract: In accordance with an example embodiment of the invention, there is provided an apparatus for detecting voice activity in an audio signal. The apparatus comprises a first voice activity detector for making a first voice activity detection decision based at least in part on the voice activity of a first audio signal received from a first microphone. The apparatus also comprises a second voice activity detector for making a second voice activity detection decision based at least in part on an estimate of a direction of the first audio signal and an estimate of a direction of a second audio signal received from a second microphone. The apparatus further comprises a classifier for making a third voice activity detection decision based at least in part on the first and second voice activity detection decisions.
    Type: Grant
    Filed: August 13, 2012
    Date of Patent: March 25, 2014
    Assignee: Nokia Corporation
    Inventors: Riitta Elina Niemistö, Päivi Marianna Valve
  • Patent number: 8682654
    Abstract: Disclosed are systems, methods, and computer readable media having programs for classifying sports video. In one embodiment, a method includes: extracting, from an audio stream of a video clip, a plurality of key audio components contained therein; and classifying, using at least one of the plurality of key audio components, a sport type contained in the video clip. In one embodiment, a computer readable medium having a computer program for classifying sports video includes: logic configured to extract a plurality of key audio components from a video clip; and logic configured to classify a sport type corresponding to the video clip.
    Type: Grant
    Filed: April 25, 2006
    Date of Patent: March 25, 2014
    Assignee: Cyberlink Corp.
    Inventors: Ming-Jun Chen, Jiun-Fu Chen, Shih-Min Tang, Ho-Chao Huang
  • Patent number: 8676581
    Abstract: Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination.
    Type: Grant
    Filed: January 22, 2010
    Date of Patent: March 18, 2014
    Assignee: Microsoft Corporation
    Inventors: Jason Flaks, Dax Hawkins, Christian Klein, Mitchell Stephen Dernis, Tommer Leyvand, Ali M. Vassigh, Duncan McKay
  • Publication number: 20140074464
    Abstract: Some embodiments of the inventive subject matter may include a method for detecting speech loss and supplying appropriate recollection data to the user. The method can include detecting a speech stream from a user. The method can include converting the speech stream to text. The method can include storing the text. The method can include detecting an interruption to the speech stream, wherein the interruption to the speech stream indicates speech loss by the user. The method can include searching a catalog using the text as a search parameter to find relevant catalog data. The method can include presenting the relevant catalog data to remind the user about the speech stream.
    Type: Application
    Filed: September 12, 2012
    Publication date: March 13, 2014
    Applicant: International Business Machines Corporation
    Inventor: Scott H. Berens
  • Publication number: 20140067387
    Abstract: Scalar operations for model adaptation or feature enhancement may be utilized for recognizing an utterance during automatic speech recognition in a noisy environment. An utterance including distorted speech generated from a transmission source for delivery to a receiver, may be received by a computer. The distorted speech may be caused by the noisy environment and channel distortion. Computations using scalar operations in the form of an algorithm may then be performed for recognizing the utterance. As a result of performing all of the computations with scalar operations, computational complexity is very small in comparison to matrix and vector operations. Vector Taylor Series with diagonal Jacobian approximation may also be utilized as a distortion-model-based noise robust algorithm with scalar operations.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: Microsoft Corporation
    Inventors: Jinyu Li, Michael Lewis Seltzer, Yifan Gong
  • Publication number: 20140067388
    Abstract: A method and a system for robust voice activity detection under adverse environments are provided. The apparatus includes a controller for controlling a signal receiving module, a signal blocking module, a silent/non-silent classification module for discriminating silent blocks by comparing a temporal feature to a threshold, a total variation filtering module for enhancing voiced portions and reducing an effect of background noises, a frame division module for dividing a filtered signal into small frames, a residual processing module for estimating a noise floor, a silent/non-silent frame classification module, a voice/non-voice signal frame classification module based on autocorrelation features of a total variation filtered signal, a binary-flag merging and deletion module, a voice endpoint detection and correction module, and a voice endpoint storing/sending module. A decision-tree is arranged based on time and memory complexity of feature extraction methods.
    Type: Application
    Filed: September 4, 2013
    Publication date: March 6, 2014
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: M. Sabarimalai Manikandan, Saurabh Tyagi
  • Patent number: 8666738
    Abstract: A biometric-sensor assembly, e.g., for acoustic reflectometry of the vocal tract. In one embodiment, the sensor assembly includes a dental appliance for in-mouth mounting, an acoustic sensor attached to the dental appliance, and an optional headset. The dental appliance enables secure placement of the acoustic sensor in the mouth of the user, e.g., for tracking movements of the tongue and/or other internal articulators of the vocal tract. A boom arm of the headset can be used for holding an additional acoustic sensor and/or a miniature video camera, e.g., for tracking movements of the lips.
    Type: Grant
    Filed: May 24, 2011
    Date of Patent: March 4, 2014
    Assignee: Alcatel Lucent
    Inventor: Lothar Benedikt Moeller
  • Patent number: 8666737
    Abstract: A noise power estimation system for estimating noise power of each frequency spectral component includes a cumulative histogram generating section for generating a cumulative histogram for each frequency spectral component of a time series signal, in which the horizontal axis indicates index of power level and the vertical axis indicates cumulative frequency and which is weighted by exponential moving average; and a noise power estimation section for determining an estimated value of noise power for each frequency spectral component of the time series signal based on the cumulative histogram.
    Type: Grant
    Filed: September 14, 2011
    Date of Patent: March 4, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Hirofumi Nakajima, Kazuhiro Nakadai, Yuji Hasegawa
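    A numpy sketch of the exponentially weighted, per-bin cumulative histogram described in the entry above; the level grid, forgetting factor, and the quantile read off the cumulative distribution are illustrative assumptions.

      import numpy as np

      class HistogramNoiseEstimator:
          """Keeps an exponentially weighted histogram of power levels for each
          frequency bin and reads the noise estimate off its cumulative
          distribution."""

          def __init__(self, n_bins, levels_db=np.arange(-80.0, 21.0, 1.0), alpha=0.98):
              self.levels_db = levels_db
              self.alpha = alpha                  # exponential forgetting factor
              self.hist = np.zeros((n_bins, len(levels_db)))

          def update(self, power_spectrum):
              levels = 10 * np.log10(power_spectrum + 1e-12)
              idx = np.clip(np.digitize(levels, self.levels_db) - 1,
                            0, len(self.levels_db) - 1)
              self.hist *= self.alpha             # decay old counts
              self.hist[np.arange(len(idx)), idx] += 1 - self.alpha

          def estimate(self, quantile=0.5):
              cum = np.cumsum(self.hist, axis=1)
              cum /= cum[:, -1:] + 1e-12          # normalise to a CDF per bin
              idx = np.argmax(cum >= quantile, axis=1)
              return 10 ** (self.levels_db[idx] / 10)   # back to linear power

      rng = np.random.default_rng(5)
      est = HistogramNoiseEstimator(n_bins=129)
      for _ in range(50):
          est.update(np.abs(rng.standard_normal(129)) ** 2)
      noise_floor = est.estimate()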
  • Patent number: 8666736
    Abstract: The present invention relates to a method for signal processing comprising the steps of providing a set of prototype spectral envelopes, providing a set of reference noise prototypes, wherein the reference noise prototypes are obtained from at least a sub-set of the provided set of prototype spectral envelopes, detecting a verbal utterance by at least one microphone to obtain a microphone signal, processing the microphone signal for noise reduction based on the provided reference noise prototypes to obtain an enhanced signal and encoding the enhanced signal based on the provided prototype spectral envelopes to obtain an encoded enhanced signal.
    Type: Grant
    Filed: August 7, 2009
    Date of Patent: March 4, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Tim Haulick, Mohamed Krini, Shreyas Paranjpe, Gerhard Schmidt
  • Patent number: 8666740
    Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
    Type: Grant
    Filed: June 22, 2012
    Date of Patent: March 4, 2014
    Assignee: Google Inc.
    Inventors: Matthew I. Lloyd, Trausti T. Kristjansson
  • Publication number: 20140058726
    Abstract: The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.
    Type: Application
    Filed: October 31, 2013
    Publication date: February 27, 2014
    Inventor: Alon Konchitsky
  • Patent number: 8660847
    Abstract: A system for integrating local speech recognition with cloud-based speech recognition in order to provide an efficient natural user interface is described. In some embodiments, a computing device determines a direction associated with a particular person within an environment and generates an audio recording associated with the direction. The computing device then performs local speech recognition on the audio recording in order to detect a first utterance spoken by the particular person and to detect one or more keywords within the first utterance. The first utterance may be detected by applying voice activity detection techniques to the audio recording. The first utterance and the one or more keywords are subsequently transferred to a server which may identify speech sounds within the first utterance associated with the one or more keywords and adapt one or more speech recognition techniques based on the identified speech sounds.
    Type: Grant
    Filed: September 2, 2011
    Date of Patent: February 25, 2014
    Assignee: Microsoft Corporation
    Inventors: Thomas M. Soemo, Leo Soong, Michael H. Kim, Chad R. Heinemann, Dax H. Hawkins
  • Patent number: 8660841
    Abstract: Apparatus for isolation of a media stream of a first modality from a complex media source having at least two media modalities, multiple objects, and events, comprises: recording devices for the different modalities; an associator for associating between events recorded in said first modality and events recorded in said second modality, and providing an association output; and an isolator that uses the association output for isolating those events in the first mode correlating with events in the second mode associated with a predetermined object, thereby to isolate a media stream associated with said predetermined object. Thus it is possible to identify events such as hand or mouth movements, associate these with sounds, and then produce a filtered track of only those sounds associated with the events. In this way a particular speaker or musical instrument can be isolated from a complex scene.
    Type: Grant
    Filed: April 6, 2008
    Date of Patent: February 25, 2014
    Assignee: Technion Research & Development Foundation Limited
    Inventors: Zohar Barzelay, Yoav Yosef Schechner
  • Patent number: 8660842
    Abstract: A speech recognition device uses visual information to narrow down the range of likely adaptation parameters even before a speaker makes an utterance. Images of the speaker and/or the environment are collected using an image capturing device, and then processed to extract biometric features and environmental features. The extracted biometric features and environmental features are then used to estimate adaptation parameters. A voice sample may also be collected to refine the adaptation parameters for more accurate speech recognition.
    Type: Grant
    Filed: March 9, 2010
    Date of Patent: February 25, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventor: Antoine R. Raux
  • Patent number: 8655659
    Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: February 18, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventors: Qingfang Wang, Shouchun He
  • Patent number: 8655657
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving (i) audio data that encodes a spoken natural language query, and (ii) environmental audio data, obtaining a transcription of the spoken natural language query, determining a particular content type associated with one or more keywords in the transcription, providing at least a portion of the environmental audio data to a content recognition engine, and identifying a content item that has been output by the content recognition engine, and that matches the particular content type.
    Type: Grant
    Filed: February 15, 2013
    Date of Patent: February 18, 2014
    Assignee: Google Inc.
    Inventors: Matthew Sharifi, Gheorghe Postelnicu
  • Patent number: 8655650
    Abstract: A method is provided for decoding data streams in a voice communication system. The method includes: receiving two or more data streams having voice data encoded therein; decoding each data stream into a set of speech coding parameters; forming a set of combined speech coding parameters by combining the sets of decoded speech coding parameters, where speech coding parameters of a given type are combined with speech coding parameters of the same type; and inputting the set of combined speech coding parameters into a speech synthesizer.
    Type: Grant
    Filed: March 28, 2007
    Date of Patent: February 18, 2014
    Assignee: Harris Corporation
    Inventor: Mark W. Chamberlain
  • Patent number: 8650029
    Abstract: A voice activity detection (VAD) module analyzes a media file, such as an audio file or a video file, to determine whether one or more frames of the media file include speech. A speech recognizer generates feedback relating to an accuracy of the VAD determination. The VAD module leverages the feedback to improve subsequent VAD determinations. The VAD module also utilizes a look-ahead window associated with the media file to adjust estimated probabilities or VAD decisions for previously processed frames.
    Type: Grant
    Filed: February 25, 2011
    Date of Patent: February 11, 2014
    Assignee: Microsoft Corporation
    Inventors: Albert Joseph Kishan Thambiratnam, Weiwu Zhu, Frank Torsten Bernd Seide
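    A small numpy sketch of revising earlier VAD outputs with a look-ahead window, as the entry above mentions; the simple windowed average stands in for the patent's actual update rule, and the window length is an illustrative assumption.

      import numpy as np

      def smooth_with_lookahead(frame_probs, lookahead=5):
          """Replace each frame's speech probability with the mean over the
          frame itself plus up to `lookahead` future frames."""
          probs = np.asarray(frame_probs, dtype=float)
          out = np.empty_like(probs)
          for i in range(len(probs)):
              out[i] = probs[i:i + lookahead + 1].mean()   # current + future
          return out

      raw = [0.1, 0.2, 0.9, 0.95, 0.9, 0.2, 0.1]
      print(smooth_with_lookahead(raw, lookahead=2))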
  • Publication number: 20140039886
    Abstract: A Voice Activity Detection/Silence Suppression (VAD/SS) system is connected to a channel of a transmission pipe. The channel provides a pathway for the transmission of energy. A method for operating a VAD/SS system includes detecting the energy on the channel, and activating or suppressing activation of the VAD/SS system depending upon the nature of the energy detected on the channel.
    Type: Application
    Filed: October 7, 2013
    Publication date: February 6, 2014
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Bing Chen, James H. James
  • Patent number: 8645131
    Abstract: The disclosure describes a speech detection system for detecting one or more desired speech segments in an audio stream. The speech detection system includes an audio stream input and a speech detection technique. The speech detection technique may be performed in various ways, such as using pattern matching and/or signal processing. The pattern matching implementation may extract features representing types of sounds such as phrases, words, syllables, phonemes, and so on. The signal processing implementation may extract spectrally-localized frequency-based features, amplitude-based features, and combinations of the frequency-based and amplitude-based features. Metrics may be obtained and used to determine a desired word in the audio stream. In addition, a keypad stream having keypad entries may be used in determining the desired word.
    Type: Grant
    Filed: October 16, 2009
    Date of Patent: February 4, 2014
    Inventors: Ashwin P. Rao, Gregory M. Aronov, Marat V. Garafutdinov
  • Patent number: 8645133
    Abstract: Encoding audio signals by selecting an encoding mode for encoding the signal, categorizing the signal into active segments having voice activity and non-active segments having substantially no voice activity using categorization parameters that depend on the selected encoding mode, and encoding at least the active segments using the selected encoding mode.
    Type: Grant
    Filed: February 7, 2013
    Date of Patent: February 4, 2014
    Assignee: Core Wireless Licensing S.a.r.l.
    Inventors: Kari Järvinen, Pasi Ojala, Ari Lakaniemi
  • Patent number: 8645132
    Abstract: Embodiments of the present invention improve content manipulation systems and methods using speech recognition. In one embodiment, the present invention includes a method comprising configuring a recognizer to recognize utterances in the presence of a background audio signal having particular audio characteristics. A composite signal comprising a first audio signal and a spoken utterance of a user is received by the recognizer, where the first audio signal comprises the particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal. The spoken utterance is recognized in the presence of the first audio signal when the spoken utterance is one of the predetermined utterances. An operation is performed on the first audio signal.
    Type: Grant
    Filed: August 24, 2011
    Date of Patent: February 4, 2014
    Assignee: Sensory, Inc.
    Inventors: Todd F. Mozer, Jeff Rogers, Pieter J. Vermeulen, Jonathan Shaw
  • Patent number: 8639502
    Abstract: A speech enhancement method (and concomitant computer-readable medium comprising computer software encoded thereon) comprising receiving samples of a user's speech, determining mel-frequency cepstral coefficients of the samples, constructing a Gaussian mixture model of the coefficients, receiving speech from a noisy environment, determining mel-frequency cepstral coefficients of the noisy speech, estimating mel-frequency cepstral coefficients of clean speech from the mel-frequency cepstral coefficients of the noisy speech and from the Gaussian mixture model, and outputting a time-domain waveform of enhanced speech computed from the estimated mel-frequency cepstral coefficients.
    Type: Grant
    Filed: February 16, 2010
    Date of Patent: January 28, 2014
    Assignee: Arrowhead Center, Inc.
    Inventors: Laura E. Boucheron, Phillip L. De Leon
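    A brief Python sketch of the model-building half of the entry above: computing mel-frequency cepstral coefficients from a user's speech samples and fitting a Gaussian mixture model to them. The MMSE estimation of clean MFCCs from noisy speech and the waveform resynthesis are not shown, and the library choices (librosa, scikit-learn) and parameter values are illustrative assumptions.

      import numpy as np
      import librosa
      from sklearn.mixture import GaussianMixture

      def train_clean_speech_gmm(samples, sr=16000, n_mfcc=13, n_components=8):
          """Extract MFCCs from the user's speech and fit a diagonal-covariance
          Gaussian mixture model to the per-frame coefficient vectors."""
          mfcc = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
          gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
          gmm.fit(mfcc.T)                    # one observation per frame
          return gmm

      # Illustrative use with a synthetic one-second "speech" signal.
      rng = np.random.default_rng(4)
      speech = rng.standard_normal(16000).astype(np.float32)
      model = train_clean_speech_gmm(speech)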
  • Patent number: 8639508
    Abstract: A method of automatic speech recognition includes receiving an utterance from a user via a microphone that converts the utterance into a speech signal, pre-processing the speech signal using a processor to extract acoustic data from the received speech signal, and identifying at least one user-specific characteristic in response to the extracted acoustic data. The method also includes determining a user-specific confidence threshold responsive to the at least one user-specific characteristic, and using the user-specific confidence threshold to recognize the utterance received from the user and/or to assess confusability of the utterance with stored vocabulary.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: January 28, 2014
    Assignee: General Motors LLC
    Inventors: Xufang Zhao, Gaurav Talwar
  • Publication number: 20140012573
    Abstract: A signal processing apparatus includes a speech recognition system and a voice activity detection unit. The voice activity detection unit is coupled to the speech recognition system, and arranged for detecting whether an audio signal is a voice signal and accordingly generating a voice activity detection result to the speech recognition system to control whether the speech recognition system should perform speech recognition upon the audio signal.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 9, 2014
    Inventors: Chia-Yu Hung, Tsung-Li Yeh, Yi-Chang Tu
  • Patent number: 8626514
    Abstract: A method, article of manufacture, and apparatus for presenting a plurality of auditory communications having associated data representing at least one term identified in the communication is disclosed. In an embodiment, this comprises providing a visual representation of the term, providing links associated with occurrences of the term in the plurality of communications, and when a link is selected, playing a portion of the communication corresponding to the occurrence of the term. A portion of the communication following the portion corresponding to the occurrence of the term may be played.
    Type: Grant
    Filed: October 1, 2004
    Date of Patent: January 7, 2014
    Assignee: EMC Corporation
    Inventors: Christopher Hercules Claudatos, William Dale Andruss
  • Publication number: 20140006019
    Abstract: A method for estimating background noise of an audio signal comprises detecting voice activity in one or more frames of the audio signal based on one or more first conditions. The method also comprises estimating a first background noise estimation if voice activity is not detected based on the one or more first conditions. Voice activity in the one or more frames of the audio signal based on one or more second conditions is detected. A second background noise estimation is estimated if voice activity is not detected based on the one or more second conditions. The voice activity is detected in the one or more frames less often based on the one or more first conditions than based on the one or more second conditions.
    Type: Application
    Filed: March 18, 2011
    Publication date: January 2, 2014
    Applicant: Nokia Corporation
    Inventors: Erkki Juhani Paajanen, Riitta Elina Niemistö
  • Patent number: 8620650
    Abstract: A system for combining signals includes a first microphone generating a first input signal having a first voice component and a first noise component, a second microphone generating a second input signal having a second voice component and a second noise component, a mixing circuit, and an adaptive filter. The mixing circuit applies a first gain having a value α to the first input signal to produce a first scaled signal, applies a second gain having a value 1−α to the second input signal to produce a second scaled signal, and sums the first scaled signal and the second scaled signal to produce a summed signal. The adaptive filter computes an updated value of α to minimize the energy of the summed signal based on the summed signal, the first input signal and the second input signal, and provides the updated value of α to the mixing circuit.
    Type: Grant
    Filed: April 1, 2011
    Date of Patent: December 31, 2013
    Assignee: Bose Corporation
    Inventors: Luke C. Walters, Vasu Iyengar, Martin David Ring
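    A short numpy sketch of the α / 1−α mixing in the entry above, with α nudged by gradient descent on the summed signal's instantaneous energy; the update rule, step size, and clipping of α to [0, 1] are illustrative assumptions about how the adaptive filter might be realised.

      import numpy as np

      def adaptive_mix(mic1, mic2, alpha=0.5, mu=0.01):
          """Output alpha*mic1 + (1 - alpha)*mic2 sample by sample and step
          alpha against the gradient of the output's squared value."""
          out = np.zeros_like(mic1)
          for n in range(len(mic1)):
              y = alpha * mic1[n] + (1.0 - alpha) * mic2[n]
              out[n] = y
              # d(y^2)/d(alpha) = 2*y*(mic1 - mic2); keep alpha within [0, 1].
              alpha = float(np.clip(alpha - mu * 2.0 * y * (mic1[n] - mic2[n]), 0.0, 1.0))
          return out, alpha

      rng = np.random.default_rng(6)
      noise = rng.standard_normal(1000)
      m1 = noise + 0.1 * rng.standard_normal(1000)        # noisier microphone
      m2 = 0.2 * noise + 0.1 * rng.standard_normal(1000)  # quieter microphone
      mixed, final_alpha = adaptive_mix(m1, m2)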
  • Patent number: 8620646
    Abstract: A system and method may be configured to analyze audio information derived from an audio signal. The system and method may track sound pitch across the audio signal. The tracking of pitch across the audio signal may take into account change in pitch by determining at individual time sample windows in the signal duration an estimated pitch and a representation of harmonic envelope at the estimated pitch. The estimated pitch and the representation of harmonic envelope may then be implemented to determine an estimated pitch for another time sample window in the signal duration with an enhanced accuracy and/or precision.
    Type: Grant
    Filed: August 8, 2011
    Date of Patent: December 31, 2013
    Assignee: The Intellisis Corporation
    Inventors: David C. Bradley, Rodney Gateau, Daniel S. Goldin, Robert N. Hilton, Nicholas K. Fisher
  • Patent number: 8620655
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input; the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model, and combining the likelihoods determined by the acoustic model and the language model.
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: December 31, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales
  • Patent number: 8620653
    Abstract: Architecture that uses near-end speech detection and far-end energy level detection to notify a user when a local microphone and/or speaker that the user is using are muted. A voice activity detector is employed to detect the presence of near-end speech, sense the existing mute state of the near-end microphone, and then notify the user when the current microphone is muted. Separately or in combination therewith, received far-end voice signals are detected, the associated energy level is computed, the existing mute state of the near-end audio speaker is sensed, and the user is notified when the speaker is muted and/or at a reduced volume setting. These determinations enhance the user experience when the architecture is employed for communications sessions where participants connect via different communications modalities, by automatically notifying the user of the audio device state rather than letting the user attempt to contribute only to find that a microphone or speaker was muted.
    Type: Grant
    Filed: June 18, 2009
    Date of Patent: December 31, 2013
    Assignee: Microsoft Corporation
    Inventor: Ross G. Cutler
  • Patent number: 8615391
    Abstract: A method and apparatus to extract an audio signal having an important spectral component (ISC) and a low bit-rate audio signal coding/decoding method using the method and apparatus to extract the ISC. The method of extracting the ISC includes calculating perceptual importance, including an SMR (signal-to-mask ratio) value, of transformed spectral audio signals by using a psychoacoustic model, selecting spectral signals having a masking threshold value smaller than that of the spectral audio signals using the SMR value as first ISCs, and extracting a spectral peak from the audio signals selected as the ISCs according to a predetermined weighting factor to select second ISCs. Accordingly, the perceptually important spectral components can be efficiently coded so as to obtain high sound quality at a low bit-rate.
    Type: Grant
    Filed: July 6, 2006
    Date of Patent: December 24, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Junghoe Kim, Eunmi Oh, Konstantin Osipov, Boris Kudryashov
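    A minimal numpy sketch of the selection idea in the last entry: keep spectral bins whose power exceeds their masking threshold (positive SMR), then keep the local peaks among them; the peak test and the dB-domain inputs are illustrative assumptions.

      import numpy as np

      def select_important_components(power_db, mask_db):
          """Return indices of bins that exceed their masking threshold and are
          also local spectral peaks."""
          smr = power_db - mask_db
          audible = smr > 0.0                               # first-pass ISCs
          is_peak = np.r_[False,
                          (power_db[1:-1] > power_db[:-2]) &
                          (power_db[1:-1] > power_db[2:]),
                          False]
          return np.where(audible & is_peak)[0]

      rng = np.random.default_rng(7)
      power_db = 10 * np.log10(rng.random(64) + 1e-3)
      mask_db = np.full(64, power_db.mean())                # flat toy mask
      isc_bins = select_important_components(power_db, mask_db)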