Normalizing Patents (Class 704/234)
  • Patent number: 8798985
    Abstract: A method for interpreting a dialogue between two terminals includes establishing a communication channel between interpretation terminals of two parties in response to an interpretation request; specifying a language of an initiating party and a language of the other party in each of the interpretation terminals of the two parties by exchanging information about the language of the initiating party used in the interpretation terminal of the initiating party and the language of the other party used in the interpretation terminal of the other party via the communication channel; recognizing speech uttered from the interpretation terminal of the initiating party; translating the speech recognized by the interpretation terminal of the initiating party into the language of the other party; and transmitting a sentence translated into the language of the other party to the interpretation terminal of the other party.
    Type: Grant
    Filed: June 2, 2011
    Date of Patent: August 5, 2014
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Seung Yun, Sanghun Kim
  • Publication number: 20140207448
    Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g. 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.
    Type: Application
    Filed: January 23, 2013
    Publication date: July 24, 2014
    Applicant: Microsoft Corporation
    Inventors: Shizhen Wang, Yifan Gong, Fileno Alleva
  • Patent number: 8781825
    Abstract: Embodiments of the present invention improve methods of performing speech recognition. In one embodiment, the present invention includes a method comprising receiving a spoken utterance, processing the spoken utterance in a speech recognizer to generate a recognition result, determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter, and validating the recognition result based on the consistency of at least one of said parameters.
    Type: Grant
    Filed: August 24, 2011
    Date of Patent: July 15, 2014
    Assignee: Sensory, Incorporated
    Inventors: Jonathan Shaw, Pieter Vermeulen, Stephen Sutton, Robert Savoie
  • Patent number: 8768711
    Abstract: A method of voice-enabling an application for command and control and content navigation can include the application dynamically generating a markup language fragment specifying a command and control and content navigation grammar for the application, instantiating an interpreter from a voice library, and providing the markup language fragment to the interpreter. The method also can include the interpreter processing a speech input using the command and control and content navigation grammar specified by the markup language fragment and providing an event to the application indicating an instruction representative of the speech input.
    Type: Grant
    Filed: June 17, 2004
    Date of Patent: July 1, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Brien H. Muschett
  • Patent number: 8768695
    Abstract: A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function.
    Type: Grant
    Filed: June 13, 2012
    Date of Patent: July 1, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Yun Tang, Venkatesh Nagesha
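    Illustrative sketch (not taken from the patent): the store, update, recognize, and roll-back cycle described in the abstract above could look roughly like the following. The class name `RollbackCMN`, the exponential smoothing weight `alpha`, and the `recognize` callback that returns `None` on failure are all assumptions made for this example.

```python
from typing import Callable, List, Optional

Frame = List[float]


class RollbackCMN:
    """Running cepstral mean with a one-step rollback (hypothetical helper)."""

    def __init__(self, dim: int, alpha: float = 0.9) -> None:
        self.alpha = alpha        # weight kept for the old mean (assumed value)
        self.mean = [0.0] * dim   # the "current CMN function" is a mean vector here

    def process(
        self,
        frames: List[Frame],
        recognize: Callable[[List[Frame]], Optional[str]],
    ) -> Optional[str]:
        if not frames:
            return None
        # Store the current CMN function as the previous CMN function.
        previous = list(self.mean)
        # Update the CMN function from the current audio input.
        n = len(frames)
        utt_mean = [sum(f[i] for f in frames) / n for i in range(len(self.mean))]
        self.mean = [
            self.alpha * m + (1.0 - self.alpha) * u
            for m, u in zip(self.mean, utt_mean)
        ]
        # Process the current input with the updated CMN function.
        normalized = [[x - m for x, m in zip(f, self.mean)] for f in frames]
        text = recognize(normalized)
        # If nothing was recognized, replace the updated function with the previous one.
        if text is None:
            self.mean = previous
        return text
```

    The point of the rollback is that audio the recognizer rejects (for example, a noise burst) does not contaminate the running mean used for later utterances.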
  • Patent number: 8762143
    Abstract: Disclosed are systems, methods, and computer readable media for identifying an acoustic environment of a caller. The method embodiment comprises analyzing acoustic features of a received audio signal from a caller, receiving meta-data information based on a previously recorded time and speed of the caller, classifying a background environment of the caller based on the analyzed acoustic features and the meta-data, selecting an acoustic model matched to the classified background environment from a plurality of acoustic models, and performing speech recognition on the received audio signal using the selected acoustic model.
    Type: Grant
    Filed: May 29, 2007
    Date of Patent: June 24, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Mazin Gilbert
  • Patent number: 8744845
    Abstract: A noise estimation method for a noisy speech signal according to an embodiment of the present invention includes the steps of approximating a transformation spectrum by transforming an input noisy speech signal to a frequency domain, calculating a smoothed magnitude spectrum having a decreased difference in a magnitude of the transformation spectrum between neighboring frames, calculating a search spectrum to represent an estimated noise component of the smoothed magnitude spectrum, and estimating a noise spectrum by using a recursive average method using an adaptive forgetting factor defined by using the search spectrum. According to an embodiment of the present invention, the amount of calculation for noise estimation is small, and large-capacity memory is not required. Accordingly, the present invention can be easily implemented in hardware or software. Further, the accuracy of noise estimation can be increased because an adaptive procedure can be performed on each frequency sub-band.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: June 3, 2014
    Assignee: Transono Inc.
    Inventors: Sung Il Jung, Dong Gyung Ha
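    Illustrative sketch (not taken from the patent): the abstract above names three per-bin quantities (a smoothed magnitude spectrum, a search spectrum, and a recursively averaged noise spectrum) but not their formulas. The update rules below, the constants `beta`, `kappa`, and `rate`, and the class name `NoiseTracker` are all assumptions chosen only to show the general shape of such an estimator.

```python
from typing import List, Optional


class NoiseTracker:
    """Per-bin noise estimate via recursive averaging (hypothetical rules)."""

    def __init__(self, beta: float = 0.7, kappa: float = 0.95, rate: float = 0.5) -> None:
        self.beta = beta      # inter-frame smoothing weight (assumed)
        self.kappa = kappa    # how slowly the search spectrum may rise (assumed)
        self.rate = rate      # maximum per-frame update of the noise estimate (assumed)
        self.smooth: Optional[List[float]] = None
        self.search: Optional[List[float]] = None
        self.noise: Optional[List[float]] = None

    def update(self, magnitude: List[float]) -> List[float]:
        if self.smooth is None:
            # Initialize every track from the first frame.
            self.smooth = list(magnitude)
            self.search = list(magnitude)
            self.noise = list(magnitude)
            return list(self.noise)
        for k, mag in enumerate(magnitude):
            # Smoothed magnitude: reduces the difference between neighboring frames.
            s = self.beta * self.smooth[k] + (1.0 - self.beta) * mag
            # Search spectrum: follows drops immediately, rises only slowly.
            if s > self.search[k]:
                t = self.kappa * self.search[k] + (1.0 - self.kappa) * s
            else:
                t = s
            # Adaptive forgetting factor: near 1 when the frame looks like noise,
            # small when the smoothed magnitude is far above the search spectrum.
            r = min(1.0, t / s) if s > 0.0 else 1.0
            self.noise[k] += self.rate * r * (s - self.noise[k])
            self.smooth[k], self.search[k] = s, t
        return list(self.noise)
```

    Only a few values per frequency bin are stored, which is consistent with the abstract's claim of low computation and memory cost.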
  • Patent number: 8711015
    Abstract: The invention relates to compressing sparse data sets that contain sequences of data values and position information therefor. The position information may be in the form of position indices defining active positions of the data values in a sparse vector of length N. The position information is encoded into the data values by adjusting one or more of the data values within a pre-defined tolerance range, so that a pre-defined mapping function of the data values and their positions is close to a target value. In one embodiment, the mapping function is defined using a sub-set of N filler values whose elements are used to fill empty positions in the input sparse data vector. At the decoder, the correct data positions are identified by searching through possible sub-sets of filler values.
    Type: Grant
    Filed: August 24, 2011
    Date of Patent: April 29, 2014
    Assignee: Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, through the Communications Research Centre Canada
    Inventors: Frederic Mustiere, Hossein Najaf-Zadeh, Ramin Pishehvar, Hassan Lahdili, Louis Thibault, Martin Bouchard
  • Patent number: 8682671
    Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
    Type: Grant
    Filed: April 17, 2013
    Date of Patent: March 25, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Stephen R. Springer
  • Patent number: 8660849
    Abstract: Methods, systems, and computer readable storage medium related to operating an intelligent digital assistant are disclosed. A user request is received, the user request including at least a speech input received from a user. The user request including the speech input is processed to obtain a representation of user intent for identifying items of a selection domain based on at least one selection criterion. A prompt is provided to the user, the prompt presenting two or more properties relevant to items of the selection domain and requesting the user to specify relative importance between the two or more properties. A listing of search results is provided to the user, where the listing of search results has been obtained based on the at least one selection criterion and the relative importance provided by the user.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: February 25, 2014
    Assignee: Apple Inc.
    Inventors: Thomas Robert Gruber, Adam John Cheyer, Didier Rene Guzzoni, Christopher Dean Brigham, Harry Joseph Saddler
  • Patent number: 8655659
    Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: February 18, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventors: Qingfang Wang, Shouchun He
  • Patent number: 8639508
    Abstract: A method of automatic speech recognition includes receiving an utterance from a user via a microphone that converts the utterance into a speech signal, pre-processing the speech signal using a processor to extract acoustic data from the received speech signal, and identifying at least one user-specific characteristic in response to the extracted acoustic data. The method also includes determining a user-specific confidence threshold responsive to the at least one user-specific characteristic, and using the user-specific confidence threshold to recognize the utterance received from the user and/or to assess confusability of the utterance with stored vocabulary.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: January 28, 2014
    Assignee: General Motors LLC
    Inventors: Xufang Zhao, Gaurav Talwar
  • Patent number: 8600741
    Abstract: A system and method for tuning a speech recognition engine to an individual microphone using a database containing acoustical models for a plurality of microphones. Microphone performance characteristics are obtained from a microphone at a speech recognition engine, the database is searched for an acoustical model that matches the characteristics, and the speech recognition engine is then modified based on the matching acoustical model.
    Type: Grant
    Filed: August 20, 2008
    Date of Patent: December 3, 2013
    Assignee: General Motors LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan, Jesse T. Gratke, Subhash B. Gullapalli, Dana B. Fecher
  • Patent number: 8600744
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
    Type: Grant
    Filed: April 13, 2012
    Date of Patent: December 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Mazin Gilbert
  • Patent number: 8577678
    Abstract: A speech recognition system according to the present invention includes a sound source separating section which separates mixed speeches from multiple sound sources from one another; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each frequency spectral component of a separated speech signal using distributions of speech signal and noise against separation reliability of the separated speech signal; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Grant
    Filed: March 10, 2011
    Date of Patent: November 5, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
  • Patent number: 8571870
    Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
    Type: Grant
    Filed: August 9, 2010
    Date of Patent: October 29, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Stephen R. Springer
  • Patent number: 8560324
    Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: October 15, 2013
    Assignee: LG Electronics Inc.
    Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
  • Patent number: 8560316
    Abstract: The present invention relates to a system and method of making a verification decision within a speaker recognition system. A speech sample is gathered from a speaker over a period of time, and a verification score is then produced for said sample over the period. Once the verification score is determined, a confidence measure is produced based on frame score observations from said sample over the period, the confidence measure being calculated based on the standard Gaussian distribution. If the confidence measure indicates with a set level of confidence that the verification score is below the verification threshold, the speaker is rejected and the gathering process is terminated.
    Type: Grant
    Filed: December 19, 2007
    Date of Patent: October 15, 2013
    Inventors: Robert Vogt, Michael Mason, Sridaran Subramanian
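    Illustrative sketch (not taken from the patent): one way to read the early-rejection idea in the abstract above is as a one-sided confidence bound on the mean frame score. The function name `verify_with_early_reject`, the `min_frames` warm-up, and the use of the sample standard error are assumptions for this example; the patent's actual confidence measure may differ.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Iterable, List, Tuple


def verify_with_early_reject(
    frame_scores: Iterable[float],
    threshold: float,
    confidence: float = 0.99,
    min_frames: int = 10,
) -> Tuple[str, int]:
    """Return (decision, frames_used); stop gathering once rejection is confident."""
    # One-sided quantile of the standard Gaussian for the requested confidence.
    z = NormalDist().inv_cdf(confidence)
    seen: List[float] = []
    for score in frame_scores:
        seen.append(score)
        if len(seen) < max(2, min_frames):
            continue  # too few observations to estimate the spread
        # Upper confidence bound on the running verification score.
        upper = mean(seen) + z * stdev(seen) / math.sqrt(len(seen))
        if upper < threshold:
            return ("reject", len(seen))  # confidently below threshold: terminate
    if not seen:
        return ("reject", 0)
    return ("accept" if mean(seen) >= threshold else "reject", len(seen))
```

    Stopping as soon as the upper bound falls below the threshold avoids collecting further speech from a speaker who is already very unlikely to pass.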
  • Patent number: 8554566
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: October 8, 2013
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8538752
    Abstract: The invention comprises a method and apparatus for predicting word accuracy. Specifically, the method comprises obtaining an utterance in speech data where the utterance comprises an actual word string, processing the utterance for generating an interpretation of the actual word string, processing the utterance to identify at least one utterance frame, and predicting a word accuracy associated with the interpretation according to at least one stationary signal-to-noise ratio and at least one non-stationary signal-to-noise ratio, wherein the at least one stationary signal-to-noise ratio and the at least one non-stationary signal-to-noise ratio are determined according to a frame energy associated with each of the at least one utterance frame.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: September 17, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mazin Gilbert, Hong Kook Kim
  • Patent number: 8538751
    Abstract: A speech recognition system and a speech recognizing method for high-accuracy speech recognition in the environment with ego noise are provided. A speech recognition system according to the present invention includes a sound source separating and speech enhancing section; an ego noise predicting section; and a missing feature mask generating section for generating missing feature masks using outputs of the sound source separating and speech enhancing section and the ego noise predicting section; an acoustic feature extracting section for extracting an acoustic feature of each sound source using an output for said each sound source of the sound source separating and speech enhancing section; and a speech recognizing section for performing speech recognition using outputs of the acoustic feature extracting section and the missing feature masks.
    Type: Grant
    Filed: June 10, 2011
    Date of Patent: September 17, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Gokhan Ince
  • Patent number: 8521536
    Abstract: A Mobile Voice Self Service (MVSS) mobile device and method thereof. A VoiceXML browser that is implemented directly on the MVSS mobile device may request a VoiceXML application from a VoiceXML application server and process it. A call data manager may also be implemented on the MVSS mobile device and may provide call data that, in conjunction with data from the VoiceXML application server, may authorize access to advanced Media Resource Control Protocol (MRCP) services, such as Automatic Speech Recognition (ASR) or Text-To-Speech (TTS). A media resource gateway may then provide the advanced MRCP services to the VoiceXML application processed by the VoiceXML application browser. Hotkey navigations and bookmarked application points to VoiceXML applications may be created and applied through application analysis and state tracking. Therein, VoiceXML document transitions and user input are stored to maintain application state changes until the user requests creation of an application bookmark.
    Type: Grant
    Filed: October 22, 2012
    Date of Patent: August 27, 2013
    Assignee: West Corporation
    Inventor: Chad Daniel Fox
  • Patent number: 8515747
    Abstract: Transmitted data that includes audio data and a transmitted spectral sharpness parameter representing a spectral harmonic/noise sharpness of a plurality of subbands is received. A measured spectral sharpness parameter is estimated from received audio data. The transmitted spectral sharpness parameter is compared with the measured spectral sharpness parameter. A main sharpness control parameter is formed for each of the decoded subbands. The main sharpness control parameter for each of the decoded subbands is analyzed. Ones of the decoded subbands are sharpened if the corresponding main sharpness control indicates that a corresponding subband is not sharp enough, wherein sharpened subbands are formed. Likewise, ones of the decoded subbands are flattened if the corresponding main sharpness control indicates that a corresponding subband is not flat enough, wherein flattened subbands are formed.
    Type: Grant
    Filed: September 4, 2009
    Date of Patent: August 20, 2013
    Assignee: Huawei Technologies Co., Ltd.
    Inventor: Yang Gao
  • Patent number: 8494847
    Abstract: A weighting factor learning system includes an audio recognition section that recognizes learning audio data and outputs the recognition result; a weighting factor updating section that updates a weighting factor applied to a score obtained from an acoustic model and a language model so that the difference between a correct-answer score calculated with the use of a correct-answer text of the learning audio data and a score of the recognition result becomes large; a convergence determination section that determines, with the use of the score after updating, whether to return to the weighting factor updating section to update the weighting factor again; and a weighting factor convergence determination section that determines, with the use of the score after updating, whether to return to the audio recognition section to perform the process again and update the weighting factor using the weighting factor updating section.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: July 23, 2013
    Assignee: NEC Corporation
    Inventors: Tadashi Emori, Yoshifumi Onishi
  • Patent number: 8489396
    Abstract: The system provides a technique for suppressing or eliminating tonal noise in an input signal. The system operates on the input signal at a plurality of frequency bins and uses information generated at a prior bin to assist in calculating values at subsequent bins. The system first identifies peaks in a signal and then determines if the peaks are from tonal effects. This can be done by comparing the estimated background noise of a current bin to the smoothed background noise of the same bin. The smoothed background noise can be calculated using an asymmetric IIR filter. When the ratio of the current background noise estimate to the currently calculated smoothed background noise is far greater than 1, tonal noise is assumed. When tonal noise is found, a number of suppression techniques can be applied to reduce the tonal noise, including gain suppression with fixed floor factor, an adaptive floor factor gain suppression technique, and a random phase technique.
    Type: Grant
    Filed: December 20, 2007
    Date of Patent: July 16, 2013
    Assignee: QNX Software Systems Limited
    Inventors: Phil A. Hetherington, Xueman Li
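    Illustrative sketch (not taken from the patent): the detection step described in the abstract above can be pictured as a first-order IIR smoother run across frequency bins, with a slow rise and a fast fall, followed by a ratio test. The function name `find_tonal_bins`, the coefficients `a_up` and `a_down`, and the `ratio_threshold` of 4 are assumed values for this example only.

```python
from typing import List


def find_tonal_bins(
    noise: List[float],
    a_up: float = 0.1,
    a_down: float = 0.9,
    ratio_threshold: float = 4.0,
) -> List[int]:
    """Return indices of bins whose noise estimate stands far above its smoothed value."""
    if not noise:
        return []
    tonal: List[int] = []
    smoothed = noise[0]  # seed the filter with the first bin
    for k in range(1, len(noise)):
        # Asymmetric IIR: rise slowly so narrow peaks are not absorbed,
        # fall quickly so the smoothed curve returns to the noise floor.
        coeff = a_up if noise[k] > smoothed else a_down
        smoothed += coeff * (noise[k] - smoothed)
        # A ratio far greater than 1 is treated as tonal noise.
        if smoothed > 0.0 and noise[k] / smoothed > ratio_threshold:
            tonal.append(k)
    return tonal
```

    Because each smoothed value depends on the one computed for the previous bin, the filter carries information from prior bins forward, as the abstract describes. The flagged bins would then be passed to whichever suppression technique is in use.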
  • Publication number: 20130179164
    Abstract: A vehicle voice interface system calibration method comprising electronically convolving voice command data with voice impulse response data, electronically convolving audio system output data with feedback impulse response data, and calibrating the vehicle voice interface system. The voice command data is electronically convolved with voice impulse response data representing a voice acoustic signal path between an artificial mouth simulator and a first microphone, to simulate a voice acoustic transfer function pertaining to the passenger compartment. The audio system output data is convolved with feedback impulse response data representing a feedback acoustic signal path between a vehicle audio system output and a second microphone, to simulate a feedback acoustic transfer function pertaining to the passenger compartment. The voice interface system is calibrated to recognize voice commands represented by the voice command data based on the simulated voice and feedback acoustic transfer functions.
    Type: Application
    Filed: January 6, 2012
    Publication date: July 11, 2013
    Applicant: Nissan North America, Inc.
    Inventor: Patrick Dennis
  • Patent number: 8484267
    Abstract: Weight normalization in hardware or software without a division operator is described, using only right bit shift, addition and subtraction operations. A right bit shift is performed on an expected sum to effectively divide the expected sum by two to provide a first updated value for the expected sum. An iteration is performed which includes: incrementing with a first adder a first variable by the first updated value of the expected sum to provide an updated value for the first variable; subtracting with a first subtractor a second weight from a first weight to provide a first updated value for the first weight; and performing a left bit shift on the second weight to effectively multiply the second weight by two to provide a first updated value for the second weight.
    Type: Grant
    Filed: November 19, 2009
    Date of Patent: July 9, 2013
    Assignee: Xilinx, Inc.
    Inventor: Gabor Szedo
  • Patent number: 8478587
    Abstract: A sound analysis device comprises: a sound parameter calculation unit operable to acquire an audio signal and calculate a sound parameter for each of partial audio signals, the partial audio signals each being the acquired audio signal in a unit of time; a category determination unit operable to determine, from among a plurality of environmental sound categories, which environmental sound category each of the partial audio signals belongs to, based on a corresponding one of the calculated sound parameters; a section setting unit operable to sequentially set judgment target sections on a time axis as time elapses, each of the judgment target sections including two or more of the units of time, the two or more of the units of time being consecutive; and an environment judgment unit operable to judge, based on a number of partial audio signals in each environmental sound category determined in at least a most recent judgment target section, an environment that surrounds the sound analysis device in at least the
    Type: Grant
    Filed: March 13, 2008
    Date of Patent: July 2, 2013
    Assignee: Panasonic Corporation
    Inventors: Takashi Kawamura, Ryouichi Kawanishi
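    Illustrative sketch (not taken from the patent): judging the environment from per-unit category counts within a judgment target section can be approximated by a majority vote over a window of consecutive units. The function name `judge_environment`, the one-unit slide between sections, and the example category labels are assumptions for this example.

```python
from collections import Counter, deque
from typing import Deque, Iterable, List


def judge_environment(categories: Iterable[str], section_len: int = 5) -> List[str]:
    """Majority-vote environment per judgment target section of consecutive units."""
    if section_len < 2:
        raise ValueError("a judgment target section needs two or more units of time")
    window: Deque[str] = deque(maxlen=section_len)
    judgments: List[str] = []
    for category in categories:
        window.append(category)  # category of the newest partial audio signal
        if len(window) == section_len:
            # Count partial signals per category in the most recent section.
            judgments.append(Counter(window).most_common(1)[0][0])
    return judgments
```

    For example, with `section_len=3`, the sequence bus, bus, bus, indoors, indoors, indoors, indoors yields bus, bus, indoors, indoors, indoors: a single stray unit does not flip the judged environment.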
  • Patent number: 8463720
    Abstract: A method for defining a network of nodes is provided, each representing a unique concept, and making connections between individual concepts through unique relationships to other concepts. Each of the nodes is operable to store a unique identifier in the network and information regarding the concept in addition to the unique relationships.
    Type: Grant
    Filed: March 26, 2010
    Date of Patent: June 11, 2013
    Assignee: Neuric Technologies, LLC
    Inventors: Jennifer Seale, Hannah Lindsley, Timothy Allen Margheim
  • Patent number: 8447610
    Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
    Type: Grant
    Filed: August 9, 2010
    Date of Patent: May 21, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Stephen R. Springer
  • Patent number: 8396711
    Abstract: A user's voice is authenticated by prompting a user to say a challenge phrase from a list of predetermined phrases and comparing the user's response with a prerecorded version of the same response. The user's stored recordings are associated with an electronic identification or serial number for a specific device, so that when communication is established using the device, only the specific user may authenticate the session. When several phrases and recordings are used, one may be selected at random for authentication so that fraudulent authentication using a recording of the user's voice may be thwarted. The system and method may be used for authenticating a device when it is first activated, such as a telephony device, or may be used when authenticating a specific communications session.
    Type: Grant
    Filed: May 1, 2006
    Date of Patent: March 12, 2013
    Assignee: Microsoft Corporation
    Inventors: Dawson Yee, Gurdeep S. Pall
  • Patent number: 8392185
    Abstract: The speech recognition system of the present invention includes: a sound source separating section which separates mixed speeches from multiple sound sources; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: March 5, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
  • Patent number: 8379880
    Abstract: An example of a method of correcting an audio level of a stored program asset comprises retrieving a stored program asset having audio encoded at a first loudness setting. Dialog of the audio of the asset is identified, a loudness of the dialog is determined and the determined loudness is compared to the first loudness setting. The asset is re-encoded at a second loudness setting corresponding to the determined loudness, if the first loudness setting and the second loudness are different by more than a predetermined amount. The determined loudness is preferably a DIALNORM of the dialog. The asset may be stored with the re-encoded loudness setting. The method may be applied to programs as they are being received from a source, as well. Aspects of the method may also be applied to programs to be provided by a source. Systems are also disclosed.
    Type: Grant
    Filed: June 2, 2008
    Date of Patent: February 19, 2013
    Assignee: Time Warner Cable Inc.
    Inventor: Steven E. Riedl
  • Patent number: 8374873
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: August 11, 2009
    Date of Patent: February 12, 2013
    Assignee: Morphism, LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8326625
    Abstract: A system and method are provided to authenticate a voice in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.
    Type: Grant
    Filed: November 10, 2009
    Date of Patent: December 4, 2012
    Assignee: Research In Motion Limited
    Inventor: Sasan Adibi
  • Patent number: 8316148
    Abstract: A method and apparatus for obtaining a real time media stream provided as a plurality of media fragments from a plurality of remote nodes in a communications network. A first series of media fragments satisfying a first selection criterion is requested from a first remote node and a further series of media fragments satisfying a further different selection criterion is requested from at least one further remote node. When combined, the first series of fragments and the further series of fragments provide the complete media stream.
    Type: Grant
    Filed: February 22, 2008
    Date of Patent: November 20, 2012
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Andreas Ljunggren, Robert Skog
  • Patent number: 8316108
    Abstract: A method and apparatus for obtaining a real time media stream provided as a plurality of media fragments from a plurality of remote nodes in a communications network is described. Media fragments are requested from the plurality of remote nodes. A series of media fragments is received from at least one of the plurality of remote nodes. A selection criterion is determined for identifying the series of data fragments, and a blocking request is sent to at least one other of the plurality of remote nodes, the blocking request instructing the at least one other node to block the media fragments satisfying the selection criterion from being sent.
    Type: Grant
    Filed: February 22, 2008
    Date of Patent: November 20, 2012
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Andreas Ljunggren, Robert Skog
  • Patent number: 8311821
    Abstract: A method (1) for classifying at least one audio signal (A) into at least one audio class (AC), the method (1) comprising the steps of analyzing (10) said audio signal to extract at least one predetermined audio feature, performing (12) a frequency analysis on a set of values of said audio feature at different time instances, deriving (12) at least one further audio feature representing a temporal behavior of said audio feature based on said frequency analysis, and classifying (14) said audio signal based on said further audio feature. With the further audio feature, information is obtained about the temporal fluctuation of an audio feature, which may be advantageous for a classification of audio.
    Type: Grant
    Filed: April 21, 2004
    Date of Patent: November 13, 2012
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Dirk Jeroen Breebaart, Martin Franciscus McKinney
  • Patent number: 8311837
    Abstract: A Mobile Voice Self Service (MVSS) mobile system that includes an MVSS mobile device, on which a VoiceXML browser is implemented directly. The VoiceXML browser may request a VoiceXML application from a VoiceXML application server and process it. A client system may include the VoiceXML application server that the VoiceXML application is requested from. Upon request, the VoiceXML application server may deliver the requested VoiceXML application to the VoiceXML application browser. A vendor media resource system may provide advanced Media Resource Control Protocol (MRCP) services, such as Automatic Speech Recognition (ASR) or Text-To-Speech (TTS), to the VoiceXML application that is being processed by the VoiceXML application browser. A call data manager may also be implemented on the MVSS mobile device and may provide call data that, in conjunction with data from the VoiceXML application server, may authorize access to advanced Media Resource Control Protocol (MRCP) services.
    Type: Grant
    Filed: June 13, 2008
    Date of Patent: November 13, 2012
    Assignee: West Corporation
    Inventor: Chad Daniel Fox
  • Patent number: 8306819
    Abstract: Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function representing a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using a same set of acoustic training data, and to apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 9, 2009
    Date of Patent: November 6, 2012
    Assignee: Microsoft Corporation
    Inventors: Chaojun Liu, Yifan Gong
  • Patent number: 8296148
    Abstract: A Mobile Voice Self Service (MVSS) mobile device and method thereof. A VoiceXML browser that is implemented directly on the MVSS mobile device may request a VoiceXML application from a VoiceXML application server and process it. A call data manager may also be implemented on the MVSS mobile device and may provide call data that, in conjunction with data from the VoiceXML application server, may authorize access to advanced Media Resource Control Protocol (MRCP) services, such as Automatic Speech Recognition (ASR) or Text-To-Speech (TTS). A media resource gateway may then provide the advanced MRCP services to the VoiceXML application processed by the VoiceXML application browser. Hotkey navigations and bookmarked application points to VoiceXML applications may be created and applied through application analysis and state tracking. Therein, VoiceXML document transitions and user input are stored to maintain application state changes until the user requests creation of an application bookmark.
    Type: Grant
    Filed: June 13, 2008
    Date of Patent: October 23, 2012
    Assignee: West Corporation
    Inventor: Chad Daniel Fox
  • Publication number: 20120259632
    Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.
    Type: Application
    Filed: February 22, 2010
    Publication date: October 11, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Daniel Willett
  • Patent number: 8275154
    Abstract: An apparatus for processing an audio signal and method thereof are disclosed, by which a local dynamic range of an audio signal can be adaptively normalized as well as a maximum dynamic range of the audio signal. The present invention includes receiving, by an audio processing apparatus, a signal and feedback information estimated based on a normalizing gain; generating a noise estimation based on the signal; computing a gain filter for noise canceling, based on the noise estimation and the signal; and obtaining a restricted gain filter by applying the feedback information to the gain filter.
    Type: Grant
    Filed: July 29, 2009
    Date of Patent: September 25, 2012
    Assignee: LG Electronics Inc.
    Inventors: Jong Ha Moon, Hyen O Oh, Joon Il Lee, Myung Hoon Lee, Yang Won Jung, Alexis Favrot, Christof Faller
  • Patent number: 8249873
    Abstract: Tonal correction of speech is provided. Received speech is analyzed and compared to a table of commonly mispronounced phrases. These phrases are mapped to the phrase likely intended by the speaker. The phrase determined to be the phrase the user likely intended can be suggested to the user. If the user approves of the suggestion, tonal correction can be applied to the speech before that speech is delivered to a recipient.
    Type: Grant
    Filed: August 12, 2005
    Date of Patent: August 21, 2012
    Assignee: Avaya Inc.
    Inventors: Colin Blair, Kevin Chan, Christopher R. Gentle, Neil Hepworth, Andrew W. Lang, Paul R. Michaelis
  • Publication number: 20120196629
    Abstract: In one embodiment, a method provides for monitoring and analyzing communications of a monitored user on behalf of a monitoring user, to determine whether the communication includes a violation. For example, SMS messages, MMS messages, IMs, e-mails, social network site postings or voice mails of a child may be monitored on behalf of a parent. In one embodiment, an algorithm is used to analyze a normalized version of the communication, which algorithm is retrained using results of past analysis, to determine a probability of a communication including a violation.
    Type: Application
    Filed: January 28, 2011
    Publication date: August 2, 2012
    Applicant: PROTEXT MOBILITY, INC.
    Inventors: Edward Movsesyan, Igor Slavinsky
  • Patent number: 8234411
    Abstract: Methods, systems, computer readable media, and apparatuses for providing enhanced content are presented. Data including a first program, a first caption stream associated with the first program, and a second caption stream associated with the first program may be received. The second caption stream may be extracted from the data, and a second program may be encoded with the second caption stream. The first program may be transmitted with the first caption stream including first captions and may include first content configured to be played back at a first speed. In response to receiving an instruction to change play back speed, the second program may be transmitted with the second caption stream. The second program may include the first content configured to be played back at a second speed different from the first speed, and the second caption stream may include second captions different from the first captions.
    Type: Grant
    Filed: September 2, 2010
    Date of Patent: July 31, 2012
    Assignee: Comcast Cable Communications, LLC
    Inventor: Ross Gilson
  • Patent number: 8229744
    Abstract: A method, system, and computer program for class detection and time mediated averaging of class dependent models. A technique is described to take advantage of gender information in training data and how to obtain female, male, and gender independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross gender decoding performance is avoided.
    Type: Grant
    Filed: August 26, 2003
    Date of Patent: July 24, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
  • Patent number: 8175882
    Abstract: A method for task execution improvement includes: generating a baseline model for executing a task; recording a user executing the task; comparing the baseline model to the user's execution of the task; and providing feedback to the user based on the differences between the user's execution and the baseline model.
    Type: Grant
    Filed: January 25, 2008
    Date of Patent: May 8, 2012
    Assignee: International Business Machines Corporation
    Inventors: Sara H. Basson, Dimitri Kanevsky, Edward E. Kelley, Bhuvana Ramabhadran
  • Patent number: 8160875
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
    Type: Grant
    Filed: August 26, 2010
    Date of Patent: April 17, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Mazin Gilbert
  • Patent number: 8150688
    Abstract: A voice recognizing apparatus includes: a microphone 12 which inputs an input voice including speech voice uttered by a user speaker and interference voice uttered by an interference speaker other than the user speaker; a superimposition amount determining unit 14 which determines a noise superimposition amount for the input voice on the basis of a speech voice and an interference voice separately input as the input voice; a noise superimposing unit 16 which superimposes noise according to the noise superimposition amount onto the input voice and outputs the resultant voice as noise-superimposed voice; and a voice recognizing unit 18 which recognizes the noise-superimposed voice.
    Type: Grant
    Filed: January 10, 2007
    Date of Patent: April 3, 2012
    Assignee: NEC Corporation
    Inventor: Toru Iwasawa
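
The noise-superimposition pipeline summarized in patent 8150688 above can be illustrated with a minimal sketch. The abstract does not say how the superimposition amount is computed, so the rule used here (noise level follows the interference RMS level but is capped at a fraction of the speech RMS level) is purely an assumption, as are the function names `rms`, `noise_amount`, and `superimpose_noise` and the parameters `margin` and `max_ratio`.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_amount(speech, interference, margin=1.0, max_ratio=0.5):
    """Hypothetical rule, not taken from the patent: the noise level
    tracks the interference level (scaled by `margin`) but never
    exceeds `max_ratio` times the speech level."""
    return min(margin * rms(interference), max_ratio * rms(speech))


def superimpose_noise(mixed, amount, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `amount`
    to every sample of the mixed input voice."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, amount) for x in mixed]


# Separately input speech and interference, then the mixed input voice.
speech = [0.8, -0.8] * 50
interference = [0.1, -0.1] * 50
mixed = [s + i for s, i in zip(speech, interference)]

amount = noise_amount(speech, interference)
noisy = superimpose_noise(mixed, amount)
# `noisy` would then be passed to the recognizer in place of `mixed`.
```

Capping the noise relative to the speech level is one possible design choice for masking a quiet interfering talker without drowning out the user; the patented apparatus may determine the amount differently.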