Normalizing Patents (Class 704/234)
-
Patent number: 8798985Abstract: A method for interpreting a dialogue between two terminals includes establishing a communication channel between interpretation terminals of two parties in response to an interpretation request; specifying a language of an initiating party and a language of the other party in each of the interpretation terminals of the two parties by exchanging information about the language of the initiating party used in the interpretation terminal of the initiating party and the language of the other party used in the interpretation terminal of the other party via the communication channel; recognizing speech uttered from the interpretation terminal of the initiating party; translating the speech recognized by the interpretation terminal of the initiating party into the language of the other party; and transmitting a sentence translated into the language of the other party to the interpretation terminal of the other party.Type: GrantFiled: June 2, 2011Date of Patent: August 5, 2014Assignee: Electronics and Telecommunications Research InstituteInventors: Seung Yun, Sanghun Kim
-
Publication number: 20140207448Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g. 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptation is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.Type: ApplicationFiled: January 23, 2013Publication date: July 24, 2014Applicant: Microsoft CorporationInventors: Shizhen Wang, Yifan Gong, Fileno Alleva
-
Patent number: 8781825Abstract: Embodiments of the present invention improve methods of performing speech recognition. In one embodiment, the present invention includes a method comprising receiving a spoken utterance, processing the spoken utterance in a speech recognizer to generate a recognition result, determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter, and validating the recognition result based on the consistency of at least one of said parameters.Type: GrantFiled: August 24, 2011Date of Patent: July 15, 2014Assignee: Sensory, IncorporatedInventors: Jonathan Shaw, Pieter Vermeulen, Stephen Sutton, Robert Savoie
-
Patent number: 8768711Abstract: A method of voice-enabling an application for command and control and content navigation can include the application dynamically generating a markup language fragment specifying a command and control and content navigation grammar for the application, instantiating an interpreter from a voice library, and providing the markup language fragment to the interpreter. The method also can include the interpreter processing a speech input using the command and control and content navigation grammar specified by the markup language fragment and providing an event to the application indicating an instruction representative of the speech input.Type: GrantFiled: June 17, 2004Date of Patent: July 1, 2014Assignee: Nuance Communications, Inc.Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Brien H. Muschett
-
Patent number: 8768695Abstract: A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function.Type: GrantFiled: June 13, 2012Date of Patent: July 1, 2014Assignee: Nuance Communications, Inc.Inventors: Yun Tang, Venkatesh Nagesha
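The store, update, and roll-back cycle described in this abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes the "CMN function" is just a running mean vector, and the class name `RollbackCMN`, the smoothing weight `alpha`, and the `recognize` callback (which returns text, or `None` when recognition fails) are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence


class RollbackCMN:
    """Running cepstral mean with a one-step undo (illustrative sketch only)."""

    def __init__(self, dim: int, alpha: float = 0.9) -> None:
        self.alpha = alpha                  # assumed smoothing weight, not from the source
        self.mean: List[float] = [0.0] * dim
        self.previous: List[float] = list(self.mean)

    def process(
        self,
        frames: Sequence[Sequence[float]],
        recognize: Callable[[List[List[float]]], Optional[str]],
    ) -> Optional[str]:
        if not frames:
            raise ValueError("need at least one feature frame")
        # 1. Store the current CMN state as the previous state.
        self.previous = list(self.mean)
        # 2. Update the current state from the current audio input.
        n = len(frames)
        utt_mean = [sum(f[i] for f in frames) / n for i in range(len(self.mean))]
        self.mean = [
            self.alpha * m + (1.0 - self.alpha) * u
            for m, u in zip(self.mean, utt_mean)
        ]
        # 3. Normalize the input with the updated state and run recognition.
        normalized = [[x - m for x, m in zip(f, self.mean)] for f in frames]
        text = recognize(normalized)
        # 4. If nothing was recognized, restore the previous state.
        if text is None:
            self.mean = list(self.previous)
        return text
```

The roll-back in step 4 keeps non-speech input, such as a burst of noise that the recognizer rejects, from shifting the mean used for later utterances.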
-
Patent number: 8762143 Abstract: Disclosed are systems, methods, and computer readable media for identifying an acoustic environment of a caller. The method embodiment comprises analyzing acoustic features of a received audio signal from a caller, receiving meta-data information based on a previously recorded time and speed of the caller, classifying a background environment of the caller based on the analyzed acoustic features and the meta-data, selecting an acoustic model matched to the classified background environment from a plurality of acoustic models, and performing speech recognition on the received audio signal using the selected acoustic model. Type: Grant Filed: May 29, 2007 Date of Patent: June 24, 2014 Assignee: AT&T Intellectual Property II, L.P. Inventor: Mazin Gilbert
-
Method for processing noisy speech signal, apparatus for same and computer-readable recording medium
Patent number: 8744845 Abstract: A noise estimation method for a noisy speech signal according to an embodiment of the present invention includes the steps of approximating a transformation spectrum by transforming an input noisy speech signal to a frequency domain, calculating a smoothed magnitude spectrum having a decreased difference in a magnitude of the transformation spectrum between neighboring frames, calculating a search spectrum to represent an estimated noise component of the smoothed magnitude spectrum, and estimating a noise spectrum by using a recursive average method using an adaptive forgetting factor defined by using the search spectrum. According to an embodiment of the present invention, the amount of calculation for noise estimation is small, and large-capacity memory is not required. Accordingly, the present invention can be easily implemented in hardware or software. Further, the accuracy of noise estimation can be increased because an adaptive procedure can be performed on each frequency sub-band. Type: Grant Filed: March 31, 2009 Date of Patent: June 3, 2014 Assignee: Transono Inc. Inventors: Sung Il Jung, Dong Gyung Ha
-
Patent number: 8711015 Abstract: The invention relates to compressing sparse data sets that contain sequences of data values and position information therefor. The position information may be in the form of position indices defining active positions of the data values in a sparse vector of length N. The position information is encoded into the data values by adjusting one or more of the data values within a pre-defined tolerance range, so that a pre-defined mapping function of the data values and their positions is close to a target value. In one embodiment, the mapping function is defined using a sub-set of N filler values whose elements are used to fill empty positions in the input sparse data vector. At the decoder, the correct data positions are identified by searching through possible sub-sets of filler values. Type: Grant Filed: August 24, 2011 Date of Patent: April 29, 2014 Assignee: Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, through the Communications Research Centre Canada Inventors: Frederic Mustiere, Hossein Najaf-Zadeh, Ramin Pishehvar, Hassan Lahdili, Louis Thibault, Martin Bouchard
-
Patent number: 8682671Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.Type: GrantFiled: April 17, 2013Date of Patent: March 25, 2014Assignee: Nuance Communications, Inc.Inventors: Darren C. Meyer, Stephen R. Springer
-
Patent number: 8660849Abstract: Methods, systems, and computer readable storage medium related to operating an intelligent digital assistant are disclosed. A user request is received, the user request including at least a speech input received from a user. The user request including the speech input is processed to obtain a representation of user intent for identifying items of a selection domain based on at least one selection criterion. A prompt is provided to the user, the prompt presenting two or more properties relevant to items of the selection domain and requesting the user to specify relative importance between the two or more properties. A listing of search results is provided to the user, where the listing of search results has been obtained based on the at least one selection criterion and the relative importance provided by the user.Type: GrantFiled: December 21, 2012Date of Patent: February 25, 2014Assignee: Apple Inc.Inventors: Thomas Robert Gruber, Adam John Cheyer, Didier Rene Guzzoni, Christopher Dean Brigham, Harry Joseph Saddler
-
Patent number: 8655659Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.Type: GrantFiled: August 12, 2010Date of Patent: February 18, 2014Assignees: Sony Corporation, Sony Mobile Communications ABInventors: Qingfang Wang, Shouchun He
-
Patent number: 8639508Abstract: A method of automatic speech recognition includes receiving an utterance from a user via a microphone that converts the utterance into a speech signal, pre-processing the speech signal using a processor to extract acoustic data from the received speech signal, and identifying at least one user-specific characteristic in response to the extracted acoustic data. The method also includes determining a user-specific confidence threshold responsive to the at least one user-specific characteristic, and using the user-specific confidence threshold to recognize the utterance received from the user and/or to assess confusability of the utterance with stored vocabulary.Type: GrantFiled: February 14, 2011Date of Patent: January 28, 2014Assignee: General Motors LLCInventors: Xufang Zhao, Gaurav Talwar
-
Patent number: 8600741Abstract: A system and method for tuning a speech recognition engine to an individual microphone using a database containing acoustical models for a plurality of microphones. Microphone performance characteristics are obtained from a microphone at a speech recognition engine, the database is searched for an acoustical model that matches the characteristics, and the speech recognition engine is then modified based on the matching acoustical model.Type: GrantFiled: August 20, 2008Date of Patent: December 3, 2013Assignee: General Motors LLCInventors: Gaurav Talwar, Rathinavelu Chengalvarayan, Jesse T. Gratke, Subhash B. Gullapalli, Dana B. Fecher
-
Patent number: 8600744 Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook. Type: Grant Filed: April 13, 2012 Date of Patent: December 3, 2013 Assignee: AT&T Intellectual Property II, L.P. Inventor: Mazin Gilbert
-
Patent number: 8577678Abstract: A speech recognition system according to the present invention includes a sound source separating section which separates mixed speeches from multiple sound sources from one another; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each frequency spectral component of a separated speech signal using distributions of speech signal and noise against separation reliability of the separated speech signal; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.Type: GrantFiled: March 10, 2011Date of Patent: November 5, 2013Assignee: Honda Motor Co., Ltd.Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
-
Patent number: 8571870Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.Type: GrantFiled: August 9, 2010Date of Patent: October 29, 2013Assignee: Nuance Communications, Inc.Inventors: Darren C. Meyer, Stephen R. Springer
-
Patent number: 8560324Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.Type: GrantFiled: January 31, 2012Date of Patent: October 15, 2013Assignee: LG Electronics Inc.Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
-
Patent number: 8560316 Abstract: The present invention relates to a system and method of making a verification decision within a speaker recognition system. A speech sample is gathered from a speaker over a period of time, and a verification score is then produced for said sample over the period. Once the verification score is determined, a confidence measure is produced based on frame score observations from said sample over the period, the confidence measure being calculated based on the standard Gaussian distribution. If the confidence measure indicates with a set level of confidence that the verification score is below the verification threshold, the speaker is rejected and the gathering process is terminated. Type: Grant Filed: December 19, 2007 Date of Patent: October 15, 2013 Inventors: Robert Vogt, Michael Mason, Sridaran Subramanian
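One way to read the early-rejection rule in this abstract is as a one-sided confidence bound on the mean frame score. The sketch below makes that assumption explicit: it treats frame scores as independent observations, computes an upper bound on their mean from the standard Gaussian quantile, and rejects once that bound falls below the threshold. The function name and the default confidence level are illustrative, not taken from the patent.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence


def should_reject_early(
    frame_scores: Sequence[float],
    threshold: float,
    confidence: float = 0.99,   # assumed confidence level
) -> bool:
    """Return True if the mean frame score is below `threshold` with the given confidence."""
    n = len(frame_scores)
    if n < 2:
        return False  # too few observations to estimate the spread
    # Quantile of the standard Gaussian for the one-sided bound.
    z = NormalDist().inv_cdf(confidence)
    # Upper confidence bound on the mean score (independence is an assumption).
    upper = mean(frame_scores) + z * stdev(frame_scores) / math.sqrt(n)
    return upper < threshold
```

Calling this after each new batch of frames lets the gathering loop stop as soon as rejection is statistically safe, rather than waiting for the full sample.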
-
Patent number: 8554566Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.Type: GrantFiled: November 29, 2012Date of Patent: October 8, 2013Assignee: Morphism LLCInventor: James H. Stephens, Jr.
-
Patent number: 8538752Abstract: The invention comprises a method and apparatus for predicting word accuracy. Specifically, the method comprises obtaining an utterance in speech data where the utterance comprises an actual word string, processing the utterance for generating an interpretation of the actual word string, processing the utterance to identify at least one utterance frame, and predicting a word accuracy associated with the interpretation according to at least one stationary signal-to-noise ratio and at least one non-stationary signal to noise ratio, wherein the at least one stationary signal-to-noise ratio and the at least one non-stationary signal to noise ratio are determined according to a frame energy associated with each of the at least one utterance frame.Type: GrantFiled: May 7, 2012Date of Patent: September 17, 2013Assignee: AT&T Intellectual Property II, L.P.Inventors: Mazin Gilbert, Hong Kook Kim
-
Patent number: 8538751Abstract: A speech recognition system and a speech recognizing method for high-accuracy speech recognition in the environment with ego noise are provided. A speech recognition system according to the present invention includes a sound source separating and speech enhancing section; an ego noise predicting section; and a missing feature mask generating section for generating missing feature masks using outputs of the sound source separating and speech enhancing section and the ego noise predicting section; an acoustic feature extracting section for extracting an acoustic feature of each sound source using an output for said each sound source of the sound source separating and speech enhancing section; and a speech recognizing section for performing speech recognition using outputs of the acoustic feature extracting section and the missing feature masks.Type: GrantFiled: June 10, 2011Date of Patent: September 17, 2013Assignee: Honda Motor Co., Ltd.Inventors: Kazuhiro Nakadai, Gokhan Ince
-
Patent number: 8521536Abstract: A Mobile Voice Self Service (MVSS) mobile device and method thereof. A VoiceXML browser that is implemented directly on the MVSS mobile device may request a VoiceXML application from a VoiceXML application server and process it. A call data manager may also be implemented on the MVSS mobile device and may provide call data that, in conjunction with data from the VoiceXML application server, may authorize access to advanced Media Resource Control Protocol (MRCP) services, such as Automatic Speech Recognition (ASR) or Text-To-Speech (TTS). A media resource gateway may then provide the advanced MRCP services to the VoiceXML application processed by the VoiceXML application browser. Hotkey navigations and bookmarked application points to VoiceXML applications may be created and applied through application analysis and state tracking. Therein, VoiceXML document transitions and user input are stored to maintain application state changes until the user requests creation of an application bookmark.Type: GrantFiled: October 22, 2012Date of Patent: August 27, 2013Assignee: West CorporationInventor: Chad Daniel Fox
-
Patent number: 8515747 Abstract: Transmitted data that includes audio data and a transmitted spectral sharpness parameter representing a spectral harmonic/noise sharpness of a plurality of subbands is received. A measured spectral sharpness parameter is estimated from received audio data. The transmitted spectral sharpness parameter is compared with the measured spectral sharpness parameter. A main sharpness control parameter is formed for each of the decoded subbands. The main sharpness control parameter for each of the decoded subbands is analyzed. Ones of the decoded subbands are sharpened if the corresponding main sharpness control indicates that a corresponding subband is not sharp enough, wherein sharpened subbands are formed. Likewise, ones of the decoded subbands are flattened if the corresponding main sharpness control indicates that a corresponding subband is not flat enough, wherein flattened subbands are formed. Type: Grant Filed: September 4, 2009 Date of Patent: August 20, 2013 Assignee: Huawei Technologies Co., Ltd. Inventor: Yang Gao
-
Patent number: 8494847Abstract: A weighting factor learning system includes an audio recognition section that recognizes learning audio data and outputting the recognition result; a weighting factor updating section that updates a weighting factor applied to a score obtained from an acoustic model and a language model so that the difference between a correct-answer score calculated with the use of a correct-answer text of the learning audio data and a score of the recognition result becomes large; a convergence determination section that determines, with the use of the score after updating, whether to return to the weighting factor updating section to update the weighting factor again; and a weighting factor convergence determination section that determines, with the use of the score after updating, whether to return to the audio recognition section to perform the process again and update the weighting factor using the weighting factor updating section.Type: GrantFiled: February 19, 2008Date of Patent: July 23, 2013Assignee: NEC CorporationInventors: Tadashi Emori, Yoshifumi Onishi
-
Patent number: 8489396 Abstract: The system provides a technique for suppressing or eliminating tonal noise in an input signal. The system operates on the input signal at a plurality of frequency bins and uses information generated at a prior bin to assist in calculating values at subsequent bins. The system first identifies peaks in a signal and then determines if the peaks are from tonal effects. This can be done by comparing the estimated background noise of a current bin to the smoothed background noise of the same bin. The smoothed background noise can be calculated using an asymmetric IIR filter. When the ratio of the current background noise estimate to the currently calculated smoothed background noise is far greater than 1, tonal noise is assumed. When tonal noise is found, a number of suppression techniques can be applied to reduce the tonal noise, including gain suppression with fixed floor factor, an adaptive floor factor gain suppression technique, and a random phase technique. Type: Grant Filed: December 20, 2007 Date of Patent: July 16, 2013 Assignee: QNX Software Systems Limited Inventors: Phil A. Hetherington, Xueman Li
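The detection step in this abstract (smooth the background noise across bins with an asymmetric IIR filter, then flag bins whose ratio to the smoothed value is far above 1) can be illustrated as follows. This is a sketch under stated assumptions: the coefficients `rise` and `fall`, the threshold `ratio_threshold`, and the function name are hypothetical, and the suppression stage itself is not shown.

```python
from typing import List, Sequence


def find_tonal_bins(
    noise: Sequence[float],
    rise: float = 0.1,             # assumed slow tracking when the estimate goes up
    fall: float = 0.5,             # assumed fast tracking when the estimate goes down
    ratio_threshold: float = 4.0,  # assumed meaning of "far greater than 1"
) -> List[int]:
    """Return indices of frequency bins whose noise estimate looks tonal."""
    if not noise:
        return []
    tonal: List[int] = []
    smoothed = noise[0]  # bin 0 seeds the filter and is never flagged
    for k in range(1, len(noise)):
        # Asymmetric first-order IIR: the value from the prior bin is carried
        # forward, rising slowly so narrow peaks stand out against it.
        coeff = rise if noise[k] > smoothed else fall
        smoothed += coeff * (noise[k] - smoothed)
        if smoothed > 0 and noise[k] / smoothed > ratio_threshold:
            tonal.append(k)
    return tonal
```

For a flat noise floor with one narrow spike, for example `find_tonal_bins([1.0, 1.0, 1.0, 20.0, 1.0, 1.0])`, only the spike at index 3 is flagged, because the slow rise keeps the smoothed value well below the peak.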
-
Publication number: 20130179164Abstract: A vehicle voice interface system calibration method comprising electronically convolving voice command data with voice impulse response data, electronically convolving audio system output data with feedback impulse response data, and calibrating the vehicle voice interface system. The voice command data is electronically convolved with voice impulse response data representing a voice acoustic signal path between an artificial mouth simulator and a first microphone, to simulate a voice acoustic transfer function pertaining to the passenger compartment. The audio system output data is convolved with feedback impulse response data representing a feedback acoustic signal path between a vehicle audio system output and a second microphone, to simulate a feedback acoustic transfer function pertaining to the passenger compartment. The voice interface system is calibrated to recognize voice commands represented by the voice command data based on the simulated voice and feedback acoustic transfer functions.Type: ApplicationFiled: January 6, 2012Publication date: July 11, 2013Applicant: Nissan North America, Inc.Inventor: Patrick Dennis
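The core operation in this abstract is convolving a dry signal with a measured impulse response to simulate what a cabin microphone would pick up. A minimal direct-form sketch is shown below; the function name `convolve` and the sample values in the usage note are illustrative, and a real calibration tool would more likely use an FFT-based routine on full-length recordings.

```python
from typing import List, Sequence


def convolve(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Direct-form linear convolution of a signal with an impulse response."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample contributes a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out
```

Under these assumptions, `convolve(voice_command, voice_ir)` would approximate the voice path to the first microphone and `convolve(audio_output, feedback_ir)` the feedback path to the second; with toy data, `convolve([1.0, 2.0, 3.0], [1.0, 0.5])` gives `[1.0, 2.5, 4.0, 1.5]`.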
-
Patent number: 8484267Abstract: Weight normalization in hardware or software without a division operator is described, using only right bit shift, addition and subtraction operations. A right bit shift is performed on an expected sum to effectively divide the expected sum by two to provide a first updated value for the expected sum. An iteration is performed which includes: incrementing with a first adder a first variable by the first updated value of the expected sum to provide an updated value for the first variable; subtracting with a first subtractor a second weight from a first weight to provide a first updated value for the first weight; and performing a left bit shift on the second weight to effectively multiply the second weight by two to provide a first updated value for the second weight.Type: GrantFiled: November 19, 2009Date of Patent: July 9, 2013Assignee: Xilinx, Inc.Inventor: Gabor Szedo
-
Voice analysis device, voice analysis method, voice analysis program, and system integration circuit
Patent number: 8478587 Abstract: A sound analysis device comprises: a sound parameter calculation unit operable to acquire an audio signal and calculate a sound parameter for each of partial audio signals, the partial audio signals each being the acquired audio signal in a unit of time; a category determination unit operable to determine, from among a plurality of environmental sound categories, which environmental sound category each of the partial audio signals belongs to, based on a corresponding one of the calculated sound parameters; a section setting unit operable to sequentially set judgment target sections on a time axis as time elapses, each of the judgment target sections including two or more of the units of time, the two or more of the units of time being consecutive; and an environment judgment unit operable to judge, based on a number of partial audio signals in each environmental sound category determined in at least a most recent judgment target section, an environment that surrounds the sound analysis device in at least the … Type: Grant Filed: March 13, 2008 Date of Patent: July 2, 2013 Assignee: Panasonic Corporation Inventors: Takashi Kawamura, Ryouichi Kawanishi
-
Patent number: 8463720Abstract: A method for defining a network of nodes is provided, each representing a unique concept, and making connections between individual concepts through unique relationships to other concepts. Each of the nodes is operable to store a unique identifier in the network and information regarding the concept in addition to the unique relationships.Type: GrantFiled: March 26, 2010Date of Patent: June 11, 2013Assignee: Neuric Technologies, LLCInventors: Jennifer Seale, Hannah Lindsley, Timothy Allen Margheim
-
Patent number: 8447610Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.Type: GrantFiled: August 9, 2010Date of Patent: May 21, 2013Assignee: Nuance Communications, Inc.Inventors: Darren C. Meyer, Stephen R. Springer
-
Patent number: 8396711Abstract: A user's voice is authenticated by prompting a user to say a challenge phrase from a list of predetermined phrases and comparing the user's response with a prerecorded version of the same response. The user's stored recordings are associated with an electronic identification or serial number for a specific device, so that when communication is established using the device, only the specific user may authenticate the session. When several phrases and recordings are used, one may be selected at random for authentication so that fraudulent authentication using a recording of the user's voice may be thwarted. The system and method may be used for authenticating a device when it is first activated, such as a telephony device, or may be used when authenticating a specific communications session.Type: GrantFiled: May 1, 2006Date of Patent: March 12, 2013Assignee: Microsoft CorporationInventors: Dawson Yee, Gurdeep S. Pall
-
Patent number: 8392185Abstract: The speech recognition system of the present invention includes: a sound source separating section which separates mixed speeches from multiple sound sources; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.Type: GrantFiled: August 19, 2009Date of Patent: March 5, 2013Assignee: Honda Motor Co., Ltd.Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
-
Patent number: 8379880Abstract: An example of a method of correcting an audio level of a stored program asset comprises retrieving a stored program asset having audio encoded at a first loudness setting. Dialog of the audio of the asset is identified, a loudness of the dialog is determined and the determined loudness is compared to the first loudness setting. The asset is re-encoded at a second loudness setting corresponding to the determined loudness, if the first loudness setting and the second loudness are different by more than a predetermined amount. The determined loudness is preferably a DIALNORM of the dialog. The asset may be stored with the re-encoded loudness setting. The method may be applied to programs as they are being received from a source, as well. Aspects of the method may also be applied to programs to be provided by a source. Systems are also disclosed.Type: GrantFiled: June 2, 2008Date of Patent: February 19, 2013Assignee: Time Warner Cable Inc.Inventor: Steven E. Riedl
-
Patent number: 8374873Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.Type: GrantFiled: August 11, 2009Date of Patent: February 12, 2013Assignee: Morphism, LLCInventor: James H. Stephens, Jr.
-
Patent number: 8326625Abstract: A system and method are provided to authenticate a voice in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.Type: GrantFiled: November 10, 2009Date of Patent: December 4, 2012Assignee: Research In Motion LimitedInventor: Sasan Adibi
-
Patent number: 8316148Abstract: A method and apparatus for obtaining a real time media stream provided as a plurality of media fragments from a plurality of remote nodes in a communications network. A first series of media fragments satisfying a first selection criterion is requested from a first remote node and a further series of media fragments satisfying a further different selection criterion is requested from at least one further remote node. When combined, the first series of fragments and the further series of fragments provide the complete media stream.Type: GrantFiled: February 22, 2008Date of Patent: November 20, 2012Assignee: Telefonaktiebolaget LM Ericsson (publ)Inventors: Andreas Ljunggren, Robert Skog
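As a rough illustration of splitting one stream across nodes by selection criterion, the sketch below simulates two nodes that each hold the full stream. The even/odd sequence-number criteria and both function names are assumptions, since the abstract does not specify what the criteria are.

```python
def fetch_series(node_fragments, criterion):
    """Simulate requesting from one node only the fragments matching a criterion."""
    return {seq: data for seq, data in node_fragments.items() if criterion(seq)}

def assemble_stream(*series):
    """Merge the received series and return the fragments in sequence order."""
    merged = {}
    for part in series:
        merged.update(part)
    return [merged[seq] for seq in sorted(merged)]

# Every simulated node holds the same six fragments.
stream = {seq: f"frag{seq}" for seq in range(6)}
first = fetch_series(stream, lambda seq: seq % 2 == 0)    # first criterion
further = fetch_series(stream, lambda seq: seq % 2 == 1)  # different criterion
print(assemble_stream(first, further))
```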
-
Patent number: 8316108Abstract: A method and apparatus for obtaining a real time media stream provided as a plurality of media fragments from a plurality of remote nodes in a communications network is described. Media fragments are requested from the plurality of remote nodes. A series of media fragments is received from at least one of the plurality of remote nodes. A selection criterion is determined for identifying the series of data fragments, and a blocking request is sent to at least one other of the plurality of remote nodes, the blocking request instructing the at least one other node to block the media fragments satisfying the selection criterion from being sent.Type: GrantFiled: February 22, 2008Date of Patent: November 20, 2012Assignee: Telefonaktiebolaget LM Ericsson (publ)Inventors: Andreas Ljunggren, Robert Skog
-
Patent number: 8311821Abstract: A method (1) for classifying at least one audio signal (A) into at least one audio class (AC), the method (1) comprising the steps of analyzing (10) said audio signal to extract at least one predetermined audio feature, performing (12) a frequency analysis on a set of values of said audio feature at different time instances, deriving (12) at least one further audio feature representing a temporal behavior of said audio feature based on said frequency analysis, and classifying (14) said audio signal based on said further audio feature. With the further audio feature, information is obtained about the temporal fluctuation of an audio feature, which may be advantageous for a classification of audio.Type: GrantFiled: April 21, 2004Date of Patent: November 13, 2012Assignee: Koninklijke Philips Electronics N.V.Inventors: Dirk Jeroen Breebaart, Martin Franciscus McKinney
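One way to read "frequency analysis on a set of values of said audio feature at different time instances" is a discrete Fourier transform over the feature's trajectory. The sketch below takes that reading; the function names and the choice of the dominant modulation bin as the "further audio feature" are assumptions, not the patent's stated method.

```python
import cmath
import math

def modulation_spectrum(values):
    """Power spectrum of a feature trajectory (DFT bins 0 .. N//2)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the constant offset
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum

def dominant_modulation_bin(values):
    """Index of the strongest non-DC bin, a simple temporal-behavior feature."""
    spectrum = modulation_spectrum(values)
    return max(range(1, len(spectrum)), key=spectrum.__getitem__)

# A feature that fluctuates through 4 cycles over 32 frames.
trajectory = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
print(dominant_modulation_bin(trajectory))
```

A classifier would then take such modulation features as input alongside, or instead of, the raw per-frame feature values.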
-
Patent number: 8311837Abstract: A Mobile Voice Self Service (MVSS) mobile system that includes an MVSS mobile device, on which a VoiceXML browser is implemented directly. The VoiceXML browser may request a VoiceXML application from a VoiceXML application server and process it. A client system may include the VoiceXML application server that the VoiceXML application is requested from. Upon request, the VoiceXML application server may deliver the requested VoiceXML application to the VoiceXML application browser. A vendor media resource system may provide advanced Media Resource Control Protocol (MRCP) services, such as Automatic Speech Recognition (ASR) or Text-To-Speech (TTS), to the VoiceXML application that is being processed by the VoiceXML application browser. A call data manager may also be implemented on the MVSS mobile device and may provide call data that, in conjunction with data from the VoiceXML application server, may authorize access to advanced Media Resource Control Protocol (MRCP) services.Type: GrantFiled: June 13, 2008Date of Patent: November 13, 2012Assignee: West CorporationInventor: Chad Daniel Fox
-
Patent number: 8306819Abstract: Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function. The error correction function may represent a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using a same set of acoustic training data, and apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed.Type: GrantFiled: March 9, 2009Date of Patent: November 6, 2012Assignee: Microsoft CorporationInventors: Chaojun Liu, Yifan Gong
-
Patent number: 8296148Abstract: A Mobile Voice Self Service (MVSS) mobile device and method thereof. A VoiceXML browser that is implemented directly on the MVSS mobile device may request a VoiceXML application from a VoiceXML application server and process it. A call data manager may also be implemented on the MVSS mobile device and may provide call data that, in conjunction with data from the VoiceXML application server, may authorize access to advanced Media Resource Control Protocol (MRCP) services, such as Automatic Speech Recognition (ASR) or Text-To-Speech (TTS). A media resource gateway may then provide the advanced MRCP services to the VoiceXML application processed by the VoiceXML application browser. Hotkey navigations and bookmarked application points to VoiceXML applications may be created and applied through application analysis and state tracking. Therein, VoiceXML document transitions and user input are stored to maintain application state changes until the user requests creation of an application bookmark.Type: GrantFiled: June 13, 2008Date of Patent: October 23, 2012Assignee: West CorporationInventor: Chad Daniel Fox
-
Publication number: 20120259632Abstract: A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search.Type: ApplicationFiled: February 22, 2010Publication date: October 11, 2012Applicant: NUANCE COMMUNICATIONS, INC.Inventor: Daniel Willett
-
Patent number: 8275154Abstract: An apparatus for processing an audio signal and method thereof are disclosed, by which a local dynamic range of an audio signal can be adaptively normalized as well as a maximum dynamic range of the audio signal. The present invention includes receiving, by an audio processing apparatus, a signal, and feedback information estimated based on a normalizing gain; generating a noise estimation based on the signal; computing a gain filter for noise canceling, based on the noise estimation and the signal; and, obtaining a restricted gain filter by applying the feedback information to the gain filter.Type: GrantFiled: July 29, 2009Date of Patent: September 25, 2012Assignee: LG Electronics Inc.Inventors: Jong Ha Moon, Hyen O Oh, Joon Il Lee, Myung Hoon Lee, Yang Won Jung, Alexis Favrot, Christof Faller
-
Patent number: 8249873Abstract: Tonal correction of speech is provided. Received speech is analyzed and compared to a table of commonly mispronounced phrases. These phrases are mapped to the phrase likely intended by the speaker. The phrase determined to be the phrase the user likely intended can be suggested to the user. If the user approves of the suggestion, tonal correction can be applied to the speech before that speech is delivered to a recipient.Type: GrantFiled: August 12, 2005Date of Patent: August 21, 2012Assignee: Avaya Inc.Inventors: Colin Blair, Kevin Chan, Christopher R. Gentle, Neil Hepworth, Andrew W. Lang, Paul R. Michaelis
-
Publication number: 20120196629Abstract: In one embodiment, a method provides for monitoring and analyzing communications of a monitored user on behalf of a monitoring user, to determine whether the communication includes a violation. For example, SMS messages, MMS messages, IMs, e-mails, social network site postings or voice mails of a child may be monitored on behalf of a parent. In one embodiment, an algorithm is used to analyze a normalized version of the communication, which algorithm is retrained using results of past analysis, to determine a probability of a communication including a violation.Type: ApplicationFiled: January 28, 2011Publication date: August 2, 2012Applicant: PROTEXT MOBILITY, INC.Inventors: Edward Movsesyan, Igor Slavinsky
-
Patent number: 8234411Abstract: Methods, systems, computer readable media, and apparatuses for providing enhanced content are presented. Data including a first program, a first caption stream associated with the first program, and a second caption stream associated with the first program may be received. The second caption stream may be extracted from the data, and a second program may be encoded with the second caption stream. The first program may be transmitted with the first caption stream including first captions and may include first content configured to be played back at a first speed. In response to receiving an instruction to change play back speed, the second program may be transmitted with the second caption stream. The second program may include the first content configured to be played back at a second speed different from the first speed, and the second caption stream may include second captions different from the first captions.Type: GrantFiled: September 2, 2010Date of Patent: July 31, 2012Assignee: Comcast Cable Communications, LLCInventor: Ross Gilson
-
Patent number: 8229744Abstract: A method, system, and computer program for class detection and time mediated averaging of class dependent models. A technique is described to take advantage of gender information in training data and how to obtain female, male, and gender independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross gender decoding performance is avoided.Type: GrantFiled: August 26, 2003Date of Patent: July 24, 2012Assignee: Nuance Communications, Inc.Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
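A minimal sketch of averaging two class-dependent GMMs with a probability value is given below. It assumes one-dimensional features and a simple linear interpolation of the two likelihoods; the abstract does not give the actual averaging rule or how the probability evolves over time, and all names here are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_likelihood(x, components):
    """components: list of (weight, mean, variance) tuples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def averaged_likelihood(x, female_gmm, male_gmm, p_female):
    """Interpolate the class-dependent likelihoods by the class probability."""
    return (p_female * gmm_likelihood(x, female_gmm)
            + (1.0 - p_female) * gmm_likelihood(x, male_gmm))

female_gmm = [(1.0, 2.0, 1.0)]   # toy single-component models
male_gmm = [(1.0, -2.0, 1.0)]
print(averaged_likelihood(2.0, female_gmm, male_gmm, p_female=0.5))
```

Because both models always contribute, a wrong gender guess lowers the score gradually instead of switching to an entirely mismatched model.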
-
Patent number: 8175882Abstract: A method for task execution improvement includes: generating a baseline model for executing a task; recording a user executing a task; comparing the baseline model to the user's execution of the task; and providing feedback to the user based on the differences in the user's execution and the baseline model.Type: GrantFiled: January 25, 2008Date of Patent: May 8, 2012Assignee: International Business Machines CorporationInventors: Sara H. Basson, Dimitiri Kanevsky, Edward E. Kelley, Bhuvana Ramabhadran
-
Patent number: 8160875Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.Type: GrantFiled: August 26, 2010Date of Patent: April 17, 2012Assignee: AT&T Intellectual Property II, L.P.Inventor: Mazin Gilbert
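The codebook-selection step can be sketched as below. The distance measure (mean Euclidean distance from each frame to its nearest codebook vector), the dictionary layout, and the toy values are all assumptions; the abstract only says "minimal acoustic distance". The normalization and recognition steps are not shown.

```python
import math

def distance_to_codebook(frames, codebook_vectors):
    """Mean distance from each frame to its nearest codebook vector."""
    return sum(min(math.dist(f, c) for c in codebook_vectors)
               for f in frames) / len(frames)

def select_codebook(frames, codebooks):
    """Pick the per-speaker codebook closest to the received speech sample."""
    return min(codebooks, key=lambda cb: distance_to_codebook(frames, cb["vectors"]))

# Hypothetical codebooks: one per training speaker, each with a vocal tract length.
codebooks = [
    {"speaker": "A", "vocal_tract_length": 0.9, "vectors": [(0.0, 0.0), (1.0, 1.0)]},
    {"speaker": "B", "vocal_tract_length": 1.1, "vectors": [(5.0, 5.0), (6.0, 6.0)]},
]
frames = [(5.2, 4.9), (5.9, 6.1)]  # toy feature vectors from the input speech
best = select_codebook(frames, codebooks)
print(best["speaker"], best["vocal_tract_length"])
```

The vocal tract length stored with the selected codebook would then drive the normalization of the input speech before recognition.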
-
Patent number: 8150688Abstract: A voice recognizing apparatus includes a microphone 12 which inputs an input voice including speech voice uttered by a user speaker and interference voice uttered by an interference speaker other than the user speaker, a superimposition amount determining unit 14 which determines a noise superimposition amount for the input voice on the basis of a speech voice and an interference voice separately input as the input voice, a noise superimposing unit 16 which superimposes noise according to the noise superimposition amount onto the input voice and outputs the resultant voice as noise-superimposed voice, and a voice recognizing unit 18 which recognizes the noise-superimposed voice.Type: GrantFiled: January 10, 2007Date of Patent: April 3, 2012Assignee: NEC CorporationInventor: Toru Iwasawa