Speech Signal Processing Patents (Class 704/200)
  • Patent number: 9015039
    Abstract: System and method embodiments for dual modes pitch coding are provided. The system and method embodiments are configured to adaptively code pitch lags of a voiced speech signal using one of two pitch coding modes according to a pitch length, stability, or both. The two pitch coding modes include a first pitch coding mode with relatively high precision and reduced dynamic range, and a second pitch coding mode with relatively large dynamic range and reduced precision. The first pitch coding mode is used upon determining that the voiced speech signal has a relatively short or substantially stable pitch. The second pitch coding mode is used upon determining that the voiced speech signal has a relatively long or less stable pitch or is a substantially noisy signal.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: April 21, 2015
    Assignee: Huawei Technologies Co., Ltd.
    Inventor: Yang Gao
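    The mode selection described in this abstract can be illustrated with a minimal sketch. It assumes the mode is chosen from pitch length alone (the abstract allows length, stability, or both). The lag limits, the quarter-sample step, and the names `encode_lag` and `decode_lag` are hypothetical and are not taken from the patent.

```python
# Illustrative only: all limits and step sizes below are assumptions.
MIN_LAG = 20.0     # assumed shortest pitch lag, in samples
SHORT_MAX = 84.0   # assumed boundary between "short" and "long" lags
LONG_MAX = 276.0   # assumed longest pitch lag, in samples


def encode_lag(lag):
    """Return (mode, index) for a pitch lag given in samples."""
    lag = min(max(lag, MIN_LAG), LONG_MAX)
    if lag <= SHORT_MAX:
        # Mode 0: quarter-sample precision over a reduced dynamic range.
        return 0, round((lag - MIN_LAG) * 4)
    # Mode 1: integer precision over the full dynamic range.
    return 1, round(lag - MIN_LAG)


def decode_lag(mode, index):
    """Invert encode_lag using the step size implied by the mode."""
    step = 0.25 if mode == 0 else 1.0
    return MIN_LAG + index * step
```

    A short lag such as 40.25 samples survives the round trip exactly in mode 0, while a long lag such as 150.3 samples is coded in mode 1 and recovered to the nearest whole sample.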
  • Patent number: 9014246
    Abstract: An alternative approach to coping with the ever increasing demand for faster communications hardware is to design modems that are capable of operating at a higher data rate than the rate required for a single port at the standard communication rate for that modem. Basically, by utilizing a resource manager that directs the data in and out of the various portions of the modem in an orderly manner, keeping track of which of the ports is being operated at any given point in time, a standard single port modem can be reconfigured, for example, at an over clocked rate, to manipulate the data input and output of a modem.
    Type: Grant
    Filed: May 4, 2012
    Date of Patent: April 21, 2015
    Assignee: Intel Corporation
    Inventors: Peter N. Heller, Edmund C. Reiter, Michael A. Tzannes
  • Patent number: 9009032
    Abstract: A method and system for performing sample rate conversion is provided. The method may include configuring a system to convert a sample rate of a first audio channel of a plurality of audio channels to produce a first audio stream of samples. The system may be dynamically reconfigured to convert a sample rate of a second of the plurality of audio channels to produce a second audio stream of samples, wherein the first and second audio streams are output from the system at the same time. The method may further include arbitrating between requests for additional data from the first and second audio streams of samples, where processing of the first channel is suspended when the request corresponds to a second channel that is of higher priority.
    Type: Grant
    Filed: November 9, 2006
    Date of Patent: April 14, 2015
    Assignee: Broadcom Corporation
    Inventors: David Wu, Keith Klinger
  • Patent number: 9009033
    Abstract: A method and apparatus for implementation of real-time speech recognition using a handheld computing apparatus are provided. The handheld computing apparatus receives an audio signal, such as a user's voice. The handheld computing apparatus ultimately transmits the voice data to a remote or distal computing device with greater processing power and operating a speech recognition software application. The speech recognition software application processes the signal and outputs a set of instructions for implementation either by the computing device or the handheld apparatus. The instructions can include a variety of items including instructing the presentation of a textual representation of dictation, or a function or command to be executed by the handheld device (such as linking to a website, opening a file, cutting, pasting, saving, or other file menu type functionalities), or by the computing device itself.
    Type: Grant
    Filed: December 1, 2010
    Date of Patent: April 14, 2015
    Assignee: Nuance Communications, Inc.
    Inventor: Eric Hon-Anderson
  • Patent number: 9002708
    Abstract: A speech recognition system and method based on word-level candidate generation are provided. The speech recognition system may include a speech recognition result verifying unit to verify a word sequence and a candidate word for at least one word included in the word sequence when the word sequence and the candidate word are provided as a result of speech recognition. A word sequence displaying unit may display the word sequence in which the at least one word is visually distinguishable from other words of the word sequence. The word sequence displaying unit may display the word sequence by replacing the at least one word with the candidate word when the at least one word is selected by a user.
    Type: Grant
    Filed: May 8, 2012
    Date of Patent: April 7, 2015
    Assignee: NHN Corporation
    Inventors: Sang Ho Lee, Hoon Kim, Dong Ook Koo, Dae Sung Jung
  • Patent number: 8996388
    Abstract: An apparatus for processing an audio signal and method thereof are disclosed. The present invention includes receiving, by an audio processing apparatus, an audio signal including a first data of a first block encoded with rectangular coding scheme and a second data of a second block encoded with non-rectangular coding scheme; receiving a compensation signal corresponding to the second block; estimating a prediction of an aliasing part using the first data; and, obtaining a reconstructed signal for the second block based on the second data, the compensation signal and the prediction of aliasing part.
    Type: Grant
    Filed: August 6, 2013
    Date of Patent: March 31, 2015
    Assignee: Industry-Academic Cooperation Foundation, Yonsei University
    Inventors: Hyen-O Oh, Chang Heon Lee, Hong Goo Kang, Jung Wook Song
  • Patent number: 8996363
    Abstract: An apparatus for determining a plurality of local center-of-gravity frequencies of a spectrum of an audio signal includes an offset determiner, a frequency determiner and an iteration controller. The offset determiner determines an offset frequency for each iteration start frequency of a plurality of iteration start frequencies based on the spectrum of the audio signal, wherein a number of discrete sample values of the spectrum is larger than a number of iteration start frequencies. The frequency determiner determines a new plurality of iteration start frequencies by increasing or reducing each iteration start frequency of the plurality of iteration start frequencies by the corresponding determined offset frequency. The iteration controller provides the new plurality of iteration start frequencies to the offset determiner for further iteration or provides the plurality of local center-of-gravity frequencies, if a predefined termination condition is fulfilled.
    Type: Grant
    Filed: March 18, 2010
    Date of Patent: March 31, 2015
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Sascha Disch, Harald Popp
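    The offset/update/terminate loop in this abstract resembles a mean-shift style iteration, and a minimal sketch is given here under stated assumptions. The rectangular window of `half_width` bins, the termination test on the largest offset, and the function name `local_cogs` are hypothetical choices, not details from the patent.

```python
def local_cogs(mag, starts, half_width=3, tol=0.01, max_iter=50):
    """Iterate start positions (in bins) toward local centers of gravity.

    mag is a magnitude spectrum (list of non-negative floats) and starts
    holds fewer positions than there are spectrum samples.
    """
    freqs = list(starts)
    for _ in range(max_iter):
        offsets = []
        for f in freqs:
            c = int(round(f))
            lo = max(0, c - half_width)
            hi = min(len(mag) - 1, c + half_width)
            weight = sum(mag[k] for k in range(lo, hi + 1))
            if weight > 0:
                cog = sum(k * mag[k] for k in range(lo, hi + 1)) / weight
            else:
                cog = f  # no energy in the window: leave the position alone
            offsets.append(cog - f)
        # Move every start position by its own offset.
        freqs = [f + d for f, d in zip(freqs, offsets)]
        # Assumed termination condition: all offsets are negligible.
        if max((abs(d) for d in offsets), default=0.0) < tol:
            break
    return freqs
```

    With a 40-bin spectrum that has symmetric peaks centered on bins 10 and 30, start positions 8.0 and 32.0 move onto the peaks, while a start position at 20.0, where the window contains no energy, stays in place.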
  • Patent number: 8996389
    Abstract: Various techniques are disclosed for reducing artifacts generated by time compression by adapting the time compression based on the state of the received audio. The amount of time compression may be bounded based on audio characteristics. Another feature provides a way of determining the most correlated portions of segments of audio. Voiced speech may be distinguished from unvoiced speech. Another feature provides a way of distinguishing between silence, voiced speech, and unvoiced speech. Time compression may be adapted during periods of lengthy silence. Another feature allows for reducing time compression during sensitive portions of the received audio. One or more of these features may be present in different embodiments.
    Type: Grant
    Filed: June 14, 2011
    Date of Patent: March 31, 2015
    Assignee: Polycom, Inc.
    Inventor: Eric David Elias
  • Patent number: 8990081
    Abstract: A method of analyzing an audio signal is disclosed. A digital representation of an audio signal is received and a first output function is generated based on a response of a physiological model to the digital representation. At least one property of the first output function may be determined. One or more values are determined for use in analyzing the audio signal, based on the determined property of the first output function.
    Type: Grant
    Filed: September 11, 2009
    Date of Patent: March 24, 2015
    Assignee: Newsouth Innovations Pty Limited
    Inventors: Wenliang Lu, Dipanjan Sen
  • Patent number: 8990095
    Abstract: An apparatus for processing an audio signal and method thereof are disclosed. The present invention includes receiving, by an audio processing apparatus, an audio signal including a first data of a first block encoded with rectangular coding scheme and a second data of a second block encoded with non-rectangular coding scheme; receiving a compensation signal corresponding to the second block; estimating a prediction of an aliasing part using the first data; and, obtaining a reconstructed signal for the second block based on the second data, the compensation signal and the prediction of aliasing part.
    Type: Grant
    Filed: August 6, 2013
    Date of Patent: March 24, 2015
    Assignee: Industry-Academic Cooperation Foundation, Yonsei University
    Inventors: Hyen-O Oh, Chang Heon Lee, Hong Goo Kang, Jung Wook Song
  • Patent number: 8990073
    Abstract: A device and method for estimating a tonal stability of a sound signal include: calculating a current residual spectrum of the sound signal; detecting peaks in the current residual spectrum; calculating a correlation map between the current residual spectrum and a previous residual spectrum for each detected peak; and calculating a long-term correlation map based on the calculated correlation map, the long-term correlation map being indicative of a tonal stability in the sound signal.
    Type: Grant
    Filed: June 20, 2008
    Date of Patent: March 24, 2015
    Assignee: Voiceage Corporation
    Inventors: Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
  • Patent number: 8977248
    Abstract: Systems and methods that can be utilized to convert a voice communication received over a telecommunication network to text are described. In an illustrative embodiment, a call processing system coupled to a telecommunications network receives a call from a caller intended for a first party, wherein the call is associated with call signaling information. At least a portion of the call signaling information is stored in a computer readable medium. A greeting is played to the caller, and a voice communication from the caller is recorded. At least a portion of the voice communication is converted to text, which is analyzed to identify portions that are inferred to be relatively more important to communicate to the first party. A text communication is generated including at least some of the identified portions and including fewer words than the recorded voice communication. At least a portion of the text communication is made available to the first party over a data network.
    Type: Grant
    Filed: March 20, 2014
    Date of Patent: March 10, 2015
    Assignee: Callwave Communications, LLC
    Inventors: Anthony Bladon, David Giannini, David Frank Hofstatter, Colin Kelley, David C. McClintock, Robert F. Smith, David S. Trandal, Leland W. Kirchhoff
  • Patent number: 8977543
    Abstract: A quantizing apparatus is provided that includes a quantization path determiner that determines a path from a first path not using inter-frame prediction and a second path using the inter-frame prediction, as a quantization path of an input signal, based on a criterion before quantization of the input signal; a first quantizer that quantizes the input signal, if the first path is determined as the quantization path of the input signal; and a second quantizer that quantizes the input signal, if the second path is determined as the quantization path of the input signal.
    Type: Grant
    Filed: April 23, 2012
    Date of Patent: March 10, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ho-sang Sung, Eun-mi Oh
  • Patent number: 8977557
    Abstract: A method, medium, and apparatus encoding and/or decoding a multichannel audio signal. The method includes detecting the type of spatial extension data included in an encoding result of an audio signal, if the spatial extension data is data indicating a core audio object type related to a technique of encoding core audio data, detecting the core audio object type; decoding core audio data by using a decoding technique according to the detected core audio object type, if the spatial extension data is residual coding data, decoding the residual coding data by using the decoding technique according to the core audio object type, and up-mixing the decoded core audio data by using the decoded residual coding data. According to the method, the core audio data and residual coding data may be decoded by using an identical decoding technique, thereby reducing complexity at the decoding end.
    Type: Grant
    Filed: October 28, 2013
    Date of Patent: March 10, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jung-hoe Kim, Eun-mi Oh
  • Publication number: 20150066486
    Abstract: A method for improving decomposition of digital signals using training sequences is presented. A method for improving decomposition of digital signals using initialization is also provided. A method for sorting digital signals using frames based upon energy content in the frame is further presented. A method for utilizing user input for combining parts of a decomposed signal is also presented.
    Type: Application
    Filed: August 28, 2013
    Publication date: March 5, 2015
    Applicant: ACCUSONUS S.A.
    Inventors: Elias Kokkinis, Alexandros Tsilfidis
  • Patent number: 8972270
    Abstract: A method for processing an audio signal is disclosed. The method for processing an audio signal includes frequency-transforming an audio signal to generate a frequency-spectrum, deciding a weighting per band corresponding to energy per band using the frequency spectrum, receiving a masking threshold based on a psychoacoustic model, applying the weighting to the masking threshold to generate a modified masking threshold, and quantizing the audio signal using the modified masking threshold.
    Type: Grant
    Filed: May 25, 2009
    Date of Patent: March 3, 2015
    Assignees: LG Electronics Inc., Industry-Academic Cooperation Foundation, Yonsei University
    Inventors: Hyen-O Oh, Chang Heon Lee, Jeongook Song, Yang Won Jung, Hong Goo Kang
  • Patent number: 8972259
    Abstract: A method and system for teaching non-lexical speech effects includes delexicalizing a first speech segment to provide a first prosodic speech signal, and data indicative of the first prosodic speech signal is stored in a computer memory. The first speech segment is audibly played to a language student and the student is prompted to recite the speech segment. The speech uttered by the student in response to the prompt is recorded.
    Type: Grant
    Filed: September 9, 2010
    Date of Patent: March 3, 2015
    Assignee: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
  • Patent number: 8972251
    Abstract: An electronic device for generating a masking signal is described. The electronic device includes a plurality of microphones and a speaker. The electronic device also includes a processor and executable instructions stored in memory that is in electronic communication with the processor. The electronic device obtains a plurality of audio signals from the plurality of microphones. The electronic device also obtains an ambience signal based on the plurality of audio signals. The electronic device further determines an ambience feature based on the ambience signal. Additionally, the electronic device obtains a voice signal based on the plurality of audio signals. The electronic device also determines a voice feature based on the voice signal. The electronic device additionally generates a masking signal based on the voice feature and the ambience feature. The electronic device further outputs the masking signal using the speaker.
    Type: Grant
    Filed: June 7, 2011
    Date of Patent: March 3, 2015
    Assignee: QUALCOMM Incorporated
    Inventors: Pei Xiang, Joseph Jyh-huei Huang, Andre Gustavo Pucci Schevciw, Anthony Mauro, Erik Visser
  • Patent number: 8971217
    Abstract: Aspects of the present invention are directed at sending a data item from a sending client to a receiving client. In accordance with one embodiment, a method provides controls for generating an audio-based command to send a data item from a sending client to a receiving client. More specifically, the method includes receiving an audio stream at the sending client from a sending party. As the audio stream is being received, a determination is made regarding whether a command to send a data item to the receiving client was received. If a command to send a data item is included in the audio stream, the method identifies the data item that is the object of the command and then transmits the data item to the receiving client over the network.
    Type: Grant
    Filed: June 30, 2006
    Date of Patent: March 3, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Leonard Smith, Jr., David Milstein, Linda Criddle, Michael D. Malueg, Philip A. Chou
  • Patent number: 8972246
    Abstract: A method for embedding digital information into an audio signal is provided. The method includes dividing the digital information into low-priority data and high-priority data; dividing the audio signal into first and second signal parts; embedding at least one echo signal into the first signal part; embedding a communication signal modulated with low-priority data, which has a spectrum according to psychoacoustic analysis of the second signal part, into the second signal part; and combining the embedded first and second signal parts.
    Type: Grant
    Filed: December 6, 2012
    Date of Patent: March 3, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Kyong-Ha Park, Sergey Zhidkov, Hyun-Su Hong
  • Patent number: 8965761
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Grant
    Filed: February 27, 2014
    Date of Patent: February 24, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: William Kress Bodin, Michael John Burkhart, Daniel G. Eisenhauer, Thomas James Watson, Daniel Mark Schumacher
  • Patent number: 8965756
    Abstract: Systems and methods to automatically equalize coloration in speech recordings are provided. In example embodiments, a reference spectral shape based on a reference signal is determined. An estimated spectral shape for an input signal is derived. Using the estimated spectral shape and the reference spectral shape, a comparison is performed to determine gain settings. The gain settings comprise a gain value for each filter of a filter system. Using gain values associated with the gain settings, automatic equalization is performed on the input signal.
    Type: Grant
    Filed: March 14, 2011
    Date of Patent: February 24, 2015
    Assignee: Adobe Systems Incorporated
    Inventors: Sven Duwenhorst, Martin Schmitz
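    The comparison step in this abstract can be sketched as a per-band difference between the two spectral shapes. This is an illustration only: expressing the shapes as band levels in dB, clipping the result, the 12 dB limit, and the name `eq_gains_db` are all assumptions rather than details from the patent.

```python
def eq_gains_db(reference_db, estimated_db, max_gain_db=12.0):
    """Return one gain per filter band, in dB.

    Each gain is the reference level minus the estimated input level,
    clipped to +/- max_gain_db (an assumed safety limit).
    """
    if len(reference_db) != len(estimated_db):
        raise ValueError("spectral shapes must have the same number of bands")
    return [
        max(-max_gain_db, min(max_gain_db, ref - est))
        for ref, est in zip(reference_db, estimated_db)
    ]
```

    A band where the input sits 3 dB below the reference receives a 3 dB boost, a matching band receives 0 dB, and a band that is 20 dB too loud is cut by the clipped maximum of 12 dB.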
  • Patent number: 8959015
    Abstract: Provided is an apparatus for integrally encoding and decoding a speech signal and an audio signal. An encoding apparatus for integrally encoding a speech signal and an audio signal, may include: a module selection unit to analyze a characteristic of an input signal and to select a first encoding module for encoding a first frame of the input signal; a speech encoding unit to encode the input signal according to a selection of the module selection unit and to generate a speech bitstream; an audio encoding unit to encode the input signal according to the selection of the module selection unit and to generate an audio bitstream; and a bitstream generation unit to generate an output bitstream from the speech encoding unit or the audio encoding unit according to the selection of the module selection unit.
    Type: Grant
    Filed: July 14, 2009
    Date of Patent: February 17, 2015
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Tae Jin Lee, Seung Kwon Beack, Minje Kim, Dae Young Jang, Kyeongok Kang, Jin Woo Hong, Hochong Park, Young-Cheol Park
  • Patent number: 8959026
    Abstract: An apparatus and method for encoding/decoding a multi-channel signal may be provided. The apparatus of encoding a multi-channel signal may insert information about whether to encode a phase parameter indicating phase information of a plurality of channels, included in the multi-channel signal, in a bitstream of the multi-channel signal. The apparatus of decoding a multi-channel signal may determine whether to up-mix a mono signal using the phase parameter based on the information about whether to encode.
    Type: Grant
    Filed: October 28, 2009
    Date of Patent: February 17, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jung Hoe Kim, Eunmi Oh, Mi Young Kim, Ki Hyun Choo
  • Patent number: 8953800
    Abstract: A method is presented that uses steganographic codeword(s) carried in a speech payload in such a way that (i) the steganographic codeword(s) survive compression and/or transcoding as the payload travels from a transmitter to a receiver across at least one diverse network, and (ii) the embedded steganographic codeword(s) do not degrade the perceived voice quality of the received signal below an acceptable level. The steganographic codewords are combined with a speech payload by summing the amplitude of a steganographic codeword to the amplitude of the speech payload at a relatively low steganographic-to-speech bit rate. Advantageously, the illustrative embodiment of the present invention enables (i) steganographic codewords to be decoded by a compliant receiver and applied accordingly, and (ii) legacy or non-compliant receivers to play the received speech payload with resultant voice quality that is acceptable to listeners even though the steganographic codeword(s) remain in the received speech payload.
    Type: Grant
    Filed: February 18, 2010
    Date of Patent: February 10, 2015
    Assignee: Avaya Inc.
    Inventors: Anjur Sundaresan Krishnakumar, Lawrence O'Gorman
  • Patent number: 8948404
    Abstract: Provided is an apparatus and method of encoding and decoding multiple channel signals based upon phase information and one or more residual signals.
    Type: Grant
    Filed: October 22, 2010
    Date of Patent: February 3, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jung Hoe Kim, Eun Mi Oh
  • Patent number: 8949120
    Abstract: Systems and methods for controlling adaptivity of noise cancellation are presented. One or more audio signals are received by one or more corresponding microphones. The one or more signals may be decomposed into frequency sub-bands. Noise cancellation consistent with identified adaptation constraints is performed on the one or more audio signals. The one or more audio signals may then be reconstructed from the frequency sub-bands and outputted via an output device.
    Type: Grant
    Filed: April 13, 2009
    Date of Patent: February 3, 2015
    Assignee: Audience, Inc.
    Inventors: Mark Every, Ludger Solbach, Carlo Murgia, Ye Jiang
  • Patent number: 8938313
    Abstract: An auditory event boundary detector employs down-sampling of the input digital audio signal without an anti-aliasing filter, resulting in a narrower bandwidth intermediate signal with aliasing. Spectral changes of that intermediate signal, indicating event boundaries, may be detected using an adaptive filter to track a linear predictive model of the samples of the intermediate signal. Changes in the magnitude or power of the filter error correspond to changes in the spectrum of the input audio signal. The adaptive filter converges at a rate consistent with the duration of auditory events, so filter error magnitude or power changes indicate event boundaries. The detector is much less complex than methods employing time-to-frequency transforms for the full bandwidth of the audio signal.
    Type: Grant
    Filed: April 12, 2010
    Date of Patent: January 20, 2015
    Assignee: Dolby Laboratories Licensing Corporation
    Inventor: Glenn N. Dickins
  • Patent number: 8938387
    Abstract: The present invention teaches a new audio coding system that can code both general audio and speech signals well at low bit rates. A proposed audio coding system comprises a linear prediction unit for filtering an input signal based on an adaptive filter; a transformation unit for transforming a frame of the filtered input signal into a transform domain; and a quantization unit for quantizing the transform domain signal. The quantization unit decides, based on input signal characteristics, to encode the transform domain signal with a model-based quantizer or a non-model-based quantizer. Preferably, the decision is based on the frame size applied by the transformation unit.
    Type: Grant
    Filed: May 28, 2013
    Date of Patent: January 20, 2015
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Per Hedelin, Pontus Carlsson, Leif Jonas Samuelsson, Michael Schug
  • Patent number: 8935157
    Abstract: An audio decoding system including a decoder decoding a first part of audio data, and an audio buffer compressor compressing and storing the decoded first part of audio data in a first time interval and decompressing the stored first part of audio data in a second time interval.
    Type: Grant
    Filed: March 22, 2011
    Date of Patent: January 13, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Byoungil Kim, Jongin Kim
  • Patent number: 8935158
    Abstract: Disclosed is a frame comparison apparatus and method for comparing frames included in an audio signal by using spectrum information. The frame comparison apparatus includes a spectrum information estimation apparatus for receiving an audio signal and estimating and outputting spectrum information for the respective frames included in the audio signal, an estimation operation option determiner for determining an estimation order of the spectrum information estimated from the spectrum information estimation apparatus, a frame comparison option determiner for determining a comparison order for the frames output from the spectrum information estimation apparatus, and a frame comparator for determining a comparison target frame which is a comparison target for a current frame included in the audio signal, comparing the spectrum information for the current frame with the spectrum information for the comparison target frame, and outputting a comparison result value.
    Type: Grant
    Filed: July 26, 2012
    Date of Patent: January 13, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Hyun-Soo Kim
  • Patent number: 8930183
    Abstract: A method of converting speech from the characteristics of a first voice to the characteristics of a second voice, the method comprising: receiving a speech input from a first voice, dividing said speech input into a plurality of frames; mapping the speech from the first voice to a second voice; and outputting the speech in the second voice, wherein mapping the speech from the first voice to the second voice comprises, deriving kernels demonstrating the similarity between speech features derived from the frames of the speech input from the first voice and stored frames of training data for said first voice, the training data corresponding to different text to that of the speech input and wherein the mapping step uses a plurality of kernels derived for each frame of input speech with a plurality of stored frames of training data of the first voice.
    Type: Grant
    Filed: August 25, 2011
    Date of Patent: January 6, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Byung Ha Chun, Mark John Francis Gales
  • Patent number: 8930201
    Abstract: An apparatus for processing an audio signal and method thereof are disclosed. The present invention includes receiving, by an audio processing apparatus, an audio signal including a first data of a first block encoded with rectangular coding scheme and a second data of a second block encoded with non-rectangular coding scheme; receiving a compensation signal corresponding to the second block; estimating a prediction of an aliasing part using the first data; and, obtaining a reconstructed signal for the second block based on the second data, the compensation signal and the prediction of aliasing part.
    Type: Grant
    Filed: August 6, 2013
    Date of Patent: January 6, 2015
    Assignee: Industry-Academic Cooperation Foundation, Yonsei University
    Inventors: Hyen-O Oh, Chang Heon Lee, Hong Goo Kang, Jung Wook Song
  • Patent number: 8930199
    Abstract: A method of processing an audio signal is disclosed.
    Type: Grant
    Filed: September 17, 2010
    Date of Patent: January 6, 2015
    Assignee: Industry-Academic Cooperation Foundation, Yonsei University
    Inventors: Hyen-O Oh, Chang Heon Lee, Hong Goo Kang, Jeongook Song
  • Patent number: 8924200
    Abstract: A method for decoding an audio signal in a decoder having a CELP-based decoder element including a fixed codebook component, at least one pitch period value, and a first decoder output, wherein a bandwidth of the audio signal extends beyond a bandwidth of the CELP-based decoder element. The method includes obtaining an up-sampled fixed codebook signal by up-sampling the fixed codebook component to a higher sample rate, obtaining an up-sampled excitation signal based on the up-sampled fixed codebook signal and an up-sampled pitch period value, and obtaining a composite output signal based on the up-sampled excitation signal and an output signal of the CELP-based decoder element, wherein the composite output signal includes a bandwidth portion that extends beyond a bandwidth of the CELP-based decoder element.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: December 30, 2014
    Assignee: Motorola Mobility LLC
    Inventors: Jonathan A. Gibbs, James P. Ashley, Udar Mittal
  • Patent number: 8924201
    Abstract: The present invention teaches a new audio coding system that can code both general audio and speech signals well at low bit rates. A proposed audio coding system comprises a linear prediction unit for filtering an input signal based on an adaptive filter; a transformation unit for transforming a frame of the filtered input signal into a transform domain; and a quantization unit for quantizing the transform domain signal. The quantization unit decides, based on input signal characteristics, to encode the transform domain signal with a model-based quantizer or a non-model-based quantizer. Preferably, the decision is based on the frame size applied by the transformation unit.
    Type: Grant
    Filed: May 24, 2013
    Date of Patent: December 30, 2014
    Assignee: Dolby International AB
    Inventors: Per Hedelin, Pontus Carlsson, Leif Jonas Samuelsson, Michael Schug
  • Patent number: 8918126
    Abstract: A system for providing multimedia ring back for a voice-call is disclosed. The system may include an MMRB for VC control module and a network access module operatively connected to the MMRB. The network access module is adapted to interface the MMRB with external network components. The MMRB is responsive to indication that a caller is inviting a callee to join a voice-call wherein the caller and/or the callee is subscribed to an MMRB for VC service. The MMRB is responsive to indication that the callee received an invitation message and is now pending acceptance of the voice-call, for causing the caller to adapt its media-specification for the ongoing voice-call establishment process to a media-specification that is compatible with multimedia-content communication, thereby enabling a multimedia-content communication with the caller during at least a portion of the ongoing voice-call establishment process.
    Type: Grant
    Filed: May 3, 2012
    Date of Patent: December 23, 2014
    Assignee: Comverse Ltd
    Inventors: Ronen Shalom David, Ella Pinski, Noam Mordechai Eshel, Yael Ashkenazi
  • Patent number: 8914396
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for an iterative disambiguation interface. A system practicing the method receives a search query formatted according to a standard XML markup language for containing and annotating interpretations of user input, the search query being based on a natural language spoken query from a user and retrieves search results based on the search query. The system transmits the search results to a user device and iteratively receives multimodal input from the user to change search attributes and transmits updated search results to the user device based on the changed search attributes. The search results can include a link to additional information, such as a video presentation, related to the search results. The standard XML markup language can be Extensible MultiModal Annotation (EMMA) markup language from W3C. The system can generate an iteration transaction history for each multimodal input and updated search result.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: December 16, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Michael Johnston
  • Patent number: 8914401
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for providing an N-best list interface. A system practicing the method receives a search query formatted according to a standard language for containing and annotating interpretations of user input, the search query being based on a natural language spoken query from a user and retrieves an N-best list of recognition results based on the search query. The system then transmits the N-best list of recognition results to a user device, receives multimodal disambiguation input from the user, the input indicating an entry in the N-best list, and transmits to the user device additional information associated with the selected entry. The additional information can be a map indicating an address for the selected entry. The standard language can be XML-based Extensible MultiModal Annotation (EMMA) markup language from W3C.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: December 16, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Michael Johnston
  • Patent number: 8909517
    Abstract: A voice-coded in-band communication device monitors a voice-coded channel to detect data to present to a user. During operation, the communication device can detect a data-encoding signal from the voice-coded channel, such that the voice-coded channel can carry an audio signal that includes a voice signal and the data-encoding signal. The device decodes the data-encoding signal to detect a data element. The data element can include information that is to be presented to a local user, a request from a remote device for information about the local user, or information that the system can use to establish a peer-to-peer connection with the remote device over a separate data channel. The device can also generate a filtered audio signal to present to the user by removing the detected data-encoding signal from the voice-coded channel, and then reproduces the filtered audio signal for the user.
    Type: Grant
    Filed: August 3, 2012
    Date of Patent: December 9, 2014
    Assignee: Palo Alto Research Center Incorporated
    Inventors: Marc E. Mosko, Simon E. M. Barber
  • Patent number: 8909527
    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Grant
    Filed: June 24, 2009
    Date of Patent: December 9, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
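    Illustration (not taken from the patent): the iterative warping-factor search described in the abstract above might look roughly like this. The linear frequency warp, the squared-error comparison, and the names `warp_spectrum`, `distance`, and `best_warping_factor` are assumptions made for the sketch. The abstract does not say how the warping or the model comparison is performed.

```python
def warp_spectrum(spectrum, alpha):
    """Linearly warp the frequency axis: output bin k reads input position k * alpha.

    Values between bins are linearly interpolated; positions past the
    last bin are clamped to it.
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = min(k * alpha, n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def distance(a, b):
    """Squared-error mismatch between two spectra (assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_warping_factor(spectrum, model, factors):
    """Try each candidate factor and keep the one whose warped spectrum is closest to the model."""
    best_alpha = factors[0]
    best_dist = distance(warp_spectrum(spectrum, best_alpha), model)
    for alpha in factors[1:]:
        d = distance(warp_spectrum(spectrum, alpha), model)
        if d < best_dist:  # closer match: save this factor as the best so far
            best_alpha, best_dist = alpha, d
    return best_alpha


# Made-up data: one "speaker segment" whose spectrum already matches the model.
model = [float(k % 7) for k in range(32)]
segment_spectrum = list(model)
chosen = best_warping_factor(segment_spectrum, model, [0.9, 1.0, 1.1])
```

    In a full training setup this search would run once per speaker-specific segment, with the saved factor then used for that segment's data.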
  • Patent number: 8909539
    Abstract: A method for extending a bandwidth of a speech signal received, according to an embodiment of the present invention, includes: transforming the received speech signal into a frequency domain by decoding the received speech signal; normalizing the transformed speech signal; differentiating a voiced sound period or unvoiced sound period from the received speech signal; extracting, from the normalized speech signal, a first period including a harmonic component of the voiced sound period on the basis of the voiced sound period; extracting, from the normalized speech signal, a second period on the basis of correlation between the unvoiced sound period and the normalized speech signal; generating a high-band speech signal on the basis of the first period and the second period; and synthesizing the generated high-band speech signal and the transformed speech signal to output a wideband speech signal.
    Type: Grant
    Filed: December 7, 2012
    Date of Patent: December 9, 2014
    Assignee: Gwangju Institute of Science and Technology
    Inventors: Hong Kook Kim, Nam In Park
  • Patent number: 8909538
    Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
    Type: Grant
    Filed: November 11, 2013
    Date of Patent: December 9, 2014
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: James Mark Kondziela
  • Patent number: 8898055
    Abstract: A voice quality conversion device including: a target vowel vocal tract information hold unit holding target vowel vocal tract information of each vowel indicating target voice quality; a vowel conversion unit (i) receiving vocal tract information with phoneme boundary information of the speech including information of phonemes and phoneme durations, (ii) approximating a temporal change of vocal tract information of a vowel in the vocal tract information with phoneme boundary information applying a first function, (iii) approximating a temporal change of vocal tract information of the same vowel held in the target vowel vocal tract information hold unit applying a second function, (iv) calculating a third function by combining the first function with the second function, and (v) converting the vocal tract information of the vowel applying the third function; and a synthesis unit synthesizing a speech using the converted information.
    Type: Grant
    Filed: May 8, 2008
    Date of Patent: November 25, 2014
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventors: Yoshifumi Hirose, Takahiro Kamai, Yumiko Kato
  • Patent number: 8898568
    Abstract: An audio user interface that provides audio prompts that help a user interact with a user interface of an electronic device is disclosed. The audio prompts can provide audio indicators that allow a user to focus his or her visual attention upon other tasks such as driving an automobile, exercising, or crossing a street, yet still enable the user to interact with the user interface. An intelligent path can provide access to different types of audio prompts from a variety of different sources. The different types of audio prompts may be presented based on availability of a particular type of audio prompt. As examples, the audio prompts may include pre-recorded voice audio, such as celebrity voices or cartoon characters, obtained from a dedicated voice server. Absent availability of pre-recorded or synthesized audio data, non-voice audio prompts may be provided.
    Type: Grant
    Filed: September 9, 2008
    Date of Patent: November 25, 2014
    Assignee: Apple Inc.
    Inventors: William Bull, Ben Rottler, Jonathan A. Schiller
  • Patent number: 8892426
    Abstract: Methods of, apparatuses for, and computer readable media having instructions thereon that when executed cause carrying out methods of determining and modifying the perceived loudness of a frequency domain audio signal where the frequency resolution, and corresponding temporal coverage of the frequency domain information is not constant. The frequency (and thus temporal) resolution of the perceived loudness processing is maintained constant at the longest block size. One method includes a block combiner and a loudness modification interpolator.
    Type: Grant
    Filed: June 23, 2011
    Date of Patent: November 18, 2014
    Assignee: Dolby Laboratories Licensing Corporation
    Inventor: Michael J. Smithers
  • Patent number: 8886527
    Abstract: The purpose is to suppress recognition process delay caused by signal processing load. Included are a speech input means 10 that inputs a speech signal; an output evaluation means 20 that evaluates whether or not the speech signal input by the speech input means 10 is a speech signal in a sound section (a speech section in which a speaker is assumed to be speaking) and outputs the speech signal as a speech signal to be processed only when it is evaluated as being in the sound section; a signal processing means 30 that performs signal processing on the speech signal output by the output evaluation means 20 as the speech signal to be processed; and a speech recognition processing means 40 that performs a speech recognition process on the speech signal processed by the signal processing means 30.
    Type: Grant
    Filed: April 16, 2009
    Date of Patent: November 11, 2014
    Assignee: NEC Corporation
    Inventor: Toru Iwasawa
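    Illustration (not taken from the patent): the gating idea in the abstract above, where only frames judged to be in a sound section reach the signal processing stage, can be sketched as follows. The energy threshold test, the DC-removal placeholder, and the names `frame_energy`, `gate_frames`, `process`, and `pipeline` are assumptions made for the sketch. The abstract does not specify how the sound section is detected or what the signal processing does.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def gate_frames(frames, threshold):
    """Yield only frames judged to be in a sound section.

    Assumption: a simple energy threshold stands in for the
    output evaluation step.
    """
    for frame in frames:
        if frame_energy(frame) >= threshold:
            yield frame


def process(frame):
    """Placeholder signal processing: remove the DC offset of the frame."""
    mean = sum(frame) / len(frame)
    return [s - mean for s in frame]


def pipeline(frames, threshold):
    """Run signal processing only on the frames that pass the gate."""
    return [process(f) for f in gate_frames(frames, threshold)]


frames = [
    [0.0, 0.0, 0.0, 0.0],        # silence
    [1.0, -1.0, 1.0, -1.0],      # speech-like frame
    [0.01, 0.0, -0.01, 0.0],     # low-level noise
]
processed = pipeline(frames, threshold=0.1)
```

    Because the silent and low-level frames never reach `process`, the processing load scales with the amount of detected speech rather than with the total input.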
  • Patent number: 8886542
    Abstract: A voice interactive service system provides different speech-based services to a plurality of users. Using a communication terminal, the services are accessed via a telecommunication network through service-specific connectivity ports. The system comprises processing cores which have different configurations of speech processing resources for performing different services. For performing a requested service, a connection module establishes a connection between the respective connectivity port and a processing core having a configuration of speech processing resources suitable for performing the requested service. Because of the service-specific resourcing of cores, there is no need for requesting and allocating processing resources from external resource servers. Moreover, the port-dedicated resourcing of the cores ensures that a successful access to a connectivity port leads to a successful provision of the requested service.
    Type: Grant
    Filed: August 26, 2010
    Date of Patent: November 11, 2014
    Inventors: Roger Lagadec, Patrik Estermann, Luciano Butera
  • Patent number: 8880606
    Abstract: Disclosed is a flexible, multi-modal system useful in communications among users, capable of synchronizing real world and augmented reality, wherein the system is deployed in centralized and distributed computational platforms. The system comprises input devices to generate signals representing speech, gestures, pointing direction, and location of a user, and transmit the same to a multi-modal interface. A plurality of agents and one or more databases are integrated into the system, where at least some of the agents receive signals from the multi-modal interface, translate the signals into data, compare the same to a database, generate signals representing meanings as defined by the database, and transmit the signals to the multi-modal interface. Finally, a plurality of output devices are associated with the system to receive and process signals from the multi-modal interface, some of said signals representing messages to the user to be communicated by means of an output device.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: November 4, 2014
    Assignee: Applied Research Associates, Inc.
    Inventors: Roberto Aldunate, Gregg E Larson
  • Patent number: 8874436
    Abstract: One embodiment of the present invention is a method for playing a portion of a media work which includes steps of: (a) playing the media work; (b) receiving input from a user; (c) analyzing parameters to determine the portion of the media work to play; (d) altering at least a part of the portion; and (e) playing the portion.
    Type: Grant
    Filed: April 3, 2007
    Date of Patent: October 28, 2014
    Assignee: Enounce, Inc.
    Inventor: Richard S. Goldhor