Pitch Determination Of Speech Signals (EPO) Patents (Class 704/E11.006)
  • Publication number: 20130325455
    Abstract: Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency domain are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.
    Type: Application
    Filed: June 4, 2012
    Publication date: December 5, 2013
    Applicants: INTERNATIONAL BUSINESS MACHINES CORPORATION, UZDAROJI AKCINĖ BENDROVĖ LIETUVOS TYRIMU CENTRAS
    Inventors: Aharon Satt, Zvi Kons, Ron Hoory
  • Publication number: 20130231924
    Abstract: Implementations of systems, methods and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
    Type: Application
    Filed: August 20, 2012
    Publication date: September 5, 2013
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S.H. Chu, Shawn E. Stevenson
  • Publication number: 20130136276
    Abstract: A method and apparatus for receiving and playing a signal in a radio receiver to suppress microphonic feedback are provided by alternately pitch shifting a received audio signal. The pitch of the received audio signal is alternately shifted up and then down, repeatedly over successive intervals of the audio signal, to produce a pitch swing signal which is then played over a speaker. The alternating pitch shifting prevents the buildup of regenerative feedback normally caused by acoustic vibrations coupling into the radio receiver.
    Type: Application
    Filed: November 29, 2011
    Publication date: May 30, 2013
    Applicant: MOTOROLA SOLUTIONS, INC.
    Inventors: V. C. PRAKASH VK CHACKO, THEAN HAI OOI, KAR BOON OUNG, CHEAH HENG TAN, HUOY THYNG YOW
  • Publication number: 20130117014
    Abstract: Disclosed are various embodiments of multiple microphone based pitch detection. In one embodiment, a method includes obtaining a primary signal and a secondary signal associated with multiple microphones. A pitch value is determined based at least in part upon a level difference between the primary and secondary signals. In another embodiment, a system includes a plurality of microphones configured to provide a primary signal and a secondary signal. A level difference detector is configured to determine a level difference between the primary and secondary signals and a pitch identifier is configured to clip the primary and secondary signals based at least in part upon the level difference. In another embodiment, a method determines the presence of voice activity based upon a pitch prediction gain variation that is determined based at least in part upon a pitch lag.
    Type: Application
    Filed: November 7, 2011
    Publication date: May 9, 2013
    Applicant: BROADCOM CORPORATION
    Inventors: Xianxian Zhang, Alfonsus Lunardhi
  • Publication number: 20130046533
    Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a region of the signal representing speech. The region can comprise a portion of a frame of the signal representing speech classified as a voiced frame. The region can be marked based on one or more pitch estimates for the region. A cord can be identified within the region based on occurrence of one or more events within the region of the signal. For example, the one or more events can comprise one or more glottal pulses. In such cases, the cord can begin with onset of a first glottal pulse and extend to a point prior to an onset of a second glottal pulse. The cord may exclude a portion of the region of the signal prior to the onset of the second glottal pulse.
    Type: Application
    Filed: October 19, 2012
    Publication date: February 21, 2013
    Applicant: RED SHIFT COMPANY, LLC
    Inventor: RED SHIFT COMPANY, LLC
  • Publication number: 20130041657
    Abstract: A system and method may be configured to analyze audio information derived from an audio signal. The system and method may track sound pitch across the audio signal. The tracking of pitch across the audio signal may take into account change in pitch by determining at individual time sample windows in the signal duration an estimated pitch and a representation of harmonic envelope at the estimated pitch. The estimated pitch and the representation of harmonic envelope may then be implemented to determine an estimated pitch for another time sample window in the signal duration with an enhanced accuracy and/or precision.
    Type: Application
    Filed: August 8, 2011
    Publication date: February 14, 2013
    Applicant: The Intellisis Corporation
    Inventors: David C. BRADLEY, Rodney Gateau, Daniel S. Goldin, Robert N. Hilton, Nicholas K. Fisher
  • Publication number: 20130041656
    Abstract: A system and method may be configured to analyze audio information derived from an audio signal. The system and method may track sound pitch across the audio signal. The tracking of pitch across the audio signal may take into account change in pitch by determining at individual time sample windows in the signal duration an estimated pitch and an estimated fractional chirp rate of the harmonics at the estimated pitch. The estimated pitch and the estimated fractional chirp rate may then be implemented to determine an estimated pitch for another time sample window in the signal duration with an enhanced accuracy and/or precision.
    Type: Application
    Filed: August 8, 2011
    Publication date: February 14, 2013
    Applicant: The Intellisis Corporation
    Inventors: David C. BRADLEY, Daniel S. GOLDIN, Rodney GATEAU, Nicholas K. FISHER, Robert N. HILTON, Derrick R. ROOS, Eric WIEWIORA
  • Publication number: 20130024192
    Abstract: Disclosed is an information display system provided with: a signal analyzing unit which analyzes the audio signals obtained from a predetermined location and which generates ambient sound information regarding the sound generated at the predetermined location; and an ambient expression selection unit which selects an ambient expression which expresses the content of what a person is feeling from the sound generated at the predetermined location on the basis of the ambient sound information.
    Type: Application
    Filed: March 28, 2011
    Publication date: January 24, 2013
    Applicant: NEC CORPORATION
    Inventors: Toshiyuki Nomura, Yuzo Senda, Kyota Higa, Takayuki Arakawa, Yasuyuki Mitsui
  • Publication number: 20120239389
    Abstract: Disclosed is an audio signal processing method comprising the steps of: receiving an audio signal containing current frame data; generating a first temporary output signal for the current frame when an error occurs in the current frame data, by carrying out frame error concealment with respect to the current frame data using a random codebook; generating a parameter by carrying out one or more of short-term prediction, long-term prediction and a fixed codebook search based on the first temporary output signal; and memory updating the parameter for the next frame; wherein the parameter comprises one or more of pitch gain, pitch delay, fixed codebook gain and a fixed codebook.
    Type: Application
    Filed: November 24, 2010
    Publication date: September 20, 2012
    Applicant: LG ELECTRONICS INC.
    Inventors: Hye Jeong Jeon, Dae Hwan Kim, Hong Goo Kang, Min Ki Lee, Byung Suk Lee, Gyu Hyeok Jeong
  • Publication number: 20120209598
    Abstract: A state detecting device includes an input unit that receives an input voice sound; an analyzer that calculates a feature parameter of each of a plurality of frames extracted from the voice sound; a calculator that calculates the average of the feature parameters of the frames, determines a threshold on the basis of the average and statistical data representing relationships between other averages of other feature parameters obtained from a plurality of speakers and cumulative frequencies of the other feature parameters, and calculates an appearance frequency of a frame that is among the plurality of frames and whose feature parameter is larger than the threshold; a determining unit that determines, on the basis of the appearance frequency, a strained state of a vocal cord that has made the voice sound; and an output unit that outputs a result of the determination.
    Type: Application
    Filed: January 23, 2012
    Publication date: August 16, 2012
    Applicant: FUJITSU LIMITED
    Inventors: Shoji HAYAKAWA, Naoshi MATSUO
  • Publication number: 20120185244
    Abstract: According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform.
    Type: Application
    Filed: January 26, 2012
    Publication date: July 19, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masahiro Morita, Javier Latorre, Takehiko Kagoshima
  • Publication number: 20120166187
    Abstract: A system and method for an audio synthesizer utilizing frequency aperture cells (FAC) and frequency aperture arrays (FAA). In accordance with an embodiment, an audio processing system can be provided for the transformation of audio-band frequencies for musical and other purposes. In accordance with an embodiment, a single stream of mono, stereo, or multi-channel monophonic audio can be transformed into polyphonic music, based on a desired target musical note or set of multiple notes. At its core, the system utilizes an input waveform(s) (which can be either file-based or streamed) which is then fed into an array of filters, which are themselves optionally modulated, to generate a new synthesized audio output.
    Type: Application
    Filed: August 26, 2011
    Publication date: June 28, 2012
    Applicant: SONIC NETWORK, INC.
    Inventors: James Edwin Van Buskirk, Jennifer Hruska, Jason Jordan, Al Joelson, Borislav Zlatkov
  • Publication number: 20120143601
    Abstract: The invention relates to a method for determining a quality indicator representing a perceived quality of an output signal of an audio system with respect to a reference signal. The reference signal and the output signal are processed and compared. The processing includes dividing the reference signal and the output signal into mutually corresponding time frames. Additionally, the processing includes scaling the intensity of the reference signal towards a fixed intensity level, and then performing measurements on time frames within the scaled reference signal for determining reference signal time frame characteristics. The intensity of the reference signal is then scaled from the fixed intensity level towards an intensity level related to the output signal. Further on in the method, the loudness of the output signal is scaled towards a fixed loudness level in the perceptual loudness domain. This scaling action uses the reference signal time frame characteristics.
    Type: Application
    Filed: August 9, 2010
    Publication date: June 7, 2012
    Applicants: Nederlandse Organisatie voor Toegepast-Natuurwetenschappelijk Onderzoek TNO, KONINKLIJKE KPN N.V.
    Inventors: John Gerard Beerends, Jeroen Van Vugt
  • Publication number: 20120136655
    Abstract: A signal portion is extracted per frame having a specific duration from an input signal, thus generating a per-frame input signal. The per-frame input signal in the time domain is converted into a per-frame input signal in the frequency domain, thereby generating a spectral pattern of spectra. Peak spectra having peaks are detected in the spectral pattern. A harmonic spectrum is determined, in the peak spectra, having a harmonic structure showing a relationship between a fundamental pitch and a harmonic overtone.
    Type: Application
    Filed: November 28, 2011
    Publication date: May 31, 2012
    Applicant: JVC KENWOOD Corporation, a corporation of Japan
    Inventor: Takaaki YAMABE
  • Publication number: 20120116756
    Abstract: In a spoken language processing method for tone/intonation recognition, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more tonal characteristics corresponding to the input window of sound can be determined by mapping the cumulative gist vector to one or more tonal characteristics using a machine learning algorithm.
    Type: Application
    Filed: November 10, 2010
    Publication date: May 10, 2012
    Applicant: Sony Computer Entertainment Inc.
    Inventor: Ozlem Kalinli
  • Publication number: 20120109645
    Abstract: There is provided a unique signal processing technique for localizing and characterizing each of a number of differently located acoustic sources. Specifically there is provided a method for auditory segregation of multiple voice inputs comprising the steps of: receiving a plurality of voice input signals from different source locations; filtering said voice input signals with head related transfer functions (HRTF) using a digital signal processor (DSP) thereby assigning the voice input signals to different locations in virtual auditory space; and changing the HRTF filtered voice input signals in two dimensions, wherein pitch is changed and the signal is filtered with different filters emulating vocal tracts of different sizes thereby further segregating the voice input signals from each other.
    Type: Application
    Filed: June 23, 2010
    Publication date: May 3, 2012
    Applicant: LIZARD TECHNOLOGY
    Inventors: John Hallam, Jakob Christensen-Dalsgaard
  • Publication number: 20120101815
    Abstract: Described is a technology by which a user hums, sings or otherwise plays a user-provided rendition of a ringtone (or ringback tone) through a mobile telephone to a ringtone search service (e.g., a WAP, interactive voice response or SMS-based search platform). The service matches features of the user's rendition against features of actual ringtones to determine one or more matching candidate ringtones for downloading. Features may include pitch contours (up or down), pitch intervals and durations of notes. Matching candidates may be ranked based on the determined similarity, possibly in conjunction with weighting criteria such as the popularity of the ringtone and/or the importance of the matched part. The candidate set may be augmented with other ringtones independent of the matching, such as the most popular ones downloaded by other users, ringtones from similar artists, and so forth.
    Type: Application
    Filed: December 29, 2011
    Publication date: April 26, 2012
    Applicant: Microsoft Corporation
    Inventors: Lie LU, Yutao XIE, Sing XIE, Jiafan OU, Ruihao WENG
  • Publication number: 20120101814
    Abstract: Various techniques are disclosed for improving packet loss concealment to reduce artifacts by using audio character measures of the audio signal. These techniques include attenuation to a noise fill instead of attenuation to silence, varying how long to wait before attenuating the extrapolation, varying the rate of attenuation of the extrapolation, attenuating periodic extrapolation at a different rate than non-periodic extrapolation, and performing periodic extrapolation on successively longer fill data based on the audio character measures, adjusting weighting between periodic and non-periodic extrapolation based on the audio character measures, and adjusting weighting between periodic extrapolation and non-periodic extrapolation non-linearly.
    Type: Application
    Filed: October 25, 2010
    Publication date: April 26, 2012
    Applicant: POLYCOM, INC.
    Inventor: Eric David Elias
  • Publication number: 20120089391
    Abstract: Methods for estimating speech model parameters are disclosed. For pulsed parameter estimation, a speech signal is divided into multiple frequency bands or channels using bandpass filters. Channel processing reduces sensitivity to pole magnitudes and frequencies and reduces impulse response time duration to improve pulse location and strength estimation performance. These methods are useful for high quality speech coding and reproduction at various bit rates for applications such as satellite and cellular voice communication.
    Type: Application
    Filed: October 7, 2011
    Publication date: April 12, 2012
    Applicant: Digital Voice Systems, Inc.
    Inventor: Daniel W. Griffin
  • Publication number: 20120072209
    Abstract: An electronic device for estimating a pitch lag is described. The electronic device includes a processor and executable instructions stored in memory that is in electronic communication with the processor. The electronic device obtains a current frame. The electronic device also obtains a residual signal based on the current frame. The electronic device additionally determines a set of peak locations based on the residual signal. Furthermore, the electronic device obtains a set of pitch lag candidates based on the set of peak locations. The electronic device also estimates a pitch lag based on the set of pitch lag candidates.
    Type: Application
    Filed: September 8, 2011
    Publication date: March 22, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Venkatesh Krishnan, Stephane Pierre Villette
  • Publication number: 20120072208
    Abstract: An electronic device for determining a set of pitch cycle energy parameters is described. The electronic device includes a processor and executable instructions stored in memory. The electronic device obtains a frame, a set of filter coefficients and a residual signal based on the frame and the set of filter coefficients. The electronic device determines a set of peak locations based on the residual signal and segments the residual signal such that each segment includes one peak. The electronic device determines a first set of pitch cycle energy parameters based on a frame region between two consecutive peak locations and maps regions between peaks in the residual signal to regions between peaks in a synthesized excitation signal to produce a mapping. The electronic device determines a second set of pitch cycle energy parameters based on the first set of pitch cycle energy parameters and the mapping.
    Type: Application
    Filed: September 8, 2011
    Publication date: March 22, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Venkatesh Krishnan, Stephane Pierre Villette
  • Publication number: 20120058747
    Abstract: A method for communication and for displaying an interactive avatar or hologram corresponding to a remote party.
    Type: Application
    Filed: September 8, 2010
    Publication date: March 8, 2012
    Inventors: James Yiannios, Mourad Ben Ayed
  • Publication number: 20120053933
    Abstract: According to one embodiment, a first storage unit stores n band noise signals obtained by applying n band-pass filters to a noise signal. A second storage unit stores n band pulse signals. A parameter input unit inputs a fundamental frequency, n band noise intensities, and a spectrum parameter. An extraction unit extracts for each pitch mark the n band noise signals while shifting. An amplitude control unit changes amplitudes of the extracted band noise signals and band pulse signals in accordance with the band noise intensities. A generation unit generates a mixed sound source signal by adding the n band noise signals and the n band pulse signals. A generation unit generates the mixed sound source signal generated based on the pitch mark. A vocal tract filter unit generates a speech waveform by applying a vocal tract filter using the spectrum parameter to the generated mixed sound source signal.
    Type: Application
    Filed: March 18, 2011
    Publication date: March 1, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima
  • Publication number: 20120022859
    Abstract: An automatic marking method for Karaoke vocal accompaniment is provided. In the method, pitch, beat position and volume of a singer are compared with the original pitch, beat position and volume of the theme of a song to generate a score of pitch, a score of beat and a score of emotion respectively, so as to obtain a weighted total score in a weighted marking method. By using the method, the pitch, beat position and volume error of each section of the song sung by the singer can be exactly worked out, and a pitch curve and a volume curve can be displayed, so that the singer can learn which part is sung incorrectly and which part needs to be enhanced. The present invention also has the advantages of dual effects of teaching and entertainment, high practicability and technical advancement.
    Type: Application
    Filed: April 7, 2009
    Publication date: January 26, 2012
    Inventor: Wen-Hsin Lin
  • Publication number: 20120004908
    Abstract: A voice recognition terminal executes a local voice recognition process and utilizes an external center voice recognition process. The terminal includes: a voice message synthesizing element for synthesizing at least one of a voice message to be output from a speaker according to the external center voice recognition process and a voice message to be output from the speaker according to the local voice recognition process so as to distinguish between characteristics of the voice message to be output from the speaker according to the external center voice recognition process and characteristics of the voice message to be output from the speaker according to the local voice recognition process; and a voice output element for outputting a synthesized voice message from the speaker.
    Type: Application
    Filed: June 28, 2011
    Publication date: January 5, 2012
    Applicant: DENSO CORPORATION
    Inventors: Kunio YOKOI, Kazuhisa SUZUKI, Masayuki TAKAMI, Naoyori TANZAWA
  • Patent number: 8063809
    Abstract: A transient signal encoding method and device, decoding method and device, and processing system, where the transient signal encoding method includes: obtaining a reference sub-frame where a maximal time envelope having a maximal amplitude value is located from time envelopes of all sub-frames of an input transient signal; adjusting an amplitude value of the time envelope of each sub-frame before the reference sub-frame in such a way that a first difference is greater than a preset first threshold, in which the first difference is a difference between the amplitude value of the time envelope of each sub-frame before the reference sub-frame and the amplitude value of the maximal time envelope; and writing the adjusted time envelope into a bitstream.
    Type: Grant
    Filed: June 29, 2011
    Date of Patent: November 22, 2011
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zexin Liu, Longyin Chen, Lei Miao, Chen Hu, Wei Xiao, Herve Marcel Taddei, Qing Zhang
  • Publication number: 20110282658
    Abstract: The present invention relates to co-channel audio source separation. In one embodiment a first frequency-related representation of plural regions of the acoustic signal is prepared over time, and a two-dimensional transform of plural two-dimensional localized regions of the first frequency-related representation, each less than an entire frequency range of the first frequency-related representation, is obtained to provide a two-dimensional compressed frequency-related representation with respect to each two-dimensional localized region. For each of the plural regions, at least one pitch is identified. The pitch from the plural regions is processed to provide multiple pitch estimates over time. In another embodiment, a mixed acoustic signal is processed by localizing multiple time-frequency regions of a spectrogram of the mixed acoustic signal to obtain one or more acoustic properties.
    Type: Application
    Filed: September 3, 2010
    Publication date: November 17, 2011
    Applicant: Massachusetts Institute of Technology
    Inventors: Tianyu Wang, Thomas R. Quatieri, JR.
  • Publication number: 20110276324
    Abstract: An enhancement system extracts pitch from a processed speech signal. The system estimates the pitch of voiced speech by deriving filter coefficients of an adaptive filter and using the obtained filter coefficients to derive pitch. The pitch estimation may be enhanced by using various techniques to condition the input speech signal, such as spectral modification of the background noise and the speech signal, and/or reduction of the tonal noise from the speech signal.
    Type: Application
    Filed: May 11, 2011
    Publication date: November 10, 2011
    Inventors: Rajeev Nongpiur, Phillip A. Hetherington
  • Publication number: 20110276323
    Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase match one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase match one or more voice features of the speaker of the reference passphrase.
    Type: Application
    Filed: May 6, 2010
    Publication date: November 10, 2011
    Applicant: Senam Consulting, Inc.
    Inventor: Serge Olegovich Seyfetdinov
  • Publication number: 20110257965
    Abstract: Encoding a sequence of digital speech samples into a bit stream includes dividing the digital speech samples into one or more frames and computing a set of model parameters for the frames. The set of model parameters includes at least a first parameter conveying pitch information. The voicing state of a frame is determined and the first parameter conveying pitch information is modified to designate the determined voicing state of the frame, if the determined voicing state of the frame is equal to one of a set of reserved voicing states. The model parameters are quantized to generate quantizer bits which are used to produce the bit stream.
    Type: Application
    Filed: June 27, 2011
    Publication date: October 20, 2011
    Applicant: DIGITAL VOICE SYSTEMS, INC.
    Inventor: John C. Hardwick
  • Publication number: 20110251840
    Abstract: Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at a portable computing device (such as a mobile phone, personal digital assistant, laptop computer, notebook computer, pad-type computer or netbook) in accord with pitch correction settings. In some cases, pitch correction settings include a score-coded melody and/or harmonies supplied with, or for association with, the lyrics and backing tracks.
    Type: Application
    Filed: April 12, 2011
    Publication date: October 13, 2011
    Inventors: Perry R. Cook, Ari Lazier, Tom Lieber, Turner E. Kirk
  • Publication number: 20110251841
    Abstract: Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.
    Type: Application
    Filed: April 12, 2011
    Publication date: October 13, 2011
    Inventors: Perry R. Cook, Ari Lazier, Tom Lieber, Turner E. Kirk
  • Publication number: 20110251842
    Abstract: Using signal processing techniques described herein, pitch detection and correction of a user's vocal performance can be performed continuously and in real-time with respect to the audible rendering of the backing track at the handheld or portable computing device. In some implementations, pitch detection builds on time-domain pitch correction techniques that employ average magnitude difference function (AMDF) or autocorrelation-based techniques together with zero-crossing and/or peak picking techniques to identify differences between pitch of a captured vocal signal and score-coded target pitches. Based on detected differences, pitch correction based on pitch synchronous overlapped add (PSOLA) and/or linear predictive coding (LPC) techniques allow captured vocals to be pitch shifted in real-time to “correct” notes in accord with pitch correction settings that code score-coded melody targets and harmonies.
    Type: Application
    Filed: April 12, 2011
    Publication date: October 13, 2011
    Inventors: Perry R. Cook, Ari Lazier, Tom Lieber
  • Publication number: 20110246188
    Abstract: A music sound generation system is formed with a high sound quality and with a small size using a large-capacity NAND flash memory for storing music sound data. Music sound data is divided into N pitch groups and stored into N different storage modules as being divided in these storage modules. A sound generation command classification unit (3000) classifies sound generation commands provided from an external unit into N sound generation command groups. A read command unit in each access module reads data from a storage module based on the sound generation command group. This structure enables music sound data to be read from a plurality of storage modules in parallel.
    Type: Application
    Filed: May 26, 2010
    Publication date: October 6, 2011
    Inventor: Masahiro Nakanishi
  • Publication number: 20110224977
    Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.
    Type: Application
    Filed: September 14, 2010
    Publication date: September 15, 2011
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Kazuhiro NAKADAI, Takuma OTSUKA, Hiroshi OKUNO
  • Publication number: 20110218800
    Abstract: The present invention relates to a method and apparatus for obtaining a pitch gain, and a coder and a decoder. The method includes: obtaining information about an input signal; and obtaining a pitch gain corresponding to the information about the input signal according to the correspondence between the signal information and the pitch gain. The embodiments of the present invention obtain the corresponding pitch gain according to the signal information by using the obtained correspondence between the signal information and the pitch gain, and the pitch gain is applicable to the coder and the decoder, thus making it unnecessary for the coder to transmit the pitch gain to the decoder and solving the problem of bit overhead. The embodiments of the present invention determine the pitch gain adaptively according to the signal information, avoid consumption of extra bits for quantizing the pitch gain, avoid impact on the coding performance, and improve the compression ratio.
    Type: Application
    Filed: May 17, 2011
    Publication date: September 8, 2011
    Applicant: Huawei Technologies Co., Ltd.
    Inventors: Dejun Zhang, Lei Miao, Jianfeng Xu, Fengyan Qi, Qing Zhang, Lixiong Li, Fuwei Ma
  • Publication number: 20110196674
    Abstract: A spectrum coding apparatus capable of performing coding at a low bit rate and with high quality is disclosed. This apparatus is provided with a section that performs the frequency transformation of a first signal and calculates a first spectrum, a section that converts the frequency of a second signal and calculates a second spectrum, a section that estimates the shape of the second spectrum in a band of FL≤k<FH using a filter having the first spectrum in a band of 0≤k<FL as an internal state and a section that codes an outline of the second spectrum determined based on a coefficient indicating the characteristic of the filter at this time.
    Type: Application
    Filed: April 17, 2011
    Publication date: August 11, 2011
    Applicant: PANASONIC CORPORATION
    Inventor: Masahiro Oshikiri
  • Publication number: 20110196673
    Abstract: An electronic device for reconstructing a lost packet in a Sub-Band Coding (SBC) decoder is described. The electronic device includes a processor and instructions stored in memory. The electronic device detects a lost packet, obtains a zero-input response of a synthesis filter bank and obtains a coarse pitch estimate. The electronic device also obtains a fine pitch estimate based on the zero-input response and the coarse pitch estimate. The electronic device selects a last pitch period based on the fine pitch estimate and uses samples from the last pitch period for the lost packet.
    Type: Application
    Filed: January 26, 2011
    Publication date: August 11, 2011
    Applicant: QUALCOMM Incorporated
    Inventors: Amit Sharma, Jeremy P. Toman, Hyun Jin Park, Sang-Uk Ryu
  • Publication number: 20110191102
    Abstract: In some embodiments, a processor-readable medium stores code representing instructions to cause a processor to receive an input signal having a first component and a second component. An estimate of the first component of the input signal is calculated based on an estimate of a pitch of the first component of the input signal. An estimate of the input signal is calculated based on the estimate of the first component of the input signal and an estimate of the second component of the input signal. The estimate of the first component of the input signal is modified based on a scaling function to produce a reconstructed first component of the input signal. The scaling function is a function of at least one of the input signal, the estimate of the first component of the input signal, the estimate of the second component of the input signal, or a residual signal.
    Type: Application
    Filed: January 31, 2011
    Publication date: August 4, 2011
    Applicant: UNIVERSITY OF MARYLAND, COLLEGE PARK
    Inventors: Carol Espy-Wilson, Srikanth Vishnubhotla
  • Publication number: 20110184732
    Abstract: A system and method for using bi-directional conversation data to improve signal presence detection are disclosed. The detector module is adapted to communicate with a signal enhancement module. The detector module collects data from a transmit direction of the connection and a receive direction of a data connection. The collected data from the transmit and the receive direction is used to classify at least one of data in the transmit direction and data in the receive direction. Responsive to the classification, the signal enhancement module enhances data in one of the transmit direction and the receive direction. Hence, data classification accuracy is improved by using data from both the transmit and receive directions. In one embodiment, the detector module applies a voice activity detection module (VAD) process to detect the presence or absence of voice data in the collected data.
    Type: Application
    Filed: April 4, 2011
    Publication date: July 28, 2011
    Applicant: DITECH NETWORKS, INC.
    Inventor: Mahesh Godavarti
  • Publication number: 20110153317
    Abstract: An apparatus for wireless communications includes a processing system. The processing system is configured to receive an input sound stream of a user, split the input sound stream into a plurality of frames, classify each of the frames as one selected from the group consisting of a non-speech frame and a speech frame, determine a pitch of each of the frames in a subset of the speech frames, and identify a gender of the user from the determined pitch. To determine the pitch, the processing system is configured to filter the speech frames to compute an error signal, compute an autocorrelation of the error signal, find a maximum autocorrelation value, and set the pitch to an index of the maximum autocorrelation value.
    Type: Application
    Filed: December 23, 2009
    Publication date: June 23, 2011
    Applicant: QUALCOMM INCORPORATED
    Inventors: Yinian Mao, Gene Marsh
  • Publication number: 20110144982
    Abstract: Vocal musical performances may be captured and continuously pitch-corrected at a mobile device for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. In some cases, such pitch correction settings code a particular key or scale for the vocal performance or for portions thereof. In some cases, pitch correction settings include a score-coded melody sequence of note targets supplied with, or for association with, the lyrics and/or backing track. In some cases, pitch correction settings are dynamically variable based on gestures captured at a user interface.
    Type: Application
    Filed: September 4, 2010
    Publication date: June 16, 2011
    Inventors: Spencer Salazar, Rebecca A. Fiebrink, Ge Wang, Mattias Ljungström, Jeffrey C. Smith, Perry R. Cook
  • Publication number: 20110144981
    Abstract: Techniques have been developed to facilitate (1) the capture and pitch correction of vocal performances on handheld or other portable computing devices and (2) the mixing of such pitch-corrected vocal performances with backing tracks for audible rendering on targets that include such portable computing devices as well as desktops, workstations, gaming stations, even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.
    Type: Application
    Filed: September 4, 2010
    Publication date: June 16, 2011
    Inventors: Spencer Salazar, Rebecca A. Fiebrink, Ge Wang, Mattias Ljungström, Jeffrey C. Smith, Perry R. Cook
  • Publication number: 20110125491
    Abstract: The perceived quality of a speech signal is improved by estimating the average power of first and second signal components and applying a first gain factor to the second signal components to generate adjusted second signal components. The first gain factor is selected such that on application of the first gain factor to the second signal components, the ratio of the average power of the first signal components to the average power of the adjusted second signal components would be a first predetermined value, the first predetermined value being such as to inhibit perceptual distortion of the improved speech signal.
    Type: Application
    Filed: November 23, 2009
    Publication date: May 26, 2011
    Inventors: Rogerio Guedes Alves, Kuan-Chieh Yen, Michael Christopher Vartanian, Sameer Arun Gadre
  • Publication number: 20110125492
    Abstract: The perceived quality of a narrowband speech signal truncated from a wideband speech signal is improved by generating in a third frequency band third speech components matching first speech components in a first frequency band of the narrowband signal, and generating in a fourth frequency band fourth speech components matching second speech components in a second frequency band of the narrowband signal. A first gain factor is applied to the third speech components to generate adjusted third speech components, and a second gain factor is applied to the fourth speech components to generate adjusted fourth speech components, the gain factors being selected such that the ratios of the average powers of the adjusted third and fourth speech components to the average power of the first speech components are predetermined values.
    Type: Application
    Filed: November 23, 2009
    Publication date: May 26, 2011
    Applicant: CAMBRIDGE SILICON RADIO LIMITED
    Inventors: Rogerio Guedes Alves, Kuan-Chieh Yen, Michael Christopher Vartanian, Sameer Arun Gadre
  • Publication number: 20110125493
    Abstract: The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.
    Type: Application
    Filed: January 31, 2011
    Publication date: May 26, 2011
    Inventors: Yoshifumi Hirose, Takahiro Kamai
  • Publication number: 20110099005
    Abstract: A framing method and apparatus are disclosed to overcome inconsistency of gains between sub-frames caused by simple average framing in the prior art. The method includes: obtaining the Linear Prediction Coding (LPC) order and the pitch of the signal; removing the samples inapplicable to Long-Term Prediction (LTP) synthesis according to the LPC prediction order and the pitch; and splitting the remaining samples of the signal into several sub-frames. The technical solution under the present invention is applicable to the multimedia speech coding field.
    Type: Application
    Filed: December 30, 2010
    Publication date: April 28, 2011
    Inventors: Dejun ZHANG, Fengyan Qi, Lei Miao, Jianfeng Xu, Qing Zhang, Lixiong Li, Fuwei Ma
  • Publication number: 20110087489
    Abstract: The invention concerns a method and apparatus for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder that does not have a built-in or standard FEC process. A receiver with a decoder receives encoded frames of compressed speech information transmitted from an encoder. A lost frame detector at the receiver determines if an encoded frame has been lost or corrupted in transmission, or erased. If the encoded frame is not erased, the encoded frame is decoded by a decoder and a temporary memory is updated with the decoder's output. A predetermined delay period is applied and the audio frame is then output. If the lost frame detector determines that the encoded frame is erased, a FEC module applies a frame concealment process to the signal. The FEC processing produces natural sounding synthetic speech for the erased frames.
    Type: Application
    Filed: December 21, 2010
    Publication date: April 14, 2011
    Inventor: David A. Kapilow
  • Publication number: 20110087488
    Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
    Type: Application
    Filed: December 16, 2010
    Publication date: April 14, 2011
    Inventors: Ryo Morinaka, Takehiko Kagoshima
  • Publication number: 20110066426
    Abstract: A speech recognition apparatus and method for real-time speaker adaptation are provided. The speech recognition apparatus may estimate a pitch of a speech section from an inputted speech signal, extract a speech feature for speech recognition based on the estimated pitch, and perform speech recognition with respect to the speech signal based on the speech feature. The speech recognition apparatus may be adaptively normalized depending on a speaker. Thus, the speech recognition apparatus may extract a speech feature for speech recognition, and may improve a performance of speech recognition based on the extracted speech feature.
    Type: Application
    Filed: July 15, 2010
    Publication date: March 17, 2011
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Gil Ho LEE
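
Several abstracts in this class rely on autocorrelation-based pitch determination. The abstract of publication 20110153317 above, for instance, lists four steps: filter the frame to compute an error signal, compute the autocorrelation of that error signal, find the maximum autocorrelation value, and set the pitch to the index of that maximum. The following is a minimal sketch of those steps and not the method claimed in any of the listed applications. The function names, the fixed first-order prediction filter with coefficient 0.9, and the 60 to 400 Hz search range are all assumptions made for illustration; none of the abstracts specifies the filter or the lag range.

```python
import math


def prediction_error(frame, a=0.9):
    """Error signal e[n] = x[n] - a * x[n-1].

    Hypothetical stand-in for the unspecified filter in the abstract;
    the coefficient 0.9 is an assumption.
    """
    if not frame:
        return []
    return [frame[0]] + [frame[n] - a * frame[n - 1] for n in range(1, len(frame))]


def autocorrelation(signal, lag):
    """Unnormalized autocorrelation of `signal` at the given lag."""
    return sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))


def estimate_pitch_lag(frame, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation value."""
    error = prediction_error(frame)
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(error, lag))


def estimate_pitch_hz(frame, sample_rate, min_hz=60.0, max_hz=400.0):
    """Convert the best lag to a frequency; the search range is an assumption."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    lag = estimate_pitch_lag(frame, min_lag, max_lag)
    return sample_rate / lag


if __name__ == "__main__":
    # Synthetic test tone: 50 ms of a 200 Hz sine sampled at 8 kHz.
    rate = 8000
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
    print(estimate_pitch_hz(tone, rate))
```

For the synthetic tone in the usage example, the autocorrelation peak falls at a lag of 40 samples, which corresponds to 200 Hz at an 8 kHz sampling rate. Restricting the search to a plausible lag range keeps the trivial maximum at lag 0 out of consideration; real speech would normally also call for a voicing decision, which this sketch omits.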