Pitch Determination Of Speech Signals (epo) Patents (Class 704/E11.006)
-
Publication number: 20130325455
Abstract: Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency domain are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.
Type: Application
Filed: June 4, 2012
Publication date: December 5, 2013
Applicants: INTERNATIONAL BUSINESS MACHINES CORPORATION, UZDAROJI AKCINÊ BENDROVÊ LIETUVOS TYRIMU CENTRAS
Inventors: Aharon Satt, Zvi Kons, Ron Hoory
-
Publication number: 20130231924
Abstract: Implementations of systems, methods and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
Type: Application
Filed: August 20, 2012
Publication date: September 5, 2013
Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S.H. Chu, Shawn E. Stevenson
-
Publication number: 20130136276
Abstract: A method and apparatus for receiving and playing a signal in a radio receiver to suppress microphonic feedback are provided by alternately pitch shifting a received audio signal. The pitch of the received audio signal is alternately shifted up and then down, repeatedly over successive intervals of the audio signal, to produce a pitch swing signal which is then played over a speaker. The alternating pitch shifting prevents the buildup of regenerative feedback normally caused by acoustic vibrations coupling into the radio receiver.
Type: Application
Filed: November 29, 2011
Publication date: May 30, 2013
Applicant: MOTOROLA SOLUTIONS, INC.
Inventors: V. C. PRAKASH VK CHACKO, THEAN HAI OOI, KAR BOON OUNG, CHEAH HENG TAN, HUOY THYNG YOW
-
Publication number: 20130117014
Abstract: Disclosed are various embodiments of multiple microphone based pitch detection. In one embodiment, a method includes obtaining a primary signal and a secondary signal associated with multiple microphones. A pitch value is determined based at least in part upon a level difference between the primary and secondary signals. In another embodiment, a system includes a plurality of microphones configured to provide a primary signal and a secondary signal. A level difference detector is configured to determine a level difference between the primary and secondary signals and a pitch identifier is configured to clip the primary and secondary signals based at least in part upon the level difference. In another embodiment, a method determines the presence of voice activity based upon a pitch prediction gain variation that is determined based at least in part upon a pitch lag.
Type: Application
Filed: November 7, 2011
Publication date: May 9, 2013
Applicant: BROADCOM CORPORATION
Inventors: Xianxian Zhang, Alfonsus Lunardhi
-
Publication number: 20130046533
Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a region of the signal representing speech. The region can comprise a portion of a frame of the signal representing speech classified as a voiced frame. The region can be marked based on one or more pitch estimates for the region. A cord can be identified within the region based on occurrence of one or more events within the region of the signal. For example, the one or more events can comprise one or more glottal pulses. In such cases, the cord can begin with the onset of a first glottal pulse and extend to a point prior to the onset of a second glottal pulse. The cord may exclude a portion of the region of the signal prior to the onset of the second glottal pulse.
Type: Application
Filed: October 19, 2012
Publication date: February 21, 2013
Applicant: RED SHIFT COMPANY, LLC
Inventor: RED SHIFT COMPANY, LLC
-
Publication number: 20130041657
Abstract: A system and method may be configured to analyze audio information derived from an audio signal. The system and method may track sound pitch across the audio signal. The tracking of pitch across the audio signal may take into account change in pitch by determining at individual time sample windows in the signal duration an estimated pitch and a representation of harmonic envelope at the estimated pitch. The estimated pitch and the representation of harmonic envelope may then be implemented to determine an estimated pitch for another time sample window in the signal duration with an enhanced accuracy and/or precision.
Type: Application
Filed: August 8, 2011
Publication date: February 14, 2013
Applicant: The Intellisis Corporation
Inventors: David C. BRADLEY, Rodney Gateau, Daniel S. Goldin, Robert N. Hilton, Nicholas K. Fisher
-
Publication number: 20130041656
Abstract: A system and method may be configured to analyze audio information derived from an audio signal. The system and method may track sound pitch across the audio signal. The tracking of pitch across the audio signal may take into account change in pitch by determining at individual time sample windows in the signal duration an estimated pitch and an estimated fractional chirp rate of the harmonics at the estimated pitch. The estimated pitch and the estimated fractional chirp rate may then be implemented to determine an estimated pitch for another time sample window in the signal duration with an enhanced accuracy and/or precision.
Type: Application
Filed: August 8, 2011
Publication date: February 14, 2013
Applicant: The Intellisis Corporation
Inventors: David C. BRADLEY, Daniel S. GOLDIN, Rodney GATEAU, Nicholas K. FISHER, Robert N. HILTON, Derrick R. ROOS, Eric WIEWIORA
-
Publication number: 20130024192
Abstract: Disclosed is an information display system provided with: a signal analyzing unit which analyzes the audio signals obtained from a predetermined location and which generates ambient sound information regarding the sound generated at the predetermined location; and an ambient expression selection unit which selects an ambient expression which expresses the content of what a person is feeling from the sound generated at the predetermined location on the basis of the ambient sound information.
Type: Application
Filed: March 28, 2011
Publication date: January 24, 2013
Applicant: NEC CORPORATION
Inventors: Toshiyuki Nomura, Yuzo Senda, Kyota Higa, Takayuki Arakawa, Yasuyuki Mitsui
-
Publication number: 20120239389
Abstract: Disclosed is an audio signal processing method comprising the steps of: receiving an audio signal containing current frame data; generating a first temporary output signal for the current frame when an error occurs in the current frame data, by carrying out frame error concealment with respect to the current frame data using a random codebook; generating a parameter by carrying out one or more of short-term prediction, long-term prediction and a fixed codebook search based on the first temporary output signal; and updating a memory with the parameter for the next frame; wherein the parameter comprises one or more of pitch gain, pitch delay, fixed codebook gain and a fixed codebook.
Type: Application
Filed: November 24, 2010
Publication date: September 20, 2012
Applicant: LG ELECTRONICS INC.
Inventors: Hye Jeong Jeon, Dae Hwan Kim, Hong Goo Kang, Min Ki Lee, Byung Suk Lee, Gyu Hyeok Jeong
-
Publication number: 20120209598
Abstract: A state detecting device includes an input unit that receives an input voice sound; an analyzer that calculates a feature parameter of each of a plurality of frames extracted from the voice sound; a calculator that calculates the average of the feature parameters of the frames, determines a threshold on the basis of the average and statistical data representing relationships between other averages of other feature parameters obtained from a plurality of speakers and cumulative frequencies of the other feature parameters, and calculates an appearance frequency of a frame that is among the plurality of frames and whose feature parameter is larger than the threshold; a determining unit that determines, on the basis of the appearance frequency, a strained state of a vocal cord that has made the voice sound; and an output unit that outputs a result of the determination.
Type: Application
Filed: January 23, 2012
Publication date: August 16, 2012
Applicant: FUJITSU LIMITED
Inventors: Shoji HAYAKAWA, Naoshi MATSUO
-
Publication number: 20120185244
Abstract: According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform.
Type: Application
Filed: January 26, 2012
Publication date: July 19, 2012
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Masahiro Morita, Javier Latorre, Takehiko Kagoshima
-
Publication number: 20120166187
Abstract: A system and method for an audio synthesizer utilizing frequency aperture cells (FAC) and frequency aperture arrays (FAA). In accordance with an embodiment, an audio processing system can be provided for the transformation of audio-band frequencies for musical and other purposes. In accordance with an embodiment, a single stream of mono, stereo, or multi-channel monophonic audio can be transformed into polyphonic music, based on a desired target musical note or set of multiple notes. At its core, the system utilizes one or more input waveforms (which can be either file-based or streamed) which are then fed into an array of filters, which are themselves optionally modulated, to generate a new synthesized audio output.
Type: Application
Filed: August 26, 2011
Publication date: June 28, 2012
Applicant: SONIC NETWORK, INC.
Inventors: James Edwin Van Buskirk, Jennifer Hruska, Jason Jordan, Al Joelson, Borislav Zlatkov
-
Publication number: 20120143601
Abstract: The invention relates to a method for determining a quality indicator representing a perceived quality of an output signal of an audio system with respect to a reference signal. The reference signal and the output signal are processed and compared. The processing includes dividing the reference signal and the output signal into mutually corresponding time frames. Additionally, the processing includes scaling the intensity of the reference signal towards a fixed intensity level, and then performing measurements on time frames within the scaled reference signal for determining reference signal time frame characteristics. The intensity of the reference signal is then scaled from the fixed intensity level towards an intensity level related to the output signal. Further on in the method, the loudness of the output signal is scaled towards a fixed loudness level in the perceptual loudness domain. This scaling action uses the reference signal time frame characteristics.
Type: Application
Filed: August 9, 2010
Publication date: June 7, 2012
Applicants: Nederlandse Organisatie Voor Toegepast-Natuurwetenschappelijk Onderzoek TNO, KONINKLIJKE KPN N.V.
Inventors: John Gerard Beerends, Jeroen Van Vugt
-
Publication number: 20120136655
Abstract: A signal portion is extracted per frame having a specific duration from an input signal, thus generating a per-frame input signal. The per-frame input signal in the time domain is converted into a per-frame input signal in the frequency domain, thereby generating a spectral pattern of spectra. Peak spectra having peaks are detected in the spectral pattern. A harmonic spectrum is determined, in the peak spectra, having a harmonic structure showing a relationship between a fundamental pitch and a harmonic overtone.
Type: Application
Filed: November 28, 2011
Publication date: May 31, 2012
Applicant: JVC KENWOOD Corporation a corporation of Japan
Inventor: Takaaki YAMABE
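The peak-detection and harmonic-structure steps in the abstract above can be illustrated with a minimal Python sketch. This is not the patented method: the function names, the magnitude floor, and the one-bin tolerance are all assumptions made for illustration, and the frequency-domain conversion is assumed to have already produced a list of spectral magnitudes.

```python
def detect_peaks(magnitudes, floor):
    """Return indices of local maxima above `floor` (an assumed threshold)."""
    return [
        k
        for k in range(1, len(magnitudes) - 1)
        if magnitudes[k] > floor
        and magnitudes[k] > magnitudes[k - 1]
        and magnitudes[k] >= magnitudes[k + 1]
    ]


def harmonic_peaks(peak_bins, fundamental_bin, tolerance_bins=1):
    """Keep peaks that lie near an integer multiple of the fundamental bin.

    Returns (bin, harmonic_number) pairs. The tolerance is hypothetical.
    """
    harmonics = []
    for k in peak_bins:
        multiple = round(k / fundamental_bin)
        if multiple >= 1 and abs(k - multiple * fundamental_bin) <= tolerance_bins:
            harmonics.append((k, multiple))
    return harmonics


# Usage: a toy spectrum with peaks at bins 10, 20, 27 and 30.
spectrum = [0.0] * 64
for bin_index, level in ((10, 1.0), (20, 0.8), (27, 0.6), (30, 0.5)):
    spectrum[bin_index] = level
peaks = detect_peaks(spectrum, floor=0.1)
print(harmonic_peaks(peaks, fundamental_bin=10))
```

In this toy case the peak at bin 27 is rejected because it is three bins away from the nearest multiple of the assumed fundamental.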
-
Publication number: 20120116756
Abstract: In a spoken language processing method for tone/intonation recognition, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more tonal characteristics corresponding to the input window of sound can be determined by mapping the cumulative gist vector to one or more tonal characteristics using a machine learning algorithm.
Type: Application
Filed: November 10, 2010
Publication date: May 10, 2012
Applicant: Sony Computer Entertainment Inc.
Inventor: Ozlem Kalinli
-
Publication number: 20120109645
Abstract: There is provided a unique signal processing technique for localizing and characterizing each of a number of differently located acoustic sources. Specifically there is provided a method for auditory segregation of multiple voice inputs comprising the steps of: receiving a plurality of voice input signals from different source locations; filtering said voice input signals with head related transfer functions (HRTF) using a digital signal processor (DSP) thereby assigning the voice input signals to different locations in virtual auditory space; and changing the HRTF filtered voice input signals in two dimensions, wherein pitch is changed and the signal is filtered with different filters emulating vocal tracts of different sizes thereby further segregating the voice input signals from each other.
Type: Application
Filed: June 23, 2010
Publication date: May 3, 2012
Applicant: LIZARD TECHNOLOGY
Inventors: John Hallam, Jakob Christensen-Dalsgaard
-
Publication number: 20120101815
Abstract: Described is a technology by which a user hums, sings or otherwise plays a user-provided rendition of a ringtone (or ringback tone) through a mobile telephone to a ringtone search service (e.g., a WAP, interactive voice response or SMS-based search platform). The service matches features of the user's rendition against features of actual ringtones to determine one or more matching candidate ringtones for downloading. Features may include pitch contours (up or down), pitch intervals and durations of notes. Matching candidates may be ranked based on the determined similarity, possibly in conjunction with weighting criteria such as the popularity of the ringtone and/or the importance of the matched part. The candidate set may be augmented with other ringtones independent of the matching, such as the most popular ones downloaded by other users, ringtones from similar artists, and so forth.
Type: Application
Filed: December 29, 2011
Publication date: April 26, 2012
Applicant: Microsoft Corporation
Inventors: Lie LU, Yutao XIE, Sing XIE, Jiafan OU, Ruihao WENG
-
Publication number: 20120101814
Abstract: Various techniques are disclosed for improving packet loss concealment to reduce artifacts by using audio character measures of the audio signal. These techniques include attenuation to a noise fill instead of attenuation to silence, varying how long to wait before attenuating the extrapolation, varying the rate of attenuation of the extrapolation, attenuating periodic extrapolation at a different rate than non-periodic extrapolation, performing periodic extrapolation on successively longer fill data based on the audio character measures, adjusting weighting between periodic and non-periodic extrapolation based on the audio character measures, and adjusting weighting between periodic extrapolation and non-periodic extrapolation non-linearly.
Type: Application
Filed: October 25, 2010
Publication date: April 26, 2012
Applicant: POLYCOM, INC.
Inventor: Eric David Elias
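As a rough illustration of "attenuation to a noise fill instead of attenuation to silence", the Python sketch below holds the extrapolated samples for a while and then cross-fades them linearly into low-level noise. The function name, the uniform noise source, the linear fade, and every parameter are assumptions for illustration only; the abstract does not specify how the wait time or attenuation rate are derived from the audio character measures.

```python
import random


def fade_to_noise_fill(extrapolated, noise_level, hold, fade, seed=0):
    """Cross-fade extrapolated samples into a noise fill.

    hold: number of samples kept at full level before attenuation starts.
    fade: number of samples over which the weight falls from 1 to 0 (must be > 0).
    noise_level: peak amplitude of the (assumed uniform) noise fill.
    """
    rng = random.Random(seed)
    output = []
    for i, sample in enumerate(extrapolated):
        if i < hold:
            weight = 1.0
        else:
            weight = max(0.0, 1.0 - (i - hold) / fade)
        noise = rng.uniform(-noise_level, noise_level)
        # The extrapolation decays toward the noise fill, not toward zero.
        output.append(weight * sample + (1.0 - weight) * noise)
    return output


# Usage: ten concealed samples, held for 3 samples, then faded over 4.
concealed = fade_to_noise_fill([1.0] * 10, noise_level=0.1, hold=3, fade=4)
print(concealed)
```

Varying `hold` and `fade` per frame would correspond loosely to the "how long to wait" and "rate of attenuation" controls the abstract mentions.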
-
Publication number: 20120089391
Abstract: Methods for estimating speech model parameters are disclosed. For pulsed parameter estimation, a speech signal is divided into multiple frequency bands or channels using bandpass filters. Channel processing reduces sensitivity to pole magnitudes and frequencies and reduces impulse response time duration to improve pulse location and strength estimation performance. These methods are useful for high quality speech coding and reproduction at various bit rates for applications such as satellite and cellular voice communication.
Type: Application
Filed: October 7, 2011
Publication date: April 12, 2012
Applicant: Digital Voice Systems, Inc.
Inventor: Daniel W. Griffin
-
Publication number: 20120072209
Abstract: An electronic device for estimating a pitch lag is described. The electronic device includes a processor and executable instructions stored in memory that is in electronic communication with the processor. The electronic device obtains a current frame. The electronic device also obtains a residual signal based on the current frame. The electronic device additionally determines a set of peak locations based on the residual signal. Furthermore, the electronic device obtains a set of pitch lag candidates based on the set of peak locations. The electronic device also estimates a pitch lag based on the set of pitch lag candidates.
Type: Application
Filed: September 8, 2011
Publication date: March 22, 2012
Applicant: QUALCOMM Incorporated
Inventors: Venkatesh Krishnan, Stephane Pierre Villette
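The chain described above (residual signal, peak locations, pitch lag candidates, pitch lag estimate) can be sketched in Python as follows. The abstract does not say how peaks are selected or how candidates are combined, so the relative threshold, the use of distances between consecutive peaks as candidates, and the median as the final estimate are all assumptions chosen for illustration.

```python
from statistics import median


def find_peak_locations(residual, relative_threshold=0.5):
    """Indices of local maxima of |residual| above a fraction of the largest value."""
    magnitudes = [abs(x) for x in residual]
    if not magnitudes:
        return []
    floor = relative_threshold * max(magnitudes)
    return [
        i
        for i in range(1, len(magnitudes) - 1)
        if magnitudes[i] >= floor
        and magnitudes[i] > magnitudes[i - 1]
        and magnitudes[i] >= magnitudes[i + 1]
    ]


def pitch_lag_candidates(peak_locations):
    """Distances between consecutive peaks, used here as lag candidates."""
    return [b - a for a, b in zip(peak_locations, peak_locations[1:])]


def estimate_pitch_lag(residual):
    """Median of the candidates, or None when fewer than two peaks are found."""
    candidates = pitch_lag_candidates(find_peak_locations(residual))
    return median(candidates) if candidates else None


# Usage: a toy residual with pulses every 40 samples.
frame = [0.0] * 160
for position in (10, 50, 90, 130):
    frame[position] = 1.0
print(estimate_pitch_lag(frame))
```

The median is used here only because it tolerates an occasional spurious or missed peak; other combination rules would fit the abstract equally well.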
-
Publication number: 20120072208
Abstract: An electronic device for determining a set of pitch cycle energy parameters is described. The electronic device includes a processor and executable instructions stored in memory. The electronic device obtains a frame, a set of filter coefficients and a residual signal based on the frame and the set of filter coefficients. The electronic device determines a set of peak locations based on the residual signal and segments the residual signal such that each segment includes one peak. The electronic device determines a first set of pitch cycle energy parameters based on a frame region between two consecutive peak locations and maps regions between peaks in the residual signal to regions between peaks in a synthesized excitation signal to produce a mapping. The electronic device determines a second set of pitch cycle energy parameters based on the first set of pitch cycle energy parameters and the mapping.
Type: Application
Filed: September 8, 2011
Publication date: March 22, 2012
Applicant: QUALCOMM Incorporated
Inventors: Venkatesh Krishnan, Stephane Pierre Villette
-
Publication number: 20120058747
Abstract: A method for communication and for displaying an interactive avatar or hologram corresponding to a remote party.
Type: Application
Filed: September 8, 2010
Publication date: March 8, 2012
Inventors: James Yiannios, Mourad Ben Ayed
-
Publication number: 20120053933
Abstract: According to one embodiment, a first storage unit stores n band noise signals obtained by applying n band-pass filters to a noise signal. A second storage unit stores n band pulse signals. A parameter input unit inputs a fundamental frequency, n band noise intensities, and a spectrum parameter. An extraction unit extracts for each pitch mark the n band noise signals while shifting. An amplitude control unit changes amplitudes of the extracted band noise signals and band pulse signals in accordance with the band noise intensities. A generation unit generates a mixed sound source signal by adding the n band noise signals and the n band pulse signals. A generation unit generates the mixed sound source signal generated based on the pitch mark. A vocal tract filter unit generates a speech waveform by applying a vocal tract filter using the spectrum parameter to the generated mixed sound source signal.
Type: Application
Filed: March 18, 2011
Publication date: March 1, 2012
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima
-
Publication number: 20120022859
Abstract: An automatic marking method for Karaoke vocal accompaniment is provided. In the method, pitch, beat position and volume of a singer are compared with the original pitch, beat position and volume of the theme of a song to generate a score of pitch, a score of beat and a score of emotion respectively, so as to obtain a weighted total score in a weighted marking method. By using the method, the pitch, beat position and volume error of each section of the song sung by the singer can be exactly worked out, and a pitch curve and a volume curve can be displayed, so that the singer can learn which part is sung incorrectly and which part needs to be enhanced. The present invention also has the advantages of dual effects of teaching and entertainment, high practicability and technical advancement.
Type: Application
Filed: April 7, 2009
Publication date: January 26, 2012
Inventor: Wen-Hsin Lin
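The weighted marking idea above reduces to simple arithmetic, sketched here in Python. The abstract gives neither the per-category scoring rule nor the weights, so the error-to-score mapping, the tolerance parameter, and the weights (0.5, 0.3, 0.2) are hypothetical values chosen for illustration.

```python
def section_score(sung, reference, tolerance):
    """Map the mean absolute error against the reference to a 0..100 score.

    An error of zero scores 100; a mean error of `tolerance` or more scores 0.
    """
    errors = [abs(s - r) for s, r in zip(sung, reference)]
    mean_error = sum(errors) / len(errors)
    return max(0.0, 100.0 * (1.0 - mean_error / tolerance))


def weighted_total(pitch_score, beat_score, emotion_score, weights=(0.5, 0.3, 0.2)):
    """Weighted average of the pitch, beat and emotion (volume) scores."""
    w_pitch, w_beat, w_emotion = weights
    total = w_pitch * pitch_score + w_beat * beat_score + w_emotion * emotion_score
    return total / (w_pitch + w_beat + w_emotion)


# Usage: pitch given as MIDI note numbers for two notes of one section.
pitch = section_score(sung=[60, 62], reference=[60, 64], tolerance=4)
print(pitch)
print(weighted_total(pitch, beat_score=90.0, emotion_score=70.0))
```

Computing `section_score` per section of the song, rather than once for the whole performance, is what would let a display show which part was sung incorrectly.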
-
Publication number: 20120004908
Abstract: A voice recognition terminal executes a local voice recognition process and utilizes an external center voice recognition process. The terminal includes: a voice message synthesizing element for synthesizing at least one of a voice message to be output from a speaker according to the external center voice recognition process and a voice message to be output from the speaker according to the local voice recognition process so as to distinguish between characteristics of the voice message to be output from the speaker according to the external center voice recognition process and characteristics of the voice message to be output from the speaker according to the local voice recognition process; and a voice output element for outputting a synthesized voice message from the speaker.
Type: Application
Filed: June 28, 2011
Publication date: January 5, 2012
Applicant: DENSO CORPORATION
Inventors: Kunio YOKOI, Kazuhisa SUZUKI, Masayuki TAKAMI, Naoyori TANZAWA
-
Patent number: 8063809
Abstract: A transient signal encoding method and device, decoding method and device, and processing system, where the transient signal encoding method includes: obtaining a reference sub-frame where a maximal time envelope having a maximal amplitude value is located from time envelopes of all sub-frames of an input transient signal; adjusting an amplitude value of the time envelope of each sub-frame before the reference sub-frame in such a way that a first difference is greater than a preset first threshold, in which the first difference is a difference between the amplitude value of the time envelope of each sub-frame before the reference sub-frame and the amplitude value of the maximal time envelope; and writing the adjusted time envelope into a bitstream.
Type: Grant
Filed: June 29, 2011
Date of Patent: November 22, 2011
Assignee: Huawei Technologies Co., Ltd.
Inventors: Zexin Liu, Longyin Chen, Lei Miao, Chen Hu, Wei Xiao, Herve Marcel Taddei, Qing Zhang
-
Publication number: 20110282658
Abstract: The present invention relates to co-channel audio source separation. In one embodiment a first frequency-related representation of plural regions of the acoustic signal is prepared over time, and a two-dimensional transform of plural two-dimensional localized regions of the first frequency-related representation, each less than an entire frequency range of the first frequency related representation, is obtained to provide a two-dimensional compressed frequency-related representation with respect to each two dimensional localized region. For each of the plural regions, at least one pitch is identified. The pitch from the plural regions is processed to provide multiple pitch estimates over time. In another embodiment, a mixed acoustic signal is processed by localizing multiple time-frequency regions of a spectrogram of the mixed acoustic signal to obtain one or more acoustic properties.
Type: Application
Filed: September 3, 2010
Publication date: November 17, 2011
Applicant: Massachusetts Institute of Technology
Inventors: Tianyu Wang, Thomas R. Quatieri, JR.
-
Publication number: 20110276324
Abstract: An enhancement system extracts pitch from a processed speech signal. The system estimates the pitch of voiced speech by deriving filter coefficients of an adaptive filter and using the obtained filter coefficients to derive pitch. The pitch estimation may be enhanced by using various techniques to condition the input speech signal, such as spectral modification of the background noise and the speech signal, and/or reduction of the tonal noise from the speech signal.
Type: Application
Filed: May 11, 2011
Publication date: November 10, 2011
Inventors: Rajeev Nongpiur, Phillip A. Hetherington
-
Publication number: 20110276323
Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase.
Type: Application
Filed: May 6, 2010
Publication date: November 10, 2011
Applicant: Senam Consulting, Inc.
Inventor: Serge Olegovich Seyfetdinov
-
Publication number: 20110257965
Abstract: Encoding a sequence of digital speech samples into a bit stream includes dividing the digital speech samples into one or more frames and computing a set of model parameters for the frames. The set of model parameters includes at least a first parameter conveying pitch information. The voicing state of a frame is determined and the first parameter conveying pitch information is modified to designate the determined voicing state of the frame, if the determined voicing state of the frame is equal to one of a set of reserved voicing states. The model parameters are quantized to generate quantizer bits which are used to produce the bit stream.
Type: Application
Filed: June 27, 2011
Publication date: October 20, 2011
Applicant: DIGITAL VOICE SYSTEMS, INC.
Inventor: John C. Hardwick
-
Publication number: 20110251840
Abstract: Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at a portable computing device (such as a mobile phone, personal digital assistant, laptop computer, notebook computer, pad-type computer or netbook) in accord with pitch correction settings. In some cases, pitch correction settings include a score-coded melody and/or harmonies supplied with, or for association with, the lyrics and backing tracks.
Type: Application
Filed: April 12, 2011
Publication date: October 13, 2011
Inventors: Perry R. Cook, Ari Lazier, Tom Lieber, Turner E. Kirk
-
Publication number: 20110251841
Abstract: Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.
Type: Application
Filed: April 12, 2011
Publication date: October 13, 2011
Inventors: Perry R. Cook, Ari Lazier, Tom Lieber, Turner E. Kirk
-
Publication number: 20110251842
Abstract: Using signal processing techniques described herein, pitch detection and correction of a user's vocal performance can be performed continuously and in real-time with respect to the audible rendering of the backing track at the handheld or portable computing device. In some implementations, pitch detection builds on time-domain pitch correction techniques that employ average magnitude difference function (AMDF) or autocorrelation-based techniques together with zero-crossing and/or peak picking techniques to identify differences between pitch of a captured vocal signal and score-coded target pitches. Based on detected differences, pitch correction based on pitch synchronous overlapped add (PSOLA) and/or linear predictive coding (LPC) techniques allow captured vocals to be pitch shifted in real-time to “correct” notes in accord with pitch correction settings that code score-coded melody targets and harmonies.
Type: Application
Filed: April 12, 2011
Publication date: October 13, 2011
Inventors: Perry R. Cook, Ari Lazier, Tom Lieber
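The average magnitude difference function (AMDF) named in the abstract above is a standard time-domain pitch detector: for each candidate lag it averages |x[i] - x[i + lag]|, and the lag with the deepest dip approximates the pitch period. The Python sketch below is a generic textbook version, not the implementation from the application. The search range, the tolerance used to prefer the shortest near-minimal lag (a simple guard against octave errors), and the helper that expresses the gap to a target pitch in cents are assumptions for illustration. It does no voicing detection, so it should only be applied to frames already known to be voiced.

```python
import math


def amdf(samples, lag):
    """Average magnitude difference between the frame and itself shifted by `lag`."""
    n = len(samples) - lag
    return sum(abs(samples[i] - samples[i + lag]) for i in range(n)) / n


def estimate_pitch_hz(samples, sample_rate, f_min=80.0, f_max=500.0, tolerance=0.05):
    """Return the pitch in Hz from the shortest lag whose AMDF is near the minimum."""
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(samples) - 1)
    if lag_max < lag_min:
        return None  # frame too short for the requested range
    values = {lag: amdf(samples, lag) for lag in range(lag_min, lag_max + 1)}
    lowest = min(values.values())
    mean = sum(values.values()) / len(values)
    threshold = lowest + tolerance * (mean - lowest)
    for lag in range(lag_min, lag_max + 1):
        if values[lag] <= threshold:
            return sample_rate / lag
    return None


def cents_off_target(detected_hz, target_hz):
    """Signed distance from the target pitch in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(detected_hz / target_hz)


# Usage: a 200 Hz sine sampled at 8 kHz, compared with a 196 Hz (G3) target.
rate = 8000
frame = [math.sin(2.0 * math.pi * 200.0 * i / rate) for i in range(400)]
detected = estimate_pitch_hz(frame, rate)
print(detected, cents_off_target(detected, 196.0))
```

A pitch corrector would feed a difference like the one printed here into a shifter such as PSOLA; that stage is outside the scope of this sketch.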
-
Publication number: 20110246188
Abstract: A music sound generation system is formed with a high sound quality and with a small size using a large-capacity NAND flash memory for storing music sound data. Music sound data is divided into N pitch groups, and the groups are stored separately in N different storage modules. A sound generation command classification unit (3000) classifies sound generation commands provided from an external unit into N sound generation command groups. A read command unit in each access module reads data from a storage module based on the sound generation command group. This structure enables music sound data to be read from a plurality of storage modules in parallel.
Type: Application
Filed: May 26, 2010
Publication date: October 6, 2011
Inventor: Masahiro Nakanishi
-
Publication number: 20110224977Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.Type: ApplicationFiled: September 14, 2010Publication date: September 15, 2011Applicant: HONDA MOTOR CO., LTD.Inventors: Kazuhiro NAKADAI, Takuma OTSUKA, Hiroshi OKUNO
-
Publication number: 20110218800Abstract: The present invention relates to a method and apparatus for obtaining a pitch gain, and a coder and a decoder. The method includes: obtaining information about an input signal; and obtaining a pitch gain corresponding to the information about the input signal according to the correspondence between the signal information and the pitch gain. The embodiments of the present invention obtain the corresponding pitch gain according to the signal information by using the obtained correspondence between the signal information and the pitch gain, and the pitch gain is applicable to the coder and the decoder, thus making it unnecessary for the coder to transmit the pitch gain to the decoder and solving the problem of bit overhead. The embodiments of the present invention determine the pitch gain adaptively according to the signal information, avoid consumption of extra bits for quantizing the pitch gain, avoid impact on the coding performance, and improve the compression ratio.Type: ApplicationFiled: May 17, 2011Publication date: September 8, 2011Applicant: Huawei Technologies Co., Ltd.Inventors: Dejun Zhang, Lei Miao, Jianfeng Xu, Fengyan Qi, Qing Zhang, Lixiong Li, Fuwei Ma
-
Publication number: 20110196674Abstract: A spectrum coding apparatus capable of performing coding at a low bit rate and with high quality is disclosed. This apparatus is provided with a section that performs the frequency transformation of a first signal and calculates a first spectrum, a section that converts the frequency of a second signal and calculates a second spectrum, a section that estimates the shape of the second spectrum in a band of FL≤k<FH using a filter having the first spectrum in a band of 0≤k<FL as an internal state, and a section that codes an outline of the second spectrum determined based on a coefficient indicating the characteristic of the filter at this time.Type: ApplicationFiled: April 17, 2011Publication date: August 11, 2011Applicant: PANASONIC CORPORATIONInventor: Masahiro Oshikiri
-
Publication number: 20110196673Abstract: An electronic device for reconstructing a lost packet in a Sub-Band Coding (SBC) decoder is described. The electronic device includes a processor and instructions stored in memory. The electronic device detects a lost packet, obtains a zero-input response of a synthesis filter bank and obtains a coarse pitch estimate. The electronic device also obtains a fine pitch estimate based on the zero-input response and the coarse pitch estimate. The electronic device selects a last pitch period based on the fine pitch estimate and uses samples from the last pitch period for the lost packet.Type: ApplicationFiled: January 26, 2011Publication date: August 11, 2011Applicant: QUALCOMM IncorporatedInventors: Amit Sharma, Jeremy P. Toman, Hyun Jin Park, Sang-Uk Ryu
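Illustrative sketch (not taken from the patent): the final step described above, filling a lost packet with samples from the last pitch period, might look roughly like this. The function name `conceal_lost_packet` is hypothetical, the pitch period is assumed to be already known, and the zero-input response and coarse/fine pitch estimation stages are omitted.

```python
def conceal_lost_packet(history, pitch_period, packet_len):
    """Fill a lost packet by cyclically repeating the last pitch period of history.

    history:      previously decoded samples (most recent last)
    pitch_period: fine pitch estimate in samples (assumed given)
    packet_len:   number of samples to synthesize
    """
    if pitch_period <= 0 or pitch_period > len(history):
        raise ValueError("pitch period must fit inside the history buffer")
    last_period = history[-pitch_period:]
    # Repeat the last period as many times as needed to cover the packet.
    return [last_period[i % pitch_period] for i in range(packet_len)]
```

Plain repetition like this leaves a discontinuity where the concealed packet meets the next good one; a practical decoder would typically smooth that boundary, which this sketch does not attempt.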
-
Publication number: 20110191102Abstract: In some embodiments, a processor-readable medium stores code representing instructions to cause a processor to receive an input signal having a first component and a second component. An estimate of the first component of the input signal is calculated based on an estimate of a pitch of the first component of the input signal. An estimate of the input signal is calculated based on the estimate of the first component of the input signal and an estimate of the second component of the input signal. The estimate of the first component of the input signal is modified based on a scaling function to produce a reconstructed first component of the input signal. The scaling function is a function of at least one of the input signal, the estimate of the first component of the input signal, the estimate of the second component of the input signal, or a residual signal.Type: ApplicationFiled: January 31, 2011Publication date: August 4, 2011Applicant: UNIVERSITY OF MARYLAND, COLLEGE PARKInventors: Carol Espy-Wilson, Srikanth Vishnubhotla
-
Publication number: 20110184732Abstract: A system and method for using bi-directional conversation data to improve signal presence detection are disclosed. The detector module is adapted to communicate with a signal enhancement module. The detector module collects data from a transmit direction of the connection and a receive direction of a data connection. The collected data from the transmit and the receive direction is used to classify at least one of data in the transmit direction and data in the receive direction. Responsive to the classification, the signal enhancement module enhances data in one of the transmit direction and the receive direction. Hence, data classification accuracy is improved by using data from both the transmit and receive directions. In one embodiment, the detector module applies a voice activity detection module (VAD) process to detect the presence or absence of voice data in the collected data.Type: ApplicationFiled: April 4, 2011Publication date: July 28, 2011Applicant: DITECH NETWORKS, INC.Inventor: Mahesh Godavarti
-
Publication number: 20110153317Abstract: An apparatus for wireless communications includes a processing system. The processing system is configured to receive an input sound stream of a user, split the input sound stream into a plurality of frames, classify each of the frames as one selected from the group consisting of a non-speech frame and a speech frame, determine a pitch of each of the frames in a subset of the speech frames, and identify a gender of the user from the determined pitch. To determine the pitch, the processing system is configured to filter the speech frames to compute an error signal, compute an autocorrelation of the error signal, find a maximum autocorrelation value, and set the pitch to an index of the maximum autocorrelation value.Type: ApplicationFiled: December 23, 2009Publication date: June 23, 2011Applicant: QUALCOMM INCORPORATEDInventors: Yinian Mao, Gene Marsh
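Illustrative sketch (not taken from the application): the pitch steps listed above (filter to obtain an error signal, autocorrelate it, take the index of the maximum) can be approximated as follows. Using a linear-prediction residual as the error signal, the Levinson-Durbin solver, the prediction order, and the lag bounds are all assumptions; the speech/non-speech classification and the gender decision are not shown.

```python
def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def lpc_coeffs(x, order):
    """Levinson-Durbin recursion; returns a[1..order] with x[n] ~ sum(a[k] * x[n-k])."""
    r = autocorr(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # signal is silent or already perfectly predicted
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lpc_residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum(a[k] * x[n-k]), with zero history before n=0."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out

def pitch_lag(frame, min_lag, max_lag, order=8):
    """Pitch in samples: index of the largest residual autocorrelation in [min_lag, max_lag]."""
    residual = lpc_residual(frame, lpc_coeffs(frame, order))
    r = autocorr(residual, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])
```

Dividing the sample rate by the returned lag gives the pitch in Hz; a gender decision would then compare pitch values gathered over several speech frames against some threshold, which the abstract does not specify.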
-
Publication number: 20110144982Abstract: Vocal musical performances may be captured and continuously pitch-corrected at a mobile device for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. In some cases, such pitch correction settings code a particular key or scale for the vocal performance or for portions thereof. In some cases, pitch correction settings include a score-coded melody sequence of note targets supplied with, or for association with, the lyrics and/or backing track. In some cases, pitch correction settings are dynamically variable based on gestures captured at a user interface.Type: ApplicationFiled: September 4, 2010Publication date: June 16, 2011Inventors: Spencer Salazar, Rebecca A. Fiebrink, Ge Wang, Mattias Ljungström, Jeffrey C. Smith, Perry R. Cook
-
Publication number: 20110144981Abstract: Techniques have been developed to facilitate (1) the capture and pitch correction of vocal performances on handheld or other portable computing devices and (2) the mixing of such pitch-corrected vocal performances with backing tracks for audible rendering on targets that include such portable computing devices as well as desktops, workstations, gaming stations, and even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.Type: ApplicationFiled: September 4, 2010Publication date: June 16, 2011Inventors: Spencer Salazar, Rebecca A. Fiebrink, Ge Wang, Mattias Ljungström, Jeffrey C. Smith, Perry R. Cook
-
Publication number: 20110125491Abstract: The perceived quality of a speech signal is improved by estimating the average power of first and second signal components and applying a first gain factor to the second signal components to generate adjusted second signal components. The first gain factor is selected such that on application of the first gain factor to the second signal components, the ratio of the average power of the first signal components to the average power of the adjusted second signal components would be a first predetermined value, the first predetermined value being such as to inhibit perceptual distortion of the improved speech signal.Type: ApplicationFiled: November 23, 2009Publication date: May 26, 2011Inventors: Rogerio Guedes Alves, Kuan-Chieh Yen, Michael Christopher Vartanian, Sameer Arun Gadre
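Illustrative sketch (not taken from the application): if the gain is applied to sample amplitudes, power scales with the square of the gain, so a gain that brings the ratio of average powers to a chosen value follows directly. The function names and the amplitude-domain assumption are hypothetical; the abstract does not say how the predetermined ratio is chosen.

```python
import math

def average_power(samples):
    """Mean squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)

def gain_for_power_ratio(first, second, target_ratio):
    """Gain g such that power(first) / power(g * second) == target_ratio.

    Since power(g * second) = g**2 * power(second),
    g = sqrt(power(first) / (target_ratio * power(second))).
    """
    p1 = average_power(first)
    p2 = average_power(second)
    if p2 == 0.0 or target_ratio <= 0.0:
        raise ValueError("second component must be non-silent and ratio positive")
    return math.sqrt(p1 / (target_ratio * p2))
```

For example, with a first component of average power 4, a second component of average power 1, and a target ratio of 16, the gain is 0.5: the adjusted second component then has average power 0.25, and 4 / 0.25 = 16.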
-
Publication number: 20110125492Abstract: The perceived quality of a narrowband speech signal truncated from a wideband speech signal is improved by generating in a third frequency band third speech components matching first speech components in a first frequency band of the narrowband signal, and generating in a fourth frequency band fourth speech components matching second speech components in a second frequency band of the narrowband signal. A first gain factor is applied to the third speech components to generate adjusted third speech components, and a second gain factor is applied to the fourth speech components to generate adjusted fourth speech components, the gain factors being selected such that the ratios of the average powers of the adjusted third and fourth speech components to the average power of the first speech components are predetermined values.Type: ApplicationFiled: November 23, 2009Publication date: May 26, 2011Applicant: CAMBRIDGE SILICON RADIO LIMITEDInventors: Rogerio Guedes Alves, Kuan-Chieh Yen, Michael Christopher Vartanian, Sameer Arun Gadre
-
Publication number: 20110125493Abstract: The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.Type: ApplicationFiled: January 31, 2011Publication date: May 26, 2011Inventors: Yoshifumi Hirose, Takahiro Kamai
-
Publication number: 20110099005Abstract: A framing method and apparatus are disclosed to overcome inconsistency of gains between sub-frames caused by simple average framing in the prior art. The method includes: obtaining the Linear Prediction Coding (LPC) order and the pitch of the signal; removing the samples inapplicable to Long-Term Prediction (LTP) synthesis according to the LPC prediction order and the pitch; and splitting the remaining samples of the signal into several sub-frames. The technical solution under the present invention is applicable to the multimedia speech coding field.Type: ApplicationFiled: December 30, 2010Publication date: April 28, 2011Inventors: Dejun ZHANG, Fengyan Qi, Lei Miao, Jianfeng Xu, Qing Zhang, Lixiong Li, Fuwei Ma
-
Publication number: 20110087489Abstract: The invention concerns a method and apparatus for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder that does not have a built-in or standard FEC process. A receiver with a decoder receives encoded frames of compressed speech information transmitted from an encoder. A lost frame detector at the receiver determines if an encoded frame has been lost or corrupted in transmission, or erased. If the encoded frame is not erased, the encoded frame is decoded by a decoder and a temporary memory is updated with the decoder's output. A predetermined delay period is applied and the audio frame is then output. If the lost frame detector determines that the encoded frame is erased, a FEC module applies a frame concealment process to the signal. The FEC processing produces natural sounding synthetic speech for the erased frames.Type: ApplicationFiled: December 21, 2010Publication date: April 14, 2011Inventor: David A. Kapilow
-
Publication number: 20110087488Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.Type: ApplicationFiled: December 16, 2010Publication date: April 14, 2011Inventors: Ryo Morinaka, Takehiko Kagoshima
-
Publication number: 20110066426Abstract: A speech recognition apparatus and method for real-time speaker adaptation are provided. The speech recognition apparatus may estimate a pitch of a speech section from an inputted speech signal, extract a speech feature for speech recognition based on the estimated pitch, and perform speech recognition with respect to the speech signal based on the speech feature. The speech recognition apparatus may be adaptively normalized depending on a speaker. Thus, the speech recognition apparatus may extract a speech feature for speech recognition, and may improve a performance of speech recognition based on the extracted speech feature.Type: ApplicationFiled: July 15, 2010Publication date: March 17, 2011Applicant: SAMSUNG ELECTRONICS CO., LTD.Inventor: Gil Ho LEE