Interpolation Patents (Class 704/265)

Transforming voice signals to compensate for effects from a facial covering

Patent number: 12361960

Abstract: In one example embodiment, audio characteristics of audio signals are adjusted by a first machine learning model to reduce effects of a facial covering and produce adjusted audio signals. The audio signals correspond to resulting voice signals produced from the facial covering affecting original voice signals. Speech characteristics are predicted for the adjusted audio signals by a second machine learning model. Transformed audio signals corresponding to the original voice signals are produced based on the adjusted audio signals and predicted speech characteristics.

Type: Grant

Filed: April 25, 2022

Date of Patent: July 15, 2025

Assignee: CISCO TECHNOLOGY, INC.

Inventors: Eric Y. Chen, Shamim S. Pirzada, Cullen Frishman Jennings
Token confidence scores for automatic speech recognition

Patent number: 12223948

Abstract: Methods and systems for correction of a likely erroneous word in a speech transcription are disclosed. By evaluating token confidence scores of individual words or phrases, the automatic speech recognition system can replace a low-confidence score word with a substitute word or phrase. Among various approaches, neural network models can be used to generate individual confidence scores. Such word substitution can enable the speech recognition system to automatically detect and correct likely errors in transcription. Furthermore, the system can indicate the token confidence scores on a graphic user interface for labeling and dictionary enhancement.
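The substitution step can be illustrated with a minimal Python sketch. It assumes the per-token confidence scores already exist (the abstract says neural networks may produce them). It also uses a fixed threshold and a plain dictionary of candidate replacements. The names `correct_transcript`, `threshold` and `substitutes` are hypothetical and are not the assignee's interface.

```python
from typing import Dict, List, Tuple


def correct_transcript(
    tokens: List[Tuple[str, float]],
    substitutes: Dict[str, str],
    threshold: float = 0.5,
) -> List[str]:
    """Replace words whose confidence falls below `threshold`.

    `substitutes` is a hypothetical stand-in for whatever component proposes
    replacement words; low-confidence words without a candidate are kept.
    """
    corrected = []
    for word, confidence in tokens:
        if confidence < threshold and word in substitutes:
            corrected.append(substitutes[word])
        else:
            corrected.append(word)
    return corrected


tokens = [("play", 0.97), ("some", 0.91), ("jass", 0.32), ("music", 0.95)]
print(" ".join(correct_transcript(tokens, {"jass": "jazz"})))
# play some jazz music
```

Keeping words that have no candidate, rather than deleting them, matches the abstract's framing of replacement rather than removal.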

Type: Grant

Filed: February 3, 2022

Date of Patent: February 11, 2025

Assignee: SoundHound, Inc.

Inventors: Pranav Singh, Saraswati Mishra, Eunjee Na
Data compression for a neural network

Patent number: 12141689

Abstract: Systems and methods for generating a representative value of a data set by first compressing a portion of values in the data set to determine a first common value and further compressing a subset of the portion of values to determine a second common value. The representative value is generated by taking the difference between the first common value and the second common value, wherein the representative value corresponds to a mathematical relationship between the first and second common values and each value within the subset of the portion of values. The representative value requires less storage than the first and second common values.

Type: Grant

Filed: March 18, 2019

Date of Patent: November 12, 2024

Assignee: NVIDIA Corporation

Inventor: David Rigel Garcia Garcia
Adaptive text-to-speech synthesis for dynamic advertising insertion in podcasts and broadcasts

Patent number: 12106330

Abstract: Method and system for generating an audio clip that replicates the voice of a human speaker and that may be dynamically inserted into digitally requested media files, such as podcasts, streams and broadcasts. Using a sample of speech from a previously recorded audio file, a streaming audio source or a broadcast, a text-to-speech synthesis engine that mimics or clones the voice present in the audio input generates a novel audio clip, which is inserted into the requested media file.

Type: Grant

Filed: November 11, 2021

Date of Patent: October 1, 2024

Inventors: Alberto Betella, Benjamin Richardson
System and method for voice morphing in a data annotator tool

Patent number: 12086564

Abstract: A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift. Labeling the morphed speech comprises one or more of transcribing the morphed speech, identifying a gender of the speaker, identifying an accent of the speaker, and identifying a noise type of the morphed speech.
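The frequency-shifting stage differs from the surrounding pitch shifts. It adds a constant offset to every spectral component instead of scaling them, so harmonics stop being integer multiples of the fundamental. Below is a minimal NumPy sketch of a single-sideband frequency shifter built on the FFT-based analytic signal. The two pitch-shift stages are not implemented, and the 50 Hz offset is an arbitrary illustrative value.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """FFT-based analytic signal (same construction as scipy.signal.hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0          # double positive frequencies, drop negative ones
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def frequency_shift(x: np.ndarray, shift_hz: float, fs: float) -> np.ndarray:
    """Move every spectral component up by shift_hz (single-sideband shift)."""
    t = np.arange(len(x)) / fs
    return np.real(analytic_signal(x) * np.exp(2j * np.pi * shift_hz * t))


fs = 8000
t = np.arange(fs) / fs                      # one second, so FFT bins are 1 Hz apart
tone = np.sin(2 * np.pi * 100 * t)
shifted = frequency_shift(tone, 50.0, fs)
print(np.argmax(np.abs(np.fft.rfft(shifted))))   # 150
```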

Type: Grant

Filed: November 30, 2021

Date of Patent: September 10, 2024

Assignee: SoundHound AI IP, LLC.

Inventor: Dylan H. Ross
Audio encoder and decoder using a frequency domain processor with full-band gap filling and a time domain processor

Patent number: 12080310

Abstract: An audio encoder for encoding an audio signal has: a first encoding processor for encoding a first audio signal portion in a frequency domain, having: a time frequency converter for converting the first audio signal portion into a frequency domain representation; an analyzer for analyzing the frequency domain representation to determine first spectral portions to be encoded with a first spectral resolution and second regions to be encoded with a second resolution; and a spectral encoder for encoding the first spectral portions with the first spectral resolution and encoding the second portions with the second resolution; a second encoding processor for encoding a second different audio signal portion in the time domain; a controller for analyzing and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion is the second audio signal portion encoded in the time domain; and an encoded signal former for forming an encoded audio signal havi

Type: Grant

Filed: June 1, 2021

Date of Patent: September 3, 2024

Assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.

Inventors: Sascha Disch, Martin Dietz, Markus Multrus, Guillaume Fuchs, Emmanuel Ravelli, Matthias Neusinger, Markus Schnell, Benjamin Schubert, Bernhard Grill
Audio encoder and decoder using a frequency domain processor with full-band gap filling and a time domain processor

Patent number: 11929084

Abstract: An audio encoder for encoding an audio signal has: a first encoding processor for encoding a first audio signal portion in a frequency domain, having: a time frequency converter for converting the first audio signal portion into a frequency domain representation; an analyzer for analyzing the frequency domain representation to determine first spectral portions to be encoded with a first spectral resolution and second regions to be encoded with a second resolution; and a spectral encoder for encoding the first spectral portions with the first spectral resolution and encoding the second portions with the second resolution; a second encoding processor for encoding a second different audio signal portion in the time domain; a controller for analyzing and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion is the second audio signal portion encoded in the time domain; and an encoded signal former for forming an encoded audio signal havi

Type: Grant

Filed: January 23, 2023

Date of Patent: March 12, 2024

Assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.

Inventors: Sascha Disch, Martin Dietz, Markus Multrus, Guillaume Fuchs, Emmanuel Ravelli, Matthias Neusinger, Markus Schnell, Benjamin Schubert, Bernhard Grill
Method for outputting blend shape value, storage medium, and electronic device

Patent number: 11847726

Abstract: A method for outputting a blend shape value includes: performing feature extraction on obtained target audio data to obtain a target audio feature vector; inputting the target audio feature vector and a target identifier into an audio-driven animation model; inputting the target audio feature vector into an audio encoding layer, determining an input feature vector of a next layer at a (2t-n)/2 time point based on an input feature vector of a previous layer between a t time point and a t-n time point, determining a feature vector having a causal relationship with the input feature vector of the previous layer as a valid feature vector, outputting sequentially target-audio encoding features, and inputting the target identifier into a one-hot encoding layer for binary vector encoding to obtain a target-identifier encoding feature; and outputting a blend shape value corresponding to the target audio data.

Type: Grant

Filed: July 22, 2022

Date of Patent: December 19, 2023

Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.

Inventors: Huapeng Sima, Cuicui Tang, Zheng Liao
Context-aware prosody correction of edited speech

Patent number: 11830481

Abstract: Methods are performed by one or more processing devices for correcting prosody in audio data. A method includes operations for accessing subject audio data in an audio edit region of the audio data. The subject audio data in the audio edit region potentially lacks prosodic continuity with unedited audio data in an unedited audio portion of the audio data. The operations further include predicting, based on a context of the unedited audio data, phoneme durations including a respective phoneme duration of each phoneme in the unedited audio data. The operations further include predicting, based on the context of the unedited audio data, a pitch contour comprising at least one respective pitch value of each phoneme in the unedited audio data. Additionally, the operations include correcting prosody of the subject audio data in the audio edit region by applying the phoneme durations and the pitch contour to the subject audio data.

Type: Grant

Filed: November 30, 2021

Date of Patent: November 28, 2023

Assignee: Adobe Inc.

Inventors: Maxwell Morrison, Zeyu Jin, Nicholas Bryan, Juan Pablo Caceres Chomali, Lucas Rencker
Method for building database in which voice signals and texts are matched and a system therefor, and a computer-readable recording medium recording the same

Patent number: 11714788

Abstract: According to an embodiment, a method of building a database in which voice signals match texts comprises providing a captcha-purposed voice signal including a first voice signal matched with a first text and a second voice signal matched with no text, sending a request for a first input text and a second input text for the captcha-purposed voice signal, when the first input text and the second input text are received, comparing the first text with the first input text, and when the first text is identical to the first input text, matching the second voice signal with the second input text and storing the match. Embodiments of the present invention may be related to artificial intelligence (AI) modules, unmanned aerial vehicles (UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, and 5G service-related devices.
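The verification logic is a known-answer check that controls whether an unverified label is stored. The minimal sketch below makes two assumptions. It reads "identical" as exact string equality, and it uses hypothetical clip identifiers in place of the audio data itself.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CaptchaChallenge:
    known_clip_id: str      # first voice signal, already matched with text
    known_text: str         # the first text
    unknown_clip_id: str    # second voice signal, matched with no text yet


@dataclass
class VoiceTextDatabase:
    matches: Dict[str, str] = field(default_factory=dict)

    def submit(self, challenge: CaptchaChallenge,
               first_input: str, second_input: str) -> bool:
        """Store the label for the unknown clip only if the known clip was
        transcribed exactly; return whether the captcha was passed."""
        if first_input != challenge.known_text:
            return False
        self.matches[challenge.unknown_clip_id] = second_input
        return True


db = VoiceTextDatabase()
challenge = CaptchaChallenge("clip-001", "open the door", "clip-002")
db.submit(challenge, "open the door", "turn on the light")
print(db.matches)   # {'clip-002': 'turn on the light'}
```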

Type: Grant

Filed: September 17, 2019

Date of Patent: August 1, 2023

Assignee: LG ELECTRONICS INC.

Inventor: Dami Kim
Speech style transfer

Patent number: 11538455

Abstract: Computer-implemented methods for speech synthesis are provided. A speech synthesizer may be trained to generate synthesized audio data that corresponds to words uttered by a source speaker according to speech characteristics of a target speaker. The speech synthesizer may be trained by time-stamped phoneme sequences, pitch contour data and speaker identification data. The speech synthesizer may include a voice modeling neural network and a conditioning neural network.

Type: Grant

Filed: February 14, 2019

Date of Patent: December 27, 2022

Assignee: Dolby Laboratories Licensing Corporation

Inventors: Cong Zhou, Michael Getty Horgan, Vivek Kumar, Jaime H. Morales, Cristina Michel Vasco
Method and apparatus for generating dialogue model

Patent number: 11537798

Abstract: Embodiments of the present disclosure relate to a method and apparatus for generating a dialogue model. The method may include: acquiring a corpus sample set, a corpus sample including input information and target response information; classifying corpus samples in the corpus sample set, setting discrete hidden variables for the corpus samples based on a classification result to generate a training sample set, a training sample including the input information, the target response information, and a discrete hidden variable; and training a preset neural network using the training sample set to obtain the dialogue model, the dialogue model being used to represent a corresponding relationship between inputted input information and outputted target response information.

Type: Grant

Filed: June 8, 2020

Date of Patent: December 27, 2022

Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.

Inventors: Siqi Bao, Huang He, Junkun Chen, Fan Wang, Hua Wu, Jingzhou He
Synthesis of speech from text in a voice of a target speaker using neural networks

Patent number: 11488575

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.

Type: Grant

Filed: May 17, 2019

Date of Patent: November 1, 2022

Assignee: Google LLC

Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick Nguyen
Acoustic and visual enhancement methods for training and learning

Patent number: 11457313

Abstract: An enhancement method for learning is defined by a combination of intelligent application acoustic signals and filters to enhance learning. Data from a training database can be used to modify the enhancement to a pre-recorded audio presentation portion of the learning materials. The method preferably optimizes the learning materials based on the profile of the learner and includes audio or video enhancements to improve retention of the learning material.

Type: Grant

Filed: September 4, 2019

Date of Patent: September 27, 2022

Assignee: SOCIETY OF CABLE TELECOMMUNICATIONS ENGINEERS, INC.

Inventors: Mark Dzuban, Christopher Bastian, Margaret Bernroth
Generating audio using neural networks

Patent number: 11386914

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps. One of the methods includes, for each of the time steps: providing a current sequence of audio data as input to a convolutional subnetwork, wherein the current sequence comprises the respective audio sample at each time step that precedes the time step in the output sequence, and wherein the convolutional subnetwork is configured to process the current sequence of audio data to generate an alternative representation for the time step; and providing the alternative representation for the time step as input to an output layer, wherein the output layer is configured to: process the alternative representation to generate an output that defines a score distribution over a plurality of possible audio samples for the time step.
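The per-time-step loop can be sketched in NumPy as shown below. In this sketch, random weights stand in for a trained network. The convolutional subnetwork is reduced to a single causal filter bank evaluated over a fixed window of past samples. The 256-level sample alphabet and the 16-sample receptive field are assumptions chosen for illustration.

```python
import numpy as np

NUM_LEVELS = 256       # assumed number of quantized sample values
RECEPTIVE_FIELD = 16   # assumed number of past samples visible at each step
HIDDEN = 32            # assumed width of the alternative representation

rng = np.random.default_rng(0)
# Random weights stand in for a trained causal filter bank and output layer.
conv_kernel = rng.normal(scale=0.1, size=(HIDDEN, RECEPTIVE_FIELD))
output_weights = rng.normal(scale=0.1, size=(NUM_LEVELS, HIDDEN))


def score_distribution(history: np.ndarray) -> np.ndarray:
    """Map the samples generated so far to a distribution over the next sample."""
    window = np.zeros(RECEPTIVE_FIELD)          # zero-pad the start of the sequence
    recent = history[-RECEPTIVE_FIELD:]
    if len(recent) > 0:
        window[-len(recent):] = recent / (NUM_LEVELS - 1)
    # A causal convolution evaluated at the newest position reduces to one
    # dot product per output channel over the receptive field.
    hidden = np.tanh(conv_kernel @ window)      # the "alternative representation"
    logits = output_weights @ hidden
    probs = np.exp(logits - logits.max())       # numerically stable softmax
    return probs / probs.sum()


def generate(num_steps: int, seed: int = 1) -> np.ndarray:
    """Autoregressively sample one value per time step."""
    sampler = np.random.default_rng(seed)
    samples = np.zeros(0, dtype=np.int64)
    for _ in range(num_steps):
        probs = score_distribution(samples)
        next_value = sampler.choice(NUM_LEVELS, p=probs)
        samples = np.append(samples, next_value)
    return samples


print(generate(8))   # eight integers in [0, 255]
```

Each new sample is fed back as input, so generation is inherently sequential even though a trained network can score a known sequence in parallel.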

Type: Grant

Filed: September 14, 2020

Date of Patent: July 12, 2022

Assignee: DeepMind Technologies Limited

Inventors: Aaron Gerard Antonius van den Oord, Sander Etienne Lea Dieleman, Nal Emmerich Kalchbrenner, Karen Simonyan, Oriol Vinyals
Method and apparatus for concealing frame error and method and apparatus for audio decoding

Patent number: 10714097

Abstract: A frame error concealment (FEC) method is provided. The method includes: selecting an FEC mode based on states of a current frame and a previous frame of the current frame in a time domain signal generated after time-frequency inverse transform processing; and performing corresponding time domain error concealment processing on the current frame based on the selected FEC mode, wherein the current frame is an error frame or the current frame is a normal frame when the previous frame is an error frame.
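Selecting a mode from the two frame states can be written as a small decision table. The mode names below are hypothetical placeholders. The abstract says neither how many modes exist nor what processing each mode performs.

```python
from enum import Enum
from typing import Optional


class FrameState(Enum):
    NORMAL = "normal"
    ERROR = "error"


def select_fec_mode(current: FrameState, previous: FrameState) -> Optional[str]:
    """Pick a concealment mode from the states of the current and previous frames."""
    if current is FrameState.ERROR:
        # Distinguish the first lost frame from a burst of consecutive losses.
        return "first_error" if previous is FrameState.NORMAL else "burst_error"
    if previous is FrameState.ERROR:
        return "recovery"   # good frame right after a loss: smooth the transition
    return None             # both frames good: no time-domain concealment needed
```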

Type: Grant

Filed: October 5, 2018

Date of Patent: July 14, 2020

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Ho-sang Sung, Nam-suk Lee
Dynamically generated audio in advertisements

Patent number: 10643248

Abstract: A content server provides a client device with audio content including an audio advertisement, which is provided in response to receiving a request for digital audio content from a client device associated with a user. The content server obtains user information about the user and retrieves advertisement text received from an advertiser, which are used to generate a personalized text advertisement. The personalized text advertisement is generated according to an advertisement template specifying an ordered combination of text components. The personalized text advertisement includes the received advertisement text, user information text selected from the obtained user information, and template text. The client device is provided with an advertisement based on the personalized text advertisement and is configured to play an audio version of the personalized text advertisement. The audio advertisement is generated using a text-to-speech algorithm at the client device or at the content server.
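An advertisement template that specifies an ordered combination of text components could be represented as a list of tagged components. The component kinds, field names and example text below are hypothetical. The text-to-speech step is not shown.

```python
from typing import Dict, List, Tuple

# Each component is (kind, value): "template" is literal text, while "user" and
# "ad" name a field to look up in the user information or the advertiser's text.
Component = Tuple[str, str]


def build_personalized_ad(template: List[Component], ad_text: Dict[str, str],
                          user_info: Dict[str, str]) -> str:
    sources = {"ad": ad_text, "user": user_info}
    parts = []
    for kind, value in template:
        if kind == "template":
            parts.append(value)
        else:
            parts.append(sources[kind][value])   # KeyError if a field is missing
    return "".join(parts)


template = [("template", "Hi "), ("user", "first_name"),
            ("template", ", runners in "), ("user", "city"),
            ("template", " can save today: "), ("ad", "offer")]
ad_text = {"offer": "20% off trail shoes at Example Outfitters."}
user_info = {"first_name": "Sam", "city": "Denver"}
print(build_personalized_ad(template, ad_text, user_info))
# Hi Sam, runners in Denver can save today: 20% off trail shoes at Example Outfitters.
```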

Type: Grant

Filed: April 2, 2018

Date of Patent: May 5, 2020

Assignee: Pandora Media, LLC

Inventors: Shriram Bharath, Jacek Adam Krawczyk, Christopher Irwin
Multipoint offset sampling deformation

Patent number: 10559109

Abstract: A skin deformation system for use in computer animation is disclosed. The skin deformation system accesses the skeleton structure of a computer generated character, and accesses a user's identification of features of the skeleton structure that may affect a skin deformation. The system also accesses the user's identification of a weighting strategy. Using the identified weighting strategy and identified features of the skeleton structure, the skin deformation system determines the degree to which each feature identified by the user may influence the deformation of a skin of the computer generated character. The skin deformation system may incorporate secondary operations including bulge, slide, scale and twist into the deformation of a skin. Information relating to a deformed skin may be stored by the skin deformation system so that the information may be used to produce a visual image for a viewer.
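The abstract leaves the weighting strategy to the user. One possible strategy is inverse-distance weighting between skin vertices and the user-selected joints. The sketch below uses that strategy and moves each vertex by the weighted average of the joint translations. Both the strategy and the translation-only deformation are assumptions. The secondary bulge, slide, scale and twist operations are omitted.

```python
import numpy as np


def inverse_distance_weights(vertices: np.ndarray, joints: np.ndarray,
                             power: float = 2.0, eps: float = 1e-9) -> np.ndarray:
    """Influence of each joint on each vertex; every row sums to 1."""
    d = np.linalg.norm(vertices[:, None, :] - joints[None, :, :], axis=-1)
    w = 1.0 / (d + eps) ** power
    return w / w.sum(axis=1, keepdims=True)


def deform(vertices: np.ndarray, joints: np.ndarray, joint_offsets: np.ndarray,
           power: float = 2.0) -> np.ndarray:
    """Move each skin vertex by the weighted average of the joint translations."""
    w = inverse_distance_weights(vertices, joints, power)
    return vertices + w @ joint_offsets


joints = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
vertices = np.array([[0.0, 0.1, 0.0], [1.0, 0.1, 0.0], [2.0, 0.1, 0.0]])
offsets = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # only the second joint moves up
print(deform(vertices, joints, offsets)[:, 1])   # near 0.1, 0.6, near 1.1
```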

Type: Grant

Filed: October 2, 2017

Date of Patent: February 11, 2020

Assignee: DreamWorks Animation L.L.C.

Inventors: Paul Carmen DiLorenzo, Matthew Christopher Gong, Arthur D. Gregory
Real-time voice masking in a computer network

Patent number: 10141008

Abstract: A voice signal may be adjusted to mask traits such as the gender of a speaker by separating source and filter components of a voice signal using cepstral analysis, adjusting the components based on pitch and formant parameters, and synthesizing a modified signal. Features are disclosed to support real-time voice masking in a computer network by limiting computational complexity and reducing delays in processing and transmission while maintaining signal quality.
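One common way to perform the cepstral source-filter split is by liftering. The low-quefrency part of the real cepstrum approximates the vocal-tract envelope (filter), and the remainder approximates the excitation (source). The NumPy sketch below shows that split and a simple formant adjustment made by resampling the envelope. The cutoff of 30 coefficients and the warp factor are illustrative assumptions. Pitch adjustment and resynthesis are not shown.

```python
import numpy as np


def split_source_filter(frame: np.ndarray, lifter_cutoff: int = 30):
    """Split one frame's log-magnitude spectrum into filter and source parts."""
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.irfft(log_mag, n=n)       # real cepstrum, symmetric in quefrency
    lifter = np.zeros(n)
    lifter[:lifter_cutoff] = 1.0
    lifter[n - lifter_cutoff + 1:] = 1.0        # mirrored half of the low quefrencies
    envelope = np.fft.rfft(cepstrum * lifter).real   # smooth vocal-tract envelope
    excitation = log_mag - envelope                  # residual excitation detail
    return envelope, excitation


def warp_envelope(envelope: np.ndarray, factor: float) -> np.ndarray:
    """Move formants by `factor` (>1 raises them) by resampling the envelope."""
    bins = np.arange(len(envelope))
    return np.interp(bins / factor, bins, envelope)


n = 512
frame = np.zeros(n)
frame[::80] = 1.0                               # toy pulse-train "voiced" frame
envelope, excitation = split_source_filter(frame)
masked_log_mag = warp_envelope(envelope, 1.15) + excitation
```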

Type: Grant

Filed: February 26, 2018

Date of Patent: November 27, 2018

Assignee: Interviewing.io, Inc.

Inventors: Andrew Tatanka Marsh, Steven Young Yi
Real-time voice masking in a computer network

Patent number: 9947341

Abstract: A voice signal may be adjusted to mask traits such as the gender of a speaker by separating source and filter components of a voice signal using cepstral analysis, adjusting the components based on pitch and formant parameters, and synthesizing a modified signal. Features are disclosed to support real-time voice masking in a computer network by limiting computational complexity and reducing delays in processing and transmission while maintaining signal quality.

Type: Grant

Filed: January 18, 2017

Date of Patent: April 17, 2018

Assignee: Interviewing.io, Inc.

Inventors: Andrew Tatanka Marsh, Steven Young Yi
Speech synthesis apparatus, method, and computer-readable medium that generates synthesized speech having prosodic feature

Patent number: 9905219

Abstract: According to one embodiment, a speech synthesis apparatus is provided with generation, normalization, interpolation and synthesis units. The generation unit generates a first parameter using a prosodic control dictionary of a target speaker and one or more second parameters using a prosodic control dictionary of one or more standard speakers based on language information for an input text. The normalization unit normalizes the one or more second parameters based on a normalization parameter. The interpolation unit interpolates the first parameter and the one or more normalized second parameters based on weight information to generate a third parameter and the synthesis unit generates synthesized speech using the third parameter.
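The normalize-then-interpolate flow can be sketched for a log-F0 contour. Three choices here are assumptions and not details from the abstract. The normalization parameter is a mean offset, the prosodic parameter is log F0, and the weights are normalized to sum to one.

```python
import numpy as np
from typing import Sequence


def normalize(standard: np.ndarray, standard_mean: float, target_mean: float) -> np.ndarray:
    """Shift a standard speaker's log-F0 contour onto the target speaker's mean."""
    return standard - standard_mean + target_mean


def interpolate(first: np.ndarray, normalized: Sequence[np.ndarray],
                weights: Sequence[float]) -> np.ndarray:
    """Weighted combination of the target contour and normalized standard contours."""
    contours = [first, *normalized]
    if len(weights) != len(contours):
        raise ValueError("need one weight per contour")
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                     # make the weights sum to one
    return sum(wi * c for wi, c in zip(w, contours))


target = np.array([5.0, 5.1, 5.2])     # first parameter (target speaker)
standard = np.array([4.6, 4.9, 5.2])   # second parameter (standard speaker)
third = interpolate(target, [normalize(standard, 4.9, 5.1)], [0.5, 0.5])
print(third)   # approximately [4.9 5.1 5.3]
```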

Type: Grant

Filed: August 16, 2013

Date of Patent: February 27, 2018

Assignee: Kabushiki Kaisha Toshiba

Inventors: Kentaro Tachibana, Takehiko Kagoshima, Masahiro Morita
Off-band resolution enhancement

Patent number: 9811881

Abstract: A method of enhancing an image includes increasing sampling rate of a first image to a target sampling rate to form an interpolated image. The method also includes processing a second image through a high pass filter to form a high pass features image, wherein the second image is at the target sampling rate. The method also includes extracting detail from the high pass features image relevant to the first image, merging the detail from the high pass features image with the interpolated image to form a prediction image at the target sampling rate, and outputting the prediction image.
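The four steps can be sketched in NumPy with deliberately simple stand-ins. The sketch uses nearest-neighbour upsampling for the interpolation, a 3 x 3 box blur subtracted from the image as the high pass filter, and a single gain in place of the abstract's "extracting detail relevant to the first image". All three are assumptions for illustration.

```python
import numpy as np


def upsample_nearest(img: np.ndarray, factor: int) -> np.ndarray:
    """Raise the sampling rate by an integer factor (nearest-neighbour interpolation)."""
    return np.kron(img, np.ones((factor, factor)))


def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k moving average with edge padding; k must be odd."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def enhance(first: np.ndarray, second: np.ndarray, factor: int,
            detail_gain: float = 1.0) -> np.ndarray:
    interpolated = upsample_nearest(first, factor)   # first image at the target rate
    high_pass = second - box_blur(second)            # high pass features image
    if high_pass.shape != interpolated.shape:
        raise ValueError("second image must already be at the target sampling rate")
    return interpolated + detail_gain * high_pass    # prediction image
```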

Type: Grant

Filed: December 9, 2015

Date of Patent: November 7, 2017

Assignee: Goodrich Corporation

Inventors: Suhail Shabbir Saquib, Christopher Gittins
Multipoint offset sampling deformation

Patent number: 9786083

Abstract: A skin deformation system for use in computer animation is disclosed. The skin deformation system accesses the skeleton structure of a computer generated character, and accesses a user's identification of features of the skeleton structure that may affect a skin deformation. The system also accesses the user's identification of a weighting strategy. Using the identified weighting strategy and identified features of the skeleton structure, the skin deformation system determines the degree to which each feature identified by the user may influence the deformation of a skin of the computer generated character. The skin deformation system may incorporate secondary operations including bulge, slide, scale and twist into the deformation of a skin. Information relating to a deformed skin may be stored by the skin deformation system so that the information may be used to produce a visual image for a viewer.

Type: Grant

Filed: October 7, 2011

Date of Patent: October 10, 2017

Assignee: DreamWorks Animation L.L.C.

Inventors: Paul Carmen Dilorenzo, Matthew Christopher Gong, Arthur D. Gregory
Spectrum analysis and plant diagnostic tool for communications systems

Patent number: 9686594

Abstract: A system, method, and apparatus to allow an operator of a broadcast communication system, such as a cable television or satellite television service to provide some examples, to diagnose performance of this communication system remotely. The operator of a first communication device, such as a cable modem termination system (CMTS) to provide an example, may remotely diagnose performance problems, or potential performance problems, occurring at a second communication device, such as a cable modem (CM) to provide an example, or a group of second communication devices. For example, the operator of the first communication device may view a spectrum analysis of communication signals being routed to, processed by, and/or provided by the second communication device, or group of second communication devices, to diagnose the performance problems, or the potential performance problems, in real time.

Type: Grant

Filed: March 30, 2012

Date of Patent: June 20, 2017

Assignee: Avago Technologies General IP (Singapore) Pte. Ltd.

Inventors: Ramon Alejandro Gomez, Leonard Dauphinee, Donald G. McMullin, Harold Raymond Whitehead
Context-aware speech processing

Patent number: 9502029

Abstract: Described herein are systems and methods for context-aware speech processing. A speech context is determined based on context data associated with a user uttering speech. The speech context and the speech uttered in that speech context may be used to build acoustic models for that speech context. An acoustic model for use in speech processing may be selected based on the determined speech context. A language model for use in speech processing may also be selected based on the determined speech context. Using the acoustic and language models, the speech may be processed to recognize the speech from the user.

Type: Grant

Filed: June 25, 2012

Date of Patent: November 22, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Matthew P. Bell, Yuzo Watanabe, Stephen M. Polansky
Method and device for processing a sound signal

Patent number: 9495978

Abstract: A method of processing a sound signal is disclosed. The method of processing a sound signal includes receiving a sound signal from the outside of a device, converting the sound signal into a first frequency domain signal, determining whether or not the sound signal is a voice signal using the first frequency domain signal acquired through the conversion, converting the first frequency domain signal into a second frequency domain signal based on the determination, and recognizing the sound signal using the second frequency domain signal acquired through the conversion.

Type: Grant

Filed: December 4, 2015

Date of Patent: November 15, 2016

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Seok-hwan Jo, Do-hyung Kim, Jae-hyun Kim, Shi-hwa Lee
Error concealment method and apparatus for audio signal and decoding method and apparatus for audio signal using the same

Patent number: 9373331

Abstract: An error concealment method and apparatus for an audio signal and a decoding method and apparatus for an audio signal using the error concealment method and apparatus. The error concealment method includes selecting one of an error concealment in a frequency domain and an error concealment in a time domain as an error concealment scheme for a current frame based on a predetermined criterion when an error occurs in the current frame, selecting one of a repetition scheme and an interpolation scheme in the frequency domain as the error concealment scheme for the current frame based on a predetermined criterion when the error concealment in the frequency domain is selected, and concealing the error of the current frame using the selected scheme.
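The two-level choice can be sketched as follows. The selection criteria are hypothetical, since the abstract only calls them "predetermined". Here the sketch uses the time domain for transient frames and frequency-domain interpolation when a later good frame is available. The 0.9 attenuation on repeated spectra is likewise an assumed value.

```python
import numpy as np
from typing import Optional, Tuple


def select_scheme(is_transient: bool, next_frame_available: bool) -> str:
    """Hypothetical criteria for choosing the concealment scheme."""
    if is_transient:
        return "time"
    return "interpolation" if next_frame_available else "repetition"


def conceal(prev_spectrum: np.ndarray, next_spectrum: Optional[np.ndarray] = None,
            is_transient: bool = False, attenuation: float = 0.9
            ) -> Tuple[str, Optional[np.ndarray]]:
    scheme = select_scheme(is_transient, next_spectrum is not None)
    if scheme == "repetition":
        return scheme, attenuation * prev_spectrum            # faded copy of last good frame
    if scheme == "interpolation":
        return scheme, 0.5 * (prev_spectrum + next_spectrum)  # midpoint of the neighbours
    return scheme, None    # time-domain concealment is not sketched here
```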

Type: Grant

Filed: July 2, 2013

Date of Patent: June 21, 2016

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Eun-mi Oh, Ki-hyun Choo, Ho-sang Sung, Chang-yong Son, Jung-hoe Kim, Kang eun Lee
Systems and methods for determining an interpolation factor set for synthesizing a speech signal

Patent number: 9336789

Abstract: A method for determining an interpolation factor set by an electronic device is described. The method includes determining a value based on a current frame property and a previous frame property. The method also includes determining whether the value is outside of a range. The method further includes determining an interpolation factor set based on the value and a prediction mode indicator if the value is outside of the range. The method additionally includes synthesizing a speech signal.
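A minimal sketch of the decision follows. Several parts are assumptions: energy is used as the frame property, the factor sets are applied to line spectral frequency (LSF) vectors across four subframes, the range is 0.5 to 2.0, and the tables are invented. The abstract does not say which set applies when the value is inside the range, so this sketch falls back to a default set. Speech synthesis is not shown.

```python
import numpy as np
from typing import Dict, List, Tuple

Factors = Tuple[float, ...]

# Hypothetical tables: weight given to the current frame in each of four subframes.
DEFAULT_FACTORS: Factors = (0.25, 0.5, 0.75, 1.0)
ALTERNATE_FACTORS: Dict[Tuple[bool, bool], Factors] = {
    # (value above the range?, predictive mode?) -> factor set
    (True, True): (0.5, 0.75, 1.0, 1.0),
    (True, False): (0.75, 1.0, 1.0, 1.0),
    (False, True): (0.1, 0.3, 0.6, 1.0),
    (False, False): (0.0, 0.25, 0.5, 1.0),
}


def choose_factor_set(current_energy: float, previous_energy: float,
                      predictive_mode: bool, low: float = 0.5,
                      high: float = 2.0) -> Factors:
    value = current_energy / max(previous_energy, 1e-12)   # frame-property ratio
    if low <= value <= high:
        return DEFAULT_FACTORS
    return ALTERNATE_FACTORS[(value > high, predictive_mode)]


def interpolate_subframes(prev_lsf: np.ndarray, curr_lsf: np.ndarray,
                          factors: Factors) -> List[np.ndarray]:
    """Per-subframe spectral parameters between the previous and current frame."""
    return [(1.0 - a) * prev_lsf + a * curr_lsf for a in factors]
```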

Type: Grant

Filed: August 30, 2013

Date of Patent: May 10, 2016

Assignee: QUALCOMM Incorporated

Inventors: Vivek Rajendran, Subasingha Shaminda Subasingha, Venkatesh Krishnan
Voice synthesis apparatus using a plurality of phonetic piece data

Patent number: 9230537

Abstract: A voice signal is synthesized using a plurality of phonetic piece data each indicating a phonetic piece containing at least two phoneme sections corresponding to different phonemes. In the apparatus, a phonetic piece adjustor forms a target section from first and second phonetic pieces so as to connect the first and second phonetic pieces to each other such that the target section includes a rear phoneme section of the first piece and a front phoneme section of the second piece, and expands the target section by a target time length to form an adjustment section such that a central part is expanded at an expansion rate higher than that of front and rear parts of the target section, to thereby create synthesized phonetic piece data having the target time length. A voice synthesizer creates a voice signal from the synthesized phonetic piece data.

Type: Grant

Filed: May 31, 2012

Date of Patent: January 5, 2016

Assignee: Yamaha Corporation

Inventor: Keijiro Saino
Vocal source extraction by maximum phase detection

Patent number: 9105272

Abstract: Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency domain are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.
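A common way to read off phase behaviour is to find the zeros of the z-transform of the pitch cycle. Zeros outside the unit circle form the maximum-phase part, and zeros inside form the minimum-phase part. The NumPy sketch below shows only that classification. The abstract's step of identifying and correcting misclassified roots is not reproduced.

```python
import numpy as np


def zeros_of_z_transform(cycle: np.ndarray) -> np.ndarray:
    """Zeros of X(z) = sum_n x[n] z^(-n) for one finite pitch cycle."""
    # Multiplying X(z) by z^(N-1) gives a polynomial whose coefficients are
    # x[0], ..., x[N-1] in descending powers, the order np.roots expects.
    return np.roots(cycle)


def split_by_phase(zeros: np.ndarray):
    """Separate maximum-phase (outside) from minimum-phase (inside) zeros."""
    outside = zeros[np.abs(zeros) > 1.0]
    inside = zeros[np.abs(zeros) <= 1.0]
    return outside, inside


cycle = np.convolve([1.0, -2.0], [1.0, -0.5])   # toy cycle with zeros at 2 and 0.5
outside, inside = split_by_phase(zeros_of_z_transform(cycle))
print(np.abs(outside), np.abs(inside))   # [2.] [0.5]
```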

Type: Grant

Filed: June 4, 2012

Date of Patent: August 11, 2015

Assignees: The Lithuanian University of Health Sciences, INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Aharon Satt, Zvi Kons, Ron Hoory, Virgilijus Ulozas
Speech synthesizer, speech synthesis method and computer program product

Patent number: 9058807

Abstract: According to one embodiment, a first storage unit stores n band noise signals obtained by applying n band-pass filters to a noise signal. A second storage unit stores n band pulse signals. A parameter input unit inputs a fundamental frequency, n band noise intensities, and a spectrum parameter. An extraction unit extracts for each pitch mark the n band noise signals while shifting. An amplitude control unit changes amplitudes of the extracted band noise signals and band pulse signals in accordance with the band noise intensities. A generation unit generates a mixed sound source signal by adding the n band noise signals and the n band pulse signals. A generation unit generates the mixed sound source signal generated based on the pitch mark. A vocal tract filter unit generates a speech waveform by applying a vocal tract filter using the spectrum parameter to the generated mixed sound source signal.
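The mixing rule can be sketched in NumPy. Each band's noise is weighted by its noise intensity a and each band's pulses by 1 - a, and the results are summed. As simplifications, ideal FFT masks replace the band-pass filters, band signals are computed on the fly instead of being read from storage, and the vocal tract filter is omitted. For perfect reconstruction, the band edges (in rfft bins) must start at 0 and end at `length // 2 + 1`.

```python
import numpy as np
from typing import List, Sequence


def split_into_bands(signal: np.ndarray, edges: Sequence[int]) -> List[np.ndarray]:
    """Ideal band-pass split: zero every rfft bin outside [edges[i], edges[i+1])."""
    spectrum = np.fft.rfft(signal)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = np.zeros(len(spectrum))
        mask[lo:hi] = 1.0
        bands.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return bands


def mixed_excitation(length: int, pitch_period: int,
                     noise_intensities: Sequence[float], edges: Sequence[int],
                     seed: int = 0) -> np.ndarray:
    """Sum per-band noise and pulses, weighting noise by a and pulses by 1 - a."""
    rng = np.random.default_rng(seed)
    pulses = np.zeros(length)
    pulses[::pitch_period] = 1.0        # one pulse per pitch mark
    noise = rng.standard_normal(length)
    out = np.zeros(length)
    for a, noise_band, pulse_band in zip(noise_intensities,
                                         split_into_bands(noise, edges),
                                         split_into_bands(pulses, edges)):
        out += a * noise_band + (1.0 - a) * pulse_band
    return out


source = mixed_excitation(256, 64, [0.1, 0.8], [0, 40, 129])   # mostly voiced low band
```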

Type: Grant

Filed: March 18, 2011

Date of Patent: June 16, 2015

Assignee: Kabushiki Kaisha Toshiba

Inventors: Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima
Audio signal processing method and device

Patent number: 9020812

Abstract: Disclosed is an audio signal processing method comprising the steps of: receiving an audio signal containing current frame data; generating a first temporary output signal for the current frame when an error occurs in the current frame data, by carrying out frame error concealment on the current frame data using a random codebook; generating a parameter by carrying out one or more of short-term prediction, long-term prediction and a fixed codebook search based on the first temporary output signal; and updating the parameter in memory for the next frame; wherein the parameter comprises one or more of pitch gain, pitch delay, fixed codebook gain and a fixed codebook.

Type: Grant

Filed: November 24, 2010

Date of Patent: April 28, 2015

Assignees: LG Electronics Inc., Industry-Academic Cooperation Foundation, Yonsei University

Inventors: Hye Jeong Jeon, Dae Hwan Kim, Hong Goo Kang, Min Ki Lee, Byung Suk Lee, Gyu Hyeok Jeong
Systems and Methods for Reconstructing an Audio Signal from Transformed Audio Information

Publication number: 20150112688

Abstract: A system and method may be configured to reconstruct an audio signal from transformed audio information. The audio signal may be resynthesized based on individual harmonics and corresponding pitches determined from the transformed audio information. Noise may be subtracted from the transformed audio information by interpolating across peak points and across trough points of harmonic pitch paths through the transformed audio information, and subtracting values associated with the trough point interpolations from values associated with the peak point interpolations. Noise between harmonics of the sound may be suppressed in the transformed audio information by centering functions at individual harmonics in the transformed audio information, the functions serving to suppress noise between the harmonics.

Type: Application

Filed: December 22, 2014

Publication date: April 23, 2015

Applicant: THE INTELLISIS CORPORATION

Inventors: David C. BRADLEY, Daniel S. GOLDIN, Robert N. HILTON, Nicholas K. FISHER, Rodney GATEAU
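
The noise-subtraction step in the Intellisis application above (interpolating across peak points and across trough points of a harmonic pitch path, then subtracting the trough interpolation from the peak interpolation) might look roughly like this. It is a sketch under stated assumptions: a pitch path is represented as a plain list of magnitudes, local maxima and minima stand in for the peak and trough points, and interpolation is piecewise linear. The names `interp` and `subtract_trough_envelope` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def interp(x: float, xs: Sequence[int], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation through (xs, ys), held constant outside the range."""
    if x <= xs[0]:
        return float(ys[0])
    if x >= xs[-1]:
        return float(ys[-1])
    i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def subtract_trough_envelope(path: Sequence[float]) -> List[float]:
    """Peak-point interpolation minus trough-point interpolation along one pitch path."""
    n = len(path)
    peaks = [i for i in range(1, n - 1) if path[i - 1] < path[i] >= path[i + 1]]
    troughs = [i for i in range(1, n - 1) if path[i - 1] > path[i] <= path[i + 1]]
    if not peaks or not troughs:
        return [0.0] * n  # too little structure to separate signal from noise
    peak_vals = [path[i] for i in peaks]
    trough_vals = [path[i] for i in troughs]
    return [interp(t, peaks, peak_vals) - interp(t, troughs, trough_vals) for t in range(n)]
```

The trough interpolation acts as a running noise-floor estimate, so the difference approximates the harmonic level with that floor removed.
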
Voice synthesis apparatus

Patent number: 8996378

Abstract: In a voice synthesis apparatus, a phoneme piece interpolator acquires first phoneme piece data corresponding to a first value of sound characteristic, and second phoneme piece data corresponding to a second value of the sound characteristic. The first and second phoneme piece data indicate a spectrum of each frame of a phoneme piece. The phoneme piece interpolator interpolates between each frame of the first phoneme piece data and each frame of the second phoneme piece data so as to create phoneme piece data of the phoneme piece corresponding to a target value of the sound characteristic which is different from either of the first and second values of the sound characteristic. A voice synthesizer generates a voice signal having the target value of the sound characteristic based on the created phoneme piece data.

Type: Grant

Filed: May 24, 2012

Date of Patent: March 31, 2015

Assignee: Yamaha Corporation

Inventors: Jordi Bonada, Merlijn Blaauw, Makoto Tachibana
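
The Yamaha abstract above describes interpolating frame by frame between two phoneme pieces recorded at different values of a sound characteristic. A minimal sketch, assuming both pieces already have the same number of frames, that each frame is a list of spectral values, and that plain linear weighting is used (the abstract does not specify the interpolation rule); `interpolate_phoneme_piece` is a hypothetical name.

```python
from typing import List, Sequence


def interpolate_phoneme_piece(
    frames_a: Sequence[Sequence[float]], value_a: float,
    frames_b: Sequence[Sequence[float]], value_b: float,
    target: float,
) -> List[List[float]]:
    """Frame-by-frame linear interpolation between two spectral frame sequences."""
    if len(frames_a) != len(frames_b):
        raise ValueError("this sketch assumes both pieces have the same frame count")
    if value_a == value_b:
        raise ValueError("the two sound-characteristic values must differ")
    w = (target - value_a) / (value_b - value_a)  # 0 at value_a, 1 at value_b
    return [
        [(1.0 - w) * a + w * b for a, b in zip(fa, fb)]
        for fa, fb in zip(frames_a, frames_b)
    ]
```

A target halfway between the two recorded values gives each output frame as the average of the corresponding input frames.
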
Coding and decoding a transient frame

Patent number: 8990094

Abstract: An electronic device for coding a transient frame is described. The electronic device includes a processor and executable instructions stored in memory that is in electronic communication with the processor. The electronic device obtains a current transient frame. The electronic device also obtains a residual signal based on the current transient frame. Additionally, the electronic device determines a set of peak locations based on the residual signal. The electronic device further determines whether to use a first coding mode or a second coding mode for coding the current transient frame based on at least the set of peak locations. The electronic device also synthesizes an excitation based on the first coding mode if the first coding mode is determined. The electronic device also synthesizes an excitation based on the second coding mode if the second coding mode is determined.

Type: Grant

Filed: September 8, 2011

Date of Patent: March 24, 2015

Assignee: QUALCOMM Incorporated

Inventors: Venkatesh Krishnan, Ananthapadmanabhan Arasanipalai Kandhadai
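
The Qualcomm abstract above chooses between two coding modes for a transient frame based on peak locations found in the residual. The sketch below shows one possible shape of that decision; the relative threshold, the periodicity test, and which mode number means what are all hypothetical assumptions, since the abstract does not give the decision rule. `find_peaks` and `choose_mode` are illustrative names, not an actual codec API.

```python
from typing import List, Sequence


def find_peaks(residual: Sequence[float], rel_threshold: float = 0.5) -> List[int]:
    """Indices of local maxima of |residual| at or above rel_threshold * max|residual|."""
    mags = [abs(x) for x in residual]
    if not mags or max(mags) == 0.0:
        return []
    floor = rel_threshold * max(mags)
    last = len(mags) - 1
    return [
        i for i in range(len(mags))
        if mags[i] >= floor
        and (i == 0 or mags[i] > mags[i - 1])
        and (i == last or mags[i] >= mags[i + 1])
    ]


def choose_mode(peaks: Sequence[int], tolerance: int = 2) -> int:
    """Hypothetical rule: mode 1 if peak spacing is nearly constant, otherwise mode 2."""
    if len(peaks) < 3:
        return 2
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return 1 if max(gaps) - min(gaps) <= tolerance else 2
```

Regularly spaced peaks suggest a pulse train that a periodic excitation model can follow, while irregular peaks suggest coding pulse positions more explicitly; this reading is an assumption, not taken from the patent.
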
System and method for controlling access to resources with a spoken CAPTCHA test

Patent number: 8868423

Abstract: Systems and methods for controlling access to resources using spoken CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) tests are disclosed. In these systems and methods, entities seeking access to resources are required to produce an input utterance that contains at least some audio. That utterance is compared with voice reference data for human and machine entities, and a determination is made as to whether the entity requesting access is a human or a machine. Access is then permitted or refused based on that determination.

Type: Grant

Filed: July 11, 2013

Date of Patent: October 21, 2014

Assignee: John Nicholas and Kristin Gross Trust

Inventor: John Nicholas Gross
Fraud detection using text analysis

Patent number: 8862461

Abstract: In one embodiment, a method executed by at least one processor includes receiving text submitted by a user. The method also includes determining a text score for the received text by comparing a first set of phrases included in the received text to a second set of phrases. The second set of phrases includes phrases from stored text. The stored text includes stored text known to be genuine and stored text known to be fraudulent. The method also includes determining that the received text is fraudulent based on the text score.

Type: Grant

Filed: November 30, 2011

Date of Patent: October 14, 2014

Assignee: Match.com, LP

Inventors: Aaron J. de Zeeuw, Clark T. Rothrock, Jason L. Alexander
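
The phrase-comparison scoring in the Match.com abstract above could be sketched as below. The abstract does not define "phrases" or the scoring formula, so this assumes word bigrams as phrases and scores the received text as the share of its matched phrases that come from known-fraudulent text; the names and the 0.5 threshold are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple


def phrases(text: str, n: int = 2) -> Set[Tuple[str, ...]]:
    """Lower-cased word n-grams (an assumed definition of a 'phrase')."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def text_score(text: str, genuine: Iterable[str], fraudulent: Iterable[str], n: int = 2) -> float:
    """Fraction of matched phrases that come from known-fraudulent text, in [0, 1]."""
    received = phrases(text, n)
    good = set().union(*(phrases(t, n) for t in genuine))
    bad = set().union(*(phrases(t, n) for t in fraudulent))
    hits_bad = len(received & bad)
    hits_good = len(received & good)
    total = hits_bad + hits_good
    return hits_bad / total if total else 0.0


def is_fraudulent(
    text: str, genuine: Iterable[str], fraudulent: Iterable[str], threshold: float = 0.5
) -> bool:
    return text_score(text, genuine, fraudulent) > threshold
```

Usage: with genuine text `"I love hiking and cooking"` and fraudulent text `"please send money via wire transfer"`, the message `"can you send money via wire"` shares three bigrams with the fraudulent set and none with the genuine set, so it is flagged.
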
Method and apparatus for encoding and decoding high frequency signal

Patent number: 8825476

Abstract: Provided are a method and apparatus for encoding and decoding a high frequency signal by using a low frequency signal. The high frequency signal can be encoded by extracting a coefficient through linear prediction of the high frequency signal and encoding the coefficient, generating a signal by using the extracted coefficient and the low frequency signal, and encoding the high frequency signal by calculating a ratio between the high frequency signal and an energy value of the generated signal. Also, the high frequency signal can be decoded by decoding the low frequency signal and a coefficient extracted through linear prediction of the high frequency signal, generating a signal by using the decoded coefficient and the decoded low frequency signal, and adjusting the generated signal by decoding a ratio between the generated signal and an energy value of the high frequency signal.

Type: Grant

Filed: April 8, 2013

Date of Patent: September 2, 2014

Assignee: Samsung Electronics Co., Ltd.

Inventors: Ki-hyun Choo, Lei Miao, Eun-mi Oh
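
The encode/decode symmetry in the Samsung abstract above (shape the low band with high-band linear prediction coefficients, then transmit an energy ratio so the decoder can rescale) can be sketched as follows. Assumptions, none confirmed by the abstract: the signal is generated by all-pole filtering the low-band signal with 1/A(z), the ratio is sent as an unquantized amplitude gain equal to the square root of the energy ratio, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def lpc_synthesize(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole filter 1/A(z), A(z) = 1 + a1 z^-1 + ...: y[n] = x[n] - sum_k a_k y[n-k]."""
    y: List[float] = []
    for n, x in enumerate(excitation):
        acc = float(x)
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def energy(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def encode_gain(high_band: Sequence[float], generated: Sequence[float]) -> float:
    """Encoder: gain that matches the generated signal's energy to the true high band."""
    e_gen = energy(generated)
    return math.sqrt(energy(high_band) / e_gen) if e_gen > 0.0 else 0.0


def apply_gain(generated: Sequence[float], gain: float) -> List[float]:
    """Decoder: rescale the locally regenerated high band by the transmitted gain."""
    return [gain * v for v in generated]
```

Because the decoder can regenerate the same shaped signal from the decoded coefficients and low band, only the coefficients and one gain per frame need to be sent for the high band.
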
Speech recognition repair using contextual information

Patent number: 8812316

Abstract: A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.

Type: Grant

Filed: June 5, 2014

Date of Patent: August 19, 2014

Assignee: Apple Inc.

Inventor: Lik Harry Chen
Bandwidth expansion method and apparatus

Patent number: 8805695

Abstract: A bandwidth expansion method and apparatus are disclosed, where the method includes: estimating a bandwidth of at least one decoded frame of a whole-band signal, so as to obtain an estimated bandwidth, where the estimated bandwidth corresponds to a whole-band signal that a decoded lower-band signal needs to be extended into; performing first predictive decoding on a part of the lower-band signal in a band above an effective bandwidth of the lower-band signal and below the estimated bandwidth, so as to obtain the part of the lower-band signal above the effective bandwidth of the lower-band signal and below the estimated bandwidth; and performing second predictive decoding on a part of the lower-band signal in a band above the estimated bandwidth, so as to obtain the part of the lower-band signal above the estimated bandwidth.

Type: Grant

Filed: July 22, 2013

Date of Patent: August 12, 2014

Assignee: Huawei Technologies Co., Ltd.

Inventors: Zexin Liu, Lei Miao
Speech recognition repair using contextual information

Patent number: 8762156

Abstract: A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.

Type: Grant

Filed: September 28, 2011

Date of Patent: June 24, 2014

Assignee: Apple Inc.

Inventor: Lik Harry Chen
Scaled window overlap add for mixed signals

Patent number: 8731913

Abstract: A method for overlap-adding signals useful for performing frame loss concealment (FLC) in an audio decoder as well as in other applications. The method uses a dynamic mix of windows to overlap two signals whose normalized cross-correlation may vary from zero to one. If the overlapping signals are decomposed into a correlated component and an uncorrelated component, they are overlap-added separately using the appropriate window, and then added together. If the overlapping signals are not decomposed, a weighted mix of windows is used. The mix is determined by a measure estimating the amount of cross-correlation between overlapping signals, or the relative amount of correlated to uncorrelated signals.

Type: Grant

Filed: April 13, 2007

Date of Patent: May 20, 2014

Assignee: Broadcom Corporation

Inventors: Robert W. Zopf, Juin-Hwey Chen
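
For the non-decomposed case in the Broadcom abstract above, a weighted mix of windows chosen by the cross-correlation between the overlapping signals, a minimal sketch might blend a linear cross-fade (amplitudes sum to one, suited to correlated signals) with a sine/cosine cross-fade (powers sum to one, suited to uncorrelated signals). The linear blending rule, clipping negative correlation to zero, and the function names are assumptions; the decomposed correlated/uncorrelated path is not shown.

```python
import math
from typing import List, Sequence, Tuple


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation at lag 0, clipped to [0, 1] (clipping is assumed)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return max(0.0, min(1.0, num / den)) if den > 0.0 else 0.0


def mixed_windows(length: int, rho: float) -> Tuple[List[float], List[float]]:
    """Fade-out and fade-in windows blended by the correlation measure rho in [0, 1]."""
    fade_out, fade_in = [], []
    for i in range(length):
        t = (i + 0.5) / length  # position strictly inside (0, 1)
        fade_in.append(rho * t + (1.0 - rho) * math.sin(0.5 * math.pi * t))
        fade_out.append(rho * (1.0 - t) + (1.0 - rho) * math.cos(0.5 * math.pi * t))
    return fade_out, fade_in


def overlap_add(old: Sequence[float], new: Sequence[float]) -> List[float]:
    """Cross-fade `old` into `new` over their common length using the mixed windows."""
    if len(old) != len(new):
        raise ValueError("overlap segments must have equal length")
    fade_out, fade_in = mixed_windows(len(old), correlation(old, new))
    return [o * wo + n * wi for o, n, wo, wi in zip(old, new, fade_out, fade_in)]
```

With rho = 1 the two windows sum to one at every sample, so identical segments pass through unchanged; with rho = 0 their squares sum to one, which keeps the expected power constant when uncorrelated segments are added.
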
Controllable prosody re-estimation system and method and computer program product thereof

Patent number: 8706493

Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. The prosody re-estimation module then re-estimates the predicted or estimated prosody information according to a set of controllable parameters provided by a controllable prosody parameter interface and produces new prosody information. The new prosody information is provided to the speech synthesis module to produce synthesized speech.

Type: Grant

Filed: July 11, 2011

Date of Patent: April 22, 2014

Assignee: Industrial Technology Research Institute

Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
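
One simple way to picture the re-estimation step in the ITRI abstract above is a pitch (F0) contour that is re-centered and stretched by user-facing control parameters. The two controls used here, a target mean and a range scale, are hypothetical examples of "controllable parameters"; the patent's actual parameter set and formula are not given in the abstract. Working in hertz rather than a log scale is also a simplifying assumption.

```python
from statistics import fmean
from typing import List, Optional, Sequence


def reestimate_pitch(
    predicted_hz: Sequence[float],
    target_mean_hz: Optional[float] = None,
    range_scale: float = 1.0,
) -> List[float]:
    """Re-estimate an F0 contour from two hypothetical control parameters.

    target_mean_hz moves the contour's center (None keeps the predicted mean);
    range_scale stretches (> 1) or compresses (< 1) excursions around the mean.
    """
    if not predicted_hz:
        return []
    mu = fmean(predicted_hz)
    center = mu if target_mean_hz is None else target_mean_hz
    return [center + range_scale * (f - mu) for f in predicted_hz]
```

For example, a predicted contour of 100, 120 and 140 Hz re-estimated with a target mean of 200 Hz and a range scale of 0.5 becomes 190, 200 and 210 Hz.
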
Audio signal interpolation device and audio signal interpolation method

Patent number: 8655663

Abstract: An audio signal interpolation device is presented, including an input unit for receiving an input audio signal, a phase splitting unit for splitting the input audio signal, a high range interpolation unit for interpolating a high range component into the signal, a phase combining unit for combining an in-phase component signal with a differential phase component, a high-pass filter for high-pass filtering the audio signal output by the phase combining unit, a delay unit for producing a delayed audio signal, and an addition processing unit for adding the delayed audio signal to the audio signal output from the high-pass filter.

Type: Grant

Filed: September 29, 2008

Date of Patent: February 18, 2014

Assignee: D&M Holdings, Inc.

Inventors: Masaki Matsuoka, Shigeki Namiki
Method and apparatus for digital up-down conversion using infinite impulse response filter

Patent number: 8626809

Abstract: A method and an apparatus for digital up-down conversion using an Infinite Impulse Response (IIR) filter are provided. The method for digital up-down conversion for frequency conversion in a mobile communication system using plural frequency converters includes: IIR-filtering an input signal by a magnitude response IIR filter that has the same magnitude response as in Finite Impulse Response (FIR) filtering and a stable filter coefficient calculated according to a Levinson polynomial; and receiving the IIR-filtered signal from the magnitude response IIR filter and performing IIR filtering on it by a phase compensation IIR filter having a filter coefficient that compensates the non-linear phase into a linear phase.

Type: Grant

Filed: February 24, 2010

Date of Patent: January 7, 2014

Assignees: Samsung Electronics Co., Ltd, Soongsil University

Inventors: Jun-Seok Yang, Won-Cheol Lee, Hyung-Min Jang
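
The Samsung abstract above relies on a "stable filter coefficient calculated according to a Levinson polynomial". The sketch below shows the standard Levinson-Durbin recursion, which, given a positive-definite autocorrelation sequence, yields reflection coefficients with magnitude below one and therefore a stable all-pole denominator A(z), followed by a direct-form I IIR filter that applies it. How the patent matches the FIR magnitude response and designs the phase compensation filter is not shown; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], List[float], float]:
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p from autocorrelation r[0..p].

    Returns (a, k, err): a[0] == 1, the reflection coefficients k, and the final
    prediction error. With a positive-definite autocorrelation every |k| < 1.
    """
    if r[0] <= 0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order
    ks: List[float] = []
    err = float(r[0])
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err
        ks.append(k)
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]  # order-update of the predictor
        a[m] = k
        err *= 1.0 - k * k
    return a, ks, err


def iir_filter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Direct-form I: a[0] y[n] = sum_i b[i] x[n-i] - sum_{j>=1} a[j] y[n-j]."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
        y.append(acc / a[0])
    return y
```

Usage: the autocorrelation `[1.0, 0.5, 0.25]` of a first-order autoregressive process gives `a == [1.0, -0.5, 0.0]`, and filtering a unit impulse through `1 / A(z)` produces the decaying response `[1.0, 0.5, 0.25, 0.125]`.
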
System and method for tracking sound pitch across an audio signal using harmonic envelope

Patent number: 8620646

Abstract: A system and method may be configured to analyze audio information derived from an audio signal. The system and method may track sound pitch across the audio signal. The tracking of pitch across the audio signal may take into account change in pitch by determining at individual time sample windows in the signal duration an estimated pitch and a representation of harmonic envelope at the estimated pitch. The estimated pitch and the representation of harmonic envelope may then be implemented to determine an estimated pitch for another time sample window in the signal duration with an enhanced accuracy and/or precision.

Type: Grant

Filed: August 8, 2011

Date of Patent: December 31, 2013

Assignee: The Intellisis Corporation

Inventors: David C. Bradley, Rodney Gateau, Daniel S. Goldin, Robert N. Hilton, Nicholas K. Fisher
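
The core idea in the Intellisis pitch-tracking abstract above, using the estimate from one time window to improve the estimate in the next, can be illustrated with a greedy continuity-constrained tracker. This simplified sketch uses only the previous window's pitch as context, not the harmonic envelope the patent describes; the per-window (pitch, salience) candidate lists, the octave-jump penalty, and the function names are all assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

Candidate = Tuple[float, float]  # (pitch_hz, salience score), hypothetical inputs


def _cost(cand: Candidate, prev_hz: Optional[float], jump_penalty: float) -> float:
    """Lower is better: high salience and a small jump (in octaves) from the previous pitch."""
    f, score = cand
    jump = 0.0 if prev_hz is None else abs(math.log2(f / prev_hz))
    return -score + jump_penalty * jump


def track_pitch(windows: Sequence[Sequence[Candidate]], jump_penalty: float = 1.0) -> List[float]:
    """Greedy left-to-right tracking: each window's choice conditions the next one."""
    track: List[float] = []
    prev: Optional[float] = None
    for cands in windows:
        best_hz, _ = min(cands, key=lambda c: _cost(c, prev, jump_penalty))
        track.append(best_hz)
        prev = best_hz
    return track
```

With candidates `[(200, 1.0), (100, 0.9)]` followed by `[(400, 1.0), (205, 0.8)]`, the tracker returns `[200.0, 205.0]`, avoiding the octave jump to 400 Hz that the raw salience alone would choose.
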
Speech content based packet loss concealment

Patent number: 8589166

Abstract: Systems and methods are described for performing packet loss concealment (PLC) to mitigate the effect of one or more lost frames within a series of frames that represent a speech signal. In accordance with the exemplary systems and methods, PLC is performed by searching a codebook of speech-related parameter profiles to identify content that is being spoken and by selecting a profile associated with the identified content for use in predicting or estimating speech-related parameter information associated with one or more lost frames of a speech signal. The predicted/estimated speech-related parameter information is then used to synthesize one or more frames to replace the lost frame(s) of the speech signal.

Type: Grant

Filed: September 21, 2010

Date of Patent: November 19, 2013

Assignee: Broadcom Corporation

Inventor: Robert W. Zopf
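
The codebook search in the Broadcom packet loss concealment abstract above can be sketched for a single speech-related parameter, pitch. Assumptions: each codebook entry is a stored pitch trajectory, matching uses squared error over the most recent received frames, and the values that follow the best match are taken as the prediction for the lost frames. The names and the fallback of repeating the last value are hypothetical.

```python
from typing import List, Sequence


def conceal_pitch(
    history: Sequence[float],
    profiles: Sequence[Sequence[float]],
    n_lost: int,
) -> List[float]:
    """Predict pitch for n_lost frames from the best-matching stored profile."""
    if not history:
        raise ValueError("need at least one received frame")
    h = len(history)
    best_err = float("inf")
    best_pred: List[float] = []
    for prof in profiles:
        # Slide the received history along every aligned segment of the profile.
        for start in range(len(prof) - h - n_lost + 1):
            seg = prof[start:start + h]
            err = sum((a - b) ** 2 for a, b in zip(seg, history))
            if err < best_err:
                best_err = err
                best_pred = list(prof[start + h:start + h + n_lost])
    if not best_pred:
        # No profile is long enough: fall back to repeating the last received value.
        return [float(history[-1])] * n_lost
    return best_pred
```

The predicted parameter values would then drive synthesis of the replacement frames; that synthesis step is outside this sketch.
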
Voice-interactive marketplace providing promotion and promotion tracking, loyalty reward and redemption, and other features

Publication number: 20130282378

Abstract: The invention provides a system, method, and business model for an information system and service having business self-promotion, promotion and promotion tracking, loyalty or frequent-participant rewards and redemption, audio coupons, ratings, and other features. Consumers call into a service offered by a business or organization using an ordinary telephone, PC, PDA, or other information appliance and make requests in plain speech for information on goods and/or services, and the service responds to each request in plain speech in real time.

Type: Application

Filed: August 1, 2005

Publication date: October 24, 2013

Inventors: Ahmet Alpdemir, Arthur James
METHOD AND APPARATUS FOR DECODING AN AUDIO SIGNAL USING AN ADAPTIVE CODEBOOK UPDATE

Publication number: 20130246068

Abstract: Disclosed are a method and apparatus for decoding an audio signal using an adaptive codebook update. The method for decoding an audio signal includes: receiving data of an (N+1)-th frame, which is a normal frame transmitted after an N-th frame whose data was lost; determining, by using the N-th frame and the (N+1)-th frame, whether or not the adaptive codebook of the final subframe of the N-th frame is to be updated; updating the adaptive codebook of the final subframe of the N-th frame by using the pitch index of the (N+1)-th frame; and synthesizing an audio signal by using the (N+1)-th frame.

Type: Application

Filed: September 28, 2011

Publication date: September 19, 2013

Applicant: Electronics and Telecommunications Research Institute

Inventor: Mi-Suk Lee
CAPTCHA using challenges optimized for distinguishing between humans and machines

Patent number: 8494854

Abstract: An audible-based electronic challenge system is used to control access to a computing resource by using a test to identify the origin of a voice. The test is based on analyzing a spoken utterance, using optimized challenge items selected for their discrimination capability, to determine whether it was articulated by an unauthorized human or a text-to-speech (TTS) system.

Type: Grant

Filed: June 15, 2009

Date of Patent: July 23, 2013

Assignee: John Nicholas and Kristin Gross

Inventor: John Nicholas Gross
