Neural Network Patents (Class 704/259)
-
Patent number: 12118323
Abstract: An approach for generating an optimized video of a speaker, translated from a source language into a target language, with the speaker's lips synchronized to the translated speech, while balancing the quality of the translation against the quality of the lip synchronization. A source video may be fed into a neural machine translation model. The model may synthesize a plurality of potential translations. The translations may be received by a generative adversarial network, which generates video for each translation and classifies the translations as in-sync or out-of-sync. A lip-syncing score may be generated for each of the generated videos that are classified as in-sync.
Type: Grant
Filed: September 23, 2021
Date of Patent: October 15, 2024
Assignee: International Business Machines Corporation
Inventors: Sathya Santhar, Sridevi Kannan, Sarbajit K. Rakshit, Samuel Mathew Jawaharlal
-
Patent number: 12033644
Abstract: Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
Type: Grant
Filed: September 20, 2021
Date of Patent: July 9, 2024
Assignee: SMULE, INC.
Inventors: Parag Chordia, Mark Godfrey, Alexander Rae, Prerna Gupta, Perry R. Cook
-
Patent number: 11847561
Abstract: Computer-implemented techniques can include obtaining, by a client computing device, a digital media item and a request for a processing task on the digital media item, and determining a set of operating parameters based on (i) available computing resources at the client computing device and (ii) a condition of a network. Based on the set of operating parameters, the client computing device or a server computing device can select one of a plurality of artificial neural networks (ANNs), each ANN defining which portions of the processing task are to be performed by the client and server computing devices. The client and server computing devices can coordinate processing of the processing task according to the selected ANN. The client computing device can also obtain final processing results corresponding to a final evaluation of the processing task and generate an output based on the final processing results.
Type: Grant
Filed: November 25, 2020
Date of Patent: December 19, 2023
Assignee: GOOGLE LLC
Inventors: Matthew Sharifi, Jakob Nicolaus Foerster
-
Patent number: 11842722
Abstract: Disclosed is a speech synthesis method including: acquiring fundamental frequency information and acoustic feature information from original speech; generating an impulse train from the fundamental frequency information and inputting it to a harmonic time-varying filter; inputting the acoustic feature information into a neural network filter estimator to obtain corresponding impulse response information; generating a noise signal by a noise generator; determining, by the harmonic time-varying filter, harmonic component information through filtering processing on the impulse train and the impulse response information; determining, by a noise time-varying filter, noise component information based on the impulse response information and the noise signal; and generating a synthesized speech from the harmonic component information and the noise component information.
Type: Grant
Filed: June 9, 2021
Date of Patent: December 12, 2023
Assignee: AI SPEECH CO., LTD.
Inventors: Kai Yu, Zhijun Liu, Kuan Chen
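A minimal NumPy sketch of the harmonic-plus-noise source-filter idea this abstract describes: an impulse train is built from F0, filtered for the harmonic component, and summed with filtered noise. The function names, filter lengths, and toy impulse responses are hypothetical stand-ins for the patent's neural filter estimator, not its actual implementation.

```python
import numpy as np

def impulse_train(f0_hz: float, duration_s: float, sr: int = 16000) -> np.ndarray:
    """Place unit impulses at pitch-period intervals derived from F0."""
    n = int(duration_s * sr)
    train = np.zeros(n)
    period = int(sr / f0_hz)              # samples per pitch period
    train[::period] = 1.0
    return train

def synthesize(f0_hz, duration_s, harmonic_ir, noise_ir, sr=16000, rng=None):
    """Harmonic part: filtered impulse train. Noise part: filtered white noise.
    In the patent the impulse responses come from a neural estimator conditioned
    on acoustic features; here they are simply given arrays."""
    rng = rng or np.random.default_rng(0)
    excitation = impulse_train(f0_hz, duration_s, sr)
    noise = rng.standard_normal(excitation.shape[0])
    harmonic = np.convolve(excitation, harmonic_ir, mode="same")
    noisy = np.convolve(noise, noise_ir, mode="same")
    return harmonic + noisy               # summed components form the waveform

# Toy impulse responses standing in for the neural filter estimator's output.
speech = synthesize(120.0, 0.5, harmonic_ir=np.hanning(64), noise_ir=np.ones(8) / 8)
```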
-
Patent number: 11816565
Abstract: Methods and apparatus are disclosed for interpreting a deep neural network (DNN) using a Semantic Coherence Analysis (SCA)-based interpretation technique. In embodiments, a multi-layered DNN that was trained for one task is analyzed using the SCA technique to select one layer in the DNN that produces salient features for another task. In embodiments, the DNN layers are tested with test samples labeled with a set of concept labels. The output features of a DNN layer are gathered and analyzed according to the concepts. In embodiments, the output is scored with a semantic coherence score, which indicates how well the layer separates the concepts, and one layer is selected from the DNN based on its semantic coherence score. In some embodiments, a support vector machine (SVM) or additional neural network may be added to the selected layer and trained to generate classification results based on the outputs of the selected layer.
Type: Grant
Filed: February 17, 2020
Date of Patent: November 14, 2023
Assignee: Apple Inc.
Inventors: Moussa Doumbouya, Xavier Suau Cuadros, Luca Zappella, Nicholas E. Apostoloff
-
Patent number: 11769482
Abstract: The present disclosure provides a method and apparatus of synthesizing a speech, a method and apparatus of training a speech synthesis model, an electronic device, and a storage medium. The method of synthesizing a speech includes: acquiring style information of the speech to be synthesized, tone information of the speech to be synthesized, and content information of a text to be processed; generating acoustic feature information of the text to be processed, by using a pre-trained speech synthesis model, based on the style information, the tone information, and the content information; and synthesizing the speech for the text to be processed based on the acoustic feature information.
Type: Grant
Filed: September 29, 2021
Date of Patent: September 26, 2023
Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
Inventors: Wenfu Wang, Tao Sun, Xilei Wang, Junteng Zhang, Zhengkun Gao, Lei Jia
-
Patent number: 11670292
Abstract: An electronic device comprising circuitry configured to perform voice enhancement based on a transcript to obtain an enhanced audio signal.
Type: Grant
Filed: February 6, 2020
Date of Patent: June 6, 2023
Assignee: SONY CORPORATION
Inventors: Fabien Cardinaux, Marc Ferras Font
-
Patent number: 11625880
Abstract: According to a first aspect of this specification, there is described a computer-implemented method of tagging video frames. The method comprises generating, using a frame tagging model, a tag for each of a plurality of frames of an animation sequence. The frame tagging model comprises a first neural network portion configured to process, for each frame of the plurality of frames, a plurality of features associated with the frame and generate an encoded representation for the frame. The frame tagging model further comprises a second neural network portion configured to receive input comprising the encoded representations of each frame and generate output indicative of a tag for each of the plurality of frames.
Type: Grant
Filed: February 9, 2021
Date of Patent: April 11, 2023
Assignee: Electronic Arts Inc.
Inventor: Elaheh Akhoundi
-
Patent number: 11449790
Abstract: A computer-implemented method for controlling a device based on an ensemble model can include: receiving sensing information associated with a user's biometric state; inputting first sensing information to a first model, determining a first uncertainty of the first model, and generating a first weight value for weighting a first result value; inputting second sensing information into a second model, determining a second uncertainty of the second model, and generating a second weight value for weighting a second result value; generating a final result value based on combining the first result value weighted by the first weight value and the second result value weighted by the second weight value; generating a predicted biometric state of the user based on the final result value; and executing an operation of the device based on the predicted biometric state.
Type: Grant
Filed: October 23, 2018
Date of Patent: September 20, 2022
Assignee: LG ELECTRONICS INC.
Inventors: Gyuseog Hong, Taehwan Kim, Byunghun Choi
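The weighting scheme described above can be illustrated with a small sketch. This is illustrative only: the abstract does not specify how weights are derived from uncertainties, so the inverse-uncertainty weighting below (and every function name) is an assumption.

```python
import numpy as np

def ensemble_predict(x1, x2, model1, model2, uncertainty1, uncertainty2):
    """Weight each model's result by the inverse of its estimated uncertainty,
    so the more confident model dominates the combined prediction."""
    r1, u1 = model1(x1), uncertainty1(x1)
    r2, u2 = model2(x2), uncertainty2(x2)
    w1, w2 = 1.0 / u1, 1.0 / u2
    return (w1 * r1 + w2 * r2) / (w1 + w2)   # normalized weighted combination

# Toy stand-ins: two "biometric" regressors with fixed uncertainty estimates.
final = ensemble_predict(
    x1=np.array([0.2]), x2=np.array([0.7]),
    model1=lambda x: 60 + 40 * x, model2=lambda x: 58 + 44 * x,
    uncertainty1=lambda x: 0.5, uncertainty2=lambda x: 2.0,
)
```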
-
Patent number: 11393453
Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with a corresponding prosodic syllable embedding for the syllable; predicting a pitch contour of the syllable based on the predicted duration for the syllable; and generating a plurality of fixed-length predicted pitch frames based on the predicted duration for the syllable. Each fixed-length predicted pitch frame represents part of the predicted pitch contour of the syllable.
Type: Grant
Filed: January 13, 2021
Date of Patent: July 19, 2022
Assignee: Google LLC
Inventors: Robert Clark, Chun-an Chan, Vincent Wan
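A minimal sketch of the duration-to-frames step: a predicted syllable duration is quantized into fixed-length pitch frames, each sampling the predicted contour. The 5 ms frame length and all names are assumptions, not values from the patent.

```python
import numpy as np

FRAME_MS = 5.0  # each predicted pitch frame covers a fixed 5 ms (assumed value)

def pitch_frames(duration_ms: float, contour) -> np.ndarray:
    """Turn a syllable's predicted duration and pitch contour into a sequence
    of fixed-length pitch frames: one contour sample per frame."""
    n_frames = max(1, round(duration_ms / FRAME_MS))
    # Evaluate the contour at each frame's relative position in [0, 1].
    positions = (np.arange(n_frames) + 0.5) / n_frames
    return contour(positions)

# Toy contour: pitch falls linearly from 220 Hz to 180 Hz across the syllable.
frames = pitch_frames(120.0, lambda t: 220.0 - 40.0 * t)  # -> 24 frames
```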
-
Patent number: 11335322
Abstract: The present technology relates to a learning device, a learning method, a voice synthesis device, and a voice synthesis method configured so that information can be provided via voice whose contents are easy for the user being addressed to understand. A learning device according to one embodiment of the present technology performs voice recognition of the speech of a plurality of users, estimates the statuses under which each utterance is made, and learns, on the basis of the speech voice data, the voice recognition results, and the statuses when the speech is made, voice synthesis data to be used for generating synthesized voice according to the statuses at synthesis time. Moreover, a voice synthesis device estimates statuses, and uses the voice synthesis data to generate synthesized voice that conveys the contents of predetermined text data in a manner suited to the estimated statuses. The present technology can be applied to an agent device.
Type: Grant
Filed: February 27, 2018
Date of Patent: May 17, 2022
Assignee: SONY CORPORATION
Inventors: Hiro Iwase, Mari Saito, Shinichi Kawano
-
Patent number: 11322135
Abstract: An example system includes a processor to receive a linguistic sequence and a prosody info offset. The processor can generate, via a trained prosody info predictor, combined prosody info including a number of observations based on the linguistic sequence. The observations include linear combinations of statistical measures evaluating a prosodic component over a predetermined period of time. The processor can generate, via a trained neural network, an acoustic sequence based on the combined prosody info, the prosody info offset, and the linguistic sequence.
Type: Grant
Filed: September 12, 2019
Date of Patent: May 3, 2022
Assignee: International Business Machines Corporation
Inventor: Vyacheslav Shechtman
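A hypothetical sketch of what "linear combinations of statistical measures of a prosodic component" plus an offset might look like for F0. The specific statistics, combination weights, and offset semantics are assumptions for illustration.

```python
import numpy as np

def prosody_observations(f0: np.ndarray) -> np.ndarray:
    """Statistical measures of a prosodic component (here F0) over a window,
    plus one linear combination of them, as the 'observations'."""
    mean, std = f0.mean(), f0.std()
    slope = np.polyfit(np.arange(len(f0)), f0, 1)[0]   # linear trend per frame
    combined = 0.5 * mean + 0.3 * std + 0.2 * slope    # example combination
    return np.array([mean, std, slope, combined])

f0_window = np.array([210.0, 214.0, 220.0, 218.0, 212.0])
obs = prosody_observations(f0_window)
offset = np.array([5.0, 0.0, 0.0, 0.0])   # user-supplied prosody info offset
conditioned = obs + offset                 # what the acoustic model would see
```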
-
Patent number: 11289068
Abstract: The disclosure provides a method, an apparatus, a device, and a computer-readable storage medium for speech synthesis in parallel. The method includes: splitting a piece of text into a plurality of segments; and, based on the piece of text, obtaining a plurality of initial hidden states of the plurality of segments for a recurrent neural network. The method further includes synthesizing the plurality of segments in parallel based on the plurality of initial hidden states and input features of the plurality of segments.
Type: Grant
Filed: May 14, 2020
Date of Patent: March 29, 2022
Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
Inventors: Wenfu Wang, Chenxi Sun, Tao Sun, Xi Chen, Guibin Wang, Lei Jia
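The key idea is that if each segment gets its own predicted initial hidden state, the recurrent network no longer needs the previous segment's final state, so segments can run concurrently. A toy sketch, assuming a plain tanh RNN cell and randomly generated stand-in states:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

H = 8  # hidden size (toy)

def rnn_segment(inputs: np.ndarray, h0: np.ndarray, W, U) -> np.ndarray:
    """Run a simple tanh RNN over one segment, starting from its predicted
    initial hidden state instead of the previous segment's final state."""
    h, outs = h0, []
    for x in inputs:
        h = np.tanh(W @ x + U @ h)
        outs.append(h)
    return np.stack(outs)

rng = np.random.default_rng(0)
W, U = rng.normal(size=(H, 4)) * 0.1, rng.normal(size=(H, H)) * 0.1
segments = [rng.normal(size=(6, 4)) for _ in range(3)]      # 3 text segments
init_states = [rng.normal(size=H) * 0.1 for _ in segments]  # predicted h0 per segment

# Because each segment has its own initial state, they can run in parallel.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda s: rnn_segment(s[0], s[1], W, U),
                            zip(segments, init_states)))
```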
-
Patent number: 11264010
Abstract: A method for providing a frame-based mel spectral representation of speech includes receiving a text utterance having at least one word, and selecting a mel spectral embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. For each phoneme, using the selected mel spectral embedding, the method also includes: predicting a duration of the corresponding phoneme by encoding linguistic features of the corresponding phoneme with a corresponding syllable embedding for the syllable that includes the corresponding phoneme; and generating a plurality of fixed-length predicted mel-frequency spectrogram frames based on the predicted duration for the corresponding phoneme. Each fixed-length predicted mel-frequency spectrogram frame represents mel-spectral information of the corresponding phoneme.
Type: Grant
Filed: November 8, 2019
Date of Patent: March 1, 2022
Assignee: Google LLC
Inventors: Robert Andrew James Clark, Chun-an Chan, Vincent Ping Leung Wan
-
Patent number: 11238899
Abstract: A computer system configured to generate an audio description of a media file is provided. The system includes a display, a memory, and a processor coupled to the display and the memory. The memory stores a media file, including video data that is accessible via a time index and audio data synchronized with the video data by the time index, as well as a transcript of the audio data, including transcription data synchronized with the video data via the time index. The processor is configured to: render, via the display, images from portions of the video data; render text from portions of the transcription data in synchrony with the images; receive input identifying a point within the time index; receive input specifying audio description data to associate with the point; store, in the memory, the audio description data; and store an association between the audio description data and the point.
Type: Grant
Filed: January 10, 2020
Date of Patent: February 1, 2022
Assignee: 3Play Media Inc.
Inventors: Joshua Miller, Christopher S. Antunes, Lily Megan Berthold-Bond, Andrew H. Schwartz, Kelly J. Savietta, Lanya Lee Butler, Sharon Lee Tomasulo, Jeremy E. Barron, Christopher E. Johnson, Roger S. Zimmerman
-
Patent number: 11080520
Abstract: Computer-implemented techniques are provided for machine recognition of gestures and transformation of recognized gestures to text or speech, for gestures communicated in a sign language such as American Sign Language (ASL). In an embodiment, a computer-implemented method comprises: storing a training dataset comprising a plurality of digital images of sign language gestures and an alphabetical letter assigned to each digital image of the plurality of digital images; training a neural network using the plurality of digital images of sign language gestures as input and the alphabetical letter assigned to each digital image as output; receiving a particular digital image comprising a particular sign language gesture; and using the trained neural network to classify the particular digital image as a particular alphabetical letter.
Type: Grant
Filed: June 26, 2019
Date of Patent: August 3, 2021
Assignees: ATLASSIAN PTY LTD., ATLASSIAN INC.
Inventor: Xiaoyuan Gu
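To make the image-to-letter training loop concrete, here is a deliberately tiny sketch: a single softmax layer trained with cross-entropy on flattened toy "images" stands in for the patent's neural network, and all data and dimensions are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LETTERS, IMG = 26, 16 * 16          # 26 letter classes, flattened 16x16 images

# Toy training set standing in for labeled gesture photos.
images = rng.normal(size=(520, IMG))
labels = np.repeat(np.arange(N_LETTERS), 20)

W = np.zeros((IMG, N_LETTERS))        # single softmax layer as the "network"
for _ in range(100):                  # plain gradient-descent training loop
    logits = images @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(labels)), labels] -= 1.0          # dL/dlogits for CE loss
    W -= 0.01 * images.T @ p / len(labels)

def classify(img: np.ndarray) -> str:
    """Map one gesture image to its predicted alphabetical letter."""
    return chr(ord("A") + int(np.argmax(img @ W)))
```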
-
Patent number: 11030394
Abstract: A keyphrase extraction service implements techniques for determining a set of keyphrases associated with a set of words. A word is selected from the set of words, and a neural model is used to determine a label for the word based on features of the word and labels corresponding to other words of the set of words. The set of keyphrases is determined from the labels associated with the set of words.
Type: Grant
Filed: May 4, 2017
Date of Patent: June 8, 2021
Assignee: Amazon Technologies, Inc.
Inventors: Zornitsa Petrova Kozareva, Sheng Zha, Hyokun Yun
-
Patent number: 10978042
Abstract: The present disclosure discloses a method and apparatus for generating a speech synthesis model. A specific embodiment of the method comprises: acquiring a text characteristic of a text and an acoustic characteristic of a speech corresponding to the text, both used for training a neural network corresponding to a speech synthesis model, where fundamental frequency data in the acoustic characteristic is extracted through a fundamental frequency data extraction model, and the fundamental frequency data extraction model is generated by pre-training a neural network corresponding to the fundamental frequency data extraction model on speech in which each frame has corresponding fundamental frequency data; and training the neural network corresponding to the speech synthesis model using the text characteristic of the text and the acoustic characteristic of the speech corresponding to the text.
Type: Grant
Filed: August 3, 2018
Date of Patent: April 13, 2021
Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
Inventor: Hao Li
-
Patent number: 10971170
Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
Type: Grant
Filed: August 8, 2018
Date of Patent: April 6, 2021
Assignee: Google LLC
Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
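A sketch of the per-time-step pipeline: decoder produces a mel frame, vocoder turns that frame into a distribution over quantized samples, and one sample is drawn. Both networks are replaced here by trivial stubs; dimensions, names, and the 256-level quantization are assumptions, not the patent's models.

```python
import numpy as np

rng = np.random.default_rng(0)
N_MELS, N_LEVELS = 80, 256   # mel bins; 8-bit-style quantized sample levels

def decoder_step(char_repr: np.ndarray) -> np.ndarray:
    """Stub decoder network: map a character representation to one mel frame."""
    return np.tanh(char_repr.mean()) * np.ones(N_MELS)

def vocoder_step(mel: np.ndarray) -> np.ndarray:
    """Stub vocoder network: map a mel frame to a distribution over samples."""
    logits = rng.normal(size=N_LEVELS) + mel.mean()
    p = np.exp(logits - logits.max())
    return p / p.sum()

audio = []
for t in range(100):                           # one audio sample per time step
    char_repr = rng.normal(size=16)            # stand-in for encoded characters
    dist = vocoder_step(decoder_step(char_repr))
    audio.append(rng.choice(N_LEVELS, p=dist)) # sample from the distribution
```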
-
Patent number: 10971131
Abstract: The present disclosure discloses a method and apparatus for generating a speech synthesis model. A specific embodiment of the method comprises: acquiring a plurality of types of training samples, each type including a text of that type and a speech of the text, read by an announcer corresponding to that type in the style of speech corresponding to that type; and training a neural network corresponding to a speech synthesis model using the plurality of types of training samples and an annotation of the style of speech in each of the training samples to obtain the speech synthesis model, the speech synthesis model being used to synthesize speech of the announcer corresponding to each of the plurality of types in a plurality of styles.
Type: Grant
Filed: August 3, 2018
Date of Patent: April 6, 2021
Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
Inventor: Yongguo Kang
-
Patent number: 10963781
Abstract: In one embodiment, an audio signal for an audio track is received and segmented into a plurality of segments of the audio signal. The plurality of segments of audio are input into a classification network that is configured to predict output values based on a plurality of genre and mood combinations formed from different combinations of a plurality of genres and a plurality of moods. The classification network predicts a set of output values for the plurality of segments, each of the set of output values corresponding to one or more of the plurality of genre and mood combinations. One or more of the plurality of genre and mood combinations are assigned to the audio track based on the set of output values for one or more of the plurality of segments.
Type: Grant
Filed: August 14, 2017
Date of Patent: March 30, 2021
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Oren Barkan, Noam Koenigstein, Nir Nice
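A minimal sketch of the segment-then-pool flow: the track is split into fixed-length segments, a (stubbed) network scores each segment against every genre-mood combination, and combinations whose pooled score clears a threshold are assigned to the track. The genres, moods, pooling rule, and threshold are all illustrative assumptions.

```python
import numpy as np

GENRES = ["rock", "jazz", "hiphop"]
MOODS = ["happy", "calm", "dark"]
COMBOS = [(g, m) for g in GENRES for m in MOODS]   # 9 genre-mood classes

def segment(signal: np.ndarray, seg_len: int) -> np.ndarray:
    """Split the track into fixed-length segments for the classifier."""
    n = len(signal) // seg_len
    return signal[: n * seg_len].reshape(n, seg_len)

def assign_combos(seg_scores: np.ndarray, threshold: float = 0.5):
    """Average per-segment scores over the track, then keep every genre-mood
    combination whose pooled score clears the threshold."""
    pooled = seg_scores.mean(axis=0)
    return [c for c, s in zip(COMBOS, pooled) if s >= threshold]

rng = np.random.default_rng(0)
segs = segment(rng.normal(size=44100 * 30), seg_len=44100)  # 30 s track, 1 s segments
scores = rng.random(size=(len(segs), len(COMBOS)))          # stub network output
print(assign_combos(scores))
```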
-
Patent number: 10923107
Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with a corresponding prosodic syllable embedding for the syllable; predicting a pitch contour of the syllable based on the predicted duration for the syllable; and generating a plurality of fixed-length predicted pitch frames based on the predicted duration for the syllable. Each fixed-length predicted pitch frame represents part of the predicted pitch contour of the syllable.
Type: Grant
Filed: April 12, 2019
Date of Patent: February 16, 2021
Assignee: Google LLC
Inventors: Robert Clark, Chun-an Chan, Vincent Wan
-
Patent number: 10770063
Abstract: Techniques for a recursive deep-learning approach to performing speech synthesis using a repeatable structure that splits an input tensor into a left half and a right half, similar to the operation of the Fast Fourier Transform, performs a 1-D convolution on each respective half, performs a summation, and then applies a post-processing function. The repeatable structure may be utilized in a series configuration to operate as a vocoder or perform other speech processing functions.
Type: Grant
Filed: August 22, 2018
Date of Patent: September 8, 2020
Assignees: Adobe Inc., The Trustees of Princeton University
Inventors: Zeyu Jin, Gautham J. Mysore, Jingwan Lu, Adam Finkelstein
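A toy sketch of one such repeatable block, stacked in series: split the input in half as an FFT butterfly stage does, convolve each half, sum, and post-process. The tanh post-processing, kernel sizes, and random weights are stand-in assumptions.

```python
import numpy as np

def conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """1-D convolution that keeps the input length."""
    return np.convolve(x, kernel, mode="same")

def fft_like_block(x: np.ndarray, k_left, k_right) -> np.ndarray:
    """One repeatable unit: split the input into halves (as the FFT does),
    convolve each half, sum the results, then apply a nonlinearity."""
    left, right = x[: len(x) // 2], x[len(x) // 2 :]
    y = conv1d(left, k_left) + conv1d(right, k_right)
    return np.tanh(y)                      # stand-in post-processing function

rng = np.random.default_rng(0)
x = rng.normal(size=1024)
for _ in range(4):                         # stack blocks in series, vocoder-style
    x = fft_like_block(x, rng.normal(size=9) * 0.3, rng.normal(size=9) * 0.3)
```

Note the FFT-like halving: each stage shrinks the signal by half (1024 to 512 to 256, and so on), mirroring the divide-and-combine recursion the abstract alludes to.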
-
Patent number: 10643109
Abstract: Disclosed are a method and a system for automatically classifying data expressed as a plurality of factors with values of a text word and a symbol sequence by using deep learning. The method comprises the steps of: inputting the data expressed by the plurality of factors into a first model which, for each factor constituting the data, expresses a word vector including sequence information of the factor through sequence learning of the words corresponding to the factor; inputting an output of the first model into a second model which calculates a score for each category for classifying the data by using the word vector including the sequence information of the factor; and determining at least one category for the data by using the score for each category.
Type: Grant
Filed: April 2, 2018
Date of Patent: May 5, 2020
Assignee: NAVER Corporation
Inventors: Jung Woo Ha, Hyuna Pyo, Jeong Hee Kim
-
Patent number: 10235991
Abstract: A hybrid approach using frame-, phone-, diphone-, morpheme-, and word-level Deep Neural Networks (DNNs) in model training and applications is based on training a regular ASR system, which can be based on Gaussian Mixture Models (GMMs) or DNNs. All the training data (in the form of features) are aligned with the transcripts in terms of phonemes and words with timing information, and new features are formed in terms of phonemes, diphones, morphemes, and up to words. Regular ASR produces a result lattice with timing information for each word. A feature is then extracted and sent to the word-level DNN for scoring. Phoneme features are sent to corresponding DNNs for training. Scores are combined to form word-level scores, a rescored lattice, and a new recognition result.
Type: Grant
Filed: August 9, 2017
Date of Patent: March 19, 2019
Assignee: AppTek, Inc.
Inventors: Jintao Jiang, Hassan Sawaf, Mudar Yaghi
-
Patent number: 9626575
Abstract: In an approach for visual liveness detection, a video-audio signal related to a speaker speaking a text is obtained. The video-audio signal is split into a video signal, which records images of the speaker, and an audio signal, which records the speech spoken by the speaker. Then a first sequence indicating visual mouth openness is obtained from the video signal, and a second sequence indicating acoustic mouth openness is obtained based on the text and the audio signal. Synchrony between the first and second sequences is measured, and the liveness of the speaker is determined based on the synchrony.
Type: Grant
Filed: August 7, 2015
Date of Patent: April 18, 2017
Assignee: International Business Machines Corporation
Inventors: Min Li, Wen Liu, Yong Qin, Zhong Su, Shi Lei Zhang, Shiwan Zhao
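One simple way to measure synchrony between the two openness sequences is a correlation score; the Pearson correlation and the 0.7 threshold below are assumptions for illustration, not the patent's actual measure.

```python
import numpy as np

def synchrony(visual_openness: np.ndarray, acoustic_openness: np.ndarray) -> float:
    """Pearson correlation between the mouth-openness sequence seen in video
    and the one inferred from audio; high correlation suggests a live speaker."""
    v = (visual_openness - visual_openness.mean()) / visual_openness.std()
    a = (acoustic_openness - acoustic_openness.mean()) / acoustic_openness.std()
    return float(np.mean(v * a))

def is_live(v: np.ndarray, a: np.ndarray, threshold: float = 0.7) -> bool:
    return synchrony(v, a) >= threshold

t = np.linspace(0, 2 * np.pi, 50)
noise = 0.1 * np.random.default_rng(0).normal(size=50)
live = is_live(np.sin(t), np.sin(t) + noise)   # in-sync sequences -> True
```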
-
Patent number: 9542929
Abstract: Systems and methods for providing non-lexical cues in synthesized speech are described herein. Original text is analyzed to determine characteristics of the text and/or to derive or augment an intent (e.g., an intent code). Non-lexical cue insertion points are determined based on the characteristics of the text and/or the intent. One or more non-lexical cues are inserted at insertion points to generate augmented text. The augmented text is synthesized into speech, including converting the non-lexical cues to speech output.
Type: Grant
Filed: September 26, 2014
Date of Patent: January 10, 2017
Assignee: INTEL CORPORATION
Inventors: Jessica M. Christian, Peter Graff, Crystal A. Nakatsu, Beth Ann Hockey
-
Patent number: 9460711
Abstract: Methods and systems for processing multilingual DNN acoustic models are described. An example method may include receiving training data that includes a respective training data set for each of two or more languages. A multilingual deep neural network (DNN) acoustic model may be processed based on the training data. The multilingual DNN acoustic model may include a feedforward neural network having multiple layers of one or more nodes. Each node of a given layer may connect with a respective weight to each node of a subsequent layer, and the multiple layers of one or more nodes may include one or more shared hidden layers of nodes and a language-specific output layer of nodes corresponding to each of the two or more languages. Additionally, weights associated with the multiple layers of one or more nodes of the processed multilingual DNN acoustic model may be stored in a database.
Type: Grant
Filed: April 15, 2013
Date of Patent: October 4, 2016
Assignee: Google Inc.
Inventors: Vincent Olivier Vanhoucke, Jeffrey Adgate Dean, Georg Heigold, Marc'aurelio Ranzato, Matthieu Devin, Patrick An Phu Nguyen, Andrew William Senior
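The shared-trunk / per-language-head layout can be sketched in a few lines. Layer widths, phone-inventory sizes, and random weights below are fabricated for illustration; only the topology (shared hidden layers, language-specific output layers) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID = 40, 64                      # feature dim; shared hidden width

# Shared hidden layers: trained on data from every language.
shared = [rng.normal(size=(D_IN, D_HID)) * 0.1,
          rng.normal(size=(D_HID, D_HID)) * 0.1]

# One output layer per language, each with its own phone inventory size.
heads = {"en": rng.normal(size=(D_HID, 42)) * 0.1,
         "fr": rng.normal(size=(D_HID, 38)) * 0.1}

def forward(features: np.ndarray, language: str) -> np.ndarray:
    """Run the shared trunk, then the language-specific softmax head."""
    h = features
    for W in shared:
        h = np.maximum(h @ W, 0.0)        # ReLU hidden layers
    logits = h @ heads[language]
    p = np.exp(logits - logits.max())
    return p / p.sum()

posterior = forward(rng.normal(size=D_IN), "fr")
```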
-
Patent number: 9135231
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for customizing the punctuation style of a transcription. A method includes: receiving an utterance from a user; obtaining an unpunctuated transcription of the utterance; identifying an instance within the unpunctuated transcription where a punctuation mark may be placed; identifying, using data associated with the user, one or more past instances that are similar to the identified instance; punctuating the unpunctuated transcription based at least on the one or more past instances; and presenting the punctuated transcription to the user.
Type: Grant
Filed: January 31, 2013
Date of Patent: September 15, 2015
Assignee: Google Inc.
Inventors: Hugo B. Barra, Robert W. Hamilton
-
Publication number: 20150073804
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing a representation based on structured data in resources. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance, and synthesizing speech based on the selected acoustic sample.
Type: Application
Filed: September 6, 2013
Publication date: March 12, 2015
Applicant: Google Inc.
Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
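The selection step reduces to a nearest-neighbor search: pick the stored sample whose features lie closest to the network's predicted target. A minimal sketch, assuming Euclidean distance (the abstract does not name the metric) and random stand-in features:

```python
import numpy as np

def select_sample(target: np.ndarray, stored: np.ndarray) -> int:
    """Pick the stored acoustic sample whose features lie closest (Euclidean)
    to the network-predicted target features."""
    distances = np.linalg.norm(stored - target, axis=1)
    return int(np.argmin(distances))

rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 13))   # features of stored acoustic samples
target = rng.normal(size=13)             # neural network's predicted features
best = select_sample(target, database)   # index of the sample to synthesize with
```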
-
Patent number: 8949129
Abstract: A method and apparatus are provided for processing a set of communicated signals associated with a set of muscles, such as the muscles near the larynx of the person, or any other muscles the person uses to achieve a desired response. The method includes the steps of attaching a single integrated sensor, for example, near the throat of the person proximate to the larynx, and detecting an electrical signal through the sensor. The method further includes the steps of extracting features from the detected electrical signal and continuously transforming them into speech sounds without the need for further modulation. The method also includes comparing the extracted features to a set of prototype features and selecting the prototype feature that provides the smallest relative difference.
Type: Grant
Filed: August 12, 2013
Date of Patent: February 3, 2015
Assignee: Ambient Corporation
Inventors: Michael Callahan, Thomas Coleman
-
Publication number: 20140358547
Abstract: Systems and methods for prosody prediction include extracting features from runtime data using a parametric model. The features from runtime data are compared with features from training data using an exemplar-based model to predict prosody of the runtime data. The features from the training data are paired with exemplars from the training data and stored on a computer-readable storage medium.
Type: Application
Filed: September 17, 2013
Publication date: December 4, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Raul Fernandez, Asaf Rendel
-
Publication number: 20140358546
Abstract: Systems and methods for prosody prediction include extracting features from runtime data using a parametric model. The features from runtime data are compared with features from training data using an exemplar-based model to predict prosody of the runtime data. The features from the training data are paired with exemplars from the training data and stored on a computer-readable storage medium.
Type: Application
Filed: August 28, 2013
Publication date: December 4, 2014
Applicant: International Business Machines Corporation
Inventors: Raul Fernandez, Asaf Rendel
-
Method and apparatus of transforming speech feature vectors using an auto-associative neural network
Patent number: 8838446
Abstract: Provided is a method and apparatus for transforming a speech feature vector. The method includes extracting a feature vector required for speech recognition from a speech signal and transforming the extracted feature vector using an auto-associative neural network (AANN).
Type: Grant
Filed: August 31, 2007
Date of Patent: September 16, 2014
Assignee: Samsung Electronics Co., Ltd.
Inventors: So-young Jeong, Kwang-cheol Oh, Jae-hoon Jeong, Jeong-su Kim
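An auto-associative neural network is trained to reconstruct its own input through a narrow bottleneck, and the bottleneck activations serve as the transformed features. A minimal one-hidden-layer NumPy sketch, assuming MSE training on random stand-in MFCCs; dimensions and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, B = 13, 5                              # MFCC dim in, bottleneck dim out
W_enc = rng.normal(size=(D, B)) * 0.1
W_dec = rng.normal(size=(B, D)) * 0.1

def train_step(x: np.ndarray, lr: float = 0.01) -> float:
    """One reconstruction step: the AANN is trained to output its own input,
    forcing the bottleneck to capture the structure of the feature space."""
    global W_enc, W_dec
    h = np.tanh(x @ W_enc)                        # bottleneck activation
    x_hat = h @ W_dec                             # reconstruction
    err = x_hat - x                               # d(MSE)/d(x_hat)
    W_dec -= lr * np.outer(h, err)
    W_enc -= lr * np.outer(x, (err @ W_dec.T) * (1 - h ** 2))
    return float((err ** 2).mean())

def transform(x: np.ndarray) -> np.ndarray:
    """The transformed feature vector is the bottleneck activation."""
    return np.tanh(x @ W_enc)

for _ in range(200):
    train_step(rng.normal(size=D))
compressed = transform(rng.normal(size=D))
```
-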
Patent number: 8788256
Abstract: Computer-implemented speech processing generates one or more pronunciations of an input word in a first language by a non-native speaker of the first language who is a native speaker of a second language. The input word is converted into one or more pronunciations. Each pronunciation includes one or more phonemes selected from a set of phonemes associated with the second language. Each pronunciation is associated with the input word in an entry in a computer database. Each pronunciation in the database is associated with information identifying a pronunciation language and/or a phoneme language.
Type: Grant
Filed: February 2, 2010
Date of Patent: July 22, 2014
Assignee: Sony Computer Entertainment Inc.
Inventors: Ruxin Chen, Gustavo Hernandez-Abrego, Masanori Omote, Xavier Menendez-Pidal
-
Patent number: 8527276
Abstract: A method and system are disclosed for speech synthesis using deep neural networks. A neural network may be trained to map input phonetic transcriptions of training-time text strings into sequences of acoustic feature vectors, which yield predefined speech waveforms when processed by a signal generation module. The training-time text strings may correspond to written transcriptions of speech carried in the predefined speech waveforms. Subsequent to training, a run-time text string may be translated to a run-time phonetic transcription, which may include a run-time sequence of phonetic-context descriptors, each of which contains a phonetic speech unit, data indicating phonetic context, and data indicating time duration of the respective phonetic speech unit. The trained neural network may then map the run-time sequence of phonetic-context descriptors to run-time predicted feature vectors, which may in turn be translated into synthesized speech by the signal generation module.
Type: Grant
Filed: October 25, 2012
Date of Patent: September 3, 2013
Assignee: Google Inc.
Inventors: Andrew William Senior, Byungha Chun, Michael Schuster
-
Patent number: 8527273
Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
Type: Grant
Filed: July 30, 2012
Date of Patent: September 3, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Mehryar Mohri, Michael Dennis Riley
-
Patent number: 8433573
Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from a human utterance; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary, which determines the boundary between phonemes, and a regular phoneme length, by using data representing regular or statistical phoneme lengths in human utterances, for a section including at least a phoneme or phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets the real voice phoneme boundary by using the generated regular prosody information, so that the real voice phoneme boundary and the real voice phoneme length of the phoneme or phoneme string to be modified approximate the actual phoneme boundary and phoneme length of the utterance, thereby modifying the real voice prosody information.
Type: Grant
Filed: February 11, 2008
Date of Patent: April 30, 2013
Assignee: Fujitsu Limited
Inventors: Kentaro Murase, Nobuyuki Katae
-
Patent number: 8392184
Abstract: The invention relates to speech signal processing that detects a speech signal from more than one microphone and obtains microphone signals that are processed by a beamformer to obtain a beamformed signal. The beamformed signal is post-filtered with a filter that employs adaptable filter weights to obtain an enhanced beamformed signal, the post-filter adapting its filter weights with previously learned filter weights.
Type: Grant
Filed: January 21, 2009
Date of Patent: March 5, 2013
Assignee: Nuance Communications, Inc.
Inventors: Markus Buck, Klaus Scheufele
-
Patent number: 8280739
Abstract: The present invention provides a speech analysis method comprising the steps of: obtaining a speech signal and a corresponding DEGG/EGG signal; regarding the speech signal as the output of a vocal tract filter in a source-filter model taking the DEGG/EGG signal as the input; and estimating the features of the vocal tract filter from the speech signal as the output and the DEGG/EGG signal as the input, wherein the features of the vocal tract filter are expressed by the state vectors of the vocal tract filter at selected time points, and the step of estimating is performed using Kalman filtering.
Type: Grant
Filed: April 3, 2008
Date of Patent: October 2, 2012
Assignee: Nuance Communications, Inc.
Inventors: Dan Ning Jiang, Fan Ping Meng, Yong Qin, Zhi Wei Shuang
-
Patent number: 8160880
Abstract: Techniques for operating a reading machine are disclosed. The techniques include: forming an N-dimensional features vector based on features of an image, the features corresponding to characteristics of at least one object depicted in the image; representing the features vector as a point in n-dimensional space, where n corresponds to N, the number of features in the features vector; and comparing the point in n-dimensional space to a centroid that represents a cluster of points in the n-dimensional space corresponding to a class of objects, to determine whether the point belongs in the class of objects corresponding to the centroid.
Type: Grant
Filed: April 28, 2008
Date of Patent: April 17, 2012
Assignee: K-NFB Reading Technology, Inc.
Inventors: Paul Albrecht, Rafael Maya Zetune, Lucy Gibson, Raymond C. Kurzweil
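The point-versus-centroid comparison is a nearest-centroid classification; a minimal sketch follows, assuming Euclidean distance and made-up object classes and centroids.

```python
import numpy as np

def classify(point: np.ndarray, centroids: dict) -> str:
    """Compare the N-dimensional feature point against each class centroid
    and return the class whose centroid is nearest."""
    return min(centroids, key=lambda c: np.linalg.norm(point - centroids[c]))

centroids = {                    # each centroid summarizes a cluster of points
    "text": np.array([0.9, 0.1, 0.2]),
    "face": np.array([0.1, 0.8, 0.7]),
    "currency": np.array([0.4, 0.5, 0.1]),
}
label = classify(np.array([0.85, 0.15, 0.25]), centroids)   # -> "text"
```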
-
Patent number: 7991616
Abstract: The present invention is a speech synthesizer that generates speech data for text including a fixed part and a variable part by combining recorded speech with rule-based synthetic speech. The speech synthesizer is a high-quality one in which recorded speech and synthetic speech are concatenated without perceptible discontinuity in timbre or prosody.
Type: Grant
Filed: October 22, 2007
Date of Patent: August 2, 2011
Assignee: Hitachi, Ltd.
Inventors: Yusuke Fujita, Ryota Kamoshida, Kenji Nagamatsu
-
Patent number: 7916848
Abstract: Indications of which participant is providing information during a multi-party conference. Each participant has equipment to display information being transferred during the conference. A sourcing signaler residing in the participant equipment provides a signal that indicates the identity of its participant when this participant is providing information to the conference. The source indicators of the other participant equipment receive the signal and cause a UI to indicate that the participant identified by the received signal is providing information (e.g., the UI can cause the identifier to change appearance). An audio discriminator is used to distinguish between an acoustic signal generated by a person speaking and one generated in a band-limited manner. The audio discriminator analyzes the spectrum of detected audio signals and generates several parameters from the spectrum and from past determinations to determine the source of an audio signal on a frame-by-frame basis.
Type: Grant
Filed: October 1, 2003
Date of Patent: March 29, 2011
Assignee: Microsoft Corporation
Inventors: Yong Rui, Anoop Gupta
-
Patent number: 7836002
Abstract: A system that can automatically narrow the search space or recognition scope within an activity-centric environment based upon a current activity or set of activities is provided. In addition, the activity and context data can also be used to rank the results of the recognition or search activity. In accordance with the domain scoping, natural language processing (NLP) as well as other types of conversion and recognition systems can dynamically adjust to the scope of the activity or group of activities, thereby increasing the recognition system's accuracy and usefulness. In operation, a user context, activity context, environment context and/or device profile can be employed to effectuate the scoping. As well, the system can combine context with extrinsic data, including but not limited to calendar, profile, and historical activity data, in order to define the parameters for an appropriate scoping.
Type: Grant
Filed: June 27, 2006
Date of Patent: November 16, 2010
Assignee: Microsoft Corporation
Inventors: Steven W. Macbeth, Roland L. Fernandez, Brian R. Meyers, Desney S. Tan, George G. Robertson, Nuria M. Oliver, Oscar E. Murillo
-
Patent number: 7765103
Abstract: A rule-based speech synthesis apparatus in which concatenation distortion may be kept below a preset value independent of the utterance. A parameter correction unit reads out a target parameter for a vowel from a target parameter storage, responsive to the phonemes at the leading and trailing ends of a speech element and the acoustic feature parameters output from a speech element selector, and accordingly corrects the acoustic feature parameters of the speech element. The parameter correction unit corrects the parameters so that the parameters ahead of and behind the speech element are equal to the target parameter for the vowel of the corresponding phoneme, and outputs the corrected parameters.
Type: Grant
Filed: June 9, 2004
Date of Patent: July 27, 2010
Assignee: Sony Corporation
Inventor: Nobuhide Yamazaki
-
Publication number: 20090157409
Abstract: A method includes: generating, for each parameter of the prosody vector, an initial parameter prediction model with a plurality of attributes related to difference prosody prediction and at least part of the attribute combinations of the plurality of attributes, in which each of the plurality of attributes and the attribute combinations is included as an item; calculating the importance of each item in the parameter prediction model; deleting the item having the lowest calculated importance; re-generating a parameter prediction model with the remaining items; determining whether the re-generated parameter prediction model is an optimal model; and repeating the step of calculating importance and the steps following it with the re-generated parameter prediction model if the re-generated parameter prediction model is determined not to be an optimal model, wherein the difference prosody vector and all parameter prediction models of the difference prosody vector constitute the difference prosody prediction model.
Type: Application
Filed: December 4, 2008
Publication date: June 18, 2009
Inventors: Yi Lifu, Li Jian, Lou Xiaoyan, Hao Jie
-
Patent number: 7444282
Abstract: A method of automatic labeling using an optimum-partitioned classified neural network includes: searching for the neural networks having minimum errors with respect to a number of L phoneme combinations from a number of K neural network combinations generated at an initial stage or updated; updating weights during learning of the K neural networks by the K phoneme combination groups searched with the same neural networks; composing an optimum-partitioned classified neural network combination using the K neural networks whose total error sum has converged; and tuning a phoneme boundary of a first label file by using the phoneme combination group classification result and the optimum-partitioned classified neural network combination, and generating a final label file reflecting the tuning result.
Type: Grant
Filed: March 1, 2004
Date of Patent: October 28, 2008
Assignee: Samsung Electronics Co., Ltd.
Inventors: Ki-hyun Choo, Jeong-su Kim, Jae-won Lee, Ki-seung Lee
-
Publication number: 20080243510
Abstract: Embodiments of the present invention address deficiencies of the art in respect to screen reading non-sequential text and provide a method, system and computer program product for overlapping screen reading of non-sequential text, such as a tag cloud or Web page header. In an embodiment of the invention, an overlapping screen reading method for a non-sequential list of words can include computing different speech synthesis parameters for different words in a non-sequential list of words, generating different audio forms for each of the different words according to the different speech synthesis parameters, and overlappingly merging the generated different audio forms into a single audio stream. The speech synthesis parameters can include, for instance, separation, volume, tone and location speech synthesis parameters. Thereafter, the method can include playing back the single audio stream to simulate a natural visual scanning of the non-sequential list of words.
Type: Application
Filed: March 28, 2007
Publication date: October 2, 2008
Inventor: Lawrence C. Smith
-
Patent number: 7409340
Abstract: A neural network is used to obtain more robust performance in determining prosodic markers on the basis of linguistic categories.
Type: Grant
Filed: January 27, 2003
Date of Patent: August 5, 2008
Assignee: Siemens Aktiengesellschaft
Inventors: Martin Holzapfel, Achim Mueller
-
Patent number: 7343289
Abstract: A system and method for detecting speech utilizing audio and video inputs. In one aspect, the invention collects audio data generated from a microphone device. In another aspect, the invention collects video data and processes the data to determine a mouth location for a given speaker. The audio and video are input into a time-delay neural network that processes the data to determine which target is speaking. The neural network processing is based upon a correlation between detected mouth movement from the video data and audio sounds detected by the microphone.
Type: Grant
Filed: June 25, 2003
Date of Patent: March 11, 2008
Assignee: Microsoft Corp.
Inventors: Ross Cutler, Ashish Kapoor