Patents by Inventor Frank Soong

Frank Soong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Photo-realistic synthesis of image sequences with lip movements synchronized with speech

Patent number: 9728203

Abstract: Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.

Type: Grant

Filed: May 2, 2011

Date of Patent: August 8, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Lijuan Wang, Frank Soong
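
The matching step in the abstract above (finding library images that correspond to a generated trajectory of visual feature vectors) can be illustrated with a minimal sketch. It assumes a per-frame nearest-neighbor lookup under Euclidean distance, which is a simplification chosen for illustration and not necessarily the selection method claimed in the patent. The names `nearest_image`, `match_sequence`, and the toy feature values are hypothetical.

```python
import math

def nearest_image(target, library):
    """Return the id of the library image whose feature vector is closest to target."""
    # Euclidean distance is an assumption; the patent does not specify the metric here.
    return min(library, key=lambda image_id: math.dist(target, library[image_id]))

def match_sequence(trajectory, library):
    """Pick one library image per frame of the visual feature trajectory."""
    return [nearest_image(vector, library) for vector in trajectory]

# Hypothetical image library: image id -> 2-D visual feature vector.
library = {"closed": (0.0, 0.0), "half_open": (0.5, 0.2), "open": (1.0, 0.6)}

# Hypothetical trajectory, standing in for the statistical model's output.
trajectory = [(0.1, 0.0), (0.6, 0.3), (0.9, 0.5)]

frames = match_sequence(trajectory, library)
print(frames)
```

A practical system would likely also penalize visible jumps between consecutive selected images; this sketch ignores that and treats each frame independently.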
Photo-realistic synthesis of three dimensional animation with facial features synchronized with speech

Patent number: 9613450

Abstract: Dynamic texture mapping is used to create a photorealistic three dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model.

Type: Grant

Filed: May 3, 2011

Date of Patent: April 4, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Lijuan Wang, Frank Soong, Qiang Huo, Zhengyou Zhang

PHOTO-REALISTIC SYNTHESIS OF IMAGE SEQUENCES WITH LIP MOVEMENTS SYNCHRONIZED WITH SPEECH

Publication number: 20120284029

Abstract: Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.

Type: Application

Filed: May 2, 2011

Publication date: November 8, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Lijuan Wang, Frank Soong

PHOTO-REALISTIC SYNTHESIS OF THREE DIMENSIONAL ANIMATION WITH FACIAL FEATURES SYNCHRONIZED WITH SPEECH

Publication number: 20120280974

Abstract: Dynamic texture mapping is used to create a photorealistic three dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model.

Type: Application

Filed: May 3, 2011

Publication date: November 8, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Lijuan Wang, Frank Soong, Qiang Huo, Zhengyou Zhang

SYNTHESIZED SINGING VOICE WAVEFORM GENERATOR

Publication number: 20110231193

Abstract: Various technologies for generating a synthesized singing voice waveform are described. In one implementation, a computer program may receive a request from a user to create a synthesized singing voice using the lyrics of a song and a digital file containing its melody as inputs. The computer program may then dissect the lyrics' text and the melody file into their corresponding sub-phonemic units and musical score, respectively. The musical score may be further dissected into a sequence of musical notes and duration times for each musical note. The computer program may then determine a fundamental frequency (F0), or pitch, of each musical note.

Type: Application

Filed: June 2, 2011

Publication date: September 22, 2011

Applicant: Microsoft Corporation

Inventors: Yao Qian, Frank Soong
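
The last step of the abstract above, determining a fundamental frequency (F0) for each musical note, can be sketched as follows. The sketch assumes that notes are given as MIDI note numbers and that pitch follows twelve-tone equal temperament with A4 tuned to 440 Hz. Neither assumption comes from the patent, and the function names are hypothetical.

```python
def midi_note_to_f0(note_number, a4_hz=440.0):
    """Return the equal-tempered fundamental frequency in Hz of a MIDI note.

    Assumes MIDI note 69 is A4; each semitone scales frequency by 2 ** (1 / 12).
    """
    return a4_hz * 2.0 ** ((note_number - 69) / 12.0)

def score_to_f0_track(score):
    """Map (midi_note, duration_seconds) pairs to (f0_hz, duration_seconds) pairs."""
    return [(midi_note_to_f0(note), duration) for note, duration in score]

# Hypothetical melody: C4 and E4 for half a second each, then A4 for one second.
melody = [(60, 0.5), (64, 0.5), (69, 1.0)]
f0_track = score_to_f0_track(melody)

for f0_hz, duration in f0_track:
    print(f"{f0_hz:.2f} Hz for {duration} s")
```

The resulting list pairs each note's pitch with its duration, which is the kind of per-note information a waveform generator would need downstream.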
Synthesized singing voice waveform generator

Patent number: 7977562

Abstract: Various technologies for generating a synthesized singing voice waveform are described. In one implementation, a computer program may receive a request from a user to create a synthesized singing voice using the lyrics of a song and a digital file containing its melody as inputs. The computer program may then dissect the lyrics' text and the melody file into their corresponding sub-phonemic units and musical score, respectively. The musical score may be further dissected into a sequence of musical notes and duration times for each musical note. The computer program may then determine a fundamental frequency (F0), or pitch, of each musical note.

Type: Grant

Filed: June 20, 2008

Date of Patent: July 12, 2011

Assignee: Microsoft Corporation

Inventors: Yao Qian, Frank Soong

Radical-based HMM modeling for handwritten East Asian characters

Patent number: 7903877

Abstract: Exemplary methods, systems, and computer-readable media for developing, training and/or using models for online handwriting recognition of characters are described. An exemplary method for building a trainable radical-based HMM for use in character recognition includes defining radical nodes, where a radical node represents a structural element of a character, and defining connection nodes, where a connection node represents a spatial relationship between two or more radicals. Such a method may include determining a number of paths in the radical-based HMM using subsequence direction histogram vector (SDHV) clustering and determining a number of states in the radical-based HMM using curvature scale space-based (CSS) corner detection.

Type: Grant

Filed: March 6, 2007

Date of Patent: March 8, 2011

Assignee: Microsoft Corporation

Inventors: Shi Han, Yu Zou, Ming Chang, Peng Liu, Yi-Jian Wu, Lei Ma, Frank Soong, Dongmei Zhang, Jian Wang

Radical set determination for HMM based East Asian character recognition

Patent number: 7805004

Abstract: Exemplary techniques are described for selecting radical sets for use in probabilistic East Asian character recognition algorithms. An exemplary technique includes applying a decomposition rule to each East Asian character of a set to generate a progressive splitting graph, where the progressive splitting graph comprises radicals as nodes; formulating an optimization problem to find an optimal set of radicals to represent the set of East Asian characters using maximum likelihood and minimum description length; and solving the optimization problem for the optimal set of radicals. Another exemplary technique includes selecting an optimal set of radicals by using a general function that characterizes a radical with respect to other East Asian characters and a complex function that characterizes complexity of a radical.

Type: Grant

Filed: February 28, 2007

Date of Patent: September 28, 2010

Assignee: Microsoft Corporation

Inventors: Shi Han, Yu Zou, Ming Chang, Peng Liu, Yi-Jian Wu, Lei Ma, Frank Soong, Dongmei Zhang, Jian Wang

SYNTHESIZED SINGING VOICE WAVEFORM GENERATOR

Publication number: 20090314155

Abstract: Various technologies for generating a synthesized singing voice waveform are described. In one implementation, a computer program may receive a request from a user to create a synthesized singing voice using the lyrics of a song and a digital file containing its melody as inputs. The computer program may then dissect the lyrics' text and the melody file into their corresponding sub-phonemic units and musical score, respectively. The musical score may be further dissected into a sequence of musical notes and duration times for each musical note. The computer program may then determine a fundamental frequency (F0), or pitch, of each musical note.

Type: Application

Filed: June 20, 2008

Publication date: December 24, 2009

Applicant: MICROSOFT CORPORATION

Inventors: Yao Qian, Frank Soong

Template constrained posterior probability

Publication number: 20090099847

Abstract: Detailed herein is a technology which, among other things, reduces errors introduced in recording and transcription data. In one approach to this technology, a method of detecting audio transcription errors is utilized. This method includes selecting a focus unit and selecting a context template corresponding to the focus unit. A hypothesis set is then determined with reference to the context template and the focus unit. A probability corresponding to the focus unit is calculated across the hypothesis set.

Type: Application

Filed: October 10, 2007

Publication date: April 16, 2009

Applicant: Microsoft Corporation

Inventors: Frank Soong, Lijuan Wang
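
The abstract above can be read as computing a posterior probability for a focus unit over a restricted set of hypotheses. The sketch below is one possible interpretation and not the method claimed in the application. It assumes that each hypothesis is a word sequence with a log score, that a context template fixes some positions and leaves others as wildcards (`None`), and that the posterior is the normalized score mass of matching hypotheses that contain the focus unit at the focus position. All names and data are hypothetical.

```python
import math

def matches_template(hypothesis, template):
    """True if the hypothesis agrees with every non-wildcard slot of the template."""
    return len(hypothesis) == len(template) and all(
        slot is None or slot == word for word, slot in zip(hypothesis, template)
    )

def template_constrained_posterior(hypotheses, template, focus_index, focus_unit):
    """Posterior of focus_unit at focus_index over hypotheses matching the template.

    hypotheses: list of (word_tuple, log_score) pairs.
    """
    selected = [(words, score) for words, score in hypotheses
                if matches_template(words, template)]
    if not selected:
        return 0.0
    # Subtract the maximum log score before exponentiating for numerical stability.
    max_score = max(score for _, score in selected)
    total = sum(math.exp(score - max_score) for _, score in selected)
    focus = sum(math.exp(score - max_score) for words, score in selected
                if words[focus_index] == focus_unit)
    return focus / total

# Hypothetical recognizer output with log scores.
hypotheses = [
    (("the", "cat", "sat"), math.log(0.5)),
    (("the", "hat", "sat"), math.log(0.3)),
    (("a", "cat", "sat"), math.log(0.2)),
]

# The template fixes the left and right context and leaves the focus slot open.
template = ("the", None, "sat")
posterior = template_constrained_posterior(hypotheses, template, 1, "cat")
print(round(posterior, 3))
```

Under this reading, a low posterior for the transcribed unit would flag a likely transcription error at that position.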
Radical-Based HMM Modeling for Handwritten East Asian Characters

Publication number: 20080219556

Abstract: Exemplary methods, systems, and computer-readable media for developing, training and/or using models for online handwriting recognition of characters are described. An exemplary method for building a trainable radical-based HMM for use in character recognition includes defining radical nodes, where a radical node represents a structural element of a character, and defining connection nodes, where a connection node represents a spatial relationship between two or more radicals. Such a method may include determining a number of paths in the radical-based HMM using subsequence direction histogram vector (SDHV) clustering and determining a number of states in the radical-based HMM using curvature scale space-based (CSS) corner detection.

Type: Application

Filed: March 6, 2007

Publication date: September 11, 2008

Applicant: Microsoft Corporation

Inventors: Shi Han, Yu Zou, Ming Chang, Peng Liu, Yi-Jian Wu, Lei Ma, Frank Soong, Dongmei Zhang, Jian Wang

Radical Set Determination For HMM Based East Asian Character Recognition

Publication number: 20080205761

Abstract: Exemplary techniques are described for selecting radical sets for use in probabilistic East Asian character recognition algorithms. An exemplary technique includes applying a decomposition rule to each East Asian character of a set to generate a progressive splitting graph, where the progressive splitting graph comprises radicals as nodes; formulating an optimization problem to find an optimal set of radicals to represent the set of East Asian characters using maximum likelihood and minimum description length; and solving the optimization problem for the optimal set of radicals. Another exemplary technique includes selecting an optimal set of radicals by using a general function that characterizes a radical with respect to other East Asian characters and a complex function that characterizes complexity of a radical.

Type: Application

Filed: February 28, 2007

Publication date: August 28, 2008

Applicant: Microsoft Corporation

Inventors: Shi Han, Yu Zou, Ming Chang, Peng Liu, Yi-Jian Wu, Lei Ma, Frank Soong, Dongmei Zhang, Jian Wang

Common word graph based multimodal input

Publication number: 20070239432

Abstract: Multiple input modalities are selectively used by a user or process to prune a word graph. Pruning initiates rescoring in order to generate a new word graph with a revised best path.

Type: Application

Filed: March 30, 2006

Publication date: October 11, 2007

Applicant: Microsoft Corporation

Inventors: Frank Soong, Jian-Lai Zhou, Peng Liu
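
The prune-then-rescore idea in the abstract above can be sketched on a toy word graph. The sketch assumes that the graph is a directed acyclic graph whose integer node ids are in topological order, that each edge carries a word and a log score, and that "rescoring" simply means recomputing the best path on the pruned graph. These are simplifying assumptions, not details from the application, and every name and score below is hypothetical.

```python
def best_path(edges, start, end):
    """Return (score, words) of the highest-scoring path, or None if none exists.

    edges: list of (src, dst, word, log_score) with src < dst for every edge.
    """
    best = {start: (0.0, [])}
    # Visiting edges by ascending source node finalizes each node before its out-edges.
    for src, dst, word, score in sorted(edges, key=lambda edge: edge[0]):
        if src not in best:
            continue
        candidate = (best[src][0] + score, best[src][1] + [word])
        if dst not in best or candidate[0] > best[dst][0]:
            best[dst] = candidate
    return best.get(end)

def prune(edges, src, dst, allowed_words):
    """Drop edges between src and dst whose word the second modality rules out."""
    return [edge for edge in edges
            if (edge[0], edge[1]) != (src, dst) or edge[2] in allowed_words]

# Hypothetical word graph from a speech recognizer.
edges = [
    (0, 1, "wreck", -0.5),
    (0, 1, "recognize", -1.0),
    (1, 2, "a", -0.3),
    (1, 2, "speech", -0.6),
]

before = best_path(edges, 0, 2)

# A second input modality (for example, a pen selection) confirms the first word.
pruned = prune(edges, 0, 1, {"recognize"})
after = best_path(pruned, 0, 2)

print(before[1], after[1])
```

Here the best path changes its first word once the competing edge is pruned, which is the revised best path the abstract refers to.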
Weighted likelihood ratio for pattern recognition

Publication number: 20070219796

Abstract: A Weighted Likelihood Ratio Hidden Markov Model is utilized for speech processing. The model emphasizes spectral peaks when comparing spectra. Probability density functions for states in the model can be developed with weights based on the comparison.

Type: Application

Filed: March 20, 2006

Publication date: September 20, 2007

Applicant: Microsoft Corporation

Inventors: Chao Huang, Frank Soong, Jian-lai Zhou

Subword unit posterior probability for measuring confidence

Publication number: 20070219797

Abstract: Speech recognition, such as command and control speech recognition, generally uses a context free grammar to constrain the decoding process. Word or subword background models are constructed to repopulate the dynamic hypothesis space, especially when word sparseness is at issue. The background models can later be used in speech recognition. During speech recognition, background and conventional context free grammar decoding are used to measure confidence.

Type: Application

Filed: March 16, 2006

Publication date: September 20, 2007

Applicant: Microsoft Corporation

Inventors: Peng Liu, Ye Tian, Jian-Lai Zhou, Frank Soong

Combined input processing for a computing device

Publication number: 20060290656

Abstract: Input is received from at least two different input sources. Information from these sources is combined to provide a result. In a particular example, input from one source corresponds to potential recognition candidates, and input from another source corresponds to other potential candidates. These candidates are combined to select a result.

Type: Application

Filed: June 28, 2005

Publication date: December 28, 2006

Applicant: Microsoft Corporation

Inventors: Frank Soong, Jian-Lai Zhou, Ye Tian