Patents by Inventor Frank Soong
Frank Soong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9728203Abstract: Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.Type: GrantFiled: May 2, 2011Date of Patent: August 8, 2017Assignee: Microsoft Technology Licensing, LLCInventors: Lijuan Wang, Frank Soong
-
Patent number: 9613450Abstract: Dynamic texture mapping is used to create a photorealistic three dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model.Type: GrantFiled: May 3, 2011Date of Patent: April 4, 2017Assignee: Microsoft Technology Licensing, LLCInventors: Lijuan Wang, Frank Soong, Qiang Huo, Zhengyou Zhang
-
Publication number: 20120284029Abstract: Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.Type: ApplicationFiled: May 2, 2011Publication date: November 8, 2012Applicant: MICROSOFT CORPORATIONInventors: Lijuan Wang, Frank Soong
-
Publication number: 20120280974Abstract: Dynamic texture mapping is used to create a photorealistic three dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model.Type: ApplicationFiled: May 3, 2011Publication date: November 8, 2012Applicant: MICROSOFT CORPORATIONInventors: Lijuan Wang, Frank Soong, Qiang Huo, Zhengyou Zhang
-
Publication number: 20110231193Abstract: Various technologies for generating a synthesized singing voice waveform. In one implementation, the computer program may receive a request from a user to create a synthesized singing voice using the lyrics of a song and a digital file containing its melody as inputs. The computer program may then dissect the lyrics' text and its melody file into its corresponding sub-phonemic units and musical score respectively. The musical score may be further dissected into a sequence of musical notes and duration times for each musical note. The computer program may then determine a fundamental frequency (F0), or pitch, of each musical note.Type: ApplicationFiled: June 2, 2011Publication date: September 22, 2011Applicant: Microsoft CorporationInventors: Yao Qian, Frank Soong
-
Patent number: 7977562Abstract: Various technologies for generating a synthesized singing voice waveform. In one implementation, the computer program may receive a request from a user to create a synthesized singing voice using the lyrics of a song and a digital file containing its melody as inputs. The computer program may then dissect the lyrics' text and its melody file into its corresponding sub-phonemic units and musical score respectively. The musical score may be further dissected into a sequence of musical notes and duration times for each musical note. The computer program may then determine a fundamental frequency (F0), or pitch, of each musical note.Type: GrantFiled: June 20, 2008Date of Patent: July 12, 2011Assignee: Microsoft CorporationInventors: Yao Qian, Frank Soong
-
Patent number: 7903877Abstract: Exemplary methods, systems, and computer-readable media for developing, training and/or using models for online handwriting recognition of characters are described. An exemplary method for building a trainable radical-based HMM for use in character recognition includes defining radical nodes, where a radical node represents a structural element of an character, and defining connection nodes, where a connection node represents a spatial relationship between two or more radicals. Such a method may include determining a number of paths in the radical-based HMM using subsequence direction histogram vector (SDHV) clustering and determining a number of states in the radical-based HMM using curvature scale space-based (CSS) corner detection.Type: GrantFiled: March 6, 2007Date of Patent: March 8, 2011Assignee: Microsoft CorporationInventors: Shi Han, Yu Zou, Ming Chang, Peng Liu, Yi-Jian Wu, Lei Ma, Frank Soong, Dongmei Zhang, Jian Wang
-
Patent number: 7805004Abstract: Exemplary techniques are described for selecting radical sets for use in probabilistic East Asian character recognition algorithms. An exemplary technique includes applying a decomposition rule to each East Asian character of the set to generate a progressive splitting graph where the progressive splitting graph comprises radicals as nodes, formulating an optimization problem to find an optimal set of radicals to represent the set of East Asian characters using maximum likelihood and minimum description length and solving the optimization problem for the optimal set of radicals. Another exemplary technique includes selecting an optimal set of radicals by using a general function that characterizes a radical with respect to other East Asian characters and a complex function that characterizes complexity of a radical.Type: GrantFiled: February 28, 2007Date of Patent: September 28, 2010Assignee: Microsoft CorporationInventors: Shi Han, Yu Zou, Ming Chang, Peng Liu, Yi-Jian Wu, Lei Ma, Frank Soong, Dongmei Zhang, Jian Wang
-
Publication number: 20090314155Abstract: Various technologies for generating a synthesized singing voice waveform. In one implementation, the computer program may receive a request from a user to create a synthesized singing voice using the lyrics of a song and a digital file containing its melody as inputs. The computer program may then dissect the lyrics' text and its melody file into its corresponding sub-phonemic units and musical score respectively. The musical score may be further dissected into a sequence of musical notes and duration times for each musical note. The computer program may then determine a fundamental frequency (F0), or pitch, of each musical note.Type: ApplicationFiled: June 20, 2008Publication date: December 24, 2009Applicant: MICROSOFT CORPORATIONInventors: Yao Qian, Frank Soong
-
Publication number: 20090099847Abstract: Detailed herein is a technology which, among other things, reduces errors introduced in recording and transcription data. In one approach to this technology, a method of detecting audio transcription errors is utilized. This method includes selected a focus unit, and selecting a context template corresponding to the focus unit. A hypothesis set is then determined, with reference to the context template and the focus unit. A probability is calculated corresponding to the focus unit, across the hypothesis set.Type: ApplicationFiled: October 10, 2007Publication date: April 16, 2009Applicant: Microsoft CorporationInventors: Frank Soong, Lijuan Wang
-
Publication number: 20080219556Abstract: Exemplary methods, systems, and computer-readable media for developing, training and/or using models for online handwriting recognition of characters are described. An exemplary method for building a trainable radical-based HMM for use in character recognition includes defining radical nodes, where a radical node represents a structural element of an character, and defining connection nodes, where a connection node represents a spatial relationship between two or more radicals. Such a method may include determining a number of paths in the radical-based HMM using subsequence direction histogram vector (SDHV) clustering and determining a number of states in the radical-based HMM using curvature scale space-based (CSS) corner detection.Type: ApplicationFiled: March 6, 2007Publication date: September 11, 2008Applicant: Microsoft CorporationInventors: Shi Han, Yu Zou, Ming Chang, Peng Liu, Yi-Jian Wu, Lei Ma, Frank Soong, Dongmei Zhang, Jian Wang
-
Publication number: 20080205761Abstract: Exemplary techniques are described for selecting radical sets for use in probabilistic East Asian character recognition algorithms. An exemplary technique includes applying a decomposition rule to each East Asian character of the set to generate a progressive splitting graph where the progressive splitting graph comprises radicals as nodes, formulating an optimization problem to find an optimal set of radicals to represent the set of East Asian characters using maximum likelihood and minimum description length and solving the optimization problem for the optimal set of radicals. Another exemplary technique includes selecting an optimal set of radicals by using a general function that characterizes a radical with respect to other East Asian characters and a complex function that characterizes complexity of a radical.Type: ApplicationFiled: February 28, 2007Publication date: August 28, 2008Applicant: Microsoft CorporationInventors: Shi Han, Yu Zou, Ming Chang, Peng Liu, Yi-Jian Wu, Lei Ma, Frank Soong, Dongmei Zhang, Jian Wang
-
Publication number: 20070239432Abstract: Multiple input modalities are selectively used by a user or process to prune a word graph. Pruning initiates rescoring in order to generate a new word graph with a revised best path.Type: ApplicationFiled: March 30, 2006Publication date: October 11, 2007Applicant: Microsoft CorporationInventors: Frank Soong, Jian-Lai Zhou, Peng Liu
-
Publication number: 20070219796Abstract: A Weighted Likelihood Ratio Hidden Markov Model is utilized for speech processing. The model emphasizes spectral peaks when comparing spectra. Probability density functions for states in the model can be developed with weights based on the comparison.Type: ApplicationFiled: March 20, 2006Publication date: September 20, 2007Applicant: Microsoft CorporationInventors: Chao Huang, Frank Soong, Jian-lai Zhou
-
Publication number: 20070219797Abstract: Speech recognition such as command and control speech recognition generally use a context free grammar to constrain the decoding process. Word or subword background model are constructed to repopulate dynamic hypothesis space, especially when word spareness is at issue. The background models can be later used in speech recognition. During speech recognition, background and conventional context free grammar decoding are used to measure confidence. The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.Type: ApplicationFiled: March 16, 2006Publication date: September 20, 2007Applicant: Microsoft CorporationInventors: Peng Liu, Ye Tian, Jian-Lai Zhou, Frank Soong
-
Publication number: 20060290656Abstract: Input is received from at least two different input sources. Information from these sources are combined together to provide a result. In a particular example, input from one source corresponds to potential recognition candidates, and input from another source corresponds to other potential candidates. These candidates are combined together to select a result.Type: ApplicationFiled: June 28, 2005Publication date: December 28, 2006Applicant: Microsoft CorporationInventors: Frank Soong, Jian-Lai Zhou, Ye Tian