Patents by Inventor Chang-Ning Huang

Chang-Ning Huang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and system for retrieving confirming sentences

Patent number: 7974963

Abstract: A method, computer readable medium and system are provided which retrieve confirming sentences from a sentence database in response to a query. A search engine retrieves confirming sentences from the sentence database in response to the query. In retrieving the confirming sentences, the search engine defines indexing units based upon the query, with the indexing units including both lemma from the query and extended indexing units associated with the query. The search engine then retrieves a plurality of sentences from the sentence database using the defined indexing units as search parameters. A similarity between each of the plurality of retrieved sentences and the query is determined by the search engine, wherein each similarity is determined as a function of a linguistic weight of a term in the query. The search engine then ranks the plurality of retrieved sentences based upon the determined similarities.

Type: Grant

Filed: July 22, 2005

Date of Patent: July 5, 2011

Inventors: Ming Zhou, Hua Wu, Yue Zhang, Jianfeng Gao, Chang-Ning Huang
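
The retrieve-then-rank flow described in the abstract above can be illustrated with a minimal Python sketch. It is not the patented method: the part-of-speech weights, the function names `build_index` and `retrieve`, and the overlap-based similarity are all assumptions made for illustration, and the "extended indexing units" mentioned in the abstract are left out entirely.

```python
from collections import defaultdict

# Hypothetical linguistic weights per part of speech (illustrative values only).
POS_WEIGHT = {"VERB": 3.0, "NOUN": 2.0, "ADJ": 1.5}
DEFAULT_WEIGHT = 1.0


def build_index(sentences):
    """Map each lemma to the set of sentence ids that contain it."""
    index = defaultdict(set)
    for sid, lemmas in enumerate(sentences):
        for lemma in lemmas:
            index[lemma].add(sid)
    return index


def retrieve(query, sentences, index):
    """Return (score, sentence_id) pairs, best match first.

    `query` is a list of (lemma, pos) pairs; lemmatization and tagging are
    assumed to have happened upstream. The score is the weighted share of
    query terms found in the sentence.
    """
    # Step 1: use the query lemmas as indexing units to collect candidates.
    candidates = set()
    for lemma, _ in query:
        candidates |= index.get(lemma, set())

    # Step 2: score each candidate by the linguistic weight of matched terms.
    total = sum(POS_WEIGHT.get(pos, DEFAULT_WEIGHT) for _, pos in query)
    ranked = []
    for sid in candidates:
        present = set(sentences[sid])
        matched = sum(
            POS_WEIGHT.get(pos, DEFAULT_WEIGHT)
            for lemma, pos in query
            if lemma in present
        )
        ranked.append((matched / total, sid))

    # Step 3: rank by descending similarity, breaking ties by sentence id.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked
```

With a query such as `[("make", "VERB"), ("decision", "NOUN")]`, a sentence containing both lemmas outranks one containing only the noun, because the verb carries the larger assumed weight.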

Standardized natural language chunking utility

Patent number: 7672832

Abstract: A method is disclosed for providing a chunking utility that supports robust natural language processing. A corpus is chunked in accordance with a draft chunking specification. Chunk inconsistencies in the corpus are automatically flagged for resolution, and a chunking utility is provided in which at least some of the flagged inconsistencies are resolved. The chunking utility provides a single, consistent global chunking standard, ensuring compatibility among various applications. The chunking utility is particularly advantageous for non-alphabetic languages, such as Chinese.

Type: Grant

Filed: February 1, 2006

Date of Patent: March 2, 2010

Assignee: Microsoft Corporation

Inventors: Chang-Ning Huang, Hong-Qiao Li, Jianfeng Gao

System and method for identifying base noun phrases

Patent number: 7496501

Abstract: A system and method identify base noun phrases (baseNP) in a linguistic input. A part-of-speech tagger identifies N-best part-of-speech tag sequences corresponding to the linguistic input. A baseNP identifier identifies baseNPs in the linguistic input using a unified statistical model that identifies the baseNPs, given the N-best POS sequences.

Type: Grant

Filed: October 29, 2004

Date of Patent: February 24, 2009

Assignee: Microsoft Corporation

Inventors: Endong Xun, Ming Zhou, Chang-Ning Huang

Using source-channel models for word segmentation

Patent number: 7493251

Abstract: A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.

Type: Grant

Filed: May 30, 2003

Date of Patent: February 17, 2009

Assignee: Microsoft Corporation

Inventors: Jianfeng Gao, Mu Li, Chang-Ning Huang, Jian Sun, Lei Zhang, Ming Zhou

Processing noisy data and determining word similarity

Patent number: 7343280

Abstract: The present invention deals with noisy data not by eliminating low frequency dependency structures, but rather by weighting the dependency structures. The dependency structures are weighted to give less weight to dependency structures which are more likely incorrect and to give more weight to dependency structures which are more likely correct.

Type: Grant

Filed: July 1, 2003

Date of Patent: March 11, 2008

Assignee: Microsoft Corporation

Inventors: Hua Wu, Ming Zhou, Chang-Ning Huang

Standardized natural language chunking utility

Publication number: 20070282592

Abstract: A method is disclosed for providing a chunking utility that supports robust natural language processing. A corpus is chunked in accordance with a draft chunking specification. Chunk inconsistencies in the corpus are automatically flagged for resolution, and a chunking utility is provided in which at least some of the flagged inconsistencies are resolved. The chunking utility provides a single, consistent global chunking standard, ensuring compatibility among various applications. The chunking utility is particularly advantageous for non-alphabetic languages, such as Chinese.

Type: Application

Filed: February 1, 2006

Publication date: December 6, 2007

Applicant: Microsoft Corporation

Inventors: Chang-Ning Huang, Hong-Qiao Li, Jianfeng Gao

Detecting segmentation errors in an annotated corpus

Publication number: 20070078644

Abstract: Segmentation error candidates are detected using segmentation variations found in an annotated corpus.

Type: Application

Filed: September 30, 2005

Publication date: April 5, 2007

Applicant: Microsoft Corporation

Inventors: Chang-Ning Huang, Jianfeng Gao, Mu Li
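
One way to read "segmentation variations" in the abstract above is: the same character string is segmented differently in different places in the corpus. The Python sketch below flags such strings as error candidates. It is an illustration under that assumption only; the function name `segmentation_variations` and the `max_span` limit are hypothetical, and the application's actual detection procedure is not given in this listing.

```python
from collections import defaultdict


def segmentation_variations(corpus, max_span=3):
    """Return {string: set of segmentations} for inconsistently segmented strings.

    `corpus` is a list of sentences, each a list of tokens as annotated.
    Every run of up to `max_span` adjacent tokens is joined into a string,
    and the token boundaries used for that string are recorded.
    """
    seen = defaultdict(set)
    for tokens in corpus:
        for start in range(len(tokens)):
            last = min(start + max_span, len(tokens))
            for end in range(start + 1, last + 1):
                span = tuple(tokens[start:end])
                seen["".join(span)].add(span)
    # A string with more than one recorded segmentation is a candidate.
    return {text: segs for text, segs in seen.items() if len(segs) > 1}
```

A flagged string is only a candidate: both segmentations can be correct in their own contexts, so a later step (manual or statistical) would still have to decide which occurrences are errors.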

Method and system for retrieving confirming sentences

Patent number: 7194455

Abstract: A method, computer readable medium and system are provided which retrieve confirming sentences from a sentence database in response to a query. A search engine retrieves confirming sentences from the sentence database in response to the query. In retrieving the confirming sentences, the search engine defines indexing units based upon the query, with the indexing units including both lemma from the query and extended indexing units associated with the query. The search engine then retrieves a plurality of sentences from the sentence database using the defined indexing units as search parameters. A similarity between each of the plurality of retrieved sentences and the query is determined by the search engine, wherein each similarity is determined as a function of a linguistic weight of a term in the query. The search engine then ranks the plurality of retrieved sentences based upon the determined similarities.

Type: Grant

Filed: September 19, 2002

Date of Patent: March 20, 2007

Assignee: Microsoft Corporation

Inventors: Ming Zhou, Hua Wu, Yue Zhang, Jianfeng Gao, Chang-Ning Huang

Method and system for retrieving confirming sentences

Publication number: 20050273318

Abstract: A method, computer readable medium and system are provided which retrieve confirming sentences from a sentence database in response to a query. A search engine retrieves confirming sentences from the sentence database in response to the query. In retrieving the confirming sentences, the search engine defines indexing units based upon the query, with the indexing units including both lemma from the query and extended indexing units associated with the query. The search engine then retrieves a plurality of sentences from the sentence database using the defined indexing units as search parameters. A similarity between each of the plurality of retrieved sentences and the query is determined by the search engine, wherein each similarity is determined as a function of a linguistic weight of a term in the query. The search engine then ranks the plurality of retrieved sentences based upon the determined similarities.

Type: Application

Filed: July 22, 2005

Publication date: December 8, 2005

Applicant: Microsoft Corporation

Inventors: Ming Zhou, Hua Wu, Yue Zhang, Jianfeng Gao, Chang-Ning Huang

System and iterative method for lexicon, segmentation and language model joint optimization

Patent number: 6904402

Abstract: A method for optimizing a language model is presented comprising developing an initial language model from a lexicon and segmentation derived from a received corpus using a maximum match technique, and iteratively refining the initial language model by dynamically updating the lexicon and re-segmenting the corpus according to statistical principles until a threshold of predictive capability is achieved.

Type: Grant

Filed: June 30, 2000

Date of Patent: June 7, 2005

Assignee: Microsoft Corporation

Inventors: Hai-Feng Wang, Chang-Ning Huang, Kai-Fu Lee, Shuo Di, Jianfeng Gao, Dong-Feng Cai, Lee-Feng Chien
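
The abstract above names a "maximum match technique" for the initial segmentation. A common form of that idea is greedy forward maximum matching, sketched below in Python. Whether the patent uses the forward variant is an assumption, the function name `max_match` is hypothetical, and the iterative lexicon update and re-segmentation steps are not shown.

```python
def max_match(text, lexicon, max_len=None):
    """Segment `text` by greedy forward maximum matching against `lexicon`.

    At each position the longest lexicon entry that matches is taken;
    a character with no matching entry becomes a single-character token.
    """
    if max_len is None:
        # Longest lexicon entry bounds the search window (1 if lexicon is empty).
        max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens
```

For example, `max_match("abcd", {"ab", "abc", "cd"})` picks `"abc"` over `"ab"` at the first position, which leaves `"d"` as a single-character token; this kind of greedy miss is what a later statistical re-segmentation pass could correct.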

System and method for identifying base noun phrases

Publication number: 20050071149

Abstract: A system and method identify base noun phrases (baseNP) in a linguistic input. A part-of-speech tagger identifies N-best part-of-speech tag sequences corresponding to the linguistic input. A baseNP identifier identifies baseNPs in the linguistic input using a unified statistical model that identifies the baseNPs, given the N-best POS sequences.

Type: Application

Filed: October 29, 2004

Publication date: March 31, 2005

Applicant: Microsoft Corporation

Inventors: Endong Xun, Ming Zhou, Chang-Ning Huang

Chinese word segmentation

Publication number: 20050071148

Abstract: The present invention relates to a corpus for use in training a language model. The corpus includes a plurality of characters and a plurality of morphological tags associated with a plurality of sequences of characters. The plurality of morphological tags indicate a morphological type of an associated sequence of characters and a combination of parts forming a morphological subtype.

Type: Application

Filed: September 15, 2003

Publication date: March 31, 2005

Applicant: Microsoft Corporation

Inventors: Chang-Ning Huang, Jianfeng Gao, Mu Li, Ashley Chang

System and method for identifying base noun phrases

Patent number: 6859771

Abstract: A system and method identify base noun phrases (baseNP) in a linguistic input. A part-of-speech tagger identifies N-best part-of-speech tag sequences corresponding to the linguistic input. A baseNP identifier identifies baseNPs in the linguistic input using a unified statistical model that identifies the baseNPs, given the N-best POS sequences.

Type: Grant

Filed: June 4, 2001

Date of Patent: February 22, 2005

Assignee: Microsoft Corporation

Inventors: Endong Xun, Ming Zhou, Chang-Ning Huang

Processing noisy data and determining word similarity

Publication number: 20050004790

Abstract: The present invention deals with noisy data not by eliminating low frequency dependency structures, but rather by weighting the dependency structures. The dependency structures are weighted to give less weight to dependency structures which are more likely incorrect and to give more weight to dependency structures which are more likely correct.

Type: Application

Filed: July 1, 2003

Publication date: January 6, 2005

Applicant: Microsoft Corporation

Inventors: Hua Wu, Ming Zhou, Chang-Ning Huang

Method and apparatus using source-channel models for word segmentation

Publication number: 20040243408

Abstract: A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.

Type: Application

Filed: May 30, 2003

Publication date: December 2, 2004

Applicant: Microsoft Corporation

Inventors: Jianfeng Gao, Mu Li, Chang-Ning Huang, Jian Sun, Lei Zhang, Ming Zhou

System and iterative method for lexicon, segmentation and language model joint optimization

Publication number: 20040210434

Abstract: A method for optimizing a language model is presented comprising developing an initial language model from a lexicon and segmentation derived from a received corpus using a maximum match technique, and iteratively refining the initial language model by dynamically updating the lexicon and re-segmenting the corpus according to statistical principles until a threshold of predictive capability is achieved.

Type: Application

Filed: May 10, 2004

Publication date: October 21, 2004

Applicant: Microsoft Corporation

Inventors: Hai-Feng Wang, Chang-Ning Huang, Kai-Fu Lee, Shuo Di, Jianfeng Gao, Dong-Feng Cai, Lee-Feng Chien

Method and system for retrieving confirming sentences

Publication number: 20040059718

Abstract: A method, computer readable medium and system are provided which retrieve confirming sentences from a sentence database in response to a query. A search engine retrieves confirming sentences from the sentence database in response to the query. In retrieving the confirming sentences, the search engine defines indexing units based upon the query, with the indexing units including both lemma from the query and extended indexing units associated with the query. The search engine then retrieves a plurality of sentences from the sentence database using the defined indexing units as search parameters. A similarity between each of the plurality of retrieved sentences and the query is determined by the search engine, wherein each similarity is determined as a function of a linguistic weight of a term in the query. The search engine then ranks the plurality of retrieved sentences based upon the determined similarities.

Type: Application

Filed: September 19, 2002

Publication date: March 25, 2004

Inventors: Ming Zhou, Hua Wu, Yue Zhang, Jianfeng Gao, Chang-Ning Huang

Example based machine translation system

Publication number: 20040002848

Abstract: The present invention performs machine translation by matching fragments of a source language sentence to be translated to source language portions of an example in an example base. When all relevant examples have been identified in the example base, the examples are subjected to phrase alignment in which fragments of the target language sentence in each example are aligned against the matched fragments of the source language sentence in the same example. A translation component then substitutes the aligned target language phrases from the matched examples for the matched fragments in the source language sentence.

Type: Application

Filed: June 28, 2002

Publication date: January 1, 2004

Inventors: Ming Zhou, Jin-Xia Huang, Chang Ning Huang, Wei Wang

System and method for identifying base noun phrases

Publication number: 20030014238

Abstract: A system and method identify base noun phrases (baseNP) in a linguistic input. A part-of-speech tagger identifies N-best part-of-speech tag sequences corresponding to the linguistic input. A baseNP identifier identifies baseNPs in the linguistic input using a unified statistical model that identifies the baseNPs, given the N-best POS sequences.

Type: Application

Filed: June 4, 2001

Publication date: January 16, 2003

Inventors: Endong Xun, Ming Zhou, Chang-Ning Huang