Patents by Inventor Xiaoqiang Luo

Xiaoqiang Luo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

FINE-TUNING LARGE LANGUAGE MODELS FOR DOMAIN-SPECIFIC ENVIRONMENTS

Publication number: 20250077792

Abstract: Embodiments of the disclosed technologies are capable of a training pipeline to fine-tune a machine learning model given a limited set of domain-specific data. The embodiments describe using a first machine learning model to generate a pseudo label associated with a domain-specific training document. The pseudo label comprises a machine-generated text of a content type extracted from the domain-specific training document. The embodiments further describe fine-tuning a second machine learning model using the pseudo label, the domain-specific training document, a first low-rank weight matrix, and a second low-rank weight matrix. The fine-tuned second machine learning model generates text of the content type from a domain-specific document.
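The abstract above mentions fine-tuning with a first and a second low-rank weight matrix but does not say how they are combined. A minimal sketch, assuming the common low-rank adaptation (LoRA) formulation in which a frozen weight matrix W is adjusted by the product of two small trainable matrices, could look like this. The names `lora_effective_weight`, `b_low`, `a_low`, and `scale` are illustrative assumptions, not terms from the application.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*right)]
            for row in left]


def lora_effective_weight(w_frozen, b_low, a_low, scale=1.0):
    """Return W + scale * (B @ A), where B is d_out x r and A is r x d_in.

    Assumption: this is the usual LoRA update; the abstract only states
    that two low-rank weight matrices are used during fine-tuning.
    """
    delta = matmul(b_low, a_low)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w_frozen, delta)]


def trainable_parameter_count(d_out, d_in, rank):
    """Number of trainable values in B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


# Toy example: a 2 x 3 frozen weight with a rank-1 update.
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
a = [[0.1, 0.2, 0.3]]   # r x d_in
b = [[1.0], [2.0]]      # d_out x r
adapted = lora_effective_weight(w, b, a)
```

With a small rank, the two low-rank matrices hold far fewer values than the full weight matrix, which is one reason such a scheme suits a limited set of domain-specific data.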

Type: Application

Filed: August 31, 2023

Publication date: March 6, 2025

Inventors: Xilun Chen, Tzu Ming Kuo, Xiaoqiang Luo, Ilya Dan Melamed, Ji Yan, Peide Zhong

Secure storage and processing of data for generating training data

Patent number: 12197539

Abstract: Techniques for securely storing and processing data for training data generation are provided. In one technique, multiple encrypted records are retrieved from a first persistent storage. For each encrypted record, that record is decrypted in memory to generate a decrypted record that comprises multiple attribute values. Then, based on the attribute values and a definition of multiple features of a machine-learned model, multiple feature values are generated and stored, along with a label, in a training instance, which is then stored in a second persistent storage. One or more machine learning techniques are used to train the machine-learned model based on training data that includes the training instances that are stored in the second persistent storage.

Type: Grant

Filed: February 5, 2021

Date of Patent: January 14, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yunpeng Xu, Tianhao Lu, Xiaoqiang Luo, Jiashuo Wang, Chencheng Wu

SECURE STORAGE AND PROCESSING OF DATA FOR GENERATING TRAINING DATA

Publication number: 20220253540

Abstract: Techniques for securely storing and processing data for training data generation are provided. In one technique, multiple encrypted records are retrieved from a first persistent storage. For each encrypted record, that record is decrypted in memory to generate a decrypted record that comprises multiple attribute values. Then, based on the attribute values and a definition of multiple features of a machine-learned model, multiple feature values are generated and stored, along with a label, in a training instance, which is then stored in a second persistent storage. One or more machine learning techniques are used to train the machine-learned model based on training data that includes the training instances that are stored in the second persistent storage.

Type: Application

Filed: February 5, 2021

Publication date: August 11, 2022

Inventors: Yunpeng Xu, Tianhao Lu, Xiaoqiang Luo, Jiashuo Wang, Chencheng Wu

System and method for automatically detecting and interactively displaying information about entities, activities, and events from multiple-modality natural language sources

Patent number: 10698964

Abstract: A method for automatically extracting and organizing information by a processing device from a plurality of data sources is provided. A natural language processing information extraction pipeline that includes an automatic detection of entities is applied to the data sources. Information about detected entities is identified by analyzing products of the natural language processing pipeline. Identified information is grouped into equivalence classes containing equivalent information. At least one displayable representation of the equivalence classes is created. An order in which the at least one displayable representation is displayed is computed. A combined representation of the equivalence classes that respects the order in which the displayable representation is displayed is produced.

Type: Grant

Filed: January 30, 2017

Date of Patent: June 30, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Vittorio Castelli, Radu Florian, Xiaoqiang Luo, Hema Raghavan

MACHINE LEARNING TECHNIQUES FOR AUTOMATIC VALIDATION OF EVENTS

Publication number: 20200097605

Abstract: A system and method are provided for automatic identification, extraction, and validation of data pertaining to receiving entity events (REE). Feature (or attribute) values associated with web content are identified. The web content may contain news and features on current/past affairs. The identified feature values are considered by a rule-based or a machine-learned model and, based upon output of the model, a determination as to whether the set of data comprises a REE is made. If the determination is positive, then multiple data items are extracted from the set of data and, optionally, from other data from the source.

Type: Application

Filed: September 25, 2018

Publication date: March 26, 2020

Inventors: Jingyuan Liu, Xiaoqiang Luo, Tzu Ming Kuo, Marcello Oliva, Yunpeng Xu

IDENTIFYING RELATIONSHIPS BETWEEN ENTITIES USING MACHINE LEARNING

Publication number: 20190197176

Abstract: Techniques for identifying relationships between entities using machine learning are disclosed herein.

Type: Application

Filed: December 21, 2017

Publication date: June 27, 2019

Inventors: Xiaoqiang Luo, Yunpeng Xu, Marcello Oliva

SYSTEM AND METHOD FOR AUTOMATICALLY DETECTING AND INTERACTIVELY DISPLAYING INFORMATION ABOUT ENTITIES, ACTIVITIES, AND EVENTS FROM MULTIPLE-MODALITY NATURAL LANGUAGE SOURCES

Publication number: 20170140057

Abstract: A method for automatically extracting and organizing information by a processing device from a plurality of data sources is provided. A natural language processing information extraction pipeline that includes an automatic detection of entities is applied to the data sources. Information about detected entities is identified by analyzing products of the natural language processing pipeline. Identified information is grouped into equivalence classes containing equivalent information. At least one displayable representation of the equivalence classes is created. An order in which the at least one displayable representation is displayed is computed. A combined representation of the equivalence classes that respects the order in which the displayable representation is displayed is produced.

Type: Application

Filed: January 30, 2017

Publication date: May 18, 2017

Inventors: Vittorio Castelli, Radu Florian, Xiaoqiang Luo, Hema Raghavan

Deep analysis of natural language questions for question answering system

Patent number: 9471559

Abstract: Creating training data for a natural language processing system may comprise obtaining natural language input, the natural language input annotated with one or more important phrases; and generating training instances comprising a syntactic parse tree of nodes representing elements of the natural language input augmented with the annotated important phrases. In another aspect, a classifier may be trained based on the generated training instances. The classifier may be used to predict one or more potential important phrases in a query.

Type: Grant

Filed: March 15, 2013

Date of Patent: October 18, 2016

Assignee: International Business Machines Corporation

Inventors: Vittorio Castelli, Radu Florian, Xiaoqiang Luo, Sameer Maskey, Hema Raghavan

Predicting pronouns of dropped pronoun style languages for natural language translation

Patent number: 8903707

Abstract: A method, an apparatus and an article of manufacture for determining a dropped pronoun from a source language. The method includes collecting parallel sentences from a source and a target language, creating at least one word alignment between the parallel sentences in the source and the target language, mapping at least one pronoun from the target language sentence onto the source language sentence, computing at least one feature from the mapping, wherein the at least one feature is extracted from both the source language and the at least one pronoun projected from the target language, and using the at least one feature to train a classifier to predict position and spelling of at least one pronoun in the target language when the at least one pronoun is dropped in the source language.

Type: Grant

Filed: January 12, 2012

Date of Patent: December 2, 2014

Assignee: International Business Machines Corporation

Inventors: Bing Zhao, Imed Zitouni, Xiaoqiang Luo, Vittorio Castelli

DEEP ANALYSIS OF NATURAL LANGUAGE QUESTIONS FOR QUESTION ANSWERING SYSTEM

Publication number: 20140163962

Abstract: Creating training data for a natural language processing system may comprise obtaining natural language input, the natural language input annotated with one or more important phrases; and generating training instances comprising a syntactic parse tree of nodes representing elements of the natural language input augmented with the annotated important phrases. In another aspect, a classifier may be trained based on the generated training instances. The classifier may be used to predict one or more potential important phrases in a query.

Type: Application

Filed: March 15, 2013

Publication date: June 12, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Vittorio Castelli, Radu Florian, Xiaoqiang Luo, Sameer Maskey, Hema Raghavan

Mention-synchronous entity tracking: system and method for chaining mentions

Patent number: 8620961

Abstract: A Bell Tree data structure is provided to model the process of chaining the mentions, from one or more documents, into entities, tracking the entire process; where the data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.
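As an illustration of the Bell Tree idea in the abstract above, the sketch below processes mentions one at a time: each mention either links to an existing partial entity or starts a new one, and every leaf is a complete partition of the mentions into entities. Each result is scored by the product of the probabilities of the decisions that produced it. The `step_prob` callback and the toy scorer are hypothetical; the abstract does not describe how the probabilities are computed.

```python
def expand(partition, mention, step_prob):
    """Yield (probability, child partition) for every Bell Tree branch.

    A branch either links `mention` to entity `i` or starts a new entity
    (signalled to the scorer with target=None).
    """
    for i in range(len(partition)):
        child = partition[:i] + [partition[i] + [mention]] + partition[i + 1:]
        yield step_prob(partition, i, mention), child
    yield step_prob(partition, None, mention), partition + [[mention]]


def ranked_results(mentions, step_prob):
    """Return all leaves as (score, partition), best score first."""
    frontier = [(1.0, [])]
    for mention in mentions:
        frontier = [(score * p, child)
                    for score, partition in frontier
                    for p, child in expand(partition, mention, step_prob)]
    return sorted(frontier, key=lambda item: item[0], reverse=True)


def toy_step_prob(partition, target, mention):
    """Hypothetical scorer: prefer linking case-insensitive matches."""
    if target is None:
        return 0.5
    head = partition[target][0]
    return 0.9 if head.lower() == mention.lower() else 0.1


results = ranked_results(["IBM", "ibm", "Paris"], toy_step_prob)
best_score, best_partition = results[0]
```

The number of leaves for n mentions is the Bell number B(n) (5 leaves for 3 mentions, 15 for 4), which is where the structure gets its name. This sketch enumerates every leaf, so it is only practical for a handful of mentions.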

Type: Grant

Filed: May 5, 2008

Date of Patent: December 31, 2013

Assignee: International Business Machines Corporation

Inventors: Abraham Ittycheriah, Hongyan Jing, Nandakishore Kambhatla, Xiaoqiang Luo, Salim E. Roukos

System and Method for Automatically Detecting and Interactively Displaying Information About Entities, Activities, and Events from Multiple-Modality Natural Language Sources

Publication number: 20130332450

Abstract: A method for automatically extracting and organizing information by a processing device from a plurality of data sources is provided. A natural language processing information extraction pipeline that includes an automatic detection of entities is applied to the data sources. Information about detected entities is identified by analyzing products of the natural language processing pipeline. Identified information is grouped into equivalence classes containing equivalent information. At least one displayable representation of the equivalence classes is created. An order in which the at least one displayable representation is displayed is computed. A combined representation of the equivalence classes that respects the order in which the displayable representation is displayed is produced.

Type: Application

Filed: June 11, 2012

Publication date: December 12, 2013

Applicant: International Business Machines Corporation

Inventors: Vittorio Castelli, Radu Florian, Xiaoqiang Luo, Hema Raghavan

Predicting Pronouns for Pro-Drop Style Languages for Natural Language Translation

Publication number: 20130185049

Abstract: A method, an apparatus and an article of manufacture for determining a dropped pronoun from a source language. The method includes collecting parallel sentences from a source and a target language, creating at least one word alignment between the parallel sentences in the source and the target language, mapping at least one pronoun from the target language sentence onto the source language sentence, computing at least one feature from the mapping, wherein the at least one feature is extracted from both the source language and the at least one pronoun projected from the target language, and using the at least one feature to train a classifier to predict position and spelling of at least one pronoun in the target language when the at least one pronoun is dropped in the source language.

Type: Application

Filed: January 12, 2012

Publication date: July 18, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bing Zhao, Imed Zitouni, Xiaoqiang Luo, Vittorio Castelli

Chinese character-based parser

Patent number: 7464024

Abstract: A parser is provided that parses a Chinese text stream at the character level and builds a syntactic structure of Chinese character sequences. A character-based syntactic parse tree contains word boundaries, part-of-speech tags, and phrasal structure information. Syntactic knowledge constrains the system when it determines word boundaries. A deterministic procedure is used to convert word-based parse trees into character-based trees. Character-level tags are derived from word-level part-of-speech tags and word-boundary information is encoded with a positional tag. Word-level parts-of-speech become a constituent label in character-based trees. A maximum entropy parser is then built and tested.
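The conversion described in the abstract above, from word-level part-of-speech tags to character-level tags that also encode word boundaries, can be sketched as follows. The positional suffixes (`b`, `m`, `e`, `u` for begin, middle, end, and single-character word) are an assumed encoding chosen for illustration; the abstract does not list the actual tag set.

```python
def to_character_tags(tagged_words):
    """Turn [(word, pos), ...] into [(char, pos + positional suffix), ...]."""
    tagged_chars = []
    for word, pos in tagged_words:
        last = len(word) - 1
        for i, char in enumerate(word):
            if last == 0:
                position = "u"   # single-character word
            elif i == 0:
                position = "b"   # first character of the word
            elif i == last:
                position = "e"   # last character of the word
            else:
                position = "m"   # interior character
            tagged_chars.append((char, pos + position))
    return tagged_chars


def to_words(tagged_chars):
    """Recover [(word, pos), ...] from character-level tags."""
    words, current = [], ""
    for char, tag in tagged_chars:
        pos, position = tag[:-1], tag[-1]
        current += char
        if position in ("e", "u"):   # a word ends on this character
            words.append((current, pos))
            current = ""
    return words


sentence = [("中国", "NR"), ("人民", "NN"), ("很", "AD"), ("友好", "VA")]
char_level = to_character_tags(sentence)
```

Because the positional suffix marks where each word ends, the mapping is reversible: `to_words(char_level)` returns the original word sequence, so word segmentation can be read off the character-level tags.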

Type: Grant

Filed: April 16, 2004

Date of Patent: December 9, 2008

Assignee: International Business Machines Corporation

Inventors: Xiaoqiang Luo, Robert Todd Ward

Mention-Synchronous Entity Tracking: System and Method for Chaining Mentions

Publication number: 20080243888

Abstract: A Bell Tree data structure is provided to model the process of chaining the mentions, from one or more documents, into entities, tracking the entire process; where the data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.

Type: Application

Filed: May 5, 2008

Publication date: October 2, 2008

Inventors: Abraham Ittycheriah, Hongyan Jing, Nandakishore Kambhatla, Xiaoqiang Luo, Salim E. Roukos

Mention-synchronous entity tracking system and method for chaining mentions

Patent number: 7398274

Abstract: A Bell Tree data structure is provided to model the process of chaining the mentions, from one or more documents, into entities, tracking the entire process; where the data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.

Type: Grant

Filed: April 27, 2004

Date of Patent: July 8, 2008

Assignee: International Business Machines Corporation

Inventors: Abraham Ittycheriah, Hongyan Jing, Nandakishore Kambhatla, Xiaoqiang Luo, Salim E. Roukos

Adaptation of statistical parsers based on mathematical transform

Patent number: 7308400

Abstract: An arrangement for adapting statistical parsers to new data using a mathematical transform, particularly a Markov transform. In particular, it is assumed that an initial statistical parser is available and a batch of new data is given. The initial model is mapped to a new model by a Markov matrix, each of whose rows sums to one. In the unsupervised setup, where “true” parses are missing, the transform matrix is obtained by maximizing the log likelihood of the parses of test data decoded using the model before adaptation. The proposed algorithm can be applied to supervised adaptation, as well.
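The key property of the Markov matrix named in the abstract above is that each row sums to one, so multiplying a probability distribution by it yields another valid distribution. The sketch below shows only that property on a toy distribution; how the patent applies the matrix to the parser's model parameters, and how the matrix is estimated, are not covered here. The function names and numbers are illustrative assumptions.

```python
def is_markov_matrix(matrix, tol=1e-9):
    """Check that entries are non-negative and every row sums to one."""
    return all(
        all(value >= 0 for value in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def apply_markov_transform(distribution, matrix):
    """Map a probability row vector p to p @ Q for a row-stochastic Q."""
    if not is_markov_matrix(matrix):
        raise ValueError("each row of the transform must sum to one")
    columns = zip(*matrix)
    return [sum(p * q for p, q in zip(distribution, column))
            for column in columns]


initial = [0.7, 0.2, 0.1]          # toy distribution from the initial model
transform = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
]
adapted = apply_markov_transform(initial, transform)
```

Because every row of the transform sums to one, the adapted values still sum to one, so no renormalization step is needed after the mapping.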

Type: Grant

Filed: December 14, 2000

Date of Patent: December 11, 2007

Assignee: International Business Machines Corporation

Inventors: Xiaoqiang Luo, Salim E. Roukos, Robert T. Ward

Mention-synchronous entity tracking system and method for chaining mentions

Publication number: 20050237227

Abstract: A Bell Tree data structure is provided to model the process of chaining the mentions, from one or more documents, into entities, tracking the entire process; where the data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.

Type: Application

Filed: April 27, 2004

Publication date: October 27, 2005

Applicant: International Business Machines Corporation

Inventors: Abraham Ittycheriah, Hongyan Jing, Nandakishore Kambhatla, Xiaoqiang Luo, Salim Roukos

Chinese character-based parser

Publication number: 20050234707

Abstract: A parser is provided that parses a Chinese text stream at the character level and builds a syntactic structure of Chinese character sequences. A character-based syntactic parse tree contains word boundaries, part-of-speech tags, and phrasal structure information. Syntactic knowledge constrains the system when it determines word boundaries. A deterministic procedure is used to convert word-based parse trees into character-based trees. Character-level tags are derived from word-level part-of-speech tags and word-boundary information is encoded with a positional tag. Word-level parts-of-speech become a constituent label in character-based trees. A maximum entropy parser is then built and tested.

Type: Application

Filed: April 16, 2004

Publication date: October 20, 2005

Applicant: International Business Machines Corporation

Inventors: Xiaoqiang Luo, Robert Ward

System and method for rapid development of natural language understanding using active learning

Publication number: 20040111253

Abstract: A method, computer program product, and data processing system for training a statistical parser by utilizing active learning techniques to reduce the size of the corpus of human-annotated training samples (e.g., sentences) needed is disclosed. According to a preferred embodiment of the present invention, the statistical parser under training is used to compare the grammatical structure of the samples according to the parser's current level of training. The samples are then divided into clusters, with each cluster representing samples having a similar structure as ascertained by the statistical parser. Uncertainty metrics are applied to the clustered samples to select samples from each cluster that reflect uncertainty in the statistical parser's grammatical model. These selected samples may then be annotated by a human trainer for training the statistical parser.
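The selection step described in the abstract above, picking uncertain samples from each cluster for human annotation, might be sketched as follows. This assumes the cluster assignments and the parser's probabilities over candidate parses are already available, and it uses entropy as the uncertainty metric; the abstract does not name a specific metric, and the field names `cluster` and `parse_probs` are hypothetical.

```python
import math
from collections import defaultdict


def entropy(probabilities):
    """Shannon entropy of a distribution over candidate parses."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_for_annotation(samples, per_cluster=1):
    """Pick the most uncertain samples from each cluster.

    Each sample is a dict with 'text', 'cluster', and 'parse_probs'.
    """
    by_cluster = defaultdict(list)
    for sample in samples:
        by_cluster[sample["cluster"]].append(sample)

    chosen = []
    for cluster_id in sorted(by_cluster):
        ranked = sorted(by_cluster[cluster_id],
                        key=lambda s: entropy(s["parse_probs"]),
                        reverse=True)
        chosen.extend(ranked[:per_cluster])
    return chosen


pool = [
    {"text": "A", "cluster": 0, "parse_probs": [0.9, 0.1]},
    {"text": "B", "cluster": 0, "parse_probs": [0.5, 0.5]},
    {"text": "C", "cluster": 1, "parse_probs": [1.0]},
    {"text": "D", "cluster": 1, "parse_probs": [0.6, 0.4]},
]
to_annotate = [sample["text"] for sample in select_for_annotation(pool)]
```

Selecting per cluster rather than from the whole pool keeps the annotated set from being dominated by many near-identical uncertain sentences, which fits the abstract's stated aim of reducing the amount of human annotation needed.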

Type: Application

Filed: December 10, 2002

Publication date: June 10, 2004

Applicant: International Business Machines Corporation

Inventors: Xiaoqiang Luo, Salim Roukos, Min Tang
