Patents by Inventor Xiaoqiang Luo
Xiaoqiang Luo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20220253540
Abstract: Techniques for securely storing and processing data for training data generation are provided. In one technique, multiple encrypted records are retrieved from a first persistent storage. For each encrypted record, that record is decrypted in memory to generate a decrypted record that comprises multiple attribute values. Then, based on the attribute values and a definition of multiple features of a machine-learned model, multiple feature values are generated and stored, along with a label, in a training instance, which is then stored in a second persistent storage. One or more machine learning techniques are used to train the machine-learned model based on training data that includes the training instances that are stored in the second persistent storage.
Type: Application
Filed: February 5, 2021
Publication date: August 11, 2022
Inventors: Yunpeng XU, Tianhao LU, Xiaoqiang LUO, Jiashuo WANG, Chencheng WU
-
Patent number: 10698964
Abstract: A method for automatically extracting and organizing information by a processing device from a plurality of data sources is provided. A natural language processing information extraction pipeline that includes an automatic detection of entities is applied to the data sources. Information about detected entities is identified by analyzing products of the natural language processing pipeline. Identified information is grouped into equivalence classes containing equivalent information. At least one displayable representation of the equivalence classes is created. An order in which the at least one displayable representation is displayed is computed. A combined representation of the equivalence classes that respects the order in which the displayable representation is displayed is produced.
Type: Grant
Filed: January 30, 2017
Date of Patent: June 30, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Vittorio Castelli, Radu Florian, Xiaoqiang Luo, Hema Raghavan
-
Publication number: 20200097605
Abstract: A system and method are provided for automatic identification, extraction, and validation of data pertaining to receiving entity events (REE). Feature (or attribute) values associated with web content are identified. The web content may contain news and features on current/past affairs. The identified feature values are considered by a rule-based or a machine-learned model and, based upon output of the model, a determination as to whether the set of data comprises a REE is made. If the determination is positive, then multiple data items are extracted from the set of data and, optionally, from other data from the source.
Type: Application
Filed: September 25, 2018
Publication date: March 26, 2020
Inventors: Jingyuan Liu, Xiaoqiang Luo, Tzu Ming Kuo, Marcello Oliva, Yunpeng Xu
-
Publication number: 20190197176
Abstract: Techniques for identifying relationships between entities using machine learning are disclosed herein.
Type: Application
Filed: December 21, 2017
Publication date: June 27, 2019
Inventors: Xiaoqiang Luo, Yunpeng Xu, Marcello Oliva
-
Publication number: 20170140057
Abstract: A method for automatically extracting and organizing information by a processing device from a plurality of data sources is provided. A natural language processing information extraction pipeline that includes an automatic detection of entities is applied to the data sources. Information about detected entities is identified by analyzing products of the natural language processing pipeline. Identified information is grouped into equivalence classes containing equivalent information. At least one displayable representation of the equivalence classes is created. An order in which the at least one displayable representation is displayed is computed. A combined representation of the equivalence classes that respects the order in which the displayable representation is displayed is produced.
Type: Application
Filed: January 30, 2017
Publication date: May 18, 2017
Inventors: VITTORIO CASTELLI, Radu Florian, Xiaoqiang Luo, Hema Raghavan
-
Patent number: 9471559
Abstract: Creating training data for a natural language processing system may comprise obtaining natural language input, the natural language input annotated with one or more important phrases; and generating training instances comprising a syntactic parse tree of nodes representing elements of the natural language input augmented with the annotated important phrases. In another aspect, a classifier may be trained based on the generated training instances. The classifier may be used to predict one or more potential important phrases in a query.
Type: Grant
Filed: March 15, 2013
Date of Patent: October 18, 2016
Assignee: International Business Machines Corporation
Inventors: Vittorio Castelli, Radu Florian, Xiaoqiang Luo, Sameer Maskey, Hema Raghavan
-
Patent number: 8903707
Abstract: A method, an apparatus and an article of manufacture for determining a dropped pronoun from a source language. The method includes collecting parallel sentences from a source and a target language, creating at least one word alignment between the parallel sentences in the source and the target language, mapping at least one pronoun from the target language sentence onto the source language sentence, computing at least one feature from the mapping, wherein the at least one feature is extracted from both the source language and the at least one pronoun projected from the target language, and using the at least one feature to train a classifier to predict position and spelling of at least one pronoun in the target language when the at least one pronoun is dropped in the source language.
Type: Grant
Filed: January 12, 2012
Date of Patent: December 2, 2014
Assignee: International Business Machines Corporation
Inventors: Bing Zhao, Imed Zitouni, Xiaoqiang Luo, Vittorio Castelli
-
Publication number: 20140163962
Abstract: Creating training data for a natural language processing system may comprise obtaining natural language input, the natural language input annotated with one or more important phrases; and generating training instances comprising a syntactic parse tree of nodes representing elements of the natural language input augmented with the annotated important phrases. In another aspect, a classifier may be trained based on the generated training instances. The classifier may be used to predict one or more potential important phrases in a query.
Type: Application
Filed: March 15, 2013
Publication date: June 12, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Vittorio Castelli, Radu Florian, Xiaoqiang Luo, Sameer Maskey, Hema Raghavan
-
Patent number: 8620961
Abstract: A Bell Tree data structure is provided to model the process of chaining the mentions, from one or more documents, into entities, tracking the entire process; where the data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.
Type: Grant
Filed: May 5, 2008
Date of Patent: December 31, 2013
Assignee: International Business Machines Corporation
Inventors: Abraham Ittycheriah, Hongyan Jing, Nandakishore Kambhatla, Xiaoqiang Luo, Salim E. Roukos
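Illustrative sketch (not taken from the patent text): one way to picture the Bell Tree described in this abstract is to process mentions in order, letting each mention either join an existing partial entity or start a new one, so that the leaves enumerate every partition of the mentions. Each leaf is scored by the product of its step probabilities and the leaves are ranked by that score. The function names, the `toy_step_prob` scoring rule, and the example mentions are all hypothetical placeholders chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

Partition = List[List[str]]
StepProb = Callable[[Partition, Optional[int], str], float]


def bell_tree_leaves(mentions: List[str], step_prob: StepProb) -> List[Tuple[Partition, float]]:
    """Enumerate every leaf of the tree and rank leaves by product of step probabilities."""
    hypotheses: List[Tuple[Partition, float]] = [([], 1.0)]  # root: no entities yet
    for mention in mentions:
        expanded = []
        for partition, score in hypotheses:
            # Option 1: link the mention to each existing partial entity.
            for i in range(len(partition)):
                linked = [list(entity) for entity in partition]
                linked[i].append(mention)
                expanded.append((linked, score * step_prob(partition, i, mention)))
            # Option 2: start a new entity with this mention.
            started = [list(entity) for entity in partition] + [[mention]]
            expanded.append((started, score * step_prob(partition, None, mention)))
        hypotheses = expanded
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)


def toy_step_prob(partition: Partition, target: Optional[int], mention: str) -> float:
    """Hypothetical scorer: entities sharing a token with the mention get double weight."""
    tokens = set(mention.lower().split())
    weights = [
        2.0 if any(tokens & set(m.lower().split()) for m in entity) else 1.0
        for entity in partition
    ]
    weights.append(1.0)  # weight for starting a new entity
    index = len(partition) if target is None else target
    return weights[index] / sum(weights)


ranked = bell_tree_leaves(["Clinton", "the president", "Bill Clinton"], toy_step_prob)
for partition, score in ranked:
    print(f"{score:.3f}  {partition}")
```

With three mentions the sketch produces five leaves, matching the Bell number for a set of three items. A practical system would presumably prune low-scoring hypotheses rather than expand the full tree, since the number of leaves grows very quickly with the number of mentions.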
-
Publication number: 20130332450
Abstract: A method for automatically extracting and organizing information by a processing device from a plurality of data sources is provided. A natural language processing information extraction pipeline that includes an automatic detection of entities is applied to the data sources. Information about detected entities is identified by analyzing products of the natural language processing pipeline. Identified information is grouped into equivalence classes containing equivalent information. At least one displayable representation of the equivalence classes is created. An order in which the at least one displayable representation is displayed is computed. A combined representation of the equivalence classes that respects the order in which the displayable representation is displayed is produced.
Type: Application
Filed: June 11, 2012
Publication date: December 12, 2013
Applicant: International Business Machines Corporation
Inventors: Vittorio Castelli, Radu Florian, Xiaoqiang Luo, Hema Raghavan
-
Publication number: 20130185049
Abstract: A method, an apparatus and an article of manufacture for determining a dropped pronoun from a source language. The method includes collecting parallel sentences from a source and a target language, creating at least one word alignment between the parallel sentences in the source and the target language, mapping at least one pronoun from the target language sentence onto the source language sentence, computing at least one feature from the mapping, wherein the at least one feature is extracted from both the source language and the at least one pronoun projected from the target language, and using the at least one feature to train a classifier to predict position and spelling of at least one pronoun in the target language when the at least one pronoun is dropped in the source language.
Type: Application
Filed: January 12, 2012
Publication date: July 18, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bing Zhao, Imed Zitouni, Xiaoqiang Luo, Vittorio Castelli
-
Patent number: 7464024
Abstract: A parser is provided that parses a Chinese text stream at the character level and builds a syntactic structure of Chinese character sequences. A character-based syntactic parse tree contains word boundaries, part-of-speech tags, and phrasal structure information. Syntactic knowledge constrains the system when it determines word boundaries. A deterministic procedure is used to convert word-based parse trees into character-based trees. Character-level tags are derived from word-level part-of-speech tags and word-boundary information is encoded with a positional tag. Word-level parts-of-speech become a constituent label in character-based trees. A maximum entropy parser is then built and tested.
Type: Grant
Filed: April 16, 2004
Date of Patent: December 9, 2008
Assignee: International Business Machines Corporation
Inventors: Xiaoqiang Luo, Robert Todd Ward
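Illustrative sketch (an assumption-laden reading of this abstract, not the patented procedure): the deterministic conversion from word-level tags to character-level tags can be pictured as appending a positional marker to each word's part-of-speech tag. The marker scheme used here (`b` for word-initial, `m` for word-internal, `e` for word-final, `u` for a single-character word), the function names, and the example sentence are assumptions made for illustration; the listing does not specify them.

```python
from typing import List, Tuple


def word_tags_to_char_tags(tagged_words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Derive one tag per character from (word, part-of-speech) pairs."""
    char_tags = []
    for word, pos in tagged_words:
        if len(word) == 1:
            char_tags.append((word, pos + "u"))  # single-character word
            continue
        for i, ch in enumerate(word):
            if i == 0:
                position = "b"  # first character of the word
            elif i == len(word) - 1:
                position = "e"  # last character of the word
            else:
                position = "m"  # word-internal character
            char_tags.append((ch, pos + position))
    return char_tags


def char_tags_to_words(char_tags: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Recover word boundaries and word-level tags from character-level tags."""
    words, buffer = [], ""
    for ch, tag in char_tags:
        pos, position = tag[:-1], tag[-1]
        buffer += ch
        if position in ("e", "u"):  # a word ends at this character
            words.append((buffer, pos))
            buffer = ""
    return words


sentence = [("北京大学", "NR"), ("很", "AD"), ("好", "VA")]
chars = word_tags_to_char_tags(sentence)
print(chars)
print(char_tags_to_words(chars) == sentence)
```

Because the positional marker carries the word-boundary information, segmentation can be read back off the character-level tags, which is what lets a character-level parser recover word boundaries as a by-product of tagging.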
-
Publication number: 20080243888
Abstract: A Bell Tree data structure is provided to model the process of chaining the mentions, from one or more documents, into entities, tracking the entire process; where the data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.
Type: Application
Filed: May 5, 2008
Publication date: October 2, 2008
Inventors: Abraham Ittycheriah, Hongyan Jing, Nandakishore Kambhatla, Xiaoqiang Luo, Salim E. Roukos
-
Patent number: 7398274
Abstract: A Bell Tree data structure is provided to model the process of chaining the mentions, from one or more documents, into entities, tracking the entire process; where the data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.
Type: Grant
Filed: April 27, 2004
Date of Patent: July 8, 2008
Assignee: International Business Machines Corporation
Inventors: Abraham Ittycheriah, Hongyan Jing, Nandakishore Kambhatla, Xiaoqiang Luo, Salim E. Roukos
-
Patent number: 7308400
Abstract: An arrangement for adapting statistical parsers to new data using a mathematical transform, particularly a Markov transform. In particular, it is assumed that an initial statistical parser is available and a batch of new data is given. The initial model is mapped to a new model by a Markov matrix, each of whose rows sums to one. In the unsupervised setup, where “true” parses are missing, the transform matrix is obtained by maximizing the log likelihood of the parses of test data decoded using the model before adaptation. The proposed algorithm can be applied to supervised adaptation, as well.
Type: Grant
Filed: December 14, 2000
Date of Patent: December 11, 2007
Assignee: International Business Machines Corporation
Inventors: Xiaoqiang Luo, Salim E. Roukos, Robert T. Ward
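Illustrative sketch (a simplification, not the patented algorithm): the core property mentioned in this abstract is that a Markov matrix, whose rows each sum to one, maps a probability model to another valid probability model. The example below assumes, purely for illustration, that the model is a single categorical distribution `p` and that adaptation data arrives as outcome counts; it applies a fixed transform `Q` and compares log likelihoods, and it does not attempt the likelihood maximization that would estimate `Q`. All names and numbers are hypothetical.

```python
import math
from typing import List


def apply_markov_transform(p: List[float], q: List[List[float]]) -> List[float]:
    """Return p' = p Q, where every row of Q must sum to one."""
    for row in q:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the Markov matrix must sum to one")
    return [sum(p[i] * q[i][j] for i in range(len(p))) for j in range(len(q[0]))]


def log_likelihood(p: List[float], counts: List[int]) -> float:
    """Log likelihood of observed outcome counts under distribution p."""
    return sum(c * math.log(pj) for pj, c in zip(p, counts) if c > 0)


# Hypothetical initial model and a hypothetical transform that smooths it.
initial = [0.7, 0.2, 0.1]
transform = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
adapted = apply_markov_transform(initial, transform)

# Hypothetical counts from the new data, flatter than the initial model expects.
new_counts = [3, 4, 3]
print(adapted, sum(adapted))
print(log_likelihood(initial, new_counts), log_likelihood(adapted, new_counts))
```

Because each row of `Q` sums to one, the adapted vector still sums to one, so no renormalization step is needed after the transform.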
-
Publication number: 20050237227
Abstract: A Bell Tree data structure is provided to model the process of chaining the mentions, from one or more documents, into entities, tracking the entire process; where the data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.
Type: Application
Filed: April 27, 2004
Publication date: October 27, 2005
Applicant: International Business Machines Corporation
Inventors: Abraham Ittycheriah, Hongyan Jing, Nandakishore Kambhatla, Xiaoqiang Luo, Salim Roukos
-
Publication number: 20050234707
Abstract: A parser is provided that parses a Chinese text stream at the character level and builds a syntactic structure of Chinese character sequences. A character-based syntactic parse tree contains word boundaries, part-of-speech tags, and phrasal structure information. Syntactic knowledge constrains the system when it determines word boundaries. A deterministic procedure is used to convert word-based parse trees into character-based trees. Character-level tags are derived from word-level part-of-speech tags and word-boundary information is encoded with a positional tag. Word-level parts-of-speech become a constituent label in character-based trees. A maximum entropy parser is then built and tested.
Type: Application
Filed: April 16, 2004
Publication date: October 20, 2005
Applicant: International Business Machines Corporation
Inventors: Xiaoqiang Luo, Robert Ward
-
Publication number: 20040111253
Abstract: A method, computer program product, and data processing system for training a statistical parser by utilizing active learning techniques to reduce the size of the corpus of human-annotated training samples (e.g., sentences) needed is disclosed. According to a preferred embodiment of the present invention, the statistical parser under training is used to compare the grammatical structure of the samples according to the parser's current level of training. The samples are then divided into clusters, with each cluster representing samples having a similar structure as ascertained by the statistical parser. Uncertainty metrics are applied to the clustered samples to select samples from each cluster that reflect uncertainty in the statistical parser's grammatical model. These selected samples may then be annotated by a human trainer for training the statistical parser.
Type: Application
Filed: December 10, 2002
Publication date: June 10, 2004
Applicant: International Business Machines Corporation
Inventors: Xiaoqiang Luo, Salim Roukos, Min Tang
-
Publication number: 20020111793
Abstract: An arrangement for adapting statistical parsers to new data using a mathematical transform, particularly a Markov transform. In particular, it is assumed that an initial statistical parser is available and a batch of new data is given. The initial model is mapped to a new model by a Markov matrix, each of whose rows sums to one. In the unsupervised setup, where “true” parses are missing, the transform matrix is obtained by maximizing the log likelihood of the parses of test data decoded using the model before adaptation. The proposed algorithm can be applied to supervised adaptation, as well.
Type: Application
Filed: December 14, 2000
Publication date: August 15, 2002
Applicant: IBM Corporation
Inventors: Xiaoqiang Luo, Salim E. Roukos, Robert T. Ward