Patents by Inventor Geoffrey D. Nunberg

Geoffrey D. Nunberg has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections

Patent number: 7188117

Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.

Type: Grant

Filed: September 3, 2002

Date of Patent: March 6, 2007

Assignee: Xerox Corporation

Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
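
The abstract of patent 7188117 above describes a pipeline: extract document content feature values, pass them through a trained textual authority model, and use the resulting value to re-rank retrieved documents. The following is only a minimal sketch of that flow. The three features, the weights, the bias, and the function names are illustrative assumptions, not taken from the patent, and a logistic score stands in for whatever predictive models the trained model actually contains.

```python
import math
import re

def content_features(text):
    """Hypothetical non-topical surface features (assumed for illustration)."""
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n,                       # mean word length
        text.count("!") / n,                                  # exclamation marks per word
        sum(w.isupper() and len(w) > 1 for w in words) / n,   # all-caps words per word
    ]

# Placeholder parameters standing in for a trained model (assumed values).
WEIGHTS = [0.8, -20.0, -10.0]
BIAS = -3.5

def authority_score(text):
    """Map feature values to a score in (0, 1) with a logistic function."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, content_features(text)))
    return 1.0 / (1.0 + math.exp(-z))

def rerank(results):
    """Re-rank previously retrieved documents, highest score first."""
    return sorted(results, key=authority_score, reverse=True)
```

Calling rerank on a list of retrieved texts returns the same texts ordered by the assumed score; a real system would fit the weights on labeled documents.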
Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections

Patent number: 7167871

Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.

Type: Grant

Filed: September 3, 2002

Date of Patent: January 23, 2007

Assignee: Xerox Corporation

Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
Article and method of automatically determining text genre using surface features of untagged texts

Patent number: 6973423

Abstract: A processor implemented method of identifying the text genre of a machine-readable, untagged text. The processor implemented method begins by generating a cue vector from the text, which represents occurrences in the text of a first set of nonstructural, surface cues, which are easily computable. Afterward, the processor determines whether the text is an instance of a first text genre using the cue vector and a weighting vector associated with the first text genre.

Type: Grant

Filed: June 18, 1998

Date of Patent: December 6, 2005

Assignee: Xerox Corporation

Inventors: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler, Gregory Grefenstette
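
The abstract of patent 6973423 above combines a cue vector of easily computed surface cues with a per-genre weighting vector. One way to picture this is a weighted sum compared against a threshold, sketched below. The particular cues, the weights, the bias, and the zero threshold are assumptions made for illustration; the abstract does not specify them.

```python
import re

FIRST_PERSON = {"i", "we", "my", "our"}

def cue_vector(text):
    """Count a few nonstructural surface cues, normalized by token count.

    The cues chosen here are hypothetical examples, not the patent's set.
    """
    tokens = re.findall(r"[A-Za-z']+|[?!.]", text)
    n = max(len(tokens), 1)
    return [
        text.count("?") / n,                              # question marks
        sum(t.lower() in FIRST_PERSON for t in tokens) / n,  # first-person words
        sum(len(t) > 8 for t in tokens) / n,              # long words
    ]

def is_genre(text, weights, bias=0.0):
    """Decide genre membership from the cue vector and a weighting vector."""
    score = bias + sum(w * c for w, c in zip(weights, cue_vector(text)))
    return score > 0

# Assumed weighting vector for a hypothetical "editorial" genre.
EDITORIAL_WEIGHTS = [2.0, 5.0, -1.0]
EDITORIAL_BIAS = -0.5
```

Each genre would carry its own weighting vector, so testing a text against several genres means reusing one cue vector with different weights.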
Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections

Publication number: 20030225750

Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.

Type: Application

Filed: September 3, 2002

Publication date: December 4, 2003

Applicant: XEROX CORPORATION

Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections

Publication number: 20030226100

Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.

Type: Application

Filed: September 3, 2002

Publication date: December 4, 2003

Applicant: XEROX CORPORATION

Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections

Publication number: 20030221166

Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.

Type: Application

Filed: September 3, 2002

Publication date: November 27, 2003

Applicant: XEROX CORPORATION

Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
Article and method of automatically filtering information retrieval results using text genre

Patent number: 6505150

Abstract: A method of filtering according to text genre the results of a topic search of a heterogeneous corpus of untagged, machine-readable texts. Because each text of the corpus has a topic and a text genre, the corpus includes multiple text genres and covers multiple topics. According to the method, a processor first searches the corpus for a first multiplicity of texts that have a first topic. Next, the processor identifies a first set of texts of the first multiplicity that are instances of a first text genre and identifies a second set of texts of the first multiplicity that are instances of a second text genre. Finally, the processor identifies to a computer user the first multiplicity of texts in an order based upon the first text genre and second text genre.

Type: Grant

Filed: June 18, 1998

Date of Patent: January 7, 2003

Assignee: Xerox Corporation

Inventors: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler
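
The abstract of patent 6505150 above has three steps: search the corpus by topic, identify which hits belong to each text genre, and present the hits in a genre-based order. A minimal sketch follows. The substring topic match, the toy_genre classifier, and the genre labels are stand-in assumptions; the abstract does not say how topic matching or genre identification is implemented.

```python
def topic_search(corpus, topic):
    """Step 1: find texts on a topic (naive substring match, assumed)."""
    return [text for text in corpus if topic.lower() in text.lower()]

def toy_genre(text):
    """Steps 2: hypothetical genre classifier used only for this sketch."""
    return "faq" if text.rstrip().endswith("?") else "report"

def order_by_genre(hits, classify, preferred):
    """Step 3: order hits so that genres earlier in `preferred` come first.

    Hits whose genre is not listed sort last; ties keep search order.
    """
    rank = {genre: i for i, genre in enumerate(preferred)}
    return sorted(hits, key=lambda text: rank.get(classify(text), len(rank)))

corpus = [
    "Solar panels: annual output report.",
    "How do solar panels work?",
    "Wind turbine maintenance log.",
]
hits = topic_search(corpus, "solar")
ordered = order_by_genre(hits, toy_genre, ["faq", "report"])
```

Keeping the classifier as a parameter means any genre identifier, such as a cue-vector classifier, could be substituted without changing the ordering step.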
Article and method of automatically filtering information retrieval results using text genre

Publication number: 20020002450

Abstract: A method of filtering according to text genre the results of a topic search of a heterogeneous corpus of untagged, machine-readable texts. Because each text of the corpus has a topic and a text genre, the corpus includes multiple text genres and covers multiple topics. According to the method, a processor first searches the corpus for a first multiplicity of texts that have a first topic. Next, the processor identifies a first set of texts of the first multiplicity that are instances of a first text genre and identifies a second set of texts of the first multiplicity that are instances of a second text genre. Finally, the processor identifies to a computer user the first multiplicity of texts in an order based upon the first text genre and second text genre.

Type: Application

Filed: June 18, 1998

Publication date: January 3, 2002

Applicant: Xerox Corp.

Inventors: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler
Processing natural language text using autonomous punctuational structure

Patent number: 5111398

Abstract: A technique for processing natural language text uses a data structure that includes structure data in the text data. The structure data indicates an autonomous punctuational structure of the text, a punctuational structure that is independent of the lexical content of the text and therefore can be manipulated without considering the meaning of the words in the text. The data structure can be a tree in which each node has a textual type such as a paragraph, sentence, clause, phrase, or word. The data structure could alternatively be parallel data sequences, one with codes indicating the text's characters and the other with codes indicating textual types. The data structure is produced and maintained using a grammar of textual types, indicating for each textual type the textual types of units into which it can properly be divided. During editing, a text sequence is generated by applying rendering rules to the data structure, and the text is presented to the user based on the text sequence.

Type: Grant

Filed: November 21, 1988

Date of Patent: May 5, 1992

Assignee: Xerox Corporation

Inventors: Geoffrey D. Nunberg, H. Tayloe Stansbury, Curtis Abbott, Brian C. Smith
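
The abstract of patent 5111398 above describes a tree whose nodes carry textual types, a grammar saying which types each type may be divided into, and rendering rules that generate the text sequence. The sketch below illustrates that idea only. The four-level grammar, the Node class, and the specific rendering rules (comma between clauses, capital letter and period on sentences) are simplifying assumptions, not the patent's actual grammar or rules.

```python
from dataclasses import dataclass, field

# Assumed grammar: for each textual type, the types it may be divided into.
GRAMMAR = {
    "paragraph": {"sentence"},
    "sentence": {"clause"},
    "clause": {"word"},
    "word": set(),
}

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)
    text: str = ""  # only used by word nodes

def valid(node):
    """Check that every child type is permitted by the grammar."""
    return all(c.kind in GRAMMAR[node.kind] and valid(c) for c in node.children)

def render(node):
    """Generate text from the structure; punctuation comes from the rules."""
    if node.kind == "word":
        return node.text
    parts = [render(c) for c in node.children]
    if node.kind == "sentence":
        joined = ", ".join(parts)              # clauses separated by commas
        return joined[:1].upper() + joined[1:] + "."
    return " ".join(parts)                     # clauses and paragraphs

def clause(*words):
    """Helper that builds a clause node from plain words."""
    return Node("clause", [Node("word", text=w) for w in words])
```

Because punctuation is produced at render time, moving a clause to a different position in a sentence needs no manual fix-up of commas, capitals, or the final period.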