Patents by Inventor Francine R. Chen

Francine R. Chen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections

Patent number: 7188117

Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.

Type: Grant

Filed: September 3, 2002

Date of Patent: March 6, 2007

Assignee: Xerox Corporation

Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg

Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections

Patent number: 7167871

Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.

Type: Grant

Filed: September 3, 2002

Date of Patent: January 23, 2007

Assignee: Xerox Corporation

Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg

User profile classification by web usage analysis

Patent number: 7162522

Abstract: Demographic information of an Internet user is predicted based on an analysis of accessed web pages. Web pages accessed by the Internet user are detected and mapped to a user path vector which is converted to a normalized weighted user path vector. A centroid vector identifies web page access patterns of users with a shared user profile attribute. The user profile attribute is assigned to the Internet user based on a comparison of the vectors. Bias values are also assigned to a set of web pages and a user profile attribute can be predicted for an Internet user based on the bias values of web pages accessed by the user. User attributes can also be predicted based on the results of an expectation maximization process. Demographic information can be predicted based on the combined results of a vector comparison, bias determination, or expectation maximization process.

Type: Grant

Filed: November 2, 2001

Date of Patent: January 9, 2007

Assignee: Xerox Corporation

Inventors: Eytan Adar, Lada A. Adamic, Francine R. Chen
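The vector-comparison step in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the page names, the attribute labels, the optional `weights` table, and the choice of cosine similarity as the comparison are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def path_vector(pages, weights=None):
    """Map visited pages to a weighted count vector scaled to unit length."""
    weights = weights or {}  # hypothetical per-page weights; default weight is 1.0
    counts = Counter(pages)
    vec = {page: n * weights.get(page, 1.0) for page, n in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {page: v / norm for page, v in vec.items()} if norm else {}


def centroid(vectors):
    """Average a list of sparse vectors component by component."""
    total = defaultdict(float)
    for vec in vectors:
        for page, v in vec.items():
            total[page] += v
    return {page: v / len(vectors) for page, v in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(page, 0.0) for page, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_attribute(pages, centroids, weights=None):
    """Assign the attribute whose centroid is most similar to the user's vector."""
    vec = path_vector(pages, weights)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical browsing paths of users whose attribute is already known.
labeled_paths = {
    "group_a": [["news.example/sports", "shop.example/tools"], ["news.example/sports"]],
    "group_b": [["blog.example/recipes", "shop.example/kitchen"], ["blog.example/recipes"]],
}
centroids = {
    label: centroid([path_vector(path) for path in paths])
    for label, paths in labeled_paths.items()
}
prediction = predict_attribute(
    ["news.example/sports", "blog.example/recipes", "news.example/sports"], centroids
)
print(prediction)
```

The bias-value and expectation-maximization predictors mentioned in the abstract are not covered by this sketch.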
Systems and methods for determining the topic structure of a portion of text

Patent number: 7130837

Abstract: Systems and methods for determining the topic structure of a document including text utilize a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA forms a framework for both text segmentation and topic identification. The use of PLSA provides an improved representation for the sparse information in a text block, such as a sentence or a sequence of sentences. Topic characterization of each text segment is derived from PLSA parameters that relate words to “topics”, latent variables in the PLSA model, and “topics” to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.

Type: Grant

Filed: March 22, 2002

Date of Patent: October 31, 2006

Assignee: Xerox Corporation

Inventors: Ioannis Tsochantaridis, Thorsten H. Brants, Francine R. Chen
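The idea of choosing segmentation points from the similarity of adjacent text blocks can be sketched as follows. This sketch deliberately swaps in plain word-count vectors and cosine similarity in place of the PLSA representation the abstract describes, and the fixed `threshold` rule for picking boundaries is an assumption, so it shows only the general shape of the approach.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_points(blocks, threshold):
    """Return indices i such that a new segment starts at blocks[i].

    A boundary is placed wherever the similarity between a block and its
    successor falls below `threshold` (an assumed, hypothetical criterion).
    """
    bags = [Counter(block.lower().split()) for block in blocks]
    sims = [cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]
    return [i + 1 for i, sim in enumerate(sims) if sim < threshold]


blocks = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stock prices fell sharply",
    "stock markets and prices recovered",
]
boundaries = segment_points(blocks, threshold=0.3)
print(boundaries)
```

Raw word counts are sparse for short blocks such as single sentences, which is the weakness the abstract says the PLSA representation is meant to address.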
Systems and methods for displaying interactive topic-based text summaries

Patent number: 7117437

Abstract: Techniques for displaying interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text provides contextualized access to an interactive topic-based text summary and to an original text.

Type: Grant

Filed: December 16, 2002

Date of Patent: October 3, 2006

Assignee: Palo Alto Research Center Incorporated

Inventors: Francine R. Chen, Thorsten H. Brants, Annie E. Zaenen

Method and apparatus for clustering hierarchically related information

Patent number: 7007069

Abstract: A method for partitioning a tree-structured discussion or other tree structured collections of texts into clusters dealing with identifiable subtopics, if such subtopics exist, or into manageable partitions if not. Each document is represented by a vector and is initially placed in a cluster containing only that document. Then a sequence of cluster combinations is performed, at each step combining the most similar two clusters, where the most similar two clusters are the clusters related by the most similar pair of document vectors, into a new cluster. The process can be halted before all clusters are combined based on application-specific criteria.

Type: Grant

Filed: December 16, 2002

Date of Patent: February 28, 2006

Assignee: Palo Alto Research Center Inc.

Inventors: Paula S. Newman, Francine R. Chen
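The merging procedure in the abstract above resembles single-link agglomerative clustering, which can be sketched as below. The sketch makes several assumptions: it ignores the tree structure of the discussion, uses cosine similarity on small made-up vectors, and uses a `min_similarity` threshold as a stand-in for the application-specific halting criteria.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_link_clusters(vectors, min_similarity):
    """Repeatedly merge the two most similar clusters of document indices.

    Cluster similarity is that of the most similar pair of member vectors.
    Merging stops once the best pair falls below `min_similarity`
    (a hypothetical halting rule).
    """
    clusters = [[i] for i in range(len(vectors))]  # one document per cluster
    while len(clusters) > 1:
        best = None  # (similarity, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = max(
                    cosine(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        sim, a, b = best
        if sim < min_similarity:
            break  # halt before all clusters are combined
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Made-up document vectors: two near the first axis, two near the second.
docs = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.0, 0.9, 0.2)]
clusters = single_link_clusters(docs, min_similarity=0.8)
print(clusters)
```

This brute-force search recomputes pairwise similarities on every pass, which keeps the sketch short but is only practical for small collections.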
System and method for identifying similarities among objects in a collection

Patent number: 6941321

Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.

Type: Grant

Filed: October 19, 1999

Date of Patent: September 6, 2005

Assignee: Xerox Corporation

Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li

System and method for quantitatively representing data objects in vector space

Patent number: 6922699

Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.

Type: Grant

Filed: October 19, 1999

Date of Patent: July 26, 2005

Assignee: Xerox Corporation

Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li, Ullas Gargi

Systems and methods for interactive topic-based text summarization

Publication number: 20040122657

Abstract: Techniques for determining interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text provides contextualized access to an interactive topic-based text summary and to an original text.

Type: Application

Filed: December 16, 2002

Publication date: June 24, 2004

Inventors: Thorsten H. Brants, Francine R. Chen, Annie E. Zaenen

Method and apparatus for clustering hierarchically related information

Publication number: 20040117448

Abstract: A method for partitioning a tree-structured discussion or other tree structured collections of texts into clusters dealing with identifiable subtopics, if such subtopics exist, or into manageable partitions if not. Each document is represented by a vector and is initially placed in a cluster containing only that document. Then a sequence of cluster combinations is performed, at each step combining the most similar two clusters, where the most similar two clusters are the clusters related by the most similar pair of document vectors, into a new cluster. The process can be halted before all clusters are combined based on application-specific criteria.

Type: Application

Filed: December 16, 2002

Publication date: June 17, 2004

Applicant: Palo Alto Research Center, Incorporated

Inventors: Paula S. Newman, Francine R. Chen

Systems and methods for displaying interactive topic-based text summaries

Publication number: 20040117740

Abstract: Techniques for displaying interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text provides contextualized access to an interactive topic-based text summary and to an original text.

Type: Application

Filed: December 16, 2002

Publication date: June 17, 2004

Inventors: Francine R. Chen, Thorsten H. Brants, Annie E. Zaenen

Systems and methods for sentence based interactive topic-based text summarization

Publication number: 20040117725

Abstract: Techniques for determining sentence based interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text.

Type: Application

Filed: December 16, 2002

Publication date: June 17, 2004

Inventors: Francine R. Chen, Thorsten H. Brants, Annie E. Zaenen

System and method for information browsing using multi-modal features

Patent number: 6728752

Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.

Type: Grant

Filed: October 19, 1999

Date of Patent: April 27, 2004

Assignee: Xerox Corporation

Inventors: Francine R. Chen, Hinrich Schuetze, Ullas Gargi

Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections

Publication number: 20030226100

Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.

Type: Application

Filed: September 3, 2002

Publication date: December 4, 2003

Applicant: Xerox Corporation

Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg

Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections

Publication number: 20030225750

Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.

Type: Application

Filed: September 3, 2002

Publication date: December 4, 2003

Applicant: Xerox Corporation

Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg

Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections

Publication number: 20030221166

Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.

Type: Application

Filed: September 3, 2002

Publication date: November 27, 2003

Applicant: Xerox Corporation

Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg

Systems and methods for determining the topic structure of a portion of text

Publication number: 20030182631

Abstract: Systems and methods for determining the topic structure of a document including text utilize a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA forms a framework for both text segmentation and topic identification. The use of PLSA provides an improved representation for the sparse information in a text block, such as a sentence or a sequence of sentences. Topic characterization of each text segment is derived from PLSA parameters that relate words to “topics”, latent variables in the PLSA model, and “topics” to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.

Type: Application

Filed: March 22, 2002

Publication date: September 25, 2003

Applicant: Xerox Corporation

Inventors: Ioannis Tsochantaridis, Thorsten H. Brants, Francine R. Chen

User profile classification by web usage analysis

Publication number: 20030101024

Abstract: Demographic information of an Internet user is predicted based on an analysis of accessed web pages. Web pages accessed by the Internet user are detected and mapped to a user path vector which is converted to a normalized weighted user path vector. A centroid vector identifies web page access patterns of users with a shared user profile attribute. The user profile attribute is assigned to the Internet user based on a comparison of the vectors. Bias values are also assigned to a set of web pages and a user profile attribute can be predicted for an Internet user based on the bias values of web pages accessed by the user. User attributes can also be predicted based on the results of an expectation maximization process. Demographic information can be predicted based on the combined results of a vector comparison, bias determination, or expectation maximization process.

Type: Application

Filed: November 2, 2001

Publication date: May 29, 2003

Inventors: Eytan Adar, Lada A. Adamic, Francine R. Chen

System and method for quantitatively representing data objects in vector space

Publication number: 20030074368

Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.

Type: Application

Filed: October 19, 1999

Publication date: April 17, 2003

Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li, Ullas Gargi

System and method for identifying similarities among objects in a collection

Publication number: 20030074369

Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.

Type: Application

Filed: October 19, 1999

Publication date: April 17, 2003

Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li
