Patents by Inventor Francine R. Chen
Francine R. Chen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7188117
Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
Type: Grant
Filed: September 3, 2002
Date of Patent: March 6, 2007
Assignee: Xerox Corporation
Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
-
Patent number: 7167871
Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
Type: Grant
Filed: September 3, 2002
Date of Patent: January 23, 2007
Assignee: Xerox Corporation
Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
-
Patent number: 7162522
Abstract: Demographic information of an Internet user is predicted based on an analysis of accessed web pages. Web pages accessed by the Internet user are detected and mapped to a user path vector which is converted to a normalized weighted user path vector. A centroid vector identifies web page access patterns of users with a shared user profile attribute. The user profile attribute is assigned to the Internet user based on a comparison of the vectors. Bias values are also assigned to a set of web pages and a user profile attribute can be predicted for an Internet user based on the bias values of web pages accessed by the user. User attributes can also be predicted based on the results of an expectation maximization process. Demographic information can be predicted based on the combined results of a vector comparison, bias determination, or expectation maximization process.
Type: Grant
Filed: November 2, 2001
Date of Patent: January 9, 2007
Assignee: Xerox Corporation
Inventors: Eytan Adar, Lada A. Adamic, Francine R. Chen
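The vector comparison described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the function names, the use of raw page-visit counts as the user path vector, unit-length normalization without any page weighting, and cosine similarity as the comparison are all assumptions made for illustration.

```python
import math

def normalize(vec):
    # Scale a vector to unit length; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def centroid(vectors):
    # Average the normalized path vectors of users who share one attribute.
    unit = [normalize(v) for v in vectors]
    return [sum(col) / len(unit) for col in zip(*unit)]

def cosine(u, v):
    # Cosine similarity, computed as the dot product of unit vectors.
    u, v = normalize(u), normalize(v)
    return sum(a * b for a, b in zip(u, v))

def predict_attribute(user_path, centroids):
    # Assign the attribute whose centroid is most similar to the user's path.
    return max(centroids, key=lambda label: cosine(user_path, centroids[label]))

# Hypothetical data: each position counts visits to one of four pages.
centroids = {
    "group_a": centroid([[3, 1, 0, 0], [2, 2, 0, 0]]),
    "group_b": centroid([[0, 0, 1, 3], [0, 1, 2, 2]]),
}
prediction = predict_attribute([1, 1, 0, 0], centroids)
```

The bias-value and expectation maximization approaches mentioned in the abstract are not covered by this sketch.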
-
Patent number: 7130837
Abstract: Systems and methods for determining the topic structure of a document including text utilize a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA forms a framework for both text segmentation and topic identification. The use of PLSA provides an improved representation for the sparse information in a text block, such as a sentence or a sequence of sentences. Topic characterization of each text segment is derived from PLSA parameters that relate words to “topics”, latent variables in the PLSA model, and “topics” to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.
Type: Grant
Filed: March 22, 2002
Date of Patent: October 31, 2006
Assignee: Xerox Corporation
Inventors: Ioannis Tsochantaridis, Thorsten H. Brants, Francine R. Chen
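The idea of selecting segmentation points from similarity values between adjacent text blocks can be sketched as follows. This sketch deliberately swaps in plain word-count vectors and cosine similarity in place of the PLSA representation named in the abstract, and the fixed threshold rule and all function names are assumptions for illustration only.

```python
import math
from collections import Counter

def block_vector(text):
    # Bag-of-words counts for one text block (a stand-in for a PLSA representation).
    return Counter(text.lower().split())

def cosine(c1, c2):
    # Cosine similarity between two word-count vectors.
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def segmentation_points(blocks, threshold):
    # Return the indices of blocks that start a new segment, i.e. blocks whose
    # similarity to the preceding block falls below the threshold.
    vecs = [block_vector(b) for b in blocks]
    sims = [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    return [i + 1 for i, s in enumerate(sims) if s < threshold]

# Hypothetical blocks: two about cats, then two about markets.
blocks = [
    "cats purr and cats sleep",
    "cats sleep all day",
    "stocks fell and bonds rose",
    "bonds rose again",
]
points = segmentation_points(blocks, threshold=0.3)
```

Raw word counts are sparse for short blocks, which is the weakness the abstract says PLSA is meant to address.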
-
Patent number: 7117437
Abstract: Techniques for displaying interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text to provide contextualized access to an interactive topic-based text summary and to an original text.
Type: Grant
Filed: December 16, 2002
Date of Patent: October 3, 2006
Assignee: Palo Alto Research Center Incorporated
Inventors: Francine R. Chen, Thorsten H. Brants, Annie E. Zaenen
-
Patent number: 7007069
Abstract: A method for partitioning a tree-structured discussion or other tree structured collections of texts into clusters dealing with identifiable subtopics, if such subtopics exist, or into manageable partitions if not. Each document is represented by a vector and is initially placed in a cluster containing only that document. Then a sequence of cluster combinations is performed, at each step combining the most similar two clusters, where the most similar two clusters are the clusters related by the most similar pair of document vectors, into a new cluster. The process can be halted before all clusters are combined based on application-specific criteria.
Type: Grant
Filed: December 16, 2002
Date of Patent: February 28, 2006
Assignee: Palo Alto Research Center Inc.
Inventors: Paula S. Newman, Francine R. Chen
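The merging procedure in this abstract resembles single-link agglomerative clustering, which can be sketched as below. The sketch ignores the tree structure of the discussion entirely, and the cosine measure, the `min_similarity` halting rule, and the function names are assumptions standing in for the "application-specific criteria" the abstract leaves open.

```python
import math

def cosine(u, v):
    # Cosine similarity between two document vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def single_link_clusters(vectors, min_similarity):
    # Start with one cluster per document (clusters hold document indices).
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster similarity is that of the most similar document pair.
                sim = max(cosine(vectors[i], vectors[j])
                          for i in clusters[a] for j in clusters[b])
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        sim, a, b = best
        # Assumed halting rule: stop when the best pair is not similar enough.
        if sim < min_similarity:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical two-dimensional document vectors.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
result = single_link_clusters(docs, min_similarity=0.8)
```

This brute-force version recomputes all pairwise similarities at every step, which is acceptable for a small thread but not for a large collection.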
-
Patent number: 6941321
Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
Type: Grant
Filed: October 19, 1999
Date of Patent: September 6, 2005
Assignee: Xerox Corporation
Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li
-
Patent number: 6922699
Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
Type: Grant
Filed: October 19, 1999
Date of Patent: July 26, 2005
Assignee: Xerox Corporation
Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li, Ullas Gargi
-
Publication number: 20040122657
Abstract: Techniques for determining interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text provides contextualized access to an interactive topic-based text summary and to an original text.
Type: Application
Filed: December 16, 2002
Publication date: June 24, 2004
Inventors: Thorsten H. Brants, Francine R. Chen, Annie E. Zaenen
-
Publication number: 20040117448
Abstract: A method for partitioning a tree-structured discussion or other tree structured collections of texts into clusters dealing with identifiable subtopics, if such subtopics exist, or into manageable partitions if not. Each document is represented by a vector and is initially placed in a cluster containing only that document. Then a sequence of cluster combinations is performed, at each step combining the most similar two clusters, where the most similar two clusters are the clusters related by the most similar pair of document vectors, into a new cluster. The process can be halted before all clusters are combined based on application-specific criteria.
Type: Application
Filed: December 16, 2002
Publication date: June 17, 2004
Applicant: Palo Alto Research Center, Incorporated
Inventors: Paula S. Newman, Francine R. Chen
-
Publication number: 20040117740
Abstract: Techniques for displaying interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text to provide contextualized access to an interactive topic-based text summary and to an original text.
Type: Application
Filed: December 16, 2002
Publication date: June 17, 2004
Inventors: Francine R. Chen, Thorsten H. Brants, Annie E. Zaenen
-
Publication number: 20040117725
Abstract: Techniques for determining sentence based interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text.
Type: Application
Filed: December 16, 2002
Publication date: June 17, 2004
Inventors: Francine R. Chen, Thorsten H. Brants, Annie E. Zaenen
-
Patent number: 6728752
Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
Type: Grant
Filed: October 19, 1999
Date of Patent: April 27, 2004
Assignee: Xerox Corporation
Inventors: Francine R. Chen, Hinrich Schuetze, Ullas Gargi
-
Publication number: 20030226100
Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
Type: Application
Filed: September 3, 2002
Publication date: December 4, 2003
Applicant: XEROX CORPORATION
Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
-
Publication number: 20030225750
Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
Type: Application
Filed: September 3, 2002
Publication date: December 4, 2003
Applicant: XEROX CORPORATION
Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
-
Publication number: 20030221166
Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
Type: Application
Filed: September 3, 2002
Publication date: November 27, 2003
Applicant: XEROX CORPORATION
Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
-
Publication number: 20030182631
Abstract: Systems and methods for determining the topic structure of a document including text utilize a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA forms a framework for both text segmentation and topic identification. The use of PLSA provides an improved representation for the sparse information in a text block, such as a sentence or a sequence of sentences. Topic characterization of each text segment is derived from PLSA parameters that relate words to “topics”, latent variables in the PLSA model, and “topics” to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.
Type: Application
Filed: March 22, 2002
Publication date: September 25, 2003
Applicant: XEROX CORPORATION
Inventors: Ioannis Tsochantaridis, Thorsten H. Brants, Francine R. Chen
-
Publication number: 20030101024
Abstract: Demographic information of an Internet user is predicted based on an analysis of accessed web pages. Web pages accessed by the Internet user are detected and mapped to a user path vector which is converted to a normalized weighted user path vector. A centroid vector identifies web page access patterns of users with a shared user profile attribute. The user profile attribute is assigned to the Internet user based on a comparison of the vectors. Bias values are also assigned to a set of web pages and a user profile attribute can be predicted for an Internet user based on the bias values of web pages accessed by the user. User attributes can also be predicted based on the results of an expectation maximization process. Demographic information can be predicted based on the combined results of a vector comparison, bias determination, or expectation maximization process.
Type: Application
Filed: November 2, 2001
Publication date: May 29, 2003
Inventors: Eytan Adar, Lada A. Adamic, Francine R. Chen
-
Publication number: 20030074368
Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
Type: Application
Filed: October 19, 1999
Publication date: April 17, 2003
Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li, Ullas Gargi
-
Publication number: 20030074369
Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
Type: Application
Filed: October 19, 1999
Publication date: April 17, 2003
Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li