Patents by Inventor Francine R. Chen

Francine R. Chen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7188117
    Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
    Type: Grant
    Filed: September 3, 2002
    Date of Patent: March 6, 2007
    Assignee: Xerox Corporation
    Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
  • Patent number: 7167871
    Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
    Type: Grant
    Filed: September 3, 2002
    Date of Patent: January 23, 2007
    Assignee: Xerox Corporation
    Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
  • Patent number: 7162522
    Abstract: Demographic information of an Internet user is predicted based on an analysis of accessed web pages. Web pages accessed by the Internet user are detected and mapped to a user path vector which is converted to a normalized weighted user path vector. A centroid vector identifies web page access patterns of users with a shared user profile attribute. The user profile attribute is assigned to the Internet user based on a comparison of the vectors. Bias values are also assigned to a set of web pages and a user profile attribute can be predicted for an Internet user based on the bias values of web pages accessed by the user. User attributes can also be predicted based on the results of an expectation maximization process. Demographic information can be predicted based on the combined results of a vector comparison, bias determination, or expectation maximization process.
    Type: Grant
    Filed: November 2, 2001
    Date of Patent: January 9, 2007
    Assignee: Xerox Corporation
    Inventors: Eytan Adar, Lada A. Adamic, Francine R. Chen
  • Patent number: 7130837
    Abstract: Systems and methods for determining the topic structure of a document including text utilize a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA forms a framework for both text segmentation and topic identification. The use of PLSA provides an improved representation for the sparse information in a text block, such as a sentence or a sequence of sentences. Topic characterization of each text segment is derived from PLSA parameters that relate words to “topics”, latent variables in the PLSA model, and “topics” to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.
    Type: Grant
    Filed: March 22, 2002
    Date of Patent: October 31, 2006
    Assignee: Xerox Corporation
    Inventors: Ioannis Tsochantaridis, Thorsten H. Brants, Francine R. Chen
  • Patent number: 7117437
    Abstract: Techniques for displaying interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text to provide contextualized access to an interactive topic-based text summary and to an original text.
    Type: Grant
    Filed: December 16, 2002
    Date of Patent: October 3, 2006
    Assignee: Palo Alto Research Center Incorporated
    Inventors: Francine R. Chen, Thorsten H. Brants, Annie E. Zaenen
  • Patent number: 7007069
    Abstract: A method for partitioning a tree-structured discussion or other tree structured collections of texts into clusters dealing with identifiable subtopics, if such subtopics exist, or into manageable partitions if not. Each document is represented by a vector and is initially placed in a cluster containing only that document. Then a sequence of cluster combinations is performed, at each step combining the most similar two clusters, where the most similar two clusters are the clusters related by the most similar pair of document vectors, into a new cluster. The process can be halted before all clusters are combined based on application-specific criteria.
    Type: Grant
    Filed: December 16, 2002
    Date of Patent: February 28, 2006
    Assignee: Palo Alto Research Center Inc.
    Inventors: Paula S. Newman, Francine R. Chen
  • Patent number: 6941321
    Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
    Type: Grant
    Filed: October 19, 1999
    Date of Patent: September 6, 2005
    Assignee: Xerox Corporation
    Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li
  • Patent number: 6922699
    Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
    Type: Grant
    Filed: October 19, 1999
    Date of Patent: July 26, 2005
    Assignee: Xerox Corporation
    Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li, Ullas Gargi
  • Publication number: 20040122657
    Abstract: Techniques for determining interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text provides contextualized access to an interactive topic-based text summary and to an original text.
    Type: Application
    Filed: December 16, 2002
    Publication date: June 24, 2004
    Inventors: Thorsten H. Brants, Francine R. Chen, Annie E. Zaenen
  • Publication number: 20040117448
    Abstract: A method for partitioning a tree-structured discussion or other tree structured collections of texts into clusters dealing with identifiable subtopics, if such subtopics exist, or into manageable partitions if not. Each document is represented by a vector and is initially placed in a cluster containing only that-document. Then a sequence of cluster combinations is performed, at each step combining the most similar two clusters, where the most similar two clusters are the clusters related by the most similar pair of document vectors, into a new cluster. The process can be halted before all clusters are combined based on application-specific criteria.
    Type: Application
    Filed: December 16, 2002
    Publication date: June 17, 2004
    Applicant: Palo Alto Research Center, Incorporated
    Inventors: Paula S. Newman, Francine R. Chen
  • Publication number: 20040117740
    Abstract: Techniques for displaying interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text to provide contextualized access to an interactive topic-based text summary and to an original text.
    Type: Application
    Filed: December 16, 2002
    Publication date: June 17, 2004
    Inventors: Francine R. Chen, Thorsten H. Brants, Annie E. Zaenen
  • Publication number: 20040117725
    Abstract: Techniques for determining sentence based interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text.
    Type: Application
    Filed: December 16, 2002
    Publication date: June 17, 2004
    Inventors: Francine R. Chen, Thorsten H. Brants, Annie E. Zaenen
  • Patent number: 6728752
    Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
    Type: Grant
    Filed: October 19, 1999
    Date of Patent: April 27, 2004
    Assignee: Xerox Corporation
    Inventors: Francine R. Chen, Hinrich Schuetze, Ullas Gargi
  • Publication number: 20030226100
    Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
    Type: Application
    Filed: September 3, 2002
    Publication date: December 4, 2003
    Applicant: XEROX CORPORATION
    Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
  • Publication number: 20030225750
    Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
    Type: Application
    Filed: September 3, 2002
    Publication date: December 4, 2003
    Applicant: XEROX CORPORATION
    Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
  • Publication number: 20030221166
    Abstract: Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of ran-ordered lists with numerically-ordered lists.
    Type: Application
    Filed: September 3, 2002
    Publication date: November 27, 2003
    Applicant: XEROX CORPORATION
    Inventors: Ayman O. Farahat, Francine R. Chen, Charles R. Mathis, Geoffrey D. Nunberg
  • Publication number: 20030182631
    Abstract: Systems and methods for determining the topic structure of a document including text utilize a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA forms a framework for both text segmentation and topic identification. The use of PLSA provides an improved representation for the sparse information in a text block, such as a sentence or a sequence of sentences. Topic characterization of each text segment is derived from PLSA parameters that relate words to “topics”, latent variables in the PLSA model, and “topics” to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.
    Type: Application
    Filed: March 22, 2002
    Publication date: September 25, 2003
    Applicant: XEROX CORPORATION
    Inventors: Ioannis Tsochantaridis, Thorsten H. Brants, Francine R. Chen
  • Publication number: 20030101024
    Abstract: Demographic information of an Internet user is predicted based on an analysis of accessed web pages. Web pages accessed by the Internet user are detected and mapped to a user path vector which is converted to a normalized weighted user path vector. A centroid vector identifies web page access patterns of users with a shared user profile attribute. The user profile attribute is assigned to the Internet user based on a comparison of the vectors. Bias values are also assigned to a set of web pages and a user profile attribute can be predicted for an Internet user based on the bias values of web pages accessed by the user. User attributes can also be predicted based on the results of an expectation maximization process. Demographic information can be predicted based on the combined results of a vector comparison, bias determination, or expectation maximization process.
    Type: Application
    Filed: November 2, 2001
    Publication date: May 29, 2003
    Inventors: Eytan Adar, Lada A. Adamic, Francine R. Chen
  • Publication number: 20030074368
    Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
    Type: Application
    Filed: October 19, 1999
    Publication date: April 17, 2003
    Inventors: HINRICH SCHUETZE, FRANCINE R. CHEN, PETER L. PIROLLI, JAMES E. PITKOW, ED H. CHI, JUN LI, ULLAS GARGI
  • Publication number: 20030074369
    Abstract: A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
    Type: Application
    Filed: October 19, 1999
    Publication date: April 17, 2003
    Inventors: Hinrich Schuetze, Francine R. Chen, Peter L. Pirolli, James E. Pitkow, Ed H. Chi, Jun Li