Patents by Inventor Jan O. Pedersen

Jan O. Pedersen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190303375
    Abstract: A new architecture is provided to support a precise information retrieval system at web scale. The architecture provides algorithms to generate candidates and to select the top N results via ranking models (e.g., semantic ranking models, aggregation ranking models) that capture term relationships between the query and the result contents at search time.
    Type: Application
    Filed: May 23, 2016
    Publication date: October 3, 2019
    Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Jing BAI, Yue-Sheng LIU, Jan O. PEDERSEN, Mao YANG, Qi LU
  • Patent number: 6973423
    Abstract: A processor-implemented method of identifying the text genre of a machine-readable, untagged text. The method begins by generating a cue vector from the text, which represents occurrences in the text of a first set of nonstructural, surface cues that are easily computable. Afterward, the processor determines whether the text is an instance of a first text genre using the cue vector and a weighting vector associated with the first text genre.
    Type: Grant
    Filed: June 18, 1998
    Date of Patent: December 6, 2005
    Assignee: Xerox Corporation
    Inventors: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler, Gregory Grefenstette
  • Patent number: 6505150
    Abstract: A method of filtering according to text genre the results of a topic search of a heterogeneous corpus of untagged, machine-readable texts. Because each text of the corpus has a topic and a text genre, the corpus includes multiple text genres and covers multiple topics. According to the method, a processor first searches the corpus for a first multiplicity of texts that have a first topic. Next, the processor identifies a first set of texts of the first multiplicity that are instances of a first text genre and identifies a second set of texts of the first multiplicity that are instances of a second text genre. Finally, the processor identifies to a computer user the first multiplicity of texts in an order based upon the first text genre and second text genre.
    Type: Grant
    Filed: June 18, 1998
    Date of Patent: January 7, 2003
    Assignee: Xerox Corporation
    Inventors: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler
  • Publication number: 20020002450
    Abstract: A method of filtering according to text genre the results of a topic search of a heterogeneous corpus of untagged, machine-readable texts. Because each text of the corpus has a topic and a text genre, the corpus includes multiple text genres and covers multiple topics. According to the method, a processor first searches the corpus for a first multiplicity of texts that have a first topic. Next, the processor identifies a first set of texts of the first multiplicity that are instances of a first text genre and identifies a second set of texts of the first multiplicity that are instances of a second text genre. Finally, the processor identifies to a computer user the first multiplicity of texts in an order based upon the first text genre and second text genre.
    Type: Application
    Filed: June 18, 1998
    Publication date: January 3, 2002
    Applicant: Xerox Corp.
    Inventors: GEOFFREY D. NUNBERG, HINRICH SCHUETZE, JAN O. PEDERSEN, BRETT L. KESSLER
  • Patent number: 5999927
    Abstract: The present invention relates to a method and apparatus for document clustering-based browsing of a corpus of documents, and more particularly to the use of overlapping clusters to improve recall. The present invention is directed to improving the performance of information access methods and apparatus through the use of non-disjoint (overlapped) clustering operations. The present invention is further described in terms of two possible methods for expanding document clusters so as to achieve the overlap, and a method for increasing precision through the use of the overlapped clusters.
    Type: Grant
    Filed: April 24, 1998
    Date of Patent: December 7, 1999
    Assignee: Xerox Corporation
    Inventors: John W. Tukey, Jan O. Pedersen
  • Patent number: 5918240
    Abstract: A method of automatically generating document extracts. The method makes use of feature value probabilities generated from a statistical analysis of manually generated summaries to extract the same set of sentences an expert might. The method is based upon an iterative approach. First, the computer system designates a sentence of the document as a selected sentence. Second, the computer system determines the value of each feature of a feature set for the selected sentence. Third, the computer system increases a score for the selected sentence based upon the value of the feature for the selected sentence and upon the probability associated with that value. Fourth, after scoring all of the sentences of the document, the computer system selects a subset of the highest scoring sentences to be extracted.
    Type: Grant
    Filed: June 28, 1995
    Date of Patent: June 29, 1999
    Assignee: Xerox Corporation
    Inventors: Julian M. Kupiec, Jan O. Pedersen, Francine R. Chen, Daniel C. Brotsky, Steven B. Putz
  • Patent number: 5911140
    Abstract: A method of automatically ordering the presentation of document clusters generated from a ranked corpus of documents. First, the corpus is organized into a plurality of clusters. Next, a rank is determined for each cluster based upon the rank of a document within that cluster. Afterward, the clusters are presented to a computer user in the order determined by their rank.
    Type: Grant
    Filed: December 14, 1995
    Date of Patent: June 8, 1999
    Assignee: Xerox Corporation
    Inventors: John W. Tukey, Jan O. Pedersen
  • Patent number: 5787420
    Abstract: A computerized method of ordering document clusters for presentation after browsing a corpus of documents, which presents document clusters in a logical fashion in the absence of any indication of the computer user's interests. The method begins by grouping the corpus into a plurality of clusters, each having a centroid and including at least one document. Next, for each cluster, a degree of similarity between that cluster and every other cluster is determined by finding a dot product between each cluster centroid and every other cluster centroid. The similarity information is then used to determine an order of presentation for the plurality of clusters in a way that maximizes the degree of similarity between adjacent clusters.
    Type: Grant
    Filed: December 14, 1995
    Date of Patent: July 28, 1998
    Assignee: Xerox Corporation
    Inventors: John W. Tukey, Jan O. Pedersen
  • Patent number: 5787422
    Abstract: The present invention relates to a method and apparatus for document clustering-based browsing of a corpus of documents, and more particularly to the use of overlapping clusters to improve recall. The present invention is directed to improving the performance of information access methods and apparatus through the use of non-disjoint (overlapped) clustering operations. The present invention is further described in terms of two possible methods for expanding document clusters so as to achieve the overlap, and a method for increasing precision through the use of the overlapped clusters.
    Type: Grant
    Filed: January 11, 1996
    Date of Patent: July 28, 1998
    Assignee: Xerox Corporation
    Inventors: John W. Tukey, Jan O. Pedersen
  • Patent number: 5778397
    Abstract: A method of automatically generating feature probabilities that allow later automatic generation of document extracts. The computer system generates the probabilities by analyzing the documents of a document corpus one at a time. First, the computer system designates one of the documents as a selected document. Next, the computer system analyzes each sentence of the selected document to determine the value of the paragraph feature and the value of the uppercase feature. The computer system repeats this effort for each document of the document corpus. Afterward, the number of occurrences of each value of each feature is counted and used to calculate feature value probabilities for all of the features.
    Type: Grant
    Filed: June 28, 1995
    Date of Patent: July 7, 1998
    Assignee: Xerox Corporation
    Inventors: Julian M. Kupiec, Jan O. Pedersen, Francine R. Chen, Daniel C. Brotsky, Steven B. Putz
  • Patent number: 5625554
    Abstract: The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both the systematic aspects of a language's morphological rule system and the word-by-word irregularities that occur. The techniques described apply generally across the languages of the world and are not limited to simple suffixing languages like English. Although the resulting transducers can have many states and transitions or arcs, they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications.
    Type: Grant
    Filed: July 20, 1992
    Date of Patent: April 29, 1997
    Assignee: Xerox Corporation
    Inventors: Douglass R. Cutting, Per-Kristian G. Halvorsen, Ronald M. Kaplan, Lauri Karttunen, Martin Kay, Jan O. Pedersen
  • Patent number: 5486686
    Abstract: Machine-readable electronic domain definitions of part or all of the electronic domain descriptions of hardcopy documents and/or of part or all of the transforms that are performed to produce and reproduce such hardcopy documents are encoded in codes that are printed on such documents, thereby permitting the electronic domain descriptions of such documents and/or such transforms to be recovered more robustly and reliably when the information carried by such documents is transformed from the hardcopy domain to the electronic domain.
    Type: Grant
    Filed: May 18, 1992
    Date of Patent: January 23, 1996
    Assignee: Xerox Corporation
    Inventors: Frank Zdybel, Jr., Henry W. Sang, Jr., Jan O. Pedersen, Z. E. Smith, III, D. A. Henderson, Jr., David L. Hecht, Dan S. Bloomberg
  • Patent number: 5483650
    Abstract: Arbitrarily large document collections are processed by expanding a focus set having at least one initial metadocument into a plurality of subsequent metadocuments. The number of subsequent metadocuments is approximately equal to a predetermined maximum number. The subsequent metadocuments are then clustered into a predetermined number of new metadocuments, which are summarized and presented to a user. The focus set is redefined to include only user-selected new metadocuments.
    Type: Grant
    Filed: June 21, 1993
    Date of Patent: January 9, 1996
    Assignee: Xerox Corporation
    Inventors: Jan O. Pedersen, David R. Karger, Douglass R. Cutting
  • Patent number: 5442778
    Abstract: Scatter-Gather is a computer-based document browsing method that operates in time proportional to the number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive utility; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. Because the off-line preparation of the initial ordering is not time-critical, an accurate initial ordering can be prepared. The step of determining a summary includes determining a summary that can be presented to a user on a CRT without scrolling. The step of providing a further ordering includes truncated group average agglomerative clustering, merging disjoint document sets, center finding, assign-to-nearest, and other refinement methods.
    Type: Grant
    Filed: November 12, 1991
    Date of Patent: August 15, 1995
    Assignee: Xerox Corporation
    Inventors: Jan O. Pedersen, David Karger, Douglass R. Cutting, John W. Tukey
  • Patent number: 5278980
    Abstract: An information retrieval system and method are provided in which an operator inputs one or more query words which are used to determine a search key for searching through a corpus of documents, and which returns any matches between the search key and the corpus of documents as a phrase containing the word data matching the query word(s), a non-stop (content) word next adjacent to the matching word data, and all intervening stop-words between the matching word data and the next adjacent non-stop word. The operator, after reviewing one or more of the returned phrases can then use one or more of the next adjacent non-stop-words as new query words to reformulate the search key and perform a subsequent search through the document corpus. This process can be conducted iteratively, until the appropriate documents of interest are located. The additional non-stop-words from each phrase are preferably aligned with each other (e.g., by columnation) to ease viewing of the "new" content words.
    Type: Grant
    Filed: August 16, 1991
    Date of Patent: January 11, 1994
    Assignee: Xerox Corporation
    Inventors: Jan O. Pedersen, Per-Kristian Halvorsen, Douglass R. Cutting, John W. Tukey, Eric A. Bier, Daniel G. Bobrow
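The genre-identification abstract of patent 6973423 (a cue vector combined with a per-genre weighting vector) can be illustrated with a minimal sketch. The patent abstract does not list the cues or say how the two vectors are combined, so the cue set (`CUES`), the per-100-words normalization, and the logistic score with a 0.5 threshold are all hypothetical choices made for illustration, not the patented method.

```python
import math
import re

# Hypothetical surface cues: cheap, non-structural patterns.
# The patent abstract does not enumerate its cues; these are illustrative only.
CUES = {
    "first_person": re.compile(r"\b(i|we|me|us|my|our)\b", re.IGNORECASE),
    "question_mark": re.compile(r"\?"),
    "exclamation": re.compile(r"!"),
    "long_word": re.compile(r"\b\w{10,}\b"),
}


def cue_vector(text: str) -> list[float]:
    """Count occurrences of each cue, normalized per 100 words (assumed scaling)."""
    n_words = max(len(text.split()), 1)
    return [100.0 * len(pattern.findall(text)) / n_words for pattern in CUES.values()]


def is_genre(text: str, weights: list[float], bias: float) -> bool:
    """Assumed combination rule: logistic of (bias + weights . cues) above 0.5."""
    z = bias + sum(w * x for w, x in zip(weights, cue_vector(text)))
    return 1.0 / (1.0 + math.exp(-z)) > 0.5
```

With hypothetical weights that favor first-person pronouns and punctuation, `is_genre("I think we should ask: are you coming? I hope so!", [0.5, 0.5, 0.5, -0.2], -2.0)` accepts the conversational sentence, while a formal sentence dominated by long words is rejected. In practice the weights would be learned from labeled examples of each genre.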
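Patents 5778397 and 5918240 together describe a two-stage extractive summarizer: count feature values across a training corpus to estimate probabilities, then score each sentence by its feature values and keep the top scorers. The sketch below assumes the training data marks which sentences appear in manual summaries, estimates P(in summary | feature = value) with add-one smoothing, and scores sentences by summing log probabilities. These are simplifying assumptions; the exact probability model and scoring formula of the patents may differ.

```python
import math
from collections import Counter


def train(examples):
    """examples: iterable of (features: dict, in_summary: bool).

    Returns an add-one-smoothed estimate of P(in summary | feature = value)
    for every (feature name, value) pair seen in training.
    """
    total = Counter()
    positive = Counter()
    for feats, in_summary in examples:
        for name, value in feats.items():
            total[(name, value)] += 1
            if in_summary:
                positive[(name, value)] += 1
    return {key: (positive[key] + 1) / (total[key] + 2) for key in total}


def extract(sentences, probs, k):
    """sentences: list of (text, features). Return the k best texts in document order."""
    scored = []
    for pos, (text, feats) in enumerate(sentences):
        score = 0.0
        for name, value in feats.items():
            # Unseen feature values get a neutral probability of 0.5 (assumption).
            score += math.log(probs.get((name, value), 0.5))
        scored.append((score, pos, text))
    best = sorted(scored, key=lambda t: t[0], reverse=True)[:k]
    # Present the extract in the order the sentences appear in the document.
    return [text for _, _, text in sorted(best, key=lambda t: t[1])]
```

The feature names `"para"` and `"upper"` used with this sketch are hypothetical stand-ins for the paragraph and uppercase features named in the 5778397 abstract. Keeping the selected sentences in document order, rather than score order, is a design choice that tends to make the extract read more naturally.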
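Two of the Tukey and Pedersen patents order clusters for presentation in different ways: 5911140 ranks each cluster by the rank of a document within it, and 5787420 uses dot products between cluster centroids so that adjacent clusters are similar. The sketch below assumes that a cluster takes the rank of its best-ranked member, and it orders clusters by similarity with a greedy nearest-neighbor chain. The greedy chain is a swapped-in heuristic: it does not guarantee the maximum total similarity between neighbors, which is a traveling-salesman-style problem.

```python
def order_clusters_by_rank(clusters, doc_rank):
    """clusters: list of lists of document ids; doc_rank: id -> rank (1 = best).

    Assumes a cluster's rank is that of its best-ranked (lowest-numbered) member.
    """
    return sorted(clusters, key=lambda cluster: min(doc_rank[d] for d in cluster))


def dot(u, v):
    """Dot product of two equal-length centroid vectors."""
    return sum(a * b for a, b in zip(u, v))


def order_clusters_by_similarity(centroids, start=0):
    """Greedy chain: starting from `start`, repeatedly append the unvisited
    cluster whose centroid has the largest dot product with the last one.
    Ties go to the lowest cluster index."""
    order = [start]
    remaining = set(range(len(centroids))) - {start}
    while remaining:
        last = centroids[order[-1]]
        nxt = max(sorted(remaining), key=lambda j: dot(last, centroids[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

For example, with centroids `[[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9]]` and `start=0`, the chain visits clusters 0, 2, 3, 1, moving gradually from the first direction to the second. The starting cluster is an assumed parameter; the 5787420 abstract does not say where the ordering begins.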
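The phrase-returning search of patent 5278980 returns, for each match, the query word, any intervening stop words, and the next non-stop (content) word, with the new content words aligned in a column. A minimal sketch follows. The `STOP_WORDS` list and the simple lowercase tokenizer are hypothetical choices, not taken from the patent.

```python
import re

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "or", "is", "with"}


def phrases(text, query):
    """Return one phrase per occurrence of `query`: the match, any following
    stop words, and the next adjacent non-stop word."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    query = query.lower()
    found = []
    for i, tok in enumerate(tokens):
        if tok != query:
            continue
        j = i + 1
        while j < len(tokens) and tokens[j] in STOP_WORDS:
            j += 1
        if j < len(tokens):  # skip matches with no following content word
            found.append(tokens[i:j + 1])
    return found


def align(phrase_list):
    """Pad each phrase so the new content words line up in one column."""
    prefixes = [" ".join(p[:-1]) for p in phrase_list]
    width = max((len(s) for s in prefixes), default=0)
    return [f"{s:<{width}} {p[-1]}" for s, p in zip(prefixes, phrase_list)]
```

For the text "Retrieval of the documents uses retrieval in a cluster and retrieval models." and the query "retrieval", the aligned output ends in the column `documents`, `cluster`, `models`. Each of those words is a candidate for reformulating the query in the next search iteration, as the abstract describes.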