Patents by Inventor Jan O. Pedersen

Jan O. Pedersen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Relevant passage retrieval system

Patent number: 12174839

Abstract: A new architecture is provided to support a precise information retrieval system on a web scale. The architecture provides algorithms to generate candidates and select the top N results via ranking models (e.g., Semantic ranking models, Aggregation ranking models) to capture term relationships between query and result contents at search-time.

Type: Grant

Filed: May 23, 2016

Date of Patent: December 24, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jing Bai, Yue-Sheng Liu, Jan O. Pedersen, Mao Yang, Qi Lu
RELEVANT PASSAGE RETRIEVAL SYSTEM

Publication number: 20190303375

Abstract: A new architecture is provided to support a precise information retrieval system on a web scale. The architecture provides algorithms to generate candidates and select the top N results via ranking models (e.g., Semantic ranking models, Aggregation ranking models) to capture term relationships between query and result contents at search-time.

Type: Application

Filed: May 23, 2016

Publication date: October 3, 2019

Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Jing BAI, Yue-Sheng LIU, Jan O. PEDERSEN, Mao YANG, Qi LU
Article and method of automatically determining text genre using surface features of untagged texts

Patent number: 6973423

Abstract: A processor implemented method of identifying the text genre of a machine-readable, untagged text. The processor implemented method begins by generating a cue vector from the text, which represents occurrences in the text of a first set of nonstructural, surface cues, which are easily computable. Afterward, the processor determines whether the text is an instance of a first text genre using the cue vector and a weighting vector associated with the first text genre.

Type: Grant

Filed: June 18, 1998

Date of Patent: December 6, 2005

Assignee: Xerox Corporation

Inventors: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler, Gregory Grefenstette
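
The cue-vector test in this abstract might be sketched as follows. This is a minimal illustration, not the patented method: the four surface cues, the logistic squashing, and the names `cue_vector` and `is_genre` are all assumptions; the abstract only states that a cue vector of easily computed surface cues is combined with a per-genre weighting vector.

```python
import math
import re

def cue_vector(text):
    """Count a few cheap surface cues (this cue set is hypothetical)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    n_sents = max(len(sentences), 1)
    return [
        text.count("?") / n_sents,   # questions per sentence
        text.count("!") / n_sents,   # exclamations per sentence
        sum(w.lower() in {"i", "we", "you"} for w in words) / n_words,  # personal pronouns
        n_words / n_sents,           # average sentence length in words
    ]

def is_genre(cues, weights, bias=0.0, threshold=0.5):
    """Combine cues with one genre's weighting vector (logistic form assumed)."""
    score = sum(c * w for c, w in zip(cues, weights)) + bias
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability >= threshold
```

In practice the weights would be fitted on texts of known genre; the values used to call `is_genre` here would be placeholders.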
Article and method of automatically filtering information retrieval results using text genre

Patent number: 6505150

Abstract: A method of filtering according to text genre the results of a topic search of a heterogeneous corpus of untagged, machine-readable texts. Because each text of the corpus has a topic and a text genre, the corpus includes multiple text genres and covers multiple topics. According to the method, a processor first searches the corpus for a first multiplicity of texts that have a first topic. Next, the processor identifies a first set of texts of the first multiplicity that are instances of a first text genre and identifies a second set of texts of the first multiplicity that are instances of a second text genre. Finally, the processor identifies to a computer user the first multiplicity of texts in an order based upon the first text genre and second text genre.

Type: Grant

Filed: June 18, 1998

Date of Patent: January 7, 2003

Assignee: Xerox Corporation

Inventors: Geoffrey D. Nunberg, Hinrich Schuetze, Jan O. Pedersen, Brett L. Kessler
ARTICLE AND METHOD OF AUTOMATICALLY FILTERING INFORMATION RETRIEVAL RESULTS USING TEXT GENRE

Publication number: 20020002450

Abstract: A method of filtering according to text genre the results of a topic search of a heterogeneous corpus of untagged, machine-readable texts. Because each text of the corpus has a topic and a text genre, the corpus includes multiple text genres and covers multiple topics. According to the method, a processor first searches the corpus for a first multiplicity of texts that have a first topic. Next, the processor identifies a first set of texts of the first multiplicity that are instances of a first text genre and identifies a second set of texts of the first multiplicity that are instances of a second text genre. Finally, the processor identifies to a computer user the first multiplicity of texts in an order based upon the first text genre and second text genre.

Type: Application

Filed: June 18, 1998

Publication date: January 3, 2002

Applicant: Xerox Corp.

Inventors: GEOFFREY D. NUNBERG, HINRICH SCHUETZE, JAN O. PEDERSEN, BRETT L. KESSLER
Method and apparatus for information access employing overlapping clusters

Patent number: 5999927

Abstract: The present invention is a method and apparatus for document clustering-based browsing of a corpus of documents, and more particularly to the use of overlapping clusters to improve recall. The present invention is directed to improving the performance of information access methods and apparatus through the use of non-disjoint (overlapped) clustering operations. The present invention is further described in terms of two possible methods for expanding document clusters so as to achieve the overlap, and a method for increasing precision through the use of the overlapped clusters.

Type: Grant

Filed: April 24, 1998

Date of Patent: December 7, 1999

Assignee: Xerox Corporation

Inventors: John W. Tukey, Jan O. Pedersen
Automatic method of extracting summarization using feature probabilities

Patent number: 5918240

Abstract: A method of automatically generating document extracts. The method makes use of feature value probabilities generated from a statistical analysis of manually generated summaries to extract the same set of sentences an expert might. The method is based upon an iterative approach. First, the computer system designates a sentence of the document as a selected sentence. Second, the computer system determines values for the selected sentence of each feature of a feature set. Third, the computer system increases a score for the selected sentence based upon the value of the feature for the selected sentence and upon the probability associated with that value. Fourth, after scoring all of the sentences of the document, the computer system selects a subset of the highest scoring sentences to be extracted.

Type: Grant

Filed: June 28, 1995

Date of Patent: June 29, 1999

Assignee: Xerox Corporation

Inventors: Julian M. Kupiec, Jan O. Pedersen, Francine R. Chen, Daniel C. Brotsky, Steven B. Putz
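
The four steps in this abstract can be sketched roughly as below. The abstract says only that a sentence's score increases with each feature value and its associated probability; adding log-probabilities, the fallback value `1e-6` for unseen feature values, and the name `extract_sentences` are assumptions made for illustration.

```python
import math

def extract_sentences(sentences, features, probs, k=2):
    """Score each sentence from its feature values and keep the k best.

    features: name -> function(index, sentence) -> feature value
    probs:    name -> {value: probability associated with that value}
    """
    scored = []
    for i, sentence in enumerate(sentences):
        score = 0.0
        for name, fn in features.items():
            value = fn(i, sentence)
            # Assumed combination rule: sum of log-probabilities.
            score += math.log(probs[name].get(value, 1e-6))
        scored.append((score, i))
    # Highest score first; ties broken by position in the document.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    # Return the extract in original document order.
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]
```

Returning the selected sentences in document order, rather than score order, is a design choice that keeps the extract readable.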
Method of ordering document clusters given some knowledge of user interests

Patent number: 5911140

Abstract: A method of automatically ordering the presentation of document clusters generated from a ranked corpus of documents. First, the corpus is ordered into a plurality of clusters. Next, a rank is determined for each cluster based upon the rank of a document within that cluster. Afterward, the clusters are presented to a computer user in the order determined by their rank.

Type: Grant

Filed: December 14, 1995

Date of Patent: June 8, 1999

Assignee: Xerox Corporation

Inventors: John W. Tukey, Jan O. Pedersen
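
One way to read this abstract is sketched below. The abstract does not say which document's rank determines the cluster rank; using the best-ranked member is an assumption, as is the function name `order_clusters_by_rank`.

```python
def order_clusters_by_rank(clusters, doc_rank):
    """Order clusters by the rank of their best-ranked document.

    clusters: list of lists of document ids
    doc_rank: document id -> rank in the corpus (1 = best)
    """
    # Assumed rule: a cluster inherits the rank of its top document.
    return sorted(clusters, key=lambda cluster: min(doc_rank[d] for d in cluster))
```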
Method of ordering document clusters without requiring knowledge of user interests

Patent number: 5787420

Abstract: A computerized method of ordering document clusters for presentation after browsing a corpus of documents that presents document clusters in a logical fashion in the absence of any indication of the computer user's interests. The method begins by grouping the corpus into a plurality of clusters, each having a centroid and including at least one document. Next, for each cluster a degree of similarity between that cluster and every other cluster is determined by finding a dot product between each cluster centroid and every other cluster centroid. The similarity information is then used to determine an order of presentation for the plurality of clusters in a way that maximizes the degree of similarity between adjacent clusters.

Type: Grant

Filed: December 14, 1995

Date of Patent: July 28, 1998

Assignee: Xerox Corporation

Inventors: John W. Tukey, Jan O. Pedersen
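
The centroid comparison in this abstract might look like the sketch below. The dot product between centroids comes from the abstract; the greedy nearest-neighbour chaining is a simple heuristic assumed here, which does not guarantee the maximal total adjacent similarity, and the name `order_by_similarity` is hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def order_by_similarity(centroids):
    """Return cluster indices so that neighbours in the order are similar.

    Greedy heuristic: start at cluster 0, then repeatedly append the
    unplaced cluster whose centroid is most similar to the last one placed.
    """
    if not centroids:
        return []
    order = [0]
    remaining = set(range(1, len(centroids)))
    while remaining:
        last = centroids[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining), key=lambda j: dot(last, centroids[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```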
Method and apparatus for information access employing overlapping clusters

Patent number: 5787422

Abstract: The present invention is a method and apparatus for document clustering-based browsing of a corpus of documents, and more particularly to the use of overlapping clusters to improve recall. The present invention is directed to improving the performance of information access methods and apparatus through the use of non-disjoint (overlapped) clustering operations. The present invention is further described in terms of two possible methods for expanding document clusters so as to achieve the overlap, and a method for increasing precision through the use of the overlapped clusters.

Type: Grant

Filed: January 11, 1996

Date of Patent: July 28, 1998

Assignee: Xerox Corporation

Inventors: John W. Tukey, Jan O. Pedersen
Automatic method of generating feature probabilities for automatic extracting summarization

Patent number: 5778397

Abstract: A method of automatically generating feature probabilities that allow later automatic generation of document extracts. The computer system generates the probabilities by analyzing the documents one document at a time. First, the computer system designates one of the documents as a selected document. Next, the computer system analyzes each sentence of the selected document to determine the value of the paragraph feature and the value of the uppercase feature. The computer system repeats this effort for each document of the document corpus. Afterward, the number of occurrences of each value of each feature is calculated and is used to calculate feature value probabilities for all of the features.

Type: Grant

Filed: June 28, 1995

Date of Patent: July 7, 1998

Assignee: Xerox Corporation

Inventors: Julian M. Kupiec, Jan O. Pedersen, Francine R. Chen, Daniel C. Brotsky, Steven B. Putz
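
The counting procedure in this abstract could be sketched as follows. The abstract says that occurrences of each feature value are counted and turned into probabilities; estimating the probability that a sentence with a given value appears in a manual summary, the labelled-corpus layout, and the name `feature_value_probabilities` are assumptions for illustration.

```python
from collections import Counter, defaultdict

def feature_value_probabilities(corpus, features):
    """Estimate, per feature value, how often such sentences were selected.

    corpus:   list of documents, each a list of (sentence, in_summary) pairs
    features: name -> function(sentence) -> feature value
    Returns:  name -> {value: fraction of sentences with that value
                       that appear in a summary}
    """
    seen = defaultdict(Counter)      # occurrences of each feature value
    selected = defaultdict(Counter)  # occurrences among summary sentences
    for document in corpus:          # one document at a time
        for sentence, in_summary in document:
            for name, fn in features.items():
                value = fn(sentence)
                seen[name][value] += 1
                if in_summary:
                    selected[name][value] += 1
    return {
        name: {value: selected[name][value] / n for value, n in counts.items()}
        for name, counts in seen.items()
    }
```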
Finite-state transduction of related word forms for text indexing and retrieval

Patent number: 5625554

Abstract: The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both systematic aspects of a language's morphological rule system and also the word-by-word irregularities that also occur. The techniques described apply generally across the languages of the world and are not just limited to simple suffixing languages like English. Although the resulting transducers can have many states and transitions or arcs, they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications.

Type: Grant

Filed: July 20, 1992

Date of Patent: April 29, 1997

Assignee: Xerox Corporation

Inventors: Douglass R. Cutting, Per-Kristian G. Halvorsen, Ronald M. Kaplan, Lauri Karttunen, Martin Kay, Jan O. Pedersen
Hardcopy lossless data storage and communications for electronic document processing systems

Patent number: 5486686

Abstract: Machine readable electronic domain definitions of part or all of the electronic domain descriptions of hardcopy documents and/or of part or all of the transforms that are performed to produce and reproduce such hardcopy documents are encoded in codes that are printed on such documents, thereby permitting the electronic domain descriptions of such documents and/or such transforms to be recovered more robustly and reliably when the information carried by such documents is transformed from the hardcopy domain to the electronic domain.

Type: Grant

Filed: May 18, 1992

Date of Patent: January 23, 1996

Assignee: Xerox Corporation

Inventors: Frank Zdybel, Jr., Henry W. Sang, Jr., Jan O. Pedersen, Z. E. Smith, III, D. A. Henderson, Jr., David L. Hecht, Dan S. Bloomberg
Method of constant interaction-time clustering applied to document browsing

Patent number: 5483650

Abstract: Arbitrarily large document collections are processed by expanding a focus set having at least one initial metadocument into a plurality of subsequent metadocuments. The number of subsequent metadocuments is approximately equal to a predetermined maximum number. The subsequent metadocuments are then clustered into a predetermined number of new metadocuments, which are summarized and presented to a user. The focus set is redefined to include only user-selected new metadocuments.

Type: Grant

Filed: June 21, 1993

Date of Patent: January 9, 1996

Assignee: Xerox Corporation

Inventors: Jan O. Pedersen, David R. Karger, Douglass R. Cutting
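
The expansion step in this abstract might be sketched as below, assuming the metadocuments are nodes of a precomputed cluster hierarchy. The `MetaDocument` class, the rule of expanding the largest node first, and the name `expand_focus` are all hypothetical; the abstract states only that the focus set is expanded until the count is approximately a predetermined maximum.

```python
import heapq

class MetaDocument:
    """Hypothetical node in a precomputed cluster hierarchy."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def size(self):
        # A leaf stands for one document; an inner node covers its subtree.
        if not self.children:
            return 1
        return sum(child.size() for child in self.children)

def expand_focus(focus, max_count):
    """Replace metadocuments by their children until about max_count remain."""
    # Max-heap on size (negated); the counter keeps tuple comparison well defined.
    heap = [(-m.size(), i, m) for i, m in enumerate(focus)]
    heapq.heapify(heap)
    next_id = len(heap)
    leaves = []  # nodes that cannot be expanded further
    while heap and len(heap) + len(leaves) < max_count:
        _, _, node = heapq.heappop(heap)
        if not node.children:
            leaves.append(node)
            continue
        for child in node.children:
            heapq.heappush(heap, (-child.size(), next_id, child))
            next_id += 1
    return leaves + [node for _, _, node in heap]
```

Because the result is capped near `max_count` regardless of corpus size, the subsequent clustering step works on a bounded input, which is what keeps interaction time roughly constant.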
Scatter-gather: a cluster-based method and apparatus for browsing large document collections

Patent number: 5442778

Abstract: Scatter-Gather is a computer based document browsing method which operates in time proportional to a number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive utility; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. The step of an off-line preparation of an initial ordering of a corpus is non-time-dependent, thus an accurate initial ordering is prepared. The step of determining a summary includes determining a summary for presentation to a user without scrolling on a CRT. The step of providing a further ordering includes truncated group average agglomerate clustering, merging disjointed document sets, center finding, assign-to-nearest and other refinement methods.

Type: Grant

Filed: November 12, 1991

Date of Patent: August 15, 1995

Assignee: Xerox Corporation

Inventors: Jan O. Pedersen, David Karger, Douglass R. Cutting, John W. Tukey
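
Of the refinement methods named in this abstract, assign-to-nearest is the simplest to illustrate. The sketch below assumes documents and cluster centers are term-weight vectors compared by dot product; that representation and the name `assign_to_nearest` are assumptions, not details given in the abstract.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def assign_to_nearest(docs, centers):
    """Put each document index into the cluster of its most similar center."""
    clusters = [[] for _ in centers]
    for index, doc in enumerate(docs):
        # max() returns the first best center, so ties go to the lowest index.
        best = max(range(len(centers)), key=lambda c: dot(doc, centers[c]))
        clusters[best].append(index)
    return clusters
```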
Iterative technique for phrase query formation and an information retrieval system employing same

Patent number: 5278980

Abstract: An information retrieval system and method are provided in which an operator inputs one or more query words which are used to determine a search key for searching through a corpus of documents, and which returns any matches between the search key and the corpus of documents as a phrase containing the word data matching the query word(s), a non-stop (content) word next adjacent to the matching word data, and all intervening stop-words between the matching word data and the next adjacent non-stop word. The operator, after reviewing one or more of the returned phrases can then use one or more of the next adjacent non-stop-words as new query words to reformulate the search key and perform a subsequent search through the document corpus. This process can be conducted iteratively, until the appropriate documents of interest are located. The additional non-stop-words from each phrase are preferably aligned with each other (e.g., by columnation) to ease viewing of the "new" content words.

Type: Grant

Filed: August 16, 1991

Date of Patent: January 11, 1994

Assignee: Xerox Corporation

Inventors: Jan O. Pedersen, Per-Kristian Halvorsen, Douglass R. Cutting, John W. Tukey, Eric A. Bier, Daniel G. Bobrow
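
The phrase returned for each match, as this abstract describes it, can be sketched as follows. The stop-word list, whitespace tokenization without punctuation handling, and the name `phrases_for` are assumptions made to keep the example short.

```python
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # illustrative list

def phrases_for(query, text):
    """Return (matched phrase, next content word) pairs for one query word.

    Each phrase holds the matching word plus any stop-words that sit
    between it and the next non-stop word; that next word is a candidate
    for the reformulated query.
    """
    tokens = text.lower().split()
    phrases = []
    for i, token in enumerate(tokens):
        if token != query:
            continue
        j = i + 1
        while j < len(tokens) and tokens[j] in STOP_WORDS:
            j += 1  # skip intervening stop-words
        if j < len(tokens):
            phrases.append((" ".join(tokens[i:j]), tokens[j]))
    return phrases
```

Keeping the next content word as a separate tuple element makes it easy to align those words in a column for display, as the abstract suggests.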