Patents by Inventor John W. Tukey

John W. Tukey has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 5999927
    Abstract: The present invention is a method and apparatus for document clustering-based browsing of a corpus of documents, and more particularly to the use of overlapping clusters to improve recall. The present invention is directed to improving the performance of information access methods and apparatus through the use of non-disjoint (overlapped) clustering operations. The present invention is further described in terms of two possible methods for expanding document clusters so as to achieve the overlap, and a method for increasing precision through the use of the overlapped clusters.
    Type: Grant
    Filed: April 24, 1998
    Date of Patent: December 7, 1999
    Assignee: Xerox Corporation
    Inventors: John W. Tukey, Jan O. Pedersen
  • Patent number: 5911140
    Abstract: A method of automatically ordering the presentation of document clusters generated from a ranked corpus of documents. First, the corpus is ordered into a plurality of clusters. Next, a rank is determined for each cluster based upon the rank of a document within that cluster. Afterward, the clusters are presented to a computer user in the order determined by their rank.
    Type: Grant
    Filed: December 14, 1995
    Date of Patent: June 8, 1999
    Assignee: Xerox Corporation
    Inventors: John W. Tukey, Jan O. Pedersen
  • Patent number: 5850476
    Abstract: A method of automatically identifying drop words in a document image without performing character recognition to generate an ASCII representation of the document text. First, the document image is analyzed to identify word equivalence classes, each of which represents at least one word of the multiplicity of words included in the document. Second, for each word equivalence class, the likelihood that it is not a drop word is determined. Third, document length is analyzed to determine whether the document is short. For a short document, the number of word equivalence classes identified as drop words based upon their likelihood is proportional to document length. For long documents, a fixed number of word equivalence classes are identified as drop words based upon the likelihood that they are not drop words.
    Type: Grant
    Filed: December 14, 1995
    Date of Patent: December 15, 1998
    Assignee: Xerox Corporation
    Inventors: Francine R. Chen, John W. Tukey
  • Patent number: 5848191
    Abstract: A method of automatically generating a thematic summary from a document image without performing character recognition to generate an ASCII representation of the document text. The method begins with decomposition of the document image into text blocks and text lines. Using the median x-height of text blocks, the main body of text is identified. Afterward, word image equivalence classes and sentence boundaries within the blocks of the main body of text are determined. The word image equivalence classes are used to identify thematic words. These, in turn, are used to score the sentences within the main body of text, and the highest scoring sentences are selected for extraction.
    Type: Grant
    Filed: December 14, 1995
    Date of Patent: December 8, 1998
    Assignee: Xerox Corporation
    Inventors: Francine R. Chen, Dan S. Bloomberg, John W. Tukey
  • Patent number: 5787420
    Abstract: A computerized method of ordering document clusters for presentation after browsing a corpus of documents that presents document clusters in a logical fashion in the absence of any indication of the computer user's interests. The method begins by grouping the corpus into a plurality of clusters, each having a centroid and including at least one document. Next, for each cluster a degree of similarity between that cluster and every other cluster is determined by finding a dot product between each cluster centroid and every other cluster centroid. The similarity information is then used to determine an order of presentation for the plurality of clusters in a way that maximizes the degree of similarity between adjacent clusters.
    Type: Grant
    Filed: December 14, 1995
    Date of Patent: July 28, 1998
    Assignee: Xerox Corporation
    Inventors: John W. Tukey, Jan O. Pedersen
  • Patent number: 5787422
    Abstract: The present invention is a method and apparatus for document clustering-based browsing of a corpus of documents, and more particularly to the use of overlapping clusters to improve recall. The present invention is directed to improving the performance of information access methods and apparatus through the use of non-disjoint (overlapped) clustering operations. The present invention is further described in terms of two possible methods for expanding document clusters so as to achieve the overlap, and a method for increasing precision through the use of the overlapped clusters.
    Type: Grant
    Filed: January 11, 1996
    Date of Patent: July 28, 1998
    Assignee: Xerox Corporation
    Inventors: John W. Tukey, Jan O. Pedersen
  • Patent number: 5455871
    Abstract: A method and apparatus detects function words in a first image of a scanned document without first converting the image to character codes. Function words include determiners, prepositions, articles, and other words that play a largely grammatical role, as opposed to words such as nouns and verbs that convey topic information. Non-content based morphological characteristics of image units are predetermined as well as the presence or omission of character ascenders and descenders in image units. Predetermined characteristics of function word image units are compared with the image units of an image and when a match occurs, the image unit is identified as a function word. Conversely when no matching characteristics occur, the image unit is identified as a non-function word. Additionally, image units are classified and identified as containing only upper case characters, only lower case characters, only digits, and mixed character types.
    Type: Grant
    Filed: May 16, 1994
    Date of Patent: October 3, 1995
    Assignee: Xerox Corporation
    Inventors: Dan S. Bloomberg, John W. Tukey, M. Margaret Withgott
  • Patent number: 5442778
    Abstract: Scatter-Gather is a computer based document browsing method which operates in time proportional to a number of documents in a target corpus. The Scatter-Gather method includes: preparing an initial ordering of the corpus using, for example, an off-line computational method; determining a summary of the initial ordering of the corpus for interactive utility; and providing a further ordering of the corpus using, for example, an on-line non-deterministic method. The step of an off-line preparation of an initial ordering of a corpus is non-time-dependent, thus an accurate initial ordering is prepared. The step of determining a summary includes determining a summary for presentation to a user without scrolling on a CRT. The step of providing a further ordering includes truncated group average agglomerate clustering, merging disjointed document sets, center finding, assign-to-nearest and other refinement methods.
    Type: Grant
    Filed: November 12, 1991
    Date of Patent: August 15, 1995
    Assignee: Xerox Corporation
    Inventors: Jan O. Pedersen, David Karger, Douglass R. Cutting, John W. Tukey
  • Patent number: 5278980
    Abstract: An information retrieval system and method are provided in which an operator inputs one or more query words which are used to determine a search key for searching through a corpus of documents, and which returns any matches between the search key and the corpus of documents as a phrase containing the word data matching the query word(s), a non-stop (content) word next adjacent to the matching word data, and all intervening stop-words between the matching word data and the next adjacent non-stop word. The operator, after reviewing one or more of the returned phrases can then use one or more of the next adjacent non-stop-words as new query words to reformulate the search key and perform a subsequent search through the document corpus. This process can be conducted iteratively, until the appropriate documents of interest are located. The additional non-stop-words from each phrase are preferably aligned with each other (e.g., by columnation) to ease viewing of the "new" content words.
    Type: Grant
    Filed: August 16, 1991
    Date of Patent: January 11, 1994
    Assignee: Xerox Corporation
    Inventors: Jan O. Pedersen, Per-Kristian Halvorsen, Douglass R. Cutting, John W. Tukey, Eric A. Bier, Daniel G. Bobrow
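
Illustrative sketch: the two cluster-ordering abstracts above (patents 5911140 and 5787420) can be made concrete with a short Python example. This is not the patented implementation. The function names, the choice of a cluster's best-ranked document as its rank, and the greedy nearest-neighbor walk over centroids are all assumptions made for illustration; in particular, the greedy walk is a simple heuristic and is not guaranteed to maximize similarity between adjacent clusters.

```python
def rank_clusters(clusters, doc_rank):
    """Order clusters by the rank of a document within each cluster.

    Assumption: a cluster's rank is the best (lowest) rank among its
    documents. The abstract only says the rank is "based upon the rank
    of a document within that cluster".
    """
    return sorted(clusters, key=lambda c: min(doc_rank[d] for d in c))


def dot(u, v):
    """Dot product of two equal-length centroid vectors."""
    return sum(a * b for a, b in zip(u, v))


def order_by_similarity(centroids):
    """Return cluster indices so that neighbors tend to be similar.

    Assumption: a greedy heuristic that starts at cluster 0 and keeps
    appending the unvisited cluster whose centroid has the largest dot
    product with the last one chosen.
    """
    if not centroids:
        return []
    order = [0]
    remaining = set(range(1, len(centroids)))
    while remaining:
        last = centroids[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining), key=lambda i: dot(last, centroids[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    # Hypothetical toy data: document ids with ranks, and 2-D centroids.
    clusters = [["d3", "d4"], ["d1", "d5"], ["d2"]]
    doc_rank = {"d1": 1, "d2": 2, "d3": 3, "d4": 4, "d5": 5}
    print(rank_clusters(clusters, doc_rank))

    centroids = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9)]
    print(order_by_similarity(centroids))
```

In the toy data, the cluster containing the top-ranked document is presented first, and the centroid walk places the two clusters near (1, 0) next to each other before moving on to the two near (0, 1).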