Patents by Inventor James R. Stinger

James R. Stinger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7836059
    Abstract: A system or method for minimally predictive feature identification is disclosed. For information management, an information collection including a set of features is received. A set of prediction values indicating a degree to which a first feature within the set of features predicts other features in the set is generated. The first feature as a minimally predictive feature is identified if each of the prediction values is within a predetermined range of threshold values.
    Type: Grant
    Filed: October 26, 2004
    Date of Patent: November 16, 2010
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: George H. Forman, Henri Jacques Suermondt, James R. Stinger
  • Patent number: 7325005
    Abstract: A system and method for category discovery is disclosed. The method discloses: receiving an information collection including a set of strings; identifying positively predictive pairs of strings; identifying negatively predictive pairs of strings; joining positively predictive pairs of strings into a category; and splitting negatively predictive pairs of strings into different categories. The system discloses various elements, means and instructions for performing the method.
    Type: Grant
    Filed: July 30, 2004
    Date of Patent: January 29, 2008
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: George H. Forman, Henri Jacques Suermondt, James R. Stinger
  • Patent number: 7325006
    Abstract: A system and method for category organization is disclosed. The method discloses: receiving an information collection including a set of strings; identifying positively predictive pairs of strings; identifying negatively predictive pairs of strings; joining positively predictive pairs of strings into common categories; splitting negatively predictive pairs of strings into different categories; and organizing the categories using the negatively predictive pairs of strings. The system discloses various elements, means and instructions for performing the method.
    Type: Grant
    Filed: July 30, 2004
    Date of Patent: January 29, 2008
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: George H. Forman, Henri Jacques Suermondt, James R. Stinger
  • Patent number: 7003725
    Abstract: A method and system of normalizing dirty text in a document. The present invention creates a thesaurus that evolves over time as new document collections are analyzed. This thesaurus, which is used by an editor, contains standard terms and phrases, and their corresponding variations of these standard terms and phrases. Documents are run through this editor and misspelled words or phrases, joined words, and ad hoc abbreviations are replaced with standard terms from the thesaurus. The present invention also enables normalization of documents in cases where a list of standard terms must be inferred from the corpus of the document. The normalizer will facilitate data mining applications which cannot function properly with dirty text, resulting in more accurate analysis of documents. Over time, as the thesaurus evolves, collecting more words and phrases, the process of generating the thesaurus will become more automated.
    Type: Grant
    Filed: July 13, 2001
    Date of Patent: February 21, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Maria Castellanos, James R. Stinger
  • Patent number: 6978275
    Abstract: A method and system for mining a document containing dirty text. Dirty text is removed or replaced and the document is processed using a variety of text mining techniques. In one embodiment, dirty text removal and replacement occurs in two stages. In the first stage, a general cleaning occurs on all documents without regard to what domain they belong to or the mining task to be performed. In the second stage, document cleaning is more specific to the anomalies of the domain and the mining task to be performed. In the third stage, the document is processed using a variety of data mining techniques according to the mining task. In one embodiment, the present invention scores and ranks sentences in a document according to their relevance, extracts the highest ranked sentences, and presents a summary. The present invention allows users to leverage existing domain knowledge and can be customized according to the domain and task requirements.
    Type: Grant
    Filed: August 31, 2001
    Date of Patent: December 20, 2005
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Maria Castellanos, James R. Stinger
  • Patent number: 6912555
    Abstract: Embodiments of the present invention are directed to a method for content mining of semi-structured documents. In one embodiment, a semi-structured document is first converted from a document-type specific format such as HTML or PDF, to a document-type independent format such as XML. The document formatting, which contains basic level information about the document's structure, is then analyzed by a series of modules to develop a higher level understanding of the document's structure. These modules append information to the document describing the features which collectively comprise the higher level document structure. The appended information facilitates finding specified information within the document when content mining is performed.
    Type: Grant
    Filed: January 18, 2002
    Date of Patent: June 28, 2005
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Michael J. Lemon, Maria Castellanos, James R. Stinger
  • Patent number: 6757870
    Abstract: A method for automatically detecting table data in a document that is described by a page definition language and converting the table data into a markup language representation. The document may have one or more pages. The page definition language description of the document provides a list of words, the position of each word on a page with respect to a predetermined reference point, and the size of each word. The present invention automatically identifies table data in the document by utilizing one or more table-identifying features. A first table-identifying feature may be the number of word clusters on a line. A second table-identifying feature may be the vertical alignment of word clusters between lines. A third table-identifying feature may be the changes in text density or space density between lines.
    Type: Grant
    Filed: March 22, 2000
    Date of Patent: June 29, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: James R. Stinger
  • Publication number: 20040093355
    Abstract: A method for automatically detecting table data in a document that is described by a page definition language and converting the table data into a markup language representation. The document may have one or more pages. The page definition language description of the document provides a list of words, the position of each word on a page with respect to a predetermined reference point, and the size of each word. The present invention automatically identifies table data in the document by utilizing one or more table-identifying features. A first table-identifying feature may be the number of word clusters on a line. A second table-identifying feature may be the vertical alignment of word clusters between lines. A third table-identifying feature may be the changes in text density or space density between lines.
    Type: Application
    Filed: October 24, 2003
    Publication date: May 13, 2004
    Inventor: James R. Stinger
  • Publication number: 20030140311
    Abstract: Embodiments of the present invention are directed to a method for content mining of semi-structured documents. In one embodiment, a semi-structured document is first converted from a document-type specific format such as HTML or PDF, to a document-type independent format such as XML. The document formatting, which contains basic level information about the document's structure, is then analyzed by a series of modules to develop a higher level understanding of the document's structure. These modules append information to the document describing the features which collectively comprise the higher level document structure. The appended information facilitates finding specified information within the document when content mining is performed.
    Type: Application
    Filed: January 18, 2002
    Publication date: July 24, 2003
    Inventors: Michael J. Lemon, Maria Castellanos, James R. Stinger
  • Publication number: 20030046263
    Abstract: A method and system for mining a document containing dirty text. Dirty text is removed or replaced and the document is processed using a variety of text mining techniques. In one embodiment, dirty text removal and replacement occurs in two stages. In the first stage, a general cleaning occurs on all documents without regard to what domain they belong to or the mining task to be performed. In the second stage, document cleaning is more specific to the anomalies of the domain and the mining task to be performed. In the third stage, the document is processed using a variety of data mining techniques according to the mining task. In one embodiment, the present invention scores and ranks sentences in a document according to their relevance, extracts the highest ranked sentences, and presents a summary. The present invention allows users to leverage existing domain knowledge and can be customized according to the domain and task requirements.
    Type: Application
    Filed: August 31, 2001
    Publication date: March 6, 2003
    Inventors: Maria Castellanos, James R. Stinger
  • Publication number: 20030014448
    Abstract: A method and system of normalizing dirty text in a document. The present invention creates a thesaurus that evolves over time as new document collections are analyzed. This thesaurus, which is used by an editor, contains standard terms and phrases, and their corresponding variations of these standard terms and phrases. Documents are run through this editor and misspelled words or phrases, joined words, and ad hoc abbreviations are replaced with standard terms from the thesaurus. The present invention also enables normalization of documents in cases where a list of standard terms must be inferred from the corpus of the document. The normalizer will facilitate data mining applications which cannot function properly with dirty text, resulting in more accurate analysis of documents. Over time, as the thesaurus evolves, collecting more words and phrases, the process of generating the thesaurus will become more automated.
    Type: Application
    Filed: July 13, 2001
    Publication date: January 16, 2003
    Inventors: Maria Castellanos, James R. Stinger
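
As an illustration only, the thesaurus-based replacement described in the dirty-text normalization abstract above might be sketched as follows. This is a minimal sketch, not the patented method: the thesaurus entries, the function names `build_lookup` and `normalize`, and the single-word matching rule are all assumptions made for the example. It only covers replacing known single-word variations with standard terms, and it does not attempt to infer standard terms from a corpus.

```python
import re

# Hypothetical thesaurus: each standard term maps to its known variations
# (misspellings, joined words, ad hoc abbreviations). Entries are invented.
THESAURUS = {
    "printer": ["prnter", "printr", "prtr"],
    "hard disk": ["harddisk", "hd"],
}


def build_lookup(thesaurus):
    """Invert the thesaurus so each variation points at its standard term."""
    lookup = {}
    for standard, variations in thesaurus.items():
        for variation in variations:
            lookup[variation.lower()] = standard
    return lookup


def normalize(text, thesaurus=THESAURUS):
    """Replace every known variation in `text` with its standard term.

    Words that are not listed as a variation are left unchanged.
    """
    lookup = build_lookup(thesaurus)

    def replace(match):
        word = match.group(0)
        return lookup.get(word.lower(), word)

    # Match runs of ASCII letters; a simplifying assumption for this sketch.
    return re.sub(r"[A-Za-z]+", replace, text)


if __name__ == "__main__":
    print(normalize("The prnter and harddisk failed"))
```

In this sketch, growing the thesaurus over time would simply mean adding entries to the dictionary as new document collections are reviewed.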