Patents by Inventor James R. Stinger
James R. Stinger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7836059
Abstract: A system or method for minimally predictive feature identification is disclosed. For information management, an information collection including a set of features is received. A set of prediction values indicating a degree to which a first feature within the set of features predicts other features in the set is generated. The first feature as a minimally predictive feature is identified if each of the prediction values is within a predetermined range of threshold values.
Type: Grant
Filed: October 26, 2004
Date of Patent: November 16, 2010
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: George H. Forman, Henri Jacques Suermondt, James R. Stinger
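The abstract above does not say how the prediction values are computed. As a minimal sketch only, the following assumes a hypothetical measure, the difference between P(other | first) and P(other | not first), and treats each document as a set of features. The function names, the measure, and the default thresholds are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List, Set


def prediction_values(docs: List[Set[str]], first: str) -> Dict[str, float]:
    """Return one prediction value per other feature.

    Assumed measure (not from the patent): P(other | first) - P(other | not first).
    """
    with_first = [d for d in docs if first in d]
    without_first = [d for d in docs if first not in d]
    others = sorted(set().union(*docs) - {first})
    values = {}
    for other in others:
        p_with = sum(other in d for d in with_first) / len(with_first) if with_first else 0.0
        p_without = sum(other in d for d in without_first) / len(without_first) if without_first else 0.0
        values[other] = p_with - p_without
    return values


def is_minimally_predictive(docs: List[Set[str]], first: str,
                            low: float = -0.2, high: float = 0.2) -> bool:
    """True if every prediction value falls inside the [low, high] threshold range."""
    return all(low <= v <= high for v in prediction_values(docs, first).values())


# Hypothetical collection: "n" co-occurs equally often with "a" and "b".
docs = [{"n", "a"}, {"n", "b"}, {"a"}, {"b"}]
print(is_minimally_predictive(docs, "n"))  # "n" tells us nothing about "a" or "b"
print(is_minimally_predictive(docs, "a"))  # "a" strongly predicts the absence of "b"
```

Under these assumptions, a feature that appears independently of all others would be flagged, which is one plausible reading of "minimally predictive".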
-
Patent number: 7325005
Abstract: A system and method for category discovery is disclosed. The method discloses: receiving an information collection including a set of strings; identifying positively predictive pairs of strings; identifying negatively predictive pairs of strings; joining positively predictive pairs of strings into a category; and splitting negatively predictive pairs of strings into different categories. The system discloses various elements, means and instructions for performing the method.
Type: Grant
Filed: July 30, 2004
Date of Patent: January 29, 2008
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: George H. Forman, Henri Jacques Suermondt, James R. Stinger
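The join and split steps listed in this abstract can be illustrated with a small union-find sketch. It assumes the positively and negatively predictive pairs have already been identified (the abstract does not say how), and it uses one possible policy that is an assumption here: a join is skipped when it would place a negatively predictive pair in the same category. The function name and sample strings are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def discover_categories(strings: Iterable[str],
                        positive_pairs: Iterable[Pair],
                        negative_pairs: Iterable[Pair]) -> List[List[str]]:
    """Join positive pairs into categories while keeping negative pairs apart."""
    strings = list(strings)
    negative_pairs = list(negative_pairs)
    parent: Dict[str, str] = {s: s for s in strings}

    def find(s: str) -> str:
        # Follow parent links to the category root, shortening the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in positive_pairs:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue
        # Assumed policy: refuse a join that would merge a negatively predictive pair.
        conflict = any({find(x), find(y)} == {root_a, root_b} for x, y in negative_pairs)
        if not conflict:
            parent[root_b] = root_a

    groups: Dict[str, List[str]] = {}
    for s in strings:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


categories = discover_categories(
    ["laptop", "notebook", "printer", "toner", "ink"],
    positive_pairs=[("laptop", "notebook"), ("printer", "toner"),
                    ("toner", "ink"), ("notebook", "printer")],
    negative_pairs=[("laptop", "ink")],
)
print(categories)
```

In this example the last positive pair is not joined, because doing so would put "laptop" and "ink" in one category.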
-
Patent number: 7325006
Abstract: A system and method for category organization is disclosed. The method discloses: receiving an information collection including a set of strings; identifying positively predictive pairs of strings; identifying negatively predictive pairs of strings; joining positively predictive pairs of strings into common categories; splitting negatively predictive pairs of strings into different categories; and organizing the categories using the negatively predictive pairs of strings. The system discloses various elements, means and instructions for performing the method.
Type: Grant
Filed: July 30, 2004
Date of Patent: January 29, 2008
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: George H. Forman, Henri Jacques Suermondt, James R. Stinger
-
Patent number: 7003725
Abstract: A method and system of normalizing dirty text in a document. The present invention creates a thesaurus that evolves over time as new document collections are analyzed. This thesaurus, which is used by an editor, contains standard terms and phrases, and their corresponding variations of these standard terms and phrases. Documents are run through this editor and misspelled words or phrases, joined words, and ad hoc abbreviations are replaced with standard terms from the thesaurus. The present invention also enables normalization of documents in cases where a list of standard terms must be inferred from the corpus of the document. The normalizer will facilitate data mining applications which can not function properly with dirty text, resulting in more accurate analysis of documents. Over time, as the thesaurus evolves, collecting more words and phrases, the process of generating the thesaurus will become more automated.
Type: Grant
Filed: July 13, 2001
Date of Patent: February 21, 2006
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Maria Castellanos, James R. Stinger
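The replacement step described in this abstract, where variations are swapped for standard terms from a thesaurus, might look roughly like the sketch below. The thesaurus contents, the function names, and the simple word-by-word matching are assumptions made for illustration; the abstract does not describe how the thesaurus is built or matched.

```python
import re
from typing import Dict, List

# Hypothetical thesaurus: standard term -> known variations
# (misspellings, joined words, ad hoc abbreviations).
THESAURUS: Dict[str, List[str]] = {
    "printer": ["printr", "pritner", "prntr"],
    "hard disk": ["harddisk", "hd"],
}


def build_lookup(thesaurus: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the thesaurus so each variation maps to its standard term."""
    lookup = {}
    for standard, variations in thesaurus.items():
        for variation in variations:
            lookup[variation.lower()] = standard
    return lookup


def normalize(text: str, lookup: Dict[str, str]) -> str:
    """Replace every word that is a known variation with its standard term."""
    def replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        return lookup.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)


lookup = build_lookup(THESAURUS)
print(normalize("The prntr and harddisk failed", lookup))
```

A sketch like this handles only single-word variations; multi-word variations and inferring standard terms from the corpus, both mentioned in the abstract, are outside its scope.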
-
Patent number: 6978275
Abstract: A method and system for mining a document containing dirty text. Dirty text is removed or replaced and the document is processed using a variety of text mining techniques. In one embodiment, dirty text removal and replacement occurs in two stages. In the first stage, a general cleaning occurs on all documents without regard to what domain they belong to or the mining task to be performed. In the second stage, document cleaning is more specific to the anomalies of the domain and the mining task to be performed. In the third stage, the document is processed using a variety of data mining techniques according to the mining task. In one embodiment, the present invention scores and ranks sentences in a document according to their relevance, extracts the highest ranked sentences, and presents a summary. The present invention allows users to leverage existing domain knowledge and can be customized according the domain and task requirements.
Type: Grant
Filed: August 31, 2001
Date of Patent: December 20, 2005
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Maria Castellanos, James R. Stinger
-
Patent number: 6912555
Abstract: Embodiments of the present invention are directed to a method for content mining of semi-structured documents. In one embodiment, a semi-structured document is first converted from a document-type specific format such as HTML or PDF, to a document-type independent format such as XML. The document formatting, which contains basic level information about the document's structure, is then analyzed by a series of modules to develop a higher level understanding of the document's structure. These modules append information to the document describing the features which collectively comprise the higher level document structure. The appended information facilitates finding specified information within the document when content mining is performed.
Type: Grant
Filed: January 18, 2002
Date of Patent: June 28, 2005
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Michael J. Lemon, Maria Castellanos, James R. Stinger
-
Patent number: 6757870
Abstract: A method for automatically detecting table data in a document that is described by a page definition language and converting the table data into a markup language representation. The document may have one or more pages. The page definition language description of the document provides a list of words, the position of the each on a page with respect to a predetermined reference point, and the size of each word. The present invention automatically identifies table data in the document by utilizing one or more table-identifying features. A first table-identifying feature may be the number of word clusters on a line. A second table-identifying feature may be the vertical alignment of word clusters between lines. A third table-identifying feature may be the changes in text density or space density between lines.
Type: Grant
Filed: March 22, 2000
Date of Patent: June 29, 2004
Assignee: Hewlett-Packard Development Company, L.P.
Inventor: James R. Stinger
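The first two table-identifying features named in this abstract, the number of word clusters on a line and their vertical alignment between lines, can be sketched as follows. Each word is assumed to be an (x position, width) tuple on its line; the gap threshold, alignment tolerance, and function names are illustrative assumptions, and the third feature (density changes) is not covered.

```python
from typing import List, Tuple

Word = Tuple[float, float]      # (x position, width); an assumed representation
Cluster = Tuple[float, float]   # (start x, end x)


def word_clusters(words: List[Word], max_gap: float = 10.0) -> List[Cluster]:
    """Merge words on one line into clusters separated by gaps wider than max_gap."""
    clusters: List[Cluster] = []
    for x, width in sorted(words):
        if clusters and x - clusters[-1][1] <= max_gap:
            start, end = clusters[-1]
            clusters[-1] = (start, max(end, x + width))
        else:
            clusters.append((x, x + width))
    return clusters


def aligned(a: List[Cluster], b: List[Cluster], tolerance: float = 3.0) -> bool:
    """True if both lines have the same number of clusters starting at similar x positions."""
    return len(a) == len(b) and all(abs(ca[0] - cb[0]) <= tolerance for ca, cb in zip(a, b))


def table_lines(lines: List[List[Word]], min_clusters: int = 2) -> List[bool]:
    """Flag lines with several clusters that align vertically with a neighboring line."""
    clustered = [word_clusters(line) for line in lines]
    flags = []
    for i, clusters in enumerate(clustered):
        neighbors = [clustered[j] for j in (i - 1, i + 1) if 0 <= j < len(clustered)]
        flags.append(len(clusters) >= min_clusters
                     and any(aligned(clusters, n) for n in neighbors))
    return flags


page = [
    [(0, 30), (35, 40), (80, 20)],     # running text: one cluster
    [(0, 40), (100, 30), (200, 25)],   # three columns
    [(1, 20), (101, 35), (199, 30)],   # three columns, aligned with the line above
    [(0, 50), (55, 45)],               # running text: one cluster
]
print(table_lines(page))
```

Requiring agreement with a neighboring line is one simple way to combine the two features; other combinations are equally consistent with the abstract.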
-
Publication number: 20040093355
Abstract: A method for automatically detecting table data in a document that is described by a page definition language and converting the table data into a markup language representation. The document may have one or more pages. The page definition language description of the document provides a list of words, the position of the each on a page with respect to a predetermined reference point, and the size of each word. The present invention automatically identifies table data in the document by utilizing one or more table-identifying features. A first table-identifying feature may be the number of word clusters on a line. A second table-identifying feature may be the vertical alignment of word clusters between lines. A third table-identifying feature may be the changes in text density or space density between lines.
Type: Application
Filed: October 24, 2003
Publication date: May 13, 2004
Inventor: James R. Stinger
-
Publication number: 20030140311
Abstract: Embodiments of the present invention are directed to a method for content mining of semi-structured documents. In one embodiment, a semi-structured document is first converted from a document-type specific format such as HTML or PDF, to a document-type independent format such as XML. The document formatting, which contains basic level information about the document's structure, is then analyzed by a series of modules to develop a higher level understanding of the document's structure. These modules append information to the document describing the features which collectively comprise the higher level document structure. The appended information facilitates finding specified information within the document when content mining is performed.
Type: Application
Filed: January 18, 2002
Publication date: July 24, 2003
Inventors: Michael J. Lemon, Maria Castellanos, James R. Stinger
-
Publication number: 20030046263
Abstract: A method and system for mining a document containing dirty text. Dirty text is removed or replaced and the document is processed using a variety of text mining techniques. In one embodiment, dirty text removal and replacement occurs in two stages. In the first stage, a general cleaning occurs on all documents without regard to what domain they belong to or the mining task to be performed. In the second stage, document cleaning is more specific to the anomalies of the domain and the mining task to be performed. In the third stage, the document is processed using a variety of data mining techniques according to the mining task. In one embodiment, the present invention scores and ranks sentences in a document according to their relevance, extracts the highest ranked sentences, and presents a summary. The present invention allows users to leverage existing domain knowledge and can be customized according the domain and task requirements.
Type: Application
Filed: August 31, 2001
Publication date: March 6, 2003
Inventors: Maria Castellanos, James R. Stinger
-
Publication number: 20030014448
Abstract: A method and system of normalizing dirty text in a document. The present invention creates a thesaurus that evolves over time as new document collections are analyzed. This thesaurus, which is used by an editor, contains standard terms and phrases, and their corresponding variations of these standard terms and phrases. Documents are run through this editor and misspelled words or phrases, joined words, and ad hoc abbreviations are replaced with standard terms from the thesaurus. The present invention also enables normalization of documents in cases where a list of standard terms must be inferred from the corpus of the document. The normalizer will facilitate data mining applications which can not function properly with dirty text, resulting in more accurate analysis of documents. Over time, as the thesaurus evolves, collecting more words and phrases, the process of generating the thesaurus will become more automated.
Type: Application
Filed: July 13, 2001
Publication date: January 16, 2003
Inventors: Maria Castellanos, James R. Stinger