Patents by Inventor James R. Stinger

James R. Stinger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for minimally predictive feature identification

Patent number: 7836059

Abstract: A system or method for minimally predictive feature identification is disclosed. For information management, an information collection including a set of features is received. A set of prediction values indicating a degree to which a first feature within the set of features predicts other features in the set is generated. The first feature as a minimally predictive feature is identified if each of the prediction values is within a predetermined range of threshold values.

Type: Grant

Filed: October 26, 2004

Date of Patent: November 16, 2010

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: George H. Forman, Henri Jacques Suermondt, James R. Stinger
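
The abstract for patent 7836059 can be illustrated with a minimal sketch. The abstract does not say how a prediction value is computed, so the measure below (the conditional rate of another feature minus its base rate) is an assumption, as are the names `prediction_values` and `is_minimally_predictive` and the default threshold range.

```python
from typing import Dict, List, Set


def prediction_values(docs: List[Set[str]], feature: str) -> Dict[str, float]:
    """Assumed measure: P(other | feature) - P(other) for every other feature."""
    n = len(docs)
    with_feature = [d for d in docs if feature in d]
    vocabulary = set().union(*docs) - {feature}
    values = {}
    for other in vocabulary:
        base_rate = sum(other in d for d in docs) / n
        if with_feature:
            cond_rate = sum(other in d for d in with_feature) / len(with_feature)
        else:
            cond_rate = base_rate  # feature never occurs, so it predicts nothing
        values[other] = cond_rate - base_rate
    return values


def is_minimally_predictive(docs: List[Set[str]], feature: str,
                            low: float = -0.2, high: float = 0.2) -> bool:
    """True if every prediction value lies inside the (hypothetical) range."""
    return all(low <= v <= high for v in prediction_values(docs, feature).values())


docs = [
    {"the", "printer", "jam"},
    {"the", "printer", "toner"},
    {"the", "laptop", "battery"},
    {"the", "laptop", "screen"},
]
print(is_minimally_predictive(docs, "the"))      # "the" appears everywhere
print(is_minimally_predictive(docs, "printer"))  # "printer" shifts other rates
```

In this toy collection a word that occurs in every document leaves all other rates unchanged, so it falls inside the range, while a topical word does not.
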
System and method for category discovery

Patent number: 7325005

Abstract: A system and method for category discovery is disclosed. The method discloses: receiving an information collection including a set of strings; identifying positively predictive pairs of strings; identifying negatively predictive pairs of strings; joining positively predictive pairs of strings into a category; and splitting negatively predictive pairs of strings into different categories. The system discloses various elements, means and instructions for performing the method.

Type: Grant

Filed: July 30, 2004

Date of Patent: January 29, 2008

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: George H. Forman, Henri Jacques Suermondt, James R. Stinger
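
The join and split steps named in the abstract for patent 7325005 can be sketched as follows. The abstract does not describe how predictive pairs are identified or how conflicts are resolved, so this sketch takes the pairs as given inputs and assumes one simple policy: a merge is skipped if it would place a negatively predictive pair in the same category. The function name `discover_categories` is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def discover_categories(strings: Iterable[str],
                        positive_pairs: Iterable[Pair],
                        negative_pairs: Iterable[Pair]) -> List[Set[str]]:
    # Every string starts in its own category; members share one set object.
    category: Dict[str, Set[str]] = {s: {s} for s in strings}
    blocked = {frozenset(p) for p in negative_pairs}

    for a, b in positive_pairs:
        cat_a, cat_b = category[a], category[b]
        if cat_a is cat_b:
            continue  # already joined
        # Assumed policy: keep negatively predictive pairs apart.
        if any(frozenset((x, y)) in blocked for x in cat_a for y in cat_b):
            continue
        merged = cat_a | cat_b
        for s in merged:
            category[s] = merged

    unique = {id(c): c for c in category.values()}
    return sorted(unique.values(), key=lambda c: sorted(c))


categories = discover_categories(
    ["laptop", "notebook", "printer", "toner", "ink"],
    positive_pairs=[("laptop", "notebook"), ("printer", "toner"),
                    ("toner", "ink"), ("notebook", "printer")],
    negative_pairs=[("laptop", "printer")],
)
print(categories)
```

Here the last positive pair would merge the two groups, but the negative pair ("laptop", "printer") keeps them in different categories.
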
System and method for category organization

Patent number: 7325006

Abstract: A system and method for category organization is disclosed. The method discloses: receiving an information collection including a set of strings; identifying positively predictive pairs of strings; identifying negatively predictive pairs of strings; joining positively predictive pairs of strings into common categories; splitting negatively predictive pairs of strings into different categories; and organizing the categories using the negatively predictive pairs of strings. The system discloses various elements, means and instructions for performing the method.

Type: Grant

Filed: July 30, 2004

Date of Patent: January 29, 2008

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: George H. Forman, Henri Jacques Suermondt, James R. Stinger

Method and system for normalizing dirty text in a document

Patent number: 7003725

Abstract: A method and system of normalizing dirty text in a document. The present invention creates a thesaurus that evolves over time as new document collections are analyzed. This thesaurus, which is used by an editor, contains standard terms and phrases, and their corresponding variations of these standard terms and phrases. Documents are run through this editor and misspelled words or phrases, joined words, and ad hoc abbreviations are replaced with standard terms from the thesaurus. The present invention also enables normalization of documents in cases where a list of standard terms must be inferred from the corpus of the document. The normalizer will facilitate data mining applications which can not function properly with dirty text, resulting in more accurate analysis of documents. Over time, as the thesaurus evolves, collecting more words and phrases, the process of generating the thesaurus will become more automated.

Type: Grant

Filed: July 13, 2001

Date of Patent: February 21, 2006

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Maria Castellanos, James R. Stinger
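
The replacement step described in the abstract for patent 7003725 can be sketched as a thesaurus lookup. The thesaurus layout (standard term mapped to a list of known variations), the sample entries, and the function names are assumptions for illustration; the sketch does not cover how the thesaurus is built or how it evolves.

```python
import re
from typing import Dict, List


def build_variant_index(thesaurus: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert {standard term: [variations]} into {variation: standard term}."""
    index = {}
    for standard, variants in thesaurus.items():
        for variant in variants:
            index[variant.lower()] = standard
    return index


def normalize(text: str, thesaurus: Dict[str, List[str]]) -> str:
    """Replace each known variation with its standard term (case-insensitive)."""
    index = build_variant_index(thesaurus)
    if not index:
        return text
    # Longest variations first so longer matches win over their prefixes.
    alternatives = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: index[m.group(0).lower()], text)


# Hypothetical thesaurus with a misspelling, a joined word, and abbreviations.
thesaurus = {
    "printer": ["prnter", "prntr"],
    "hard disk": ["harddisk", "hd"],
    "customer": ["cust"],
}
print(normalize("Cust says prnter and harddisk failed", thesaurus))
```

Word boundaries keep a short variation such as "cust" from matching inside a longer, already standard word such as "customer".
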
Method and system for mining a document containing dirty text

Patent number: 6978275

Abstract: A method and system for mining a document containing dirty text. Dirty text is removed or replaced and the document is processed using a variety of text mining techniques. In one embodiment, dirty text removal and replacement occurs in two stages. In the first stage, a general cleaning occurs on all documents without regard to what domain they belong to or the mining task to be performed. In the second stage, document cleaning is more specific to the anomalies of the domain and the mining task to be performed. In the third stage, the document is processed using a variety of data mining techniques according to the mining task. In one embodiment, the present invention scores and ranks sentences in a document according to their relevance, extracts the highest ranked sentences, and presents a summary. The present invention allows users to leverage existing domain knowledge and can be customized according to the domain and task requirements.

Type: Grant

Filed: August 31, 2001

Date of Patent: December 20, 2005

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Maria Castellanos, James R. Stinger
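
The summarization embodiment in the abstract for patent 6978275 (score sentences, rank them, extract the highest ranked) can be sketched as below. The abstract does not define relevance, so counting hits against a supplied keyword list is an assumption, as are the sentence splitter and the name `summarize`.

```python
import re
from typing import Iterable, List


def summarize(document: str, keywords: Iterable[str], top_n: int = 2) -> List[str]:
    """Return the top_n highest-scoring sentences in their original order."""
    wanted = {k.lower() for k in keywords}
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

    def score(sentence: str) -> int:
        # Assumed relevance: number of words that are domain keywords.
        return sum(w in wanted for w in re.findall(r"[a-z0-9]+", sentence.lower()))

    # Rank by descending score; earlier sentences win ties.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    return [sentences[i] for i in sorted(ranked[:top_n])]


text = ("Customer called on Monday. The printer shows a paper jam error. "
        "Weather was nice. Replacing the roller fixed the jam.")
for line in summarize(text, ["printer", "jam", "error", "roller"]):
    print(line)
```

The keyword list is one simple way to bring in the domain knowledge the abstract mentions; any other scoring function could be substituted for `score`.
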
Method for content mining of semi-structured documents

Patent number: 6912555

Abstract: Embodiments of the present invention are directed to a method for content mining of semi-structured documents. In one embodiment, a semi-structured document is first converted from a document-type specific format such as HTML or PDF, to a document-type independent format such as XML. The document formatting, which contains basic level information about the document's structure, is then analyzed by a series of modules to develop a higher level understanding of the document's structure. These modules append information to the document describing the features which collectively comprise the higher level document structure. The appended information facilitates finding specified information within the document when content mining is performed.

Type: Grant

Filed: January 18, 2002

Date of Patent: June 28, 2005

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Michael J. Lemon, Maria Castellanos, James R. Stinger

Automatic table detection method and system

Patent number: 6757870

Abstract: A method for automatically detecting table data in a document that is described by a page definition language and converting the table data into a markup language representation. The document may have one or more pages. The page definition language description of the document provides a list of words, the position of each on a page with respect to a predetermined reference point, and the size of each word. The present invention automatically identifies table data in the document by utilizing one or more table-identifying features. A first table-identifying feature may be the number of word clusters on a line. A second table-identifying feature may be the vertical alignment of word clusters between lines. A third table-identifying feature may be the changes in text density or space density between lines.

Type: Grant

Filed: March 22, 2000

Date of Patent: June 29, 2004

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: James R. Stinger
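
The first two table-identifying features in the abstract for patent 6757870 (word clusters per line and vertical alignment of clusters between lines) can be sketched as follows. The word representation, the gap and tolerance values, and all function names are assumptions; the third feature (density changes) and the conversion to markup are not covered.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, left x position, width); assumed layout


def word_clusters(line: List[Word], max_gap: float = 15.0) -> List[float]:
    """Left edge of each cluster; words closer than max_gap share a cluster."""
    starts: List[float] = []
    prev_right = None
    for _, x, width in sorted(line, key=lambda w: w[1]):
        if prev_right is None or x - prev_right > max_gap:
            starts.append(x)
        prev_right = x + width
    return starts


def aligned(a: List[float], b: List[float], tolerance: float = 3.0) -> bool:
    """True if both lines have the same number of clusters at similar x."""
    return len(a) == len(b) and all(abs(x - y) <= tolerance for x, y in zip(a, b))


def table_lines(lines: List[List[Word]], min_clusters: int = 2) -> List[int]:
    """Indices of lines with several clusters that align with a neighbour."""
    starts = [word_clusters(line) for line in lines]
    flagged = []
    for i, s in enumerate(starts):
        if len(s) < min_clusters:
            continue
        neighbours = [starts[j] for j in (i - 1, i + 1) if 0 <= j < len(starts)]
        if any(aligned(s, n) for n in neighbours):
            flagged.append(i)
    return flagged


page = [
    [("Quarterly", 10, 60), ("results", 75, 45), ("follow", 125, 40)],
    [("Region", 10, 40), ("Units", 150, 35), ("Revenue", 300, 50)],
    [("North", 10, 35), ("120", 151, 20), ("4500", 299, 30)],
    [("Totals", 10, 40), ("are", 55, 20), ("unaudited", 80, 60)],
]
print(table_lines(page))
```

On this made-up page, the two middle lines each break into three widely spaced clusters at nearly the same horizontal positions, so they are flagged, while the running-text lines form a single cluster each.
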
Automatic table detection method and system

Publication number: 20040093355

Abstract: A method for automatically detecting table data in a document that is described by a page definition language and converting the table data into a markup language representation. The document may have one or more pages. The page definition language description of the document provides a list of words, the position of each on a page with respect to a predetermined reference point, and the size of each word. The present invention automatically identifies table data in the document by utilizing one or more table-identifying features. A first table-identifying feature may be the number of word clusters on a line. A second table-identifying feature may be the vertical alignment of word clusters between lines. A third table-identifying feature may be the changes in text density or space density between lines.

Type: Application

Filed: October 24, 2003

Publication date: May 13, 2004

Inventor: James R. Stinger

Method for content mining of semi-structured documents

Publication number: 20030140311

Abstract: Embodiments of the present invention are directed to a method for content mining of semi-structured documents. In one embodiment, a semi-structured document is first converted from a document-type specific format such as HTML or PDF, to a document-type independent format such as XML. The document formatting, which contains basic level information about the document's structure, is then analyzed by a series of modules to develop a higher level understanding of the document's structure. These modules append information to the document describing the features which collectively comprise the higher level document structure. The appended information facilitates finding specified information within the document when content mining is performed.

Type: Application

Filed: January 18, 2002

Publication date: July 24, 2003

Inventors: Michael J. Lemon, Maria Castellanos, James R. Stinger

Method and system for mining a document containing dirty text

Publication number: 20030046263

Abstract: A method and system for mining a document containing dirty text. Dirty text is removed or replaced and the document is processed using a variety of text mining techniques. In one embodiment, dirty text removal and replacement occurs in two stages. In the first stage, a general cleaning occurs on all documents without regard to what domain they belong to or the mining task to be performed. In the second stage, document cleaning is more specific to the anomalies of the domain and the mining task to be performed. In the third stage, the document is processed using a variety of data mining techniques according to the mining task. In one embodiment, the present invention scores and ranks sentences in a document according to their relevance, extracts the highest ranked sentences, and presents a summary. The present invention allows users to leverage existing domain knowledge and can be customized according to the domain and task requirements.

Type: Application

Filed: August 31, 2001

Publication date: March 6, 2003

Inventors: Maria Castellanos, James R. Stinger

Method and system for normalizing dirty text in a document

Publication number: 20030014448

Abstract: A method and system of normalizing dirty text in a document. The present invention creates a thesaurus that evolves over time as new document collections are analyzed. This thesaurus, which is used by an editor, contains standard terms and phrases, and their corresponding variations of these standard terms and phrases. Documents are run through this editor and misspelled words or phrases, joined words, and ad hoc abbreviations are replaced with standard terms from the thesaurus. The present invention also enables normalization of documents in cases where a list of standard terms must be inferred from the corpus of the document. The normalizer will facilitate data mining applications which can not function properly with dirty text, resulting in more accurate analysis of documents. Over time, as the thesaurus evolves, collecting more words and phrases, the process of generating the thesaurus will become more automated.

Type: Application

Filed: July 13, 2001

Publication date: January 16, 2003

Inventors: Maria Castellanos, James R. Stinger