Patents by Inventor William S. Spangler

William S. Spangler has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

APPROXIMATE NAMED-ENTITY EXTRACTION

Publication number: 20140163964

Abstract: According to one embodiment, a method is provided for approximate named-entity extraction from a dictionary that includes entries, where each of the entries includes one or more words. Words are read from the entries of the dictionary, and network resources are searched to determine a frequency of occurrence of the words on the network resources. In view of the frequency of occurrence of the words located on the network resources, domain relevancy of the words in the entries of the dictionary is determined. A domain repository is created using top-ranked words as determined by the domain relevancy of the words. In view of the domain repository, signatures for both the entries of the dictionary and strings of an input document are computed. The strings of the input document are filtered by comparing the signatures of the strings against the signatures of the entries to identify approximate-match entity names.

Type: Application

Filed: August 20, 2013

Publication date: June 12, 2014

Applicant: International Business Machines Corporation

Inventors: Ying Chen, William S. Spangler, Su Yan
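The signature-based filtering described in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the function names, the `word_frequency` mapping (standing in for counts gathered by searching network resources), the choice to rank words by raw frequency, and the definition of a signature as the set of repository words in a string are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def build_domain_repository(dictionary: List[str],
                            word_frequency: Dict[str, int],
                            top_k: int) -> Set[str]:
    """Keep the top_k dictionary words, ranked by an assumed relevancy score.

    word_frequency is a hypothetical stand-in for frequencies observed on
    network resources; ranking by raw frequency is an assumption.
    """
    words = {w.lower() for entry in dictionary for w in entry.split()}
    ranked = sorted(words, key=lambda w: (-word_frequency.get(w, 0), w))
    return set(ranked[:top_k])


def signature(text: str, repository: Set[str]) -> FrozenSet[str]:
    """Assumed signature: the set of repository words occurring in the text."""
    return frozenset(w for w in text.lower().split() if w in repository)


def candidate_matches(document: str,
                      dictionary: List[str],
                      repository: Set[str],
                      max_len: int = 3) -> List[Tuple[str, str]]:
    """Return (document string, dictionary entry) pairs whose signatures agree."""
    entry_sigs: Dict[FrozenSet[str], List[str]] = {}
    for entry in dictionary:
        sig = signature(entry, repository)
        if sig:  # entries with an empty signature cannot be filtered this way
            entry_sigs.setdefault(sig, []).append(entry)

    tokens = document.split()
    hits: List[Tuple[str, str]] = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length > len(tokens):
                break
            candidate = " ".join(tokens[start:start + length])
            for entry in entry_sigs.get(signature(candidate, repository), []):
                hits.append((candidate, entry))
    return hits
```

Because only repository words contribute to a signature, a misspelled string such as "acetylsalicylic acide" still shares a signature with the entry "acetylsalicylic acid" when "acid" is not in the repository. The result is a set of candidates; a later verification step, such as an edit-distance check, would still be needed to confirm each match.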
APPROXIMATE NAMED-ENTITY EXTRACTION

Publication number: 20140163958

Abstract: According to one embodiment, approximate named-entity extraction from a dictionary that includes entries is provided, where each of the entries includes one or more words. Words are read from the entries of the dictionary, and network resources are searched to determine a frequency of occurrence of the words on the network resources. In view of the frequency of occurrence of the words located on the network resources, domain relevancy of the words in the entries of the dictionary is determined. A domain repository is created using top-ranked words as determined by the domain relevancy of the words. In view of the domain repository, signatures for both the entries of the dictionary and strings of an input document are computed. The strings of the input document are filtered by comparing the signatures of the strings against the signatures of the entries to identify approximate-match entity names.

Type: Application

Filed: December 12, 2012

Publication date: June 12, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ying Chen, William S. Spangler, Su Yan
Systems, methods, and computer program products for fast and scalable proximal search for search queries

Patent number: 8745062

Abstract: Embodiments of the invention provide a method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.

Type: Grant

Filed: August 16, 2012

Date of Patent: June 3, 2014

Assignee: International Business Machines Corporation

Inventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
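The idea of indexing snippets so that term proximity is enforced implicitly can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: snippets are taken to be sentences split on simple punctuation, and the helper names `split_snippets`, `build_snippet_index`, and `proximal_search` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def split_snippets(text: str) -> List[str]:
    """Assumed snippet boundary: sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def build_snippet_index(docs: Dict[str, str]) -> Dict[str, Set[Tuple[str, int]]]:
    """Map each word to the (document id, snippet number) pairs containing it."""
    index: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for doc_id, text in docs.items():
        for number, snippet in enumerate(split_snippets(text)):
            for word in re.findall(r"\w+", snippet.lower()):
                index[word].add((doc_id, number))
    return index


def proximal_search(index: Dict[str, Set[Tuple[str, int]]],
                    query: str) -> Dict[str, int]:
    """Return, per document, how many snippets contain every query term."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return {}
    postings = [index.get(term, set()) for term in terms]
    shared = set.intersection(*postings)  # snippets holding all terms
    counts: Dict[str, int] = defaultdict(int)
    for doc_id, _ in shared:
        counts[doc_id] += 1  # combine the snippet-level results per document
    return dict(counts)
```

A document matches only when all query terms fall inside the same snippet, so no explicit distance computation is needed at query time. The per-document snippet count is one possible way to combine the sentence-level results; the abstract does not specify the combination rule.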
Classifying documents according to readership

Patent number: 8600985

Abstract: A system for classifying documents in a collection of documents according to their intended readerships includes: a computer configured to select a document in the collection of documents; and a computer to determine a characteristic of the selected document, the characteristic being: misleading when the document includes one or more features that are determined to be for a purpose other than reading the document; commercial when the document includes features that are presented for a commercial purpose; or personal when the document includes features of a personal opinion. A computer classifies the selected document as misleading, commercial, or personal according to its determined characteristic; and a computer repeats the steps of selecting a document, determining a characteristic of the selected document, and classifying the selected document for additional documents in the collection. At least some documents are classified as misleading, at least some as commercial, and at least some as personal.

Type: Grant

Filed: May 16, 2012

Date of Patent: December 3, 2013

Assignee: International Business Machines Corporation

Inventors: Ying Chen, Bin He, William S. Spangler
SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS FOR FAST AND SCALABLE PROXIMAL SEARCH FOR SEARCH QUERIES

Publication number: 20130318090

Abstract: Embodiments of the invention provide a system, method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.

Type: Application

Filed: May 24, 2012

Publication date: November 28, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR FAST AND SCALABLE PROXIMAL SEARCH FOR SEARCH QUERIES

Publication number: 20130318091

Abstract: Embodiments of the invention provide a method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.

Type: Application

Filed: August 16, 2012

Publication date: November 28, 2013

Applicant: International Business Machines Corporation

Inventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS FOR DISCOVERING A TEXT QUERY FROM EXAMPLE DOCUMENTS

Publication number: 20130132418

Abstract: Discovering a keyword query corresponding to an input collection of documents taken from a candidate pool includes selecting a document from a working set as the input set, and extracting a list of snippets in the selected document. For each snippet, executing a set of proximity queries based on selected terms in that snippet, and finding all possible proximity queries that return less than N query results from the candidate pool. A query is selected from said proximity queries, based on the selected query returning the greatest number of working set documents, and returning the smallest number of documents not in the working set. Documents returned by the selected query are removed from the working set, and the above steps are repeated until no documents remain in the working set. The disjunction of selected queries is returned as the discovered query.

Type: Application

Filed: November 18, 2011

Publication date: May 23, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: William S. Spangler
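The greedy loop in the abstract above resembles a set-cover procedure, and a minimal sketch may make the steps concrete. Several details are assumptions rather than facts from the abstract: snippets are sentences, a proximity query is a pair of terms that must share a snippet, ties are broken lexicographically (most working-set documents first, then fewest other documents), and `max_results` plays the role of N. All function names are hypothetical.

```python
import re
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def snippets(text: str) -> List[Set[str]]:
    """Split text into sentence snippets, each reduced to its set of words."""
    return [set(re.findall(r"\w+", s.lower()))
            for s in re.split(r"[.!?]+", text) if s.strip()]


def run_query(query: FrozenSet[str], pool: Dict[int, List[Set[str]]]) -> Set[int]:
    """Ids of documents with at least one snippet containing all query terms."""
    return {doc_id for doc_id, snips in pool.items()
            if any(query <= snip for snip in snips)}


def discover_query(pool_texts: Dict[int, str],
                   target_ids: Iterable[int],
                   max_results: int) -> List[FrozenSet[str]]:
    """Return a list of term-pair queries whose disjunction covers the targets."""
    pool = {doc_id: snippets(text) for doc_id, text in pool_texts.items()}
    working = set(target_ids)
    selected: List[FrozenSet[str]] = []
    while working:
        doc = min(working)  # deterministic choice of the next input document
        best, best_key, best_hits = None, None, set()
        for snip in pool[doc]:
            for pair in combinations(sorted(snip), 2):
                query = frozenset(pair)
                hits = run_query(query, pool)
                if len(hits) >= max_results:
                    continue  # keep only queries returning fewer than N results
                key = (len(hits & working), -len(hits - working))
                if best_key is None or key > best_key:
                    best, best_key, best_hits = query, key, hits
        if best is None:
            working.discard(doc)  # no admissible query covers this document
            continue
        selected.append(best)
        working -= best_hits  # remove the documents the chosen query returns
    return selected
```

Each candidate query is built from the selected document's own snippet, so the chosen query always returns that document and the working set shrinks on every pass. Re-running every candidate against the whole pool is quadratic and is shown only for clarity; a real system would presumably use an index.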
COMPOSING ANALYTIC SOLUTIONS

Publication number: 20130104134

Abstract: An approach for composing an analytic solution is provided. After associating descriptive schemas with web services and web-based applets, a set of input data sources is enumerated for selection. A desired output type is received. Based on the descriptive schemas that specify required inputs and outputs of the web services and web-based applets, combinations of web services and web-based applets are generated. The generated combinations achieve a result of the desired output type from one of the enumerated input data sources. Each combination is derived from available web services and web-based applets. The combinations include one or more workflows that provide an analytic solution. A workflow whose result satisfies the business objective may be saved. Steps in a workflow may be iteratively refined to generate a workflow whose result satisfies the business objective.

Type: Application

Filed: August 28, 2012

Publication date: April 25, 2013

Applicant: International Business Machines Corporation

Inventors: Ying Chen, Thilina Gunarathne, Eugene M. Maximilien, William S. Spangler
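Generating combinations from descriptive schemas can be pictured as chaining components whose declared output type matches the next component's input type. The sketch below assumes a deliberately simple schema (one input type and one output type per component); the `Service` class, the `compose` function, and the `max_steps` limit are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Service:
    """Assumed descriptive schema: a name plus one input and one output type."""
    name: str
    input_type: str
    output_type: str


def compose(services: List[Service],
            source_type: str,
            desired_type: str,
            max_steps: int = 4) -> List[List[str]]:
    """Enumerate workflows that turn source_type into desired_type."""
    workflows: List[List[str]] = []

    def extend(current_type: str, path: List[Service]) -> None:
        if path and current_type == desired_type:
            workflows.append([service.name for service in path])
            return
        if len(path) == max_steps:
            return
        for service in services:
            # A service can follow only if its input matches the current output.
            if service.input_type == current_type and service not in path:
                extend(service.output_type, path + [service])

    extend(source_type, [])
    return workflows
```

Each returned list is one candidate workflow, in execution order. A user could then run a workflow, inspect its result, and save or refine it, as the abstract describes; that interactive part is outside this sketch.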
COMPOSING ANALYTIC SOLUTIONS

Publication number: 20130104132

Abstract: An approach for composing an analytic solution is provided. After associating descriptive schemas with web services and web-based applets, a set of input data sources is enumerated for selection. A desired output type is received. Based on the descriptive schemas that specify required inputs and outputs of the web services and web-based applets, combinations of web services and web-based applets are generated. The generated combinations achieve a result of the desired output type from one of the enumerated input data sources. Each combination is derived from available web services and web-based applets. The combinations include one or more workflows that provide an analytic solution. A workflow whose result satisfies the business objective may be saved. Steps in a workflow may be iteratively refined to generate a workflow whose result satisfies the business objective.

Type: Application

Filed: October 25, 2011

Publication date: April 25, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ying Chen, Thilina Gunarathne, Eugene M. Maximilien, William S. Spangler
Document clustering based on cohesive terms

Patent number: 7930282

Abstract: A method and a storage medium, that includes instructions for causing a computer to implement the method, for document categorization is presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.

Type: Grant

Filed: March 28, 2008

Date of Patent: April 19, 2011

Assignee: International Business Machines Corporation

Inventor: William S. Spangler
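A small sketch can show how a per-term cohesion score might be computed and used to sort terms. The abstract says only that the score is a function of the cosine measure between each document containing the term and the centroid of those documents; taking the mean cosine similarity, using raw word counts as the vector space, and the `min_docs` threshold are assumptions made here.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cohesion_scores(docs: List[str], min_docs: int = 2) -> List[Tuple[str, float]]:
    """Return (term, score) pairs sorted from most to least cohesive."""
    vectors = [Counter(doc.lower().split()) for doc in docs]
    vocabulary = set().union(*vectors)
    scores: Dict[str, float] = {}
    for term in vocabulary:
        members = [v for v in vectors if term in v]
        if len(members) < min_docs:
            continue  # a term seen in too few documents is not a useful category
        totals: Counter = Counter()
        for vector in members:
            totals.update(vector)
        centroid = {word: count / len(members) for word, count in totals.items()}
        # Assumed score: mean similarity of each member document to the centroid.
        scores[term] = sum(cosine(v, centroid) for v in members) / len(members)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Walking the sorted list from the top and assigning each still-unassigned document to the first term it contains would give the initial categories; the final step in the abstract, moving documents to the nearest centroid, is not shown.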
Text explanation for on-line analytic processing events

Patent number: 7822704

Abstract: A method and structure for analyzing a database having non-text data in data fields and text in text fields. The invention first selects a subset of the database based upon criteria. The subset includes data field(s) and associated text field(s). The invention searches for data matching the criteria within structured data fields of the database. If the invention searches multiple databases, the invention creates shared dimensions for databases that do not share common attributes. The invention automatically selects a relatively short text phrase from the text fields that helps to explain the underlying meaning (i.e. unique text content) of a data subset selected using the non-text data fields.

Type: Grant

Filed: August 20, 2007

Date of Patent: October 26, 2010

Assignee: International Business Machines Corporation

Inventors: William F. Cody, Vikas Krishna, Justin T. Lessler, William S. Spangler, Jeffrey T. Kreulen
Method for adapting a K-means text clustering to emerging data

Patent number: 7779349

Abstract: A method and structure for clustering documents in datasets which include clustering first documents in a first dataset to produce first document classes, creating centroid seeds based on the first document classes, and clustering second documents in a second dataset using the centroid seeds, wherein the first dataset and the second dataset are related. The clustering of the first documents in the first dataset forms a first dictionary of most common words in the first dataset and generates a first vector space model by counting, for each word in the first dictionary, a number of the first documents in which the word occurs, and clusters the first documents in the first dataset based on the first vector space model, and further generates a second vector space model by counting, for each word in the first dictionary, a number of the second documents in which the word occurs.

Type: Grant

Filed: April 7, 2008

Date of Patent: August 17, 2010

Assignee: International Business Machines Corporation

Inventor: William S. Spangler
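Seeding the clustering of a second dataset with centroids from a first one can be sketched in a few functions. This is an illustration only: the per-document word counts, the cosine-based assignment, the fixed iteration count, and every function name are assumptions, and the first dataset's classes are supplied as labels rather than produced by an initial clustering run.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def build_dictionary(docs: Sequence[str], size: int) -> List[str]:
    """Most common words of the first dataset, ranked by document frequency."""
    counts = Counter(w for doc in docs for w in set(doc.lower().split()))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]


def vectorize(doc: str, dictionary: Sequence[str]) -> List[float]:
    """Represent a document over the dictionary built from the first dataset."""
    words = Counter(doc.lower().split())
    return [float(words[w]) for w in dictionary]


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def seeds_from_classes(vectors: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> List[List[float]]:
    """One centroid seed per class found in the first dataset."""
    return [centroid([v for v, l in zip(vectors, labels) if l == c])
            for c in sorted(set(labels))]


def seeded_kmeans(vectors: Sequence[Sequence[float]],
                  seeds: Sequence[Sequence[float]],
                  iterations: int = 10) -> Tuple[List[int], List[List[float]]]:
    """K-means on the second dataset, started from the given seeds."""
    centroids = [list(seed) for seed in seeds]
    labels: List[int] = []
    for _ in range(iterations):
        labels = [max(range(len(centroids)),
                      key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # an empty cluster keeps its seed
                centroids[c] = centroid(members)
    return labels, centroids
```

Because both datasets are vectorized over the same dictionary, the seeds from the first dataset are directly comparable with documents in the second. Clusters that receive no new documents keep their original seed, which is one simple way to handle classes that do not reappear in the emerging data.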
Document clustering based on cohesive terms

Patent number: 7512605

Abstract: A method and a storage medium, that includes instructions for causing a computer to implement the method, for document categorization is presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.

Type: Grant

Filed: November 1, 2006

Date of Patent: March 31, 2009

Assignee: International Business Machines Corporation

Inventor: William S. Spangler
DOCUMENT CLUSTERING BASED ON COHESIVE TERMS

Publication number: 20080177736

Abstract: A method and a storage medium, that includes instructions for causing a computer to implement the method, for document categorization is presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.

Type: Application

Filed: March 28, 2008

Publication date: July 24, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: William S. Spangler
Text explanation for on-line analytic processing events

Patent number: 7383257

Abstract: A method and structure for analyzing a database having non-text data in data fields and text in text fields. The invention first selects a subset of the database based upon criteria. The subset includes data field(s) and associated text field(s). The invention searches for data matching the criteria within structured data fields of the database. If the invention searches multiple databases, the invention creates shared dimensions for databases that do not share common attributes. The invention automatically selects a relatively short text phrase from the text fields that helps to explain the underlying meaning (i.e. unique text content) of a data subset selected using the non-text data fields.

Type: Grant

Filed: May 30, 2003

Date of Patent: June 3, 2008

Assignee: International Business Machines Corporation

Inventors: William F. Cody, Vikas Krishna, Justin T. Lessler, William S. Spangler, Jeffrey T. Kreulen
DOCUMENT CLUSTERING BASED ON COHESIVE TERMS

Publication number: 20080104054

Abstract: A method and a storage medium, that includes instructions for causing a computer to implement the method, for document categorization is presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.

Type: Application

Filed: November 1, 2006

Publication date: May 1, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: William S. Spangler
Text explanation for on-line analytic processing events

Publication number: 20040243561

Abstract: A method and structure for analyzing a database having non-text data in data fields and text in text fields. The invention first selects a subset of the database based upon criteria. The subset includes data field(s) and associated text field(s). The invention searches for data matching the criteria within structured data fields of the database. If the invention searches multiple databases, the invention creates shared dimensions for databases that do not share common attributes. The invention automatically selects a relatively short text phrase from the text fields that helps to explain the underlying meaning (i.e. unique text content) of a data subset selected using the non-text data fields.

Type: Application

Filed: May 30, 2003

Publication date: December 2, 2004

Inventors: William F. Cody, Vikas Krishna, Justin T. Lessler, William S. Spangler, Jeffrey T. Kreulen
Expert system for statistical design of experiments

Patent number: 5253331

Abstract: An expert system for the design and analysis of experiments includes a descriptive mathematical model of the experiment under consideration. From this mathematical model, expected mean squares are computed, tests are determined, and the power of the tests computed. This supplies the information needed to compare different designs and choose the best possible design. A layout sheet is then generated to aid in the collection of data. Once the data has been collected and entered, the system analyzes and interprets the results.

Type: Grant

Filed: July 3, 1991

Date of Patent: October 12, 1993

Assignee: General Motors Corporation

Inventors: Thomas J. Lorenzen, William S. Spangler, William T. Corpus, Lynn T. Truss
