Patents by Inventor William S. Spangler

William S. Spangler has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140163964
    Abstract: According to one embodiment, a method is provided for approximate named-entity extraction from a dictionary that includes entries, where each of the entries includes one or more words. Words are read from the entries of the dictionary, and network resources are searched to determine a frequency of occurrence of the words on the network resources. In view of the frequency of occurrence of the words located on the network resources, domain relevancy of the words in the entries of the dictionary is determined. A domain repository is created using top-ranked words as determined by the domain relevancy of the words. In view of the domain repository, signatures for both the entries of the dictionary and strings of an input document are computed. The strings of the input document are filtered by comparing the signatures of the strings against the signatures of the entries to identify approximate-match entity names.
    Type: Application
    Filed: August 20, 2013
    Publication date: June 12, 2014
    Applicant: International Business Machines Corporation
    Inventors: Ying Chen, William S. Spangler, Su Yan
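The signature-based filtering in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the frequency tables stand in for the network search, the relevancy ratio is an assumed measure, and treating a signature as the set of repository words in a string is one simple choice among many. All function names are hypothetical.

```python
def build_repository(entries, domain_freq, general_freq, top_k):
    """Keep the top_k dictionary words ranked by a hypothetical relevancy ratio."""
    words = {w for entry in entries for w in entry.lower().split()}

    def relevancy(word):
        # Assumed measure: frequent in the domain, rare in general text.
        return domain_freq.get(word, 0) / (general_freq.get(word, 0) + 1)

    ranked = sorted(words, key=lambda w: (-relevancy(w), w))
    return set(ranked[:top_k])


def signature(text, repository):
    """Signature = the set of repository words that occur in the text."""
    return frozenset(w for w in text.lower().split() if w in repository)


def filter_candidates(entries, document, repository, max_words=3):
    """Return (string, matching entries) pairs whose signatures agree."""
    by_signature = {}
    for entry in entries:
        sig = signature(entry, repository)
        if sig:  # entries with an empty signature cannot be filtered on
            by_signature.setdefault(sig, []).append(entry)

    tokens = document.split()
    matches = []
    for start in range(len(tokens)):
        for length in range(1, max_words + 1):
            if start + length > len(tokens):
                break
            candidate = " ".join(tokens[start:start + length])
            sig = signature(candidate, repository)
            if sig in by_signature:
                matches.append((candidate, by_signature[sig]))
    return matches
```

The filter only narrows the input to strings that share a signature with some dictionary entry, so a misspelled mention such as "acetylsalicylic acd" can still pass. A separate similarity check would be needed to confirm each surviving candidate.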
  • Publication number: 20140163958
    Abstract: According to one embodiment, approximate named-entity extraction from a dictionary that includes entries is provided, where each of the entries includes one or more words. Words are read from the entries of the dictionary, and network resources are searched to determine a frequency of occurrence of the words on the network resources. In view of the frequency of occurrence of the words located on the network resources, domain relevancy of the words in the entries of the dictionary is determined. A domain repository is created using top-ranked words as determined by the domain relevancy of the words. In view of the domain repository, signatures for both the entries of the dictionary and strings of an input document are computed. The strings of the input document are filtered by comparing the signatures of the strings against the signatures of the entries to identify approximate-match entity names.
    Type: Application
    Filed: December 12, 2012
    Publication date: June 12, 2014
    Applicant: International Business Machines Corporation
    Inventors: Ying Chen, William S. Spangler, Su Yan
  • Patent number: 8745062
    Abstract: Embodiments of the invention provide a method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.
    Type: Grant
    Filed: August 16, 2012
    Date of Patent: June 3, 2014
    Assignee: International Business Machines Corporation
    Inventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
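As a rough illustration of the snippet-level indexing in this abstract, the Python sketch below indexes words by sentence, so that a multi-term query only matches when all terms fall in the same sentence. Using sentences as snippets, splitting on end punctuation, and counting matching snippets per document are assumptions made for the example, not details taken from the patent.

```python
import re
from collections import defaultdict


def build_snippet_index(docs):
    """Map each word to the set of (doc_id, sentence_no) snippets containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        for sent_no, sentence in enumerate(sentences):
            for word in re.findall(r"\w+", sentence.lower()):
                index[word].add((doc_id, sent_no))
    return index


def search(index, query):
    """Return {doc_id: number of snippets containing every query term}."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return {}
    # Intersecting snippet postings enforces proximity implicitly:
    # all terms must share a snippet, not merely a document.
    hits = set.intersection(*[index.get(t, set()) for t in terms])
    per_doc = defaultdict(int)
    for doc_id, _ in hits:
        per_doc[doc_id] += 1  # combine sentence-level results per document
    return dict(per_doc)
```

No explicit word-distance computation is needed at query time, because the snippet boundary itself acts as the proximity window.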
  • Patent number: 8600985
    Abstract: A system for classifying documents in a collection of documents according to their intended readerships includes: a computer configured to select a document in the collection of documents; and a computer to determine a characteristic of the selected document, the characteristic being: misleading when the document includes one or more features that are determined to be for a purpose other than reading the document; commercial when the document includes features that are presented for a commercial purpose; or personal when the document includes features of a personal opinion. A computer classifies the selected document as misleading, commercial, or personal according to its determined characteristic; and a computer repeats the steps of selecting a document, determining a characteristic of the selected document, and classifying the selected document for additional documents in the collection. At least some documents are classified as misleading, at least some as commercial, and at least some as personal.
    Type: Grant
    Filed: May 16, 2012
    Date of Patent: December 3, 2013
    Assignee: International Business Machines Corporation
    Inventors: Ying Chen, Bin He, William S. Spangler
  • Publication number: 20130318090
    Abstract: Embodiments of the invention provide a system, method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 28, 2013
    Applicant: International Business Machines Corporation
    Inventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
  • Publication number: 20130318091
    Abstract: Embodiments of the invention provide a method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.
    Type: Application
    Filed: August 16, 2012
    Publication date: November 28, 2013
    Applicant: International Business Machines Corporation
    Inventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
  • Publication number: 20130132418
    Abstract: Discovering a keyword query corresponding to an input collection of documents taken from a candidate pool includes selecting a document from a working set as the input set, and extracting a list of snippets in the selected document. For each snippet, a set of proximity queries based on selected terms in that snippet is executed, and all possible proximity queries that return fewer than N query results from the candidate pool are found. A query is selected from these proximity queries based on its returning the greatest number of working set documents and the smallest number of documents not in the working set. Documents returned by the selected query are removed from the working set, and the above steps are repeated until no documents remain in the working set. The disjunction of selected queries is returned as the discovered query.
    Type: Application
    Filed: November 18, 2011
    Publication date: May 23, 2013
    Applicant: International Business Machines Corporation
    Inventor: William S. Spangler
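The greedy loop in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: snippets are taken to be sentences, a "proximity query" is modeled as a pair of terms that must share a sentence, and the document picked from the working set is simply the smallest identifier. The names `discover_query`, `run_query`, and `n_max` are hypothetical.

```python
import re
from itertools import combinations


def snippets(text):
    """Split text into sentence snippets, each represented as a set of words."""
    parts = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [set(re.findall(r"\w+", s.lower())) for s in parts]


def run_query(query, pool):
    """Documents with at least one snippet containing every query term."""
    return {d for d, text in pool.items()
            if any(query <= s for s in snippets(text))}


def discover_query(pool, target_ids, n_max):
    """Greedily pick term-pair queries until every target document is covered."""
    working = set(target_ids)
    selected = []
    while working:
        doc_id = min(working)  # assumed deterministic choice of next document
        best = None
        for snip in snippets(pool[doc_id]):
            for pair in combinations(sorted(snip), 2):
                query = frozenset(pair)
                result = run_query(query, pool)
                if len(result) >= n_max:
                    continue  # keep only queries returning fewer than N results
                # Prefer more working-set documents, then fewer outside it.
                score = (len(result & working), -len(result - working))
                if best is None or score > best[0]:
                    best = (score, query, result)
        if best is None:
            raise ValueError(f"no query under the limit covers {doc_id!r}")
        selected.append(best[1])
        working -= best[2]
    return selected  # read as a disjunction (OR) of the selected queries
```

Each selected query is built from a snippet of a document still in the working set, so it always covers at least that document and the loop terminates.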
  • Publication number: 20130104134
    Abstract: An approach for composing an analytic solution is provided. After associating descriptive schemas with web services and web-based applets, a set of input data sources is enumerated for selection. A desired output type is received. Based on the descriptive schemas that specify required inputs and outputs of the web services and web-based applets, combinations of web services and web-based applets are generated. The generated combinations achieve a result of the desired output type from one of the enumerated input data sources. Each combination is derived from available web services and web-based applets. The combinations include one or more workflows that provide an analytic solution. A workflow whose result satisfies the business objective may be saved. Steps in a workflow may be iteratively refined to generate a workflow whose result satisfies the business objective.
    Type: Application
    Filed: August 28, 2012
    Publication date: April 25, 2013
    Applicant: International Business Machines Corporation
    Inventors: Ying Chen, Thilina Gunarathne, Eugene M. Maximilien, William S. Spangler
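One way to picture the schema-driven composition in this abstract is a search over services whose declared output type feeds the next service's input type. The Python sketch below assumes each descriptive schema reduces to a single (input type, output type) pair, which is a simplification for illustration. The service names and the `compose_workflows` function are hypothetical.

```python
def compose_workflows(services, source_type, target_type, max_steps=4):
    """Enumerate service chains that turn source_type into target_type.

    services maps a service name to an (input_type, output_type) pair.
    """
    workflows = []

    def extend(current_type, chain):
        if chain and current_type == target_type:
            workflows.append(chain)
            return
        if len(chain) == max_steps:
            return
        for name, (input_type, output_type) in sorted(services.items()):
            # A service can follow only if it accepts the current type;
            # each service is used at most once per workflow.
            if input_type == current_type and name not in chain:
                extend(output_type, chain + [name])

    extend(source_type, [])
    return workflows


services = {
    "tokenize": ("text", "tokens"),
    "count": ("tokens", "table"),
    "chart": ("table", "image"),
    "ocr": ("image", "text"),
}
print(compose_workflows(services, "text", "image"))
# prints [['tokenize', 'count', 'chart']]
```

A user could then inspect the generated chains, save one whose result meets the objective, or change a step and regenerate.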
  • Publication number: 20130104132
    Abstract: An approach for composing an analytic solution is provided. After associating descriptive schemas with web services and web-based applets, a set of input data sources is enumerated for selection. A desired output type is received. Based on the descriptive schemas that specify required inputs and outputs of the web services and web-based applets, combinations of web services and web-based applets are generated. The generated combinations achieve a result of the desired output type from one of the enumerated input data sources. Each combination is derived from available web services and web-based applets. The combinations include one or more workflows that provide an analytic solution. A workflow whose result satisfies the business objective may be saved. Steps in a workflow may be iteratively refined to generate a workflow whose result satisfies the business objective.
    Type: Application
    Filed: October 25, 2011
    Publication date: April 25, 2013
    Applicant: International Business Machines Corporation
    Inventors: Ying Chen, Thilina Gunarathne, Eugene M. Maximilien, William S. Spangler
  • Patent number: 7930282
    Abstract: A method for document categorization, and a storage medium that includes instructions for causing a computer to implement the method, are presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: April 19, 2011
    Assignee: International Business Machines Corporation
    Inventor: William S. Spangler
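The cohesion scoring and category seeding in this abstract might look like the Python sketch below. It assumes bag-of-words vectors and takes the cohesion score to be the mean cosine similarity between each document containing a term and the centroid of those documents, which is only one possible "function of a cosine difference". The final nearest-centroid refinement pass is omitted, and all names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return {word: count / len(vectors) for word, count in total.items()}


def cohesion(term, vectors):
    """Mean cosine similarity of the term's documents to their centroid."""
    members = [v for v in vectors if term in v]
    center = centroid(members)
    return sum(cosine(v, center) for v in members) / len(members)


def seed_categories(docs, min_docs=2):
    """Assign each document to the most cohesive term it contains."""
    vectors = [vectorize(d) for d in docs]
    terms = {w for v in vectors for w in v}
    frequent = [t for t in terms if sum(t in v for v in vectors) >= min_docs]
    ranked = sorted(frequent, key=lambda t: (-cohesion(t, vectors), t))
    assigned = {}
    for term in ranked:
        for i, v in enumerate(vectors):
            if term in v and i not in assigned:  # skip already-categorized docs
                assigned[i] = term
    return assigned
```

Terms whose documents all look alike score close to 1 and are used first, so broad, incoherent terms only pick up whatever documents remain.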
  • Patent number: 7822704
    Abstract: A method and structure for analyzing a database having non-text data in data fields and text in text fields. The invention first selects a subset of the database based upon criteria. The subset includes data field(s) and associated text field(s). The invention searches for data matching the criteria within structured data fields of the database. If the invention searches multiple databases, the invention creates shared dimensions for databases that do not share common attributes. The invention automatically selects a relatively short text phrase from the text fields that helps to explain the underlying meaning (i.e. unique text content) of a data subset selected using the non-text data fields.
    Type: Grant
    Filed: August 20, 2007
    Date of Patent: October 26, 2010
    Assignee: International Business Machines Corporation
    Inventors: William F. Cody, Vikas Krishna, Justin T. Lessler, William S. Spangler, Jeffrey T. Kreulen
  • Patent number: 7779349
    Abstract: A method and structure for clustering documents in datasets, which include clustering first documents in a first dataset to produce first document classes, creating centroid seeds based on the first document classes, and clustering second documents in a second dataset using the centroid seeds, wherein the first dataset and the second dataset are related. The clustering of the first documents in the first dataset forms a first dictionary of most common words in the first dataset and generates a first vector space model by counting, for each word in the first dictionary, a number of the first documents in which the word occurs, and clusters the first documents in the first dataset based on the first vector space model, and further generates a second vector space model by counting, for each word in the first dictionary, a number of the second documents in which the word occurs.
    Type: Grant
    Filed: April 7, 2008
    Date of Patent: August 17, 2010
    Assignee: International Business Machines Corporation
    Inventor: William S. Spangler
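A small Python sketch can show the idea of reusing centroids from one dataset as seeds for a related one. It swaps in plain k-means with squared Euclidean distance and per-document term counts, which the abstract does not specify, and it starts the first clustering from hand-picked seed vectors. Function names are hypothetical.

```python
from collections import Counter


def build_dictionary(docs, size):
    """Most common words by the number of documents they occur in."""
    doc_freq = Counter(w for d in docs for w in set(d.lower().split()))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:size]]


def to_vectors(docs, vocab):
    """Count each dictionary word in each document."""
    return [[d.lower().split().count(w) for w in vocab] for d in docs]


def nearest(vector, centroids):
    def dist(k):
        return sum((a - b) ** 2 for a, b in zip(vector, centroids[k]))
    return min(range(len(centroids)), key=dist)


def kmeans(vectors, seeds, iterations=10):
    """Plain k-means starting from the given seed centroids."""
    centroids = [list(s) for s in seeds]
    labels = []
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


first = ["disk full", "disk slow", "login failed", "login locked"]
second = ["disk disk broken", "cannot login"]

vocab = build_dictionary(first, size=2)
first_vectors = to_vectors(first, vocab)
# Assumed initialization: first and last documents as starting seeds.
_, seeds = kmeans(first_vectors, [first_vectors[0], first_vectors[-1]])
# The second dataset reuses the first dictionary and the learned centroids.
second_labels, _ = kmeans(to_vectors(second, vocab), seeds)
```

Because both datasets are projected onto the same dictionary, cluster numbers in the second dataset line up with the classes found in the first.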
  • Patent number: 7512605
    Abstract: A method for document categorization, and a storage medium that includes instructions for causing a computer to implement the method, are presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: March 31, 2009
    Assignee: International Business Machines Corporation
    Inventor: William S. Spangler
  • Publication number: 20080177736
    Abstract: A method for document categorization, and a storage medium that includes instructions for causing a computer to implement the method, are presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.
    Type: Application
    Filed: March 28, 2008
    Publication date: July 24, 2008
    Applicant: International Business Machines Corporation
    Inventor: William S. Spangler
  • Patent number: 7383257
    Abstract: A method and structure for analyzing a database having non-text data in data fields and text in text fields. The invention first selects a subset of the database based upon criteria. The subset includes data field(s) and associated text field(s). The invention searches for data matching the criteria within structured data fields of the database. If the invention searches multiple databases, the invention creates shared dimensions for databases that do not share common attributes. The invention automatically selects a relatively short text phrase from the text fields that helps to explain the underlying meaning (i.e. unique text content) of a data subset selected using the non-text data fields.
    Type: Grant
    Filed: May 30, 2003
    Date of Patent: June 3, 2008
    Assignee: International Business Machines Corporation
    Inventors: William F. Cody, Vikas Krishna, Justin T. Lessler, William S. Spangler, Jeffrey T. Kreulen
  • Publication number: 20080104054
    Abstract: A method for document categorization, and a storage medium that includes instructions for causing a computer to implement the method, are presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.
    Type: Application
    Filed: November 1, 2006
    Publication date: May 1, 2008
    Applicant: International Business Machines Corporation
    Inventor: William S. Spangler
  • Publication number: 20040243561
    Abstract: A method and structure for analyzing a database having non-text data in data fields and text in text fields. The invention first selects a subset of the database based upon criteria. The subset includes data field(s) and associated text field(s). The invention searches for data matching the criteria within structured data fields of the database. If the invention searches multiple databases, the invention creates shared dimensions for databases that do not share common attributes. The invention automatically selects a relatively short text phrase from the text fields that helps to explain the underlying meaning (i.e. unique text content) of a data subset selected using the non-text data fields.
    Type: Application
    Filed: May 30, 2003
    Publication date: December 2, 2004
    Inventors: William F. Cody, Vikas Krishna, Justin T. Lessler, William S. Spangler, Jeffrey T. Kreulen
  • Patent number: 5253331
    Abstract: An expert system for the design and analysis of experiments includes a descriptive mathematical model of the experiment under consideration. From this mathematical model, expected mean squares are computed, tests are determined, and the power of the tests computed. This supplies the information needed to compare different designs and choose the best possible design. A layout sheet is then generated to aid in the collection of data. Once the data has been collected and entered, the system analyzes and interprets the results.
    Type: Grant
    Filed: July 3, 1991
    Date of Patent: October 12, 1993
    Assignee: General Motors Corporation
    Inventors: Thomas J. Lorenzen, William S. Spangler, William T. Corpus, Lynn T. Truss