Patents by Inventor William S Spangler
William S Spangler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20140163964Abstract: According to one embodiment, a method is provided for approximate named-entity extraction from a dictionary that includes entries, where each of the entries includes one or more words. Words are read from the entries of the dictionary, and network resources are searched to determine a frequency of occurrence of the words on the network resources. In view of the frequency of occurrence of the words located on the network resources, domain relevancy of the words in the entries of the dictionary is determined. A domain repository is created using top-ranked words as determined by the domain relevancy of the words. In view of the domain repository, signatures for both the entries of the dictionary and strings of an input document are computed. The strings of the input document are filtered by comparing the signatures of the strings against the signatures of the entries to identify approximate-match entity names.Type: ApplicationFiled: August 20, 2013Publication date: June 12, 2014Applicant: International Business Machines CorporationInventors: Ying Chen, William S. Spangler, Su Yan
-
Publication number: 20140163958Abstract: According to one embodiment, approximate named-entity extraction from a dictionary that includes entries is provided, where each of the entries includes one or more words. Words are read from the entries of the dictionary, and network resources are searched to determine a frequency of occurrence of the words on the network resources. In view of the frequency of occurrence of the words located on the network resources, domain relevancy of the words in the entries of the dictionary is determined. A domain repository is created using top-ranked words as determined by the domain relevancy of the words. In view of the domain repository, signatures for both the entries of the dictionary and strings of an input document are computed. The strings of the input document are filtered by comparing the signatures of the strings against the signatures of the entries to identify approximate-match entity names.Type: ApplicationFiled: December 12, 2012Publication date: June 12, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Ying Chen, William S. Spangler, Su Yan
-
Patent number: 8745062Abstract: Embodiments of the invention provide a method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.Type: GrantFiled: August 16, 2012Date of Patent: June 3, 2014Assignee: International Business Machines CorporationInventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
-
Patent number: 8600985Abstract: A system for classifying documents in a collection of documents according to their intended readerships includes: a computer configured to select a document in the collection of documents; and a computer to determine a characteristic of the selected document, the characteristic being: misleading when the document includes one or more features that are determined to be for a purpose other than reading the document; commercial when the document includes features that are presented for a commercial purpose; or personal when the document includes features of a personal opinion. A computer classifies the selected document as misleading, commercial, or personal according to its determined characteristic; and a computer repeats the steps of select document, determines a characteristic of the selected document, and classifies the selected document for additional documents in the collection. At least some documents are classified as misleading, some as commercial, and at least some as personal.Type: GrantFiled: May 16, 2012Date of Patent: December 3, 2013Assignee: International Business Machines CorporationInventors: Ying Chen, Bin He, William S. Spangler
-
Publication number: 20130318090Abstract: Embodiments of the invention provide a system, method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.Type: ApplicationFiled: May 24, 2012Publication date: November 28, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
-
Publication number: 20130318091Abstract: Embodiments of the invention provide a method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.Type: ApplicationFiled: August 16, 2012Publication date: November 28, 2013Applicant: International Business Machines CorporationInventors: Sumit Bhatia, Bin He, Qi He, William S. Spangler
-
Publication number: 20130132418Abstract: Discovering a keyword query corresponding to an input collection of documents taken from a candidate pool includes selecting a document from a working set as the input set, and extracting a list of snippets in the selected document. For each snippet, executing a set of proximity queries based on selected terms in that snippet, and finding all possible proximity queries that return less than N query results from the candidate pool. A query is selected from said proximity queries, based on the selected query returning the greatest number of working set documents, and returning the smallest number of documents not in the working set. Documents returned by the selected query are removed from the working set, and the above steps are repeated until no documents remain in the working set. The disjunction of selected queries is returned as the discovered query.Type: ApplicationFiled: November 18, 2011Publication date: May 23, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: William S. Spangler
-
Publication number: 20130104134Abstract: An approach for composing an analytic solution is provided. After associating descriptive schemas with web services and web-based applets, a set of input data sources is enumerated for selection. A desired output type is received. Based on the descriptive schemas that specify required inputs and outputs of the web services and web-based applets, combinations of web services and web-based applets are generated. The generated combinations achieve a result of the desired output type from one of the enumerated input data sources. Each combination is derived from available web services and web-based applets. The combinations include one or more workflows that provide an analytic solution. A workflow whose result satisfies the business objective may be saved. Steps in a workflow may be iteratively refined to generate a workflow whose result satisfies the business objective.Type: ApplicationFiled: August 28, 2012Publication date: April 25, 2013Applicant: International Business Machines CorporationInventors: Ying Chen, Thilina Gunarathne, Eugene M. Maximilien, William S. Spangler
-
Publication number: 20130104132Abstract: An approach for composing an analytic solution is provided. After associating descriptive schemas with web services and web-based applets, a set of input data sources is enumerated for selection. A desired output type is received. Based on the descriptive schemas that specify required inputs and outputs of the web services and web-based applets, combinations of web services and web-based applets are generated. The generated combinations achieve a result of the desired output type from one of the enumerated input data sources. Each combination is derived from available web services and web-based applets. The combinations include one or more workflows that provide an analytic solution. A workflow whose result satisfies the business objective may be saved. Steps in a workflow may be iteratively refined to generate a workflow whose result satisfies the business objective.Type: ApplicationFiled: October 25, 2011Publication date: April 25, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Ying Chen, Thilina Gunarathne, Eugene M. Maximilien, William S. Spangler
-
Patent number: 7930282Abstract: A method and a storage medium, that includes instructions for causing a computer to implement the method, for document categorization is presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.Type: GrantFiled: March 28, 2008Date of Patent: April 19, 2011Assignee: International Business Machines CorporationInventor: William S. Spangler
-
Patent number: 7822704Abstract: A method and structure for analyzing a database having non-text data in data fields and text in text fields. The invention first selects a subset of the database based upon criteria. The subset includes data field(s) and associated text field(s). The invention searches for data matching the criteria within structured data fields of the database. If the invention searches multiple databases, the invention creates shared dimensions for databases that do not share common attributes. The invention automatically selects a relatively short text phrase from the text fields that helps to explain the underlying meaning (i.e. unique text content) of a data subset selected using the non-text data fields.Type: GrantFiled: August 20, 2007Date of Patent: October 26, 2010Assignee: International Business Machines CorporationInventors: William F. Cody, Vikas Krishna, Justin T. Lessler, William S. Spangler, Jeffrey T. Kreulen
-
Patent number: 7779349Abstract: A method and structure for clustering documents in datasets which include clustering first documents and a first dataset to produce first document classes, creating centroid seeds based on the first document classes, and clustering second documents in a second dataset using the centroid seeds, wherein the first dataset and the second dataset are related. The clustering of the first documents in the first dataset forms a first dictionary of most common words in the first dataset and generates a first vector space model by counting, for each word in the first dictionary, a number of the first documents in which the word occurs, and clusters the first documents in the first dataset based on the first vector space model, and further generates a second vector space model by counting, for each word in the first dictionary, a number of the second documents in which the word occurs.Type: GrantFiled: April 7, 2008Date of Patent: August 17, 2010Assignee: International Business Machines CorporationInventor: William S. Spangler
-
Patent number: 7512605Abstract: A method and a storage medium, that includes instructions for causing a computer to implement the method, for document categorization is presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.Type: GrantFiled: November 1, 2006Date of Patent: March 31, 2009Assignee: International Business Machines CorporationInventor: William S. Spangler
-
Publication number: 20080177736Abstract: A method and a storage medium, that includes instructions for causing a computer to implement the method, for document categorization is presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.Type: ApplicationFiled: March 28, 2008Publication date: July 24, 2008Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: William S. Spangler
-
Patent number: 7383257Abstract: A method and structure for analyzing a database having non-text data in data fields and text in text fields. The invention first selects a subset of the database based upon criteria. The subset includes data field(s) and associated text field(s). The invention searches for data matching the criteria within structured data fields of the database. If the invention searches multiple databases, the invention creates shared dimensions for databases that do not share common attributes. The invention automatically selects a relatively short text phrase from the text fields that helps to explain the underlying meaning (i.e. unique text content) of a data subset selected using the non-text data fields.Type: GrantFiled: May 30, 2003Date of Patent: June 3, 2008Assignee: International Business Machines CorporationInventors: William F Cody, Vikas Krishna, Justin T. Lessler, William S Spangler, Jeffrey T. Kreulen
-
Publication number: 20080104054Abstract: A method and a storage medium, that includes instructions for causing a computer to implement the method, for document categorization is presented. The method includes identifying terms occurring in a collection of documents, and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores. The method also includes creating categories based on the cohesion scores of the terms, wherein each of the categories includes only documents (i) containing a selected one of the terms and (ii) that have not already been assigned to a category. The method still further includes moving each of the documents to a category of a nearest centroid, thereby refining the categories.Type: ApplicationFiled: November 1, 2006Publication date: May 1, 2008Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: William S. Spangler
-
Publication number: 20040243561Abstract: A method and structure for analyzing a database having non-text data in data fields and text in text fields. The invention first selects a subset of the database based upon criteria. The subset includes data field(s) and associated text field(s). The invention searches for data matching the criteria within structured data fields of the database. If the invention searches multiple databases, the invention creates shared dimensions for databases that do not share common attributes. The invention automatically selects a relatively short text phrase from the text fields that helps to explain the underlying meaning (i.e. unique text content) of a data subset selected using the non-text data fields.Type: ApplicationFiled: May 30, 2003Publication date: December 2, 2004Inventors: William F. Cody, Vikas Krishna, Justin T. Lessler, William S. Spangler, Jeffrey T. Kreulen
-
Patent number: 5253331Abstract: An expert system for the design and analysis of experiments includes a descriptive mathematical model of the experiment under consideration. From this mathematical model, expected mean squares are computed, tests are determined, and the power of the tests computed. This supplies the information needed to compare different designs and choose the best possible design. A layout sheet is then generated to aid in the collection of data. Once the data has been collected and entered, the system analyzes and interprets the results.Type: GrantFiled: July 3, 1991Date of Patent: October 12, 1993Assignee: General Motors CorporationInventors: Thomas J. Lorenzen, William S. Spangler, William T. Corpus, Lynn T. Truss