Patents by Inventor Andrew R. Freed

Andrew R. Freed has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200104350
    Abstract: Systems and methods for generating and annotating cell documents include extracting tables from a document using a table extraction engine. Headers are extracted for each of the tables using a header detection engine. Cells are extracted from each of the tables using a cell extraction engine. A cell document is generated for each of the cells which are each correlated to corresponding portions of the headers, each cell document recording the correlation between the cells and the headers. Each cell document is annotated to generate annotated cell documents with a cell recognition model trained to perform natural language processing on the cell documents by classifying each term in each of the cell documents and extracting relationships between the terms of each of the cell documents.
    Type: Application
    Filed: September 28, 2018
    Publication date: April 2, 2020
    Inventors: Joshua Allen, Andrew R. Freed, Thai T. La
  • Publication number: 20200097533
    Abstract: A method, system and computer-usable medium are disclosed for associating data cells with headers and tables having one or more embedded header structures. In certain embodiments, a table having rows and columns is received, wherein the table includes a plurality of cells, wherein each cell is populated with at least one of a header name, data value, or no information, the table having at least one embedded header. A determination is made as to whether a cell is a header cell or data cell. If the cell is a header cell, a count of consecutive column headers is maintained. A current list of column headers is dynamically updated based on the count of the consecutive column headers. Upon encountering a data cell, the current list of column headers is assigned to the data cell.
    Type: Application
    Filed: September 20, 2018
    Publication date: March 26, 2020
    Inventors: Kyle G. Christianson, Joshua S. Allen, Hassan Nadim, Andrew R. Freed
  • Publication number: 20200097759
    Abstract: A method, system and computer-usable medium for detecting headers in various documents, such as PDF and HTML files. The files are converted to a two-dimensional array or table having orthogonal rows and columns. Either rows or columns are determined to include headers. To determine whether rows include headers, a pairwise comparison is performed, for each row in the array or table, for each cell of each column that is orthogonal to that row. The pairwise comparison scores or values are summed for each column orthogonal to that row, and the sum across all the orthogonal columns provides a score or value for that row. Row scores are evaluated relative to one another to determine the likelihood of headers in the row. To determine whether columns have headers, a similar calculation is performed between columns and their orthogonal rows.
    Type: Application
    Filed: September 20, 2018
    Publication date: March 26, 2020
    Inventors: Hassan Nadim, Andrew R. Freed, Joshua S. Allen, Kyle G. Christianson
  • Publication number: 20200097541
    Abstract: A method, system and computer-usable medium are disclosed for associating data cells with headers and header labels. In certain embodiments, a table having rows and columns is received, wherein the table includes a plurality of cells, wherein each cell is populated with at least one of a header name, data value, or no information. A determination is made as to whether a cell is a header cell or data cell. If the cell is a header cell, a current list of column headers and a current list of row headers are dynamically updated. The current lists of column and row headers are assigned to the cell regardless of whether the cell is a header cell or data cell. Headers associated with header cells are used to identify label candidates for the header name of the header cell. The labels may be used to provide additional context for headers within a data cell.
    Type: Application
    Filed: September 20, 2018
    Publication date: March 26, 2020
    Inventors: Kyle G. Christianson, Joshua S. Allen, Hassan Nadim, Andrew R. Freed
  • Publication number: 20200097532
    Abstract: A method, system and computer-usable medium are disclosed for finding vertically and horizontally aligned cells in a complex table structure. A file or document, such as an HTML file, that defines a complex table including spanning rows and columns is expanded into a two-dimensional (2D) array or table with orthogonal rows and columns, where the spanning rows and columns include cells with copied values or object references. The expanded 2D array or table can be deduplicated row-wise or column-wise to determine header alignment of the table.
    Type: Application
    Filed: September 20, 2018
    Publication date: March 26, 2020
    Inventors: Kyle G. Christianson, Hassan Nadim, Joshua S. Allen, Andrew R. Freed
  • Patent number: 10599682
    Abstract: An embodiment of the invention may include a method, computer program product, and system for generating ground truth data for a plurality of cognitive capabilities within an overall cognitive system. The embodiment may include configuring multiple sets of training data. Each set of training data corresponds to a separate cognitive capability. The embodiment may include displaying a set of ground truth curation activities via a user interface. The embodiment may include determining the ground truth curation activities performed for a first type of data for a first duration. The first type of data is selected from the single set of grouped training data. The embodiment may include determining whether the first duration has exceeded a pre-determined threshold. The embodiment may include switching the curation activities to a second type of data. The second type of data is selected from the single set of grouped data.
    Type: Grant
    Filed: August 8, 2017
    Date of Patent: March 24, 2020
    Assignee: International Business Machines Corporation
    Inventors: Corville O. Allen, Andrew R. Freed, Sorabh Murgai
  • Patent number: 10599997
    Abstract: A method for ground truth generation includes providing training questions to a machine learning system executing on a computer. The machine learning system generates candidate answers to each of the training questions. The method also includes providing the candidate answers to a plurality of subject matter experts for evaluation with respect to the training questions, wherein the evaluation comprises assignment of an SME relevance score to each of the candidate answers. The method further includes analyzing each of the candidate answers with respect to a plurality of scoring features, wherein each of the scoring features is indicative of quality of the candidate answer. The method yet further includes generating a ground truth metric value that indicates a measure of agreement between the subject matter experts relative to a measure of agreement between results of the analyzing.
    Type: Grant
    Filed: August 11, 2016
    Date of Patent: March 24, 2020
    Assignee: International Business Machines Corporation
    Inventors: Corville O. Allen, Andrew R. Freed, Joseph N. Kozhaya, Dwi Sianto Mansjur
  • Patent number: 10585930
    Abstract: A computer-implemented method according to one embodiment includes identifying a summary of a single instance of content, monitoring user interaction with the summary, and determining a relevancy of the summary to the single instance of content, based on the user interaction.
    Type: Grant
    Filed: July 29, 2016
    Date of Patent: March 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Corville O. Allen, Andrew R. Freed, Joseph N. Kozhaya, Dwi Sianto Mansjur
  • Patent number: 10572547
    Abstract: A primary ingestion pipeline configured for use in natural language processing includes annotators configured for annotating documents. The annotators and documents to be annotated are evaluated. Based on the evaluations, an ingestion risk score is generated for each document. Each ingestion risk score represents a likelihood that an associated document will not successfully be annotated by the annotators. Each ingestion risk score is compared to a set of risk criteria. Based on the comparisons, a determination is made that each document of a first set of documents satisfies the set of risk criteria. A further determination is made, based on the comparisons, that each document of a second set of documents does not satisfy the set of risk criteria. In response to these determinations, the first set of documents is entered into the primary ingestion pipeline and the second set of documents is provided special handling.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: February 25, 2020
    Assignee: International Business Machines Corporation
    Inventors: Pamela D. Andrejko, Andrew R. Freed, Cynthia M. Murch, Jan M. Nordland, Humberto R. Rivero
  • Publication number: 20200034732
    Abstract: A method, system and computer-usable medium are disclosed for automated analysis of ground truth using confidence model to prioritize correction options. In certain embodiments, the ground truth data is analyzed to identify review-candidates. A confidence level may be assigned to each of the identified review-candidates and the review-candidates are prioritized, at least in part, using the assigned confidence levels. The review-candidates are electronically presented in prioritized order to solicit verification or correction feedback for updating the ground truth data.
    Type: Application
    Filed: July 25, 2018
    Publication date: January 30, 2020
    Inventors: Andrew R. Freed, Kyle G. Christianson, Christopher Phipps
  • Publication number: 20200005032
    Abstract: A classifier receives a document from a multi-document transaction. The classifier analyzes the document to identify one or more embedded dates in the content of the document and context of one or more positions of the one or more embedded dates in the document. The classifier evaluates each of the one or more embedded dates based on the separate context of each of the one or more positions within the document and a relative age of the one or more embedded dates in view of temporal characteristics of multiple categories of documents of a transaction to select a particular category associated with the document from among the multiple categories. The classifier classifies the document within the transaction as a particular logical type identified by the particular category from among multiple logical types.
    Type: Application
    Filed: July 1, 2018
    Publication date: January 2, 2020
    Inventors: Andrew R. Freed, Corville O. Allen
  • Publication number: 20190391956
    Abstract: An approach is provided in which an information handling system performs multiple tests using a cognitive service and multiple trained machine learning models on user data corresponding to a user application. For each of the multiple tests, a different one of the trained machine learning models is utilized. The information handling system generates results from the tests and then selects at least one of the trained machine learning models based on the test results. In turn, the information handling system assigns the cognitive service and the selected trained machine learning models to the user application.
    Type: Application
    Filed: June 26, 2018
    Publication date: December 26, 2019
    Inventors: Joseph N. Kozhaya, Corville O. Allen, Andrew R. Freed
  • Patent number: 10503768
    Abstract: Embodiments relate to a system, program product, and method for use with an intelligent computer platform to decipher analogical phrases. An analogy list is obtained, each analogy within the analogy list having a known meaning, and an analogy phrase is received. A verb is identified within the phrase and a verb definition list is generated. A subject is identified within the phrase and a subject definition list is generated. An adjective is identified within the phrase and an adjective definition list is generated, including filtering adjectives to accept adjective definitions associated with the identified subject. A set of outcomes is identified, incorporating the verb definition, subject definition, and adjective definition, and a corpus is searched for evidentiary use. The outcomes are ranked in accordance with evidentiary use found in the corpus, and the highest ranked outcome is output.
    Type: Grant
    Filed: October 6, 2016
    Date of Patent: December 10, 2019
    Assignee: International Business Machines Corporation
    Inventors: Corville O. Allen, Andrew R. Freed
  • Publication number: 20190370540
    Abstract: A method, system and computer-usable medium for classifying a source document using sub-documents identified in the source document. The method, system, and computer-usable medium are used to access the source document from electronic memory. The source document is electronically searched to detect markers indicative of whether the source document includes one or more sub-documents. Incongruities in the source document are located using the detected markers and the source document is split into sub-documents at the located incongruities. Each of the sub-documents is classified. The sub-documents are joined as a re-assembled source document with classifications including classifications for one or more of the sub-documents.
    Type: Application
    Filed: May 31, 2018
    Publication date: December 5, 2019
    Inventors: Andrew R. Freed, Corville O. Allen
  • Publication number: 20190361980
    Abstract: Improved data ingestion techniques are provided. A data set comprising records is received, where each record contains one or more fields. A group of fields is identified, where each of the fields has a common metadata attribute. Metrics are determined for the group based on metadata associated with each field, and weight values are assigned to each of the metrics. A natural language processing (NLP) measure and a discreteness measure are generated for the group of fields based on the metrics and the weight values. A processing workflow is selected to use when ingesting data from the group of fields into a corpus, based on comparing the NLP measure and the discreteness measure to one or more predefined thresholds, and each of the fields in the group of fields are processed using the processing workflow.
    Type: Application
    Filed: May 22, 2018
    Publication date: November 28, 2019
    Inventors: Troy Biesterfeld, Andrew R. Freed, Elizabeth Teresa Dettman, Jeremy J. Salsman, Paul R. Chmielewski
  • Patent number: 10489442
    Abstract: A method, system, and computer program product for identifying related information in dissimilar data are provided in the illustrative embodiments. Using a first part of a first entry in a dictionary, a first portion is identified in a first data, the first part matching the first portion within a tolerance. A second part of the first entry referencing a section of a second data is determined, the second data being organized in a repository according to a schema. A third part of the first entry sufficient to locate a record in the section of the second data is determined. A query is constructed using the second part and the third part, and performed on the second data. A result set is obtained, wherein a record in the result set is related to the first portion in the first data and the record does not include the first portion.
    Type: Grant
    Filed: January 19, 2015
    Date of Patent: November 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Andrew R. Freed, Ahmed M. Nassar, Eman Omar, Craig M. Trim
  • Publication number: 20190354555
    Abstract: Methods, computer program products, and systems are presented. The methods include, for instance: determining user clusters and navigation-type clusters based on multiple information requests, and training facets and corresponding usefulness factors of the facets from the multiple information requests by machine learning. When a user submits a query, the user and the query are respectively mapped to one of the user clusters and one of the navigation-type clusters, and the query is customized based on the associated pair of clusters. Results of the query are obtained, ranked by usefulness of the facets as determined according to the pair of clusters, and presented to the user.
    Type: Application
    Filed: July 30, 2019
    Publication date: November 21, 2019
    Inventors: Andrew R. Freed, Norbert Herman, Shubhadip Ray, Avik Sanyal
  • Publication number: 20190332644
    Abstract: A computer-implemented method includes receiving first lossy converted documents. The computer-implemented method includes generating corrected documents for the first lossy converted documents. Each of the corrected documents includes edit markers that reflect structure changes relative to a corresponding document of the first lossy converted documents. The computer-implemented method includes generating feature vectors for the first lossy converted documents. The feature vectors include structure features of the first lossy converted documents. The computer-implemented method includes training one or more models based on the structure features and the edit markers. The computer-implemented method includes applying the trained one or more models to second lossy converted documents to determine proposed structure edits. The computer-implemented method includes transforming the second lossy converted documents to second corrected documents by applying one or more of the proposed structure edits.
    Type: Application
    Filed: April 27, 2018
    Publication date: October 31, 2019
    Inventors: Andrew R. Freed, Corville O. Allen
  • Patent number: 10430713
    Abstract: A mechanism is provided in a data processing system for predicting and enhancing ingestion time for a set of input documents. The mechanism receives a set of documents to be added to a corpus of the data processing system. The mechanism records document features of each document within the set of documents using an annotation engine within the data processing system. The mechanism predicts an ingestion time for each document within the set of documents based on the document characteristics and a machine learning model. The mechanism assigns the set of documents to data processing system resources to be processed based on the predicted ingestion time for each document.
    Type: Grant
    Filed: September 29, 2015
    Date of Patent: October 1, 2019
    Assignee: International Business Machines Corporation
    Inventors: Corville O. Allen, Andrew R. Freed
  • Patent number: 10430465
    Abstract: Methods, computer program products, and systems are presented. The methods include, for instance: determining user clusters and navigation-type clusters based on multiple information requests, and training facets and corresponding usefulness factors of the facets from the multiple information requests by machine learning. When a user submits a query, the user and the query are respectively mapped to one of the user clusters and one of the navigation-type clusters, and the query is customized based on the associated pair of clusters. Results of the query are obtained, ranked by usefulness of the facets as determined according to the pair of clusters, and presented to the user.
    Type: Grant
    Filed: January 4, 2017
    Date of Patent: October 1, 2019
    Assignee: International Business Machines Corporation
    Inventors: Andrew R. Freed, Norbert Herman, Shubhadip Ray, Avik Sanyal
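
The abstract for publication 20200097532 in the listing above describes expanding a table with spanning rows and columns into a rectangular 2D array in which spanned positions hold copied values. The following is only an illustrative sketch of that general idea, not the method claimed in the application. The `(text, rowspan, colspan)` cell representation and the `expand_spans` function name are assumptions made for this example; the abstract does not specify a data model.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cell representation (an assumption for this sketch):
# (text, rowspan, colspan), similar to an HTML <td> with span attributes.
Cell = Tuple[str, int, int]


def expand_spans(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Expand spanning cells into a rectangular grid of copied values."""
    grid: Dict[Tuple[int, int], str] = {}  # maps (row, col) -> cell text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already occupied by a span from an earlier row.
            while (r, c) in grid:
                c += 1
            # Copy the value into every position the cell spans.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Positions never covered by any cell are left as None.
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


example = [
    [("Region", 2, 1), ("Sales", 1, 2)],
    [("Q1", 1, 1), ("Q2", 1, 1)],
    [("East", 1, 1), ("10", 1, 1), ("12", 1, 1)],
]
expanded = expand_spans(example)
for line in expanded:
    print(line)
```

In the expanded grid, every data cell shares a column index with each header above it, so vertically aligned headers can be read straight down a column. The deduplication step mentioned in the abstract is not shown here.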