Patents by Inventor Douglas R. Burdick

Douglas R. Burdick has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11734939
    Abstract: Methods, systems, and computer program products for vision-based cell structure recognition using hierarchical neural networks and cell boundaries to structure clustering are provided herein. A computer-implemented method includes detecting a style of the given table using at least one style classification model; selecting, based at least in part on the detected style, a cell detection model appropriate for the detected style; detecting cells within the given table using the selected cell detection model; and outputting, to at least one user, information pertaining to the detected cells comprising image coordinates of one or more bounding boxes associated with the detected cells.
    Type: Grant
    Filed: November 18, 2021
    Date of Patent: August 22, 2023
    Assignee: International Business Machines Corporation
    Inventors: Xin Ru Wang, Douglas R. Burdick, Xinyi Zheng
  • Patent number: 11734576
    Abstract: Methods, systems, and computer program products for cooperative neural networks with spatial containment constraints are provided herein. A computer-implemented method includes dividing a processing task into multiple sub-tasks; training multiple independent neural networks, such that at least some of the multiple sub-tasks correspond to different ones of the multiple independent neural networks; defining, via implementing constraint-based domain knowledge related to the processing task in connection with the multiple independent neural networks, a constraint loss for a given one of the multiple sub-tasks, the constraint loss being dependent on output from at least one of the other multiple sub-tasks; and effecting re-training of at least a portion of the multiple independent neural networks by incorporating the constraint loss into at least one of the multiple independent neural networks.
    Type: Grant
    Filed: April 14, 2020
    Date of Patent: August 22, 2023
    Assignee: International Business Machines Corporation
    Inventors: Xin Ru Wang, Xinyi Zheng, Douglas R. Burdick, Ioannis Katsis
  • Publication number: 20220076012
    Abstract: Methods, systems, and computer program products for vision-based cell structure recognition using hierarchical neural networks and cell boundaries to structure clustering are provided herein. A computer-implemented method includes detecting a style of the given table using at least one style classification model; selecting, based at least in part on the detected style, a cell detection model appropriate for the detected style; detecting cells within the given table using the selected cell detection model; and outputting, to at least one user, information pertaining to the detected cells comprising image coordinates of one or more bounding boxes associated with the detected cells.
    Type: Application
    Filed: November 18, 2021
    Publication date: March 10, 2022
    Inventors: Xin Ru Wang, Douglas R. Burdick, Xinyi Zheng
  • Patent number: 11222201
    Abstract: Methods, systems, and computer program products for vision-based cell structure recognition using hierarchical neural networks and cell boundaries to structure clustering are provided herein. A computer-implemented method includes detecting a style of the given table using at least one style classification model; selecting, based at least in part on the detected style, a cell detection model appropriate for the detected style; detecting cells within the given table using the selected cell detection model; and outputting, to at least one user, information pertaining to the detected cells comprising image coordinates of one or more bounding boxes associated with the detected cells.
    Type: Grant
    Filed: April 14, 2020
    Date of Patent: January 11, 2022
    Assignee: International Business Machines Corporation
    Inventors: Xin Ru Wang, Douglas R. Burdick, Xinyi Zheng
  • Patent number: 11194826
    Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
    Type: Grant
    Filed: February 8, 2019
    Date of Patent: December 7, 2021
    Assignee: International Business Machines Corporation
    Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
  • Publication number: 20210319217
    Abstract: Methods, systems, and computer program products for vision-based cell structure recognition using hierarchical neural networks and cell boundaries to structure clustering are provided herein. A computer-implemented method includes detecting a style of the given table using at least one style classification model; selecting, based at least in part on the detected style, a cell detection model appropriate for the detected style; detecting cells within the given table using the selected cell detection model; and outputting, to at least one user, information pertaining to the detected cells comprising image coordinates of one or more bounding boxes associated with the detected cells.
    Type: Application
    Filed: April 14, 2020
    Publication date: October 14, 2021
    Inventors: Xin Ru Wang, Douglas R. Burdick, Xinyi Zheng
  • Publication number: 20210319325
    Abstract: Methods, systems, and computer program products for cooperative neural networks with spatial containment constraints are provided herein. A computer-implemented method includes dividing a processing task into multiple sub-tasks; training multiple independent neural networks, such that at least some of the multiple sub-tasks correspond to different ones of the multiple independent neural networks; defining, based at least in part on constraint-based domain knowledge related to the processing task, at least one constraint loss for a given one of the multiple sub-tasks, the at least one constraint loss being dependent on output from at least one of the other multiple sub-tasks; and re-training at least a portion of the multiple independent neural networks, the re-training being dependent on using the at least one constraint loss.
    Type: Application
    Filed: April 14, 2020
    Publication date: October 14, 2021
    Inventors: Xin Ru Wang, Xinyi Zheng, Douglas R. Burdick, Ioannis Katsis
  • Publication number: 20190171641
    Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
    Type: Application
    Filed: February 8, 2019
    Publication date: June 6, 2019
    Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
  • Patent number: 10229168
    Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
    Type: Grant
    Filed: November 20, 2015
    Date of Patent: March 12, 2019
    Assignee: International Business Machines Corporation
    Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
  • Patent number: 9996607
    Abstract: Described herein are methods, systems and computer program products for entity resolution. Entity resolution, also known as entity matching or record linkage, seeks to identify equivalent data objects between or among datasets. An example method includes creating a deterministic model by defining an entity to be resolved, selecting two datasets for comparison, defining matching predicates for attributes of the datasets to select a set of candidate matches, and defining a precedence rule for the candidate matches to select a subset of the candidate matches. The method includes running the deterministic model on the two datasets. Running the deterministic model includes applying the matching predicates and the precedence rule to data in the datasets that correspond to the attributes. The method also includes applying a cardinality rule to results of the running, and outputting the matching candidates for which the cardinality rule is satisfied.
    Type: Grant
    Filed: October 31, 2014
    Date of Patent: June 12, 2018
    Assignee: International Business Machines Corporation
    Inventors: Bogdan Alexe, Douglas R. Burdick, Mauricio A. Hernandez-Sherrington, Hima P. Karanam, Rajasekar Krishnamurthy, Lucian Popa, Shivakumar Vaithyanathan
  • Patent number: 9684493
    Abstract: In a method for analyzing a large data set using a statistical computing environment language operation, a processor generates code from the statistical computing environment language operation that can be understood by a software system for processing machine learning algorithms in a MapReduce environment. A processor transfers the code to the software system for processing machine learning algorithms in a MapReduce environment. A processor invokes execution of the code with the software system for processing machine learning algorithms in a MapReduce environment.
    Type: Grant
    Filed: June 2, 2014
    Date of Patent: June 20, 2017
    Assignee: International Business Machines Corporation
    Inventors: Matthias Boehm, Douglas R. Burdick, Stefan Burnicki, Berthold Reinwald, Shirish Tatikonda
  • Publication number: 20170147673
    Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
    Type: Application
    Filed: November 20, 2015
    Publication date: May 25, 2017
    Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
  • Publication number: 20160125067
    Abstract: Embodiments relate to entity resolution. One aspect includes creating a deterministic model by defining an entity to be resolved, selecting two datasets for comparison, defining matching predicates for attributes of the datasets to select a set of candidate matches, and defining a precedence rule for the candidate matches to select a subset of the candidate matches. An aspect further includes running the deterministic model on the two datasets. Running the deterministic model includes applying the matching predicates and the precedence rule to data in the datasets that correspond to the attributes. An aspect also includes applying a cardinality rule to results of the running, and outputting the matching candidates for which the cardinality rule is satisfied.
    Type: Application
    Filed: October 31, 2014
    Publication date: May 5, 2016
    Inventors: Bogdan Alexe, Douglas R. Burdick, Mauricio A. Hernandez-Sherrington, Hima P. Karanam, Rajasekar Krishnamurthy, Lucian Popa, Shivakumar Vaithyanathan
  • Publication number: 20150347101
    Abstract: In a method for analyzing a large data set using a statistical computing environment language operation, a processor generates code from the statistical computing environment language operation that can be understood by a software system for processing machine learning algorithms in a MapReduce environment. A processor transfers the code to the software system for processing machine learning algorithms in a MapReduce environment. A processor invokes execution of the code with the software system for processing machine learning algorithms in a MapReduce environment.
    Type: Application
    Filed: June 2, 2014
    Publication date: December 3, 2015
    Applicant: International Business Machines Corporation
    Inventors: Matthias Boehm, Douglas R. Burdick, Stefan Burnicki, Berthold Reinwald, Shirish Tatikonda
  • Patent number: 7370057
    Abstract: A system evaluates a first data cleansing application and a second data cleansing application. The system includes a test data generator, an application execution module, and a results reporting module. The test data generator creates a dirty set of sample data from a clean set of data. The application execution module cleanses the dirty set of sample data. The application execution module utilizes the first data cleansing application to cleanse the dirty set of sample data and create a first cleansed output. The application execution module further utilizes the second data cleansing application to cleanse the dirty set of sample data and create a second cleansed output. The results reporting module evaluates the first and second cleansed output. The results reporting module produces an output of scores and statistics for each of the first and second data cleansing applications.
    Type: Grant
    Filed: December 3, 2002
    Date of Patent: May 6, 2008
    Assignee: Lockheed Martin Corporation
    Inventors: Douglas R. Burdick, Robert J. Szczerba, Joseph H. Visgitus
  • Patent number: 7225412
    Abstract: A system views results of a data cleansing application. The system includes a results visualization module and a learning visualization interface module. The results visualization module organizes output of the data cleansing application into a predefined format. The results visualization module displays the output to a user. The learning visualization interface module facilitates interaction with the data cleansing application by the user.
    Type: Grant
    Filed: December 3, 2002
    Date of Patent: May 29, 2007
    Assignee: Lockheed Martin Corporation
    Inventors: Douglas R. Burdick, Robert J. Szczerba, Wei Kang Zhan
  • Patent number: 7020804
    Abstract: A system evaluates a data cleansing application. The system includes a collection of records cleansed by the data cleansing application, a plurality of dirtying functions for operating upon the collection to introduce errors to the collection, and a record of the errors introduced to the cleansed collection. The plurality of dirtying functions produces a collection of dirty records.
    Type: Grant
    Filed: December 3, 2002
    Date of Patent: March 28, 2006
    Assignee: Lockheed Martin Corporation
    Inventors: Douglas R. Burdick, Robert J. Szczerba
  • Publication number: 20040181512
    Abstract: A system builds an extended dictionary for a data cleansing application. The system includes a record collection. Each record in the collection includes a list of fields and data contained in each field. The system further includes an input dictionary defining predetermined valid values for variants of values in at least one of the fields and a set of rules derived from patterns of the field values. The system still further includes an extended dictionary including the input dictionary and the rules.
    Type: Application
    Filed: March 11, 2003
    Publication date: September 16, 2004
    Applicant: Lockheed Martin Corporation
    Inventors: Douglas R. Burdick, Robert J. Szczerba
  • Publication number: 20040181526
    Abstract: A system learns a record similarity measurement. The system includes a set of record clusters. Each record in each cluster may have a list of fields and data contained in each field. The system may further include a predetermined threshold score for two of the records in one of the clusters to be considered similar and at least one decision tree constructed from a portion of the set of clusters. The decision tree encodes rules for determining a field similarity score of a related set of fields. The system may further include an output set of record pairs that are determined to be duplicate records. The output set of record pairs may have a record similarity score greater than or equal to the predetermined threshold score.
    Type: Application
    Filed: March 11, 2003
    Publication date: September 16, 2004
    Applicant: Lockheed Martin Corporation
    Inventors: Douglas R. Burdick, Robert J. Szczerba
  • Publication number: 20040181501
    Abstract: A system represents data during a data cleansing application. The system includes a record collection. Each record in the collection includes a list of fields and data contained in each field. The system further includes a predetermined sequence of operations to be performed on the record collection and a plurality of bit-maps representing the record collection. The system still further includes a partitioned sequence of operations for parallel processing of the bit-maps by a plurality of separate devices.
    Type: Application
    Filed: March 11, 2003
    Publication date: September 16, 2004
    Applicant: Lockheed Martin Corporation
    Inventors: Douglas R. Burdick, Steven Rostedt, Robert J. Szczerba
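The cooperative neural network abstracts (patent 11734576 and publication 20210319325) describe a constraint loss for one sub-task that depends on the output of another sub-task. The following is a minimal sketch under stated assumptions, not the patented method. It assumes a hypothetical split in which one network predicts a table bounding box and another predicts cell boxes, and a spatial containment constraint requiring every cell to lie inside the table. The function names, the box format, and the 0.5 weight are illustrative choices. A real training loop would compute this with differentiable tensor operations rather than plain floats.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Box = Tuple[float, float, float, float]

def area(b: Box) -> float:
    # Degenerate or inverted boxes (no overlap) have zero area.
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def intersection(a: Box, b: Box) -> Box:
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def containment_loss(cell_boxes: List[Box], table_box: Box) -> float:
    """Mean fraction of each cell's area lying outside the predicted table box.

    The cell sub-task's loss depends on the table sub-task's output (table_box).
    0.0 means every cell is fully contained.
    """
    if not cell_boxes:
        return 0.0
    total = 0.0
    for cell in cell_boxes:
        a = area(cell)
        if a == 0.0:
            continue  # skip empty predictions
        inside = area(intersection(cell, table_box))
        total += 1.0 - inside / a
    return total / len(cell_boxes)

def combined_loss(task_loss: float, constraint: float, weight: float = 0.5) -> float:
    # Assumed weighting: the original task loss plus a weighted constraint term,
    # which would be used when re-training the cell network.
    return task_loss + weight * constraint

# One cell inside the table, one 75% outside it.
print(containment_loss([(1, 1, 3, 3), (8, 8, 12, 12)], (0, 0, 10, 10)))  # 0.375
```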
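The matrix sampling abstracts (patents 10229168 and 11194826, publications 20170147673 and 20190171641) describe dividing an input matrix into blocks, sampling blocks in parallel on different machines, and assembling a sample matrix. The sketch below is a hedged, single-process illustration of that flow. Threads from `concurrent.futures` stand in for the separate machines, and no distributed filesystem is involved. The row-block partitioning, the per-row Bernoulli sampling, and the per-block seeds are assumptions, because the abstracts do not specify a sampling scheme.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]

def split_into_row_blocks(matrix: Matrix, block_rows: int) -> List[Matrix]:
    """Divide the input matrix into consecutive blocks of at most block_rows rows."""
    return [matrix[i:i + block_rows] for i in range(0, len(matrix), block_rows)]

def sample_block(block: Matrix, fraction: float, seed: int) -> Matrix:
    """Keep each row independently with probability `fraction` (assumed scheme)."""
    rng = random.Random(seed)  # per-block RNG so workers never share state
    return [row for row in block if rng.random() < fraction]

def parallel_sample(matrix: Matrix, block_rows: int, fraction: float,
                    workers: int = 4, seed: int = 0) -> Matrix:
    blocks = split_into_row_blocks(matrix, block_rows)
    # Each worker thread plays the role of one machine sampling its blocks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(sample_block, b, fraction, seed + i)
                   for i, b in enumerate(blocks)]
        sampled = [f.result() for f in futures]
    # Generate the sample matrix by concatenating the block samples in order.
    return [row for block in sampled for row in block]

m = [[float(i), float(i * i)] for i in range(10)]
print(parallel_sample(m, block_rows=3, fraction=0.5, seed=7))
```

Seeding each block separately keeps the result reproducible regardless of which worker finishes first.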
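The entity resolution abstracts (patent 9996607 and publication 20160125067) list three ingredients of a deterministic model: matching predicates that produce candidate matches, a precedence rule that keeps a subset of them, and a cardinality rule applied to the results. The following minimal sketch shows how those pieces could compose. The `MatchRule` class, the numeric priorities, the choice of a one-to-one cardinality rule, and the sample records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

@dataclass
class MatchRule:  # hypothetical container for one matching predicate
    name: str
    predicate: Callable[[Record, Record], bool]  # compares attributes of two records
    priority: int  # precedence: lower number wins

def resolve(left: List[Record], right: List[Record],
            rules: List[MatchRule]) -> List[Tuple[int, int, str]]:
    # 1. Matching predicates: remember the highest-precedence rule firing per pair.
    candidates: Dict[Tuple[int, int], MatchRule] = {}
    for i, l_rec in enumerate(left):
        for j, r_rec in enumerate(right):
            for rule in rules:
                if rule.predicate(l_rec, r_rec):
                    prev = candidates.get((i, j))
                    if prev is None or rule.priority < prev.priority:
                        candidates[(i, j)] = rule
    # 2. Precedence rule: per left record, keep only pairs found by its best rule.
    best: Dict[int, int] = {}
    for (i, _), rule in candidates.items():
        best[i] = min(best.get(i, rule.priority), rule.priority)
    selected = [(i, j, rule.name) for (i, j), rule in candidates.items()
                if rule.priority == best[i]]
    # 3. Cardinality rule (assumed 1:1): drop records that still match more than once.
    left_counts = Counter(i for i, _, _ in selected)
    right_counts = Counter(j for _, j, _ in selected)
    return sorted(p for p in selected
                  if left_counts[p[0]] == 1 and right_counts[p[1]] == 1)

rules = [
    MatchRule("tax_id", lambda a, b: a["tax_id"] != "" and a["tax_id"] == b["tax_id"], 0),
    MatchRule("name", lambda a, b: a["name"].lower() == b["name"].lower(), 1),
]
left = [{"name": "Acme Corp", "tax_id": "123"}, {"name": "Beta LLC", "tax_id": ""}]
right = [{"name": "ACME CORP", "tax_id": "123"},
         {"name": "Beta LLC", "tax_id": "999"},
         {"name": "beta llc", "tax_id": "888"}]
print(resolve(left, right, rules))  # [(0, 0, 'tax_id')]
```

In this example "Beta LLC" matches two right-hand records by name, so the one-to-one cardinality rule withholds it rather than guessing.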
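The data cleansing evaluation abstracts (patents 7020804 and 7370057) describe applying dirtying functions to clean records, keeping a record of the errors introduced, and scoring how well a cleansing application recovers the original data. The sketch below illustrates that loop under stated assumptions. The two dirtying functions, the error-log format, and the "fraction of introduced errors restored" score are hypothetical choices, not the patented system.

```python
import random
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
DirtyingFn = Callable[[str, random.Random], str]
ErrorLog = List[Tuple[int, str, str]]  # (record index, field, dirtying function name)

def swap_adjacent(value: str, rng: random.Random) -> str:
    """Simulate a typo by transposing two neighboring characters."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]

def to_upper(value: str, rng: random.Random) -> str:
    """Simulate a casing error (rng unused; kept for a uniform signature)."""
    return value.upper()

def dirty(records: List[Record], fns: List[DirtyingFn], rate: float,
          seed: int = 0) -> Tuple[List[Record], ErrorLog]:
    rng = random.Random(seed)
    dirty_records: List[Record] = []
    error_log: ErrorLog = []
    for idx, rec in enumerate(records):
        new_rec = dict(rec)
        for field, value in rec.items():
            if rng.random() < rate:
                fn = rng.choice(fns)
                dirtied = fn(value, rng)
                if dirtied != value:  # log only errors that actually changed the data
                    new_rec[field] = dirtied
                    error_log.append((idx, field, fn.__name__))
        dirty_records.append(new_rec)
    return dirty_records, error_log

def score(clean: List[Record], cleansed: List[Record], error_log: ErrorLog) -> float:
    """Fraction of introduced errors that the cleanser restored to the clean value."""
    if not error_log:
        return 1.0
    fixed = sum(1 for idx, field, _ in error_log
                if cleansed[idx][field] == clean[idx][field])
    return fixed / len(error_log)

clean = [{"city": "boston", "state": "ma"}, {"city": "albany", "state": "ny"}]
dirty_recs, log = dirty(clean, [to_upper], rate=1.0)
lowercased = [{k: v.lower() for k, v in r.items()} for r in dirty_recs]
print(score(clean, lowercased, log), score(clean, dirty_recs, log))  # 1.0 0.0
```

Comparing two cleansing applications, as in patent 7370057, would then amount to running both on the same dirty set and comparing their scores.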
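Publication 20040181526 describes combining field similarity scores into a record similarity score and reporting record pairs at or above a predetermined threshold as duplicates. In that abstract, field similarity rules are learned as decision trees. The sketch below deliberately swaps that step for something simpler: a fixed weighted average of `difflib.SequenceMatcher` ratios. It shows only the thresholding flow, not the learning step. The weights, the threshold, and the sample records are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

Record = Dict[str, str]

def field_similarity(a: str, b: str) -> float:
    # Stand-in for a learned field-similarity rule: case-insensitive string ratio.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def record_similarity(r1: Record, r2: Record, weights: Dict[str, float]) -> float:
    total = sum(weights.values())
    return sum(w * field_similarity(r1.get(f, ""), r2.get(f, ""))
               for f, w in weights.items()) / total

def duplicate_pairs(records: List[Record], weights: Dict[str, float],
                    threshold: float) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for every pair whose record similarity >= threshold."""
    out = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        s = record_similarity(r1, r2, weights)
        if s >= threshold:
            out.append((i, j, round(s, 3)))
    return out

records = [{"name": "John Smith", "city": "Boston"},
           {"name": "john smith", "city": "Boston"},
           {"name": "Mary Jones", "city": "Denver"}]
print(duplicate_pairs(records, {"name": 2.0, "city": 1.0}, threshold=0.9))
```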