Patents by Inventor Douglas R. Burdick
Douglas R. Burdick has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11734939
Abstract: Methods, systems, and computer program products for vision-based cell structure recognition using hierarchical neural networks and cell boundaries to structure clustering are provided herein. A computer-implemented method includes detecting a style of the given table using at least one style classification model; selecting, based at least in part on the detected style, a cell detection model appropriate for the detected style; detecting cells within the given table using the selected cell detection model; and outputting, to at least one user, information pertaining to the detected cells comprising image coordinates of one or more bounding boxes associated with the detected cells.
Type: Grant
Filed: November 18, 2021
Date of Patent: August 22, 2023
Assignee: International Business Machines Corporation
Inventors: Xin Ru Wang, Douglas R. Burdick, Xinyi Zheng
-
Patent number: 11734576
Abstract: Methods, systems, and computer program products for cooperative neural networks with spatial containment constraints are provided herein. A computer-implemented method includes dividing a processing task into multiple sub-tasks; training multiple independent neural networks, such that at least some of the multiple sub-tasks correspond to different ones of the multiple independent neural networks; defining, via implementing constraint-based domain knowledge related to the processing task in connection with the multiple independent neural networks, a constraint loss for a given one of the multiple sub-tasks, the constraint loss being dependent on output from at least one of the other multiple sub-tasks; and effecting re-training of at least a portion of the multiple independent neural networks by incorporating the constraint loss into at least one of the multiple independent neural networks.
Type: Grant
Filed: April 14, 2020
Date of Patent: August 22, 2023
Assignee: International Business Machines Corporation
Inventors: Xin Ru Wang, Xinyi Zheng, Douglas R. Burdick, Ioannis Katsis
-
Publication number: 20220076012
Abstract: Methods, systems, and computer program products for vision-based cell structure recognition using hierarchical neural networks and cell boundaries to structure clustering are provided herein. A computer-implemented method includes detecting a style of the given table using at least one style classification model; selecting, based at least in part on the detected style, a cell detection model appropriate for the detected style; detecting cells within the given table using the selected cell detection model; and outputting, to at least one user, information pertaining to the detected cells comprising image coordinates of one or more bounding boxes associated with the detected cells.
Type: Application
Filed: November 18, 2021
Publication date: March 10, 2022
Inventors: Xin Ru Wang, Douglas R. Burdick, Xinyi Zheng
-
Patent number: 11222201
Abstract: Methods, systems, and computer program products for vision-based cell structure recognition using hierarchical neural networks and cell boundaries to structure clustering are provided herein. A computer-implemented method includes detecting a style of the given table using at least one style classification model; selecting, based at least in part on the detected style, a cell detection model appropriate for the detected style; detecting cells within the given table using the selected cell detection model; and outputting, to at least one user, information pertaining to the detected cells comprising image coordinates of one or more bounding boxes associated with the detected cells.
Type: Grant
Filed: April 14, 2020
Date of Patent: January 11, 2022
Assignee: International Business Machines Corporation
Inventors: Xin Ru Wang, Douglas R. Burdick, Xinyi Zheng
-
Patent number: 11194826
Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
Type: Grant
Filed: February 8, 2019
Date of Patent: December 7, 2021
Assignee: International Business Machines Corporation
Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
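Illustrative sketch (not taken from the patent): the steps in this abstract, dividing a matrix into blocks, sampling the blocks in parallel, and combining the results into a sample matrix, can be pictured roughly as follows. The function names, the row-wise blocking, the per-row Bernoulli sampling, and the use of local threads in place of separate machines on a distributed filesystem are all assumptions made for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(matrix, block_rows):
    """Divide a matrix (a list of rows) into blocks of at most block_rows rows."""
    return [matrix[i:i + block_rows] for i in range(0, len(matrix), block_rows)]


def sample_block(block, fraction, seed):
    """Keep each row of one block independently with probability `fraction`."""
    rng = random.Random(seed)  # per-block generator, so blocks sample independently
    return [row for row in block if rng.random() < fraction]


def sample_matrix(matrix, block_rows, fraction, seed=0):
    """Sample every block in parallel and concatenate the results."""
    blocks = split_into_blocks(matrix, block_rows)
    seeds = [seed + i for i in range(len(blocks))]
    # Assumption: worker threads stand in for the separate machines.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(sample_block, blocks, [fraction] * len(blocks), seeds))
    # The sample matrix is built from the per-block samples, in block order.
    return [row for part in parts for row in part]
```

Giving each block its own seeded generator keeps the result reproducible no matter which worker handles which block.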
-
Publication number: 20210319325
Abstract: Methods, systems, and computer program products for cooperative neural networks with spatial containment constraints are provided herein. A computer-implemented method includes dividing a processing task into multiple sub-tasks; training multiple independent neural networks, such that at least some of the multiple sub-tasks correspond to different ones of the multiple independent neural networks; defining, based at least in part on constraint-based domain knowledge related to the processing task, at least one constraint loss for a given one of the multiple sub-tasks, the at least one constraint loss being dependent on output from at least one of the other multiple sub-tasks; and re-training at least a portion of the multiple independent neural networks, the re-training being dependent on using the at least one constraint loss.
Type: Application
Filed: April 14, 2020
Publication date: October 14, 2021
Inventors: Xin Ru Wang, Xinyi Zheng, Douglas R. Burdick, Ioannis Katsis
-
Publication number: 20210319217
Abstract: Methods, systems, and computer program products for vision-based cell structure recognition using hierarchical neural networks and cell boundaries to structure clustering are provided herein. A computer-implemented method includes detecting a style of the given table using at least one style classification model; selecting, based at least in part on the detected style, a cell detection model appropriate for the detected style; detecting cells within the given table using the selected cell detection model; and outputting, to at least one user, information pertaining to the detected cells comprising image coordinates of one or more bounding boxes associated with the detected cells.
Type: Application
Filed: April 14, 2020
Publication date: October 14, 2021
Inventors: Xin Ru Wang, Douglas R. Burdick, Xinyi Zheng
-
Publication number: 20190171641
Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
Type: Application
Filed: February 8, 2019
Publication date: June 6, 2019
Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
-
Patent number: 10229168
Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
Type: Grant
Filed: November 20, 2015
Date of Patent: March 12, 2019
Assignee: International Business Machines Corporation
Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
-
Patent number: 9996607
Abstract: Described herein are methods, systems and computer program products for entity resolution. Entity resolution, also known as entity matching or record linkage, seeks to identify equivalent data objects between or among datasets. An example method includes creating a deterministic model by defining an entity to be resolved, selecting two datasets for comparison, defining matching predicates for attributes of the datasets to select a set of candidate matches, and defining a precedence rule for the candidate matches to select a subset of the candidate matches. The method includes running the deterministic model on the two datasets. Running the deterministic model includes applying the matching predicates and the precedence rule to data in the datasets that correspond to the attributes. The method also includes applying a cardinality rule to results of the running, and outputting the matching candidates for which the cardinality rule is satisfied.
Type: Grant
Filed: October 31, 2014
Date of Patent: June 12, 2018
Assignee: International Business Machines Corporation
Inventors: Bogdan Alexe, Douglas R. Burdick, Mauricio A. Hernandez-Sherrington, Hima P. Karanam, Rajasekar Krishnamurthy, Lucian Popa, Shivakumar Vaithyanathan
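Illustrative sketch (not taken from the patent): one way to picture the three rule types named in this abstract is a small deterministic matcher in which matching predicates produce candidate pairs, a precedence rule keeps only the strongest candidates per record, and a one-to-one cardinality rule filters the output. The record layout, the predicate functions, the "earlier predicate wins" precedence, and the one-to-one cardinality are assumptions chosen for the example.

```python
from collections import Counter


def same_email(l, r):
    return l["email"] == r["email"]


def same_name(l, r):
    return l["name"] == r["name"]


def resolve(left, right, predicates):
    """Return (left_id, right_id) pairs that survive all three rule types."""
    # Matching predicates: a pair is a candidate if any predicate holds;
    # the index of the first predicate that holds is kept as its rank.
    candidates = []
    for l in left:
        for r in right:
            for rank, pred in enumerate(predicates):
                if pred(l, r):
                    candidates.append((rank, l, r))
                    break
    # Precedence rule (assumed): for each left record, keep only the
    # candidates produced by its best-ranked (earliest) predicate.
    best = {}
    for rank, l, _ in candidates:
        best[l["id"]] = min(rank, best.get(l["id"], rank))
    selected = [(l, r) for rank, l, r in candidates if rank == best[l["id"]]]
    # Cardinality rule (assumed one-to-one): drop any record matched more than once.
    left_counts = Counter(l["id"] for l, _ in selected)
    right_counts = Counter(r["id"] for _, r in selected)
    return [(l["id"], r["id"]) for l, r in selected
            if left_counts[l["id"]] == 1 and right_counts[r["id"]] == 1]


left = [
    {"id": "a1", "name": "Ann Lee", "email": "ann@x.org"},
    {"id": "a2", "name": "Bob Ray", "email": "bob@x.org"},
]
right = [
    {"id": "b1", "name": "Ann Lee", "email": "ann@x.org"},
    {"id": "b2", "name": "Ann Lee", "email": "other@x.org"},
    {"id": "b3", "name": "Bob Ray", "email": "bob@y.org"},
    {"id": "b4", "name": "Bob Ray", "email": "rob@y.org"},
]
matches = resolve(left, right, [same_email, same_name])
print(matches)  # [('a1', 'b1')]
```

Here a1 keeps only its email match because email outranks name, while a2 is dropped because it has two equally ranked name matches and so fails the one-to-one rule.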
-
Patent number: 9684493
Abstract: In a method for analyzing a large data set using a statistical computing environment language operation, a processor generates code from the statistical computing environment language operation that can be understood by a software system for processing machine learning algorithms in a MapReduce environment. A processor transfers the code to the software system for processing machine learning algorithms in a MapReduce environment. A processor invokes execution of the code with the software system for processing machine learning algorithms in a MapReduce environment.
Type: Grant
Filed: June 2, 2014
Date of Patent: June 20, 2017
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Douglas R. Burdick, Stefan Burnicki, Berthold Reinwald, Shirish Tatikonda
-
Publication number: 20170147673
Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
Type: Application
Filed: November 20, 2015
Publication date: May 25, 2017
Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
-
Publication number: 20160125067
Abstract: Embodiments relate to entity resolution. One aspect includes creating a deterministic model by defining an entity to be resolved, selecting two datasets for comparison, defining matching predicates for attributes of the datasets to select a set of candidate matches, and defining a precedence rule for the candidate matches to select a subset of the candidate matches. An aspect further includes running the deterministic model on the two datasets. Running the deterministic model includes applying the matching predicates and the precedence rule to data in the datasets that correspond to the attributes. An aspect also includes applying a cardinality rule to results of the running, and outputting the matching candidates for which the cardinality rule is satisfied.
Type: Application
Filed: October 31, 2014
Publication date: May 5, 2016
Inventors: Bogdan Alexe, Douglas R. Burdick, Mauricio A. Hernandez-Sherrington, Hima P. Karanam, Rajasekar Krishnamurthy, Lucian Popa, Shivakumar Vaithyanathan
-
Publication number: 20150347101
Abstract: In a method for analyzing a large data set using a statistical computing environment language operation, a processor generates code from the statistical computing environment language operation that can be understood by a software system for processing machine learning algorithms in a MapReduce environment. A processor transfers the code to the software system for processing machine learning algorithms in a MapReduce environment. A processor invokes execution of the code with the software system for processing machine learning algorithms in a MapReduce environment.
Type: Application
Filed: June 2, 2014
Publication date: December 3, 2015
Applicant: International Business Machines Corporation
Inventors: Matthias Boehm, Douglas R. Burdick, Stefan Burnicki, Berthold Reinwald, Shirish Tatikonda
-
Patent number: 7370057
Abstract: A system evaluates a first data cleansing application and a second data cleansing application. The system includes a test data generator, an application execution module, and a results reporting module. The test data generator creates a dirty set of sample data from a clean set of data. The application execution module cleanses the dirty set of sample data. The application execution module utilizes the first data cleansing application to cleanse the dirty set of sample data and create a first cleansed output. The application execution module further utilizes the second data cleansing application to cleanse the dirty set of sample data and create a second cleansed output. The results reporting module evaluates the first and second cleansed output. The results reporting module produces an output of scores and statistics for each of the first and second data cleansing applications.
Type: Grant
Filed: December 3, 2002
Date of Patent: May 6, 2008
Assignee: Lockheed Martin Corporation
Inventors: Douglas R. Burdick, Robert J. Szczerba, Joseph H. Visgitus
-
Patent number: 7225412
Abstract: A system views results of a data cleansing application. The system includes a results visualization module and a learning visualization interface module. The results visualization module organizes output of the data cleansing application into a predefined format. The results visualization module displays the output to a user. The learning visualization interface module facilitates interaction with the data cleansing application by the user.
Type: Grant
Filed: December 3, 2002
Date of Patent: May 29, 2007
Assignee: Lockheed Martin Corporation
Inventors: Douglas R. Burdick, Robert J. Szczerba, Wei Kang Zhan
-
Patent number: 7020804
Abstract: A system evaluates a data cleansing application. The system includes a collection of records cleansed by the data cleansing application, a plurality of dirtying functions for operating upon the collection to introduce errors to the collection, and a record of the errors introduced to the cleansed collection. The plurality of dirtying functions produces a collection of dirty records.
Type: Grant
Filed: December 3, 2002
Date of Patent: March 28, 2006
Assignee: Lockheed Martin Corporation
Inventors: Douglas R. Burdick, Robert J. Szczerba
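Illustrative sketch (not taken from the patent): the idea of applying dirtying functions to a clean collection while keeping a record of every introduced error might look like the following. The two dirtying functions, the per-field error rate, the string-valued fields, and the log layout are assumptions made for the example.

```python
import random


def swap_adjacent(value, rng):
    """Transpose two neighbouring characters (a common typing error)."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]


def drop_char(value, rng):
    """Delete one character."""
    if not value:
        return value
    i = rng.randrange(len(value))
    return value[:i] + value[i + 1:]


def dirty_records(clean, functions, error_rate, seed=0):
    """Return (dirty_records, error_log) for a list of dicts with string fields."""
    rng = random.Random(seed)
    dirty, log = [], []
    for idx, record in enumerate(clean):
        changed = dict(record)
        for field, value in record.items():
            if rng.random() < error_rate:
                fn = rng.choice(functions)
                changed[field] = fn(value, rng)
                # Log only real changes, so the log is an exact answer key.
                if changed[field] != value:
                    log.append({"record": idx, "field": field,
                                "function": fn.__name__, "original": value})
        dirty.append(changed)
    return dirty, log
```

Because the log stores the original value of every altered field, a cleansing application's output can be scored field by field against it.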
-
Publication number: 20040181501
Abstract: A system represents data during a data cleansing application. The system includes a record collection. Each record in the collection includes a list of fields and data contained in each field. The system further includes a predetermined sequence of operations to be performed on the record collection and a plurality of bit-maps representing the record collection. The system still further includes a partitioned sequence of operations for parallel processing of the bit-maps by a plurality of separate devices.
Type: Application
Filed: March 11, 2003
Publication date: September 16, 2004
Applicant: Lockheed Martin Corporation
Inventors: Douglas R. Burdick, Steven Rostedt, Robert J. Szczerba
-
Publication number: 20040181526
Abstract: A system learns a record similarity measurement. The system includes a set of record clusters. Each record in each cluster may have a list of fields and data contained in each field. The system may further include a predetermined threshold score for two of the records in one of the clusters to be considered similar and at least one decision tree constructed from a portion of the set of clusters. The decision tree encodes rules for determining a field similarity score of a related set of fields. The system may further include an output set of record pairs that are determined to be duplicate records. The output set of record pairs may have a record similarity score greater than or equal to the predetermined threshold score.
Type: Application
Filed: March 11, 2003
Publication date: September 16, 2004
Applicant: Lockheed Martin Corporation
Inventors: Douglas R. Burdick, Robert J. Szczerba
-
Publication number: 20040181512
Abstract: A system builds an extended dictionary for a data cleansing application. The system includes a record collection. Each record in the collection includes a list of fields and data contained in each field. The system further includes an input dictionary defining predetermined valid values for variants of values in at least one of the fields and a set of rules derived from patterns of the field values. The system still further includes an extended dictionary including the input dictionary and the rules.
Type: Application
Filed: March 11, 2003
Publication date: September 16, 2004
Applicant: Lockheed Martin Corporation
Inventors: Douglas R. Burdick, Robert J. Szczerba