Patents by Inventor Bart Samwel

Bart Samwel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Efficient merging of tabular data with post-processing compaction

Patent number: 12681934

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).

Type: Grant

Filed: July 10, 2024

Date of Patent: July 14, 2026

Assignee: Databricks, Inc.

Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain
Concurrent optimistic transactions for tables with deletion vectors

Patent number: 12596700

Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table, and receives a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records, and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. The configuration commits the second transaction to generate a third version of the data table by updating elements of the deletion vector if the prerequisites are satisfied.

Type: Grant

Filed: October 28, 2024

Date of Patent: April 7, 2026

Assignee: Databricks, Inc.

Inventors: Bart Samwel, Christos Stavrakakis
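The two prerequisites in the abstract for patent 12596700 can be illustrated with a minimal sketch. It is not the patented method: `TableVersion`, `try_commit`, the `(file, position)` row identifiers, and the `.parquet` file names are all hypothetical, and the sketch assumes that a file is still part of the latest table version exactly when it has an entry in that version's deletion-vector map.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Hypothetical row identifier: (data file name, row position in that file).
RowId = Tuple[str, int]


@dataclass
class TableVersion:
    version: int
    # Assumption: maps each live data file to the positions marked as deleted.
    deletion_vectors: Dict[str, Set[int]] = field(default_factory=dict)


def try_commit(
    latest: TableVersion,
    committed_rows: Set[RowId],
    pending_rows: Set[RowId],
) -> Optional[TableVersion]:
    """Try to commit a transaction that was prepared against an older version.

    committed_rows: rows changed by the transaction that already committed.
    pending_rows:   rows the second transaction wants to mark as deleted.
    Returns the next table version, or None if a prerequisite fails.
    """
    # Logical prerequisite: the committed transaction did not touch
    # any row that the pending transaction wants to update.
    logical_ok = committed_rows.isdisjoint(pending_rows)

    # Physical prerequisite: every file the pending rows live in
    # is still present in the latest version of the table.
    physical_ok = all(name in latest.deletion_vectors for name, _ in pending_rows)

    if not (logical_ok and physical_ok):
        return None

    # Commit by updating deletion vectors only; no data file is rewritten.
    new_vectors = {name: set(rows) for name, rows in latest.deletion_vectors.items()}
    for name, position in pending_rows:
        new_vectors[name].add(position)
    return TableVersion(latest.version + 1, new_vectors)


v2 = TableVersion(2, {"a.parquet": {0}, "b.parquet": set()})
v3 = try_commit(v2, {("a.parquet", 0)}, {("a.parquet", 3), ("b.parquet", 1)})
```

In this sketch a pending transaction that touches a row the earlier transaction already changed, or a file that no longer exists in the latest version, is rejected by returning `None`, so the caller can retry against the newer version.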
K-D tree balanced splitting

Patent number: 12561303

Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.

Type: Grant

Filed: July 15, 2024

Date of Patent: February 24, 2026

Assignee: Databricks, Inc.

Inventors: Bart Samwel, Prakhar Jain
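As a rough illustration of the idea in the abstract for patent 12561303 (choosing split points per dimension so that each file lands near a target size), the following sketch recursively splits a set of points at the median of one dimension at a time. The function name `split_balanced`, the row-count notion of "size", and the round-robin choice of dimension are assumptions made for this example, not details taken from the patent.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def split_balanced(
    points: Sequence[Point], target_size: int, depth: int = 0
) -> List[List[Point]]:
    """Split points into groups ("files") of at most target_size rows.

    Each recursion level sorts on one dimension (cycling through the
    dimensions) and cuts at the median, so sibling groups differ in
    size by at most one row.
    """
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    if len(points) <= target_size:
        return [list(points)]

    dim = depth % len(points[0])           # dimension used for this split
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2                # median position is the split point

    left = split_balanced(ordered[:mid], target_size, depth + 1)
    right = split_balanced(ordered[mid:], target_size, depth + 1)
    return left + right


rows = [(float(i), float((i * 7) % 10)) for i in range(10)]
files = split_balanced(rows, target_size=3)
```

With ten rows and a target of three rows per file, this sketch produces four groups of two or three rows each. Splitting at the median keeps the groups balanced, which is one simple way to approximate a target file size.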
Fetching query results through cloud object stores

Publication number: 20250348501

Abstract: A cloud computation system is configured to: 1) receive a first request to read a first set of query results stored in a cloud based data storage; 2) transmit a first subset of the first set of query results in response to the first request; 3) transmit a second subset of the first set of query results in response to the first request; 4) receive a second request to read a second set of query results stored in the cloud based data storage; 5) transmit a first subset of the second set of query results in response to the second request; and 6) transmit a second subset of the second set of query results in response to the second request.

Type: Application

Filed: July 21, 2025

Publication date: November 13, 2025

Inventors: Bogdan Ionut Ghit, Juliusz Sompolski, Shi Xin, Bart Samwel
Data maintenance transaction rollbacks

Patent number: 12430294

Abstract: The present application discloses a method, system, and computer system for managing data in a storage system. The method includes receiving a first transaction that modifies or deletes first data stored in a storage system, determining that the first data is subject to an intervening re-arrangement transaction, and in response to determining that the first data is subject to the intervening re-arrangement transaction, rolling back the re-arrangement transaction at least with respect to the first data and committing the first transaction.

Type: Grant

Filed: August 15, 2024

Date of Patent: September 30, 2025

Assignee: Databricks, Inc.

Inventors: Prakhar Jain, Bart Samwel, Burak Yavuz
Data file clustering with KD-classifier trees

Patent number: 12405920

Abstract: A data processing service generates a data classifier tree for managing data files of a data table. The data classifier tree may be configured as a KD-classifier tree and includes a plurality of nodes and edges. A node of the data classifier tree may represent a splitting condition with respect to key-values for a respective key. A node of the data classifier tree may be associated with one or more data files assigned to the node. The data files assigned to the node each include a subset of records having key-values that satisfy the conditions represented by the node and parent nodes of the node. The data processing service may efficiently cluster the data in the data table while reducing the number of data files that are rewritten when data is modified or added to the data table.

Type: Grant

Filed: July 5, 2023

Date of Patent: September 2, 2025

Assignee: Databricks, Inc.

Inventors: Prakhar Jain, Frederick Ryan Johnson, Terry Kim, Vijayan Prabhakaran, Bart Samwel
Fetching query results through cloud object stores

Patent number: 12399901

Abstract: The system is configured to: 1) receive a client request; 2) determine executor(s) to generate a response to the user request; 3) provide each of the executor(s) with an indication; 4) receive for each indication a response including an output of either a cloud output or an in-line output to generate a group of in-line outputs and a group of cloud outputs; 5) determine whether the group of in-line outputs comprises all outputs; and 6) in response to the group of in-line outputs not comprising all the outputs for the client request: a) convert the group of in-line outputs to a converted group of cloud outputs; b) generate metadata for the converted group of cloud outputs and the group of cloud outputs; and c) provide a response to the client request including the metadata for the converted group of cloud outputs and the group of cloud outputs.

Type: Grant

Filed: March 22, 2024

Date of Patent: August 26, 2025

Assignee: Databricks, Inc.

Inventors: Bogdan Ionut Ghit, Juliusz Sompolski, Shi Xin, Bart Samwel
Efficient merge of tabular data using a processing filter

Patent number: 12353843

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first, second, and third jobs, and obtaining a resulting table based at least in part on the second job resulting file(s) and third job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s). Performing the third job includes determining unmatched rows for target table files and storing the unmatched rows in third job resulting file(s).

Type: Grant

Filed: August 25, 2022

Date of Patent: July 8, 2025

Assignee: Databricks, Inc.

Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel
Efficient merge of tabular data using mixing

Patent number: 12346330

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, and obtaining other resulting files based at least in part on a second set of unmatched rows among the target table and the source table that results from the first set of unmatched rows having been processed in the second job, and obtaining a resulting table based on (i) second job resulting file(s), and (ii) other resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a first matching action based on matched rows and a second matching action based on a subset of unmatched rows.

Type: Grant

Filed: August 25, 2022

Date of Patent: July 1, 2025

Assignee: Databricks, Inc.

Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel
Data file clustering with KD-epsilon trees

Patent number: 12332862

Abstract: A data tree for managing data files of a data table and performing one or more transaction operations to the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and conditions associated with parent nodes of the node. A parent node may correspond to a file including a buffer that stores changes to data files reachable by this parent node, and also includes dedicated storage to pointers of the child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.

Type: Grant

Filed: July 6, 2023

Date of Patent: June 17, 2025

Assignee: Databricks, Inc.

Inventors: Prakhar Jain, Frederick Ryan Johnson, Bart Samwel
Concurrent optimistic transactions for tables with deletion vectors

Publication number: 20250103580

Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table, and receives a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records, and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. The configuration commits the second transaction to generate a third version of the data table by updating elements of the deletion vector if the prerequisites are satisfied.

Type: Application

Filed: October 28, 2024

Publication date: March 27, 2025

Inventors: Bart Samwel, Christos Stavrakakis
K-D Tree Balanced Splitting

Publication number: 20250086155

Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.

Type: Application

Filed: July 15, 2024

Publication date: March 13, 2025

Inventors: Bart Samwel, Prakhar Jain
Data file clustering with KD-classifier trees

Publication number: 20250013606

Abstract: A data processing service generates a data classifier tree for managing data files of a data table. The data classifier tree may be configured as a KD-classifier tree and includes a plurality of nodes and edges. A node of the data classifier tree may represent a splitting condition with respect to key-values for a respective key. A node of the data classifier tree may be associated with one or more data files assigned to the node. The data files assigned to the node each include a subset of records having key-values that satisfy the conditions represented by the node and parent nodes of the node. The data processing service may efficiently cluster the data in the data table while reducing the number of data files that are rewritten when data is modified or added to the data table.

Type: Application

Filed: July 5, 2023

Publication date: January 9, 2025

Inventors: Prakhar Jain, Frederick Ryan Johnson, Terry Kim, Vijayan Prabhakaran, Bart Samwel
Data file clustering with KD-epsilon trees

Publication number: 20250013619

Abstract: A data tree for managing data files of a data table and performing one or more transaction operations to the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and conditions associated with parent nodes of the node. A parent node may correspond to a file including a buffer that stores changes to data files reachable by this parent node, and also includes dedicated storage to pointers of the child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.

Type: Application

Filed: July 6, 2023

Publication date: January 9, 2025

Inventors: Prakhar Jain, Frederick Ryan Johnson, Bart Samwel
Efficient Merging of Tabular Data with Post-Processing Compaction

Publication number: 20250013644

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).

Type: Application

Filed: July 10, 2024

Publication date: January 9, 2025

Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain
Fetching Query Results Through Cloud Object Stores

Publication number: 20240394271

Abstract: The system is configured to: 1) receive a client request; 2) determine executor(s) to generate a response to the user request; 3) provide each of the executor(s) with an indication; 4) receive for each indication a response including an output of either a cloud output or an in-line output to generate a group of in-line outputs and a group of cloud outputs; 5) determine whether the group of in-line outputs comprises all outputs; and 6) in response to the group of in-line outputs not comprising all the outputs for the client request: a) convert the group of in-line outputs to a converted group of cloud outputs; b) generate metadata for the converted group of cloud outputs and the group of cloud outputs; and c) provide a response to the client request including the metadata for the converted group of cloud outputs and the group of cloud outputs.

Type: Application

Filed: March 22, 2024

Publication date: November 28, 2024

Inventors: Bogdan Ionut Ghit, Juliusz Sompolski, Shi Xin, Bart Samwel
Concurrent optimistic transactions for tables with deletion vectors

Patent number: 12147412

Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table, and receives a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records, and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. The configuration commits the second transaction to generate a third version of the data table by updating elements of the deletion vector if the prerequisites are satisfied.

Type: Grant

Filed: January 18, 2023

Date of Patent: November 19, 2024

Assignee: Databricks, Inc.

Inventors: Bart Samwel, Christos Stavrakakis
Data maintenance transaction rollbacks

Patent number: 12072843

Abstract: The present application discloses a method, system, and computer system for managing data in a storage system. The method includes receiving a first transaction that modifies or deletes first data stored in a storage system, determining that the first data is subject to an intervening re-arrangement transaction, and in response to determining that the first data is subject to the intervening re-arrangement transaction, rolling back the re-arrangement transaction at least with respect to the first data and committing the first transaction.

Type: Grant

Filed: January 20, 2022

Date of Patent: August 27, 2024

Assignee: Databricks, Inc.

Inventors: Prakhar Jain, Bart Samwel, Burak Yavuz
Data ingestion using data file clustering with KD-epsilon trees

Patent number: 12072863

Abstract: A data tree for managing data files of a data table and performing one or more transaction operations to the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and conditions associated with parent nodes of the node. A parent node may correspond to a file including a buffer that stores changes to data files reachable by this parent node, and also includes dedicated storage to pointers of the child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.

Type: Grant

Filed: July 5, 2023

Date of Patent: August 27, 2024

Assignee: Databricks, Inc.

Inventors: Prakhar Jain, Frederick Ryan Johnson, Bart Samwel
K-D tree balanced splitting

Patent number: 12061586

Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.

Type: Grant

Filed: May 6, 2022

Date of Patent: August 13, 2024

Assignee: Databricks, Inc.

Inventors: Bart Samwel, Prakhar Jain
