Patents by Inventor Victor Tze-Yeuan Tso
Victor Tze-Yeuan Tso has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11461304
Abstract: Signature-based cache optimization for data preparation includes: performing a first set of sequenced data preparation operations on one or more sets of data to generate a plurality of transformation results; caching one or more of the plurality of transformation results and one or more corresponding operation signatures, a cached operation signature being derived based at least in part on a subset of sequenced operations that generated a corresponding result; receiving a specification of a second set of sequenced operations; determining an operation signature associated with the second set of sequenced operations; identifying a cached result among the cached results based at least in part on the determined operation signature; and outputting the cached result.
Type: Grant
Filed: March 10, 2020
Date of Patent: October 4, 2022
Assignee: DataRobot, Inc.
Inventors: Dave Brewster, Victor Tze-Yeuan Tso
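The abstract describes caching transformation results alongside an operation signature derived from the subset of sequenced operations that produced each result. The Python sketch below shows one way this could work and is not the patented implementation. It assumes three things: an operation is a `(name, params)` tuple, a signature is a SHA-256 hash chained over a data-set identifier and each operation prefix, and a lookup reuses the longest cached prefix. The names `prefix_signatures`, `SignatureCache`, and `apply_op` are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical representation: an operation is (name, params).
Operation = Tuple[str, Dict[str, Any]]


def prefix_signatures(data_id: str, ops: List[Operation]) -> List[str]:
    """One signature per prefix ops[:1], ops[:2], ...; each hash is chained
    from the previous one, so it covers the data id and every earlier step."""
    sigs: List[str] = []
    prev = data_id
    for name, params in ops:
        payload = prev + json.dumps([name, params], sort_keys=True)
        prev = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        sigs.append(prev)
    return sigs


class SignatureCache:
    """Maps operation signatures to the transformation results they produced."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run(self, data_id: str, data: Any, ops: List[Operation],
            apply_op: Callable[[Any, Operation], Any]) -> Any:
        sigs = prefix_signatures(data_id, ops)
        # Reuse the longest prefix whose result is already cached.
        start, result = 0, data
        for i in range(len(ops), 0, -1):
            if sigs[i - 1] in self._results:
                start, result = i, self._results[sigs[i - 1]]
                break
        # Run only the remaining steps, caching every intermediate result.
        for i in range(start, len(ops)):
            result = apply_op(result, ops[i])
            self._results[sigs[i]] = result
        return result
```

Because each signature is chained, appending a step to a pipeline that has already run costs only one new operation. Cached results are returned by reference in this sketch, so `apply_op` should return new objects rather than mutate its input.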
-
Publication number: 20220284183
Abstract: A step editor for data preparation can instruct a user interface to present a first plurality of operations to be applied in a sequential order to one or more sets of data, receive user inputs including at least one indication to mute at least one operation of the first plurality of operations to prevent the processors from performing the at least one operation, generate a second plurality of operations, the second plurality of operations to be applied in a sequential order to the sets of data and comprising the first plurality of operations excluding the operation muted by the user inputs, obtain a cached data traversal program associated with the second plurality of operations and comprising a representation of a result of transforming the sets of data, and instruct the user interface to present output based at least in part on execution of the cached data traversal program.
Type: Application
Filed: March 25, 2022
Publication date: September 8, 2022
Applicant: DataRobot, Inc.
Inventors: Nenshad Bardoliwalla, Michael Matthews, Ian Timourian, Jing Chen, Lilia Gutnik, Whitman Kwok, Dave Brewster, Victor Tze-Yeuan Tso
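Muting, as described in this abstract, derives a second operation list that keeps the original order but drops the muted steps. That second list is then used to look up a cached result. The Python sketch below illustrates this under stated assumptions. Operations are `(name, params)` tuples, and muted steps are identified by their index. The cache key is a hash of the data references plus the effective operations. A plain dictionary stands in for the store of cached data traversal programs. All names (`unmuted`, `signature`, `preview`, `compute`) are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical representation: an operation is (name, params).
Operation = Tuple[str, Dict[str, Any]]


def unmuted(ops: Sequence[Operation], muted: Set[int]) -> List[Operation]:
    """Second plurality of operations: same order, muted step indices removed."""
    return [op for i, op in enumerate(ops) if i not in muted]


def signature(data_refs: Sequence[str], ops: Sequence[Operation]) -> str:
    """Hash of the data references and the ordered, effective operations."""
    payload = json.dumps([list(data_refs), [[n, p] for n, p in ops]], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def preview(data_refs: Sequence[str], ops: Sequence[Operation], muted: Set[int],
            cache: Dict[str, Any],
            compute: Callable[[List[Operation]], Any]) -> Any:
    """Return the cached result for the unmuted steps, computing it only once."""
    effective = unmuted(ops, muted)
    key = signature(data_refs, effective)
    if key not in cache:
        cache[key] = compute(effective)
    return cache[key]
```

In this scheme, toggling a mute back and forth hits the cache after the first computation of each variant. This is the interactive benefit the step editor appears to target.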
-
Patent number: 11288447
Abstract: Using a step editor for data preparation includes: receiving an indication of a user input with respect to at least some of a set of sequenced data preparation operations on a set of data; generating, using one or more processors, a signature based at least in part on the set of sequenced data preparation operations, references to the set of data, and the user input; using the generated signature to determine whether there exists a cached result associated with the set of sequenced data preparation operations, the references to the set of data, and the user input; based at least in part on the determination, obtaining a data traversal program representing a result associated with the set of sequenced operations, the references to the set of data, and the user input; and providing output based at least in part on the result represented by the obtained data traversal program.
Type: Grant
Filed: March 10, 2020
Date of Patent: March 29, 2022
Assignee: DR HoldCo 2, Inc.
Inventors: Nenshad Dinshaw Bardoliwalla, Michael Matthews, Ian Timourian, Jing Chen, Lilia Gutnik, Whitman Kwok, Dave Brewster, Victor Tze-Yeuan Tso
-
Publication number: 20220012231
Abstract: Automatic append includes: identifying, based at least in part on contents of a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns, a plurality of matching columns and a plurality of non-matching columns. The matching columns comprise one or more columns among the first plurality of columns; and corresponding one or more matching columns among the second plurality of columns. The non-matching columns comprise: one or more columns among the first plurality of columns that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns that do not match with any columns among the first plurality of columns.
Type: Application
Filed: May 21, 2021
Publication date: January 13, 2022
Inventors: Dave Brewster, Victor Tze-Yeuan Tso, Ashley Ping Jin, Quan Chuong Ta, Lakshman Roy Sankar, Whitman Kwok
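The abstract matches columns based on their contents rather than their names, but it does not say which similarity measure is used. The Python sketch below assumes one: Jaccard similarity of the two columns' value sets, with a hypothetical threshold of 0.5. Qualifying pairs are assigned greedily, so that each column is matched at most once. Tables are modeled as dictionaries from column name to a list of values. The names `jaccard` and `match_columns` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

Table = Dict[str, List[object]]  # column name -> column values


def jaccard(a: Set[object], b: Set[object]) -> float:
    """Overlap of two value sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_columns(first: Table, second: Table, threshold: float = 0.5
                  ) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Pair columns whose contents overlap enough; return
    (matching pairs, non-matching columns of first, non-matching columns of second)."""
    scored = []
    for c1, v1 in first.items():
        for c2, v2 in second.items():
            score = jaccard(set(v1), set(v2))
            if score >= threshold:
                scored.append((score, c1, c2))
    # Greedy: highest-scoring pairs first, each column used at most once.
    scored.sort(key=lambda t: t[0], reverse=True)
    used1: Set[str] = set()
    used2: Set[str] = set()
    pairs: List[Tuple[str, str]] = []
    for _, c1, c2 in scored:
        if c1 not in used1 and c2 not in used2:
            pairs.append((c1, c2))
            used1.add(c1)
            used2.add(c2)
    only_first = [c for c in first if c not in used1]
    only_second = [c for c in second if c not in used2]
    return pairs, only_first, only_second
```

Matching on contents allows columns such as `city` and `town` to line up even though their headers differ. Column values must be hashable for the set comparison in this sketch.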
-
Patent number: 11169978
Abstract: Distributed pipeline optimization for data preparation includes receiving a specification of a set of sequenced operations to be performed on a set of organized data. It further includes dividing the set of data into a plurality of work portions based on a cost function that is dependent on at least one dimension of the set of data. It further includes distributing the plurality of work portions to a plurality of processing nodes to be processed according to the specification of operations.
Type: Grant
Filed: October 14, 2015
Date of Patent: November 9, 2021
Assignee: DR HoldCo 2, Inc.
Inventors: Dave Brewster, Victor Tze-Yeuan Tso
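The abstract divides the data into work portions using a cost function that depends on at least one dimension of the data, but it does not specify the function. The Python sketch below assumes a hypothetical cost for each row: the row width (the column dimension) plus the text length of each cell. It then cuts the rows into contiguous portions that each stay within a cost budget. The names `cell_cost` and `split_by_cost`, and the budget parameter, are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[object]


def cell_cost(row: Row) -> int:
    """Hypothetical cost function: grows with the number of columns
    and with the text length of each cell."""
    return sum(len(str(v)) for v in row) + len(row)


def split_by_cost(rows: Sequence[Row], budget: int,
                  cost: Callable[[Row], int] = cell_cost) -> List[Tuple[int, int]]:
    """Return half-open (start, end) row ranges whose total cost stays within
    `budget`, except when a single row on its own exceeds the budget."""
    portions: List[Tuple[int, int]] = []
    start, total = 0, 0
    for i, row in enumerate(rows):
        c = cost(row)
        # Close the current portion before it would exceed the budget.
        if total + c > budget and i > start:
            portions.append((start, i))
            start, total = i, 0
        total += c
    if start < len(rows):
        portions.append((start, len(rows)))
    return portions
```

With this approach, wide or text-heavy rows are split into smaller portions, which keeps per-portion work roughly even better than splitting by a fixed row count.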
-
Patent number: 11030183
Abstract: Automatic append includes: identifying, based at least in part on contents of a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns, a plurality of matching columns and a plurality of non-matching columns. The matching columns comprise one or more columns among the first plurality of columns; and corresponding one or more matching columns among the second plurality of columns. The non-matching columns comprise: one or more columns among the first plurality of columns that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns that do not match with any columns among the first plurality of columns.
Type: Grant
Filed: September 13, 2016
Date of Patent: June 8, 2021
Assignee: DR HoldCo 2, Inc.
Inventors: Dave Brewster, Victor Tze-Yeuan Tso, Ashley Ping Jin, Quan Chuong Ta, Lakshman Roy Sankar, Whitman Kwok
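This abstract stops at identifying matching and non-matching columns; it does not describe how the append is then performed. The Python sketch below shows one plausible follow-on step and is an assumption, not something stated in the source. It takes the matching column pairs as given and stacks the rows of the second data set under the first. Matched columns share one output column. Non-matching columns are kept and padded with `None`. The function name `append_tables` and the `_2` suffix for name clashes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Table = Dict[str, List[Optional[object]]]  # column name -> column values


def append_tables(first: Table, second: Table,
                  pairs: List[Tuple[str, str]]) -> Table:
    """Stack the rows of `second` under `first`. Matched columns merge into the
    column named in `first`; non-matching columns are padded with None."""
    n1 = len(next(iter(first.values()), []))
    n2 = len(next(iter(second.values()), []))
    rename = {c2: c1 for c1, c2 in pairs}
    # Start from the first table's columns, leaving room for the second's rows.
    out: Table = {c: list(v) + [None] * n2 for c, v in first.items()}
    for c2, values in second.items():
        if c2 in rename:
            out[rename[c2]][n1:] = list(values)
        else:
            # Non-matching column; suffix it if its name is already taken.
            name = c2 if c2 not in out else c2 + "_2"
            out[name] = [None] * n1 + list(values)
    return out
```

Keeping the non-matching columns, rather than dropping them, preserves all of the input data. The `None` padding shows which data set each value came from.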
-
Publication number: 20210056090
Abstract: Cache optimization for data preparation includes: generating a data traversal program that represents a result of a set of sequenced data preparation operations performed on one or more sets of data, wherein the data traversal program indicates how to assemble one or more affected columns in the one or more sets of data to derive the result; in response to receiving a specification of the set of sequenced operations to be performed on the one or more sets of data, accessing the data traversal program that represents the result or a stored copy of the data traversal program that represents the result; assembling the one or more affected columns in the one or more sets of data according to the data traversal program to re-generate the result; and outputting the result.
Type: Application
Filed: July 1, 2020
Publication date: February 25, 2021
Inventors: Dave Brewster, Victor Tze-Yeuan Tso
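According to the abstract, a data traversal program indicates how to assemble affected columns to re-derive a result, so the result does not have to be stored as materialized data. The Python sketch below assumes a simple form for such a program: an ordered tuple of source-row references plus the names of the columns to assemble. It builds a program for a hypothetical "filter, then sort" pipeline and regenerates the result from the original columns. `TraversalProgram`, `build_program`, and `assemble` are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Columns = Dict[str, List[object]]  # column-oriented data set


@dataclass(frozen=True)
class TraversalProgram:
    """Hypothetical form: which source rows to visit, in what order, and which
    columns to assemble. No cell values are copied."""
    row_order: Tuple[int, ...]
    columns: Tuple[str, ...]


def build_program(data: Columns, keep_col: str, min_value: float,
                  sort_col: str, out_cols: Sequence[str]) -> TraversalProgram:
    """Record 'keep rows where keep_col >= min_value, then sort by sort_col'
    as row references rather than as a materialized table."""
    rows = [i for i, v in enumerate(data[keep_col]) if v >= min_value]
    rows.sort(key=lambda i: data[sort_col][i])
    return TraversalProgram(tuple(rows), tuple(out_cols))


def assemble(data: Columns, program: TraversalProgram) -> Columns:
    """Regenerate the result by gathering the affected columns in program order."""
    return {c: [data[c][i] for i in program.row_order] for c in program.columns}
```

Because the program holds only integer references, it is typically much smaller than the result it represents. Caching or copying it is therefore cheap, and the result can be reassembled whenever the same operations are requested again.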
-
Patent number: 10740316
Abstract: Cache optimization for data preparation includes generating a data traversal program that represents a result of a set of sequenced data preparation operations performed on one or more sets of data. The data traversal program indicates how to assemble one or more affected columns in the one or more sets of data to derive the result. It further includes in response to receiving a specification of the set of sequenced operations to be performed on the one or more sets of data, accessing the data traversal program that represents the result or a stored copy of the data traversal program that represents the result. It further includes assembling the one or more affected columns in the one or more sets of data according to the data traversal program to re-generate the result. It further includes outputting the result.
Type: Grant
Filed: October 14, 2015
Date of Patent: August 11, 2020
Assignee: DR HoldCo 2, Inc.
Inventors: Dave Brewster, Victor Tze-Yeuan Tso
-
Publication number: 20200210642
Abstract: Using a step editor for data preparation includes: receiving an indication of a user input with respect to at least some of a set of sequenced data preparation operations on a set of data; generating, using one or more processors, a signature based at least in part on the set of sequenced data preparation operations, references to the set of data, and the user input; using the generated signature to determine whether there exists a cached result associated with the set of sequenced data preparation operations, the references to the set of data, and the user input; based at least in part on the determination, obtaining a data traversal program representing a result associated with the set of sequenced operations, the references to the set of data, and the user input; and providing output based at least in part on the result represented by the obtained data traversal program.
Type: Application
Filed: March 10, 2020
Publication date: July 2, 2020
Inventors: Nenshad Dinshaw Bardoliwalla, Michael Matthews, Ian Timourian, Jing Chen, Lilia Gutnik, Whitman Kwok, Dave Brewster, Victor Tze-Yeuan Tso
-
Publication number: 20200210399
Abstract: Signature-based cache optimization for data preparation includes: performing a first set of sequenced data preparation operations on one or more sets of data to generate a plurality of transformation results; caching one or more of the plurality of transformation results and one or more corresponding operation signatures, a cached operation signature being derived based at least in part on a subset of sequenced operations that generated a corresponding result; receiving a specification of a second set of sequenced operations; determining an operation signature associated with the second set of sequenced operations; identifying a cached result among the cached results based at least in part on the determined operation signature; and outputting the cached result.
Type: Application
Filed: March 10, 2020
Publication date: July 2, 2020
Inventors: Dave Brewster, Victor Tze-Yeuan Tso
-
Patent number: 10642815
Abstract: Using a step editor for data preparation includes receiving an indication of a user input with respect to at least some of a set of sequenced data preparation operations on a set of data. It further includes generating, using one or more processors, a signature based at least in part on the set of sequenced data preparation operations, references to the set of data, and the user input. It further includes using the generated signature to determine whether there exists a cached result associated with the set of sequenced data preparation operations, the references to the set of data, and the user input. It further includes based at least in part on the determination, obtaining a data traversal program representing a result associated with the set of sequenced operations, the references to the set of data, and the user input. It further includes providing output based at least in part on the result represented by the obtained data traversal program.
Type: Grant
Filed: October 14, 2015
Date of Patent: May 5, 2020
Assignee: Paxata, Inc.
Inventors: Nenshad Dinshaw Bardoliwalla, Michael Matthews, Ian Timourian, Jing Chen, Lilia Gutnik, Whitman Kwok, Dave Brewster, Victor Tze-Yeuan Tso
-
Patent number: 10642814
Abstract: Signature-based cache optimization for data preparation includes performing a first set of sequenced data preparation operations on one or more sets of data to generate a plurality of transformation results. It further includes caching one or more of the plurality of transformation results and one or more corresponding operation signatures, a cached operation signature being derived based at least in part on a subset of sequenced operations that generated a corresponding result. It further includes receiving a specification of a second set of sequenced operations. It further includes determining an operation signature associated with the second set of sequenced operations. It further includes identifying a cached result among the cached results based at least in part on the determined operation signature; and outputting the cached result.
Type: Grant
Filed: October 14, 2015
Date of Patent: May 5, 2020
Assignee: Paxata, Inc.
Inventors: Dave Brewster, Victor Tze-Yeuan Tso
-
Patent number: 10216792
Abstract: Automated join detection includes: identifying a set of one or more candidate joins of a first table and a second table; evaluating a set of one or more quality measures corresponding to the set of one or more candidate joins; obtaining a set of one or more selected joins among the set of one or more candidate joins, the set of one or more selected joins being selected based at least in part on one or more corresponding quality measures; and generating a joined table, including by joining the first table and the second table according to a selected join.
Type: Grant
Filed: October 14, 2015
Date of Patent: February 26, 2019
Assignee: Paxata, Inc.
Inventors: Dave Brewster, Victor Tze-Yeuan Tso, Ashley Jin, Quan Chuong Ta, Lakshman Roy Sankar, Nenshad Dinshaw Bardoliwalla
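The abstract evaluates quality measures for candidate joins and then joins the tables according to a selected join. It does not define the measures. The Python sketch below makes several assumptions. Every column pair is a candidate. Quality is key coverage (the share of left keys found in the right column) multiplied by uniqueness (the share of distinct values in the right column). The highest-scoring candidate drives a simple inner join. Tables are lists of row dictionaries. `join_quality`, `detect_join`, and `inner_join` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Table = List[Dict[str, object]]  # list of row dicts


def join_quality(left: Table, right: Table, lcol: str, rcol: str) -> float:
    """Hypothetical quality measure: coverage of left keys in the right column,
    multiplied by the uniqueness of the right column's values."""
    lvals = [row[lcol] for row in left]
    rvals = [row[rcol] for row in right]
    if not lvals or not rvals:
        return 0.0
    rset = set(rvals)
    coverage = sum(v in rset for v in lvals) / len(lvals)
    uniqueness = len(rset) / len(rvals)
    return coverage * uniqueness


def detect_join(left: Table, right: Table) -> Optional[Tuple[str, str, float]]:
    """Score every column pair as a candidate join; return the best, if any."""
    if not left or not right:
        return None
    candidates = [(lc, rc, join_quality(left, right, lc, rc))
                  for lc in left[0] for rc in right[0]]
    best = max(candidates, key=lambda t: t[2], default=None)
    return best if best is not None and best[2] > 0 else None


def inner_join(left: Table, right: Table, lcol: str, rcol: str) -> Table:
    """Join on lcol == rcol; right-hand values win on column-name clashes."""
    index: Dict[object, List[Dict[str, object]]] = {}
    for rrow in right:
        index.setdefault(rrow[rcol], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[lcol], [])]
```

The uniqueness factor penalizes joins against columns with many repeated values, because such joins tend to multiply rows.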
-
Publication number: 20170262491
Abstract: Automatic append includes: identifying, based at least in part on contents of a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns, a plurality of matching columns and a plurality of non-matching columns. The matching columns comprise one or more columns among the first plurality of columns; and corresponding one or more matching columns among the second plurality of columns. The non-matching columns comprise: one or more columns among the first plurality of columns that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns that do not match with any columns among the first plurality of columns.
Type: Application
Filed: September 13, 2016
Publication date: September 14, 2017
Inventors: David Brewster, Victor Tze-Yeuan Tso, Ashley Ping Jin, Quan Chuong Ta, Lakshman Roy Sankar, Whitman Kwok
-
Publication number: 20170109402
Abstract: Automated join detection includes: identifying a set of one or more candidate joins of a first table and a second table; evaluating a set of one or more quality measures corresponding to the set of one or more candidate joins; obtaining a set of one or more selected joins among the set of one or more candidate joins, the set of one or more selected joins being selected based at least in part on one or more corresponding quality measures; and generating a joined table, including by joining the first table and the second table according to a selected join.
Type: Application
Filed: October 14, 2015
Publication date: April 20, 2017
Inventors: Dave Brewster, Victor Tze-Yeuan Tso, Ashley Jin, Quan Chuong Ta, Lakshman Roy Sankar, Nenshad Dinshaw Bardoliwalla
-
Publication number: 20170109388
Abstract: Signature-based cache optimization for data preparation includes: performing a first set of sequenced data preparation operations on one or more sets of data to generate a plurality of transformation results; caching one or more of the plurality of transformation results and one or more corresponding operation signatures, a cached operation signature being derived based at least in part on a subset of sequenced operations that generated a corresponding result; receiving a specification of a second set of sequenced operations; determining an operation signature associated with the second set of sequenced operations; identifying a cached result among the cached results based at least in part on the determined operation signature; and outputting the cached result.
Type: Application
Filed: October 14, 2015
Publication date: April 20, 2017
Inventors: Dave Brewster, Victor Tze-Yeuan Tso
-
Publication number: 20170109387
Abstract: Cache optimization for data preparation includes: generating a data traversal program that represents a result of a set of sequenced data preparation operations performed on one or more sets of data, wherein the data traversal program indicates how to assemble one or more affected columns in the one or more sets of data to derive the result; in response to receiving a specification of the set of sequenced operations to be performed on the one or more sets of data, accessing the data traversal program that represents the result or a stored copy of the data traversal program that represents the result; assembling the one or more affected columns in the one or more sets of data according to the data traversal program to re-generate the result; and outputting the result.
Type: Application
Filed: October 14, 2015
Publication date: April 20, 2017
Inventors: Dave Brewster, Victor Tze-Yeuan Tso
-
Publication number: 20170109389
Abstract: Using a step editor for data preparation includes: receiving an indication of a user input with respect to at least some of a set of sequenced data preparation operations on a set of data; generating, using one or more processors, a signature based at least in part on the set of sequenced data preparation operations, references to the set of data, and the user input; using the generated signature to determine whether there exists a cached result associated with the set of sequenced data preparation operations, the references to the set of data, and the user input; based at least in part on the determination, obtaining a data traversal program representing a result associated with the set of sequenced operations, the references to the set of data, and the user input; and providing output based at least in part on the result represented by the obtained data traversal program.
Type: Application
Filed: October 14, 2015
Publication date: April 20, 2017
Inventors: Nenshad Dinshaw Bardoliwalla, Michael Matthews, Ian Timourian, Jing Chen, Lilia Gutnik, Whitman Kwok, Dave Brewster, Victor Tze-Yeuan Tso
-
Publication number: 20170109378
Abstract: Distributed pipeline optimization for data preparation includes: receiving a specification of a set of sequenced operations to be performed on a set of organized data; dividing the set of data into a plurality of work portions based on a cost function that is dependent on at least one dimension of the set of data; and distributing the plurality of work portions to a plurality of processing nodes to be processed according to the specification of operations.
Type: Application
Filed: October 14, 2015
Publication date: April 20, 2017
Inventors: Dave Brewster, Victor Tze-Yeuan Tso
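The final step in this abstract is distributing the work portions to processing nodes, each of which applies the specified operations. The Python sketch below uses a local thread pool as a stand-in for separate processing nodes, which is an assumption made for illustration. It further assumes that work portions are half-open `(start, end)` row ranges and that every operation is row-wise. Operations that span rows, such as sorting or deduplication, would need a separate merge step that is not shown. `run_portion` and `run_distributed` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Row = Sequence[object]
Step = Callable[[Row], Row]  # a row-wise data preparation operation


def run_portion(rows: Sequence[Row], steps: Sequence[Step]) -> List[Row]:
    """Apply every sequenced step, in order, to each row of one work portion."""
    out: List[Row] = []
    for row in rows:
        for step in steps:
            row = step(row)
        out.append(row)
    return out


def run_distributed(rows: Sequence[Row], portions: Sequence[Tuple[int, int]],
                    steps: Sequence[Step], workers: int = 4) -> List[Row]:
    """Send each (start, end) portion to a worker and reassemble the results
    in the original row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_portion, rows[s:e], steps) for s, e in portions]
        results: List[Row] = []
        for f in futures:
            results.extend(f.result())
    return results
```

Collecting the futures in submission order, rather than in completion order, keeps the output rows in their input order even when portions finish at different times.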