Patents by Inventor Wangchao Le
Wangchao Le has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250139091
Abstract: A click-to-script service enables developers of big-data job scripts to quickly see the underlying script operations from optimized execution plans. Once a big-data job is received, the disclosed examples compile it and generate tokens that are associated with each operation of the big-data job. These tokens may include the file name of the job, the line number of the operation, and/or an Abstract Syntax Tree (AST) node for the given operations. An original execution plan is optimized into an optimized execution plan, and the tokens for the original operations of the job script are assigned to the optimized operations of the optimized execution plan. The optimized execution plan is graphically displayed in an interactive manner such that users may view the optimized execution plan and click on its optimized operations to find the original operations of the job script.
Type: Application
Filed: November 4, 2024
Publication date: May 1, 2025
Inventors: Xiangnan LI, Marc Todd FRIEDMAN, Wangchao LE, Evgueni ZABOKRITSKI
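The token flow described in this abstract can be illustrated with a minimal sketch. Everything named here (`SourceToken`, `PlanOp`, the toy filter-fusing optimizer, the script syntax) is a hypothetical stand-in chosen for illustration and is not taken from the patent; AST nodes are omitted and only file name and line number are tracked.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceToken:
    """Hypothetical token: where an operation came from in the job script."""
    file_name: str
    line_number: int


@dataclass
class PlanOp:
    """Hypothetical plan operation carrying the tokens of its source lines."""
    name: str
    tokens: list = field(default_factory=list)


def compile_script(file_name, lines):
    """Build an original plan: one operation per non-empty line, each tagged with a token."""
    plan = []
    for number, text in enumerate(lines, start=1):
        if text.strip():
            plan.append(PlanOp(text.strip(), [SourceToken(file_name, number)]))
    return plan


def fuse_adjacent_filters(plan):
    """Toy optimizer: merge consecutive FILTER operations and carry over all their tokens."""
    optimized = []
    for op in plan:
        previous = optimized[-1] if optimized else None
        if previous and op.name.startswith("FILTER") and previous.name.startswith("FILTER"):
            previous.name += " AND " + op.name[len("FILTER "):]
            previous.tokens.extend(op.tokens)
        else:
            # Copy so the original plan is left untouched.
            optimized.append(PlanOp(op.name, list(op.tokens)))
    return optimized


def click(op):
    """Simulate clicking an optimized operation: return its original script locations."""
    return [(token.file_name, token.line_number) for token in op.tokens]


script = ["EXTRACT a, b FROM input", "FILTER a > 1", "FILTER b < 5", "OUTPUT TO result"]
original_plan = compile_script("job.script", script)
optimized_plan = fuse_adjacent_filters(original_plan)
fused_locations = click(optimized_plan[1])
```

In this sketch the fused filter keeps the tokens of both script lines it came from, so a click on it can lead back to either line.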
-
Patent number: 12164516
Abstract: A click-to-script service enables developers of big-data job scripts to quickly see the underlying script operations from optimized execution plans. Once a big-data job is received, the disclosed examples compile it and generate tokens that are associated with each operation of the big-data job. These tokens may include the file name of the job, the line number of the operation, and/or an Abstract Syntax Tree (AST) node for the given operations. An original execution plan is optimized into an optimized execution plan, and the tokens for the original operations of the job script are assigned to the optimized operations of the optimized execution plan. The optimized execution plan is graphically displayed in an interactive manner such that users may view the optimized execution plan and click on its optimized operations to find the original operations of the job script.
Type: Grant
Filed: June 25, 2021
Date of Patent: December 10, 2024
Assignee: Microsoft Technology Licensing, LLC.
Inventors: Xiangnan Li, Marc Todd Friedman, Wangchao Le, Evgueni Zabokritski
-
Patent number: 12067014
Abstract: Example aspects include techniques for clustering delete targets for vectorized deletion including retrieving, from a set of delete targets in a distributed database system, a file to be deleted, scanning existing clusters of files marked for deletion to identify at least one existing cluster of files having constraints corresponding to the file, based on identifying the at least one existing cluster of files, adding the file to the at least one existing cluster of files to create a new cluster of files, based on failing to identify the at least one existing cluster of files, creating the new cluster of files including the file, and generating, for each file in the new cluster of files and based on a deletion signal, a delta array including multiple bits representing data items in each file and indicating, based on bit value, target data items to be deleted from each file.
Type: Grant
Filed: June 14, 2023
Date of Patent: August 20, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Wangchao Le, Marc Todd Friedman, Hiren Patel
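The clustering step in this abstract can be sketched as follows. The abstract does not say what the constraints are, so the `table` and `partition` fields below are purely hypothetical, and a dictionary lookup stands in for the scan over existing clusters.

```python
def cluster_delete_targets(delete_targets):
    """Group files marked for deletion by their constraints.

    delete_targets: iterable of (file_name, constraints) pairs, where
    constraints is a dict of hashable values (hypothetical fields).
    A file joins the existing cluster with matching constraints, or
    starts a new cluster when no such cluster exists.
    """
    clusters = {}
    for file_name, constraints in delete_targets:
        key = frozenset(constraints.items())
        clusters.setdefault(key, []).append(file_name)
    return clusters


targets = [
    ("f1.parquet", {"table": "orders", "partition": "2023-06"}),
    ("f2.parquet", {"table": "orders", "partition": "2023-06"}),
    ("f3.parquet", {"table": "users", "partition": "2023-06"}),
]
clusters = cluster_delete_targets(targets)
orders_key = frozenset({"table": "orders", "partition": "2023-06"}.items())
```

Here `f1.parquet` and `f2.parquet` share constraints and land in one cluster, while `f3.parquet` starts its own.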
-
Publication number: 20240126754
Abstract: A click-to-script service enables developers of big-data job scripts to quickly see the underlying script operations from optimized execution plans. Once a big-data job is received, the disclosed examples compile it and generate tokens that are associated with each operation of the big-data job. These tokens may include the file name of the job, the line number of the operation, and/or an Abstract Syntax Tree (AST) node for the given operations. An original execution plan is optimized into an optimized execution plan, and the tokens for the original operations of the job script are assigned to the optimized operations of the optimized execution plan. The optimized execution plan is graphically displayed in an interactive manner such that users may view the optimized execution plan and click on its optimized operations to find the original operations of the job script.
Type: Application
Filed: June 25, 2021
Publication date: April 18, 2024
Inventors: Xiangnan LI, Marc Todd FRIEDMAN, Wangchao LE, Evgueni ZABOKRITSKI
-
Publication number: 20230325390
Abstract: Example aspects include techniques for clustering delete targets for vectorized deletion including retrieving, from a set of delete targets in a distributed database system, a file to be deleted, scanning existing clusters of files marked for deletion to identify at least one existing cluster of files having constraints corresponding to the file, based on identifying the at least one existing cluster of files, adding the file to the at least one existing cluster of files to create a new cluster of files, based on failing to identify the at least one existing cluster of files, creating the new cluster of files including the file, and generating, for each file in the new cluster of files and based on a deletion signal, a delta array including multiple bits representing data items in each file and indicating, based on bit value, target data items to be deleted from each file.
Type: Application
Filed: June 14, 2023
Publication date: October 12, 2023
Inventors: Wangchao LE, Marc Todd Friedman, Hiren Patel
-
Patent number: 11734282
Abstract: Example aspects include techniques for performing vectorized delete operations in a distributed database system including clustering multiple files stored in the distributed database system, and generating, for each of the multiple files and based on a deletion signal, a delta array including multiple bits representing the data items in the file and indicating, based on bit value, the target data items to be deleted from the file. Generating, for each of the multiple files, the delta array can include reading at least one second file shard of multiple second file shards before performing a join operation on at least one first file shard of multiple first file shards is completed.
Type: Grant
Filed: March 30, 2022
Date of Patent: August 22, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Wangchao Le, Marc Todd Friedman, Hiren Patel
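A delta array of the kind this abstract describes can be sketched as one bit per data item, set when the item is a delete target. The sketch below is an assumption-laden illustration: the deletion signal is modeled as a plain set of keys, bits are stored as a Python list of 0/1 rather than a packed bitmap, and the overlapped shard reads and join are not modeled.

```python
def build_delta_array(file_keys, keys_to_delete):
    """Return one bit per data item: 1 marks the item as a delete target."""
    return [1 if key in keys_to_delete else 0 for key in file_keys]


def apply_delta_array(file_keys, delta_array):
    """Keep only the data items whose bit is 0."""
    return [key for key, bit in zip(file_keys, delta_array) if bit == 0]


# Hypothetical files, each listing the keys of its data items in order.
files = {"a": [10, 11, 12], "b": [12, 13]}
deletion_signal = {11, 12}

delta_arrays = {name: build_delta_array(keys, deletion_signal) for name, keys in files.items()}
survivors = {name: apply_delta_array(keys, delta_arrays[name]) for name, keys in files.items()}
```

Keeping the bits separate from the data means every file in a cluster can be marked in one pass and rewritten later.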
-
Patent number: 10726007
Abstract: Constructing a heavy hitter summary for query optimization. The heavy hitter summary is constructed by sampling each of multiple partitions of a dataset using a uniform sampling rate. For each partition, performing a two-stage heavy hitter estimation process to determine whether an estimated frequency of a key of the sampled data units may be included in a partition-level heavy hitter summary. Constructing a partition-level heavy hitter summary for each partition of the dataset based on the keys determined via the two-stage process, and constructing a dataset-level heavy hitter summary based on the partition-level heavy hitter summary. The dataset-level heavy hitter summary may be used to optimize query trees.
Type: Grant
Filed: September 26, 2017
Date of Patent: July 28, 2020
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Wangchao Le, Yongchul Kwon, Marc Todd Friedman
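The overall shape of this construction (sample each partition at the same rate, keep frequent keys per partition, merge into a dataset-level summary) can be sketched as below. The abstract does not spell out the two-stage estimation process, so this sketch swaps in a single, simpler rule: scale sampled counts by the sampling rate and keep keys whose estimate reaches a fraction of the partition size. The `min_fraction` parameter and all function names are assumptions.

```python
import random
from collections import Counter


def partition_summary(partition, rate, min_fraction, rng):
    """Sample one partition and keep keys whose estimated frequency is high enough."""
    sample = [key for key in partition if rng.random() < rate]
    threshold = min_fraction * len(partition)
    summary = {}
    for key, count in Counter(sample).items():
        estimate = count / rate  # scale the sampled count back up
        if estimate >= threshold:
            summary[key] = estimate
    return summary


def dataset_summary(partitions, rate, min_fraction, seed=0):
    """Merge partition-level summaries by summing the estimates per key."""
    rng = random.Random(seed)
    total = Counter()
    for partition in partitions:
        total.update(partition_summary(partition, rate, min_fraction, rng))
    return dict(total)


partitions = [["a"] * 8 + ["b", "c"], ["a"] * 5 + ["d"] * 5]
# With rate=1.0 every item is sampled, so the estimates are exact counts.
exact_summary = dataset_summary(partitions, rate=1.0, min_fraction=0.3)
# A lower rate trades accuracy for less work; results depend on the seed.
sampled_summary = dataset_summary(partitions, rate=0.5, min_fraction=0.3, seed=42)
```

An optimizer could consult such a summary to spot skewed join keys before choosing a plan.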
-
Patent number: 10726006
Abstract: Query optimization of a query that is compiled into a query tree. The optimization is efficiently performed by using a distinct value estimation data structure (e.g., a KMV synopsis) that represents within an interval distinctness of values that are generated based on data within an interval, even if the resultant data from a subinterval is considered. Various candidate query trees are evaluated, with distinct value generation data structures being propagated for parent nodes based on the distinct value generation data structures of its child node(s). Propagation operations correlate to the operation represented by the parent node in the query tree. The optimizer uses the propagated distinct value estimation structure in order to evaluate the number of distinct values of data that would result from execution of the candidate query tree at least at the corresponding operations (and not just based on the distinct values of the input data).
Type: Grant
Filed: June 30, 2017
Date of Patent: July 28, 2020
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Wangchao Le, Yongchul Kwon, Marc Todd Friedman
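For readers unfamiliar with the KMV (k minimum values) synopsis mentioned here, the sketch below shows the textbook form: hash each value into [0, 1), keep the k smallest hashes, and estimate the distinct count as (k - 1) divided by the k-th smallest hash. Propagation is shown only for a union-like parent node, where merging two child synopses and keeping the k smallest hashes gives the same synopsis as building it from the combined input. This is a generic illustration under those assumptions, not the patent's specific interval-based structure.

```python
import hashlib


def unit_hash(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_synopsis(values, k):
    """Keep the k smallest distinct hash values, in ascending order."""
    return sorted({unit_hash(value) for value in values})[:k]


def estimate_distinct(synopsis, k):
    """Estimate the number of distinct values summarized by the synopsis."""
    if len(synopsis) < k:
        return float(len(synopsis))  # fewer than k distinct hashes: count is exact
    return (k - 1) / synopsis[-1]


def union_synopsis(left, right, k):
    """Propagate synopses through a union-like node: merge, keep the k smallest."""
    return sorted(set(left) | set(right))[:k]


k = 64
left = kmv_synopsis(range(0, 500), k)
right = kmv_synopsis(range(300, 900), k)
merged = union_synopsis(left, right, k)
direct = kmv_synopsis(range(0, 900), k)
estimate = estimate_distinct(merged, k)
```

Because `merged` equals `direct`, a parent node's estimate can be derived from its children's synopses without rescanning the data.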
-
Publication number: 20190095487
Abstract: Constructing a heavy hitter summary for query optimization. The heavy hitter summary is constructed by sampling each of multiple partitions of a dataset using a uniform sampling rate. For each partition, performing a two-stage heavy hitter estimation process to determine whether an estimated frequency of a key of the sampled data units may be included in a partition-level heavy hitter summary. Constructing a partition-level heavy hitter summary for each partition of the dataset based on the keys determined via the two-stage process, and constructing a dataset-level heavy hitter summary based on the partition-level heavy hitter summary. The dataset-level heavy hitter summary may be used to optimize query trees.
Type: Application
Filed: September 26, 2017
Publication date: March 28, 2019
Inventors: Wangchao LE, Yongchul KWON, Marc Todd FRIEDMAN
-
Publication number: 20190005092
Abstract: Query optimization of a query that is compiled into a query tree. The optimization is efficiently performed by using a distinct value estimation data structure (e.g., a KMV synopsis) that represents within an interval distinctness of values that are generated based on data within an interval, even if the resultant data from a subinterval is considered. Various candidate query trees are evaluated, with distinct value generation data structures being propagated for parent nodes based on the distinct value generation data structures of its child node(s). Propagation operations correlate to the operation represented by the parent node in the query tree. The optimizer uses the propagated distinct value estimation structure in order to evaluate the number of distinct values of data that would result from execution of the candidate query tree at least at the corresponding operations (and not just based on the distinct values of the input data).
Type: Application
Filed: June 30, 2017
Publication date: January 3, 2019
Inventors: Wangchao LE, Yongchul KWON, Marc Todd FRIEDMAN
-
Patent number: 10095742
Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.
Type: Grant
Filed: November 28, 2016
Date of Patent: October 9, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Feifei Li
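The grouping idea in this abstract can be illustrated with a deliberately simplified sketch. Each query is reduced to the set of predicates it uses, queries are grouped greedily by Jaccard similarity, and the "common sub-structure" of a group is approximated by the predicates all its queries share. These are assumptions for illustration only: the abstract does not name its heuristics, real sub-structure discovery works on query graphs rather than predicate sets, and the cost model is not shown.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_queries(queries, min_similarity):
    """Greedy grouping: join the first group whose seed query is similar enough."""
    groups = []
    for name, predicates in queries.items():
        for group in groups:
            seed_predicates = queries[group[0]]
            if jaccard(predicates, seed_predicates) >= min_similarity:
                group.append(name)
                break
        else:
            groups.append([name])  # no similar group found: start a new one
    return groups


def shared_predicates(queries, group):
    """Predicates used by every query in the group."""
    return set.intersection(*(queries[name] for name in group))


# Hypothetical batch: query name -> predicates appearing in its triple patterns.
queries = {
    "q1": {"foaf:name", "foaf:knows", "foaf:mbox"},
    "q2": {"foaf:name", "foaf:knows"},
    "q3": {"dc:title", "dc:creator"},
}
groups = group_queries(queries, min_similarity=0.5)
common = shared_predicates(queries, groups[0])
```

Only queries inside one group would then be rewritten together, so the shared part is evaluated once.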
-
Publication number: 20170083577
Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common substructures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.
Type: Application
Filed: November 28, 2016
Publication date: March 23, 2017
Inventors: Songyun DUAN, Anastasios KEMENTSIETSIDIS, Wangchao LE, Feifei LI
-
Patent number: 9542444
Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.
Type: Grant
Filed: January 28, 2016
Date of Patent: January 10, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Feifei Li
-
Publication number: 20160162549
Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.
Type: Application
Filed: January 28, 2016
Publication date: June 9, 2016
Applicant: International Business Machines Corporation
Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Feifei Li
-
Patent number: 9280583
Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.
Type: Grant
Filed: November 30, 2012
Date of Patent: March 8, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Feifei Li
-
Patent number: 8983990
Abstract: A method of performing a graph query issued by a user is provided. The method includes performing on a processor, receiving a user graph query. The method includes rewriting the user graph query as a new query based on a query policy expressed in a graph query language. The method includes performing the new query on graph data to obtain a result.
Type: Grant
Filed: August 17, 2010
Date of Patent: March 17, 2015
Assignee: International Business Machines Corporation
Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Min Wang
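The receive, rewrite, and execute flow in this abstract can be sketched with a toy triple store. Everything below is an illustrative assumption: queries are lists of triple patterns with `?`-prefixed variables, and the policy is a hypothetical mapping from a predicate to extra patterns that must also hold for the same subject. The abstract only states that the policy is expressed in a graph query language; it does not give this representation.

```python
def rewrite_query(patterns, policy):
    """Add the policy's extra patterns after each pattern whose predicate it covers."""
    rewritten = []
    for pattern in patterns:
        rewritten.append(pattern)
        subject, predicate, _ = pattern
        for extra_predicate, extra_object in policy.get(predicate, []):
            extra = (subject, extra_predicate, extra_object)
            if extra not in rewritten:
                rewritten.append(extra)
    return rewritten


def match(patterns, triples, binding=None):
    """Return all variable bindings that satisfy every pattern (naive nested matching)."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            results.extend(match(rest, triples, candidate))
    return results


triples = [
    ("doc1", "title", "Plan"),
    ("doc1", "visibility", "public"),
    ("doc2", "title", "Secret"),
    ("doc2", "visibility", "private"),
]
user_query = [("?d", "title", "?t")]
policy = {"title": [("visibility", "public")]}  # hypothetical access policy

new_query = rewrite_query(user_query, policy)
result = match(new_query, triples)
```

The user's query is never run directly; only the rewritten query reaches the data, so `doc2` is filtered out.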
-
Patent number: 8984019
Abstract: Keyword searching is used to explore and search large Resource Description Framework datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying resource description framework data. Given a keyword query, the summarization lends significant pruning powers to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently.
Type: Grant
Filed: November 20, 2012
Date of Patent: March 17, 2015
Assignee: International Business Machines Corporation
Inventors: Songyun Duan, Achille Belly Fokoue-Nkoutche, Anastasios Kementsietsidis, Wangchao Le, Feifei Li, Kavitha Srinivas
-
Patent number: 8977650
Abstract: Keyword searching is used to explore and search large Resource Description Framework datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying resource description framework data. Given a keyword query, the summarization lends significant pruning powers to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently.
Type: Grant
Filed: November 21, 2012
Date of Patent: March 10, 2015
Assignee: International Business Machines Corporation
Inventors: Songyun Duan, Achille Belly Fokoue-Nkoutche, Anastasios Kementsietsidis, Wangchao Le, Feifei Li, Kavitha Srinivas
-
Publication number: 20140156633
Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.
Type: Application
Filed: November 30, 2012
Publication date: June 5, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Feifei Li
-
Publication number: 20140143281
Abstract: Keyword searching is used to explore and search large Resource Description Framework datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying resource description framework data. Given a keyword query, the summarization lends significant pruning powers to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently.
Type: Application
Filed: November 21, 2012
Publication date: May 22, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Songyun Duan, Achille Belly Fokoue-Nkoutche, Anastasios Kementsietsidis, Wangchao Le, Feifei Li, Kavitha Srinivas