Patents by Inventor Wangchao Le

Wangchao Le has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

CLICK-TO-SCRIPT REFLECTION

Publication number: 20250139091

Abstract: A click-to-script service enables developers of big-data job scripts to quickly see the underlying script operations from optimized execution plans. Once a big-data job is received, the disclosed examples compile it and generate tokens that are associated with each operation of the big-data job. These tokens may include the file name of the job, the line number of the operation, and/or an Abstract Syntax Tree (AST) node for the given operation. An original execution plan is optimized into an optimized execution plan, and the tokens for the original operations of the job script are assigned to the optimized operations of the optimized execution plan. The optimized execution plan is graphically displayed in an interactive manner such that users may view the optimized execution plan and click on its optimized operations to find the original operations of the job script.

Type: Application

Filed: November 4, 2024

Publication date: May 1, 2025

Inventors: Xiangnan LI, Marc Todd FRIEDMAN, Wangchao LE, Evgueni ZABOKRITSKI
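The token idea in this abstract can be illustrated with a minimal Python sketch, assuming a toy script language with one operation per line and a toy optimization that fuses adjacent operations. The names SourceToken, compile_script, fuse and click are hypothetical and not taken from the patent. The point shown is that an optimized operation keeps the union of the tokens of the original operations it replaced, so a click can be resolved back to script locations.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SourceToken:
    """Hypothetical token recording where an operation came from in the job script."""
    file_name: str
    line: int
    ast_node: str  # assumed AST node identifier; the real format is not specified

@dataclass
class Operation:
    name: str
    tokens: set = field(default_factory=set)

def compile_script(file_name, lines):
    """Toy compiler: one operation per non-empty line, tagged with its source token."""
    ops = []
    for lineno, text in enumerate(lines, start=1):
        stmt = text.strip()
        if stmt:
            op_name = stmt.split()[0]
            ops.append(Operation(op_name, {SourceToken(file_name, lineno, f"{op_name}@{lineno}")}))
    return ops

def fuse(ops, names_to_fuse):
    """Toy optimization: merge adjacent fusible operations.
    The fused operation inherits the union of the original operations' tokens."""
    optimized = []
    for op in ops:
        prev = optimized[-1] if optimized else None
        if prev and op.name in names_to_fuse and prev.name.split("+")[-1] in names_to_fuse:
            optimized[-1] = Operation(prev.name + "+" + op.name, prev.tokens | op.tokens)
        else:
            optimized.append(Operation(op.name, set(op.tokens)))
    return optimized

def click(optimized_op):
    """Resolve a click on an optimized operation to the original script locations."""
    return sorted((t.file_name, t.line) for t in optimized_op.tokens)
```

For example, fusing EXTRACT and FILTER in a four-line script yields one optimized operation whose click resolves to lines 1 and 2 of the script.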
Click-to-script reflection

Patent number: 12164516

Abstract: A click-to-script service enables developers of big-data job scripts to quickly see the underlying script operations from optimized execution plans. Once a big-data job is received, the disclosed examples compile it and generate tokens that are associated with each operation of the big-data job. These tokens may include the file name of the job, the line number of the operation, and/or an Abstract Syntax Tree (AST) node for the given operation. An original execution plan is optimized into an optimized execution plan, and the tokens for the original operations of the job script are assigned to the optimized operations of the optimized execution plan. The optimized execution plan is graphically displayed in an interactive manner such that users may view the optimized execution plan and click on its optimized operations to find the original operations of the job script.

Type: Grant

Filed: June 25, 2021

Date of Patent: December 10, 2024

Assignee: Microsoft Technology Licensing, LLC.

Inventors: Xiangnan Li, Marc Todd Friedman, Wangchao Le, Evgueni Zabokritski
Methods and systems for performing a vectorized delete in a distributed database system

Patent number: 12067014

Abstract: Example aspects include techniques for clustering delete targets for vectorized deletion. A file to be deleted is retrieved from a set of delete targets in a distributed database system, and existing clusters of files marked for deletion are scanned to identify at least one existing cluster of files having constraints corresponding to the file. If such a cluster is identified, the file is added to it to create a new cluster of files; if none is identified, a new cluster of files including the file is created. Then, for each file in the new cluster of files and based on a deletion signal, a delta array is generated that includes multiple bits representing the data items in the file, with the bit values indicating the target data items to be deleted from the file.

Type: Grant

Filed: June 14, 2023

Date of Patent: August 20, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Wangchao Le, Marc Todd Friedman, Hiren Patel
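The clustering step can be sketched in Python under an explicit assumption: the abstract does not say what the cluster "constraints" are, so this hypothetical version requires files in a cluster to belong to the same table and caps a cluster's total size. DeleteTarget, Cluster, fits and MAX_CLUSTER_BYTES are illustrative names only. Each file joins the first existing cluster whose constraints it satisfies, otherwise it starts a new cluster.

```python
from dataclasses import dataclass, field

MAX_CLUSTER_BYTES = 1_000  # hypothetical capacity constraint, kept tiny for the example

@dataclass
class DeleteTarget:
    path: str
    table: str       # assumed constraint: only files of the same table share a cluster
    size_bytes: int

@dataclass
class Cluster:
    table: str
    files: list = field(default_factory=list)
    total_bytes: int = 0

def fits(cluster, target):
    """Assumed constraint check: same table and enough remaining capacity."""
    return (cluster.table == target.table
            and cluster.total_bytes + target.size_bytes <= MAX_CLUSTER_BYTES)

def cluster_delete_targets(targets):
    """Scan existing clusters for a match; create a new cluster when none fits."""
    clusters = []
    for target in targets:
        match = next((c for c in clusters if fits(c, target)), None)
        if match is None:
            match = Cluster(target.table)
            clusters.append(match)
        match.files.append(target.path)
        match.total_bytes += target.size_bytes
    return clusters
```

With files a (t1, 400 B), b (t1, 500 B), c (t2, 100 B) and d (t1, 300 B), a and b share a cluster, c gets its own, and d starts a new t1 cluster because adding it would exceed the cap. A first-fit scan keeps the logic simple; a real system might choose the best-fitting cluster instead.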
CLICK-TO-SCRIPT REFLECTION

Publication number: 20240126754

Abstract: A click-to-script service enables developers of big-data job scripts to quickly see the underlying script operations from optimized execution plans. Once a big-data job is received, the disclosed examples compile it and generate tokens that are associated with each operation of the big-data job. These tokens may include the file name of the job, the line number of the operation, and/or an Abstract Syntax Tree (AST) node for the given operation. An original execution plan is optimized into an optimized execution plan, and the tokens for the original operations of the job script are assigned to the optimized operations of the optimized execution plan. The optimized execution plan is graphically displayed in an interactive manner such that users may view the optimized execution plan and click on its optimized operations to find the original operations of the job script.

Type: Application

Filed: June 25, 2021

Publication date: April 18, 2024

Inventors: Xiangnan LI, Marc Todd FRIEDMAN, Wangchao LE, Evgueni ZABOKRITSKI
METHODS AND SYSTEMS FOR PERFORMING A VECTORIZED DELETE IN A DISTRIBUTED DATABASE SYSTEM

Publication number: 20230325390

Abstract: Example aspects include techniques for clustering delete targets for vectorized deletion. A file to be deleted is retrieved from a set of delete targets in a distributed database system, and existing clusters of files marked for deletion are scanned to identify at least one existing cluster of files having constraints corresponding to the file. If such a cluster is identified, the file is added to it to create a new cluster of files; if none is identified, a new cluster of files including the file is created. Then, for each file in the new cluster of files and based on a deletion signal, a delta array is generated that includes multiple bits representing the data items in the file, with the bit values indicating the target data items to be deleted from the file.

Type: Application

Filed: June 14, 2023

Publication date: October 12, 2023

Inventors: Wangchao LE, Marc Todd Friedman, Hiren Patel
Methods and systems for performing a vectorized delete in a distributed database system

Patent number: 11734282

Abstract: Example aspects include techniques for performing vectorized delete operations in a distributed database system, including clustering multiple files stored in the distributed database system and generating, for each of the multiple files and based on a deletion signal, a delta array including multiple bits that represent the data items in the file and indicate, by bit value, the target data items to be deleted from the file. Generating the delta array for each of the multiple files can include reading at least one of multiple second file shards before a join operation performed on at least one of multiple first file shards is completed.

Type: Grant

Filed: March 30, 2022

Date of Patent: August 22, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Wangchao Le, Marc Todd Friedman, Hiren Patel
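The delta array described here can be pictured as a packed bitmap with one bit per data item. The following Python sketch assumes rows are dictionaries with a hypothetical "id" column and that the deletion signal has already been reduced to a set of ids to delete; it does not model the shard-level pipelining of the join. Bit i set to 1 marks row i as a delete target, and applying the array filters those rows out.

```python
def build_delta_array(rows, keys_to_delete):
    """Pack one bit per row; bit i is 1 when row i is a delete target.
    Assumes each row carries a hypothetical "id" column."""
    delta = bytearray((len(rows) + 7) // 8)
    for i, row in enumerate(rows):
        if row["id"] in keys_to_delete:
            delta[i // 8] |= 1 << (i % 8)
    return delta

def is_deleted(delta, i):
    """Read bit i of the delta array."""
    return bool((delta[i // 8] >> (i % 8)) & 1)

def apply_delta(rows, delta):
    """Keep only rows whose bit is 0."""
    return [row for i, row in enumerate(rows) if not is_deleted(delta, i)]
```

For ten rows with ids 0 to 9 and a deletion signal of {1, 8, 9}, the delta array is two bytes, 0b00000010 and 0b00000011. Keeping the deletion as a bitmap lets a whole file be marked in one vectorized pass instead of rewriting rows individually.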
Query optimization using propagated data distinctness

Patent number: 10726006

Abstract: Query optimization of a query that is compiled into a query tree. The optimization is performed efficiently by using a distinct value estimation data structure (e.g., a KMV synopsis) that represents the distinctness of values generated from data within an interval, even when only the resulting data from a subinterval is considered. Various candidate query trees are evaluated, with distinct value estimation data structures being propagated to parent nodes based on the distinct value estimation data structures of their child node(s). Each propagation operation corresponds to the operation represented by the parent node in the query tree. The optimizer uses the propagated distinct value estimation structure to evaluate the number of distinct values of data that would result from executing the candidate query tree at least at the corresponding operations (and not just based on the distinct values of the input data).

Type: Grant

Filed: June 30, 2017

Date of Patent: July 28, 2020

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Wangchao Le, Yongchul Kwon, Marc Todd Friedman
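A KMV ("k minimum values") synopsis keeps the k smallest hash values of the data and estimates the distinct count as (k - 1) / U_k, where U_k is the k-th smallest hash normalized to [0, 1). The minimal Python sketch below shows how such a synopsis can be propagated to a parent node for one operation, a union of two child inputs; the choice of k = 64, the blake2b hash, and the function names are assumptions for illustration, and propagation rules for other operators are not shown.

```python
import hashlib
import heapq

K = 64  # synopsis size; a hypothetical choice

def unit_hash(value):
    """Map a value to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64

def kmv_build(values, k=K):
    """Keep the k smallest distinct hash values, sorted ascending."""
    return sorted(heapq.nsmallest(k, {unit_hash(v) for v in values}))

def kmv_union(a, b, k=K):
    """Propagate to a UNION parent: the k smallest hashes of both child synopses."""
    return sorted(heapq.nsmallest(k, set(a) | set(b)))

def kmv_estimate(synopsis, k=K):
    """Estimate (k - 1) / U_k; exact when fewer than k distinct hashes were seen."""
    if len(synopsis) < k:
        return float(len(synopsis))
    return (k - 1) / synopsis[k - 1]
```

Because the k smallest hashes of a union are always among the k smallest hashes of each child, the propagated synopsis is identical to one built directly from the combined data, so the optimizer can estimate the parent's distinct count without touching the input again.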
Building heavy hitter summary for query optimization

Patent number: 10726007

Abstract: Constructing a heavy hitter summary for query optimization. The heavy hitter summary is constructed by sampling each of multiple partitions of a dataset at a uniform sampling rate. For each partition, a two-stage heavy hitter estimation process determines whether a key of the sampled data units, together with its estimated frequency, may be included in a partition-level heavy hitter summary. A partition-level heavy hitter summary is constructed for each partition of the dataset from the keys determined via the two-stage process, and a dataset-level heavy hitter summary is constructed from the partition-level heavy hitter summaries. The dataset-level heavy hitter summary may be used to optimize query trees.

Type: Grant

Filed: September 26, 2017

Date of Patent: July 28, 2020

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Wangchao Le, Yongchul Kwon, Marc Todd Friedman
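The abstract does not spell out the two stages, so the following Python sketch fills them in with assumptions: stage one runs the Misra-Gries algorithm with m counters over a partition's sample to find candidate keys, and stage two recounts those candidates on the sample and scales by 1 / rate. The sampling rate, threshold, m and seed are hypothetical parameters. Partition-level summaries are then summed into a dataset-level summary.

```python
import random
from collections import Counter

def sample(partition, rate, rng):
    """Uniform Bernoulli sampling with the same rate for every partition."""
    return [key for key in partition if rng.random() < rate]

def misra_gries(keys, m):
    """Stage 1 (assumed): bounded-memory candidate generation with m counters."""
    counters = {}
    for key in keys:
        if key in counters:
            counters[key] += 1
        elif len(counters) < m:
            counters[key] = 1
        else:
            for k in list(counters):
                counters[k] -= 1
                if counters[k] == 0:
                    del counters[k]
    return set(counters)

def partition_summary(partition, rate, threshold, m, rng):
    """Stage 2 (assumed): recount candidates on the sample and scale by 1 / rate."""
    sampled = sample(partition, rate, rng)
    candidates = misra_gries(sampled, m)
    counts = Counter(k for k in sampled if k in candidates)
    cutoff = threshold * len(partition)
    return {k: c / rate for k, c in counts.items() if c / rate >= cutoff}

def dataset_summary(partitions, rate=0.1, threshold=0.1, m=20, seed=0):
    """Merge partition-level summaries and keep keys heavy across the whole dataset."""
    rng = random.Random(seed)
    total = Counter()
    for part in partitions:
        total.update(partition_summary(part, rate, threshold, m, rng))
    cutoff = threshold * sum(len(p) for p in partitions)
    return {k: f for k, f in total.items() if f >= cutoff}
```

One limitation of this merge is that a key which is moderately frequent in every partition but heavy in none can be missed at the dataset level; the sketch accepts that trade-off for the sake of small per-partition summaries.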
BUILDING HEAVY HITTER SUMMARY FOR QUERY OPTIMIZATION

Publication number: 20190095487

Abstract: Constructing a heavy hitter summary for query optimization. The heavy hitter summary is constructed by sampling each of multiple partitions of a dataset at a uniform sampling rate. For each partition, a two-stage heavy hitter estimation process determines whether a key of the sampled data units, together with its estimated frequency, may be included in a partition-level heavy hitter summary. A partition-level heavy hitter summary is constructed for each partition of the dataset from the keys determined via the two-stage process, and a dataset-level heavy hitter summary is constructed from the partition-level heavy hitter summaries. The dataset-level heavy hitter summary may be used to optimize query trees.

Type: Application

Filed: September 26, 2017

Publication date: March 28, 2019

Inventors: Wangchao LE, Yongchul KWON, Marc Todd FRIEDMAN
QUERY OPTIMIZATION USING PROPAGATED DATA DISTINCTNESS

Publication number: 20190005092

Abstract: Query optimization of a query that is compiled into a query tree. The optimization is performed efficiently by using a distinct value estimation data structure (e.g., a KMV synopsis) that represents the distinctness of values generated from data within an interval, even when only the resulting data from a subinterval is considered. Various candidate query trees are evaluated, with distinct value estimation data structures being propagated to parent nodes based on the distinct value estimation data structures of their child node(s). Each propagation operation corresponds to the operation represented by the parent node in the query tree. The optimizer uses the propagated distinct value estimation structure to evaluate the number of distinct values of data that would result from executing the candidate query tree at least at the corresponding operations (and not just based on the distinct values of the input data).

Type: Application

Filed: June 30, 2017

Publication date: January 3, 2019

Inventors: Wangchao LE, Yongchul KWON, Marc Todd FRIEDMAN
Scalable multi-query optimization for SPARQL

Patent number: 10095742

Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.

Type: Grant

Filed: November 28, 2016

Date of Patent: October 9, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Feifei Li
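The grouping step can be illustrated with a greedy Python sketch that treats each SPARQL query as a set of triple patterns and uses the set of shared patterns as the "common sub-structure". This is an assumption for illustration only: it presumes variables have already been normalized to the same names across queries, and it omits the cost model that the abstract uses to compare candidate plans. The function name group_queries is hypothetical.

```python
def group_queries(queries, min_shared=1):
    """Greedily partition queries into groups that share a common core.

    queries: dict of query id -> iterable of (subject, predicate, object) patterns,
    with variables assumed to be normalized to the same names across queries.
    A query joins the first group whose core shares at least min_shared
    patterns with it; the group's core shrinks to the shared patterns.
    """
    groups = []  # list of (common_core, [query ids])
    for qid, patterns in queries.items():
        patterns = frozenset(patterns)
        for i, (core, members) in enumerate(groups):
            shared = core & patterns
            if len(shared) >= min_shared:
                groups[i] = (shared, members + [qid])
                break
        else:
            groups.append((patterns, [qid]))
    return groups
```

Usage: given q1 and q2 that both contain ("?x", "rdf:type", "ex:Person") and q3 that only asks about ("?y", "ex:locatedIn", "ex:Paris"), q1 and q2 form one group whose shared pattern could be evaluated once, while q3 stays alone.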
SCALABLE MULTI-QUERY OPTIMIZATION FOR SPARQL

Publication number: 20170083577

Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common substructures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.

Type: Application

Filed: November 28, 2016

Publication date: March 23, 2017

Inventors: Songyun DUAN, Anastasios KEMENTSIETSIDIS, Wangchao LE, Feifei LI
Scalable multi-query optimization for SPARQL

Patent number: 9542444

Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.

Type: Grant

Filed: January 28, 2016

Date of Patent: January 10, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Feifei Li
Scalable Multi-Query Optimization for SPARQL

Publication number: 20160162549

Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.

Type: Application

Filed: January 28, 2016

Publication date: June 9, 2016

Applicant: International Business Machines Corporation

Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Feifei Li
Scalable multi-query optimization for SPARQL

Patent number: 9280583

Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.

Type: Grant

Filed: November 30, 2012

Date of Patent: March 8, 2016

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Feifei Li
Enforcing query policies over resource description framework data

Patent number: 8983990

Abstract: A method of performing a graph query issued by a user is provided. The method includes receiving, on a processor, a user graph query; rewriting the user graph query as a new query based on a query policy expressed in a graph query language; and performing the new query on graph data to obtain a result.

Type: Grant

Filed: August 17, 2010

Date of Patent: March 17, 2015

Assignee: International Business Machines Corporation

Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Min Wang
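The rewrite-then-execute flow can be sketched in Python with a tiny basic-graph-pattern matcher. As an assumption, the policy is written as triple patterns over a placeholder variable "?res", and rewriting binds that placeholder to a protected variable of the user's query and conjoins the policy patterns. The names match, rewrite and "?res" are hypothetical; a real system would express the policy and query in SPARQL.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy all triple patterns (variables start with '?')."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for p, t in zip(first, triple):
            if p.startswith("?"):
                if new.setdefault(p, t) != t:  # variable already bound to another term
                    ok = False
                    break
            elif p != t:  # constant term does not match
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)

def rewrite(user_query, policy, protected_var):
    """Conjoin policy patterns, written over the placeholder '?res', onto the user query."""
    bound = [tuple(protected_var if term == "?res" else term for term in pat) for pat in policy]
    return list(user_query) + bound
```

For example, if the user asks for every ("?p", "ex:diagnosis", "?d") and the policy requires ("?res", "ex:consent", "yes"), the rewritten query only returns diagnoses for people who have consented.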
Scalable summarization of data graphs

Patent number: 8984019

Abstract: Keyword searching is used to explore and search large Resource Description Framework (RDF) datasets whose structures are unknown or constantly changing. A succinct and effective summarization is built from the underlying RDF data. Given a keyword query, the summarization provides significant pruning power for exploratory keyword searches, making them much more efficient than previous work. The summarization returns exact results and can be updated incrementally and efficiently.

Type: Grant

Filed: November 20, 2012

Date of Patent: March 17, 2015

Assignee: International Business Machines Corporation

Inventors: Songyun Duan, Achille Belly Fokoue-Nkoutche, Anastasios Kementsietsidis, Wangchao Le, Feifei Li, Kavitha Srinivas
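One way to see how a summary can prune keyword searches without losing exactness is a type-level summary, sketched below in Python under stated assumptions: entities are grouped by type, a summary edge links two types whenever any of their instances are linked (edges treated as undirected), and a keyword index maps each keyword to the types whose instances mention it. Any data-graph path maps onto a summary path of the same length, so if no summary path within d hops connects two keywords, no data-graph answer within d hops exists. GraphSummary and its methods are hypothetical names, not the patented structure.

```python
from collections import defaultdict, deque

class GraphSummary:
    """Hypothetical type-level summary of an RDF-like graph."""

    def __init__(self):
        self.type_of = {}
        self.adj = defaultdict(set)            # undirected summary edges between types
        self.keyword_types = defaultdict(set)  # keyword -> types whose instances mention it

    def add_entity(self, entity, etype, keywords=()):
        self.type_of[entity] = etype
        for kw in keywords:
            self.keyword_types[kw].add(etype)

    def add_edge(self, s, o):
        """Incremental update: one data edge adds at most one summary edge."""
        ts, to = self.type_of[s], self.type_of[o]
        self.adj[ts].add(to)
        self.adj[to].add(ts)

    def within(self, sources, targets, d):
        """BFS over the summary: can any source type reach a target type in <= d hops?"""
        seen = set(sources)
        frontier = deque((t, 0) for t in sources)
        while frontier:
            t, dist = frontier.popleft()
            if t in targets:
                return True
            if dist == d:
                continue
            for nxt in self.adj[t]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
        return False

    def may_answer(self, kw1, kw2, d):
        """Safe pruning: False guarantees no answer within d hops in the data graph."""
        return self.within(self.keyword_types.get(kw1, set()),
                           self.keyword_types.get(kw2, set()), d)
```

A True result only means the search on the data graph is worth running; a False result lets the search skip that keyword combination entirely, which is what keeps the final answers exact.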
Scalable summarization of data graphs

Patent number: 8977650

Abstract: Keyword searching is used to explore and search large Resource Description Framework (RDF) datasets whose structures are unknown or constantly changing. A succinct and effective summarization is built from the underlying RDF data. Given a keyword query, the summarization provides significant pruning power for exploratory keyword searches, making them much more efficient than previous work. The summarization returns exact results and can be updated incrementally and efficiently.

Type: Grant

Filed: November 21, 2012

Date of Patent: March 10, 2015

Assignee: International Business Machines Corporation

Inventors: Songyun Duan, Achille Belly Fokoue-Nkoutche, Anastasios Kementsietsidis, Wangchao Le, Feifei Li, Kavitha Srinivas
Scalable Multi-Query Optimization for SPARQL

Publication number: 20140156633

Abstract: Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.

Type: Application

Filed: November 30, 2012

Publication date: June 5, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Songyun Duan, Anastasios Kementsietsidis, Wangchao Le, Feifei Li
Scalable Summarization of Data Graphs

Publication number: 20140143281

Abstract: Keyword searching is used to explore and search large Resource Description Framework (RDF) datasets whose structures are unknown or constantly changing. A succinct and effective summarization is built from the underlying RDF data. Given a keyword query, the summarization provides significant pruning power for exploratory keyword searches, making them much more efficient than previous work. The summarization returns exact results and can be updated incrementally and efficiently.

Type: Application

Filed: November 21, 2012

Publication date: May 22, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Songyun Duan, Achille Belly Fokoue-Nkoutche, Anastasios Kementsietsidis, Wangchao Le, Feifei Li, Kavitha Srinivas
