Patents by Inventor Surajit Chaudhuri
Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20220318221
Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using lightweight offline processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary, and the list is updated by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.
Type: Application
Filed: June 23, 2022
Publication date: October 6, 2022
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yeye HE, Jie SONG, Yue WANG, Surajit CHAUDHURI, Vishal Kumar Seshagirirao ANIL, Yaron Y. GOLAND, Gaurav MALHOTRA, Blake LASSITER
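The selection step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the application: the function names, the three-level toy pattern language, and the in-memory `lake` dictionary are all assumptions, and the sketch scans lake columns directly instead of using a precomputed statistical summary.

```python
import re

def candidate_patterns(value):
    """Yield hypothetical regex patterns for one value, most specific first."""
    # Level 1: the literal value itself.
    yield re.escape(value)
    # Level 2: digits generalized, everything else kept literal.
    yield "".join(r"\d" if c.isdigit() else re.escape(c) for c in value)
    # Level 3: digits and ASCII letters both generalized.
    yield "".join(
        r"\d" if c.isdigit()
        else "[A-Za-z]" if (c.isascii() and c.isalpha())
        else re.escape(c)
        for c in value
    )

def matches_column(pattern, column):
    """True when every value in the column fully matches the pattern."""
    return all(re.fullmatch(pattern, v) for v in column)

def pick_pattern(exemplar_column, lake):
    """Pick a tagging pattern for the exemplar column (assumed heuristic)."""
    # Candidates are derived from the first exemplar value, deduplicated.
    candidates = list(dict.fromkeys(candidate_patterns(exemplar_column[0])))
    # Drop patterns that under-generalize: they miss some exemplar value.
    candidates = [p for p in candidates if matches_column(p, exemplar_column)]
    if not candidates:
        return None
    # Among the survivors, prefer the one matching the fewest lake columns.
    return min(
        candidates,
        key=lambda p: sum(matches_column(p, col) for col in lake.values()),
    )

# Toy data lake: column name -> list of values (an assumption for the demo).
lake = {
    "ids": ["AB-123", "AB-456"],
    "codes": ["XY-999"],
    "names": ["alice"],
}
pattern = pick_pattern(["AB-123", "AB-456"], lake)
```

With this toy data, the literal pattern is dropped because it misses the second exemplar value, and the digit-only generalization is preferred over the letter-and-digit one because it matches fewer columns in the lake.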
-
Patent number: 11397716
Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using lightweight offline processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary, and the list is updated by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.
Type: Grant
Filed: November 19, 2020
Date of Patent: July 26, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yeye He, Jie Song, Yue Wang, Surajit Chaudhuri, Vishal Kumar Seshagirirao Anil, Yaron Y. Goland, Gaurav Malhotra, Blake Lassiter
-
Publication number: 20220152474
Abstract: A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and more efficient. Usage of or communication regarding a data store is monitored and keywords are extracted from the usage or communication. The keywords are then written to or otherwise associated with metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher.
Type: Application
Filed: November 23, 2021
Publication date: May 19, 2022
Inventors: John C. PLATT, Surajit CHAUDHURI, Lev NOVIK, Henricus Johannes Maria MEIJER
-
Publication number: 20220156242
Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using lightweight offline processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary, and the list is updated by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.
Type: Application
Filed: November 19, 2020
Publication date: May 19, 2022
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yeye HE, Jie SONG, Yue WANG, Surajit CHAUDHURI, Vishal Kumar Seshagirirao ANIL, Yaron Y. GOLAND, Gaurav MALHOTRA, Blake LASSITER
-
Publication number: 20220058205
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.
Type: Application
Filed: November 8, 2021
Publication date: February 24, 2022
Inventors: Yeye HE, Kris GANJAM, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI, Xu CHU
-
Publication number: 20210406744
Abstract: A model generator constructs a model for estimating selectivity of database operations by determining a number of training examples necessary for the model to achieve a target accuracy and by generating approximate selectivity labels for the training examples. The model generator may train the model on an initial number of training examples using cross-validation. The model generator may determine whether the model satisfies the target accuracy and iteratively and geometrically increase the number of training examples based on an optimized geometric step size (which may minimize model construction time) until the model achieves the target accuracy based on a defined confidence level. The model generator may generate labels using a subset of tuples from an intermediate query expression. The model generator may iteratively increase a size of the subset of tuples used until a relative error of the generated labels is below a target threshold.
Type: Application
Filed: June 30, 2020
Publication date: December 30, 2021
Inventors: Anshuman DUTT, Chi WANG, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
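The geometric growth loop in this abstract might look something like the sketch below. It is only an illustration under stated assumptions: `evaluate` is a hypothetical callback standing in for training plus cross-validation, the step size is a fixed input (the abstract's optimized step size and confidence-level test are not modeled), and `max_n` is an added safety cap.

```python
import math

def grow_training_set(evaluate, initial_n, step, target_accuracy, max_n):
    """Grow the training-set size geometrically until the target is met.

    evaluate(n) is a hypothetical callback that trains on n examples and
    returns a cross-validated accuracy in [0, 1].
    Returns (n, accuracy) for the first size meeting the target, or for
    max_n if the target is never reached.
    """
    if step <= 1:
        raise ValueError("step must be greater than 1 for geometric growth")
    n = initial_n
    while True:
        accuracy = evaluate(n)
        if accuracy >= target_accuracy or n >= max_n:
            return n, accuracy
        # Geometric increase: multiply the size by the step factor.
        n = min(max_n, math.ceil(n * step))

# Stand-in accuracy curve for the demo (an assumption, not real model data).
def toy_accuracy(n):
    return 1 - 10 / n

final_n, final_accuracy = grow_training_set(
    toy_accuracy, initial_n=50, step=2, target_accuracy=0.94, max_n=10_000
)
```

Because each round multiplies the size by a constant factor, the total work is dominated by the last round, which is one reason a geometric schedule is a natural fit when the required number of examples is unknown in advance.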
-
Patent number: 11202958
Abstract: A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and more efficient. Usage of or communication regarding a data store is monitored and keywords are extracted from the usage or communication. The keywords are then written to or otherwise associated with metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher.
Type: Grant
Filed: April 11, 2012
Date of Patent: December 21, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: John C. Platt, Surajit Chaudhuri, Lev Novik, Henricus Johannes Maria Meijer
-
Publication number: 20210374134
Abstract: A transform-by-pattern (TBP) system is configured to proactively suggest relevant TBP programs based on an inputted source dataset and target dataset without requiring users to type in examples. The TBP system has access to multiple TBP programs, each of which includes a combination of a source pattern, a target pattern, and a transformation program that is configured to transform data that fits into the source pattern into data that fits into the target pattern. When a source dataset and a target dataset are received from a user, the TBP system identifies a subset of the source dataset and a subset of the target dataset as related data. The TBP system then identifies one or more applicable TBP programs amongst the multiple TBP programs, and suggests or applies at least one of the one or more applicable TBP programs.
Type: Application
Filed: May 29, 2020
Publication date: December 2, 2021
Inventors: Yeye HE, Surajit CHAUDHURI, Zhongjun JIN
-
Patent number: 11182360
Abstract: Systems, methods, and computer-executable instructions for reorganizing a physical layout of data of a database. A workload is selected from previously executed database operations. A total resource consumption of the previously executed database operations and of the workload is determined. The total resource consumption of the workload is more than a predetermined threshold of the total resource consumption of the previously executed database operations. Optimization operations for the database are determined using the workload. A cloned database of the database is created. The optimization operations are executed on the cloned database. A database operation is received for the database. The database operation is executed on the database and the cloned database. The performance of the cloned database is verified as being improved compared to the performance of the database based on the executing of the database operation on the database and the cloned database.
Type: Grant
Filed: January 14, 2019
Date of Patent: November 23, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Sudipto Das, Vivek R Narasayya, Gaoxiang Xu, Surajit Chaudhuri, Andrija Jovanovic, Miodrag Radulovic
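One way to read the workload-selection condition in this abstract is sketched below. The greedy most-expensive-first order, the `(name, cost)` tuples, and the default threshold of 0.8 are assumptions made for illustration; the abstract only states that the selected workload's consumption exceeds a predetermined threshold of the total.

```python
def select_workload(operations, threshold=0.8):
    """Select operations until their share of total cost exceeds threshold.

    operations: list of (name, resource_cost) pairs for previously
    executed database operations (a hypothetical representation).
    """
    total = sum(cost for _, cost in operations)
    selected, covered = [], 0.0
    # Greedy assumption: take the most expensive operations first.
    for name, cost in sorted(operations, key=lambda op: op[1], reverse=True):
        if total == 0 or covered / total > threshold:
            break
        selected.append(name)
        covered += cost
    return selected

history = [("q1", 50), ("q2", 30), ("q3", 15), ("q4", 5)]
workload = select_workload(history, threshold=0.8)
```

In this toy history, `q1` and `q2` together account for exactly 80% of the cost, which does not strictly exceed the threshold, so `q3` is added as well before selection stops.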
-
Patent number: 11170020
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.
Type: Grant
Filed: November 4, 2016
Date of Patent: November 9, 2021
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
-
Patent number: 11163788
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. An index to identify a plurality of data transformation tools that are relevant to the set of example values is referenced, wherein each of the data transformation tools corresponds with one or more tool examples. The data transformation tools are ranked based on an extent of similarity between the set of example values and the tool examples. For data transformation tools associated with the extent of similarity that exceeds a similarity threshold, a transformation program is generated that uses the data transformation tool and a supplemental transformation tool to transform the one or more example input values to the desired form in which to transform data.
Type: Grant
Filed: November 4, 2016
Date of Patent: November 2, 2021
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
-
Publication number: 20210319023
Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing and implementing operator trees based on a received query. For example, systems disclosed herein may generate an operator tree based on a received query. The systems described herein may systematically analyze the impact of bitvector filters in optimizing a join order of the operator tree to generate an optimized operator tree. The systems described herein may further implement the bit-vector aware operator tree by providing the optimized operator tree to an execution engine for further processing.
Type: Application
Filed: June 30, 2020
Publication date: October 14, 2021
Inventors: Bailu DING, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
-
Patent number: 11138266
Abstract: Systems and techniques for leveraging query executions to improve index recommendations are described herein. In an example, a machine learning model is adapted to receive a first query plan and a second query plan for performing a query with a database, where the first query plan is different from the second query plan. The machine learning model may be further adapted to determine execution cost efficiency between the first query plan and the second query plan. The machine learning model is trained using relative execution cost comparisons between a set of pairs of query plans for the database. The machine learning model is further adapted to output a ranking of the first query plan and second query plan, where the first query plan and second query plan are ranked based on execution cost efficiency.
Type: Grant
Filed: February 21, 2019
Date of Patent: October 5, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Bailu Ding, Sudipto Das, Surajit Chaudhuri, Vivek R Narasayya, Ryan Marcus, Lin Ma, Adith Swaminathan
-
Patent number: 11093494
Abstract: Methods and systems for joining two tables are provided. At least two tables to be joined are received. A joinable row pair between the at least two tables is determined. The determined joinable row pair includes a first row from a first table having a common string value with a second row from a second table of the at least two tables. A transformation model is generated from the determined joinable row pair. A column of the first table is transformed based on the generated transformation model. The transformed first table is joined with the second table.
Type: Grant
Filed: April 6, 2017
Date of Patent: August 17, 2021
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Yeye He, Surajit Chaudhuri, Er Kang Zhu
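The transform-then-join flow in this abstract can be illustrated with a deliberately simple sketch. Everything specific here is an assumption: the "transformation model" is reduced to a single split rule (delimiter plus piece index), the delimiter list is hypothetical, and tables are plain lists of dictionaries.

```python
# Hypothetical delimiters to try when learning a split rule.
DELIMITERS = [" ", ",", "-", "_", "/", "@"]

def learn_split_rule(left_values, right_values):
    """Find a (delimiter, index) rule from the first joinable row pair.

    A pair is treated as joinable when some piece of a left value equals
    a right value, i.e. the two rows share a common string.
    """
    right_set = set(right_values)
    for value in left_values:
        for delim in DELIMITERS:
            for index, piece in enumerate(value.split(delim)):
                if piece != value and piece in right_set:
                    return delim, index
    return None

def apply_rule(value, rule):
    """Apply the learned split rule to one value; None if it does not fit."""
    delim, index = rule
    parts = value.split(delim)
    return parts[index] if index < len(parts) else None

def transform_and_join(left_rows, right_rows, left_key, right_key):
    """Transform the left key column with the learned rule, then hash-join."""
    rule = learn_split_rule(
        [row[left_key] for row in left_rows],
        [row[right_key] for row in right_rows],
    )
    if rule is None:
        return []
    # Index the right table by its key for an equality join.
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left_rows:
        key = apply_rule(row[left_key], rule)
        for match in right_index.get(key, []):
            joined.append({**row, **match})
    return joined

staff = [
    {"email": "ann@example.com", "dept": "db"},
    {"email": "bob@example.com", "dept": "ml"},
]
offices = [
    {"user": "ann", "office": "A1"},
    {"user": "bob", "office": "B2"},
]
joined_rows = transform_and_join(staff, offices, "email", "user")
```

Here the rule learned from the first joinable pair (`"ann@example.com"` and `"ann"`) is "split on `@`, take piece 0", and applying it to the whole `email` column lets both rows join.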
-
Patent number: 10963471
Abstract: A location associated with a user of a computing device and a prefix portion of an input string may be received as one or more successive characters of the input string are provided by the user via the computing device. A list of suggested items may be obtained based on a function of respective recommendation indicators and proximities of the items to the location in response to receiving the prefix portion, and based on partially traversing a character string search structure having a plurality of non-terminal nodes augmented with bound indicators associated with spatial regions. The list of suggested items and descriptive information associated with each suggested item may be returned to the user, in response to receiving the prefix portion, for rendering an image illustrating indicators associated with the list in a manner relative to the location, as the user provides each successive character of the input string.
Type: Grant
Filed: December 18, 2018
Date of Patent: March 30, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Senjuti Basu Roy
-
Patent number: 10896229
Abstract: The present invention extends to methods, systems, and computer program products for computing features of structured data. Aspects of the invention include computing features of table components (e.g., of rows, columns, cells, etc.). Computed features can be used for ranking the table components. When aggregated, features for different components of a table can be used for ranking the table (e.g., a web table).
Type: Grant
Filed: November 12, 2018
Date of Patent: January 19, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
-
Publication number: 20210011926
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a transformation function is executed using an example input value to obtain an initial output value. Thereafter, a plurality of supplemental transformation tools is applied to the initial output value to generate a plurality of intermediary output values. Based on a comparison of each of the intermediary output values to an example output value, the supplemental transformation tool that generated an intermediary output value having a greatest extent of similarity to the example output values is identified. The identified supplemental transformation tool and the transformation function are used to generate a transformation program that transforms the example input values to the desired form in which to transform data.
Type: Application
Filed: September 9, 2020
Publication date: January 14, 2021
Inventors: Yeye HE, Kris Ganjam, Vivek Ravindranath NARASAYYA, Surajit Chaudhuri
-
Patent number: 10853332
Abstract: Systems, methods, and computer-executable instructions for partitioning a data set include receiving anchor attributes of a data set. The data set includes records, with each record including attributes. A set of filter attributes that are not mutually exclusive with any of the anchor attributes is determined. A set of candidate attributes that include each unique attribute from the first data set, excluding the anchor attributes and the filter attributes, is determined. For each of the anchor attributes and the candidate attributes, an attribute context is determined. For each of the candidate attributes, a context similarity between each of the anchor attributes is determined. A new anchor attribute is selected from the set of candidate attributes based on the context similarity.
Type: Grant
Filed: April 19, 2018
Date of Patent: December 1, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Lev Novik, Surajit Chaudhuri, Yeye He
-
Patent number: 10853344
Abstract: The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject tuple (e.g., a subject column) for a table, detecting a tuple header (e.g., a column header) using other tables, and detecting a tuple header (e.g., a column header) using a knowledge base. Implementations can be utilized in a structured data search system (SDSS) that indexes structured information, such as tables in a relational database or HTML tables extracted from web pages. The SDSS allows users to search over the structured information (tables) using different mechanisms including keyword search and data finding data.
Type: Grant
Filed: July 27, 2017
Date of Patent: December 1, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhongyuan Wang, Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
-
Patent number: 10838957
Abstract: A relational database server may concurrently execute many relational queries, but a complex relational query may cause performance delays in the fulfillment of other relational queries. Instead, the relational database server may generate a query plan for the relational query, and may endeavor to partition the relational query between a spool operator and a scan operator into two or more query slices, where each query slice may be executed within a query slice threshold. Many alternative candidate query plans may be considered, such as inserting spool and scan operators after various operators and parameterizing operators in order to partition the records of a relation into two or more ranges based on an attribute of the relation. A large search space of candidate query plans may be reviewed in order to select a query plan that respects the query slice threshold while efficiently executing the logic of the relational query.
Type: Grant
Filed: June 17, 2010
Date of Patent: November 17, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Nicolas Bruno, Ravishankar Ramamurthy, Surajit Chaudhuri, Vivek Ravindranath Narasayya