Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220318221
    Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using a lightweight off-line processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary and updates the list by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.
    Type: Application
    Filed: June 23, 2022
    Publication date: October 6, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yeye HE, Jie SONG, Yue WANG, Surajit CHAUDHURI, Vishal Kumar Seshagirirao ANIL, Yaron Y. GOLAND, Gaurav MALHOTRA, Blake LASSITER
  • Patent number: 11397716
    Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using a lightweight off-line processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary and updates the list by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.
    Type: Grant
    Filed: November 19, 2020
    Date of Patent: July 26, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Jie Song, Yue Wang, Surajit Chaudhuri, Vishal Kumar Seshagirirao Anil, Yaron Y. Goland, Gaurav Malhotra, Blake Lassiter
  • Publication number: 20220152474
    Abstract: A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and efficient. Usage of or communication regarding a data store are monitored and keywords are extracted from the usage or communication. The keywords are then written to otherwise associated with metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher.
    Type: Application
    Filed: November 23, 2021
    Publication date: May 19, 2022
    Inventors: John C. PLATT, Surajit CHAUDHURI, Lev NOVIK, Henricus Johannes Maria MEIJER
  • Publication number: 20220156242
    Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using a lightweight off-line processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary and updates the list by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.
    Type: Application
    Filed: November 19, 2020
    Publication date: May 19, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yeye HE, Jie SONG, Yue WANG, Surajit CHAUDHURI, Vishal Kumar Seshagirirao ANIL, Yaron Y. GOLAND, Gaurav MALHOTRA, Blake LASSITER
  • Publication number: 20220058205
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.
    Type: Application
    Filed: November 8, 2021
    Publication date: February 24, 2022
    Inventors: Yeye HE, Kris GANJAM, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI, Xu CHU
  • Publication number: 20210406744
    Abstract: A model generator constructs a model for estimating selectivity of database operations by determining a number of training examples necessary for the model to achieve a target accuracy and by generating approximate selectivity labels for the training examples. The model generator may train the model on an initial number of training examples using cross-validation. The model generator may determine whether the model satisfies the target accuracy and iteratively and geometrically increase the number of training examples based on an optimized geometric step size (which may minimize model construction time) until the model achieves the target accuracy based on a defined confidence level. The model generator may generate labels using a subset of tuples from an intermediate query expression. The model generator may iteratively increase a size of the subset of tuples used until a relative error of the generated labels is below a target threshold.
    Type: Application
    Filed: June 30, 2020
    Publication date: December 30, 2021
    Inventors: Anshuman DUTT, Chi WANG, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
  • Patent number: 11202958
    Abstract: A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and efficient. Usage of or communication regarding a data store are monitored and keywords are extracted from the usage or communication. The keywords are then written to otherwise associated with metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher.
    Type: Grant
    Filed: April 11, 2012
    Date of Patent: December 21, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: John C. Platt, Surajit Chaudhuri, Lev Novik, Henricus Johannes Maria Meijer
  • Publication number: 20210374134
    Abstract: A transform-by-pattern (TBP) system is configured to proactively suggest relevant TBP programs based on inputted source dataset and target dataset without requiring users typing in examples. The TBP system has access to multiple TBP programs, each of which includes a combination of a source pattern, a target pattern, and a transformation program that is configured to transform data that fits into the target pattern into data that fits into the source pattern. When a source dataset and a target dataset are received from a user, the TBP system identifies a subset of the source dataset and a subset of the target dataset as related data. The TBP system then identifies one or more applicable TBP programs amongst the multiple TBP programs, and suggest or apply at least one of the one or more applicable TBP programs.
    Type: Application
    Filed: May 29, 2020
    Publication date: December 2, 2021
    Inventors: Yeye HE, Surajit CHAUDHURI, Zhongjun JIN
  • Patent number: 11182360
    Abstract: Systems, methods, and computer-executable instructions for reorganizing a physical layout of data of a database a database. A workload is selected from previously executed database operations. A total resource consumption of the previously executed database operations and of the workload is determined. The total resource consumption of the workload is more than a predetermined threshold of the total resource consumption of the previously executed database operations. Optimization operations for the database are determined using the workload. A cloned database of the database is created. The optimization operations are executed on the cloned database. A database operation is received for the database. The database operation is executed on the database and the cloned database. The performance of the cloned database is verified as being improved compared to the performance of the database based on the executing of the database operation on the database and the cloned database.
    Type: Grant
    Filed: January 14, 2019
    Date of Patent: November 23, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Sudipto Das, Vivek R Narasayya, Gaoxiang Xu, Surajit Chaudhuri, Andrija Jovanovic, Miodrag Radulovic
  • Patent number: 11170020
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.
    Type: Grant
    Filed: November 4, 2016
    Date of Patent: November 9, 2021
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
  • Patent number: 11163788
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. An index to identify a plurality of data transformation tools that are relevant to the set of example values is referenced, wherein each of the data transformation tools correspond with one or more tool examples. The data transformation tools are ranked based on an extent of similarity between the set of example values and the tool examples. For data transformation tools associated with the extent of similarity that exceeds a similarity threshold, a transformation program is generated that uses the data transformation tool and a supplemental transformation tool to transform the one or more example input values to the desired form in which to transform data.
    Type: Grant
    Filed: November 4, 2016
    Date of Patent: November 2, 2021
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
  • Publication number: 20210319023
    Abstract: the present disclosure relates to systems, methods, and computer-readable media for optimizing and implementing operator trees based on a received query. For example, systems disclosed herein may generate an operator tree based on a received query. The systems described herein may systematically analyze the impact of bitvector filters in optimizing a join order of the operator tree to generate an optimized operator tree. The systems described herein may further implement the bit-vector aware operator tree by providing the optimized operator tree to an execution engine for further processing.
    Type: Application
    Filed: June 30, 2020
    Publication date: October 14, 2021
    Inventors: Bailu DING, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
  • Patent number: 11138266
    Abstract: Systems and techniques for leveraging query executions to improve index recommendations are described herein. In an example, a machine learning model is adapted to receive a first query plan and a second query plan for performing a query with a database, where the first query plan is different from the second query plan. The machine learning model may be further adapted to determine execution cost efficiency between the first query plan and the second query plan. The machine learning model is trained using relative execution cost comparisons between a set of pairs of query plans for the database. The machine learning model is further adapted to output a ranking of the first query plan and second query plan, where the first query plan and second query plan are ranked based on execution cost efficiency.
    Type: Grant
    Filed: February 21, 2019
    Date of Patent: October 5, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bailu Ding, Sudipto Das, Surajit Chaudhuri, Vivek R Narasayya, Ryan Marcus, Lin Ma, Adith Swaminathan
  • Patent number: 11093494
    Abstract: Methods and systems for joining two tables are provided. At least two tables to be joined are received. A joinable row pair between the at least two tables is determined. The determined joinable row pair includes a first row from a first table having a common string value with a second row from a second table of the at least two tables. A transformation model is generated from the determined joinable row pair. A column of the first table is transformed based on the generated transformation model. The transformed first table is joined with the second table.
    Type: Grant
    Filed: April 6, 2017
    Date of Patent: August 17, 2021
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Yeye He, Surajit Chaudhuri, Er Kang Zhu
  • Patent number: 10963471
    Abstract: A location associated with a user of a computing device and a prefix portion of an input string may be received as one or more successive characters of the input string are provided by the user via the computing device. A list of suggested items may be obtained based on a function of respective recommendation indicators and proximities of the items to the location in response to receiving the prefix portion, and based on partially traversing a character string search structure having a plurality of non-terminal nodes augmented with bound indicators associated with spatial regions. The list of suggested items and descriptive information associated with each suggested item may be returned to the user, in response to receiving the prefix portion, for rendering an image illustrating indicators associated with the list in a manner relative to the location, as the user provides each successive character of the input string.
    Type: Grant
    Filed: December 18, 2018
    Date of Patent: March 30, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Senjuti Basu Roy
  • Patent number: 10896229
    Abstract: The present invention extends to methods, systems, and computer program products for computing features of structured data. Aspects of the invention include computing features of table components (e.g., of rows, columns, cells, etc.). Computed features can be used for ranking the table components. When aggregated, features for different components of a table can be used for ranking the table (e.g., a web table).
    Type: Grant
    Filed: November 12, 2018
    Date of Patent: January 19, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Publication number: 20210011926
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a transformation function is executed using an example input value to obtain an initial output value. Thereafter, a plurality of supplemental transformation tools is applied to the initial output value to generate a plurality of intermediary output values. Based on a comparison of each of the intermediary output values to an example output value, the supplemental transformation tool that generated an intermediary output value having a greatest extent of similarity to the example output values is identified. The identified supplemental transformation tool and the transformation function are used to generate a transformation program that transforms the example input values to the desired form in which to transform data.
    Type: Application
    Filed: September 9, 2020
    Publication date: January 14, 2021
    Inventors: Yeye HE, Kris Ganjam, Vivek Ravindraneth NARASAYYA, Surajit Chaudhuri
  • Patent number: 10853332
    Abstract: Systems, methods, and computer-executable instructions for partitioning a data set include receiving anchor attributes of a data set. The data set includes records, with each record including attributes. A set of filter attributes that are not mutually exclusive with any of the anchor attributes is determined. A set of candidate attributes that include each unique attribute from the first data set, excluding the anchor attributes and the filter attributes, is determined. For each of the anchor attributes and the anchor attributes, an attribute context is determined. For each of the candidate attributes, a context similarity between each of the anchor attributes is determined. A new anchor attribute is selected from the set of candidate attributes based on the context similarity.
    Type: Grant
    Filed: April 19, 2018
    Date of Patent: December 1, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Lev Novik, Surajit Chaudhuri, Yeye He
  • Patent number: 10853344
    Abstract: The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject tuple (e.g., a subject column) for a table, detecting a tuple header (e.g., a column header) using other tables, and detecting a tuple header (e.g., a column header) using a knowledge base. Implementations can be utilized in a structured data search system (SDSS) that indexes structured information, such as, tables in a relational database or html tables extracted from web pages. The SDSS allows users to search over the structured information (tables) using different mechanisms including keyword search and data finding data.
    Type: Grant
    Filed: July 27, 2017
    Date of Patent: December 1, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhongyuan Wang, Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Patent number: 10838957
    Abstract: A relational database server may concurrently execute many relational queries, but a complex relational query may cause performance delays in the fulfillment of other relational queries. Instead, the relational database server may generate a query plan for the relational query, and may endeavor to partition the relational query between a spool operator and a scan operator into two or more query slices, where each query slice may be executed within a query slice threshold. Many alternative candidate query plans may be considered, such as inserting spool and scan operators after various operators and parameterizing operators in order to partition the records of a relation into two or more ranges based on an attribute of the relation. A large search space of candidate query plans may be reviewed in order to select a query plan that respects the query slice threshold while efficiently executing the logic of the relational query.
    Type: Grant
    Filed: June 17, 2010
    Date of Patent: November 17, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Nicolas Bruno, Ravishankar Ramamurthy, Surajit Chaudhuri, Vivek Ravindranath Narasayya