Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10824592
    Abstract: Generally discussed herein are devices, systems, and methods for database management. A method may include determining a first hyperloglog (HLL) sketch of a first column of data, determining a second HLL sketch of a second column of data, estimating an inclusion coefficient based on the first and second HLL sketches, and performing operations on the first column of data or the second column of data in response to determining the inclusion coefficient is greater than, or equal to, a specified threshold.
    Type: Grant
    Filed: June 14, 2018
    Date of Patent: November 3, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Azade Nazi, Bolin Ding, Vivek R Narasayya, Surajit Chaudhuri
  • Patent number: 10810181
    Abstract: The present invention extends to methods, systems, and computer program products for refining structured data indexes. Aspects of the invention include associating structured data, such as, for example, tables, with additional content. Additional content can include content outside the <table> and </table> tags of a web table. Indexes for structured data (e.g., table indexes) can be refined based on the additional content to improve the relevance of providing parts of the structured data (e.g., parts of the table) in search results.
    Type: Grant
    Filed: April 11, 2018
    Date of Patent: October 20, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Patent number: 10810202
    Abstract: Systems, methods, and computer-executable instructions for creating a query execution plan for a query of a database include receiving, from the database, a set of previously executed query execution plans for the query. Each previously executed query execution plan includes subplans. Each subplan indicates a tree of physical operators. Physical operators that executed in the set of previously executed query execution plans are determined. For each physical operator, an execution cost is determined. Physical operators from the previously executed query execution plans that are invalid for the database are removed. Equivalent subplans from the previously executed query execution plans are identified based on physical properties and logical expressions of the subplans. A constrained search space is created based on the equivalent subplans. A query execution plan for the query is constructed from the constrained search space based on the execution cost.
    Type: Grant
    Filed: June 14, 2018
    Date of Patent: October 20, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, Vivek R Narasayya
  • Publication number: 20200320093
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. A repository of transformation tools is searched to identify a new transformation tool as relevant to a data transformation associated with the received set of example values. The repository includes annotations associated with the new transformation tool. The new transformation tool is used to generate a transformation program that produces transformed output values. Additional annotations are generated for the new transformation tool based on the transformed output values.
    Type: Application
    Filed: June 19, 2020
    Publication date: October 8, 2020
    Inventors: Kris Ganjam, Yeye He, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Patent number: 10776375
    Abstract: Various technologies that facilitate performance of a data finding data (DFD) search are described herein. A user specifies entities, for example, by entering the entities into a query field, selecting the entities from a computer-executable application, or the like. The user further specifies an attribute of the entities that is of interest. A query is constructed based upon the entities and the attribute, and a search for tables is performed based upon the entities and the attribute. Values of the attribute for the selected entities are identified in a table, and the values of the attribute are returned.
    Type: Grant
    Filed: May 21, 2014
    Date of Patent: September 15, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kris Ganjam, Zhimin Chen, Kaushik Chakrabarti, Surajit Chaudhuri, Vivek Narasayya, James Finnigan, Kanstantsyn Zoryn
  • Patent number: 10776380
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a transformation function is executed using an example input value to obtain an initial output value. Thereafter, a plurality of supplemental transformation tools is applied to the initial output value to generate a plurality of intermediary output values. Based on a comparison of each of the intermediary output values to an example output value, the supplemental transformation tool that generated an intermediary output value having a greatest extent of similarity to the example output values is identified. The identified supplemental transformation tool and the transformation function are used to generate a transformation program that transforms the example input values to the desired form in which to transform data.
    Type: Grant
    Filed: October 21, 2016
    Date of Patent: September 15, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Publication number: 20200272667
    Abstract: Systems and techniques for leveraging query executions to improve index recommendations are described herein. In an example, a machine learning model is adapted to receive a first query plan and a second query plan for performing a query with a database, where the first query plan is different from the second query plan. The machine learning model may be further adapted to determine execution cost efficiency between the first query plan and the second query plan. The machine learning model is trained using relative execution cost comparisons between a set of pairs of query plans for the database. The machine learning model is further adapted to output a ranking of the first query plan and second query plan, where the first query plan and second query plan are ranked based on execution cost efficiency.
    Type: Application
    Filed: February 21, 2019
    Publication date: August 27, 2020
    Inventors: Bailu Ding, Sudipto Das, Surajit Chaudhuri, Vivek R Narasayya, Ryan Marcus, Lin Ma, Adith Swaminathan
  • Patent number: 10740328
    Abstract: A processing unit can determine a first subset of a data set including data records selected based on measure values thereof. The processing unit can determine an index mapping a predicate to data records associated with that predicate and approximation values of the records. The processing unit can process a query against the first subset to provide a first result and a first accuracy value, determine that the first accuracy value does not satisfy an accuracy criterion, and process the query against the index. In some examples, the processing unit can process the query against a second subset including data records satisfying a predetermined predicate. In some examples, the processing unit can receive data records and determine the first subset. Data records can include respective measure values. Data records with higher measure values can occur in the first subset more frequently than data records with lower measure values.
    Type: Grant
    Filed: June 24, 2016
    Date of Patent: August 11, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bolin Ding, Silu Huang, Chi Wang, Kaushik Chakrabarti, Surajit Chaudhuri
  • Publication number: 20200242127
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.
    Type: Application
    Filed: April 13, 2020
    Publication date: July 30, 2020
    Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Publication number: 20200226109
    Abstract: Systems, methods, and computer-executable instructions for reorganizing a physical layout of data of a database are described. A workload is selected from previously executed database operations. A total resource consumption of the previously executed database operations and of the workload is determined. The total resource consumption of the workload is more than a predetermined threshold of the total resource consumption of the previously executed database operations. Optimization operations for the database are determined using the workload. A cloned database of the database is created. The optimization operations are executed on the cloned database. A database operation is received for the database. The database operation is executed on the database and the cloned database. The performance of the cloned database is verified as being improved compared to the performance of the database based on the executing of the database operation on the database and the cloned database.
    Type: Application
    Filed: January 14, 2019
    Publication date: July 16, 2020
    Inventors: Sudipto Das, Vivek R. Narasayya, Gaoxiang Xu, Surajit Chaudhuri, Andrija Jovanovic, Miodrag Radulovic
  • Patent number: 10706066
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. A repository of transformation tools is searched to identify a new transformation tool as relevant to a data transformation associated with the received set of example values. The repository includes annotations associated with the new transformation tool. The new transformation tool is used to generate a transformation program that produces transformed output values. Additional annotations are generated for the new transformation tool based on the transformed output values.
    Type: Grant
    Filed: October 17, 2016
    Date of Patent: July 7, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kris Ganjam, Yeye He, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Patent number: 10685020
    Abstract: In some embodiments, the disclosed subject matter involves a server query optimizer for parametric query optimization (PQO) to address the problem of finding and reusing a relatively small number of query plans that can achieve good plan quality across multiple instances of a parameterized query. An embodiment processes query instances on-line and ensures (a) tight, bounded cost sub-optimality for each instance, (b) low optimization overheads, and (c) only a small number of plans need to be stored. A plan re-costing based approach is disclosed to provide good performance on all three metrics. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 2, 2017
    Date of Patent: June 16, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Surajit Chaudhuri, Anshuman Dutt, Vivek R Narasayya
  • Patent number: 10621195
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.
    Type: Grant
    Filed: September 20, 2016
    Date of Patent: April 14, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Publication number: 20200065712
    Abstract: In automated machine learning, an approximate best configuration can be selected among multiple candidate machine-learning configurations by progressively sampling training and test datasets for the iterative training and testing of the configurations while progressively pruning the set of candidate configurations based on associated estimated confidence intervals for their respective performance.
    Type: Application
    Filed: August 23, 2018
    Publication date: February 27, 2020
    Inventors: Chi Wang, Silu Huang, Surajit Chaudhuri, Bolin Ding
  • Publication number: 20190384830
    Abstract: Generally discussed herein are devices, systems, and methods for database management. A method may include determining a first hyperloglog (HLL) sketch of a first column of data, determining a second HLL sketch of a second column of data, estimating an inclusion coefficient based on the first and second HLL sketches, and performing operations on the first column of data or the second column of data in response to determining the inclusion coefficient is greater than, or equal to, a specified threshold.
    Type: Application
    Filed: June 14, 2018
    Publication date: December 19, 2019
    Inventors: Azade Nazi, Bolin Ding, Vivek R. Narasayya, Surajit Chaudhuri
  • Publication number: 20190384844
    Abstract: Systems, methods, and computer-executable instructions for creating a query execution plan for a query of a database include receiving, from the database, a set of previously executed query execution plans for the query. Each previously executed query execution plan includes subplans. Each subplan indicates a tree of physical operators. Physical operators that executed in the set of previously executed query execution plans are determined. For each physical operator, an execution cost is determined. Physical operators from the previously executed query execution plans that are invalid for the database are removed. Equivalent subplans from the previously executed query execution plans are identified based on physical properties and logical expressions of the subplans. A constrained search space is created based on the equivalent subplans. A query execution plan for the query is constructed from the constrained search space based on the execution cost.
    Type: Application
    Filed: June 14, 2018
    Publication date: December 19, 2019
    Inventors: Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, Vivek R. Narasayya
  • Publication number: 20190378028
    Abstract: Implementations are presented for utilizing probabilistic predicates (PPs) to speed up searches requiring machine learning inferences. One method includes receiving a search query comprising a predicate for filtering blobs in a database utilizing a user-defined function (UDF). The filtering requires analysis of the blobs by the UDF to determine blobs that pass the filtering. Further, the method includes determining a PP sequence of PPs based on the predicate. Each PP is a classifier that calculates a PP-blob probability of satisfying a PP clause. The PP sequence defines an expression to combine the PPs. Further, the method includes operations for performing the PP sequence to determine a blob probability that the blob satisfies the expression, determining which blobs meet an accuracy threshold, discarding the blobs with the blob probability less than the accuracy threshold, and executing the database query over the blobs that have not been discarded. The results are then presented.
    Type: Application
    Filed: June 8, 2018
    Publication date: December 12, 2019
    Inventors: Surajit Chaudhuri, Srikanth Kandula, Yao Lu
  • Patent number: 10503704
    Abstract: Techniques for tenant performance isolation in a multiple-tenant database management system are described. These techniques may include providing a reservation of server resources. The server resources reservation may include a reservation of a central processing unit (CPU), a reservation of Input/Output throughput, and/or a reservation of buffer pool memory or working memory. The techniques may also include a metering mechanism that determines whether the resource reservation is satisfied. The metering mechanism may be independent of an actual resource allocation mechanism associated with the server resource reservation.
    Type: Grant
    Filed: August 31, 2016
    Date of Patent: December 10, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Vivek R. Narasayya, Sudipto Das, Feng Li, Manoj A. Syamala, Hyunjung Park, Surajit Chaudhuri, Badrish Chandramouli
  • Patent number: 10496643
    Abstract: One or more approximations of query output in a data analytics platform are controlled. The one or more approximations are controlled by generating values of error metrics associated with placements of samplers in one or more query execution plans associated with the query, and injecting a plurality of samplers into the query execution plans, using the determined values of the error metrics, in lieu of storing samples of input to the query prior to execution of the query.
    Type: Grant
    Filed: February 8, 2016
    Date of Patent: December 3, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Srikanth Kandula, Surajit Chaudhuri, Bolin Ding, Anil Atmanand Shanbhag, Aleksandar Vitorović, Matthaios Olma, Robert Grandl
  • Publication number: 20190325046
    Abstract: Systems, methods, and computer-executable instructions for partitioning a data set include receiving anchor attributes of a data set. The data set includes records, with each record including attributes. A set of filter attributes that are not mutually exclusive with any of the anchor attributes is determined. A set of candidate attributes that includes each unique attribute from the data set excluding the anchor attributes and the filter attributes is determined. For each of the anchor attributes and the candidate attributes, an attribute context is determined. For each of the candidate attributes, a context similarity with each of the anchor attributes is determined. A new anchor attribute is selected from the set of candidate attributes based on the context similarity.
    Type: Application
    Filed: April 19, 2018
    Publication date: October 24, 2019
    Inventors: Lev Novik, Surajit Chaudhuri, Yeye He
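
As an illustration of the HLL abstract in this listing (patent 10824592), the Python sketch below estimates an inclusion coefficient from two HyperLogLog sketches and compares it to a threshold. It is a minimal sketch under stated assumptions, not the estimator claimed in the patent: it assumes the inclusion coefficient of column X in column Y is |X ∩ Y| / |X|, and it swaps in a simple inclusion-exclusion estimate of the intersection computed from a merged sketch. The function names, the register count, the sample columns, and the 0.9 threshold are hypothetical.

```python
import hashlib
import math

P = 14          # assumed precision: 2**14 registers per sketch
M = 1 << P


def hll_sketch(values):
    """Build a HyperLogLog sketch (list of M registers) for a column."""
    regs = [0] * M
    for v in values:
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash
        idx = h >> (64 - P)                          # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)             # remaining 64-P bits
        rank = (64 - P) - rest.bit_length() + 1      # position of leftmost 1 bit
        if rank > regs[idx]:
            regs[idx] = rank
    return regs


def hll_count(regs):
    """Estimate the number of distinct values summarized by a sketch."""
    m = len(regs)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)               # small-range correction
    return raw


def hll_union(a, b):
    """Sketch of the union of two columns: register-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def inclusion_coefficient(sk_x, sk_y):
    """Estimate |X intersect Y| / |X| by inclusion-exclusion (an assumption)."""
    nx = hll_count(sk_x)
    ny = hll_count(sk_y)
    nu = hll_count(hll_union(sk_x, sk_y))
    inter = max(0.0, nx + ny - nu)
    return min(1.0, inter / nx) if nx else 0.0


if __name__ == "__main__":
    THRESHOLD = 0.9                                  # hypothetical threshold
    col_a = hll_sketch(range(5_000, 15_000))         # hypothetical column
    col_b = hll_sketch(range(5_000, 30_000))         # hypothetical column
    phi = inclusion_coefficient(col_a, col_b)
    print(f"estimated inclusion coefficient: {phi:.3f}")
    if phi >= THRESHOLD:
        print("col_a is likely contained in col_b")
```

Because the intersection is obtained by subtracting three noisy estimates, the result is least reliable when the true overlap is small relative to the column sizes, so a threshold comparison of this kind is best treated as a filter for candidates rather than as proof of containment.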