Patents by Inventor Yeye He

Yeye He has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11928564
    Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilize the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparation steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.
    Type: Grant
    Filed: October 19, 2022
    Date of Patent: March 12, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Cong Yan
  • Patent number: 11886457
    Abstract: A transform-by-pattern (TBP) system is configured to proactively suggest relevant TBP programs based on an inputted source dataset and target dataset, without requiring users to type in examples. The TBP system has access to multiple TBP programs, each of which includes a combination of a source pattern, a target pattern, and a transformation program that is configured to transform data that fits into the source pattern into data that fits into the target pattern. When a source dataset and a target dataset are received from a user, the TBP system identifies a subset of the source dataset and a subset of the target dataset as related data. The TBP system then identifies one or more applicable TBP programs amongst the multiple TBP programs, and suggests or applies at least one of the one or more applicable TBP programs.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: January 30, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Surajit Chaudhuri, Zhongjun Jin
  • Publication number: 20240028607
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.
    Type: Application
    Filed: September 28, 2023
    Publication date: January 25, 2024
    Inventors: Yeye HE, Kris K. GANJAM, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
  • Patent number: 11880344
    Abstract: Methods and systems for generating multi-operator data transformation pipelines. An example method includes accessing raw data for transformation; receiving a selection of a target table or target visualization, wherein the target table or target visualization is for data other than the raw data; extracting table properties and target constraints; and based on the extracted table properties and target constraints, synthesizing one or more multi-operator data transformation pipelines for transforming the raw data to a generated table or generated visualization.
    Type: Grant
    Filed: May 14, 2021
    Date of Patent: January 23, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Surajit Chaudhuri, Junwen Yang
  • Publication number: 20230368068
    Abstract: The present disclosure relates to systems, methods, and computer-readable media for training and implementing pipeline error detection models to facilitate automated detection of data quality (DQ) issues within recurring data pipelines. For example, systems described herein involve training a pipeline error detection model by first constructing a plurality of DQ constraints for a recurring data pipeline based on ranges of values observed over a history of pipeline executions. The systems may further train the model to predict DQ issues by synthetically applying data variants to historical executions of the recurring data pipeline or to data pipelines having similar characteristics thereto. Once trained, the pipeline error detection model(s) can be applied to new executions of the data pipeline as they become available to quickly and efficiently predict whether a given execution includes a predicted DQ issue therein.
    Type: Application
    Filed: May 12, 2022
    Publication date: November 16, 2023
    Inventors: Yeye HE, Weiwei CUI, Song GE, Haidong ZHANG, Shi HAN, Dongmei ZHANG, Surajit CHAUDHURI
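    Illustration: the first step described in the abstract above, constructing DQ constraints from ranges of values observed over past executions, can be sketched as follows. This is a minimal sketch and not the claimed method: the metric names, the `RangeConstraint` class, and the use of plain min/max bounds are assumptions made for illustration, and the training on synthetic data variants is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeConstraint:
    """Hypothetical DQ constraint: a metric must stay inside [low, high]."""
    metric: str
    low: float
    high: float

    def violated_by(self, value: float) -> bool:
        return not (self.low <= value <= self.high)


def build_constraints(history):
    """Derive one range constraint per metric from past pipeline executions.

    `history` is a list of dicts mapping metric name -> observed value,
    one dict per historical execution (an assumed input format).
    """
    metrics = set().union(*(run.keys() for run in history))
    constraints = []
    for metric in sorted(metrics):
        values = [run[metric] for run in history if metric in run]
        constraints.append(RangeConstraint(metric, min(values), max(values)))
    return constraints


def check_run(constraints, run):
    """Return the names of metrics in a new execution that fall outside
    the historically observed range."""
    return [
        c.metric
        for c in constraints
        if c.metric in run and c.violated_by(run[c.metric])
    ]


history = [
    {"row_count": 100, "null_fraction": 0.01},
    {"row_count": 120, "null_fraction": 0.03},
]
constraints = build_constraints(history)
print(check_run(constraints, {"row_count": 5, "null_fraction": 0.02}))
```

    Min/max bounds are the simplest possible reading of "ranges of values"; a practical system would likely use something more tolerant of noise.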
  • Patent number: 11809442
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.
    Type: Grant
    Filed: April 13, 2020
    Date of Patent: November 7, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Patent number: 11809223
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.
    Type: Grant
    Filed: November 8, 2021
    Date of Patent: November 7, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
  • Patent number: 11714790
    Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: August 1, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Meiyalagan Balasubramanian, Lengning Liu, Aditya Kuppa, Kirk Hartmann Freiheit, Kalen Wong, Paula Budig Greve, Patrick Clinton Little, Lucas Pritz, Yue Wang, Vivek Ravindranath Narasayya, Katchaguy Areekijseree, Yeye He, Surajit Chaudhuri, Gaurav Ghosh
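    Illustration: the steps in the abstract above (select a subset of fields by rule, derive a stable identifier from their content, write it into a primary key field) can be sketched as follows. This is only a sketch under stated assumptions: the abstract does not say how the stableID is computed, so the SHA-256 hash, the `rule` callable, and the `id` key field name are all hypothetical choices.

```python
import hashlib


def select_fields(record, rule):
    """Keep only the fields for which the (hypothetical) rule returns True."""
    return {name: value for name, value in record.items() if rule(name, value)}


def stable_id(subset):
    """Hash the selected fields in a fixed order so the ID does not depend
    on dict ordering. SHA-256 is an assumption, not taken from the patent."""
    canonical = "\x1f".join(f"{name}={subset[name]}" for name in sorted(subset))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assign_stable_id(record, rule, key_field="id"):
    """Return a copy of the record with the stable ID in its key field."""
    subset = select_fields(record, rule)
    # The abstract requires the subset to be smaller than the full record.
    if not subset or len(subset) >= len(record):
        raise ValueError("rule must select a non-empty proper subset of fields")
    updated = dict(record)
    updated[key_field] = stable_id(subset)
    return updated


def identity_rule(name, value):
    # Hypothetical rule: only these fields identify the record.
    return name in {"name", "email"}


first = assign_stable_id(
    {"name": "Ada", "email": "ada@example.com", "updated": "2021-09-30"},
    identity_rule,
)
second = assign_stable_id(
    {"name": "Ada", "email": "ada@example.com", "updated": "2023-08-01"},
    identity_rule,
)
print(first["id"] == second["id"])  # the volatile field does not affect the ID
```

    Because the ID depends only on the selected fields, two versions of a record that differ in other fields receive the same key.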
  • Patent number: 11698892
    Abstract: The present disclosure relates to systems, methods, and computer-readable media for using a variety of hypothesis tests to identify errors within tables and other structured datasets. For example, systems disclosed herein can generate a modified table from an input table by removing one or more entries from the input table. The systems disclosed herein can further leverage a collection of training tables to determine probabilities associated with whether the input table and modified table are drawn from the collection of training tables. The systems disclosed herein can additionally compare the probabilities to accurately determine whether the one or more entries include errors therein. The systems disclosed herein may apply to a variety of different sizes and types of tables to identify different types of common errors within input tables.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: July 11, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Pei Wang
  • Publication number: 20230098926
    Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.
    Type: Application
    Filed: September 30, 2021
    Publication date: March 30, 2023
    Inventors: Meiyalagan BALASUBRAMANIAN, Lengning LIU, Aditya KUPPA, Kirk Hartmann FREIHEIT, Kalen WONG, Paula Budig GREVE, Patrick Clinton LITTLE, Lucas PRITZ, Yue WANG, Vivek Ravindranath NARASAYYA, Katchaguy AREEKIJSEREE, Yeye HE, Surajit CHAUDHURI
  • Patent number: 11586838
    Abstract: Systems and techniques for end-to-end fuzzy entity matching are described herein. A first input and a second input may be received. The first input and the second input may be evaluated to identify common attribute types. A set of attribute entity matching models may be selected that correspond to the attribute types. The first input and the second input may be evaluated using the set of attribute entity matching models to determine a set of weighted scores for attribute pairs in the first input and the second input. The set of weighted scores may be evaluated using a table-level entity matching model to identify a common entity included in the first input and the second input. A linking dataset may be generated that includes a cross-linking facility indicating a relationship between a first entity descriptor in the first input and a second entity descriptor in the second input.
    Type: Grant
    Filed: July 2, 2019
    Date of Patent: February 21, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Chen Zhao
  • Publication number: 20230043015
    Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilize the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparation steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.
    Type: Application
    Filed: October 19, 2022
    Publication date: February 9, 2023
    Inventors: Yeye He, Cong Yan
  • Patent number: 11520800
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. A repository of transformation tools is searched to identify a new transformation tool as relevant to a data transformation associated with the received set of example values. The repository includes annotations associated with the new transformation tool. The new transformation tool is used to generate a transformation program that produces transformed output values. Additional annotations are generated for the new transformation tool based on the transformed output values.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: December 6, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kris Ganjam, Yeye He, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Publication number: 20220365910
    Abstract: Methods and systems for generating multi-operator data transformation pipelines. An example method includes accessing raw data for transformation; receiving a selection of a target table or target visualization, wherein the target table or target visualization is for data other than the raw data; extracting table properties and target constraints; and based on the extracted table properties and target constraints, synthesizing one or more multi-operator data transformation pipelines for transforming the raw data to a generated table or generated visualization.
    Type: Application
    Filed: May 14, 2021
    Publication date: November 17, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yeye HE, Surajit CHAUDHURI, Junwen YANG
  • Patent number: 11488068
    Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilize the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparation steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: November 1, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Cong Yan
  • Publication number: 20220318221
    Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using lightweight off-line processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary, and the list is updated by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.
    Type: Application
    Filed: June 23, 2022
    Publication date: October 6, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yeye HE, Jie SONG, Yue WANG, Surajit CHAUDHURI, Vishal Kumar Seshagirirao ANIL, Yaron Y. GOLAND, Gaurav MALHOTRA, Blake LASSITER
  • Patent number: 11397716
    Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using lightweight off-line processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary, and the list is updated by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.
    Type: Grant
    Filed: November 19, 2020
    Date of Patent: July 26, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Jie Song, Yue Wang, Surajit Chaudhuri, Vishal Kumar Seshagirirao Anil, Yaron Y. Goland, Gaurav Malhotra, Blake Lassiter
  • Publication number: 20220164325
    Abstract: Aspects of the present disclosure relate to data validation using inferred patterns. Columns of a data store may be processed to generate a set of candidate patterns for each respective column, which may be combined to form a combined set of candidate patterns. Columns of the data store may then be processed using the combined set of candidate patterns to generate pattern scores for each candidate pattern with respect to each respective column. The candidate patterns may be ranked according to the pattern scores for a given column. For example, the patterns may be ranked using an impurity score indicative of the percentage of rows not represented by a pattern and/or a coverage score indicative of a number of columns in a data store for which the pattern applies. A ranked pattern may be manually or automatically selected, which may then be applied to perform data validation of new data accordingly.
    Type: Application
    Filed: November 25, 2020
    Publication date: May 26, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yeye HE, Jie SONG
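    Illustration: the impurity and coverage scores named in the abstract above can be sketched as follows. This is a hedged sketch, not the disclosed scoring: representing patterns as regular expressions, the `max_impurity` threshold, and the tie-breaking order in `rank_patterns` are assumptions made for illustration.

```python
import re


def impurity(pattern, column):
    """Fraction of values in a column that the pattern does not fully match."""
    if not column:
        return 0.0
    compiled = re.compile(pattern)
    misses = sum(1 for value in column if compiled.fullmatch(value) is None)
    return misses / len(column)


def coverage(pattern, columns, max_impurity=0.0):
    """Number of columns in the data store to which the pattern applies,
    meaning its impurity there is at most `max_impurity` (assumed cutoff)."""
    return sum(
        1 for values in columns.values()
        if impurity(pattern, values) <= max_impurity
    )


def rank_patterns(patterns, target, columns):
    """Rank candidates for one target column: lowest impurity first, then
    lowest coverage (an arbitrary tie-break that favors specific patterns)."""
    scored = [
        (impurity(p, columns[target]), coverage(p, columns), p)
        for p in patterns
    ]
    return [p for _, _, p in sorted(scored)]


columns = {
    "dates": ["2020-01-02", "2021-12-31"],
    "codes": ["AB-12", "CD-34"],
    "mixed": ["2020-01-02", "n/a"],
}
candidates = [r"\d{4}-\d{2}-\d{2}", r".+", r"[A-Z]{2}-\d{2}"]
print(rank_patterns(candidates, "dates", columns))
```

    The top-ranked pattern could then be applied to new values of the target column, flagging any value it fails to match.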
  • Publication number: 20220156242
    Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using lightweight off-line processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary, and the list is updated by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.
    Type: Application
    Filed: November 19, 2020
    Publication date: May 19, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yeye HE, Jie SONG, Yue WANG, Surajit CHAUDHURI, Vishal Kumar Seshagirirao ANIL, Yaron Y. GOLAND, Gaurav MALHOTRA, Blake LASSITER
  • Patent number: 11275649
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data error detection, according to embodiments of the present invention. In one embodiment, a target data set having a plurality of values for which to identify incompatible data is obtained. A pattern for each of the plurality of values is generated using at least one generalization language. A pair of patterns that represent a pair of values is utilized to identify a compatibility indicator that corresponds with a pair of training patterns in a compatibility index that match the pair of patterns. The compatibility indicator indicates the pair of patterns are incompatible with one another based on a statistical analysis performed in association with a corpus of data external to the target data set. An indication that the values are incompatible with one another is provided.
    Type: Grant
    Filed: January 19, 2018
    Date of Patent: March 15, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Huang Zhipeng
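
    Illustration: the idea in the abstract above (generalize each value into a pattern, then look up pairs of patterns in a compatibility index) can be sketched as follows. This is a minimal sketch under stated assumptions: the character-class generalization language and the hand-written `incompatible_index` are hypothetical, whereas the abstract describes an index derived from statistical analysis of an external corpus.

```python
from itertools import combinations


def generalize(value):
    """Map a value to a pattern: runs of digits and runs of letters collapse
    to class tokens, every other character is kept literally. This is one
    assumed generalization language among many possible ones."""
    tokens = []
    for ch in value:
        if ch.isdigit():
            token = "<digit>+"
        elif ch.isalpha():
            token = "<letter>+"
        else:
            token = ch
        is_class = len(token) > 1
        if is_class and tokens and tokens[-1] == token:
            continue  # extend the current run instead of repeating the token
        tokens.append(token)
    return "".join(tokens)


def incompatible_pairs(values, incompatible_index):
    """Return pairs of values whose patterns are listed as incompatible.

    `incompatible_index` is a set of frozensets, each holding two patterns
    (a hypothetical stand-in for a corpus-derived compatibility index).
    """
    flagged = []
    for left, right in combinations(values, 2):
        key = frozenset((generalize(left), generalize(right)))
        if key in incompatible_index:
            flagged.append((left, right))
    return flagged


# Hand-written index for the example: dashed and dotted dates rarely
# co-occur in one column, so the pair is treated as incompatible.
incompatible_index = {
    frozenset({"<digit>+-<digit>+-<digit>+", "<digit>+.<digit>+.<digit>+"}),
}
column = ["2011-01-01", "2011-01-02", "2011.01.03"]
print(incompatible_pairs(column, incompatible_index))
```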