Patents by Inventor Yeye He
Yeye He has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11928564Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilizes the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparations steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.Type: GrantFiled: October 19, 2022Date of Patent: March 12, 2024Assignee: Microsoft Technology Licensing, LLCInventors: Yeye He, Cong Yan
-
Patent number: 11886457Abstract: A transform-by-pattern (TBP) system is configured to proactively suggest relevant TBP programs based on inputted source dataset and target dataset without requiring users typing in examples. The TBP system has access to multiple TBP programs, each of which includes a combination of a source pattern, a target pattern, and a transformation program that is configured to transform data that fits into the target pattern into data that fits into the source pattern. When a source dataset and a target dataset are received from a user, the TBP system identifies a subset of the source dataset and a subset of the target dataset as related data. The TBP system then identifies one or more applicable TBP programs amongst the multiple TBP programs, and suggest or apply at least one of the one or more applicable TBP programs.Type: GrantFiled: May 29, 2020Date of Patent: January 30, 2024Assignee: Microsoft Technology Licensing, LLCInventors: Yeye He, Surajit Chaudhuri, Zhongjun Jin
-
Publication number: 20240028607Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.Type: ApplicationFiled: September 28, 2023Publication date: January 25, 2024Inventors: Yeye HE, Kris K. GANJAM, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
-
Patent number: 11880344Abstract: Methods and systems for generating multi-operator data transformation pipelines. An example method includes accessing raw data for transformation; receiving a selection of a target table or target visualization, wherein the target table or target visualization is for data other than the raw data; extracting table properties and target constraints; and based on the extracted table properties and target constraints, synthesizing one or more multi-operator data transformation pipelines for transforming the raw data to a generated table or generated visualization.Type: GrantFiled: May 14, 2021Date of Patent: January 23, 2024Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Yeye He, Surajit Chaudhuri, Junwen Yang
-
Publication number: 20230368068Abstract: The present disclosure relates to systems, methods, and computer-readable media for training and implementing pipeline error detection models to facilitate automated detection of data quality (DQ) issues within recurring data pipelines. For example, systems described herein involve training a pipeline error detection model by first constructing a plurality of DQ constraints for a recurring data pipeline based on ranges of values observed over a history of pipeline executions. The systems may further train the model to predict DQ issues by synthetically applying data variants to historical executions of the recurring data pipeline or to data pipelines having similar characteristics thereto. Once trained, the pipeline error detection model(s) can be applied to new executions of the data pipeline as they become available to quickly and efficiently predict whether a given execution includes a predicted DQ issue therein.Type: ApplicationFiled: May 12, 2022Publication date: November 16, 2023Inventors: Yeye HE, Weiwei CUI, Song GE, Haidong ZHANG, Shi HAN, Dongmei ZHANG, Surajit CHAUDHURI
-
Patent number: 11809442Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.Type: GrantFiled: April 13, 2020Date of Patent: November 7, 2023Assignee: Microsoft Technology Licensing, LLCInventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri
-
Patent number: 11809223Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.Type: GrantFiled: November 8, 2021Date of Patent: November 7, 2023Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
-
Patent number: 11714790Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.Type: GrantFiled: September 30, 2021Date of Patent: August 1, 2023Assignee: Microsoft Technology Licensing, LLCInventors: Meiyalagan Balasubramanian, Lengning Liu, Aditya Kuppa, Kirk Hartmann Freiheit, Kalen Wong, Paula Budig Greve, Patrick Clinton Little, Lucas Pritz, Yue Wang, Vivek Ravindranath Narasayya, Katchaguy Areekijseree, Yeye He, Surajit Chaudhuri, Gaurav Ghosh
-
Patent number: 11698892Abstract: The present disclosure relates to systems, methods, and computer-readable media for using a variety of hypothesis tests to identify errors within tables and other structured datasets. For example, systems disclosed herein can generate a modified table from an input table by removing one or more entries from the input table. The systems disclosed herein can further leverage a collection of training tables to determine probabilities associated with whether the input table and modified table are drawn from the collection of training tables. The systems disclosed herein can additionally compare the probabilities to accurately determine whether the one or more entries include errors therein. The systems disclosed herein may apply to a variety of different sizes and types of tables to identify different types of common errors within input tables.Type: GrantFiled: October 25, 2021Date of Patent: July 11, 2023Assignee: Microsoft Technology Licensing, LLCInventors: Yeye He, Pei Wang
-
Publication number: 20230098926Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.Type: ApplicationFiled: September 30, 2021Publication date: March 30, 2023Inventors: Meiyalagan BALASUBRAMANIAN, Lengning LIU, Aditya KUPPA, Kirk Hartmann FREIHEIT, Kalen WONG, Paula Budig GREVE, Patrick Clinton LITTLE, Lucas PRITZ, Yue WANG, Vivek Ravindranath NARASAYYA, Katchaguy AREEKIJSEREE, Yeye HE, Surajit CHAUDHURI
-
Patent number: 11586838Abstract: Systems and techniques for end-to-end fuzzy entity matching are described herein. A first input and a second input may be received. The first input and the second input may be evaluated to identify common attribute types. A set of attribute entity matching models may be selected that correspond to the attribute types. The first input and the second input may be evaluated using the set of attribute entity matching models to determine a set of weighted scores for attribute pairs in the first input and the second input. The set of weighted scores may be evaluated using a table-level entity matching model to identify a common entity included in the first input and the second input. A linking dataset may be generated that includes a cross-linking facility indicating a relationship between a first entity descriptor in the first input and a second entity descriptor in the second input.Type: GrantFiled: July 2, 2019Date of Patent: February 21, 2023Assignee: Microsoft Technology Licensing, LLCInventors: Yeye He, Chen Zhao
-
Publication number: 20230043015Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilizes the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparations steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.Type: ApplicationFiled: October 19, 2022Publication date: February 9, 2023Inventors: Yeye He, Cong Yan
-
Patent number: 11520800Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values are received. A repository of transformation tools is searched to identify a new transformation tool as relevant to a data transformation associated with the received set of example values. The repository includes annotations associated with the new transformation tool. The new transformation tool is used to generate a transformation program that produces transformed output values. Additional annotations are generated for the new transformation tool based on the transformed output values.Type: GrantFiled: June 19, 2020Date of Patent: December 6, 2022Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Kris Ganjam, Yeye He, Vivek Ravindranath Narasayya, Surajit Chaudhuri
-
Publication number: 20220365910Abstract: Methods and systems for generating multi-operator data transformation pipelines. An example method includes accessing raw data for transformation; receiving a selection of a target table or target visualization, wherein the target table or target visualization is for data other than the raw data; extracting table properties and target constraints; and based on the extracted table properties and target constraints, synthesizing one or more multi-operator data transformation pipelines for transforming the raw data to a generated table or generated visualization.Type: ApplicationFiled: May 14, 2021Publication date: November 17, 2022Applicant: Microsoft Technology Licensing, LLCInventors: Yeye HE, Surajit CHAUDHURI, Junwen YANG
-
Patent number: 11488068Abstract: Systems are provided for facilitating the building and use of models used to make data preparation recommendations. The systems identify ground truth from a plurality of notebooks and utilizes the ground truth to generate the corresponding data preparation recommendation models. The data preparation recommendation models are used to predict accurate (e.g., useful and relevant) data preparations steps based on user input and user notebook data. The data preparation computing system generates a recommendation prompt based on output from the data preparation recommendation model that can be viewed and/or selected by the user to be applied to the user's notebook data.Type: GrantFiled: May 28, 2020Date of Patent: November 1, 2022Assignee: Microsoft Technology Licensing, LLCInventors: Yeye He, Cong Yan
-
Publication number: 20220318221Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using a lightweight off-line processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary and updates the list by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.Type: ApplicationFiled: June 23, 2022Publication date: October 6, 2022Applicant: Microsoft Technology Licensing, LLCInventors: Yeye HE, Jie SONG, Yue WANG, Surajit CHAUDHURI, Vishal Kumar Seshagirirao ANIL, Yaron Y. GOLAND, Gaurav MALHOTRA, Blake LASSITER
-
Patent number: 11397716Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using a lightweight off-line processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary and updates the list by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.Type: GrantFiled: November 19, 2020Date of Patent: July 26, 2022Assignee: Microsoft Technology Licensing, LLCInventors: Yeye He, Jie Song, Yue Wang, Surajit Chaudhuri, Vishal Kumar Seshagirirao Anil, Yaron Y. Goland, Gaurav Malhotra, Blake Lassiter
-
Publication number: 20220164325Abstract: Aspects of the present disclosure relate to data validation using inferred patterns. Columns of a data store may be processed to generate a set of candidate patterns for each respective column, which may be combined to form a combined set of candidate patterns. Columns of the data store may then be processed using the combined set of candidate patterns to generate pattern scores for each candidate pattern with respect to each respective column. The candidate patterns may be ranked according to the pattern scores for given column. For example, the patterns may be ranked using an impurity score indicative of the percentage of rows not represented by a pattern and/or a coverage score indicative of a number of columns in a data store for which the pattern applies. A ranked pattern may be manually or automatically selected, which may then be applied to perform data validation of new data accordingly.Type: ApplicationFiled: November 25, 2020Publication date: May 26, 2022Applicant: Microsoft Technology Licensing, LLCInventors: Yeye HE, Jie SONG
-
Publication number: 20220156242Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using a lightweight off-line processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary and updates the list by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.Type: ApplicationFiled: November 19, 2020Publication date: May 19, 2022Applicant: Microsoft Technology Licensing, LLCInventors: Yeye HE, Jie SONG, Yue WANG, Surajit CHAUDHURI, Vishal Kumar Seshagirirao ANIL, Yaron Y. GOLAND, Gaurav MALHOTRA, Blake LASSITER
-
Patent number: 11275649Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data error detection, according to embodiments of the present invention. In one embodiment, a target data set having a plurality of values for which to identify incompatible data is obtained. A pattern for each of the plurality of values is generated using at least one generalization language. A pair of patterns that represent a pair of values is utilized to identify a compatibility indicator that corresponds with a pair of training patterns in a compatibility index that match the pair of patterns. The compatibility indicator indicates the pair of patterns are incompatible with one another based on a statistical analysis performed in association with a corpus of data external to the target data set. An indication that the values are incompatible with one another is provided.Type: GrantFiled: January 19, 2018Date of Patent: March 15, 2022Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Yeye He, Huang Zhipeng