Patents by Inventor Varun Mithal

Varun Mithal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

GENERATING RULE LISTS FROM TREE ENSEMBLE MODELS

Publication number: 20240070549

Abstract: Systems and methods for extracting rule lists from tree ensembles are provided. A system extracts first stage candidate rules from individual trees. The system identifies the first stage candidate rules that satisfy a precision threshold and places those rules in a solution set. Subsequently, a determination is made whether a further stage is needed based on whether a predetermined number of positive data samples of the data set are covered by the solution set. In the further stage, the system generates next stage candidate rules from previous stage candidate rules that have not been pruned and identifies the next stage candidate rules that satisfy the precision threshold, placing those rules in the solution set. A simplified rule list is generated by identifying a minimum subset of rules in the solution set that covers the positive data samples within the precision threshold.

Type: Application

Filed: August 24, 2022

Publication date: February 29, 2024

Inventors: Gopiram Roshan Lal, Varun Mithal, Xiaotong Chen
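
The last step in the abstract above, choosing a small subset of rules that still covers the positive samples, resembles a set-cover problem. The following is a minimal sketch in Python that assumes a greedy heuristic, which is not necessarily the procedure claimed in the application and does not guarantee a true minimum. The function name `greedy_rule_cover`, the rule names, and the representation of a rule as the set of positive sample IDs it covers are all hypothetical.

```python
def greedy_rule_cover(rule_coverage, positives):
    """Greedily pick rules until every coverable positive sample is covered.

    rule_coverage: dict mapping a rule name to the set of positive sample
    IDs that the rule covers (hypothetical representation).
    Returns the chosen rule names and any positives left uncovered.
    """
    uncovered = set(positives)
    chosen = []
    while uncovered and rule_coverage:
        # Pick the rule that covers the most still-uncovered positives.
        best = max(rule_coverage, key=lambda r: len(rule_coverage[r] & uncovered))
        gained = rule_coverage[best] & uncovered
        if not gained:
            break  # no remaining rule covers the leftover positives
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Toy data: four candidate rules that already passed a precision check.
rules = {
    "r1": {1, 2, 3},
    "r2": {3, 4},
    "r3": {4, 5, 6},
    "r4": {1, 6},
}
selected, missed = greedy_rule_cover(rules, positives={1, 2, 3, 4, 5, 6})
```

On this toy data two of the four rules are enough to cover all six positive samples, so the other two would be dropped from the simplified list.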

Multilabel learning with label relationships

Patent number: 11769087

Abstract: A machine learning based method for multilabel learning with label relationships is provided. This methodology addresses the technical problem of alleviating computational complexity of training a machine learning model that generates multilabel output with constraints, especially in contexts characterized by a large volume of data, by providing a new formulation that encodes probabilistic relationships among the labels as a regularization parameter in the training objective of the underlying model. For example, the training process of the model may be configured to have two objectives. Namely, in addition to the objective of minimizing conventional multilabel loss, there is another training objective, which is to minimize the penalty associated with the prediction generated by the model breaking probabilistic relationships among the labels.

Type: Grant

Filed: June 4, 2020

Date of Patent: September 26, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Girish Kathalagiri Somashekariah, Varun Mithal, Aman Grover
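
The two training objectives described in the abstract above can be illustrated with a combined loss. This is a sketch under stated assumptions, not the patented formulation: the conventional multilabel loss is taken to be mean binary cross-entropy, each label relationship is taken to be an implication "label a implies label b", and a violation is penalized with a hinge term on the predicted probabilities. The names `multilabel_loss`, `implications`, and `penalty_weight` are hypothetical.

```python
import math


def multilabel_loss(probs, targets, implications, penalty_weight=1.0):
    """Mean binary cross-entropy plus a penalty for broken label relationships.

    probs: predicted probability per label.
    targets: 0/1 ground truth per label.
    implications: list of (a, b) index pairs meaning "label a implies label b"
    (an assumed way to encode a relationship among labels).
    """
    eps = 1e-12  # avoids log(0)
    bce = -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(probs, targets)
    ) / len(probs)
    # "a implies b" is violated to the extent that p[a] exceeds p[b].
    penalty = sum(max(0.0, probs[a] - probs[b]) for a, b in implications)
    return bce + penalty_weight * penalty


# Label 0 implies label 1, but the prediction ranks label 0 far above label 1.
violating = multilabel_loss([0.9, 0.4], [1, 1], [(0, 1)])
consistent = multilabel_loss([0.9, 0.95], [1, 1], [(0, 1)])
```

Setting `penalty_weight` to zero recovers the plain multilabel loss, which makes the effect of the relationship term easy to isolate.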

Filtering content using generalized linear mixed models

Patent number: 11397899

Abstract: In some embodiments, a computer system selects a first subset of candidate content items based on their filter scores that are generated based on a partial generalized linear mixed model comprising a baseline model and a user-based model, with the baseline model being a generalized linear model, and the user-based model being a random effects model based on user actions by the target user directed towards reference content items related to the candidate content items. In some embodiments, the computer system then selects a second subset from the first subset based on recommendation scores that are generated based on a full generalized linear mixed model comprising the baseline model, the user-based model, and an item-based model, with the item-based model being a random effects model based on user actions directed towards the candidate online content item by reference users related to the target user.

Type: Grant

Filed: March 26, 2019

Date of Patent: July 26, 2022

Assignee: Microsoft Technology Licensing, LLC

Inventors: Huichao Xue, Girish Kathalagiri Somashekariah, Ye Yuan, Varun Mithal, Junrui Xu, Ada Cheuk Ying Yu
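
The two-stage selection in the abstract above can be sketched as follows. The sketch assumes that each model term has already been fitted and reduced to a single number per candidate item and that a logistic link is used; neither detail is stated in the abstract. The field names `baseline`, `user_effect`, and `item_effect` and the function `two_stage_select` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def two_stage_select(items, first_k, second_k):
    """Filter with a partial mixed model, then rank with the full model."""

    def filter_score(item):
        # Partial model: baseline (fixed effects) plus the user-based term.
        return sigmoid(item["baseline"] + item["user_effect"])

    def recommendation_score(item):
        # Full model: adds the item-based term to the partial model.
        return sigmoid(item["baseline"] + item["user_effect"] + item["item_effect"])

    shortlist = sorted(items, key=filter_score, reverse=True)[:first_k]
    final = sorted(shortlist, key=recommendation_score, reverse=True)[:second_k]
    return [item["id"] for item in final]


candidates = [
    {"id": "a", "baseline": 1.0, "user_effect": 0.5, "item_effect": -2.0},
    {"id": "b", "baseline": 0.5, "user_effect": 0.5, "item_effect": 1.0},
    {"id": "c", "baseline": 0.2, "user_effect": 0.1, "item_effect": 3.0},
    {"id": "d", "baseline": -1.0, "user_effect": 0.0, "item_effect": 5.0},
]
picked = two_stage_select(candidates, first_k=3, second_k=2)
```

Item `d` has the largest item-based term but never reaches the second stage, because the first stage scores candidates without that term.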

MULTILABEL LEARNING WITH LABEL RELATIONSHIPS

Publication number: 20210383306

Abstract: A machine learning based method for multilabel learning with label relationships is provided. This methodology addresses the technical problem of alleviating computational complexity of training a machine learning model that generates multilabel output with constraints, especially in contexts characterized by a large volume of data, by providing a new formulation that encodes probabilistic relationships among the labels as a regularization parameter in the training objective of the underlying model. For example, the training process of the model may be configured to have two objectives. Namely, in addition to the objective of minimizing conventional multilabel loss, there is another training objective, which is to minimize the penalty associated with the prediction generated by the model breaking probabilistic relationships among the labels.

Type: Application

Filed: June 4, 2020

Publication date: December 9, 2021

Inventors: Girish Kathalagiri Somashekariah, Varun Mithal, Aman Grover

System user attribute relevance based on activity

Patent number: 11138281

Abstract: Techniques for using online user activity in determining relevance of attributes to improve computer functionality in generating recommendations of online content are disclosed herein. In some embodiments, a computer system calculates a corresponding relevance score for each attribute of a user based on a total number of online postings for which the user has performed at least one of a plurality of online actions within a particular sliding window of time defining a most recent time period, an attribute activity number representing a number of online postings in the plurality of online postings that have the attribute, and an inverse of a frequency value representing how many of a total number of online postings published within the particular sliding window of time have the attribute. In some embodiments, the computer system causes at least one recommendation associated with the user to be displayed based on the calculated relevance scores.

Type: Grant

Filed: May 22, 2019

Date of Patent: October 5, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Vita G. Markman, Ye Yuan, Varun Mithal, Igor Vladimir Yagolnitser
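
The relevance score in the abstract above combines three quantities in a way that resembles TF-IDF weighting. The abstract does not state the exact formula, so the following sketch assumes one: the share of the user's recent activity that involves the attribute, multiplied by the logarithm of the inverse publication frequency. The function and parameter names are hypothetical.

```python
import math


def relevance_score(acted_total, acted_with_attr, published_total, published_with_attr):
    """TF-IDF-style relevance of one attribute for one user (assumed formula).

    acted_total: postings the user acted on inside the sliding window.
    acted_with_attr: how many of those postings have the attribute.
    published_total: postings published inside the same window.
    published_with_attr: how many published postings have the attribute.
    """
    if acted_total == 0 or published_with_attr == 0:
        return 0.0
    activity_share = acted_with_attr / acted_total
    inverse_frequency = math.log(published_total / published_with_attr)
    return activity_share * inverse_frequency


# Same user activity, but the second attribute appears in every posting.
distinctive = relevance_score(20, 10, 1000, 100)
ubiquitous = relevance_score(20, 10, 1000, 1000)
```

Under this assumed formula an attribute that appears in every published posting scores zero regardless of how often the user interacted with it, which is the intended effect of the inverse-frequency term.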

EXTRACTING TITLE HIERARCHY FROM TRAJECTORY DATA

Publication number: 20200410451

Abstract: Disclosed are systems, methods, and non-transitory computer-readable media for extracting title hierarchy from trajectory data. A computing system generates a title hierarchy using a graph of connected nodes generated based on career trajectory data. Each distinct node in the graph represents a unique employment title identified in the career trajectory data. Connections established among pairs of nodes in the graph indicate user transitions among the employment titles associated with the nodes and edge values assigned to the connections indicate the number of users that transitioned from the employment titles associated with the nodes in the pair of nodes. The edge values are used to assign seniority values to each node in the graph, for example, by performing a topological sort of the nodes in the graph. The seniority values are used to establish the title hierarchy.

Type: Application

Filed: June 27, 2019

Publication date: December 31, 2020

Inventor: Varun Mithal
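
The graph construction and topological sort mentioned in the abstract above can be sketched with the Python standard library. The sketch makes one assumption the abstract does not state: between any two titles, only the direction taken by more users is kept as an edge, so that occasional moves to a more junior title do not create a cycle. The function name `seniority_order` and the sample titles are hypothetical.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+


def seniority_order(transitions):
    """Rank titles from junior (0) upward using observed title transitions.

    transitions: iterable of (from_title, to_title) pairs, one per user move.
    """
    counts = Counter(transitions)  # edge value = number of users per transition
    predecessors = {}
    for (src, dst), n in counts.items():
        predecessors.setdefault(src, set())
        predecessors.setdefault(dst, set())
        # Assumption: keep only the dominant direction between two titles.
        if n > counts.get((dst, src), 0):
            predecessors[dst].add(src)
    order = TopologicalSorter(predecessors).static_order()
    return {title: rank for rank, title in enumerate(order)}


moves = (
    [("Engineer", "Senior Engineer")] * 3
    + [("Senior Engineer", "Engineer")]
    + [("Senior Engineer", "Staff Engineer")] * 2
)
ranks = seniority_order(moves)
```

Cycles that span three or more titles would still raise `graphlib.CycleError` in this sketch; a fuller treatment would need a rule for breaking them.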

TECHNIQUE FOR LEVERAGING WEAK LABELS FOR JOB RECOMMENDATIONS

Publication number: 20200409960

Abstract: Described herein are methods and systems for using weak labels to train a model for use in identifying job listings that are relevant to a user of an online job hosting service. The weak labels correspond with various user actions that a user has undertaken with respect to job listings presented to the user. By way of example, the relevant user actions may include: Job Applies, Job Saves, Job Views, Job Skips and Job Dismisses.

Type: Application

Filed: June 27, 2019

Publication date: December 31, 2020

Inventors: Varun Mithal, Girish Kathalagiri Somashekariah

SYSTEM USER ATTRIBUTE RELEVANCE BASED ON ACTIVITY

Publication number: 20200372090

Abstract: Techniques for using online user activity in determining relevance of attributes to improve computer functionality in generating recommendations of online content are disclosed herein. In some embodiments, a computer system calculates a corresponding relevance score for each attribute of a user based on a total number of online postings for which the user has performed at least one of a plurality of online actions within a particular sliding window of time defining a most recent time period, an attribute activity number representing a number of online postings in the plurality of online postings that have the attribute, and an inverse of a frequency value representing how many of a total number of online postings published within the particular sliding window of time have the attribute. In some embodiments, the computer system causes at least one recommendation associated with the user to be displayed based on the calculated relevance scores.

Type: Application

Filed: May 22, 2019

Publication date: November 26, 2020

Inventors: Vita G. Markman, Ye Yuan, Varun Mithal, Igor Vladimir Yagolnitser

SELECTING RECOMMENDATIONS BASED ON TITLE TRANSITION EMBEDDINGS

Publication number: 20200311162

Abstract: The disclosed embodiments provide a system for selecting recommendations based on title transition embeddings. During operation, the system obtains a word embedding model of a set of job histories. Next, the system calculates similarities between pairs of the embeddings produced by the word embedding model from attributes associated with titles in the set of job histories. The system then identifies, based on the similarities, job titles with high similarity to a current title of the candidate. Finally, the system outputs the job titles for use in selecting job recommendations for the candidate.

Type: Application

Filed: March 28, 2019

Publication date: October 1, 2020

Applicant: Microsoft Technology Licensing, LLC

Inventors: Junrui Xu, Meng Meng, Girish Kathalagiri Somashekariah, Huichao Xue, Varun Mithal, Ada Cheuk Ying Yu
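
The similarity step in the abstract above can be illustrated with cosine similarity over title embeddings. This is a sketch only: the abstract does not name the similarity measure, and the three-dimensional vectors below are invented toy values rather than the output of any trained word embedding model. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_titles(embeddings, current_title, top_k=2):
    """Return the top_k titles whose embeddings are closest to current_title."""
    query = embeddings[current_title]
    scored = [
        (cosine(query, vector), title)
        for title, vector in embeddings.items()
        if title != current_title
    ]
    return [title for _, title in sorted(scored, reverse=True)[:top_k]]


# Invented toy embeddings, for illustration only.
toy_embeddings = {
    "Data Scientist": [0.9, 0.1, 0.0],
    "ML Engineer": [0.8, 0.2, 0.1],
    "Data Analyst": [0.7, 0.0, 0.3],
    "Accountant": [0.0, 0.1, 0.9],
}
neighbors = similar_titles(toy_embeddings, "Data Scientist")
```

The returned titles could then be used to widen the set of job recommendations considered for a candidate whose current title is the query title.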

SYSTEM USER ATTRIBUTE DISAMBIGUATION BASED ON COHORT

Publication number: 20200311157

Abstract: In some embodiments, a computer system determines that online postings belong to a cohort based on the postings having an attribute of the cohort, identifies skills from the postings, determines that a user belongs to the cohort based on a determination that a profile of the user includes the attribute(s) of the cohort, determines that one or more of the skills is stored in association with the profile, determines a user confidence score that indicates a relevance level of the skill to the user for each one of the one or more of the skills, determines a cohort confidence score for each one of the one or more of the skills based on how many of the postings include the skill, and displays a recommendation based on a combination of the user confidence score and the cohort confidence score for at least a portion of the skills.

Type: Application

Filed: March 28, 2019

Publication date: October 1, 2020

Inventors: Ye Yuan, Girish Kathalagiri Somashekariah, Huichao Xue, Varun Mithal, Ada Cheuk Ying Yu, Junrui Xu

FILTERING CONTENT USING GENERALIZED LINEAR MIXED MODELS

Publication number: 20200311568

Abstract: In some embodiments, a computer system selects a first subset of candidate content items based on their filter scores that are generated based on a partial generalized linear mixed model comprising a baseline model and a user-based model, with the baseline model being a generalized linear model, and the user-based model being a random effects model based on user actions by the target user directed towards reference content items related to the candidate content items. In some embodiments, the computer system then selects a second subset from the first subset based on recommendation scores that are generated based on a full generalized linear mixed model comprising the baseline model, the user-based model, and an item-based model, with the item-based model being a random effects model based on user actions directed towards the candidate online content item by reference users related to the target user.

Type: Application

Filed: March 26, 2019

Publication date: October 1, 2020

Inventors: Huichao Xue, Girish Kathalagiri Somashekariah, Ye Yuan, Varun Mithal, Junrui Xu, Ada Cheuk Ying Yu

Classification of highly-skewed data

Patent number: 10776713

Abstract: A method for identifying highly-skewed classes using an imperfect annotation of every instance together with a set of features for all instances. The imperfect annotations designate a plurality of instances as belonging to the target rare class and others to the majority class. First, a classifier is trained on the set of features using the imperfect annotation as supervision, to designate each instance to either the rare class or majority class. A combination of the predictions from the trained classifier and the imperfect annotations is then used to classify each instance to either the rare class or majority class. In particular, an instance is classified to the rare class only when both the trained classifier and the imperfect annotation classify the instance to the rare class. Finally, for each instance assigned as a rare class instance by the combination stage, all instances in its neighborhood are re-classified as either rare class or majority class.

Type: Grant

Filed: April 25, 2016

Date of Patent: September 15, 2020

Assignee: Regents of the University of Minnesota

Inventors: Vipin Kumar, Varun Mithal, Guruprasad Nayak, Ankush Khandelwal
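
The combination and neighborhood stages in the abstract above can be sketched as follows. The combination rule (rare only when both sources agree) comes directly from the abstract. The neighborhood rule is an assumption, since the abstract does not say how neighbors are re-classified: here a neighbor of a confirmed rare instance is accepted as rare when the trained classifier alone labels it rare. The function name and the `neighbors` mapping are hypothetical.

```python
def classify_rare(classifier_rare, annotation_rare, neighbors):
    """Two-stage rare-class labeling over instances indexed 0..n-1.

    classifier_rare, annotation_rare: lists of booleans, one per instance.
    neighbors: dict mapping an instance index to nearby instance indices.
    Returns the labels after the combination stage and after the
    neighborhood stage.
    """
    n = len(classifier_rare)
    # Combination stage: rare only when classifier and annotation agree.
    combined = [classifier_rare[i] and annotation_rare[i] for i in range(n)]
    final = list(combined)
    # Neighborhood stage (assumed rule): near a confirmed rare instance,
    # the classifier's prediction alone is enough to label a neighbor rare.
    for i in range(n):
        if combined[i]:
            for j in neighbors.get(i, ()):
                final[j] = final[j] or classifier_rare[j]
    return combined, final


classifier = [True, True, False, True]
annotation = [True, False, False, True]
combined, final = classify_rare(classifier, annotation, {0: [1], 3: [2]})
```

In this toy run instance 1 is rejected by the strict combination stage but recovered in the neighborhood stage, while instance 2 stays in the majority class because the classifier does not support it.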

MACHINE-LANGUAGE-BASED MODEL FOR IDENTIFYING PEERS ON AN ONLINE SOCIAL NETWORK

Publication number: 20180315132

Abstract: Among other things, embodiments of the present disclosure discussed herein help to identify peers of various individuals and organizations who are members of an online social network. Groups of peers may be identified based on various criteria, and some embodiments may generate a probability score reflecting a confidence level that two or more members of the online social network are peers of one another.

Type: Application

Filed: April 28, 2017

Publication date: November 1, 2018

Inventors: Aibo Tian, Varun Mithal, Suman Sundaresh, Cissy Chen, Bowen Meng, Lanxiao Xu

SIMULTANEOUS ESTIMATION OF LOCATION ELEVATIONS AND WATER LEVELS

Publication number: 20180130193

Abstract: A method improves automated water body extent determinations using satellite sensor values and includes a processor receiving a time-sequence of land cover labels for a plurality of geographic areas represented by pixels in the satellite sensor values. The processor alternates between ordering the geographic areas based on water level estimates at each time point in the time sequence such that the order of the geographic areas reflects an estimate of the relative elevations of the geographic areas and updating the water level estimates based on the land cover labels for the geographic areas. A final ordering of the geographic areas and a final water level estimate are used to correct the time-sequence of land cover labels.

Type: Application

Filed: November 8, 2017

Publication date: May 10, 2018

Inventors: Varun Mithal, Ankush Khandelwal, Vipin Kumar

CLASSIFICATION OF HIGHLY-SKEWED DATA

Publication number: 20160314411

Abstract: A method for identifying highly-skewed classes using an imperfect annotation of every instance together with a set of features for all instances. The imperfect annotations designate a plurality of instances as belonging to the target rare class and others to the majority class. First, a classifier is trained on the set of features using the imperfect annotation as supervision, to designate each instance to either the rare class or majority class. A combination of the predictions from the trained classifier and the imperfect annotations is then used to classify each instance to either the rare class or majority class. In particular, an instance is classified to the rare class only when both the trained classifier and the imperfect annotation classify the instance to the rare class. Finally, for each instance assigned as a rare class instance by the combination stage, all instances in its neighborhood are re-classified as either rare class or majority class.

Type: Application

Filed: April 25, 2016

Publication date: October 27, 2016

Inventors: Vipin Kumar, Varun Mithal, Guruprasad Nayak, Ankush Khandelwal

Unsupervised spatio-temporal data mining framework for burned area mapping

Patent number: 9478038

Abstract: A method reduces processing time required to identify locations burned by fire by receiving a feature value for each pixel in an image, each pixel representing a sub-area of a location. Pixels are then grouped based on similarities of the feature values to form candidate burn events. For each candidate burn event, a probability that the candidate burn event is a true burn event is determined based on at least one further feature value for each pixel in the candidate burn event. Candidate burn events that have a probability below a threshold are removed from further consideration as burn events to produce a set of remaining candidate burn events.

Type: Grant

Filed: March 30, 2015

Date of Patent: October 25, 2016

Assignee: Regents of the University of Minnesota

Inventors: Shyam Boriah, Vipin Kumar, Varun Mithal, Ankush Khandelwal
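
The pruning step at the end of the abstract above can be sketched in a few lines. The sketch assumes that each candidate burn event already carries a per-pixel score derived from the further feature value and that the event probability is the mean of those scores; the abstract specifies neither. The function name, event IDs, and default threshold are hypothetical.

```python
def filter_burn_candidates(candidates, threshold=0.5):
    """Keep candidate burn events whose estimated probability meets a threshold.

    candidates: dict mapping an event ID to a non-empty list of per-pixel
    scores in [0, 1] (assumed to come from a further feature value).
    Returns a dict of the remaining event IDs and their probabilities.
    """
    kept = {}
    for event_id, pixel_scores in candidates.items():
        # Assumed aggregation: event probability = mean of its pixel scores.
        probability = sum(pixel_scores) / len(pixel_scores)
        if probability >= threshold:
            kept[event_id] = probability
    return kept


remaining = filter_burn_candidates({"A": [0.9, 0.8, 0.7], "B": [0.2, 0.4]})
```

Removing weak candidates early means later, more expensive checks only run on the events that remain, which is consistent with the reduced processing time the abstract describes.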

UNSUPERVISED SPATIO-TEMPORAL DATA MINING FRAMEWORK FOR BURNED AREA MAPPING

Publication number: 20150278603

Abstract: A method reduces processing time required to identify locations burned by fire by receiving a feature value for each pixel in an image, each pixel representing a sub-area of a location. Pixels are then grouped based on similarities of the feature values to form candidate burn events. For each candidate burn event, a probability that the candidate burn event is a true burn event is determined based on at least one further feature value for each pixel in the candidate burn event. Candidate burn events that have a probability below a threshold are removed from further consideration as burn events to produce a set of remaining candidate burn events.

Type: Application

Filed: March 30, 2015

Publication date: October 1, 2015

Applicant: Regents of the University of Minnesota

Inventors: Shyam Boriah, Vipin Kumar, Varun Mithal, Ankush Khandelwal

Automated mapping of land cover using sequences of aerial imagery

Patent number: 8958603

Abstract: A system has an aerial image database containing sensor data representing a plurality of aerial images of an area having multiple sub-areas. A processor applies a classifier to the sensor values to identify a label for each sub-area in each aerial image and to thereby generate an initial label sequence for each sub-area. The processor identifies a most likely land cover state for each sub-area based on the initial label sequence, a confusion matrix and a transition matrix. For each sub-area, the processor stores the most likely land cover state sequence for the sub-area.

Type: Grant

Filed: March 15, 2013

Date of Patent: February 17, 2015

Assignee: Regents of the University of Minnesota

Inventors: Shyam Boriah, Ankush Khandelwal, Vipin Kumar, Varun Mithal, Karsten Steinhaeuser
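
Finding a most likely state sequence from a noisy label sequence, a confusion matrix, and a transition matrix is the kind of problem that Viterbi decoding of a hidden Markov model solves. The following sketch assumes that framing, which the abstract above does not name explicitly. The confusion matrix is read as P(observed label | true state), and all probabilities and land cover classes below are invented for illustration.

```python
import math


def viterbi(observed, states, prior, transition, confusion):
    """Most likely true-state sequence for one sub-area.

    observed: classifier labels over time (the initial label sequence).
    transition[r][s]: P(state s at time t+1 | state r at time t).
    confusion[s][label]: P(classifier outputs label | true state s).
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s] = log-probability of the best path that ends in state s.
    best = {s: log(prior[s]) + log(confusion[s][observed[0]]) for s in states}
    back = []
    for label in observed[1:]:
        pointers, new_best = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[r] + log(transition[r][s]))
            pointers[s] = prev
            new_best[s] = best[prev] + log(transition[prev][s]) + log(confusion[s][label])
        back.append(pointers)
        best = new_best
    # Trace the best path backwards from the best final state.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


states = ["forest", "urban"]
prior = {"forest": 0.5, "urban": 0.5}
transition = {
    "forest": {"forest": 0.95, "urban": 0.05},
    "urban": {"forest": 0.01, "urban": 0.99},
}
confusion = {
    "forest": {"forest": 0.8, "urban": 0.2},
    "urban": {"forest": 0.2, "urban": 0.8},
}
noisy = ["forest", "forest", "urban", "forest", "forest"]
smoothed = viterbi(noisy, states, prior, transition, confusion)
```

With these invented numbers the isolated "urban" label in the middle of the sequence is treated as a classifier error, because a forest-to-urban-to-forest change is far less likely than a single misclassification.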

Automated Mapping of Land Cover Using Sequences of Aerial Imagery

Publication number: 20140212055

Abstract: A system has an aerial image database containing sensor data representing a plurality of aerial images of an area having multiple sub-areas. A processor applies a classifier to the sensor values to identify a label for each sub-area in each aerial image and to thereby generate an initial label sequence for each sub-area. The processor identifies a most likely land cover state for each sub-area based on the initial label sequence, a confusion matrix and a transition matrix. For each sub-area, the processor stores the most likely land cover state sequence for the sub-area.

Type: Application

Filed: March 15, 2013

Publication date: July 31, 2014

Inventors: Shyam Boriah, Ankush Khandelwal, Vipin Kumar, Varun Mithal, Karsten Steinhaeuser