Patents by Inventor Arun Kumar Jagota

Arun Kumar Jagota has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DETERMINING RATIONALE FOR A PREDICTION OF A MACHINE LEARNING BASED MODEL

Publication number: 20210241047

Abstract: An online system performs predictions for real-time tasks and near real-time tasks that need to be performed by a deadline. A client device receives a real-time machine learning based model associated with a measure of accuracy. If the client device determines that a task can be performed using predictions having less than the specified measure of accuracy, the client device uses the real-time machine learning based model. If the client device determines that a higher level of accuracy of results is required, the client device sends a request to an online system. The online system provides a prediction along with a string representing a rationale for the prediction.

Type: Application

Filed: January 31, 2020

Publication date: August 5, 2021

Inventors: Rakesh Ganapathi Karanth, Arun Kumar Jagota, Kaushal Bansal, Amrita Dasgupta

REAL-TIME PREDICTIONS BASED ON MACHINE LEARNING MODELS

Publication number: 20210241179

Abstract: An online system performs predictions for real-time tasks and near real-time tasks that need to be performed by a deadline. A client device receives a real-time machine learning based model associated with a measure of accuracy. If the client device determines that a task can be performed using predictions having less than the specified measure of accuracy, the client device uses the real-time machine learning based model. If the client device determines that a higher level of accuracy of results is required, the client device sends a request to an online system. The online system provides a prediction along with a string representing a rationale for the prediction.

Type: Application

Filed: January 30, 2020

Publication date: August 5, 2021

Inventors: Rakesh Ganapathi Karanth, Arun Kumar Jagota, Kaushal Bansal, Amrita Dasgupta
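
The client-side decision described in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical LocalModel wrapper, a stand-in online_predict function, and a simple accuracy comparison; none of these names come from the application itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class LocalModel:
    """Hypothetical on-device model shipped with its measured accuracy."""
    accuracy: float
    predict: Callable[[dict], str]


def online_predict(features: dict) -> Tuple[str, str]:
    """Stand-in for the online system: returns a prediction and a rationale string."""
    amount = features.get("amount", 0)
    label = "high" if amount > 100 else "low"
    return label, f"amount={amount} compared against cutoff 100"


def predict(features: dict, required_accuracy: float, local: LocalModel,
            remote: Callable[[dict], Tuple[str, str]] = online_predict
            ) -> Tuple[str, Optional[str]]:
    # Use the fast local model when its accuracy is sufficient for the task.
    if required_accuracy <= local.accuracy:
        return local.predict(features), None
    # Otherwise ask the online system, which also explains its answer.
    return remote(features)
```

In this sketch the local path returns no rationale, while the remote path returns the prediction together with its rationale string.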

ADAPTIVE MATCH INDEXES

Publication number: 20210232637

Abstract: Determine first count of first records storing first value in first field, second count of second records storing second value in second field, third count of third records storing third value in third field. Determine count threshold using first, second and third counts, dispersion measure based on dispersion of values stored in second field by first records and other dispersion measure based on other dispersion of values stored in third field by first records. Train machine-learning model to determine dispersion measure threshold based on dispersion and other dispersion measures. If first count is greater than count threshold, and dispersion measure is greater than dispersion measure threshold, create match index based on first and second fields. Receive prospective record storing first value in first field, second value in second field. Use match index to identify record storing first value in first field, second value in second field as matching prospective record.

Type: Application

Filed: January 29, 2020

Publication date: July 29, 2021

Inventors: Arun Kumar Jagota, Ajitesh Jain, Rahul Mathias Madan, Shravani Madhavaram

ADAPTIVE RECOGNITION OF ENTITIES

Publication number: 20210224482

Abstract: A system receives a record which includes a string and separates the string into a number of tokens, including a token and another token. The system identifies a pattern that includes an entity, another entity, and a number of entities that equals the number of tokens, and another pattern that includes the same number of entities as the number of tokens. The system determines a combined probability that combines a probability based on the number of entries in the entity's dictionary which stores the token, and another probability based on a number of character types in the other entity that match characters in the other token. If the combined probability associated with the pattern is greater than another combined probability associated with the other pattern, the system matches the record to a system record based on recognizing the token as the entity and the other token as the other entity.

Type: Application

Filed: January 22, 2020

Publication date: July 22, 2021

Inventors: Arun Kumar Jagota, Ajitesh Jain

DISCOVERING SUSPICIOUS PERSON PROFILES

Publication number: 20210224614

Abstract: A model is trained to create a probability distribution of counts based on counts of distinct values stored by person profiles in a field. The model is trained to create another probability distribution of counts based on other counts of other distinct values stored by the person profiles in another field. The count of distinct values stored by a person profile in the field is identified. Another count of distinct values stored by the person profile in the other field is identified. A score is determined based on a cumulative distribution function of the count under the probability distribution of counts. Another score is determined based on the cumulative distribution function of the other count under the other probability distribution of counts. If the score and the other score combine in an overall score that satisfies a threshold, a message is output about the person profile being suspected of corruption.

Type: Application

Filed: January 17, 2020

Publication date: July 22, 2021

Inventor: Arun Kumar Jagota
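
The scoring idea in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method claimed in the application: an empirical distribution stands in for the trained model, the per-field scores are combined by averaging, and the names empirical_cdf, overall_scores, and suspicious are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Set

Profile = Dict[str, Set[str]]  # field name -> distinct values stored in that field


def empirical_cdf(counts: List[int]) -> Callable[[int], float]:
    """Return a function giving P(X <= x) under the empirical distribution of counts."""
    ordered = sorted(counts)
    return lambda x: bisect_right(ordered, x) / len(ordered)


def overall_scores(profiles: Dict[str, Profile], fields: List[str]) -> Dict[str, float]:
    # One distribution of distinct-value counts per field, built from all profiles.
    cdfs = {f: empirical_cdf([len(p[f]) for p in profiles.values()]) for f in fields}
    # Assumption: the per-field CDF scores are combined by a simple average.
    return {
        pid: sum(cdfs[f](len(p[f])) for f in fields) / len(fields)
        for pid, p in profiles.items()
    }


def suspicious(profiles: Dict[str, Profile], fields: List[str], threshold: float) -> List[str]:
    """Profiles whose overall score reaches the threshold are reported as suspect."""
    return [pid for pid, s in overall_scores(profiles, fields).items() if s >= threshold]
```

A profile that stores unusually many distinct emails and phone numbers scores close to 1.0 in both fields and is the one reported.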

Trie-based normalization of field values for matching

Patent number: 11016959

Abstract: A system tokenizes values stored in a field by multiple records. The system creates a trie from the tokenized values, each branch in the trie labeled with one of the tokenized values, each node storing a count indicating the number of the multiple records associated with a tokenized value sequence beginning from a root of the trie. The system tokenizes a value stored in the field by a prospective record. Beginning from the root of the trie, the system identifies each node corresponding to a token value sequence for the prospective record's tokenized value. Beginning from the most recently identified node for the prospective record's token value sequence, the system identifies each extending node which stores a count that satisfies a threshold, each identified extending node corresponding to another token value sequence. The system uses the other token value sequence to identify one of the multiple records that matches the prospective record.

Type: Grant

Filed: January 31, 2018

Date of Patent: May 25, 2021

Assignee: salesforce.com, inc.

Inventors: Arun Kumar Jagota, Ajitesh Jain, Dmytro Kudriavtsev
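
As a rough illustration of the trie described in the abstract above, the sketch below builds a token trie with per-node record counts and then lists the extensions of a prospective value whose counts satisfy a threshold. Whitespace tokenization, lowercasing, and the names TrieNode, build_trie, and extensions are assumptions made for this example.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.count = 0  # number of records whose token sequence passes through this node


def build_trie(values: List[str]) -> TrieNode:
    root = TrieNode()
    for value in values:
        node = root
        for token in value.lower().split():  # assumed tokenizer: lowercase + whitespace split
            node = node.children.setdefault(token, TrieNode())
            node.count += 1
    return root


def extensions(root: TrieNode, value: str, threshold: int) -> List[str]:
    """Token sequences extending the prospective value whose node counts meet the threshold."""
    tokens = value.lower().split()
    node = root
    for token in tokens:  # walk the prospective record's token sequence from the root
        node = node.children.get(token)
        if node is None:
            return []
    results = []
    stack = [(node, tokens)]
    while stack:
        current, path = stack.pop()
        for token, child in current.children.items():
            if child.count >= threshold:
                results.append(" ".join(path + [token]))
                stack.append((child, path + [token]))
    return sorted(results)
```

With a threshold of 2, a prospective value "Acme" would be extended to "acme corp" only if at least two stored records begin with that token sequence.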

Machine learning from data steward feedback for data matching

Patent number: 11010771

Abstract: A system determines factored score by multiplying factor and match score for values of field in two records, offset score by adding offset to factored score, and weighted score by applying weight to offset score. The system determines status for two records based on combining weighted score with other weighted score corresponding to other field of two records. The system revises factor, offset, and weight based on feedback associated with two records. The system determines revised factored score by multiplying revised factor and match score for other values of field in two other records, revised offset score by adding revised offset to revised factored score, and revised weighted score by applying revised weight to revised offset score. The system determines learned status for two other records based on combining revised weighted score with additional weighted score corresponding to other field for two other records.

Type: Grant

Filed: January 31, 2019

Date of Patent: May 18, 2021

Assignee: salesforce.com, inc.

Inventors: Arun Kumar Jagota, Piranavan Selvanandan
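
The score arithmetic in the abstract above (factor, then offset, then weight) can be written out as a short sketch. It assumes that "applying weight" means multiplication, that per-field weighted scores are combined by summing, and that status is decided by a threshold; the feedback-driven revision of the parameters is not shown. FieldParams, weighted_score, and match_status are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FieldParams:
    factor: float
    offset: float
    weight: float


def weighted_score(match_score: float, p: FieldParams) -> float:
    factored = p.factor * match_score   # factored score
    offset_score = factored + p.offset  # offset score
    return p.weight * offset_score      # weighted score (assumed multiplicative)


def match_status(match_scores: Dict[str, float], params: Dict[str, FieldParams],
                 threshold: float) -> str:
    # Assumption: weighted scores of all fields are summed, then compared to a threshold.
    total = sum(weighted_score(score, params[field]) for field, score in match_scores.items())
    return "match" if total >= threshold else "no match"
```

Revising factor, offset, and weight from steward feedback would change only the FieldParams values; the scoring path stays the same.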

GENERATING ADAPTIVE MATCH KEYS BASED ON ESTIMATING COUNTS

Publication number: 20210124779

Abstract: A system creates a graph of nodes connected by edges, the nodes including: i) a first node associated with a first value and a count of the first value, and ii) a second node associated with a second value and a count of the second value, the edges including an edge that connects the first and second nodes and is associated with a count of instances of the first value being stored with the second value. The system includes each node and each edge associated with a clique count less than a clique threshold in a keys set, and deletes each node and each edge associated with a clique count less than the clique threshold. The system identifies triplet nodes connected by triplet edges. If the estimated clique count for the triplet values represented by the triplet nodes is less than the clique threshold, the system includes the triplet values in the keys set and identifies the triplet of nodes as analyzed.

Type: Application

Filed: October 23, 2019

Publication date: April 29, 2021

Inventor: Arun Kumar Jagota

Dense subset clustering

Patent number: 10956450

Abstract: Some embodiments of the present invention include a method for determining a dense subset from a group of records using a graphical representation of the group of records, the graphical representation having nodes and edges, a node associated with a record from the group of records, an edge connecting two nodes associated with two related records, wherein a node is associated with a weight corresponding to a number of edges connected to the node, wherein a record is added to the dense subset based on its associated node having a highest weight and a density that satisfies a density threshold, the density being based on the content of the dense subset, and wherein the content of the dense subset is to be processed as including duplicate records.

Type: Grant

Filed: March 28, 2016

Date of Patent: March 23, 2021

Assignee: salesforce.com, inc.

Inventors: Dai Duong Doan, Arun Kumar Jagota

Cross objects de-duplication

Patent number: 10949395

Abstract: Some embodiments of the present invention include a method for determining duplicate records in multiple objects and may include combining records associated with a first object with records associated with a second object to generate a third object, wherein the first object is related to the second object; performing de-duplication on the third object to generate a combined group of duplicate sets; and from the combined group of duplicate sets, identifying at least one duplicate set associated with both the first object and the second object based on the duplicate set having at least one record associated with the first object and at least one record associated with the second object.

Type: Grant

Filed: March 30, 2016

Date of Patent: March 16, 2021

Assignee: salesforce.com, inc.

Inventors: Dai Duong Doan, Arun Kumar Jagota, Chenghung Ker, Parth Vaishnav, Danil Dvinov, Dmytro Kudriavtsev
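
The combine, de-duplicate, and filter steps in the abstract above can be sketched as follows. The object names "Lead" and "Contact", the use of a normalized email as the duplicate key, and the function name cross_object_duplicates are assumptions for illustration only; the patent does not specify them here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cross_object_duplicates(first: List[dict], second: List[dict],
                            first_name: str = "Lead",
                            second_name: str = "Contact") -> List[List[Tuple[str, str]]]:
    # Combine the records of both objects into one collection (the "third object").
    combined = [(first_name, r) for r in first] + [(second_name, r) for r in second]

    # De-duplicate the combined collection. Assumed rule: same normalized email.
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for obj, record in combined:
        key = record["email"].strip().lower()
        groups[key].append((obj, record["id"]))
    duplicate_sets = [g for g in groups.values() if len(g) > 1]

    # Keep only duplicate sets that contain records from both objects.
    return [g for g in duplicate_sets
            if {obj for obj, _ in g} == {first_name, second_name}]
```

Duplicate sets made up of records from only one object are still found by the de-duplication step but are filtered out of the cross-object result.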

MACHINE-LEARNT FIELD-SPECIFIC TOKENIZATION

Publication number: 20210034596

Abstract: A training set is created via creating adjacent classified substrings by using character classes to replace corresponding characters in adjacent substrings in each training character string, and associating each pair of adjacent classified substrings and each pair of adjacent substrings with corresponding labels indicating whether corresponding pairs include any token boundary. The system splits input character string into beginning and ending parts and creates classified beginning part by replacing beginning part character with corresponding class and classified ending part by replacing ending part character with corresponding class. The machine-learning model determines probability of token identification, based on training set to determine count of instances that classified beginning part is paired with classified ending part and count of corresponding labels that indicate inclusion of any token boundary.

Type: Application

Filed: July 30, 2019

Publication date: February 4, 2021

Applicant: salesforce.com, inc.

Inventor: Arun Kumar Jagota

MACHINE-LEARNT FIELD-SPECIFIC STANDARDIZATION

Publication number: 20210034638

Abstract: A system tokenizes raw values and corresponding standardized values into raw token sequences and corresponding standardized token sequences. A machine-learning model learns standardization from token insertions and token substitutions that modify the raw token sequences to match the corresponding standardized token sequences. The system tokenizes an input value into an input token sequence. The machine-learning model determines a probability of inserting an insertion token after an insertion markable token in the input token sequence. If the probability of inserting the insertion token satisfies a threshold, the system inserts the insertion token after the insertion markable token in the input token sequence. The machine-learning model determines a probability of substituting a substitution token for a substitutable token in the input token sequence.

Type: Application

Filed: July 31, 2019

Publication date: February 4, 2021

Applicant: salesforce.com, inc.

Inventors: Arun Kumar Jagota, Stanislav Georgiev

Account recommendations for user account sets

Patent number: 10909575

Abstract: New account recommendations for user account sets are described. A system creates an accounts profile for a set of accounts based on multiple attributes associated with each account of the set of accounts. The system calculates an account score for an account based on comparing multiple attributes associated with the account against the accounts profile, wherein the account is not in the set of accounts. The system determines whether the account score satisfies an account score threshold. The system recommends the account to a user associated with the set of accounts if the account score satisfies the account score threshold.

Type: Grant

Filed: June 25, 2015

Date of Patent: February 2, 2021

Assignee: salesforce.com, inc.

Inventors: Arun Kumar Jagota, Sancho S. Pinto, Saurin G. Shah, Stanislav Georgiev
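
A minimal sketch of the profile, score, and threshold steps from the abstract above. It assumes the accounts profile is the relative frequency of each attribute value within the user's account set and that the account score is the mean of those frequencies for a candidate; build_profile, account_score, and recommend are hypothetical names.

```python
from collections import Counter
from typing import Dict, List

Account = Dict[str, str]  # attribute name -> value


def build_profile(accounts: List[Account]) -> Dict[str, Dict[str, float]]:
    """For each attribute, the share of accounts in the set holding each value."""
    profile = {}
    for attr in accounts[0]:  # assumes every account carries the same attributes
        counts = Counter(a[attr] for a in accounts)
        profile[attr] = {value: c / len(accounts) for value, c in counts.items()}
    return profile


def account_score(account: Account, profile: Dict[str, Dict[str, float]]) -> float:
    # Assumed scoring: average, over attributes, of how common the candidate's value is.
    return sum(profile[attr].get(account.get(attr), 0.0) for attr in profile) / len(profile)


def recommend(candidates: Dict[str, Account], owned: List[Account],
              threshold: float) -> List[str]:
    profile = build_profile(owned)
    return [name for name, acct in candidates.items()
            if account_score(acct, profile) >= threshold]
```

Candidates are accounts outside the user's set; only those whose score satisfies the threshold are recommended.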

Optimized subset processing for de-duplication

Patent number: 10901996

Abstract: Some embodiments of the present invention include a method for identifying duplicate records from a group of records in a database system.

Type: Grant

Filed: February 24, 2016

Date of Patent: January 26, 2021

Assignee: salesforce.com, inc.

Inventors: Dai Duong Doan, Arun Kumar Jagota, Chenghung Ker, Parth Vaishnav, Danil Dvinov, Dmytro Kudriavtsev

ESTIMATING THE NUMBER OF DISTINCT ENTITIES FROM A SET OF RECORDS OF A DATABASE SYSTEM

Publication number: 20200401595

Abstract: A method and system for estimating a number of distinct entities in a set of records are described. For each one of a subset of records, a set of match rule keys are generated based on a set of match rules. Each match rule from the set of match rules defines a match between records, and each match rule key from the set of match rule keys includes at least a key field value. A high order key for the record is determined based on the match rule keys, and a counter associated with the high order key is incremented. When each record from the subset of records has been processed by determining the match rule keys, and incrementing the counter(s) of the high order keys, a sum of a number of counters that have a non-zero value is performed to estimate the distinct entities in the records.

Type: Application

Filed: June 21, 2019

Publication date: December 24, 2020

Applicant: salesforce.com, inc.

Inventor: Arun Kumar Jagota

METHOD AND A SYSTEM FOR FUZZY MATCHING OF ENTITIES IN A DATABASE SYSTEM BASED ON MACHINE LEARNING

Publication number: 20200401587

Abstract: A method and system of matching field values of a field type are described. Blurring operations are applied to first and second values to obtain blurred values. A first maximum score is determined from first scores for blurred values, where each one of the first scores is indicative of a confidence that a match of the first and the second values occurs with knowledge of a first blurred value. A second maximum score is determined from second scores for the blurred values, where each one of the second scores is indicative of a confidence that a non-match of the first and the second values occurs with knowledge of the first blurred value. Responsive to determining that the first maximum score is greater than the second maximum score, an indication that the first value matches the second value is output.

Type: Application

Filed: June 21, 2019

Publication date: December 24, 2020

Applicant: salesforce.com, inc.

Inventor: Arun Kumar Jagota

EFFICIENTLY AND ACCURATELY ASSESSING THE NUMBER OF IDENTIFIABLE RECORDS FOR CREATING PERSONAL PROFILES

Publication number: 20200356574

Abstract: A system determines a name probability based on a first name dataset frequency of a first name value stored by a first name field in a personal record and a last name dataset frequency of a last name value stored by a last name field in a personal record. The system determines at least one other probability based on another dataset frequency of another value stored by another field in the personal record and an additional dataset frequency of an additional value stored by an additional field in the personal record. The system determines a combined probability based on the name probability and the at least one other probability. The system increments a count of identifiable personal records for each personal record that has a corresponding combined probability that satisfies an identifiability threshold. The system outputs a message based on the count of identifiable personal records.

Type: Application

Filed: May 10, 2019

Publication date: November 12, 2020

Applicant: salesforce.com, inc.

Inventors: Arun Kumar Jagota, Stanislav Georgiev
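
The counting procedure in the abstract above can be sketched as follows. The sketch assumes that each probability is a product of dataset frequencies (an independence assumption), that the two "other" fields are city and title, and that a record satisfies the identifiability threshold when its combined probability is at or below it. The field names and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def frequency(records: List[Record], field: str) -> Dict[str, float]:
    """Relative frequency of each value of a field across the dataset."""
    counts = Counter(r[field] for r in records)
    return {value: c / len(records) for value, c in counts.items()}


def count_identifiable(records: List[Record], threshold: float) -> int:
    freq = {f: frequency(records, f) for f in ("first", "last", "city", "title")}
    count = 0
    for r in records:
        # Name probability from first-name and last-name dataset frequencies.
        name_p = freq["first"][r["first"]] * freq["last"][r["last"]]
        # One other probability from two further fields (assumed: city and title).
        other_p = freq["city"][r["city"]] * freq["title"][r["title"]]
        combined = name_p * other_p
        # Assumption: rare combinations (low probability) count as identifiable.
        if combined <= threshold:
            count += 1
    return count
```

A record with a rare name in a rare city gets a low combined probability and is counted, while records sharing common values are not.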

Match index creation

Patent number: 10817465

Abstract: A system identifies a first number of distinct values stored in a first field by a dataset of records. The system identifies a second number of distinct values stored in a second field by the dataset of records. The system creates a trie from values stored in a field by multiple records, the field corresponding to the first field or the second field, based on comparing the first number to the second number. The system associates a node in the trie with one of the multiple records, based on a value stored in the field by the record. The system identifies a branch sequence in the trie as a key for a prospective record, based on a prospective value stored in a corresponding field by the prospective record. The system uses the key for the prospective record to identify one of the multiple records that matches the prospective record.

Type: Grant

Filed: April 25, 2017

Date of Patent: October 27, 2020

Assignee: salesforce.com, inc.

Inventors: Arun Kumar Jagota, Dmytro Kudriavtsev
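
A small sketch of the index construction described in the abstract above. It assumes that the field with more distinct values is the one chosen for the trie, that the trie is built over characters, and that record ids are stored at the node ending each value; choose_index_field, build_index, and lookup are hypothetical names.

```python
from typing import Dict, List


def choose_index_field(records: List[dict], first_field: str, second_field: str) -> str:
    # Compare the numbers of distinct values; assumed rule: prefer the more varied field.
    n_first = len({r[first_field] for r in records})
    n_second = len({r[second_field] for r in records})
    return first_field if n_first >= n_second else second_field


def build_index(records: List[dict], field: str) -> Dict:
    """Character trie as nested dicts; record ids are stored under the '$ids' entry."""
    root: Dict = {}
    for r in records:
        node = root
        for ch in r[field].lower():
            node = node.setdefault(ch, {})
        node.setdefault("$ids", []).append(r["id"])
    return root


def lookup(index: Dict, value: str) -> List:
    # The branch sequence spelled by the prospective value acts as its key.
    node = index
    for ch in value.lower():
        node = node.get(ch)
        if node is None:
            return []
    return node.get("$ids", [])
```

Because single characters label the branches, the multi-character "$ids" entry cannot collide with a branch label.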

Recommending data providers' datasets based on database value densities

Patent number: 10817479

Abstract: Recommending data providers' datasets based on database value densities is described. A database system determines a provider dataset density for a value by identifying a frequency of the value in a dataset that is provided by a data provider. The database system determines a user database density for the value by identifying a frequency of the value in a database used by a data user. The database system determines a relative density based on a relationship between the provider dataset density and the user database density. The database system determines an evaluation metric for the value, based on a combination of the relative density and the user database density. The database system causes a recommendation to be outputted, based on a relationship of the evaluation metric relative to other evaluation metrics for other values, which recommends that the data user acquire at least a part of the dataset.

Type: Grant

Filed: June 23, 2017

Date of Patent: October 27, 2020

Assignee: salesforce.com, inc.

Inventors: Arun Kumar Jagota, Marc Joseph Delurgio, Venkata Murali Tejomurtula

Augmenting match indices

Patent number: 10817549

Abstract: System creates three tries based on values stored in first three fields by records. System associates node in third trie with record, based on value stored in third field by record. System associates node with first dispersion measure, based on values stored in first field by records associated with node, and with second dispersion measure, based on values stored in second field by records associated with node. System identifies branch sequence in third trie as key for prospective record, based on value stored in third field by prospective record. System uses key to identify a subset of records that match prospective record. If a count of the subset exceeds threshold, the system identifies other branch sequence in first trie or second trie as other key for prospective record, based on first dispersion measure and second dispersion measure. System uses the key and the other key to identify at least one record that matches prospective record.

Type: Grant

Filed: May 9, 2017

Date of Patent: October 27, 2020

Assignee: salesforce.com, inc.

Inventors: Arun Kumar Jagota, Dmytro Kudriavtsev
