Patents by Inventor Hima Patel

Hima Patel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

IN-DATABASE DATA CLEANSING

Publication number: 20250139069

Abstract: A cleansing operation defined for a data structure of a database managed by a database management system is obtained. The cleansing operation is performed on data of the data structure to obtain clean data. The cleansing operation that is defined for the data structure and performed on data of the data structure is performed by the database management system.

Type: Application

Filed: October 27, 2023

Publication date: May 1, 2025

Inventors: Pedro Miguel BARBAS, Shaikh Shahriar QUADER, Adrian MAHJOUR, Hima PATEL, Nitin GUPTA
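
As a rough illustration of the idea in the abstract above (a cleansing operation defined on a data structure and executed by the database management system itself), the following minimal sketch uses SQLite through Python's standard `sqlite3` module. The table name `customers`, the view `customers_clean`, and the choice of trimming and lower-casing as the cleansing operation are assumptions made for this example; they are not taken from the application.

```python
import sqlite3
from typing import List


def clean_in_database() -> List[str]:
    """Define a cleansing operation inside the database and let the engine run it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO customers (email) VALUES (?)",
        [("  Alice@Example.COM ",), ("bob@example.com",)],
    )
    # Hypothetical cleansing operation, attached to the table as a view.
    # The trimming and lower-casing are evaluated by the database engine,
    # not by application code.
    conn.execute(
        "CREATE VIEW customers_clean AS "
        "SELECT id, LOWER(TRIM(email)) AS email FROM customers"
    )
    rows = conn.execute("SELECT email FROM customers_clean ORDER BY id").fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Calling `clean_in_database()` returns the cleaned e-mail values while the original rows in `customers` stay unchanged.
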
IN-DATABASE DATA CLEANSING AND INDEPENDENT STORE OF CLEAN DATA

Publication number: 20250139068

Abstract: A cleansing operation is performed on data of a data structure to obtain clean data. The clean data is stored as part of the data structure; however, the clean data is independent of the data. A mapping is performed to provide a set of ordered data that includes the data and the clean data.

Type: Application

Filed: October 27, 2023

Publication date: May 1, 2025

Inventors: Pedro Miguel BARBAS, Shaikh Shahriar QUADER, Adrian MAHJOUR, Hima PATEL, Nitin GUPTA
Corpus quality processing for a specified task

Patent number: 12242797

Abstract: Processing within a computing environment is facilitated using a corpus processing system to assess and enhance quality of a corpus of unstructured documents for a specified task. The processing includes referencing, by a corpus processing engine, the corpus of unstructured documents to obtain unstructured document data, and applying, by a corpus quality metrics engine, a set of quality metrics to the document data to obtain a set of quality metric scores. Further, the process includes automatically selecting, by a quality metric selection engine, a subset of task-relevant quality metrics using the quality metric scores and the specified task, and automatically transforming, at least in part, multiple documents of the corpus to remediate one or more identified issues with the documents. The automatically transforming results in remediated documents tuned for the specified task, which are provided for the specified task to be performed.

Type: Grant

Filed: February 6, 2023

Date of Patent: March 4, 2025

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Shashank Mujumdar, Vitobha Munigala, Hima Patel
Automatically selecting relevant data based on user specified data and machine learning characteristics for data integration

Patent number: 12190215

Abstract: Automatically selecting data for machine learning datasets is provided. The method comprises receiving an input dataset and user-specified data quality metrics. The input dataset is matched to a first subset of candidate datasets in a repository according to schema characteristics. A second subset of candidate datasets having a distance from the input dataset above a specified threshold is selected from the first subset of candidate datasets. The second subset of candidate datasets is merged into a merged dataset. Top ranked samples above a specified second threshold are identified from the merged dataset based on the user-specified data quality metrics. The input dataset, augmented with the top ranked samples, is returned to the user.

Type: Grant

Filed: October 25, 2023

Date of Patent: January 7, 2025

Assignee: International Business Machines Corporation

Inventors: Nitin Gupta, Shashank Mujumdar, Ruhi Sharma Mittal, Hima Patel
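
The selection pipeline in the abstract above (schema match, distance filter, merge, quality ranking, augmentation) could be sketched roughly as follows. This is only an illustration under stated assumptions: datasets are modelled as lists of dictionaries, and the `distance` and `quality` callables, both thresholds, and the function name `select_and_augment` are hypothetical stand-ins for whatever the patent actually uses.

```python
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, float]
Dataset = List[Row]


def schema(dataset: Dataset) -> FrozenSet[str]:
    """Schema characteristic used here: the set of column names (an assumption)."""
    return frozenset(dataset[0].keys()) if dataset else frozenset()


def select_and_augment(
    input_ds: Dataset,
    repository: List[Dataset],
    distance: Callable[[Dataset, Dataset], float],
    quality: Callable[[Row], float],
    distance_threshold: float,
    quality_threshold: float,
) -> Dataset:
    # 1. First subset: candidates whose schema matches the input dataset.
    first = [c for c in repository if schema(c) == schema(input_ds)]
    # 2. Second subset: candidates whose distance from the input is above
    #    the threshold, following the wording of the abstract.
    second = [c for c in first if distance(input_ds, c) > distance_threshold]
    # 3. Merge the second subset into one dataset.
    merged = [row for c in second for row in c]
    # 4. Keep samples whose quality score is above the second threshold,
    #    best first.
    top = sorted(
        (r for r in merged if quality(r) > quality_threshold),
        key=quality,
        reverse=True,
    )
    # 5. Return the input dataset augmented with the top-ranked samples.
    return input_ds + top
```

Passing the distance and quality measures as callables keeps the sketch neutral about how those quantities are computed.
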
INTERACTIVE DATASET EXPLORATION AND PREPROCESSING

Publication number: 20240411750

Abstract: A functionality intent is extracted from a natural language input, the functionality intent comprising an operation on a dataset. A portion of source code implementing the functionality intent is generated. Using a result of executing an executable version of the portion of source code on the dataset, a next functionality intent is recommended, the next functionality intent expressed in natural language form.

Type: Application

Filed: June 6, 2023

Publication date: December 12, 2024

Applicant: International Business Machines Corporation

Inventors: Vitobha Munigala, Shanmukha Chaitanya Guttula, Hima Patel
CORPUS QUALITY PROCESSING FOR A SPECIFIED TASK

Publication number: 20240265196

Abstract: Processing within a computing environment is facilitated using a corpus processing system to assess and enhance quality of a corpus of unstructured documents for a specified task. The processing includes referencing, by a corpus processing engine, the corpus of unstructured documents to obtain unstructured document data, and applying, by a corpus quality metrics engine, a set of quality metrics to the document data to obtain a set of quality metric scores. Further, the process includes automatically selecting, by a quality metric selection engine, a subset of task-relevant quality metrics using the quality metric scores and the specified task, and automatically transforming, at least in part, multiple documents of the corpus to remediate one or more identified issues with the documents. The automatically transforming results in remediated documents tuned for the specified task, which are provided for the specified task to be performed.

Type: Application

Filed: February 6, 2023

Publication date: August 8, 2024

Inventors: Shashank MUJUMDAR, Vitobha MUNIGALA, Hima PATEL
GENERATION OF DATA TRANSFORMATIONS USING FINGERPRINTS

Publication number: 20240202573

Abstract: A method, computer program product, and computer system for transforming sets of source data having different formats into respective sets of target data having a same format. N source patterns are determined and respectively describe N different formats in which N sets of source data items are formatted, where N≥1. A target format pattern is determined and describes a target format in which target data items are formatted. N graphs are generated and respectively describe transformations of the N source patterns to the target pattern. Each graph includes multiple transformation paths. Each transformation path transforms the source pattern to the target pattern in a manner that maps source strings in the source pattern to each target string in the target pattern. A single transformation path is selected from the multiple transformation paths resulting in N single transformation paths having been selected.

Type: Application

Filed: December 19, 2022

Publication date: June 20, 2024

Inventors: Nagarjuna Surabathina, Nitin Gupta, Shramona Chakraborty, Hima Patel, Sameep Mehta, Ramkumar Ramalingam, Matu Agarwal
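
The abstract above does not say how a source pattern is derived. One plausible reading, shown here purely as an assumption, is a character-class pattern in which digits become `D`, letters become `L`, and separators are kept; grouping items by that pattern then yields the N formats. The names `pattern_of` and `group_by_pattern` are hypothetical, and the graph construction and path selection steps are not covered by this sketch.

```python
from collections import defaultdict
from typing import Dict, List


def pattern_of(item: str) -> str:
    """Hypothetical format pattern: digits -> 'D', letters -> 'L', others kept."""
    return "".join(
        "D" if ch.isdigit() else "L" if ch.isalpha() else ch for ch in item
    )


def group_by_pattern(items: List[str]) -> Dict[str, List[str]]:
    """Group source data items by pattern; each key is one distinct source format."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        groups[pattern_of(item)].append(item)
    return dict(groups)
```

For example, `2023-10-27` and `2022-12-19` share the pattern `DDDD-DD-DD`, while `27/10/2023` falls under `DD/DD/DDDD`.
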
ANNOTATING AND COLLECTING DATA-CENTRIC AI QUALITY METRICS CONSIDERING USER PREFERENCES

Publication number: 20240193166

Abstract: A method, computer program, and computer system are provided for collecting and annotating data based on user preference. Unlabeled data corresponding to one or more entries within a dataset is received. Pseudo-labeled data is generated based on the unlabeled data. Based on one or more quality metrics, each entry from among the pseudo-labeled data is determined to be included within a final dataset. A user is prompted for annotations corresponding to entries of the pseudo-labeled data included within the final dataset. A determination is made as to whether additional data is needed based on comparing the final dataset to the one or more quality metrics, and the additional data is collected if the final dataset does not meet the quality metrics.

Type: Application

Filed: December 9, 2022

Publication date: June 13, 2024

Inventors: Shashank Mujumdar, Ruhi Sharma Mittal, Nitin Gupta, Hima Patel
Ordering annotation sets for machine learning

Patent number: 11966453

Abstract: Embodiments are disclosed for a method. The method includes receiving an annotation set for a machine learning model. The annotation set includes multiple data points relevant to a task for the machine learning model. The method also includes determining total weights corresponding to the data points. The total weights are determined based on multiple ordering constraints indicating multiple data classes and corresponding weights. The corresponding weights represent a relative priority of the data classes with respect to each other. The method further includes generating an ordered annotation set from the annotation set. The ordered annotation set includes the data points in a sequence based on the determined total weights.

Type: Grant

Filed: February 15, 2021

Date of Patent: April 23, 2024

Assignee: International Business Machines Corporation

Inventors: Naveen Panwar, Anush Sankaran, Kuntal Dey, Hima Patel, Sameep Mehta
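
The ordering step in the abstract above can be sketched as follows: each data point belongs to one or more data classes, each class carries a weight expressing its relative priority, and the annotation set is sorted by the summed weight. The class names, the weight values, and the use of a plain sum as the total weight are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical ordering constraints: data class -> relative priority weight.
CLASS_WEIGHTS: Dict[str, float] = {
    "rare_label": 3.0,
    "low_confidence": 2.0,
    "long_text": 0.5,
}


def total_weight(classes: Set[str], weights: Dict[str, float]) -> float:
    """Sum the weights of every data class the data point belongs to."""
    return sum(weights.get(c, 0.0) for c in classes)


def order_annotation_set(
    points: List[Tuple[str, Set[str]]], weights: Dict[str, float]
) -> List[str]:
    """Return data point ids in descending order of total weight.

    Python's sort is stable, so points with equal weight keep their input order.
    """
    ranked = sorted(points, key=lambda p: total_weight(p[1], weights), reverse=True)
    return [point_id for point_id, _ in ranked]
```

A data point that is both `rare_label` and `low_confidence` gets a total weight of 5.0 under these assumed weights and is therefore presented to annotators first.
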
Data transformation methodology using generated program code and token mappings

Patent number: 11928126

Abstract: A computer implemented method transforms data. Responsive to receiving a data transformation of an input string to an output string, a computer system identifies mappable tokens in the input string that are mappable to the output string. The computer system creates a set of initial mappings for a set of common tokens in the mappable tokens. The set of initial mappings maps the set of common tokens from the input string to the output string. The computer system creates a set of user mappings that maps the mappable tokens from the input string to the output string using a user input to the set of initial mappings. The computer system generates program code that transforms input strings to output strings using the set of user mappings that maps the mappable tokens from the input string to the output string, wherein the program code is used to transform input strings to output strings.

Type: Grant

Filed: August 22, 2022

Date of Patent: March 12, 2024

Assignee: International Business Machines Corporation

Inventors: Shanmukha Chaitanya Guttula, Pranay Kumar Lohia, Nitin Gupta, Hima Patel
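
A minimal sketch of the flow in the abstract above, assuming tokens are alphanumeric runs and that a mapping pairs an output token position with an input token position. An example pair yields initial mappings for tokens common to both strings, optional user mappings override them, and Python source for a `transform` function is generated from the result. All function names and the tokenisation rule are hypothetical; repeated tokens are resolved to their first occurrence, which a real system would need to handle more carefully.

```python
import re
from typing import Callable, Dict, Optional

TOKEN_RE = r"[A-Za-z0-9]+"  # assumed tokenisation: alphanumeric runs


def initial_mappings(example_in: str, example_out: str) -> Dict[int, int]:
    """Map each output token position to the position of the same input token."""
    in_tokens = re.findall(TOKEN_RE, example_in)
    out_tokens = re.findall(TOKEN_RE, example_out)
    return {
        pos: in_tokens.index(tok)
        for pos, tok in enumerate(out_tokens)
        if tok in in_tokens
    }


def generate_code(example_out: str, mappings: Dict[int, int]) -> str:
    """Emit Python source for transform(), reusing the example output's layout."""
    parts = re.split(f"({TOKEN_RE})", example_out)  # odd indices are tokens
    template = ""
    for i, part in enumerate(parts):
        pos = i // 2
        if i % 2 == 1 and pos in mappings:
            template += "{t[%d]}" % mappings[pos]
        else:
            # Literal text: escape braces so str.format leaves it untouched.
            template += part.replace("{", "{{").replace("}", "}}")
    return (
        "import re\n"
        "def transform(s):\n"
        f"    t = re.findall({TOKEN_RE!r}, s)\n"
        f"    return {template!r}.format(t=t)\n"
    )


def build_transform(
    example_in: str,
    example_out: str,
    user_mappings: Optional[Dict[int, int]] = None,
) -> Callable[[str], str]:
    """Combine initial and user mappings, then compile the generated code."""
    mappings = initial_mappings(example_in, example_out)
    mappings.update(user_mappings or {})
    namespace: Dict[str, object] = {}
    exec(generate_code(example_out, mappings), namespace)
    return namespace["transform"]  # type: ignore[return-value]
```

With the example pair `2023-10-27` and `27/10/2023`, the generated function reorders the three date tokens and can then be applied to other strings in the same input format.
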
DATA TRANSFORMATION METHODOLOGY USING GENERATED PROGRAM CODE AND TOKEN MAPPINGS

Publication number: 20240061858

Abstract: A computer implemented method transforms data. Responsive to receiving a data transformation of an input string to an output string, a computer system identifies mappable tokens in the input string that are mappable to the output string. The computer system creates a set of initial mappings for a set of common tokens in the mappable tokens. The set of initial mappings maps the set of common tokens from the input string to the output string. The computer system creates a set of user mappings that maps the mappable tokens from the input string to the output string using a user input to the set of initial mappings. The computer system generates program code that transforms input strings to output strings using the set of user mappings that maps the mappable tokens from the input string to the output string, wherein the program code is used to transform input strings to output strings.

Type: Application

Filed: August 22, 2022

Publication date: February 22, 2024

Inventors: Shanmukha Chaitanya Guttula, Pranay Kumar Lohia, Nitin Gupta, Hima Patel
Training sample set generation from imbalanced data in view of user goals

Patent number: 11836219

Abstract: One embodiment provides a method, including: receiving a sample set for training a machine-learning model, wherein the sample set includes a plurality of classes, wherein classes within the plurality of classes have an imbalance in a number of samples; creating an enlarged minority class by generating new samples from the samples within the minority class and adding the new samples to the minority class; selecting subset samples from both the samples within the enlarged minority class and the majority class; weighting each of the subset samples based upon user input defining goals for attributes of a training sample set to be used in training the machine-learning model; and generating, using the neural network, the training sample set by re-running the selecting in view of the weighting.

Type: Grant

Filed: November 3, 2021

Date of Patent: December 5, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ruhi Sharma Mittal, Lokesh Nagalapatti, Hima Patel, Nitin Gupta
AUTOMATICALLY TRAINING AND IMPLEMENTING ARTIFICIAL INTELLIGENCE-BASED ANOMALY DETECTION MODELS

Publication number: 20230274160

Abstract: Methods, systems, and computer program products for automatically detecting periods of normal activity by analyzing observability data in IT operations environments are provided herein. A computer-implemented method includes obtaining multiple types of data related to one or more artificial intelligence-related information technology operations; modelling at least a portion of the obtained data as time series data; automatically identifying, from the time series data, one or more time periods associated with one or more given levels of data activity; and performing one or more automated actions, in at least one artificial intelligence-related information technology operations environment, based at least in part on the data corresponding to the one or more identified time periods.

Type: Application

Filed: February 28, 2022

Publication date: August 31, 2023

Inventors: Shashank Mujumdar, Hima Patel, Sambaran Bandyopadhyay, Pooja Aggarwal, Anbang Xu, Hau-Wen Chang, Harshit Kumar, Katherine Guo, Rama Kalyani T. Akkiraju, Gargi B. Dasgupta
PRIVACY-PRESERVING CLASS LABEL STANDARDIZATION IN FEDERATED LEARNING SETTINGS

Publication number: 20230177113

Abstract: Methods, systems, and computer program products for privacy-preserving class label standardization in federated learning settings are provided herein. A computer-implemented method includes determining, using one or more data privacy-preserving techniques, a signature for each of one or more classes of data for each of multiple client devices within a federated learning environment; identifying one or more signature matches across at least a portion of the multiple client devices; generating one or more class labels for the one or more classes of data associated with the one or more signature matches; labeling, across the at least a portion of the multiple client devices, the one or more classes of data associated with the one or more signature matches with the one or more generated class labels; and performing one or more automated actions based at least in part on the one or more labeled classes of data.

Type: Application

Filed: December 2, 2021

Publication date: June 8, 2023

Inventors: Shonda Adena Witherspoon, Ramasuri Narayanam, Hima Patel, Sameep Mehta
Data Transformations for Mapping Enterprise Applications

Publication number: 20230169070

Abstract: A computer implemented method, computer system, and computer program product for transforming mapped data fields of enterprise applications. A number of processor units receive a matching from a source data field to a target data field. The number of processor units receive a number of annotated examples of transformations from a source format to a target format. Based on the annotated examples, the number of processor units autogenerate a query language expression for transforming data items from the source format to the target format.

Type: Application

Filed: November 29, 2021

Publication date: June 1, 2023

Inventors: Ramkumar Ramalingam, Nagarjuna Surabathina, Thanmayi Mruthyunjaya, Nitin Gupta, Pranay Kumar Lohia, Shanmukha Chaitanya Guttula, Hima Patel, Sameep Mehta, Matu Agarwal, Mudit Mehrotra
TRAINING SAMPLE SET GENERATION FROM IMBALANCED DATA IN VIEW OF USER GOALS

Publication number: 20230136125

Abstract: One embodiment provides a method, including: receiving a sample set for training a machine-learning model, wherein the sample set includes a plurality of classes, wherein classes within the plurality of classes have an imbalance in a number of samples; creating an enlarged minority class by generating new samples from the samples within the minority class and adding the new samples to the minority class; selecting subset samples from both the samples within the enlarged minority class and the majority class; weighting each of the subset samples based upon user input defining goals for attributes of a training sample set to be used in training the machine-learning model; and generating, using the neural network, the training sample set by re-running the selecting in view of the weighting.

Type: Application

Filed: November 3, 2021

Publication date: May 4, 2023

Inventors: Ruhi Sharma Mittal, Lokesh Nagalapatti, Hima Patel, Nitin Gupta
AUTOMATICALLY IMPROVING DATA ANNOTATIONS BY PROCESSING ANNOTATION PROPERTIES AND USER FEEDBACK

Publication number: 20230106490

Abstract: Methods, systems, and computer program products for automatically improving data annotations by processing annotation properties and user feedback are provided herein. A computer-implemented method includes obtaining data annotation pairs, each comprising an input data annotation in a first format and a corresponding output data annotation in a second format; determining, within at least a portion of the data annotation pairs, one or more non-diffs; identifying, across the at least a portion of data annotation pairs, data annotation properties associated with multiple intents by processing the non-diffs using property-related rules; modifying at least a portion of the data annotation pairs based on the identified data annotation properties; outputting the modified data annotation pairs to at least one user; and generating a final collection of data annotation pairs by processing at least a portion of the modified data annotation pairs and user feedback received in response to the outputting.

Type: Application

Filed: October 6, 2021

Publication date: April 6, 2023

Inventors: Shanmukha Chaitanya Guttula, Nitin Gupta, Pranay Kumar Lohia, Hima Patel
Method and system for automatically detecting errors in at least one data entry using image maps

Patent number: 11580092

Abstract: A method for automatically detecting errors in at least one data entry in a database, the at least one data entry including an input string of characters that do not match at least one predefined string of characters. The method includes generating a first image map; generating at least one classification parameter by comparing the first image map to a second image map, the second image map based at least partially on the predefined string of characters; determining that the input string of characters correlates to the predefined string of characters; and modifying the at least one data entry to match the predefined string of characters in response to determining that the input string of characters correlates to the predefined string of characters. Various other methods and systems for automatically detecting errors in at least one data entry in a database are also disclosed.

Type: Grant

Filed: December 23, 2020

Date of Patent: February 14, 2023

Assignee: Visa International Service Association

Inventor: Hima Patel
FEDERATED DATA STANDARDIZATION USING DATA PRIVACY TECHNIQUES

Publication number: 20230021563

Abstract: Methods, systems, and computer program products for federated data standardization using data privacy techniques are provided herein. A computer-implemented method includes obtaining multiple datasets from multiple clients in accordance with one or more data privacy techniques; determining one or more similar data columns across at least a portion of the multiple datasets; generating one or more column labels for the one or more similar data columns; standardizing at least a portion of data within the one or more similar data columns by processing the one or more generated column labels using at least one federated learning technique; and performing one or more automated actions based at least in part on results of the standardizing of the at least a portion of data within the one or more similar data columns.

Type: Application

Filed: July 23, 2021

Publication date: January 26, 2023

Inventors: Ramasuri Narayanam, Hima Patel, Sameep Mehta
DATA QUALITY ASSESSMENT FOR UNSUPERVISED MACHINE LEARNING

Publication number: 20220405631

Abstract: Techniques for qualitatively assessing unlabeled data in an unsupervised machine learning environment are disclosed. In one example, a method comprises the following steps. A dataset of unlabeled data points is converted into a graph structure. Nodes of the graph structure represent the unlabeled data points in the dataset and weighted edges between at least a portion of the nodes represent similarity between the unlabeled data points represented by the nodes. A metric is computed for each node of the graph structure. A value generated by the metric for a given node represents a measure of dissimilarity between the corresponding unlabeled data point of the given node and one or more other unlabeled data points of one or more other nodes. A subset of the dataset is generated by removing one or more unlabeled data points from the dataset based on one or more values of the computed metric.

Type: Application

Filed: June 22, 2021

Publication date: December 22, 2022

Inventors: Ramasuri Narayanam, Hima Patel, Lokesh Nagalapatti, Ruhi Sharma Mittal
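
The graph-based assessment in the abstract above can be sketched as follows. The concrete choices are assumptions for illustration: similarity is taken as 1 / (1 + Euclidean distance), the graph is fully connected, the per-node metric is one minus the mean weight of the node's edges, and points whose metric exceeds a caller-supplied threshold are removed.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def similarity(a: Point, b: Point) -> float:
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def build_graph(points: List[Point]) -> Dict[Tuple[int, int], float]:
    """Fully connected graph: edge (i, j) is weighted by the points' similarity."""
    return {
        (i, j): similarity(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    }


def node_dissimilarity(n: int, edges: Dict[Tuple[int, int], float]) -> List[float]:
    """Per-node metric: 1 minus the mean weight of the edges touching the node."""
    totals = [0.0] * n
    for (i, j), weight in edges.items():
        totals[i] += weight
        totals[j] += weight
    return [1.0 - total / (n - 1) for total in totals]


def filter_dataset(points: List[Point], max_dissimilarity: float) -> List[Point]:
    """Drop points whose dissimilarity metric exceeds the threshold."""
    if len(points) < 2:
        return list(points)
    scores = node_dissimilarity(len(points), build_graph(points))
    return [p for p, s in zip(points, scores) if s <= max_dissimilarity]
```

In a small dataset with three nearby points and one distant point, the distant point receives the highest dissimilarity score and is the first to be removed as the threshold is lowered.
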
