Patents by Inventor Hima Patel
Hima Patel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250139068
Abstract: A cleansing operation is performed on data of a data structure to obtain clean data. The clean data is stored as part of the data structure; however, the clean data is independent of the data. A mapping is performed to provide a set of ordered data that includes the data and the clean data.
Type: Application
Filed: October 27, 2023
Publication date: May 1, 2025
Inventors: Pedro Miguel Barbas, Shaikh Shahriar Quader, Adrian Mahjour, Hima Patel, Nitin Gupta
-
Publication number: 20250139069
Abstract: A cleansing operation defined for a data structure of a database managed by a database management system is obtained. The cleansing operation is performed on data of the data structure to obtain clean data. The cleansing operation that is defined for the data structure and performed on data of the data structure is performed by the database management system.
Type: Application
Filed: October 27, 2023
Publication date: May 1, 2025
Inventors: Pedro Miguel Barbas, Shaikh Shahriar Quader, Adrian Mahjour, Hima Patel, Nitin Gupta
-
Patent number: 12242797
Abstract: Processing within a computing environment is facilitated using a corpus processing system to assess and enhance quality of a corpus of unstructured documents for a specified task. The processing includes referencing, by a corpus processing engine, the corpus of unstructured documents to obtain unstructured document data, and applying, by a corpus quality metrics engine, a set of quality metrics to the document data to obtain a set of quality metric scores. Further, the process includes automatically selecting, by a quality metric selection engine, a subset of task-relevant quality metrics using the quality metric scores and the specified task, and automatically transforming, at least in part, multiple documents of the corpus to remediate one or more identified issues with the documents. The automatically transforming results in remediated documents tuned for the specified task, which are provided for the specified task to be performed.
Type: Grant
Filed: February 6, 2023
Date of Patent: March 4, 2025
Assignee: International Business Machines Corporation
Inventors: Shashank Mujumdar, Vitobha Munigala, Hima Patel
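The assess, select and remediate loop in this abstract can be illustrated with a minimal Python sketch. It is not the patented system: the three quality metrics, the task-to-metric table `TASK_METRICS` and the fixes in `REMEDIES` are hypothetical stand-ins, chosen only to show the flow of scoring a corpus, keeping the metrics relevant to a task, and transforming documents where those metrics flag an issue.

```python
# Hypothetical metrics: each returns the fraction of the corpus passing a check (1.0 = no issue).
def metric_scores(docs):
    n = len(docs)
    return {
        "uniqueness": len(set(docs)) / n,
        "non_empty": sum(1 for d in docs if d.strip()) / n,
        "clean_whitespace": sum(1 for d in docs if d == " ".join(d.split())) / n,
    }

# Hypothetical mapping from a task to the metrics that matter for it.
TASK_METRICS = {
    "classification": ["uniqueness", "non_empty"],
    "summarization": ["non_empty", "clean_whitespace"],
}

# Hypothetical remediation for each metric.
REMEDIES = {
    "uniqueness": lambda docs: list(dict.fromkeys(docs)),                  # drop exact duplicates, keep order
    "non_empty": lambda docs: [d for d in docs if d.strip()],              # drop blank documents
    "clean_whitespace": lambda docs: [" ".join(d.split()) for d in docs],  # collapse runs of whitespace
}

def remediate(docs, task, threshold=1.0):
    scores = metric_scores(docs)
    # Keep only the task-relevant metrics whose score signals an issue.
    issues = [m for m in TASK_METRICS[task] if scores[m] < threshold]
    for m in issues:
        docs = REMEDIES[m](docs)
    return docs, issues

cleaned, issues = remediate(["a b", "a b", "", "  c   d "], "classification")
```

Here both task-relevant metrics score 0.75, so duplicates and blank documents are removed, while the whitespace problem is left alone because it is not relevant to classification in this toy table.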
-
Patent number: 12190215
Abstract: Automatically selecting data for machine learning datasets is provided. The method comprises receiving an input dataset and user-specified data quality metrics. The input dataset is matched to a subset of candidate datasets in a repository according to schema characteristics. A second subset of candidate datasets having a distance from the input dataset above a specified threshold is selected from the first subset of candidate datasets. The second subset of candidate datasets is merged into a merged dataset. Top ranked samples above a specified second threshold are identified from the merged dataset based on the user-specified data quality metrics. The input dataset, augmented with the top ranked samples, is returned to the user.
Type: Grant
Filed: October 25, 2023
Date of Patent: January 7, 2025
Assignee: International Business Machines Corporation
Inventors: Nitin Gupta, Shashank Mujumdar, Ruhi Sharma Mittal, Hima Patel
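A minimal Python sketch of the selection pipeline described above, under stated assumptions: datasets are lists of row dictionaries, "schema characteristics" is reduced to an exact match on column names, the dataset distance is a hypothetical mean gap between column means, and the quality metric is any user-supplied scoring function. It shows the order of the steps, not the patented method.

```python
from statistics import mean

def schema(ds):
    # Column names of a dataset (assumes every row has the same keys).
    return frozenset(ds[0]) if ds else frozenset()

def distance(a, b):
    # Hypothetical distance: mean absolute gap between per-column means over shared columns.
    cols = schema(a) & schema(b)
    return mean(abs(mean(r[c] for r in a) - mean(r[c] for r in b)) for c in cols)

def augment(input_ds, repository, quality, dist_threshold, score_threshold):
    # 1. First subset: candidates whose schema matches the input dataset.
    first = [ds for ds in repository if schema(ds) == schema(input_ds)]
    # 2. Second subset: candidates whose distance from the input exceeds the threshold.
    second = [ds for ds in first if distance(input_ds, ds) > dist_threshold]
    # 3. Merge the second subset into a single dataset.
    merged = [row for ds in second for row in ds]
    # 4. Rank merged samples by the user's quality metric and keep those above the second threshold.
    ranked = sorted(merged, key=quality, reverse=True)
    top = [row for row in ranked if quality(row) > score_threshold]
    # 5. Return the input dataset augmented with the top-ranked samples.
    return input_ds + top

inp = [{"x": 1.0, "y": 2.0}]
repo = [
    [{"x": 10.0, "y": 2.0}, {"x": 11.0, "y": 0.5}],  # same schema, far from the input
    [{"x": 1.0, "y": 2.0}],                          # same schema, identical to the input
    [{"z": 3.0}],                                    # different schema
]
out = augment(inp, repo, quality=lambda r: r["y"], dist_threshold=1.0, score_threshold=1.0)
```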
-
Publication number: 20240411750
Abstract: A functionality intent is extracted from a natural language input, the functionality intent comprising an operation on a dataset. A portion of source code implementing the functionality intent is generated. Using a result of executing an executable version of the portion of source code on the dataset, a next functionality intent is recommended, the next functionality intent expressed in natural language form.
Type: Application
Filed: June 6, 2023
Publication date: December 12, 2024
Applicant: International Business Machines Corporation
Inventors: Vitobha Munigala, Shanmukha Chaitanya Guttula, Hima Patel
-
Publication number: 20240265196
Abstract: Processing within a computing environment is facilitated using a corpus processing system to assess and enhance quality of a corpus of unstructured documents for a specified task. The processing includes referencing, by a corpus processing engine, the corpus of unstructured documents to obtain unstructured document data, and applying, by a corpus quality metrics engine, a set of quality metrics to the document data to obtain a set of quality metric scores. Further, the process includes automatically selecting, by a quality metric selection engine, a subset of task-relevant quality metrics using the quality metric scores and the specified task, and automatically transforming, at least in part, multiple documents of the corpus to remediate one or more identified issues with the documents. The automatically transforming results in remediated documents tuned for the specified task, which are provided for the specified task to be performed.
Type: Application
Filed: February 6, 2023
Publication date: August 8, 2024
Inventors: Shashank Mujumdar, Vitobha Munigala, Hima Patel
-
Publication number: 20240202573
Abstract: A method, computer program product, and computer system for transforming sets of source data having different formats into respective sets of target data having a same format. N source patterns are determined and respectively describe N different formats in which N sets of source data items are formatted, where N ≥ 1. A target format pattern is determined and describes a target format in which target data items are formatted. N graphs are generated and respectively describe transformations of the N source patterns to the target pattern. Each graph includes multiple transformation paths. Each transformation path transforms the source pattern to the target pattern in a manner that maps source strings in the source pattern to each target string in the target pattern. A single transformation path is selected from the multiple transformation paths resulting in N single transformation paths having been selected.
Type: Application
Filed: December 19, 2022
Publication date: June 20, 2024
Inventors: Nagarjuna Surabathina, Nitin Gupta, Shramona Chakraborty, Hima Patel, Sameep Mehta, Ramkumar Ramalingam, Matu Agarwal
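The pattern-and-path idea can be sketched in Python for date-like strings. This is an illustration under assumptions, not the claimed method: the `9`/`A` pattern notation, the tokenizer, and the rule of keeping the first path consistent with every example of a pattern are all hypothetical. Each target token may be produced by copying any equal source token or by emitting it as a literal, which is why a single example can yield several candidate paths.

```python
import re
from itertools import product

TOKEN = re.compile(r"[A-Za-z]+|\d+|[^A-Za-z\d]")

def pattern(s):
    # Format pattern: digits -> "9", letters -> "A", e.g. "12/31/2023" -> "99/99/9999".
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", s))

def tokens(s):
    return TOKEN.findall(s)

def candidate_paths(src, tgt):
    # Each target token comes from an equal source token (by index) or is emitted literally.
    s = tokens(src)
    options = [[i for i, tok in enumerate(s) if tok == t] or [t] for t in tokens(tgt)]
    return list(product(*options))

def apply_path(path, src):
    s = tokens(src)
    return "".join(s[p] if isinstance(p, int) else p for p in path)

def learn(examples):
    # Group examples by source pattern (N patterns), then pick one consistent path per pattern.
    by_pattern = {}
    for src, tgt in examples:
        by_pattern.setdefault(pattern(src), []).append((src, tgt))
    chosen = {}
    for pat, group in by_pattern.items():
        paths = candidate_paths(*group[0])
        consistent = [p for p in paths if all(apply_path(p, s) == t for s, t in group)]
        chosen[pat] = consistent[0]  # assumes at least one consistent path exists
    return chosen

def transform(chosen, src):
    return apply_path(chosen[pattern(src)], src)

chosen = learn([
    ("01/01/2023", "2023-01-01"),
    ("12/31/2023", "2023-12-31"),
    ("31.12.2023", "2023-12-31"),
])
```

With `01/01/2023`, day and month are both `01`, so four paths reproduce the first example; the second example of the same pattern leaves only one.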
-
Publication number: 20240193166
Abstract: A method, computer program, and computer system are provided for collecting and annotating data based on user preference. Unlabeled data corresponding to one or more entries within a dataset is received. Pseudo-labeled data is generated based on the unlabeled data. Based on one or more quality metrics, a determination is made as to whether each entry from among the pseudo-labeled data is to be included within a final dataset. A user is prompted for annotations corresponding to entries of the pseudo-labeled data included within the final dataset. A determination is made as to whether additional data is needed based on comparing the final dataset to the one or more quality metrics, and the additional data is collected if the final dataset does not meet the quality metrics.
Type: Application
Filed: December 9, 2022
Publication date: June 13, 2024
Inventors: Shashank Mujumdar, Ruhi Sharma Mittal, Nitin Gupta, Hima Patel
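A minimal sketch of the filter-then-check loop, assuming a hypothetical `scorer` that returns a (label, confidence) pair for each entry and using a per-class count as the only quality metric. Prompting the user for annotations is not modeled.

```python
from collections import Counter

def pseudo_label(entries, scorer, min_conf):
    # Keep only entries whose pseudo-label confidence meets the threshold.
    kept = []
    for e in entries:
        label, conf = scorer(e)
        if conf >= min_conf:
            kept.append((e, label))
    return kept

def classes_needing_data(final, classes, min_per_class):
    # Classes with too few entries signal that more data should be collected.
    counts = Counter(label for _, label in final)
    return [c for c in classes if counts[c] < min_per_class]

def toy_scorer(x):
    # Hypothetical model: the sign gives the label, the scaled magnitude the confidence.
    return ("pos" if x > 0 else "neg", min(abs(x) / 10, 1.0))

final = pseudo_label([5, 8, -1, -9, 0.5], toy_scorer, min_conf=0.5)
missing = classes_needing_data(final, ["pos", "neg"], min_per_class=2)
```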
-
Patent number: 11966453
Abstract: Embodiments are disclosed for a method. The method includes receiving an annotation set for a machine learning model. The annotation set includes multiple data points relevant to a task for the machine learning model. The method also includes determining total weights corresponding to the data points. The total weights are determined based on multiple ordering constraints indicating multiple data classes and corresponding weights. The corresponding weights represent a relative priority of the data classes with respect to each other. The method further includes generating an ordered annotation set from the annotation set. The ordered annotation set includes the data points in a sequence based on the determined total weights.
Type: Grant
Filed: February 15, 2021
Date of Patent: April 23, 2024
Assignee: International Business Machines Corporation
Inventors: Naveen Panwar, Anush Sankaran, Kuntal Dey, Hima Patel, Sameep Mehta
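The ordering step reduces to a weighted sort. In this Python sketch the constraint format (a mapping from class name to weight) and the rule that a point's total weight is the sum of its classes' weights are assumptions made for illustration, not details taken from the patent.

```python
def total_weight(classes, constraints):
    # Sum of the weights of every class the data point belongs to (unknown classes count as 0).
    return sum(constraints.get(c, 0.0) for c in classes)

def order_annotation_set(points, constraints):
    # points: list of (point_id, set_of_classes); constraints: {class_name: weight}.
    # sorted() is stable, so points with equal total weight keep their original relative order.
    return sorted(points, key=lambda p: total_weight(p[1], constraints), reverse=True)

pts = [("a", {"cat"}), ("b", {"dog", "rare"}), ("c", {"dog"}), ("d", {"cat"})]
ordered = order_annotation_set(pts, {"rare": 5.0, "dog": 2.0, "cat": 1.0})
```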
-
Patent number: 11928126
Abstract: A computer implemented method transforms data. Responsive to receiving a data transformation of an input string to an output string, a computer system identifies mappable tokens in the input string that are mappable to the output string. The computer system creates a set of initial mappings for a set of common tokens in the mappable tokens. The set of initial mappings maps the set of common tokens from the input string to the output string. The computer system creates a set of user mappings that maps the mappable tokens from the input string to the output string using a user input to the set of initial mappings. The computer system generates program code that transforms input strings to output strings using the set of user mappings that maps the mappable tokens from the input string to the output string, wherein the program code is used to transform input strings to output strings.
Type: Grant
Filed: August 22, 2022
Date of Patent: March 12, 2024
Assignee: International Business Machines Corporation
Inventors: Shanmukha Chaitanya Guttula, Pranay Kumar Lohia, Nitin Gupta, Hima Patel
-
Publication number: 20240061858
Abstract: A computer implemented method transforms data. Responsive to receiving a data transformation of an input string to an output string, a computer system identifies mappable tokens in the input string that are mappable to the output string. The computer system creates a set of initial mappings for a set of common tokens in the mappable tokens. The set of initial mappings maps the set of common tokens from the input string to the output string. The computer system creates a set of user mappings that maps the mappable tokens from the input string to the output string using a user input to the set of initial mappings. The computer system generates program code that transforms input strings to output strings using the set of user mappings that maps the mappable tokens from the input string to the output string, wherein the program code is used to transform input strings to output strings.
Type: Application
Filed: August 22, 2022
Publication date: February 22, 2024
Inventors: Shanmukha Chaitanya Guttula, Pranay Kumar Lohia, Nitin Gupta, Hima Patel
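The token-mapping flow shared by patent 11928126 and this application can be illustrated with a Python sketch that actually generates program code: it emits the source of a `transform` function and compiles it with `exec`. The tokenizer, the `OPS` table and the user-mapping format (output token index mapped to an input token index plus an operation) are hypothetical choices for illustration.

```python
import re

def tokenize(s):
    # Word runs and single non-word characters: "John Smith" -> ["John", " ", "Smith"].
    return re.findall(r"\w+|\W", s)

# Hypothetical per-token operations a user could pick; "{}" is filled with the input-token expression.
OPS = {"copy": "{}", "upper": "{}.upper()", "initial": "{}[0]"}

def initial_mappings(inp, out):
    # Map each output word token that also appears in the input to its first occurrence there.
    it = tokenize(inp)
    return {j: (it.index(tok), "copy")
            for j, tok in enumerate(tokenize(out))
            if re.fullmatch(r"\w+", tok) and tok in it}

def generate_code(out, mappings):
    # Emit Python source for a transform; unmapped output tokens become string literals.
    parts = []
    for j, tok in enumerate(tokenize(out)):
        if j in mappings:
            i, op = mappings[j]
            parts.append(OPS[op].format(f"t[{i}]"))
        else:
            parts.append(repr(tok))
    return "def transform(s):\n    t = tokenize(s)\n    return " + " + ".join(parts) + "\n"

def compile_transform(code):
    namespace = {"tokenize": tokenize}
    exec(code, namespace)
    return namespace["transform"]

inp, out = "John Smith", "Smith, J."
# "Smith" is found automatically; the user adds that "J" is the initial of input token 0.
mappings = {**initial_mappings(inp, out), 3: (0, "initial")}
transform = compile_transform(generate_code(out, mappings))
result = transform("Ada Lovelace")  # "Lovelace, A."
```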
-
Patent number: 11836219
Abstract: One embodiment provides a method, including: receiving a sample set for training a machine-learning model, wherein the sample set includes a plurality of classes, wherein classes within the plurality of classes have an imbalance in a number of samples; creating an enlarged minority class by generating new samples from the samples within the minority class and adding the new samples to the minority class; selecting subset samples from both the samples within the enlarged minority class and the majority class; weighting each of the subset samples based upon user input defining goals for attributes of a training sample set to be used in training the machine-learning model; and generating, using a neural network, the training sample set by re-running the selecting in view of the weighting.
Type: Grant
Filed: November 3, 2021
Date of Patent: December 5, 2023
Assignee: International Business Machines Corporation
Inventors: Ruhi Sharma Mittal, Lokesh Nagalapatti, Hima Patel, Nitin Gupta
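The minority-class enlargement and subset-selection steps can be sketched in Python. The abstract does not say how new samples are generated; this sketch swaps in SMOTE-style linear interpolation between two randomly chosen minority samples, and it leaves out the user-goal weighting and the neural-network generation step.

```python
import random

def enlarge_minority(minority, target_size, rng):
    # SMOTE-style oversampling (a swapped-in technique): each new sample lies on the
    # segment between two randomly chosen minority samples.
    enlarged = list(minority)
    while len(enlarged) < target_size:
        a, b = rng.sample(minority, 2)
        u = rng.random()
        enlarged.append(tuple(x + u * (y - x) for x, y in zip(a, b)))
    return enlarged

def select_subset(minority, majority, per_class, rng):
    # Draw the same number of samples from each class to obtain a balanced subset.
    return rng.sample(minority, per_class) + rng.sample(majority, per_class)

rng = random.Random(0)
minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
majority = [(5.0 + i, 5.0) for i in range(10)]
enlarged = enlarge_minority(minority, len(majority), rng)
subset = select_subset(enlarged, majority, per_class=5, rng=rng)
```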
-
Publication number: 20230274160
Abstract: Methods, systems, and computer program products for automatically detecting periods of normal activity by analyzing observability data in IT operations environments are provided herein. A computer-implemented method includes obtaining multiple types of data related to one or more artificial intelligence-related information technology operations; modelling at least a portion of the obtained data as time series data; automatically identifying, from the time series data, one or more time periods associated with one or more given levels of data activity; and performing one or more automated actions, in at least one artificial intelligence-related information technology operations environment, based at least in part on the data corresponding to the one or more identified time periods.
Type: Application
Filed: February 28, 2022
Publication date: August 31, 2023
Inventors: Shashank Mujumdar, Hima Patel, Sambaran Bandyopadhyay, Pooja Aggarwal, Anbang Xu, Hau-Wen Chang, Harshit Kumar, Katherine Guo, Rama Kalyani T. Akkiraju, Gargi B. Dasgupta
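One simple way to flag "normal" periods in a single metric stream is sketched below: split the series into windows, treat a window as normal when its mean lies within `k` population standard deviations of the mean of all window means, and merge consecutive normal windows. The windowing and the threshold rule are assumptions for illustration; the application's abstract does not specify them.

```python
from statistics import mean, pstdev

def normal_periods(series, window, k=1.0):
    # Summarise each non-overlapping window by its mean (a trailing partial window is ignored).
    means = [mean(series[i:i + window]) for i in range(0, len(series) - window + 1, window)]
    mu, sigma = mean(means), pstdev(means)
    normal = [abs(m - mu) <= k * sigma for m in means]
    periods, start = [], None
    # A trailing False acts as a sentinel that closes any open period.
    for idx, ok in enumerate(normal + [False]):
        if ok and start is None:
            start = idx
        elif not ok and start is not None:
            periods.append((start * window, idx * window))  # half-open index range
            start = None
    return periods

series = [10] * 8 + [100] * 4 + [10] * 8  # a burst of activity in the middle
periods = normal_periods(series, window=4)
```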
-
Publication number: 20230177113
Abstract: Methods, systems, and computer program products for privacy-preserving class label standardization in federated learning settings are provided herein. A computer-implemented method includes determining, using one or more data privacy-preserving techniques, a signature for each of one or more classes of data for each of multiple client devices within a federated learning environment; identifying one or more signature matches across at least a portion of the multiple client devices; generating one or more class labels for the one or more classes of data associated with the one or more signature matches; labeling, across the at least a portion of the multiple client devices, the one or more classes of data associated with the one or more signature matches with the one or more generated class labels; and performing one or more automated actions based at least in part on the one or more labeled classes of data.
Type: Application
Filed: December 2, 2021
Publication date: June 8, 2023
Inventors: Shonda Adena Witherspoon, Ramasuri Narayanam, Hima Patel, Sameep Mehta
-
Publication number: 20230169070
Abstract: A computer implemented method, computer system, and computer program product for transforming mapped data fields of enterprise applications. A set of processor units receives a matching from a source data field to a target data field. The set of processor units receives a number of annotated examples of transformations from a source format to a target format. Based on the annotated examples, the set of processor units autogenerates a query language expression for transforming data items from the source format to the target format.
Type: Application
Filed: November 29, 2021
Publication date: June 1, 2023
Inventors: Ramkumar Ramalingam, Nagarjuna Surabathina, Thanmayi Mruthyunjaya, Nitin Gupta, Pranay Kumar Lohia, Shanmukha Chaitanya Guttula, Hima Patel, Sameep Mehta, Matu Agarwal, Mudit Mehrotra
-
Publication number: 20230136125
Abstract: One embodiment provides a method, including: receiving a sample set for training a machine-learning model, wherein the sample set includes a plurality of classes, wherein classes within the plurality of classes have an imbalance in a number of samples; creating an enlarged minority class by generating new samples from the samples within the minority class and adding the new samples to the minority class; selecting subset samples from both the samples within the enlarged minority class and the majority class; weighting each of the subset samples based upon user input defining goals for attributes of a training sample set to be used in training the machine-learning model; and generating, using a neural network, the training sample set by re-running the selecting in view of the weighting.
Type: Application
Filed: November 3, 2021
Publication date: May 4, 2023
Inventors: Ruhi Sharma Mittal, Lokesh Nagalapatti, Hima Patel, Nitin Gupta
-
Publication number: 20230106490
Abstract: Methods, systems, and computer program products for automatically improving data annotations by processing annotation properties and user feedback are provided herein. A computer-implemented method includes obtaining data annotation pairs, each comprising an input data annotation in a first format and a corresponding output data annotation in a second format; determining, within at least a portion of the data annotation pairs, one or more non-diffs; identifying, across the at least a portion of data annotation pairs, data annotation properties associated with multiple intents by processing the non-diffs using property-related rules; modifying at least a portion of the data annotation pairs based on the identified data annotation properties; outputting the modified data annotation pairs to at least one user; and generating a final collection of data annotation pairs by processing at least a portion of the modified data annotation pairs and user feedback received in response to the outputting.
Type: Application
Filed: October 6, 2021
Publication date: April 6, 2023
Inventors: Shanmukha Chaitanya Guttula, Nitin Gupta, Pranay Kumar Lohia, Hima Patel
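Python's standard `difflib` module can illustrate what the "non-diffs" of an input/output annotation pair might look like. This sketch only extracts the shared spans and the complementary edits; the property-related rules and the user-feedback loop are not modeled, and `min_len` is a hypothetical parameter for ignoring very short matches.

```python
from difflib import SequenceMatcher

def non_diffs(inp, out, min_len=2):
    # In-order substrings shared by the two annotations, from difflib's matching blocks.
    sm = SequenceMatcher(None, inp, out, autojunk=False)
    return [inp[m.a:m.a + m.size] for m in sm.get_matching_blocks() if m.size >= min_len]

def diffs(inp, out):
    # The complementary edits: (operation, input span, output span) for every non-equal opcode.
    sm = SequenceMatcher(None, inp, out, autojunk=False)
    return [(tag, inp[i1:i2], out[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

shared = non_diffs("Acme Corp Ltd", "ACME Corp Ltd.")
edits = diffs("Acme Corp Ltd", "ACME Corp Ltd.")
```

A rule layer could then look for recurring patterns in the edits, for example upper-casing of a company name or an added trailing period.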
-
Patent number: 11580092
Abstract: A method for automatically detecting errors in at least one data entry in a database, the at least one data entry including an input string of characters that do not match at least one predefined string of characters. The method includes generating a first image map; generating at least one classification parameter by comparing the first image map to a second image map, the second image map based at least partially on the predefined string of characters; determining that the input string of characters correlates to the predefined string of characters; and modifying the at least one data entry to match the predefined string of characters in response to determining that the input string of characters correlates to the predefined string of characters. Various other methods and systems for automatically detecting errors in at least one data entry in a database are also disclosed.
Type: Grant
Filed: December 23, 2020
Date of Patent: February 14, 2023
Assignee: Visa International Service Association
Inventor: Hima Patel
-
Publication number: 20230021563
Abstract: Methods, systems, and computer program products for federated data standardization using data privacy techniques are provided herein. A computer-implemented method includes obtaining multiple datasets from multiple clients in accordance with one or more data privacy techniques; determining one or more similar data columns across at least a portion of the multiple datasets; generating one or more column labels for the one or more similar data columns; standardizing at least a portion of data within the one or more similar data columns by processing the one or more generated column labels using at least one federated learning technique; and performing one or more automated actions based at least in part on results of the standardizing of the at least a portion of data within the one or more similar data columns.
Type: Application
Filed: July 23, 2021
Publication date: January 26, 2023
Inventors: Ramasuri Narayanam, Hima Patel, Sameep Mehta
-
Publication number: 20220405631
Abstract: Techniques for qualitatively assessing unlabeled data in an unsupervised machine learning environment are disclosed. In one example, a method comprises the following steps. A dataset of unlabeled data points is converted into a graph structure. Nodes of the graph structure represent the unlabeled data points in the dataset and weighted edges between at least a portion of the nodes represent similarity between the unlabeled data points represented by the nodes. A metric is computed for each node of the graph structure. A value generated by the metric for a given node represents a measure of dissimilarity between the corresponding unlabeled data point of the given node and one or more other unlabeled data points of one or more other nodes. A subset of the dataset is generated by removing one or more unlabeled data points from the dataset based on one or more values of the computed metric.
Type: Application
Filed: June 22, 2021
Publication date: December 22, 2022
Inventors: Ramasuri Narayanam, Hima Patel, Lokesh Nagalapatti, Ruhi Sharma Mittal
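The graph-based scoring can be sketched in plain Python. The following choices are assumptions, not taken from the application: edge weights use a Gaussian kernel on Euclidean distance, a node's dissimilarity is one minus the mean weight of its `k` strongest edges, and points scoring above a fixed threshold are removed.

```python
import math

def similarity_graph(points, sigma=1.0):
    # Dense weighted adjacency matrix; weight = exp(-d^2 / (2 sigma^2)), no self-loops.
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return w

def dissimilarity(w, k):
    # Hypothetical metric: 1 - mean weight of each node's k strongest edges.
    scores = []
    for i, row in enumerate(w):
        nbrs = sorted((x for j, x in enumerate(row) if j != i), reverse=True)[:k]
        scores.append(1.0 - sum(nbrs) / len(nbrs))
    return scores

def prune(points, k=2, threshold=0.5, sigma=1.0):
    # Remove points whose dissimilarity score exceeds the threshold.
    scores = dissimilarity(similarity_graph(points, sigma), k)
    return [p for p, s in zip(points, scores) if s <= threshold]

pts = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (5.0, 5.0)]
kept = prune(pts)  # the isolated point (5.0, 5.0) is dropped
```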