Patents by Inventor Sameep Mehta

Sameep Mehta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10936704
    Abstract: One embodiment provides a method, including: assigning a machine learning model signature to a machine learning model, wherein the machine learning model signature is generated using (i) data points and (ii) corresponding data labels from training data; receiving input comprising identification of a target machine learning model; acquiring a target signature for the target machine learning model by generating a signature for the target machine learning model using (i) data points from the assigned machine learning model signature and (ii) labels assigned to those data points by the target machine learning model; determining a stolen score by comparing the target signature to the machine learning model signature and identifying the number of data labels that match between the target signature and the machine learning model signature; and classifying the target machine learning model as stolen based upon the stolen score reaching a predetermined threshold.
    Type: Grant
    Filed: February 21, 2018
    Date of Patent: March 2, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sameep Mehta, Rakesh R. Pimplikar, Karthik Sankaranarayanan
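The comparison step in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the toy data, and the 0.9 threshold are hypothetical, and the score is normalized to a fraction of matching labels, whereas the abstract only speaks of the number of matches.

```python
from typing import Callable, Hashable, List, Tuple

Signature = List[Tuple[Hashable, Hashable]]  # (data point, label) pairs


def stolen_score(model_signature: Signature,
                 target_predict: Callable[[Hashable], Hashable]) -> float:
    """Fraction of signature points whose label the target model reproduces."""
    if not model_signature:
        raise ValueError("signature must contain at least one data point")
    matches = sum(1 for point, label in model_signature
                  if target_predict(point) == label)
    return matches / len(model_signature)


def is_stolen(model_signature: Signature,
              target_predict: Callable[[Hashable], Hashable],
              threshold: float = 0.9) -> bool:
    """Classify the target as stolen once the score reaches the threshold."""
    return stolen_score(model_signature, target_predict) >= threshold


# Hypothetical toy signature: integers labelled by parity.
signature = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even")]


def suspect_model(x: int) -> str:
    """Stand-in target model that agrees with every signature label."""
    return "odd" if x % 2 else "even"


def unrelated_model(x: int) -> str:
    """Stand-in target model that always predicts the same label."""
    return "odd"
```

With this toy data, `suspect_model` matches every label in the signature and `unrelated_model` matches half of them, so only the former reaches the assumed threshold.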
  • Publication number: 20210035014
    Abstract: Aspects of the present invention provide an approach for reducing bias in active learning. In an embodiment, a data point is selected from a training dataset for a current training iteration while monitoring for data bias at each addition of data to a virtual training dataset. In addition, a machine learning model is examined for bias after adding the selected data point to the virtual training dataset. When data bias and/or model bias is detected, the data point is considered for potential label modification. The selected data point is modified and, if the raw value of the modified data point is within a predefined tolerance and within a bin of a desired class, the modified data point having a label of the target class is retained. Otherwise, it can be discarded.
    Type: Application
    Filed: July 31, 2019
    Publication date: February 4, 2021
    Inventors: Kuntal Dey, Sameep Mehta, Manish Anand Bhide
  • Publication number: 20210034698
    Abstract: One embodiment provides a method, including: receiving, from a client, (i) a task of annotating information, (ii) a set of instructions for performing the task, and (iii) client annotations for a subset of the information within the task; assigning the subset to a plurality of annotators; obtaining (i) annotator annotations for the subset and (ii) a response time for providing the annotator annotation for each piece of information within the subset; identifying improvements to the set of instructions by (i) comparing the annotator annotations to the client annotations and (ii) identifying discrepancies made by the annotators in view of the response time; and generating a new set of instructions, wherein the generating comprises (i) identifying at least one feature of the information that distinguishes correctly annotated information from incorrectly annotated information and (ii) generating an instruction from the at least one feature.
    Type: Application
    Filed: July 31, 2019
    Publication date: February 4, 2021
    Inventors: Shashank Mujumdar, Nitin Gupta, Arvind Agarwal, Sameep Mehta
  • Publication number: 20210012404
    Abstract: Methods, systems, and computer program products for generating a framework for prioritizing machine learning model offerings via a platform are provided herein. A computer-implemented method includes processing, via a computing platform, a machine learning model input by a first user and metadata corresponding to the machine learning model input by the first user; automatically comparing, via the computing platform, the metadata corresponding to the machine learning model with metadata corresponding to one or more existing machine learning models stored by the computing platform; automatically calculating, via the computing platform, initial pricing information for the machine learning model based on the comparison; and outputting, via an interactive user interface of the computing platform, the machine learning model to one or more additional users for purchase in accordance with the calculated initial pricing information.
    Type: Application
    Filed: July 8, 2019
    Publication date: January 14, 2021
    Inventors: Kalapriya Kannan, Samiulla Zakir Hussain Shaikh, Pranay Kumar Lohia, Vijay Arya, Sameep Mehta
  • Patent number: 10885571
    Abstract: One embodiment provides a method, including: receiving, at a data service provider, a request from an information purchaser, wherein the request comprises (i) a budget identifying an amount of money to be spent on information and (ii) an objective function identifying a type of information that the information purchaser is requesting; accessing at least a subset of at least one information set of at least one information seller, wherein each of the at least one information sets comprises an information set available for purchase from the information seller; identifying whether at least one accessed information set fulfills the received request; and providing, if at least one accessed information set fulfills the received request, a recommendation of an information set for purchase by the information purchaser, wherein the provided recommendation comprises at least one of the identified information sets that fulfills the received request.
    Type: Grant
    Filed: May 16, 2018
    Date of Patent: January 5, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Akshar Kaul, Manish Kesarwani, Gagandeep Singh, Sameep Mehta
  • Publication number: 20200387518
    Abstract: One embodiment provides a method, including: receiving, from a user, a dataset for encryption before its storage at a data storage location, wherein the dataset comprises a plurality of portions; identifying (i) attributes of the dataset and (ii) dataset dependencies; generating a recommendation for an encryption scheme to be used for the dataset, wherein the generating comprises (i) generating, based upon the attributes and the dataset dependencies, a recommendation of an encryption scheme for each portion of the dataset and (ii) identifying, based upon the dataset dependencies, a key label for each portion of the dataset, wherein the key label identified for a portion of the dataset that is dependent on another portion of the dataset is the same as the key label identified for said another portion of the dataset; and providing, to the user, (i) the generated recommendation and (ii) a description identifying reasons for the generated recommendation.
    Type: Application
    Filed: June 6, 2019
    Publication date: December 10, 2020
    Inventors: Manish Kesarwani, Akshar Kaul, Gagandeep Singh, Sameep Mehta, Hong Min, James Willis Pickel
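The key-label rule in the abstract above (a dependent portion receives the same key label as the portion it depends on) can be sketched with a small union-find grouping. This is an illustrative assumption about one way to satisfy that rule, not the method claimed in the application; the portion names and the `key-N` label format are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def assign_key_labels(portions: List[str],
                      dependencies: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Give every group of mutually dependent portions one shared key label.

    `dependencies` holds (dependent, depended_on) pairs; both names must
    appear in `portions`.
    """
    parent = {p: p for p in portions}

    def find(p: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for dependent, depended_on in dependencies:
        parent[find(dependent)] = find(depended_on)

    group_label: Dict[str, str] = {}
    labels: Dict[str, str] = {}
    for p in portions:
        root = find(p)
        if root not in group_label:
            group_label[root] = f"key-{len(group_label) + 1}"
        labels[p] = group_label[root]
    return labels


# Hypothetical dataset portions: one column depends on another.
portions = ["orders.customer_id", "customers.id", "customers.email", "products.sku"]
dependencies = [("orders.customer_id", "customers.id")]
labels = assign_key_labels(portions, dependencies)
```

In this example the two linked columns end up under one label while the unrelated portions each get their own, which is the property the abstract requires of dependent portions.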
  • Publication number: 20200380367
    Abstract: A method, computer system, and a computer program product for generating deep learning model insights using provenance data is provided. Embodiments of the present invention may include collecting provenance data. Embodiments of the present invention may include generating model insights based on the collected provenance data. Embodiments of the present invention may include generating a training model based on the generated model insights. Embodiments of the present invention may include reducing the training model size. Embodiments of the present invention may include creating a final trained model.
    Type: Application
    Filed: June 3, 2019
    Publication date: December 3, 2020
    Inventors: Nitin Gupta, Himanshu Gupta, Rajmohan Chandrahasan, Sameep Mehta, Pranay Kumar Lohia
  • Publication number: 20200372056
    Abstract: A processor may receive a record. The record may include one or more segments of text. The processor may tag each segment of text with an indicator. The indicator may denote a specific instance of bias in each of a respective segment of text. The processor may automatically generate a summary of the record. The summary of the record may include a set of segments of text. The set of segments of text may have a different overall bias than the record. The processor may display the summary of the record to a user.
    Type: Application
    Filed: May 23, 2019
    Publication date: November 26, 2020
    Inventors: Manish Anand Bhide, Kuntal Dey, Nishtha Madaan, Seema Nagar, Sameep Mehta
  • Publication number: 20200372101
    Abstract: A processor may receive a record. The record may include one or more segments of text. The processor may automatically generate a first summary of the record. The processor may determine an overall bias of the first summary. The overall bias of the first summary may be identified from one or more instances of bias in the first summary. The processor may generate a second summary of the record. The second summary of the record may include an indicator of the overall bias of the first summary. The indicator may include a description of a type of overall bias of the first summary and a numerical value of the overall bias of the first summary. The processor may determine an overall bias of the second summary. The processor may display the second summary of the record to a user.
    Type: Application
    Filed: May 23, 2019
    Publication date: November 26, 2020
    Inventors: Manish Anand Bhide, Kuntal Dey, Nishtha Madaan, Seema Nagar, Sameep Mehta
  • Publication number: 20200372398
    Abstract: A method, computer system, and a computer program product for utilizing provenance data to improve machine learning is provided. Embodiments of the present invention may include collecting provenance data. Embodiments of the present invention may include identifying model quality improvements based on the collected provenance data. Embodiments of the present invention may include identifying related models based on the collected provenance data. Embodiments of the present invention may include recommending model quality improvements to a user.
    Type: Application
    Filed: May 22, 2019
    Publication date: November 26, 2020
    Inventors: Samiulla Zakir Hussain Shaikh, Himanshu Gupta, Rajmohan Chandrahasan, Sameep Mehta, Manish Anand Bhide
  • Publication number: 20200364235
    Abstract: One embodiment provides a method, including: receiving, from a user, (i) a dataset and (ii) an intended output from the dataset that is generated in view of a given analytical framework for the dataset, wherein the intended output identifies an output that the user wants from the dataset and wherein the dataset is related to an analytical domain; identifying a plurality of dataset functions related to the analytical domain; determining one or more dataset functions for each of one or more operations identified, wherein the one or more operations are identified using the repository to identify operations used to result in an intended output similar to the received intended output; and recommending an ordered subset of the one or more dataset functions to be used to transform the dataset to the intended output, wherein the ordered subset comprises (i) one dataset function for each of the one or more operations and (ii) an order for performing the one or more operations.
    Type: Application
    Filed: May 14, 2019
    Publication date: November 19, 2020
    Inventors: Kalapriya Kannan, Sameep Mehta
  • Patent number: 10824755
    Abstract: One embodiment provides a method, including: receiving, at a third-party storage provider and from a data owner, a plurality of encrypted documents, wherein each of the plurality of encrypted documents is encrypted by the data owner using at least one encryption key; receiving, from a query user, an encrypted query, wherein the query is encrypted using the at least one encryption key; computing an edit distance value between the encrypted query and at least a portion of the plurality of encrypted documents, wherein the computing comprises communicating with an entity to work together to compute the edit distance value; the communicating comprising (i) providing, from the third-party storage provider to the entity, an encrypted function of an edit distance matrix and (ii) receiving an encrypted edit distance value computed by the entity from the encrypted function; and returning the encrypted edit distance value to the query user.
    Type: Grant
    Filed: October 11, 2018
    Date of Patent: November 3, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Akshar Kaul, Sameep Mehta, Shashank Srivastava
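For readers unfamiliar with the edit distance matrix mentioned in the abstract above, the sketch below computes it on plaintext strings. It assumes the standard Levenshtein definition (unit-cost insert, delete, substitute) and deliberately omits the encrypted, two-party protocol that the patent describes; the function names are hypothetical.

```python
from typing import List


def edit_distance_matrix(query: str, document: str) -> List[List[int]]:
    """Build the dynamic-programming matrix for Levenshtein distance.

    Cell [i][j] holds the distance between query[:i] and document[:j].
    """
    rows, cols = len(query) + 1, len(document) + 1
    matrix = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        matrix[i][0] = i  # delete all i characters of the query prefix
    for j in range(cols):
        matrix[0][j] = j  # insert all j characters of the document prefix
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if query[i - 1] == document[j - 1] else 1
            matrix[i][j] = min(
                matrix[i - 1][j] + 1,                  # deletion
                matrix[i][j - 1] + 1,                  # insertion
                matrix[i - 1][j - 1] + substitution,   # match or substitution
            )
    return matrix


def edit_distance(query: str, document: str) -> int:
    """Return the bottom-right cell, i.e. the distance between the full strings."""
    return edit_distance_matrix(query, document)[-1][-1]
```

In the patented setting the storage provider would operate on an encrypted function of this matrix rather than on the plaintext values shown here.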
  • Patent number: 10824721
    Abstract: One embodiment provides a method for delaying malicious attacks on machine learning models that are trained using input captured from a plurality of users, including: deploying a model, said model designed to be used with an application, for responding to requests received from users, wherein the model comprises a machine learning model that has been previously trained using a data set; receiving input from one or more users; determining, using a malicious input detection technique, if the received input comprises malicious input; if the received input comprises malicious input, removing the malicious input from the input to be used to retrain the model; retraining the model using received input that is determined to not be malicious input; and providing, using the retrained model, a response to a received user query, the retrained model delaying the effect of malicious input on provided responses by removing malicious input from retraining input.
    Type: Grant
    Filed: May 22, 2018
    Date of Patent: November 3, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Manish Kesarwani, Atul Kumar, Vijay Arya, Rakesh R. Pimplikar, Sameep Mehta
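The filtering step in the abstract above, where malicious input is removed before the model is retrained, might look like the following sketch. The detector is a hypothetical blocklist check standing in for whatever malicious input detection technique is actually used, and the retraining call itself is left out.

```python
from typing import Callable, Iterable, List, Tuple


def split_retraining_input(inputs: Iterable[str],
                           is_malicious: Callable[[str], bool]
                           ) -> Tuple[List[str], List[str]]:
    """Separate user input into (clean, rejected) before retraining."""
    clean: List[str] = []
    rejected: List[str] = []
    for item in inputs:
        (rejected if is_malicious(item) else clean).append(item)
    return clean, rejected


# Hypothetical detector: flags input containing any blocked term.
BLOCKED_TERMS = {"spamword", "slur"}


def contains_blocked_term(text: str) -> bool:
    words = text.lower().split()
    return any(term in words for term in BLOCKED_TERMS)


user_inputs = ["hello there", "buy SPAMWORD now", "what is the weather"]
clean_inputs, rejected_inputs = split_retraining_input(user_inputs,
                                                       contains_blocked_term)
# Only clean_inputs would be passed on to the retraining step.
```

Keeping the detector as a separate callable means a stronger detection technique can be swapped in without touching the filtering logic.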
  • Publication number: 20200327424
    Abstract: One embodiment provides a method, including: receiving a target unstructured document for determining whether the target unstructured document comprises biased information; identifying an objective of the target unstructured document by extracting, from the target unstructured document, (i) entities and (ii) relationships between the entities; creating a structured knowledge base, wherein the creating comprises (i) creating an entry in the structured knowledge base corresponding to the target unstructured document, (ii) identifying other unstructured documents having a similarity to the target unstructured document, and (iii) generating an entry in the structured knowledge base corresponding to each of the other unstructured documents; applying a bias detection technique on the structured knowledge base; and providing an indication of whether the target unstructured document comprises bias.
    Type: Application
    Filed: April 15, 2019
    Publication date: October 15, 2020
    Inventors: Pranay Kumar Lohia, Rajmohan Chandrahasan, Himanshu Gupta, Samiulla Zakir Hussain Shaikh, Sameep Mehta, Atul Kumar
  • Publication number: 20200302005
    Abstract: An article is automatically augmented. The article and one or more comments are received. Comment elements are extracted from the one or more comments, and article elements are extracted from the article. Alignment scores are generated for comment-article pairs based on the extracted comment and article elements. Further, it is determined that at least one comment-article pair has an alignment score at or above a threshold alignment score. At least one augmentation feature is then generated.
    Type: Application
    Filed: March 22, 2019
    Publication date: September 24, 2020
    Inventors: Manish Anand Bhide, Nishtha Madaan, Seema Nagar, Sameep Mehta, Kuntal Dey
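The alignment-score threshold in the abstract above can be illustrated with a minimal sketch. It assumes that the extracted "elements" are lowercase word tokens and that alignment is measured by Jaccard overlap; both are illustrative choices, since the abstract does not specify either, and the 0.2 threshold is hypothetical.

```python
from typing import List, Set

PUNCTUATION = ".,!?;:"


def tokenize(text: str) -> Set[str]:
    """Assumed element extraction: lowercase words with edge punctuation removed."""
    return {w.strip(PUNCTUATION).lower() for w in text.split()
            if w.strip(PUNCTUATION)}


def alignment_score(comment: str, article: str) -> float:
    """Jaccard overlap between comment elements and article elements."""
    c, a = tokenize(comment), tokenize(article)
    if not c or not a:
        return 0.0
    return len(c & a) / len(c | a)


def comments_to_augment(article: str, comments: List[str],
                        threshold: float = 0.2) -> List[str]:
    """Keep comments whose score is at or above the threshold."""
    return [c for c in comments if alignment_score(c, article) >= threshold]


article = "The city council approved the new bike lanes"
comments = ["The bike lanes are a great idea", "First!"]
selected = comments_to_augment(article, comments)
```

Here the first comment shares three of eleven distinct words with the article and passes the assumed threshold, while the second shares none; generating the augmentation feature from the selected comments is outside this sketch.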
  • Publication number: 20200302006
    Abstract: An article is automatically augmented. The article and one or more comments are received. Comment elements are extracted from the one or more comments, and article elements are extracted from the article. Alignment scores are generated for comment-article pairs based on the extracted comment and article elements. Further, it is determined that at least one comment-article pair has an alignment score at or above a threshold alignment score. At least one augmentation feature is then generated.
    Type: Application
    Filed: July 15, 2019
    Publication date: September 24, 2020
    Inventors: Manish Anand Bhide, Nishtha Madaan, Seema Nagar, Sameep Mehta, Kuntal Dey
  • Patent number: 10783161
    Abstract: A method includes determining, by a controller, a portion of data that is selected by a user. The portion of data includes source data that is to be transformed by at least one shaping function. The method also includes generating, by the controller, a first output recommendation data that communicates at least one recommended shaping function to apply to the portion of data. The first output recommendation data is generated based on patterns of shaping functions that have been previously chosen. The patterns of shaping functions that have been previously chosen can be chosen by a plurality of system users. The method also includes determining whether to apply the at least one recommended shaping function to the portion of data. The method also includes applying the at least one recommended shaping function based on the determining.
    Type: Grant
    Filed: December 15, 2017
    Date of Patent: September 22, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Manish Bhide, Shabharesh Gudla, Sameep Mehta, Prishni Rateria, Samiulla Shaikh, Neelesh K. Shukla, Paul S. Taylor
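The recommendation step in the abstract above draws on shaping functions that users have chosen before. A minimal frequency-based sketch follows; it assumes the history is a list of (column type, function name) pairs and that "patterns" reduce to simple counts, which is a simplification not stated in the patent. All names and data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def recommend_shaping_functions(history: Iterable[Tuple[str, str]],
                                column_type: str,
                                top_n: int = 3) -> List[str]:
    """Return the shaping functions most often chosen for this column type."""
    counts = Counter(fn for ctype, fn in history if ctype == column_type)
    return [fn for fn, _ in counts.most_common(top_n)]


# Hypothetical history of choices made by several users.
history = [
    ("string", "trim"),
    ("string", "trim"),
    ("string", "lowercase"),
    ("date", "parse_date"),
    ("string", "trim"),
    ("string", "lowercase"),
    ("string", "split"),
]
top_for_strings = recommend_shaping_functions(history, "string", top_n=2)
```

Whether a recommended function is then applied remains a separate decision, as the abstract describes.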
  • Patent number: 10740209
    Abstract: Methods, systems, and computer program products for tracking missing data using provenance traces and data simulation are provided herein. A computer-implemented method includes generating, for each of multiple stages in a data curation sequence, a machine learning model of the data curation sequence, wherein the model is based on historical input records within the data curation sequence, historical output records within the data curation sequence, and provenance data within the data curation sequence; creating a simulated output record based on a detected anomaly corresponding to the data curation sequence; predicting the content of absent input records that precede the simulated output record in the data curation sequence and provenance data corresponding to the simulated output record; and outputting, to a user, in response to a query pertaining to the detected anomaly, the predicted input records and information relating the predicted input records to the detected anomaly.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: August 11, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Salil Joshi, Hima Prasad Karanam, Manish Kesarwani, Sameep Mehta
  • Patent number: 10742401
    Abstract: One embodiment provides a method, including: receiving, from a data owner, an input string of plaintext data comprising a plurality of characters for storage in a database of a third-party storage provider; arranging the plurality of characters of the input string as a half pyramid, wherein the half pyramid comprises a plurality of rows, each row comprising at least one more character than a preceding row; encrypting, using a secure encryption scheme and based upon a key, each row of the half pyramid independently from each other row of the half pyramid; and storing, in the database of the third-party storage provider, the encrypted rows of the half pyramid. Other aspects are claimed and described.
    Type: Grant
    Filed: December 19, 2017
    Date of Patent: August 11, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Akshar Kaul, Manish Kesarwani, Sameep Mehta, Prasad G. Naldurg, Gagandeep Singh
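The half-pyramid arrangement in the abstract above can be read in more than one way. The sketch below assumes one possible reading, in which row k holds the first k characters of the input, so each row is one character longer than the row before it. The abstract does not confirm this layout. The encryption scheme is left as a caller-supplied function, and the stand-in used in the example is not encryption at all.

```python
from typing import Callable, List


def half_pyramid_rows(text: str) -> List[str]:
    """Assumed layout: row k is the prefix of length k."""
    return [text[:k] for k in range(1, len(text) + 1)]


def encrypt_rows(rows: List[str],
                 encrypt_row: Callable[[str], str]) -> List[str]:
    """Encrypt each row independently with the supplied function."""
    return [encrypt_row(row) for row in rows]


def placeholder_encrypt(row: str) -> str:
    """Hypothetical stand-in that only tags the row; NOT a secure scheme."""
    return f"enc[{row}]"


rows = half_pyramid_rows("data")
stored = encrypt_rows(rows, placeholder_encrypt)
```

In a real deployment `placeholder_encrypt` would be replaced by a keyed, secure encryption scheme, and `stored` would be what is written to the third-party database.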
  • Publication number: 20200250264
    Abstract: Embodiments provide a computer implemented method in a data processing system comprising a processor and a memory comprising instructions, which are executed by the processor to cause the processor to implement the method of removing a cognitive terminology from a news article at a news portal, the method including: receiving, by the processor, a first news article from a user; configuring, by the processor, a cognitive terminology filter list to add one or more entities and one or more cognitive terminology types associated with each entity in the cognitive terminology filter list; dividing, by the processor, the first news article into a plurality of text segments; identifying, by the processor, one or more key entities and one or more inter-entity relationships of each text segment; detecting, by the processor, one or more cognitive terminologies in the first news article; and providing, by the processor, one or more suggestions to remove the one or more cognitive terminologies.
    Type: Application
    Filed: January 31, 2019
    Publication date: August 6, 2020
    Inventors: Manish A. Bhide, Sameep Mehta, Nishtha Madaan, Kuntal Dey