Patents by Inventor Rajesh Bordawekar

Rajesh Bordawekar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240257164
    Abstract: Tokenized rows of a training portion of a database are selected, each of the selected tokenized rows having a first token value stored in a first column of the database. Training row vectors are grouped into clusters. From the clusters, prototypes are generated, each prototype comprising a numerical representation of a cluster. From input tokens, an input row vector is generated, the input row vector comprising a numerical representation of input tokens representing data in an input row of the database, the input row excluded from the training portion, each input token comprising a textual representation of data in a cell of the input row. Based on similarity with the input row vector, a prototype is selected. Data derived from the selected prototype is inserted into the first column of the input row.
    Type: Application
    Filed: January 31, 2023
    Publication date: August 1, 2024
    Applicant: International Business Machines Corporation
    Inventors: Matthew Harrison Tong, Apoorva Nitsure, Rajesh Bordawekar
  • Publication number: 20240220488
    Abstract: A count of unique values in a column of a database table is determined. A query on the database table is performed, wherein a technique for performing the query is selected based on the count of unique values.
    Type: Application
    Filed: December 30, 2022
    Publication date: July 4, 2024
    Inventors: Rajesh Bordawekar, Jose Luis Pontes Correia Neves, Apoorva Nitsure
  • Patent number: 12026462
    Abstract: Methods, systems and computer program products for determining recommended parameters for use in generating a word embedding model are provided. Aspects include storing a plurality of meaningful test cases. Each meaningful test case includes a test data profile and one or more test model parameters used to create a word embedding model that has been classified as yielding meaningful results. Aspects include receiving a production data set to be used in generating a new word embedding model. The production data set includes data stored in a relational database having a plurality of columns and a plurality of rows. Aspects include generating a data profile associated with the production data set. Aspects include generating a recommendation for one or more production model parameters for use in building a word embedding model based on the data profile associated with the production data set and the plurality of meaningful test cases.
    Type: Grant
    Filed: November 29, 2018
    Date of Patent: July 2, 2024
    Assignee: International Business Machines Corporation
    Inventors: Thomas Conti, Rajesh Bordawekar, Stephen Warren, Christopher Harding, Jose Neves
  • Publication number: 20240202214
    Abstract: Clustering data points of a relational database having special data types is performed by establishing logarithmic bins in which the data is collected. Special data types include (i) zero; (ii) positive and negative values; (iii) infinity (positive and negative); (iv) not-a-number values (NaNs); (v) out-of-range values; and (vi) IEEE DECFloat (decimal floating-point) values. The numerical data is mapped to bins according to their values and redistributed among the bins based on median bin value. An occupancy-based partitioning process assures each bin has no more than a pre-defined threshold percentage of the data. Assigning data bins to clusters facilitates prediction of placement of input values into a particular cluster for response to database queries.
    Type: Application
    Filed: December 19, 2022
    Publication date: June 20, 2024
    Inventor: Rajesh Bordawekar
  • Publication number: 20240126767
    Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to a process to interpret results of a semantic clustering Structured Query Language (SQL) Cognitive Intelligence (CI) query. A system can comprise a memory that stores computer executable components, and a processor that executes the computer executable components stored in the memory, wherein the computer executable components can comprise an interpretability component that can identify dominant traits of a query input to determine a ranking of query results by identifying influential tokens of the query input based on data statistics and observing the dominant traits in influential tokens of a query output. In one or more embodiments, the interpretability component can identify dominant traits of the query input by incorporating co-occurrence measurements.
    Type: Application
    Filed: October 12, 2022
    Publication date: April 18, 2024
    Inventors: Apoorva Nitsure, Rajesh Bordawekar
  • Patent number: 11948056
    Abstract: Data-parallel ensemble training using gradient boosted trees includes training an ensemble of trees. The training includes splitting a training dataset into several data portions. Each data portion is assigned to a respective thread group from a set of thread groups. The training further includes executing a stage, in which each thread group, in parallel, trains a respective ensemble of decision trees. Executing the stage includes performing, by each thread group, in parallel, machine learning operations for the respective ensemble of decision trees using the data portion assigned to that thread group. Further, each thread group validates, in parallel, the respective ensemble of decision trees using a data portion assigned to another thread group. Execution of the stage is repeated until a predetermined threshold is satisfied. Further, a prediction is inferenced using the ensemble of decision trees that is formed using the respective ensemble of trees from each of the thread groups.
    Type: Grant
    Filed: December 8, 2020
    Date of Patent: April 2, 2024
    Assignee: International Business Machines Corporation
    Inventors: Rajesh Bordawekar, Tin Kam Ho
  • Patent number: 11934401
    Abstract: Systems, computer-implemented methods or computer program products to facilitate receiving results of a semantic structured query language (SQL) query and employing sparse hash-table-based sketches to interpret the semantic SQL query result. A computing component stores a first space-efficient data structure sketch in a compressed serialized form. The computing component can load a second space-efficient data structure sketch along with the first space-efficient data structure sketch and can compute one or more interpretability scores by extracting co-occurrence information from the first space-efficient data structure sketch. The second space-efficient data structure sketch can include a sketch for containment check.
    Type: Grant
    Filed: August 4, 2022
    Date of Patent: March 19, 2024
    Assignee: International Business Machines Corporation
    Inventors: Rajesh Bordawekar, Prabhakar Kudva
  • Publication number: 20240045866
    Abstract: Systems, computer-implemented methods or computer program products to facilitate receiving results of a semantic structured query language (SQL) query and employing sparse hash-table-based sketches to interpret the semantic SQL query result. A computing component stores a first space-efficient data structure sketch in a compressed serialized form. The computing component can load a second space-efficient data structure sketch along with the first space-efficient data structure sketch and can compute one or more interpretability scores by extracting co-occurrence information from the first space-efficient data structure sketch. The second space-efficient data structure sketch can include a sketch for containment check.
    Type: Application
    Filed: August 4, 2022
    Publication date: February 8, 2024
    Inventors: Rajesh Bordawekar, Prabhakar Kudva
  • Patent number: 11847113
    Abstract: A system, apparatus, and method for training with multi-modal data in a relational database, including generating a first database including a multi-view of the multi-modal data, retrieving a second set of data from an external source via a network, and training a first model according to the first database and the second set of data. The first model outputs relationships of the first database with the multi-view and the second set of data.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: December 19, 2023
    Assignee: International Business Machines Corporation
    Inventors: Rajesh Bordawekar, Bortik Bandyopadhyay
  • Patent number: 11741099
    Abstract: A computer-implemented method of performing queries using Artificial Intelligence (AI) database embeddings includes the operations of generating a plurality of vector embeddings describing training data from a database for training a machine learning model. A test vector embedding is generated from the plurality of vector embeddings based on training data for unseen data from one or more rows of the database. One or more vectors from the plurality of vector embeddings describing the training data that are a closest match to the test vector embedding are identified. A task is determined based upon the unseen data. The determined task is performed using the trained machine learning model.
    Type: Grant
    Filed: February 28, 2021
    Date of Patent: August 29, 2023
    Assignee: International Business Machines Corporation
    Inventors: Rajesh Bordawekar, Apoorva Nitsure
  • Publication number: 20230185788
    Abstract: A computer-implemented method accelerates cognitive intelligence queries to a database using semantic keys. A computing device receives a database. The computing device inserts one or more key-value pairs into the database, where the key is a semantic key that is generated from a binary codeword built from a locality sensitive hashing of one or more vectors in a database embedding model of the database, and where the value is a tuple in the database that identifies entries in the database that share predefined features. The computing device uses the one or more key-value pairs for accelerating cognitive intelligence queries to the database.
    Type: Application
    Filed: December 9, 2021
    Publication date: June 15, 2023
    Inventor: Rajesh Bordawekar
  • Patent number: 11650987
    Abstract: From a first attribute-value pair in a record, new data is created including a first token. Using a first model and using a processor and a memory, each token is vectorized into new data including a corresponding vector. From the record, a target row is selected, wherein a target attribute-value pair in the target row includes a value for which a semantic similarity computation is to be performed. Using a similarity measure, a set of most similar rows to the target row is determined, wherein each row in the set of most similar rows to the target row has a corresponding similarity measure above a threshold similarity measure and wherein each row in the set of most similar rows includes the target attribute. The set of most similar rows is used to compute a response to a database query.
    Type: Grant
    Filed: January 2, 2019
    Date of Patent: May 16, 2023
    Assignee: International Business Machines Corporation
    Inventors: Rajesh Bordawekar, Jose Neves
  • Publication number: 20220277008
    Abstract: A computer-implemented method of performing queries using Artificial Intelligence (AI) database embeddings includes the operations of generating a plurality of vector embeddings describing training data from a database for training a machine learning model. A test vector embedding is generated from the plurality of vector embeddings based on training data for unseen data from one or more rows of the database. One or more vectors from the plurality of vector embeddings describing the training data that are a closest match to the test vector embedding are identified. A task is determined based upon the unseen data. The determined task is performed using the trained machine learning model.
    Type: Application
    Filed: February 28, 2021
    Publication date: September 1, 2022
    Inventors: Rajesh Bordawekar, Apoorva Nitsure
  • Patent number: 11429579
    Abstract: A computer-implemented method according to one embodiment includes identifying a relational database; determining columns of interest within the relational database; creating an unordered group of string tokens for each row of the relational database, utilizing the determined columns of interest; assigning weights for one or more columns within the relational database to one or more string tokens within each unordered group of string tokens to create a plurality of weighted unordered groups of string tokens; and determining a meaning vector for an identifier of each row of the relational database, utilizing the plurality of weighted unordered groups of string tokens.
    Type: Grant
    Filed: October 28, 2019
    Date of Patent: August 30, 2022
    Assignee: International Business Machines Corporation
    Inventor: Rajesh Bordawekar
  • Publication number: 20220269686
    Abstract: Systems, computer-implemented methods and/or computer program products to facilitate interpretation of a result of execution of a query over a structured database are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a determination component that determines a result of execution of a query over a structured database. The computer executable components also can comprise an interpretation component that interprets data underlying the result of execution of the query to determine one or more reasons that the result is provided in response to the query.
    Type: Application
    Filed: February 24, 2021
    Publication date: August 25, 2022
    Inventors: Rajesh Bordawekar, Apoorva Nitsure
  • Patent number: 11410031
    Abstract: Methods, systems and computer program products for updating a word embedding model are provided. Aspects include receiving a first data set comprising a relational database having a plurality of words. Aspects also include generating a word embedding model comprising a plurality of word vectors by training a neural network using unsupervised machine learning based on the first data set. Each word vector of the plurality of word vectors corresponds to a unique word of the plurality of words. Aspects also include storing the plurality of word vectors and a representation of a hidden layer of the neural network. Aspects also include receiving a second data set comprising data that has been added to the relational database. Aspects also include updating the word embedding model based on the second data set and the stored representation of the hidden layer of the neural network.
    Type: Grant
    Filed: November 29, 2018
    Date of Patent: August 9, 2022
    Assignee: International Business Machines Corporation
    Inventors: Thomas Conti, Stephen Warren, Rajesh Bordawekar, Jose Neves, Christopher Harding
  • Publication number: 20220180253
    Abstract: Data-parallel ensemble training using gradient boosted trees includes training an ensemble of trees. The training includes splitting a training dataset into several data portions. Each data portion is assigned to a respective thread group from a set of thread groups. The training further includes executing a stage, in which each thread group, in parallel, trains a respective ensemble of decision trees. Executing the stage includes performing, by each thread group, in parallel, machine learning operations for the respective ensemble of decision trees using the data portion assigned to that thread group. Further, each thread group validates, in parallel, the respective ensemble of decision trees using a data portion assigned to another thread group. Execution of the stage is repeated until a predetermined threshold is satisfied. Further, a prediction is inferenced using the ensemble of decision trees that is formed using the respective ensemble of trees from each of the thread groups.
    Type: Application
    Filed: December 8, 2020
    Publication date: June 9, 2022
    Inventors: Rajesh Bordawekar, Tin Kam Ho
  • Patent number: 11244224
    Abstract: A first observation window in a first time series is identified. The first observation window is preceded by a first portion of the first time series. A neural network is trained using the first portion of the first time series and the first observation window, and weights are extracted from the middle layers of the neural network. A first feature vector is generated based on the weights. A second observation window in a second time series is identified, where the second observation window is preceded by a first portion of the second time series. A second feature vector associated with the second observation window is determined. The second feature vector is based at least in part on the first set of weights. A similarity between the first and second observation windows is determined based on comparing the first feature vector and the second feature vector.
    Type: Grant
    Filed: March 20, 2018
    Date of Patent: February 8, 2022
    Assignee: International Business Machines Corporation
    Inventors: Rajesh Bordawekar, Tin Kam Ho
  • Patent number: 11182414
    Abstract: A computer-implemented method, cognitive intelligence system and computer program product adapt a relational database containing multiple data types. Non-text tokens in the relational database are converted to a textual form. Text is produced based on relations of tokens in the relational database. A set of word vectors is produced for the tokens based on the text. A cognitive intelligence query expressed as a structured query language (SQL) query may be applied to the relational database using the set of word vectors. The form of non-text tokens is one of a numeric value, an SQL type, an image, a video, a time series, latitude and longitude, or chemical structures. A single word embedding model may be applied over one or more tokens in the text. A plurality of sets of preliminary word vectors are computed by applying more than one embedding model over all tokens in the text. The preliminary word vector sets are merged to form the set of word vectors.
    Type: Grant
    Filed: March 20, 2017
    Date of Patent: November 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Bortik Bandyopadhyay, Rajesh Bordawekar, Tin Kam Ho
  • Patent number: 11176176
    Abstract: From a first attribute-value pair in a record, new data comprising a first token is created. From each token using a processor and a memory, new data including a corresponding vector is computed. From the record, a target row is selected, wherein a target attribute-value pair in the target row includes a value requiring correction. Using a similarity measure, a set of most similar rows to the target row is determined, wherein each row in the set of most similar rows to the target row has a corresponding similarity measure above a threshold similarity measure and wherein each row in the set of most similar rows includes the target attribute. From values corresponding to the target attribute in the set of most similar rows, a replacement value is determined. The value requiring correction in the target row is replaced with the replacement value.
    Type: Grant
    Filed: November 20, 2018
    Date of Patent: November 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Rajesh Bordawekar, Tin Kam Ho
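
The similarity-based value correction summarized in the abstract of patent 11176176 can be illustrated with a small Python sketch. It is not the patented implementation and makes several assumptions of its own: each attribute-value pair becomes a token such as "city=Boston"; a hypothetical hashed bag-of-tokens vector stands in for a trained embedding model; cosine similarity serves as the similarity measure; and the replacement value is chosen by majority vote among the rows whose similarity exceeds the threshold. The names row_tokens, embed, cosine, correct_value and the 64-dimension width are illustrative only.

```python
import hashlib
import math
from collections import Counter

DIM = 64  # hypothetical vector width for the hashed stand-in embedding


def row_tokens(row, exclude=None):
    """Turn each attribute-value pair into a token such as 'city=Boston'."""
    return [f"{col}={val}" for col, val in row.items() if col != exclude]


def embed(tokens):
    """Hypothetical stand-in for a learned embedding: hash each token into a
    fixed-width count vector, then L2-normalize it."""
    vec = [0.0] * DIM
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    # Both vectors are already normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def correct_value(rows, target_idx, target_col, threshold=0.5):
    """Replace rows[target_idx][target_col] with the most common value of
    target_col among other rows whose similarity exceeds the threshold.
    Returns the replacement, or None if no row is similar enough."""
    target_vec = embed(row_tokens(rows[target_idx], exclude=target_col))
    candidates = []
    for i, row in enumerate(rows):
        # Only rows that carry the target attribute can supply a value.
        if i == target_idx or target_col not in row:
            continue
        sim = cosine(target_vec, embed(row_tokens(row, exclude=target_col)))
        if sim > threshold:
            candidates.append(row[target_col])
    if not candidates:
        return None  # nothing similar enough; leave the value unchanged
    replacement = Counter(candidates).most_common(1)[0][0]
    rows[target_idx][target_col] = replacement
    return replacement


if __name__ == "__main__":
    rows = [
        {"city": "Boston", "state": "MA", "zip": "02101"},
        {"city": "Boston", "state": "MA", "zip": "02108"},
        {"city": "Boston", "state": "XX", "zip": "02110"},  # value to correct
        {"city": "Austin", "state": "TX", "zip": "73301"},
    ]
    print(correct_value(rows, 2, "state", threshold=0.4))  # prints MA
```

In this toy table the row with state "XX" shares the city=Boston token with the two other Boston rows, so their "MA" values form the majority even if a hash collision happens to pull the Austin row above the threshold. A real system would replace the hashed vectors with embeddings trained on the database text, as the abstract's vectorization step suggests.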