Patents by Inventor Rajesh Bordawekar
Rajesh Bordawekar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240257164
Abstract: Tokenized rows of a training portion of a database are selected, each of the selected tokenized rows having a first token value stored in a first column of the database. Training row vectors are grouped into clusters. From the clusters, prototypes are generated, each prototype comprising a numerical representation of a cluster. From input tokens, an input row vector is generated, the input row vector comprising a numerical representation of input tokens representing data in an input row of the database, the input row excluded from the training portion, each input token comprising a textual representation of data in a cell of the input row. Based on similarity with the input row vector, a prototype is selected. Data derived from the selected prototype is inserted into the first column of the input row.
Type: Application
Filed: January 31, 2023
Publication date: August 1, 2024
Applicant: International Business Machines Corporation
Inventors: Matthew Harrison Tong, Apoorva Nitsure, Rajesh Bordawekar
-
Publication number: 20240220488
Abstract: A count of unique values in a column of a database table is determined. A query on the database table is performed, wherein a technique for performing the query is selected based on the count of unique values.
Type: Application
Filed: December 30, 2022
Publication date: July 4, 2024
Inventors: Rajesh Bordawekar, Jose Luis Pontes Correia Neves, Apoorva Nitsure
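The idea in this abstract can be illustrated with a minimal sketch. It is not the method claimed in the application: the two techniques (`hash_index` and `full_scan`), the `threshold` parameter, and all function names are hypothetical, chosen only to show a query strategy being picked from a column's unique-value count.

```python
from collections import defaultdict


def count_unique(rows, column):
    """Count the distinct values stored in one column."""
    return len({row[column] for row in rows})


def choose_technique(unique_count, threshold=1000):
    """Pick a query technique from the unique-value count (hypothetical rule)."""
    return "hash_index" if unique_count <= threshold else "full_scan"


def run_equality_query(rows, column, value, threshold=1000):
    """Return (technique, matching rows) for `column == value`."""
    technique = choose_technique(count_unique(rows, column), threshold)
    if technique == "hash_index":
        # Few distinct values: group rows by value, then look the value up.
        index = defaultdict(list)
        for row in rows:
            index[row[column]].append(row)
        return technique, index.get(value, [])
    # Many distinct values: fall back to a plain scan.
    return technique, [row for row in rows if row[column] == value]
```

Both branches return the same rows; only the access path differs, which is the point of selecting the technique from the count.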
-
Patent number: 12026462
Abstract: Methods, systems and computer program products for determining recommended parameters for use in generating a word embedding model are provided. Aspects include storing a plurality of meaningful test cases. Each meaningful test case includes a test data profile and one or more test model parameters used to create a word embedding model that has been classified as yielding meaningful results. Aspects include receiving a production data set to be used in generating a new word embedding model. The production data set includes data stored in a relational database having a plurality of columns and a plurality of rows. Aspects include generating a data profile associated with the production data set. Aspects include generating a recommendation for one or more production model parameters for use in building a word embedding model based on the data profile associated with the production data set and the plurality of meaningful test cases.
Type: Grant
Filed: November 29, 2018
Date of Patent: July 2, 2024
Assignee: International Business Machines Corporation
Inventors: Thomas Conti, Rajesh Bordawekar, Stephen Warren, Christopher Harding, Jose Neves
-
Publication number: 20240202214
Abstract: Clustering data points of a relational database having special data types is performed by establishing logarithmic bins in which the data is collected. Special data types include (i) zero; (ii) positive and negative values; (iii) infinity (positive and negative); (iv) not-a-number values (NaNs); (v) out-of-range values; and (vi) IEEE DECFloat (decimal floating-point) values. The numerical data is mapped to bins according to their values and redistributed among the bins based on median bin value. An occupancy-based partitioning process assures each bin has no more than a pre-defined threshold percentage of the data. Assigning data bins to clusters facilitates prediction of placement of input values into a particular cluster for response to database queries.
Type: Application
Filed: December 19, 2022
Publication date: June 20, 2024
Inventor: Rajesh Bordawekar
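As a rough illustration of the first step only (mapping values to logarithmic bins while giving special values their own bins), the sketch below may help. The bin labels, the choice of base 10, and the function name `bin_label` are assumptions made for this example; median-based redistribution, occupancy-based partitioning, out-of-range handling, and DECFloat support are not covered.

```python
import math


def bin_label(value):
    """Map a number to a base-10 logarithmic bin label (hypothetical scheme)."""
    if math.isnan(value):
        return "nan"            # not-a-number values get a dedicated bin
    if value == math.inf:
        return "+inf"
    if value == -math.inf:
        return "-inf"
    if value == 0:
        return "zero"           # log(0) is undefined, so zero is its own bin
    sign = "pos" if value > 0 else "neg"
    # Magnitudes in [10^k, 10^(k+1)) share the bin with exponent k.
    exponent = math.floor(math.log10(abs(value)))
    return f"{sign}:{exponent}"


def bin_values(values):
    """Group values by their bin label."""
    bins = {}
    for v in values:
        bins.setdefault(bin_label(v), []).append(v)
    return bins
```

Keeping the sign in the label lets negative and positive values of the same magnitude land in separate bins, matching the abstract's distinction between them.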
-
Publication number: 20240126767
Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to a process to interpret results of a semantic clustering Structured Query Language (SQL) Cognitive Intelligence (CI) query. A system can comprise a memory that stores computer executable components, and a processor that executes the computer executable components stored in the memory, wherein the computer executable components can comprise an interpretability component that can identify dominant traits of a query input to determine a ranking of query results by identifying influential tokens of the query input based on data statistics and observing the dominant traits in influential tokens of a query output. In one or more embodiments, the interpretability component can identify dominant traits of the query input by incorporating co-occurrence measurements.
Type: Application
Filed: October 12, 2022
Publication date: April 18, 2024
Inventors: Apoorva Nitsure, Rajesh Bordawekar
-
Patent number: 11948056
Abstract: Data-parallel ensemble training using gradient boosted trees includes training an ensemble of trees. The training includes splitting a training dataset into several data portions. Each data portion is assigned to each thread group from a set of thread groups. The training further includes executing a stage, in which each thread group, in parallel, trains a respective ensemble of decision trees. Executing the stage includes performing, by each thread group, in parallel, machine learning operations for the respective ensemble of decision trees using the data portion assigned to each thread group. Further, each thread group validates, in parallel, the respective ensemble of decision trees using a data portion assigned to another thread group. Execution of the stage is repeated until a predetermined threshold is satisfied. Further, a prediction is inferenced using the ensemble of decision trees that is formed using the respective ensemble of trees from each of the thread groups.
Type: Grant
Filed: December 8, 2020
Date of Patent: April 2, 2024
Assignee: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Tin Kam Ho
-
Patent number: 11934401
Abstract: Systems, computer-implemented methods or computer program products to facilitate receiving results of a semantic structured query language (SQL) query and employing sparse hash-table based sketches to interpret a semantic structured query language (SQL) query result. A computing component stores a first space-efficient structure sketch in a compressed serialize form. The computing component can load a second space-efficient data structure sketch along with the first space-efficient data structure sketch and can compute one or more interpretability scores by extracting co-occurrence information from the first space-efficient data structure sketch. The second space-efficient data structure sketch can include a sketch for containment check.
Type: Grant
Filed: August 4, 2022
Date of Patent: March 19, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Rajesh Bordawekar, Prabhakar Kudva
-
Publication number: 20240045866
Abstract: Systems, computer-implemented methods or computer program products to facilitate receiving results of a semantic structured query language (SQL) query and employing sparse hash-table based sketches to interpret a semantic structured query language (SQL) query result. A computing component stores a first space-efficient structure sketch in a compressed serialize form. The computing component can load a second space-efficient data structure sketch along with the first space-efficient data structure sketch and can compute one or more interpretability scores by extracting co-occurrence information from the first space-efficient data structure sketch. The second space-efficient data structure sketch can include a sketch for containment check.
Type: Application
Filed: August 4, 2022
Publication date: February 8, 2024
Inventors: Rajesh Bordawekar, Prabhakar Kudva
-
Patent number: 11847113
Abstract: A system, apparatus, and a method for training with multi-modal data in a relational database, including generating a first database including a multi-view of the multi-modal data, retrieving a second set of data from an external source via a network, and training a first model according the first database and the second set of data. The first model outputs relationships of the first database with the multi-view and the second set of data.
Type: Grant
Filed: June 21, 2021
Date of Patent: December 19, 2023
Assignee: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Bortik Bandyopadhyay
-
Patent number: 11741099
Abstract: A computer-implemented method of performing queries using Artificial Intelligence (AI) database embeddings includes the operations of generating a plurality of vector embeddings describing a training data from a database for training a machine learning model. A test vector embedding is generated from the plurality of vector embeddings based on training data for unseen data from one or more rows of the database. One or more vectors from the plurality of vector embeddings describing the training data that are a closest match to the test vector embedding are identified. A task is determined based upon the unseen data. The determined task is performed using the trained machine learning model.
Type: Grant
Filed: February 28, 2021
Date of Patent: August 29, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Rajesh Bordawekar, Apoorva Nitsure
-
Publication number: 20230185788
Abstract: A computer-implemented method accelerates cognitive intelligence queries to a database using semantic keys. A computing device receives a database. The computing device inserts one or more key-value pairs into the database, where the key is a semantic key that is generated from a binary codeword built from a locality sensitive hashing of one or more vectors in a database embedding model of the database, and where the value is a tuple in the database that identifies entries in the database that share predefined features. The computing device uses the one or more key-value pairs for accelerating cognitive intelligence queries to the database.
Type: Application
Filed: December 9, 2021
Publication date: June 15, 2023
Inventor: RAJESH BORDAWEKAR
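To make the notion of a binary codeword built from locality sensitive hashing more concrete, here is a minimal sketch using random-hyperplane (sign) hashing, one common LSH family. The abstract does not say which LSH scheme is used, so this choice is an assumption, as are the names `make_hyperplanes`, `semantic_key`, and `build_key_index` and the use of an in-memory dictionary in place of database key-value pairs.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random hyperplane (normal vector) per codeword bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def semantic_key(vector, hyperplanes):
    """Build a binary codeword: one bit per hyperplane, set by the side the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def build_key_index(embeddings, hyperplanes):
    """Map each codeword to the row identifiers whose vectors hash to it."""
    index = defaultdict(list)
    for row_id, vector in embeddings.items():
        index[semantic_key(vector, hyperplanes)].append(row_id)
    return index
```

Vectors pointing in similar directions tend to share a codeword, so a lookup by key narrows a similarity query to one bucket instead of scanning every embedding.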
-
Patent number: 11650987
Abstract: From a first attribute-value pair in a record, new data is created including a first token. Using a first model and using a processor and a memory, each token is vectorized into new data including a corresponding vector. From the record, a target row is selected, wherein a target attribute-value pair in the target row includes a value for which a semantic similarity computation is to be performed. Using a similarity measure, a set of most similar rows to the target row is determined, wherein each row in the set of most similar rows to the target row has a corresponding similarity measure above a threshold similarity measure and wherein each row in the set of most similar rows includes the target attribute. The set of most similar rows is used to compute a response to a database query.
Type: Grant
Filed: January 2, 2019
Date of Patent: May 16, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Rajesh Bordawekar, Jose Neves
-
Publication number: 20220277008
Abstract: A computer-implemented method of performing queries using Artificial Intelligence (AI) database embeddings includes the operations of generating a plurality of vector embeddings describing a training data from a database for training a machine learning model. A test vector embedding is generated from the plurality of vector embeddings based on training data for unseen data from one or more rows of the database. One or more vectors from the plurality of vector embeddings describing the training data that are a closest match to the test vector embedding are identified. A task is determined based upon the unseen data. The determined task is performed using the trained machine learning model.
Type: Application
Filed: February 28, 2021
Publication date: September 1, 2022
Inventors: Rajesh Bordawekar, Apoorva Nitsure
-
Patent number: 11429579
Abstract: A computer-implemented method according to one embodiment includes identifying a relational database; determining columns of interest within the relational database; creating an unordered group of string tokens for each row of the relational database, utilizing the determined columns of interest; assigning weights for one or more columns within the relational database to one or more string tokens within each unordered group of string tokens to create a plurality of weighted unordered groups of string tokens; and determining a meaning vector for an identifier of each row of the relational database, utilizing the plurality of weighted unordered groups of string tokens.
Type: Grant
Filed: October 28, 2019
Date of Patent: August 30, 2022
Assignee: International Business Machines Corporation
Inventor: Rajesh Bordawekar
-
Publication number: 20220269686
Abstract: Systems, computer-implemented methods and/or computer program products to facilitate interpretation of a result of execution of a query over a structured database are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a determination component that determines a result of execution of a query over a structured database. The computer executable components also can comprise an interpretation component that interprets data underlying the result of execution of the query to determine one or more reasons that the result is provided in response to the query.
Type: Application
Filed: February 24, 2021
Publication date: August 25, 2022
Inventors: Rajesh Bordawekar, Apoorva Nitsure
-
Patent number: 11410031
Abstract: Methods, systems and computer program products for updating a word embedding model are provided. Aspects include receiving a first data set comprising a relational database having a plurality of words. Aspects also include generating a word embedding model comprising a plurality of word vectors by training a neural network using unsupervised machine learning based on the first data set. Each word vector of the plurality of word vector corresponds to a unique word of the plurality of words. Aspects also include storing the plurality of word vectors and a representation of a hidden layer of the neural network. Aspects also include receiving a second data set comprising data that has been added to the relational database. Aspects also include updating the word embedding model based on the second data set and the stored representation of the hidden layer of the neural network.
Type: Grant
Filed: November 29, 2018
Date of Patent: August 9, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Thomas Conti, Stephen Warren, Rajesh Bordawekar, Jose Neves, Christopher Harding
-
Publication number: 20220180253
Abstract: Data-parallel ensemble training using gradient boosted trees includes training an ensemble of trees. The training includes splitting a training dataset into several data portions. Each data portion is assigned to each thread group from a set of thread groups. The training further includes executing a stage, in which each thread group, in parallel, trains a respective ensemble of decision trees. Executing the stage includes performing, by each thread group, in parallel, machine learning operations for the respective ensemble of decision trees using the data portion assigned to each thread group. Further, each thread group validates, in parallel, the respective ensemble of decision trees using a data portion assigned to another thread group. Execution of the stage is repeated until a predetermined threshold is satisfied. Further, a prediction is inferenced using the ensemble of decision trees that is formed using the respective ensemble of trees from each of the thread groups.
Type: Application
Filed: December 8, 2020
Publication date: June 9, 2022
Inventors: Rajesh Bordawekar, Tin Kam Ho
-
Patent number: 11244224
Abstract: A first observation window in a first time series is identified. The first observation window is preceded by a first portion of the first time series. A neural network is trained using the first portion of the first time series and the first observation window, and weights are extracted from the middle layers of the neural network. A first feature vector is generated based on the weights. A second observation window in a second time series is identified, where the second observation window is preceded by a first portion of the second time series. A second feature vector associated with the second observation window is determined. The second feature vector is based at least in part on the first set of weights. A similarity between the first and second observation windows is determined based on comparing the first feature vector and the second feature vector.
Type: Grant
Filed: March 20, 2018
Date of Patent: February 8, 2022
Assignee: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Tin Kam Ho
-
Patent number: 11182414
Abstract: A computer-implemented method, cognitive intelligence system and computer program product adapt a relational database containing multiple data types. Non-text tokens in the relational database are converted to a textual form. Text is produced based on relations of tokens in the relational database. A set of word vectors is produced for the tokens based on the text. A cognitive intelligence query expressed as a structured query language (SQL) query may be applied to the relational database using the set of word vectors. The form of non-text tokens is one of a numeric value, an SQL type, an image, a video, a time series, latitude and longitude, or chemical structures. A single word embedding model may be applied over one or more tokens in the text. A plurality of sets of preliminary word vectors are computed by applying more than one embedding model over all tokens in the text. The preliminary word vector sets are merged to form the set of word vectors.
Type: Grant
Filed: March 20, 2017
Date of Patent: November 23, 2021
Assignee: International Business Machines Corporation
Inventors: Bortik Bandyopadhyay, Rajesh Bordawekar, Tin Kam Ho
-
Patent number: 11176176
Abstract: From a first attribute-value pair in a record, new data comprising a first token is created. From each token using a processor and a memory, new data including a corresponding vector is computed. From the record, a target row is selected, wherein a target attribute-value pair in the target row includes a value requiring correction. Using a similarity measure, a set of most similar rows to the target row is determined, wherein each row in the set of most similar rows to the target row has a corresponding similarity measure above a threshold similarity measure and wherein each row in the set of most similar rows includes the target attribute. From values corresponding to the target attribute in the set of most similar rows, a replacement value is determined. The value requiring correction in the target row is replaced with the replacement value.
Type: Grant
Filed: November 20, 2018
Date of Patent: November 16, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Rajesh Bordawekar, Tin Kam Ho
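The correction flow described in this abstract (find rows whose similarity to the target exceeds a threshold, then derive a replacement from their values for the target attribute) might be sketched as follows. Cosine similarity and majority voting are assumptions made for this example; the abstract names neither the similarity measure nor the rule for choosing the replacement, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def replacement_value(target_vector, candidates, threshold=0.9):
    """Pick a replacement from rows more similar to the target than `threshold`.

    `candidates` is a list of (row_vector, attribute_value) pairs.
    Returns the most frequent value among similar rows, or None if no row qualifies.
    """
    votes = Counter(
        value
        for vector, value in candidates
        if cosine(target_vector, vector) > threshold
    )
    return votes.most_common(1)[0][0] if votes else None
```

Returning `None` when no row clears the threshold leaves the decision to the caller, for instance keeping the original value rather than forcing a low-confidence replacement.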