Patents by Inventor Rajesh Bordawekar
Rajesh Bordawekar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11244224
Abstract: A first observation window in a first time series is identified. The first observation window is preceded by a first portion of the first time series. A neural network is trained using the first portion of the first time series and the first observation window, and weights are extracted from the middle layers of the neural network. A first feature vector is generated based on the weights. A second observation window in a second time series is identified, where the second observation window is preceded by a first portion of the second time series. A second feature vector associated with the second observation window is determined. The second feature vector is based at least in part on the first set of weights. A similarity between the first and second observation windows is determined based on comparing the first feature vector and the second feature vector.
Type: Grant
Filed: March 20, 2018
Date of Patent: February 8, 2022
Assignee: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Tin Kam Ho
-
Patent number: 11182414
Abstract: A computer-implemented method, cognitive intelligence system and computer program product adapt a relational database containing multiple data types. Non-text tokens in the relational database are converted to a textual form. Text is produced based on relations of tokens in the relational database. A set of word vectors is produced for the tokens based on the text. A cognitive intelligence query expressed as a structured query language (SQL) query may be applied to the relational database using the set of word vectors. The form of non-text tokens is one of a numeric value, an SQL type, an image, a video, a time series, latitude and longitude, or chemical structures. A single word embedding model may be applied over one or more tokens in the text. A plurality of sets of preliminary word vectors are computed by applying more than one embedding model over all tokens in the text. The preliminary word vector sets are merged to form the set of word vectors.
Type: Grant
Filed: March 20, 2017
Date of Patent: November 23, 2021
Assignee: International Business Machines Corporation
Inventors: Bortik Bandyopadhyay, Rajesh Bordawekar, Tin Kam Ho
-
Patent number: 11176176
Abstract: From a first attribute-value pair in a record, new data comprising a first token is created. From each token using a processor and a memory, new data including a corresponding vector is computed. From the record, a target row is selected, wherein a target attribute-value pair in the target row includes a value requiring correction. Using a similarity measure, a set of most similar rows to the target row is determined, wherein each row in the set of most similar rows to the target row has a corresponding similarity measure above a threshold similarity measure and wherein each row in the set of most similar rows includes the target attribute. From values corresponding to the target attribute in the set of most similar rows, a replacement value is determined. The value requiring correction in the target row is replaced with the replacement value.
Type: Grant
Filed: November 20, 2018
Date of Patent: November 16, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Rajesh Bordawekar, Tin Kam Ho
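Illustration (not taken from the patent): the correction step described in this abstract can be sketched in Python roughly as follows. The sketch assumes each row already has a precomputed vector, uses cosine similarity as the similarity measure, and picks the replacement by majority vote. All three choices are assumptions, as are the names `cosine` and `replacement_value`.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def replacement_value(target_vec, rows, attribute, threshold):
    """Pick a replacement for `attribute` from rows similar to the target.

    `rows` is a list of (vector, record) pairs, where `record` is a dict.
    Only rows that contain the attribute and whose similarity to the target
    exceeds `threshold` are considered. Majority vote is an assumption; the
    abstract only says a replacement value "is determined".
    """
    candidates = [
        record[attribute]
        for vec, record in rows
        if attribute in record and cosine(target_vec, vec) > threshold
    ]
    if not candidates:
        return None  # no sufficiently similar row carries the attribute
    return Counter(candidates).most_common(1)[0][0]


# Hypothetical sample data: two rows close to the target, one far away,
# and one close row that lacks the target attribute.
rows = [
    ([1.0, 0.0], {"city": "Paris"}),
    ([0.9, 0.1], {"city": "Paris"}),
    ([0.0, 1.0], {"city": "Rome"}),
    ([1.0, 0.05], {"country": "FR"}),
]
suggested = replacement_value([1.0, 0.0], rows, "city", threshold=0.8)
```

A weighted vote or the single nearest row would fit the abstract's wording equally well; majority vote is used here only because it is the shortest to write down.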
-
Patent number: 11163761
Abstract: Structured and semi-structured databases and files are processed using natural language processing techniques to impute data for null value tokens in database records from other records that have non-null values for the same attributes. Vector embedding techniques are used, including, in some cases, appropriately tagging null value tokens to reduce or eliminate their undue impact on semantic vectors generating using a neural network.
Type: Grant
Filed: March 20, 2020
Date of Patent: November 2, 2021
Assignee: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Tin Kam Ho
-
Publication number: 20210311937
Abstract: A system, apparatus, and a method for training with multi-modal data in a relational database, including generating a first database including a multi-view of the multi-modal data, retrieving a second set of data from an external source via a network, and training a first model according the first database and the second set of data. The first model outputs relationships of the first database with the multi-view and the second set of data.
Type: Application
Filed: June 21, 2021
Publication date: October 7, 2021
Inventors: Rajesh Bordawekar, Bortik Bandyopadhyay
-
Publication number: 20210294794
Abstract: Structured and semi-structured databases and files are processed using natural language processing techniques to impute data for null value tokens in database records from other records that have non-null values for the same attributes. Vector embedding techniques are used, including, in some cases, appropriately tagging null value tokens to reduce or eliminate their undue impact on semantic vectors generating using a neural network.
Type: Application
Filed: March 20, 2020
Publication date: September 23, 2021
Inventors: Rajesh Bordawekar, Tin Kam Ho
-
Patent number: 11100100
Abstract: A computer-implemented method, cognitive intelligence server and computer program product adapt a relational database containing numeric data types. At least one numeric token in the relational database is converted to a textual form. Text is produced based on relations of tokens in the relational database. A set of word vectors is produced based on the text. A cognitive intelligence query, expressed as a structured query language (SQL) query, may be applied to the relational database using the set of word vectors. At least one numeric token in the relational database may be converted to a typed string comprising a heading for a column in the relational database for which the token appears and the numeric value. Converting at least one numeric token in the relational database may comprise clustering tokens in a column of the relational database using a clustering algorithm and replacing each token in the column by a cluster identifier.
Type: Grant
Filed: March 20, 2017
Date of Patent: August 24, 2021
Assignee: International Business Machines Corporation
Inventors: Bortik Bandyopadhyay, Rajesh Bordawekar, Tin Kam Ho
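Illustration (not taken from the patent): the two numeric-to-text conversions named in this abstract can be sketched in Python as below. The function names, the underscore separator and the sample column are hypothetical. Equal-width binning stands in for the unspecified "clustering algorithm"; it is a deliberately simple substitute, not the method the patent claims.

```python
def to_typed_string(column, value):
    """Build a typed string from the column heading and the numeric value.

    The "<heading>_<value>" layout is an assumed format.
    """
    return f"{column}_{value:g}"


def cluster_ids(values, num_clusters):
    """Replace each numeric value with a cluster identifier.

    Equal-width binning is used here as a stand-in for a real clustering
    algorithm such as k-means.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_clusters or 1.0
    return [
        f"cluster_{min(int((v - lo) / width), num_clusters - 1)}"
        for v in values
    ]


# Hypothetical "price" column.
prices = [1, 2, 9, 10]
typed = [to_typed_string("price", p) for p in prices]
binned = cluster_ids(prices, 2)
```

Either form turns a number into a token that a word embedding model can treat like any other word, which is the point of the conversion step in the abstract.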
-
Patent number: 11080273
Abstract: A computer-implemented method, a cognitive intelligence system and computer program product adapt a relational database containing image data types. At least one image token in the relational database is converted to a textual form. Text is produced based on relations of tokens in the relational database. A set of word vectors is produced based on the text. A cognitive intelligence query expressed as a structured query language (SQL) query may be applied to the relational database using the set of word vectors. An image token may be converted to textual form by converting the image to a tag, by using a neural network classification model and replacing the image token with a corresponding cluster identifier, by binary comparison or by a user-specified similarity function. An image token may be converted to a plurality of textual forms using more than one conversion method.
Type: Grant
Filed: March 20, 2017
Date of Patent: August 3, 2021
Assignee: International Business Machines Corporation
Inventors: Bortik Bandyopadhyay, Rajesh Bordawekar, Tin Kam Ho
-
Patent number: 11074253
Abstract: A system and a method for performing queries, including generating text representations of features of various types of data, building a multi-modal word embedding model to capture relationships between the various types of data, and based on the multi-modal word embedding model, performing an inductive reasoning query.
Type: Grant
Filed: November 2, 2018
Date of Patent: July 27, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Rajesh Bordawekar, Bortik Bandyopadhyay
-
Publication number: 20210124724
Abstract: A computer-implemented method according to one embodiment includes identifying a relational database; determining columns of interest within the relational database; creating an unordered group of string tokens for each row of the relational database, utilizing the determined columns of interest; assigning weights for one or more columns within the relational database to one or more string tokens within each unordered group of string tokens to create a plurality of weighted unordered groups of string tokens; and determining a meaning vector for an identifier of each row of the relational database, utilizing the plurality of weighted unordered groups of string tokens.
Type: Application
Filed: October 28, 2019
Publication date: April 29, 2021
Inventor: Rajesh Bordawekar
-
Patent number: 10984030
Abstract: A computer-implemented method, a cognitive intelligence system and computer program product adapt a relational database containing multiple data types. Non-text tokens in the relational database are converted to a textual form. Text is produced based on relations of tokens in the relational database. A set of pre-trained word vectors for the text is retrieved from an external database. The set of pre-trained word vectors is initialized for tokens common to both the relational database and an external database. The set of pre-trained vectors is used to create a cognitive intelligence query expressed as a structure query language (SQL) query. Content of the relational database is used for training while initializing the set of pre-trained word vectors for tokens common to both the relational database and the external database. The first set of word vectors may be immutable or mutable with updates controlled via parameters.
Type: Grant
Filed: March 20, 2017
Date of Patent: April 20, 2021
Assignee: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Oded Shmueli
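Illustration (not taken from the patent): one way to read the initialization step in this abstract is sketched below in Python. Tokens found in both the database and the external source start from the pre-trained vector, and a scaling parameter decides whether those vectors stay fixed or keep learning. The names `init_vectors`, `apply_update` and `pretrained_lr_scale`, the random initialization range and the gradient-style update are all assumptions.

```python
import random


def init_vectors(db_tokens, pretrained, dim, seed=0):
    """Initialize one vector per database token.

    Tokens present in `pretrained` (the external source) copy its vector;
    all other tokens get small random values. Returns the vectors and the
    set of tokens that were initialized from the external source.
    """
    rng = random.Random(seed)
    vectors, common = {}, set()
    for token in db_tokens:
        if token in pretrained:
            vectors[token] = list(pretrained[token])
            common.add(token)
        else:
            vectors[token] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    return vectors, common


def apply_update(vectors, common, token, gradient, lr=0.1,
                 pretrained_lr_scale=0.0):
    """Apply one training update to a token's vector.

    `pretrained_lr_scale` = 0.0 keeps pre-trained vectors immutable;
    a value between 0 and 1 lets them change more slowly than the rest.
    """
    scale = pretrained_lr_scale if token in common else 1.0
    vectors[token] = [
        w - lr * scale * g for w, g in zip(vectors[token], gradient)
    ]


# Hypothetical tokens: "apple" exists in the external source, "sku_17" does not.
external = {"apple": [1.0, 2.0], "pear": [0.0, 0.0]}
vectors, common = init_vectors(["apple", "sku_17"], external, dim=2)
apply_update(vectors, common, "apple", [1.0, 1.0])  # no change: immutable
```

Calling `apply_update` with `pretrained_lr_scale=1.0` would treat pre-trained vectors like any other, which corresponds to the fully mutable case mentioned in the abstract.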
-
Patent number: 10831738
Abstract: Apparatuses and Methods for sorting a data set. A data storage is divided into a plurality of buckets that is each associated with a respective key value. A plurality of stripes is identified in each bucket. A plurality of data stripe sets is defined that has one stripe within each respective bucket. A first and a second in-place partial bucket radix sort are performed on data items contained within the first and second data stripe sets, respectively, using an initial radix. Incorrectly sorted data items in the first bucket are grouped by a first processor and incorrectly sorted data items in the second bucket are grouped by a second processor into a respective incorrect data item group within each bucket. A radix sort is then performed using the initial radix on the items within the respective incorrect data item group. A first level sorted output is produced.
Type: Grant
Filed: December 22, 2017
Date of Patent: November 10, 2020
Assignee: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Daniel Brand, Minsik Cho, Ulrich Finkler, Ruchir Puri
-
Patent number: 10831752
Abstract: A method, computer program product and/or system is disclosed. According to an aspect of this invention, one or more processors receive a query of a first database, where the query includes: (i) an operand, and (ii) an operator indicating a distance-based similarity measure. One or more processors further determine a result set based on the query, wherein the result set includes a plurality of records, and wherein a record is included in the result set based on a vector nearest-neighbor computation between: (i) a first vector corresponding to the operand, and (ii) a second vector corresponding to the record, wherein the second vector is included in a vector space model that is based on a textual representation of the first database.
Type: Grant
Filed: April 25, 2018
Date of Patent: November 10, 2020
Assignee: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Oded Shmueli
-
Publication number: 20200210431
Abstract: From a first attribute-value pair in a record, new data is created including a first token. Using a first model and using a processor and a memory, each token is vectorized into new data including a corresponding vector. From the record, a target row is selected, wherein a target attribute-value pair in the target row includes a value for which a semantic similarity computation is to be performed. Using a similarity measure, a set of most similar rows to the target row is determined, wherein each row in the set of most similar rows to the target row has a corresponding similarity measure above a threshold similarity measure and wherein each row in the set of most similar rows includes the target attribute. The set of most similar rows is used to compute a response to a database query.
Type: Application
Filed: January 2, 2019
Publication date: July 2, 2020
Applicant: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Jose Neves
-
Patent number: 10685002
Abstract: An information processing system, computer readable storage medium, and method for accelerated radix sort processing of data elements in an array in memory. The information processing system stores an array of data elements in a buffer memory in an application specific integrated circuit radix sort accelerator. The array has a head end and a tail end. The system radix sort processing, with a head processor, data elements starting at the head end of the array and progressively advancing radix sort processing data elements toward the tail end of the array. The system radix sort processing, with a tail processor, data elements starting at the tail end of the array and progressively advancing radix sort processing data elements toward the head end of the array, the tail processor radix sort processing data elements in the array contemporaneously with the head processor radix sort processing data elements in the array.
Type: Grant
Filed: December 29, 2017
Date of Patent: June 16, 2020
Assignee: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Daniel Brand, Minsik Cho, Brian R. Konigsburg, Ruchir Puri
-
Publication number: 20200175360
Abstract: Methods, systems and computer program products for updating a word embedding model are provided. Aspects include receiving a first data set comprising a relational database having a plurality of words. Aspects also include generating a word embedding model comprising a plurality of word vectors by training a neural network using unsupervised machine learning based on the first data set. Each word vector of the plurality of word vector corresponds to a unique word of the plurality of words. Aspects also include storing the plurality of word vectors and a representation of a hidden layer of the neural network. Aspects also include receiving a second data set comprising data that has been added to the relational database. Aspects also include updating the word embedding model based on the second data set and the stored representation of the hidden layer of the neural network.
Type: Application
Filed: November 29, 2018
Publication date: June 4, 2020
Inventors: Thomas Conti, Stephen Warren, Rajesh Bordawekar, Jose Neves, Christopher Harding
-
Publication number: 20200175390
Abstract: Methods, systems and computer program products for determining recommended parameters for use in generating a word embedding model are provided. Aspects include storing a plurality of meaningful test cases. Each meaningful test case includes a test data profile and one or more test model parameters used to create a word embedding model that has been classified as yielding meaningful results. Aspects include receiving a production data set to be used in generating a new word embedding model. The production data set includes data stored in a relational database having a plurality of columns and a plurality of rows. Aspects include generating a data profile associated with the production data set. Aspects include generating a recommendation for one or more production model parameters for use in building a word embedding model based on the data profile associated with the production data set and the plurality of meaningful test cases.
Type: Application
Filed: November 29, 2018
Publication date: June 4, 2020
Inventors: Thomas Conti, Rajesh Bordawekar, Stephen Warren, Christopher Harding, Jose Neves
-
Publication number: 20200159853
Abstract: From a first attribute-value pair in a record, new data comprising a first token is created. From each token using a processor and a memory, new data including a corresponding vector is computed. From the record, a target row is selected, wherein a target attribute-value pair in the target row includes a value requiring correction. Using a similarity measure, a set of most similar rows to the target row is determined, wherein each row in the set of most similar rows to the target row has a corresponding similarity measure above a threshold similarity measure and wherein each row in the set of most similar rows includes the target attribute. From values corresponding to the target attribute in the set of most similar rows, a replacement value is determined. The value requiring correction in the target row is replaced with the replacement value.
Type: Application
Filed: November 20, 2018
Publication date: May 21, 2020
Applicant: International Business Machines Corporation
Inventors: Rajesh Bordawekar, Tin Kam Ho
-
Publication number: 20200142989
Abstract: A system and a method for performing queries, including generating text representations of features of various types of data, building a multi-modal word embedding model to capture relationships between the various types of data, and based on the multi-modal word embedding model, performing an inductive reasoning query.
Type: Application
Filed: November 2, 2018
Publication date: May 7, 2020
Inventors: Rajesh Bordawekar, Bortik Bandyopadhyay
-
Publication number: 20190332705
Abstract: A method, computer program product and/or system is disclosed. According to an aspect of this invention, one or more processors receive a query of a first database, where the query includes: (i) an operand, and (ii) an operator indicating a distance-based similarity measure. One or more processors further determine a result set based on the query, wherein the result set includes a plurality of records, and wherein a record is included in the result set based on a vector nearest-neighbor computation between: (i) a first vector corresponding to the operand, and (ii) a second vector corresponding to the record, wherein the second vector is included in a vector space model that is based on a textual representation of the first database.
Type: Application
Filed: April 25, 2018
Publication date: October 31, 2019
Inventors: Rajesh Bordawekar, Oded Shmueli