Patents Assigned to Io-Tahoe LLC.
  • Patent number: 11106642
    Abstract: A system and computer-implemented method for cataloging database metadata using a probabilistic signature matching process are provided. The method includes receiving an input name to be matched to keys in a data corpus; dividing the received input name into a plurality of text segments; identifying a set of matching keys by matching each of the plurality of text segments against keys in the data corpus; analyzing the set of matching keys to construct a tag; and cataloging the metadata with the matching key as the constructed tag.
    Type: Grant
    Filed: December 26, 2018
    Date of Patent: August 31, 2021
    Assignee: Io-Tahoe LLC.
    Inventors: Tomoya Wada, Winnie Cheng, Rohit Mahajan, Alex Mylnikov
  • Patent number: 11074235
    Abstract: A method and an inclusion dependency determination system (IDDS) for determining inclusion dependency between columns of tables in a target database to establish primary key (PK)-foreign key (FK) relationships among data in the columns with minimized disk input and output operations are provided. The IDDS determines dependency characteristic data (DCD) of each column and arranges the columns by applying one or more predefined rules to the columns based on a minimum value of the data of each column. The IDDS determines pairs of arranged columns that demonstrate a possibility of inclusion dependency based on the DCD and identifies a first column and a second column of each determined pair as a candidate PK and a candidate FK respectively. The IDDS determines inclusion dependency between the candidate PK and the candidate FK on comparing data of the candidate PK with the data of the candidate FK using dynamically determined search techniques.
    Type: Grant
    Filed: August 10, 2017
    Date of Patent: July 27, 2021
    Assignee: Io-Tahoe LLC
    Inventors: Ram Dayal Goyal, Rohit Mahajan
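
The pruning idea in the abstract above can be illustrated with a minimal sketch. It is not the patented IDDS: the fields chosen for the dependency characteristic data (minimum, maximum, distinct count, uniqueness), the pruning rule, and the plain set comparison are all assumptions made for illustration, and the function names are hypothetical. Columns are assumed to be non-empty in-memory lists of comparable values.

```python
def dcd(values):
    """Hypothetical dependency characteristic data for one column."""
    distinct = set(values)
    return {
        "min": min(distinct),
        "max": max(distinct),
        "distinct": len(distinct),
        "unique": len(distinct) == len(values),
    }


def candidate_pairs(columns):
    """Arrange columns by minimum value and keep plausible (PK, FK) pairs."""
    stats = {name: dcd(vals) for name, vals in columns.items()}
    ordered = sorted(stats, key=lambda name: stats[name]["min"])
    pairs = []
    for pk in ordered:
        for fk in ordered:
            if pk == fk:
                continue
            p, f = stats[pk], stats[fk]
            # Assumed rule: a PK candidate is unique and its value range
            # and distinct count must be able to cover the FK candidate.
            if (p["unique"] and p["min"] <= f["min"]
                    and f["max"] <= p["max"]
                    and f["distinct"] <= p["distinct"]):
                pairs.append((pk, fk))
    return pairs


def inclusion_dependencies(columns):
    """Confirm each candidate pair by comparing the actual data."""
    return [
        (pk, fk)
        for pk, fk in candidate_pairs(columns)
        if set(columns[fk]) <= set(columns[pk])
    ]


columns = {
    "customers.id": [1, 2, 3, 4],
    "orders.customer_id": [2, 2, 4],
    "orders.amount": [10, 20, 30],
}
result = inclusion_dependencies(columns)
```

The cheap statistics discard most column pairs before any value-by-value comparison, which is the general motivation for computing per-column summaries first.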
  • Publication number: 20210026820
    Abstract: A system and method for data entries deduplication are provided. The method includes indexing an input data set, wherein the input data set is in a tabular format and the indexing includes providing a unique Row identifier (RowID), wherein rows are the data entries; computing attribute similarity for each column across each pair of rows; computing, for each pair of rows, row-to-row similarity as a weighted sum of attribute similarities; clustering pairs of rows based on their row-to-row similarities; and providing an output data set including at least the clustered pairs of rows.
    Type: Application
    Filed: July 24, 2020
    Publication date: January 28, 2021
    Applicant: Io-Tahoe LLC
    Inventors: Rohit MAHAJAN, Winnie CHENG
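
The steps in the abstract above (RowID indexing, per-column attribute similarity, weighted row-to-row similarity, clustering) can be sketched as follows. This is an illustration under stated assumptions, not the method of the application: the similarity measure (`difflib.SequenceMatcher`), the threshold value, and the union-find clustering are all stand-ins, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attribute_similarity(a: str, b: str) -> float:
    """Assumed attribute similarity: case-insensitive sequence ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(rows, weights, threshold=0.85):
    """Cluster rows whose weighted similarity reaches the threshold."""
    indexed = dict(enumerate(rows))  # RowID -> row
    parent = {rid: rid for rid in indexed}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(indexed, 2):
        sims = [attribute_similarity(a, b)
                for a, b in zip(indexed[i], indexed[j])]
        # Row-to-row similarity as a weighted sum of attribute similarities.
        score = sum(w * s for w, s in zip(weights, sims))
        if score >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for rid in indexed:
        clusters.setdefault(find(rid), []).append(rid)
    return sorted(clusters.values())


rows = [
    ("John Smith", "New York"),
    ("Jon Smith", "New York"),
    ("Alice Brown", "Boston"),
]
clusters = deduplicate(rows, weights=(0.6, 0.4))
```

Comparing every pair of rows is quadratic in the number of rows, so a sketch like this only suits small inputs.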
  • Publication number: 20200311608
    Abstract: A system and method for determining a relationship among data sets. The method includes selecting a first data set from a first table, and a second data set from a second table, forming an inclusion dependency pair of data based on the selected first data set and the selected second data set, determining a resultant of the inclusion dependency pair, and determining a primary key-foreign key relationship between the first data set and the second data set based on the determined resultant.
    Type: Application
    Filed: June 12, 2020
    Publication date: October 1, 2020
    Applicant: Io-Tahoe LLC
    Inventors: Yongming XU, Ram Dayal GOYAL
  • Publication number: 20200278954
    Abstract: A system and method for performing a hash bucketing process on data in motion are presented. The method includes applying a first hash function on an input dataset to map the input dataset to a bucket, wherein the first hash function results with a first hash value; applying a second hash function on the first hash value to map the input dataset to a record in the bucket; generating metadata based on the input dataset, wherein the metadata at least points to the original location of the input dataset; and storing the generated metadata in the record in the bucket.
    Type: Application
    Filed: September 20, 2019
    Publication date: September 3, 2020
    Applicant: Io-Tahoe LLC
    Inventors: Alex MYLNIKOV, Rohit MAHAJAN
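
A minimal sketch of the two-level hashing described in the abstract above: a first hash selects a bucket, a second hash of that first value selects a record within the bucket, and only metadata pointing back to the original location is stored. The hash functions, table sizes, and metadata fields are assumptions chosen for illustration, and the names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 16          # assumed size, not from the application
RECORDS_PER_BUCKET = 64   # assumed size, not from the application


def first_hash(data: bytes) -> int:
    """First hash function, applied to the input dataset."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def second_hash(h1: int) -> int:
    """Second hash function, applied to the first hash value."""
    digest = hashlib.blake2b(h1.to_bytes(8, "big"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def store(buckets: dict, data: bytes, location: str):
    """Store metadata for `data` and return (bucket_id, record_id)."""
    h1 = first_hash(data)
    bucket_id = h1 % NUM_BUCKETS
    record_id = second_hash(h1) % RECORDS_PER_BUCKET
    # The metadata points to where the data lives; the data is not copied.
    metadata = {"location": location, "size": len(data)}
    buckets.setdefault(bucket_id, {})[record_id] = metadata
    return bucket_id, record_id


buckets = {}
slot = store(buckets, b"id,name\n1,alice\n", "s3://example-bucket/users.csv")
```

In this sketch a later input that lands on the same record overwrites the earlier metadata; the abstract does not say how collisions are handled.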
  • Publication number: 20200278973
    Abstract: A system and method for continuous processing of data streams residing in distributed data sources. The method includes: receiving a plurality of data streams from a plurality of distributed data sources; processing each of the plurality of data streams using a plurality of commands, wherein the plurality of commands are executed in parallel using a graph execution engine; and transporting, through a transport layer, each of the plurality of data streams using the plurality of commands to at least one data sink.
    Type: Application
    Filed: September 20, 2019
    Publication date: September 3, 2020
    Applicant: Io-Tahoe LLC
    Inventors: Alex MYLNIKOV, Rohit MAHAJAN
  • Publication number: 20200210388
    Abstract: A system and computer-implemented method for cataloging database metadata using a probabilistic signature matching process are provided. The method includes receiving an input name to be matched to keys in a data corpus; dividing the received input name into a plurality of text segments; identifying a set of matching keys by matching each of the plurality of text segments against keys in the data corpus; analyzing the set of matching keys to construct a tag; and cataloging the metadata with the matching key as the constructed tag.
    Type: Application
    Filed: December 26, 2018
    Publication date: July 2, 2020
    Applicant: Io-Tahoe LLC.
    Inventors: Tomoya Wada, Winnie Cheng, Rohit Mahajan, Alex Mylnikov
  • Publication number: 20200210478
    Abstract: A system and computer-implemented method for cataloging database metadata using a signature matching process are provided. The method includes receiving an input name to be matched to a key in a seed table; generating a first fingerprint by decomposing the received input name into a first set of n-grams; generating, based on the received input name, a second fingerprint using a predetermined pronunciation schema, wherein the second fingerprint is a phonetic fingerprint; generating a third fingerprint by decomposing the second fingerprint into a second set of n-grams; identifying a matching key by matching any combination of the first fingerprint, the second fingerprint, and the third fingerprint against keys in the seed table; and cataloging the metadata with the matching key as a tag.
    Type: Application
    Filed: December 26, 2018
    Publication date: July 2, 2020
    Applicant: Io-Tahoe LLC.
    Inventors: Tomoya WADA, Winnie CHENG, Rohit MAHAJAN, Alex MYLNIKOV
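
The three fingerprints in the abstract above can be sketched as follows. The abstract does not specify the pronunciation schema, so the `phonetic` function here is a deliberately crude stand-in (keep the first letter, drop later vowels, collapse repeated letters). The n-gram sizes, the Jaccard comparison, and the rule for combining the three scores are likewise assumptions, and all names are hypothetical.

```python
import re


def ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams over the lowercased alphanumeric text."""
    text = re.sub(r"[^a-z0-9]", "", text.lower())
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def phonetic(text: str) -> str:
    """Stand-in phonetic fingerprint, not a real pronunciation schema."""
    letters = re.sub(r"[^a-z]", "", text.lower())
    if not letters:
        return ""
    kept = letters[0] + re.sub(r"[aeiouy]", "", letters[1:])
    return re.sub(r"(.)\1+", r"\1", kept)  # collapse repeated letters


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def fingerprints(name: str):
    """First (n-grams), second (phonetic), third (n-grams of phonetic)."""
    second = phonetic(name)
    return ngrams(name), second, ngrams(second, 2)


def best_key(input_name: str, seed_keys):
    """Return the seed key with the highest combined fingerprint score."""
    f1, f2, f3 = fingerprints(input_name)

    def score(key):
        k1, k2, k3 = fingerprints(key)
        exact_phonetic = 1.0 if f2 and f2 == k2 else 0.0
        return max(jaccard(f1, k1), exact_phonetic, jaccard(f3, k3))

    best = max(seed_keys, key=score)
    return best, score(best)


seed_table = ["customer_name", "order_date", "phone_number"]
tag, confidence = best_key("cust_nme", seed_table)
```

Matching on the phonetic n-grams lets an abbreviated column name such as `cust_nme` still land on `customer_name` even though few raw trigrams overlap.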
  • Patent number: 10692015
    Abstract: A method and a machine learning relationship determination system (MLRDS) for determining primary key-foreign key (PK-FK) relationships among data in tables of a target database through machine learning (ML) are provided. The MLRDS selects columns of the tables in the target database and identifies inclusion dependency (ID) pairs from the selected columns. The MLRDS receives training data and validation data from a source database, computes PK-FK features for the inclusion dependency pairs, the training data, and the validation data, and generates trained ML models and validated ML models using the PK-FK features. The MLRDS determines an optimum algorithm decision threshold for a selected machine learning classification algorithm (MLCA), using which the MLRDS determines a resultant on whether the inclusion dependency pair is a PK-FK pair or a non-PK-FK pair. The MLRDS performs majority voting on the resultant for multiple MLCAs to confirm the PK-FK relationships between the inclusion dependency pairs.
    Type: Grant
    Filed: July 15, 2016
    Date of Patent: June 23, 2020
    Assignee: Io-Tahoe LLC
    Inventors: Yongming Xu, Ram Dayal Goyal
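
The final two steps of the abstract above, a per-algorithm decision threshold followed by majority voting across classifiers, can be sketched as follows. The classifier names, probabilities, and thresholds are made-up illustrative values, and training the models and searching for the optimum threshold are outside this sketch.

```python
def classify(probability: float, threshold: float) -> bool:
    """One classifier's resultant: True means 'PK-FK pair'."""
    return probability >= threshold


def confirm_pk_fk(probabilities: dict, thresholds: dict) -> bool:
    """Majority vote over the resultants of several classifiers."""
    votes = [
        classify(probabilities[name], thresholds[name])
        for name in probabilities
    ]
    return sum(votes) > len(votes) / 2


# Hypothetical scores for one inclusion dependency pair.
probabilities = {"logreg": 0.80, "tree": 0.40, "svm": 0.70}
thresholds = {"logreg": 0.50, "tree": 0.60, "svm": 0.65}
is_pk_fk = confirm_pk_fk(probabilities, thresholds)
```

With an even number of classifiers a tie counts as a rejection in this sketch, since the vote requires a strict majority.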
  • Publication number: 20200104379
    Abstract: A method and system for tagging database columns are presented. The method includes receiving an input column name of at least one column in a database; performing signature matching of the input column name to contents of a seed table; determining a first confidence score for the signature matching; and tagging a matching value in the seed table as a tag for the input column name, when the first confidence score exceeds a first threshold value.
    Type: Application
    Filed: September 28, 2018
    Publication date: April 2, 2020
    Applicant: Io-Tahoe LLC.
    Inventors: Tomoya WADA, Winnie CHENG, Rohit MAHAJAN, Alex MYLNIKOV