Patents by Inventor Hima P. Karanam

Hima P. Karanam has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10169418
    Abstract: Methods, systems, and computer program products for deriving a multi-pass matching algorithm for data de-duplication are provided herein. A method includes identifying multiple passes across multiple databases using a set of one or more blocking columns derived from a set of trained input data; identifying, in each of the multiple passes, one or more columns across the multiple databases that match one or more of the blocking columns; selecting a given pass from the multiple passes, wherein said given pass comprises a maximum number of matching columns within the multiple passes; determining, for the given pass, data that conform to the given pass comprising (i) a set of matching columns, (ii) one or more matching types and (iii) one or more weights; and determining one or more subsequent passes across the multiple databases iteratively by removing the data that conform to the given pass.
    Type: Grant
    Filed: September 24, 2014
    Date of Patent: January 1, 2019
    Assignee: International Business Machines Corporation
    Inventors: Hima P. Karanam, Albert Maier, Marvin Mendelssohn, Heather Stimpson, Dan Dan Zheng
  • Patent number: 10163063
    Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
    Type: Grant
    Filed: March 7, 2012
    Date of Patent: December 25, 2018
    Assignee: International Business Machines Corporation
    Inventors: Snigdha Chaturvedi, Tanveer A Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
  • Patent number: 10095780
    Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
    Type: Grant
    Filed: February 7, 2017
    Date of Patent: October 9, 2018
    Assignee: International Business Machines Corporation
    Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
  • Patent number: 9996607
    Abstract: Described herein are methods, systems and computer program products for entity resolution. Entity resolution, also known as entity matching or record linkage, seeks to identify equivalent data objects between or among datasets. An example method includes creating a deterministic model by defining an entity to be resolved, selecting two datasets for comparison, defining matching predicates for attributes of the datasets to select a set of candidate matches, and defining a precedence rule for the candidate matches to select a subset of the candidate matches. The method includes running the deterministic model on the two datasets. Running the deterministic model includes applying the matching predicates and the precedence rule to data in the datasets that correspond to the attributes. The method also includes applying a cardinality rule to results of the running, and outputting the matching candidates for which the cardinality rule is satisfied.
    Type: Grant
    Filed: October 31, 2014
    Date of Patent: June 12, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bogdan Alexe, Douglas R. Burdick, Mauricio A. Hernandez-Sherrington, Hima P. Karanam, Rajasekar Krishnamurthy, Lucian Popa, Shivakumar Vaithyanathan
  • Publication number: 20170147688
    Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
    Type: Application
    Filed: February 7, 2017
    Publication date: May 25, 2017
    Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
  • Publication number: 20160125067
    Abstract: Embodiments relate to entity resolution. One aspect includes creating a deterministic model by defining an entity to be resolved, selecting two datasets for comparison, defining matching predicates for attributes of the datasets to select a set of candidate matches, and defining a precedence rule for the candidate matches to select a subset of the candidate matches. An aspect further includes running the deterministic model on the two datasets. Running the deterministic model includes applying the matching predicates and the precedence rule to data in the datasets that correspond to the attributes. An aspect also includes applying a cardinality rule to results of the running, and outputting the matching candidates for which the cardinality rule is satisfied.
    Type: Application
    Filed: October 31, 2014
    Publication date: May 5, 2016
    Inventors: Bogdan Alexe, Douglas R. Burdick, Mauricio A. Hernandez-Sherrington, Hima P. Karanam, Rajasekar Krishnamurthy, Lucian Popa, Shivakumar Vaithyanathan
  • Publication number: 20160085807
    Abstract: Methods, systems, and computer program products for deriving a multi-pass matching algorithm for data de-duplication are provided herein. A method includes identifying multiple passes across multiple databases using a set of one or more blocking columns derived from a set of trained input data; identifying, in each of the multiple passes, one or more columns across the multiple databases that match one or more of the blocking columns; selecting a given pass from the multiple passes, wherein said given pass comprises a maximum number of matching columns within the multiple passes; determining, for the given pass, data that conform to the given pass comprising (i) a set of matching columns, (ii) one or more matching types and (iii) one or more weights; and determining one or more subsequent passes across the multiple databases iteratively by removing the data that conform to the given pass.
    Type: Application
    Filed: September 24, 2014
    Publication date: March 24, 2016
    Inventors: Hima P. Karanam, Albert Maier, Marvin Mendelssohn, Heather Stimpson, Dan Dan Zheng
  • Patent number: 9213574
    Abstract: A method, system and a computer program product for determining resources allocation in a distributed computing environment. An embodiment may include identifying resources in a distributed computing environment, computing provisioning parameters, computing configuration parameters and quantifying service parameters in response to a set of service level agreements (SLA). The embodiment may further include iteratively computing a completion time required for completion of the assigned task and a cost. Embodiments may further include computing an optimal resources configuration and computing at least one of an optimal completion time and an optimal cost corresponding to the optimal resources configuration. Embodiments may further include dynamically modifying the optimal resources configuration in response to at least one change in at least one of provisioning parameters, computing parameters and quantifying service parameters.
    Type: Grant
    Filed: January 30, 2010
    Date of Patent: December 15, 2015
    Assignee: International Business Machines Corporation
    Inventors: Tanveer A Faruquie, Hima P Karanam, Mukesh K Mohania, L Venkata Subramaniam, Girish Venkatachaliah
  • Patent number: 9104709
    Abstract: According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.
    Type: Grant
    Filed: March 16, 2012
    Date of Patent: August 11, 2015
    Assignee: International Business Machines Corporation
    Inventors: Snigdha Chaturvedi, Tanveer A Faruquie, Hima P Karanam, Mukesh K Mohania, L Venkata Subramaniam
  • Patent number: 8996524
    Abstract: Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
    Type: Grant
    Filed: March 8, 2012
    Date of Patent: March 31, 2015
    Assignee: International Business Machines Corporation
    Inventors: Snigdha Chaturvedi, Tanveer A Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
  • Patent number: 8700542
    Abstract: Systems, methods, and computer products for optimally managing large rule sets are disclosed. Rule dependencies of rules within a set of rules may be determined as a function of rules execution frequency data generated from applying the rules over a data set. The rules within the set of rules may be clustered into rules clusters based on the determined rule dependencies, in which the rules clusters comprise disjoint subsets of the rules within the set of rules. Cluster frequency data for the rules clusters may be used to arrive at an optimal ordering. Each rule within the set of rules may be assigned a unique identification that may capture an execution order of the rules within the set of rules.
    Type: Grant
    Filed: December 15, 2010
    Date of Patent: April 15, 2014
    Assignee: International Business Machines Corporation
    Inventors: Mohan N. Dani, Tanveer A. Faruquie, Hima P. Karanam, L. Venkata Subramaniam, Girish Venkatachaliah
  • Patent number: 8560506
    Abstract: A method of blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
    Type: Grant
    Filed: April 16, 2012
    Date of Patent: October 15, 2013
    Assignee: International Business Machines Corporation
    Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
  • Patent number: 8560505
    Abstract: Blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
    Type: Grant
    Filed: December 7, 2011
    Date of Patent: October 15, 2013
    Assignee: International Business Machines Corporation
    Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
  • Publication number: 20130238610
    Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
    Type: Application
    Filed: March 7, 2012
    Publication date: September 12, 2013
    Applicant: International Business Machines Corporation
    Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
  • Publication number: 20130238611
    Abstract: Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
    Type: Application
    Filed: March 8, 2012
    Publication date: September 12, 2013
    Applicant: International Business Machines Corporation
    Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
  • Publication number: 20130151487
    Abstract: Blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
    Type: Application
    Filed: December 7, 2011
    Publication date: June 13, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: SNIGDHA CHATURVEDI, TANVEER A. FARUQUIE, HIMA P. KARANAM, MARVIN MENDELSSOHN, MUKESH K. MOHANIA, L. VENKATA SUBRAMANIAM
  • Publication number: 20130151490
    Abstract: A method of blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
    Type: Application
    Filed: April 16, 2012
    Publication date: June 13, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: SNIGDHA CHATURVEDI, TANVEER A. FARUQUIE, HIMA P. KARANAM, MARVIN MENDELSSOHN, MUKESH K. MOHANIA, L. VENKATA SUBRAMANIAM
  • Publication number: 20120179658
    Abstract: According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.
    Type: Application
    Filed: March 16, 2012
    Publication date: July 12, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Mukesh K. Mohania, L. Venkata Subramaniam
  • Publication number: 20120158619
    Abstract: Systems, methods, and computer products for optimally managing large rule sets are disclosed. Rule dependencies of rules within a set of rules may be determined as a function of rules execution frequency data generated from applying the rules over a data set. The rules within the set of rules may be clustered into rules clusters based on the determined rule dependencies, in which the rules clusters comprise disjoint subsets of the rules within the set of rules. Cluster frequency data for the rules clusters may be used to arrive at an optimal ordering.
    Type: Application
    Filed: December 15, 2010
    Publication date: June 21, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: MOHAN N. DANI, TANVEER A. FARUQUIE, HIMA P. KARANAM, L.V. SUBRAMANIAM, GIRISH VENKATACHALIAH
  • Publication number: 20120150825
    Abstract: According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.
    Type: Application
    Filed: December 13, 2010
    Publication date: June 14, 2012
    Applicant: International Business Machines Corporation
    Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Mukesh K. Mohania, L. Venkata Subramaniam