Patents by Inventor Marvin Mendelssohn
Marvin Mendelssohn has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10169418
Abstract: Methods, systems, and computer program products for deriving a multi-pass matching algorithm for data de-duplication are provided herein. A method includes identifying multiple passes across multiple databases using a set of one or more blocking columns derived from a set of trained input data; identifying, in each of the multiple passes, one or more columns across the multiple databases that match one or more of the blocking columns; selecting a given pass from the multiple passes, wherein said given pass comprises a maximum number of matching columns within the multiple passes; determining, for the given pass, data that conform to the given pass comprising (i) a set of matching columns, (ii) one or more matching types and (iii) one or more weights; and determining one or more subsequent passes across the multiple databases iteratively by removing the data that conform to the given pass.
Type: Grant
Filed: September 24, 2014
Date of Patent: January 1, 2019
Assignee: International Business Machines Corporation
Inventors: Hima P. Karanam, Albert Maier, Marvin Mendelssohn, Heather Stimpson, Dan Dan Zheng
-
Patent number: 10163063
Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
Type: Grant
Filed: March 7, 2012
Date of Patent: December 25, 2018
Assignee: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Patent number: 10095780
Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
Type: Grant
Filed: February 7, 2017
Date of Patent: October 9, 2018
Assignee: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Publication number: 20170147688
Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
Type: Application
Filed: February 7, 2017
Publication date: May 25, 2017
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Publication number: 20160085807
Abstract: Methods, systems, and computer program products for deriving a multi-pass matching algorithm for data de-duplication are provided herein. A method includes identifying multiple passes across multiple databases using a set of one or more blocking columns derived from a set of trained input data; identifying, in each of the multiple passes, one or more columns across the multiple databases that match one or more of the blocking columns; selecting a given pass from the multiple passes, wherein said given pass comprises a maximum number of matching columns within the multiple passes; determining, for the given pass, data that conform to the given pass comprising (i) a set of matching columns, (ii) one or more matching types and (iii) one or more weights; and determining one or more subsequent passes across the multiple databases iteratively by removing the data that conform to the given pass.
Type: Application
Filed: September 24, 2014
Publication date: March 24, 2016
Inventors: Hima P. Karanam, Albert Maier, Marvin Mendelssohn, Heather Stimpson, Dan Dan Zheng
-
Patent number: 8996524
Abstract: Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
Type: Grant
Filed: March 8, 2012
Date of Patent: March 31, 2015
Assignee: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Patent number: 8682898
Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.
Type: Grant
Filed: April 30, 2010
Date of Patent: March 25, 2014
Assignee: International Business Machines Corporation
Inventors: Sachindra Joshi, Tanveer Faruquie, Hima Prasad Karanam, Marvin Mendelssohn, Mukesh Kumar Mohania, Angel Marie Smith, L Venkata Subramaniam, Girish Venkatachaliah
-
Patent number: 8560506
Abstract: A method of blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
Type: Grant
Filed: April 16, 2012
Date of Patent: October 15, 2013
Assignee: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Patent number: 8560505
Abstract: Blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
Type: Grant
Filed: December 7, 2011
Date of Patent: October 15, 2013
Assignee: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Publication number: 20130238610
Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
Type: Application
Filed: March 7, 2012
Publication date: September 12, 2013
Applicant: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Publication number: 20130238611
Abstract: Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
Type: Application
Filed: March 8, 2012
Publication date: September 12, 2013
Applicant: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Publication number: 20130151490
Abstract: A method of blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
Type: Application
Filed: April 16, 2012
Publication date: June 13, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: SNIGDHA CHATURVEDI, TANVEER A. FARUQUIE, HIMA P. KARANAM, MARVIN MENDELSSOHN, MUKESH K. MOHANIA, L. VENKATA SUBRAMANIAM
-
Publication number: 20130151487
Abstract: Blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
Type: Application
Filed: December 7, 2011
Publication date: June 13, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: SNIGDHA CHATURVEDI, TANVEER A. FARUQUIE, HIMA P. KARANAM, MARVIN MENDELSSOHN, MUKESH K. MOHANIA, L. VENKATA SUBRAMANIAM
-
Publication number: 20110270808
Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.
Type: Application
Filed: April 30, 2010
Publication date: November 3, 2011
Applicant: International Business Machines Corporation
Inventors: Tanveer A. Faruquie, Sachindra Joshi, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, Angel Smith, L. V. Subramaniam, Girish Venkatachaliah
-
Patent number: 5181162
Abstract: An object-oriented document management and production system in which documents are represented as collections of logical components, or "objects", that may be combined and physically mapped onto a page-by-page layout. Stored objects are organized, accessed and manipulated through a database management system. At a minimum, objects contain basic information-bearing constituents such as text, image, voice or graphics. Objects may also contain further data specifying appearance characteristics, relationships to other objects, and access restrictions.
Type: Grant
Filed: December 6, 1989
Date of Patent: January 19, 1993
Assignee: Eastman Kodak Company
Inventors: Robert M. Smith, David M. T. Ting, Jan H. Boer, Marvin Mendelssohn