Patents by Inventor Tanveer A Faruquie
Tanveer A Faruquie has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8682898
Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.
Type: Grant
Filed: April 30, 2010
Date of Patent: March 25, 2014
Assignee: International Business Machines Corporation
Inventors: Sachindra Joshi, Tanveer Faruquie, Hima Prasad Karanam, Marvin Mendelssohn, Mukesh Kumar Mohania, Angel Marie Smith, L Venkata Subramaniam, Girish Venkatachaliah
-
Publication number: 20130332408
Abstract: The present invention relates to data cleansing, and in particular performing the semantic standardization process within a database before the transform portion of the extract-transform-load (ETL) process. Provided are a method, system and computer program product for standardizing data within a database engine, configuring the standardization function to determine at least one standardized value for at least one data value by applying the standardization table in a context of at least one data value, receiving a database query identifying the standardization function, at least one database value and the context of the data, and invoking the standardization function.
Type: Application
Filed: July 31, 2013
Publication date: December 12, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Tanveer A. Faruquie, Mukesh K. Mohania, L. V. Subramaniam, Charles D. Wolfson
-
Publication number: 20130332407
Abstract: The present invention relates to data cleansing, and in particular performing the semantic standardization process within a database before the transform portion of the extract-transform-load (ETL) process. Provided are a method, system and computer program product for standardizing data within a database engine, configuring the standardization function to determine at least one standardized value for at least one data value by applying the standardization table in a context of at least one data value, receiving a database query identifying the standardization function, at least one database value and the context of the data, and invoking the standardization function.
Type: Application
Filed: June 11, 2012
Publication date: December 12, 2013
Applicant: International Business Machines Corporation
Inventors: Tanveer A. Faruquie, Mukesh K. Mohania, L. V. Subramaniam, Charles D. Wolfson
-
Patent number: 8560506
Abstract: A method of blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
Type: Grant
Filed: April 16, 2012
Date of Patent: October 15, 2013
Assignee: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
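Illustration (not part of the patent record): the abstract above does not say how the two parameters or the blockability measure are computed, so the following is only a minimal sketch under stated assumptions. Here the first parameter is assumed to be the normalized entropy of the block distribution, the second is assumed to be the fraction of rows falling in blocks of a usable size, and blockability is assumed to be their product. The function names and the `max_block_size` bound are hypothetical.

```python
from collections import Counter
from math import log


def block_sizes(rows, column_set):
    """Count how many rows fall into each block keyed by the column set's values."""
    return Counter(tuple(row[c] for c in column_set) for row in rows)


def blockability(rows, column_set, max_block_size=3):
    """Toy blockability score; both parameters below are assumptions."""
    sizes = block_sizes(rows, column_set)
    n = len(rows)
    if n == 0 or len(sizes) < 2:
        return 0.0  # a single block (or no data) gives no useful partition
    # First parameter (assumed): normalized entropy of the block distribution.
    entropy = -sum((s / n) * log(s / n) for s in sizes.values())
    distribution = entropy / log(len(sizes))
    # Second parameter (assumed): share of rows in blocks that are neither
    # singletons nor larger than the hypothetical max_block_size bound.
    usable = sum(s for s in sizes.values() if 2 <= s <= max_block_size) / n
    return distribution * usable


def rank_column_sets(rows, column_sets, max_block_size=3):
    """Order column sets from most to least blockable under the toy score."""
    return sorted(
        column_sets,
        key=lambda cs: blockability(rows, cs, max_block_size),
        reverse=True,
    )


rows = [
    {"city": "A", "zip": "1"},
    {"city": "A", "zip": "2"},
    {"city": "A", "zip": "3"},
    {"city": "B", "zip": "4"},
]
ranking = rank_column_sets(rows, [("zip",), ("city",)])
```

In this toy data every `zip` value is unique, so that column set produces only singleton blocks and scores zero, while `city` groups three of the four rows into one usable block and ranks first.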
-
Patent number: 8560505
Abstract: Blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
Type: Grant
Filed: December 7, 2011
Date of Patent: October 15, 2013
Assignee: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Publication number: 20130238610
Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
Type: Application
Filed: March 7, 2012
Publication date: September 12, 2013
Applicant: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Publication number: 20130238611
Abstract: Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
Type: Application
Filed: March 8, 2012
Publication date: September 12, 2013
Applicant: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam
-
Publication number: 20130151487
Abstract: Blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
Type: Application
Filed: December 7, 2011
Publication date: June 13, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: SNIGDHA CHATURVEDI, TANVEER A. FARUQUIE, HIMA P. KARANAM, MARVIN MENDELSSOHN, MUKESH K. MOHANIA, L. VENKATA SUBRAMANIAM
-
Publication number: 20130151490
Abstract: A method of blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
Type: Application
Filed: April 16, 2012
Publication date: June 13, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: SNIGDHA CHATURVEDI, TANVEER A. FARUQUIE, HIMA P. KARANAM, MARVIN MENDELSSOHN, MUKESH K. MOHANIA, L. VENKATA SUBRAMANIAM
-
Publication number: 20120179658
Abstract: According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.
Type: Application
Filed: March 16, 2012
Publication date: July 12, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Mukesh K. Mohania, L. Venkata Subramaniam
-
Publication number: 20120158619
Abstract: Systems, methods, and computer products for optimally managing large rule sets are disclosed. Rule dependencies of rules within a set of rules may be determined as a function of rules execution frequency data generated from applying the rules over a data set. The rules within the set of rules may be clustered into rules clusters based on the determined rule dependencies, in which the rules clusters comprise disjoint subsets of the rules within the set of rules. Cluster frequency data for the rules clusters may be used to arrive at an optimal ordering.
Type: Application
Filed: December 15, 2010
Publication date: June 21, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: MOHAN N. DANI, TANVEER A. FARUQUIE, HIMA P. KARANAM, L.V. SUBRAMANIAM, GIRISH VENKATACHALIAH
-
Publication number: 20120150825
Abstract: According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.
Type: Application
Filed: December 13, 2010
Publication date: June 14, 2012
Applicant: International Business Machines Corporation
Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Mukesh K. Mohania, L. Venkata Subramaniam
-
Publication number: 20110320442
Abstract: Systems and associated methods providing a document corpus navigation interface including domain independent facets are described. Embodiments provide a list of domain independent facets, extract facet values from the document corpus, learn the facet values from the corpus, map each document to one of the values of each of the facets, and automatically determine a weight of the relationship.
Type: Application
Filed: June 25, 2010
Publication date: December 29, 2011
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Tanveer A. Faruquie, Mukesh K. Mohania, Ullas B. Nambiar
-
Publication number: 20110270808
Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.
Type: Application
Filed: April 30, 2010
Publication date: November 3, 2011
Applicant: International Business Machines Corporation
Inventors: Tanveer A. Faruquie, Sachindra Joshi, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, Angel Smith, L. V. Subramaniam, Girish Venkatachaliah
-
Patent number: 8010595
Abstract: A system (30) and method are provided for single-pass execution of dynamic pages across multiple request-response cycles. The system (30) comprises a client (32) and server (34) in communication with one another. A container (35) resides on the server and handles requests made for the result of a dynamic page (36). The container controls the processing of the dynamic page. If the dynamic page requires additional information to continue processing, an intermediate request (44) is transmitted to the client, which responds with an intermediate response (46) containing the additional information. A notifier servlet (38) receives the intermediate response and passes the information to the dynamic page so that execution can resume without interruption.
Type: Grant
Filed: November 29, 2005
Date of Patent: August 30, 2011
Assignee: International Business Machines Corporation
Inventors: Tanveer A Faruquie, Sandeep Jindal, Abhishek Verma
-
Publication number: 20110191781
Abstract: A method, system and a computer program product for determining resources allocation in a distributed computing environment. An embodiment may include identifying resources in a distributed computing environment, computing provisioning parameters, computing configuration parameters and quantifying service parameters in response to a set of service level agreements (SLA). The embodiment may further include iteratively computing a completion time required for completion of the assigned task and a cost. Embodiments may further include computing an optimal resources configuration and computing at least one of an optimal completion time and an optimal cost corresponding to the optimal resources configuration. Embodiments may further include dynamically modifying the optimal resources configuration in response to at least one change in at least one of provisioning parameters, computing parameters and quantifying service parameters.
Type: Application
Filed: January 30, 2010
Publication date: August 4, 2011
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Hima P. Karanam, Tanveer A. Faruquie, L. Venkata Subramaniam, Mukesh K. Mohania, Girish Venkatachaliah
-
Patent number: 7978853
Abstract: Techniques for protecting information in an audio file are provided. The techniques include obtaining an audio file, detecting information bearing one or more segments in a speech signal, wherein the information comprises information sought for protection, encrypting the information sought for protection by scrambling the one or more segments using a scrambling filter, and selectively decrypting an amount of the encrypted information, wherein the amount of the encrypted information to be decrypted depends on user access privilege, and wherein selectively decrypting the amount of the encrypted information protects said amount of the encrypted information. Techniques are also provided for protecting information in an audio file.
Type: Grant
Filed: January 31, 2008
Date of Patent: July 12, 2011
Assignee: International Business Machines Corporation
Inventors: Raghuram Krishnapuram, Nandita J. Mahajan, L. Venkat Subramaniam, Vivek Tyagi, Tanveer A. Faruquie
-
Patent number: 7974411
Abstract: Techniques for protecting information in an audio file are provided. The techniques include obtaining an audio file, detecting information bearing one or more segments in a speech signal, wherein the information comprises information sought for protection, encrypting the information sought for protection by scrambling the one or more segments using a scrambling filter, and selectively decrypting an amount of the encrypted information, wherein the amount of the encrypted information to be decrypted depends on user access privilege, and wherein selectively decrypting the amount of the encrypted information protects said amount of the encrypted information. Techniques are also provided for protecting information in an audio file.
Type: Grant
Filed: January 31, 2008
Date of Patent: July 5, 2011
Assignee: International Business Machines Corporation
Inventors: Raghuram Krishnapuram, Nandita J. Mahajan, L. Venkat Subramaniam, Vivek Tyagi, Tanveer A. Faruquie
-
Patent number: 7904399
Abstract: A method for determining a decision point in real-time for a data stream from a conversation includes receiving streaming conversational data; and determining when to classify the streaming conversational data, using a measure of certainty, by performing certainty calculations at a plurality of time instances during the conversation and by selecting a decision point in response to the certainty calculations, the decision point not being based on a fixed window of conversational data but being based on accumulated conversational data available at different ones of the plurality of time instances. Systems and computer program products are also provided.
Type: Grant
Filed: November 15, 2007
Date of Patent: March 8, 2011
Assignee: International Business Machines Corporation
Inventors: L. Venkata Subramaniam, Ganesh Ramakrishnan, Tanveer A Faruquie
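Illustration (not part of the patent record): the abstract above does not define its measure of certainty, so this is only a rough sketch of the general idea of picking a decision point from accumulated data rather than a fixed window. The keyword-count classifier, the certainty measure (share of evidence held by the leading class), and the `threshold` and `min_evidence` parameters are all hypothetical.

```python
def certainty(scores):
    """Share of the accumulated evidence held by the leading class (assumed measure)."""
    total = sum(scores.values())
    return max(scores.values()) / total if total else 0.0


def find_decision_point(utterances, keywords, threshold=0.8, min_evidence=3):
    """Return (time index, label) at the first instance where certainty is high enough.

    Evidence accumulates over the whole conversation so far; no fixed window is used.
    Returns (None, None) if the stream ends before the threshold is reached.
    """
    scores = {label: 0 for label in keywords}
    for t, utterance in enumerate(utterances):
        for word in utterance.lower().split():
            for label, vocab in keywords.items():
                if word in vocab:
                    scores[label] += 1
        # Certainty is recomputed at every time instance on all data seen so far.
        if sum(scores.values()) >= min_evidence and certainty(scores) >= threshold:
            return t, max(scores, key=scores.get)
    return None, None


keywords = {
    "billing": {"invoice", "charge", "refund"},
    "technical": {"router", "error", "reset"},
}
conversation = [
    "hello i have a question",
    "there is a charge on my invoice",
    "i want a refund for that charge",
    "thanks",
]
decision = find_decision_point(conversation, keywords)
```

With these made-up inputs the second utterance supplies too little evidence, so the decision is deferred until the third utterance, where the accumulated counts favor the `billing` label.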
-
Publication number: 20100332424
Abstract: Techniques for identifying one or more inconsistencies between an unstructured document and a back-end fact-base are provided. The techniques include automatically parsing a query document and comparing the document with a back-end fact-base comprising facts relevant to the document, identifying one or more inconsistencies between information mentioned in the document and the facts stored in the back-end fact-base, and providing a response to the query document, wherein the response additionally includes the one or more identified inconsistencies.
Type: Application
Filed: June 30, 2009
Publication date: December 30, 2010
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Indrajit Bhattacharya, Tanveer A. Faruquie, Shantanu Godbole, Mukesh K. Mohania, Ullas B. Nambiar