Patents by Inventor Sauraj Goswami
Sauraj Goswami has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9355171
Abstract: Documents likely to be near-duplicates are clustered based on document vectors that represent word-occurrence patterns in a relatively low-dimensional space. Edit distance between documents is defined based on comparing their document vectors. In one process, initial clusters are formed by applying a first edit-distance constraint relative to a root document of each cluster. The initial clusters can be merged subject to a second edit-distance constraint that limits the maximum edit distance between any two documents in the cluster. The second edit-distance constraint can be defined such that whether it is satisfied can be determined by comparing cluster structures rather than individual documents.
Type: Grant
Filed: August 27, 2010
Date of Patent: May 31, 2016
Assignee: Hewlett Packard Enterprise Development LP
Inventors: Joy Thomas, Sauraj Goswami, Vamsi Salaka
-
Patent number: 8938384
Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
Type: Grant
Filed: July 16, 2012
Date of Patent: January 20, 2015
Assignee: Stratify, Inc.
Inventor: Sauraj Goswami
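The second embodiment in the abstract above, segmenting a document at transitions between character sets, can be illustrated with a minimal sketch. This is not the patented method: the function names `script_of` and `segment` are hypothetical, the script label is approximated by the first word of each character's Unicode name, and the per-segment language scoring step is omitted entirely.

```python
import unicodedata
from typing import List, Optional, Tuple


def script_of(ch: str) -> Optional[str]:
    """Approximate a character's script by the first word of its Unicode name.

    Assumption for illustration only: non-letters (spaces, digits,
    punctuation) are treated as neutral and return None.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def segment(text: str) -> List[Tuple[Optional[str], str]]:
    """Split text into (script, substring) runs at script transitions."""
    segments: List[Tuple[Optional[str], str]] = []
    current: Optional[str] = None
    buf: List[str] = []
    for ch in text:
        s = script_of(ch)
        # A letter from a different script closes the current run.
        if s is not None and current is not None and s != current:
            segments.append((current, "".join(buf)))
            buf = []
        if s is not None:
            current = s
        buf.append(ch)  # neutral characters stay with the current run
    if buf:
        segments.append((current, "".join(buf)))
    return segments
```

Each resulting run could then be scored against only those candidate languages that use its script, which is the role the abstract assigns to the segment scores.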
-
Publication number: 20130191111
Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
Type: Application
Filed: July 16, 2012
Publication date: July 25, 2013
Applicant: Stratify, Inc.
Inventor: Sauraj GOSWAMI
-
Patent number: 8224641
Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
Type: Grant
Filed: November 19, 2008
Date of Patent: July 17, 2012
Assignee: Stratify, Inc.
Inventor: Sauraj Goswami
-
Patent number: 8224642
Abstract: An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
Type: Grant
Filed: November 20, 2008
Date of Patent: July 17, 2012
Assignee: Stratify, Inc.
Inventor: Sauraj Goswami
-
Patent number: 8086597
Abstract: A query of at least one mark-up language document has a path expression comprising a conjunction, a first filter and a second filter. The first filter has a first probe. The second filter has a second probe. The first and second filters form a between filter having start and stop values specified by the first and second probes. A plan to process the query is generated based on, at least in part, a range defined by the start and stop values. An index of mark-up language documents is defined by another path expression; the index comprises values of mark-up language documents that satisfy the other path expression; the values are key values of the index. The plan is to perform a single scan of the key values from the start value to the stop value to identify at least one key value that satisfies the between filter.
Type: Grant
Filed: June 28, 2007
Date of Patent: December 27, 2011
Assignee: International Business Machines Corporation
Inventors: Andrey Balmin, Sauraj Goswami
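The idea of folding two conjoined filters into one between filter and answering it with a single index scan can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the patented planner: the index is modeled as a sorted Python list of key values, only inclusive `>=` and `<=` filters are handled, and the names `to_between` and `between_scan` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def to_between(filters: Sequence[Tuple[str, int]]) -> Tuple[int, int]:
    """Combine conjoined '>=' and '<=' filters into one (start, stop) range."""
    starts = [v for op, v in filters if op == ">="]
    stops = [v for op, v in filters if op == "<="]
    if not starts or not stops:
        raise ValueError("need at least one '>=' and one '<=' filter")
    # Under a conjunction, the tightest bounds win.
    return max(starts), min(stops)


def between_scan(sorted_keys: List[int], start: int, stop: int) -> List[int]:
    """Scan the sorted key values once, from start to stop inclusive."""
    lo = bisect_left(sorted_keys, start)   # first key >= start
    hi = bisect_right(sorted_keys, stop)   # one past the last key <= stop
    return sorted_keys[lo:hi]
```

For example, with keys `[3, 8, 15, 15, 22, 40]` and the filters `>= 8` and `<= 22`, one pass over the slice replaces two separate probes whose results would otherwise have to be intersected.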
-
Patent number: 7970756
Abstract: A system for executing a query on data that has been partitioned into a plurality of partitions is provided. The system includes providing partitioned data including one or more columns and the plurality of partitions. The partitioned data includes a limit key value associated with each column for a given partition. The system further includes receiving a query including a predicate on one of the one or more columns of the partitioned data; and utilizing the predicate on the one of the one or more columns in a pruning decision on at least one of the one or more partitions based on the limit key values associated with the plurality of partitions.
Type: Grant
Filed: November 10, 2008
Date of Patent: June 28, 2011
Assignee: International Business Machines Corporation
Inventors: Thomas Abel Beavin, Sauraj Goswami, Terence Patrick Purcell
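A pruning decision driven by limit key values might look like the sketch below. It is a simplified illustration rather than the claimed system: it assumes a single integer partitioning column, treats each limit key as the inclusive upper bound of its partition, and uses the hypothetical names `Partition` and `prune`. The `<` and `<=` cases are deliberately conservative, so a partition may be kept unnecessarily but is never dropped wrongly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partition:
    name: str
    limit_key: int  # assumed: inclusive upper bound of the partitioning column


def prune(partitions: List[Partition], op: str, value: int) -> List[str]:
    """Return names of partitions that may contain rows with `column op value`."""
    survivors: List[str] = []
    lower: Optional[int] = None  # previous limit key, an exclusive lower bound
    for p in sorted(partitions, key=lambda q: q.limit_key):
        # Partition p holds keys k with lower < k <= p.limit_key.
        if op == "=":
            keep = (lower is None or value > lower) and value <= p.limit_key
        elif op in ("<", "<="):
            keep = lower is None or lower < value
        elif op == ">":
            keep = value < p.limit_key
        elif op == ">=":
            keep = value <= p.limit_key
        else:
            raise ValueError(f"unsupported operator: {op}")
        if keep:
            survivors.append(p.name)
        lower = p.limit_key
    return survivors
```

With limit keys 100, 200 and 300, a predicate such as `column = 150` leaves only the second partition to scan.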
-
Publication number: 20110087668
Abstract: Documents likely to be near-duplicates are clustered based on document vectors that represent word-occurrence patterns in a relatively low-dimensional space. Edit distance between documents is defined based on comparing their document vectors. In one process, initial clusters are formed by applying a first edit-distance constraint relative to a root document of each cluster. The initial clusters can be merged subject to a second edit-distance constraint that limits the maximum edit distance between any two documents in the cluster. The second edit-distance constraint can be defined such that whether it is satisfied can be determined by comparing cluster structures rather than individual documents.
Type: Application
Filed: August 27, 2010
Publication date: April 14, 2011
Applicant: Stratify, Inc.
Inventors: Joy Thomas, Sauraj Goswami, Vamsi Salaka
-
Patent number: 7895189
Abstract: Various embodiments of a computer-implemented method, computer program product, and data processing system are provided that generate an index plan that produces a superset of data comprising the query result. In some embodiments, a computer-implemented method, computer program product, and data processing system produce a maximal-index-satisfiable query tree.
Type: Grant
Filed: June 28, 2007
Date of Patent: February 22, 2011
Assignee: International Business Machines Corporation
Inventors: Andrey Balmin, Sauraj Goswami
-
Patent number: 7840774
Abstract: Various embodiments of a computer-implemented method, system and computer program product maintain a logical page having a predetermined size. Data is added to an uncompressed area of the logical page. The uncompressed area of the logical page is associated with an uncompressed area of a physical page. The logical page also has a compressed area associated with a compressed area of a physical page. In response to exhausting the uncompressed area, data in the uncompressed area is included in the compressed area. The uncompressed area is adjusted.
Type: Grant
Filed: September 9, 2005
Date of Patent: November 23, 2010
Assignee: International Business Machines Corporation
Inventors: Jeffrey Allen Berger, You-Chin Fuh, Sauraj Goswami, Balakrishna Raghavendra Iyer, Michael R. Shadduck, James Zu-Chia Teng, Stephen Walter Turnbaugh
-
Patent number: 7783855
Abstract: Various embodiments of a computer-implemented method, system and computer program product are provided. A first plurality of key entries of a first index page are compressed in accordance with an order specified by a first keymap of the first index page. The first keymap also indicates respective positions of the key entries of the first plurality of key entries. A second keymap is generated indicating the order and also indicating respective post-compression positions of the key entries of the first plurality of key entries. The compressed first plurality of key entries is stored on a second index page with the second keymap.
Type: Grant
Filed: December 22, 2006
Date of Patent: August 24, 2010
Assignee: International Business Machines Corporation
Inventors: Sauraj Goswami, You-Chin Fuh, Michael R. Shadduck, James Zu-Chia Teng
-
Publication number: 20100125448
Abstract: An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
Type: Application
Filed: November 20, 2008
Publication date: May 20, 2010
Applicant: Stratify, Inc.
Inventor: Sauraj Goswami
-
Publication number: 20100125447
Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
Type: Application
Filed: November 19, 2008
Publication date: May 20, 2010
Applicant: Stratify, Inc.
Inventor: Sauraj Goswami
-
Patent number: 7650352
Abstract: A partial index availability system places, in a restricted state, all pages in the index associated with a structure modification, when an error occurs in processing a log of the said structure modification. This maintains traversability of the rest of the index that is not in restricted state. The system locates and marks a left sentinel and a right sentinel associated with a non-leaf page that is in a restricted state preventing an undo of a transaction. The sentinels prevent a transaction from accessing an uncommitted change associated with the non-leaf page. After a recovery procedure is run the entire index is made available. During the period between the placement of the index pages in LPL or rebuild pending to the time of final removal of these pages from their restrictive states as a result of a recovery procedure being run, the users are given access to the non-restricted portion of the index.
Type: Grant
Filed: March 23, 2006
Date of Patent: January 19, 2010
Assignee: International Business Machines Corporation
Inventors: You-Chin Fuh, Sauraj Goswami, Jeffrey William Josten, James Zu-Chia Teng
-
Publication number: 20090070303
Abstract: A system for executing a query on data that has been partitioned into a plurality of partitions is provided. The system includes providing partitioned data including one or more columns and the plurality of partitions. The partitioned data includes a limit key value associated with each column for a given partition. The system further includes receiving a query including a predicate on one of the one or more columns of the partitioned data; and utilizing the predicate on the one of the one or more columns in a pruning decision on at least one of the one or more partitions based on the limit key values associated with the plurality of partitions.
Type: Application
Filed: November 10, 2008
Publication date: March 12, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: THOMAS ABEL BEAVIN, SAURAJ GOSWAMI, TERENCE PATRICK PURCELL
-
Publication number: 20090006447
Abstract: Various embodiments of a computer-implemented method, computer program product, and data processing system are provided that identify a range filter in a mark-up language query. In response to receiving a query of at least one mark-up language document, the query comprising a plurality of singleton filters, at least one group of the plurality of singleton filters is identified. Each group comprises at least two singleton filters, wherein each group is equivalent to a range filter having a start value and a stop value. The start value and stop value are based on at least two singleton filters of each group. A query plan is generated to process the query based on, at least in part, a range defined by the start value and the stop value of the at least two singleton filters of each group.
Type: Application
Filed: June 28, 2007
Publication date: January 1, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Andrey Balmin, Sauraj Goswami
-
Publication number: 20090006314
Abstract: Various embodiments of a computer-implemented method, computer program product, and data processing system are provided that generate an index plan that produces a superset of data comprising the query result. In some embodiments, a computer-implemented method, computer program product, and data processing system produce a maximal-index-satisfiable query tree.
Type: Application
Filed: June 28, 2007
Publication date: January 1, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Andrey Balmin, Sauraj Goswami
-
Patent number: 7461060
Abstract: Methods for executing a query on data that has been partitioned into a plurality of partitions are provided. The method includes providing partitioned data including one or more columns and the plurality of partitions. The partitioned data includes a limit key value associated with each column for a given partition. The method further includes receiving a query including a predicate on one of the one or more columns of the partitioned data; and utilizing the predicate on the one of the one or more columns in a pruning decision on at least one of the one or more partitions based on the limit key values associated with the plurality of partitions.
Type: Grant
Filed: October 4, 2005
Date of Patent: December 2, 2008
Assignee: International Business Machines Corporation
Inventors: Thomas Abel Beavin, Sauraj Goswami, Terence Patrick Purcell
-
Publication number: 20080294604
Abstract: A method for estimating a selectivity of a join predicate in an XQuery expression is provided. The method provides for determining a first sequence size of a first sequence in the join predicate, determining a second sequence size of a second sequence in the join predicate, determining a type of comparison operator used between the first sequence and the second sequence, estimating the selectivity of the join predicate based on the first sequence size, the second sequence size, and the type of comparison operator used, selecting an execution plan for the XQuery expression based on the selectivity of the join predicate estimated, and executing the XQuery expression using the execution plan selected.
Type: Application
Filed: May 25, 2007
Publication date: November 27, 2008
Applicant: International Business Machines
Inventor: Sauraj GOSWAMI
-
Publication number: 20070226235
Abstract: A partial index availability system places, in a restricted state, all pages in the index associated with a structure modification, when an error occurs in processing a log of the said structure modification. This maintains traversability of the rest of the index that is not in restricted state. The system locates and marks a left sentinel and a right sentinel associated with a non-leaf page that is in a restricted state preventing an undo of a transaction. The sentinels prevent a transaction from accessing an uncommitted change associated with the non-leaf page. After a recovery procedure is run the entire index is made available. During the period between the placement of the index pages in LPL or rebuild pending to the time of final removal of these pages from their restrictive states as a result of a recovery procedure being run, the users are given access to the non-restricted portion of the index.
Type: Application
Filed: March 23, 2006
Publication date: September 27, 2007
Applicant: International Business Machines Corporation
Inventors: You-Chin Fuh, Sauraj Goswami, Jeffrey Josten, James Teng