Patents by Inventor Sauraj Goswami

Sauraj Goswami has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9355171
    Abstract: Documents likely to be near-duplicates are clustered based on document vectors that represent word-occurrence patterns in a relatively low-dimensional space. Edit distance between documents is defined based on comparing their document vectors. In one process, initial clusters are formed by applying a first edit-distance constraint relative to a root document of each cluster. The initial clusters can be merged subject to a second edit-distance constraint that limits the maximum edit distance between any two documents in the cluster. The second edit-distance constraint can be defined such that whether it is satisfied can be determined by comparing cluster structures rather than individual documents.
    Type: Grant
    Filed: August 27, 2010
    Date of Patent: May 31, 2016
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Joy Thomas, Sauraj Goswami, Vamsi Salaka
  • Patent number: 8938384
    Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
    Type: Grant
    Filed: July 16, 2012
    Date of Patent: January 20, 2015
    Assignee: Stratify, Inc.
    Inventor: Sauraj Goswami
  • Publication number: 20130191111
    Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
    Type: Application
    Filed: July 16, 2012
    Publication date: July 25, 2013
    Applicant: Stratify, Inc.
    Inventor: Sauraj Goswami
  • Patent number: 8224641
    Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
    Type: Grant
    Filed: November 19, 2008
    Date of Patent: July 17, 2012
    Assignee: Stratify, Inc.
    Inventor: Sauraj Goswami
  • Patent number: 8224642
    Abstract: An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
    Type: Grant
    Filed: November 20, 2008
    Date of Patent: July 17, 2012
    Assignee: Stratify, Inc.
    Inventor: Sauraj Goswami
  • Patent number: 8086597
    Abstract: A query of at least one mark-up language document has a path expression comprising a conjunction, a first filter and a second filter. The first filter has a first probe. The second filter has a second probe. The first and second filters form a between filter having start and stop values specified by the first and second probes. A plan to process the query is generated based on, at least in part, a range defined by the start and stop values. An index of mark-up language documents is defined by another path expression; the index comprises values of mark-up language documents that satisfy the other path expression; the values are key values of the index. The plan is to perform a single scan of the key values from the start value to the stop value to identify at least one key value that satisfies the between filter.
    Type: Grant
    Filed: June 28, 2007
    Date of Patent: December 27, 2011
    Assignee: International Business Machines Corporation
    Inventors: Andrey Balmin, Sauraj Goswami
  • Patent number: 7970756
    Abstract: A system for executing a query on data that has been partitioned into a plurality of partitions is provided. The system includes providing partitioned data including one or more columns and the plurality of partitions. The partitioned data includes a limit key value associated with each column for a given partition. The system further includes receiving a query including a predicate on one of the one or more columns of the partitioned data; and utilizing the predicate on the one of the one or more columns in a pruning decision on at least one of the one or more partitions based on the limit key values associated with the plurality of partitions.
    Type: Grant
    Filed: November 10, 2008
    Date of Patent: June 28, 2011
    Assignee: International Business Machines Corporation
    Inventors: Thomas Abel Beavin, Sauraj Goswami, Terence Patrick Purcell
  • Publication number: 20110087668
    Abstract: Documents likely to be near-duplicates are clustered based on document vectors that represent word-occurrence patterns in a relatively low-dimensional space. Edit distance between documents is defined based on comparing their document vectors. In one process, initial clusters are formed by applying a first edit-distance constraint relative to a root document of each cluster. The initial clusters can be merged subject to a second edit-distance constraint that limits the maximum edit distance between any two documents in the cluster. The second edit-distance constraint can be defined such that whether it is satisfied can be determined by comparing cluster structures rather than individual documents.
    Type: Application
    Filed: August 27, 2010
    Publication date: April 14, 2011
    Applicant: Stratify, Inc.
    Inventors: Joy Thomas, Sauraj Goswami, Vamsi Salaka
  • Patent number: 7895189
    Abstract: Various embodiments of a computer-implemented method, computer program product, and data processing system are provided that generate an index plan that produces a superset of data comprising the query result. In some embodiments, a computer-implemented method, computer program product, and data processing system produce a maximal-index-satisfiable query tree.
    Type: Grant
    Filed: June 28, 2007
    Date of Patent: February 22, 2011
    Assignee: International Business Machines Corporation
    Inventors: Andrey Balmin, Sauraj Goswami
  • Patent number: 7840774
    Abstract: Various embodiments of a computer-implemented method, system and computer program product maintain a logical page having a predetermined size. Data is added to an uncompressed area of the logical page. The uncompressed area of the logical page is associated with an uncompressed area of a physical page. The logical page also has a compressed area associated with a compressed area of a physical page. In response to exhausting the uncompressed area, data in the uncompressed area is included in the compressed area. The uncompressed area is adjusted.
    Type: Grant
    Filed: September 9, 2005
    Date of Patent: November 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: Jeffrey Allen Berger, You-Chin Fuh, Sauraj Goswami, Balakrishna Raghavendra Iyer, Michael R. Shadduck, James Zu-Chia Teng, Stephen Walter Turnbaugh
  • Patent number: 7783855
    Abstract: Various embodiments of a computer-implemented method, system and computer program product are provided. A first plurality of key entries of a first index page are compressed in accordance with an order specified by a first keymap of the first index page. The first keymap also indicates respective positions of the key entries of the first plurality of key entries. A second keymap is generated indicating the order and also indicating respective post-compression positions of the key entries of the first plurality of key entries. The compressed first plurality of key entries is stored on a second index page with the second keymap.
    Type: Grant
    Filed: December 22, 2006
    Date of Patent: August 24, 2010
    Assignee: International Business Machines Corporation
    Inventors: Sauraj Goswami, You-Chin Fuh, Michael R. Shadduck, James Zu-Chia Teng
  • Publication number: 20100125448
    Abstract: An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
    Type: Application
    Filed: November 20, 2008
    Publication date: May 20, 2010
    Applicant: Stratify, Inc.
    Inventor: Sauraj Goswami
  • Publication number: 20100125447
    Abstract: Multiple nonoverlapping languages within a single document can be identified. In one embodiment, for each of a set of candidate languages, a set of non-overlapping languages is defined. The document is analyzed under the hypothesis that the whole document is in one language and that part of the document is in one language while the rest is in a different, non-overlapping language. Language(s) of the document are identified based on comparing these competing hypotheses across a number of language pairs. In another embodiment, transitions between non-overlapping character sets are used to segment a document, and each segment is scored separately for a subset of candidate languages. Language(s) of the document are identified based on the segment scores.
    Type: Application
    Filed: November 19, 2008
    Publication date: May 20, 2010
    Applicant: Stratify, Inc.
    Inventor: Sauraj Goswami
  • Patent number: 7650352
    Abstract: A partial index availability system places, in a restricted state, all pages in the index associated with a structure modification, when an error occurs in processing a log of the said structure modification. This maintains traversability of the rest of the index that is not in restricted state. The system locates and marks a left sentinel and a right sentinel associated with a non-leaf page that is in a restricted state preventing an undo of a transaction. The sentinels prevent a transaction from accessing an uncommitted change associated with the non-leaf page. After a recovery procedure is run the entire index is made available. During the period between the placement of the index pages in LPL or rebuild pending to the time of final removal of these pages from their restrictive states as a result of a recovery procedure being run, the users are given access to the non-restricted portion of the index.
    Type: Grant
    Filed: March 23, 2006
    Date of Patent: January 19, 2010
    Assignee: International Business Machines Corporation
    Inventors: You-Chin Fuh, Sauraj Goswami, Jeffrey William Josten, James Zu-Chia Teng
  • Publication number: 20090070303
    Abstract: A system for executing a query on data that has been partitioned into a plurality of partitions is provided. The system includes providing partitioned data including one or more columns and the plurality of partitions. The partitioned data includes a limit key value associated with each column for a given partition. The system further includes receiving a query including a predicate on one of the one or more columns of the partitioned data; and utilizing the predicate on the one of the one or more columns in a pruning decision on at least one of the one or more partitions based on the limit key values associated with the plurality of partitions.
    Type: Application
    Filed: November 10, 2008
    Publication date: March 12, 2009
    Applicant: International Business Machines Corporation
    Inventors: Thomas Abel Beavin, Sauraj Goswami, Terence Patrick Purcell
  • Publication number: 20090006447
    Abstract: Various embodiments of a computer-implemented method, computer program product, and data processing system are provided that identify a range filter in a mark-up language query. In response to receiving a query of at least one mark-up language document, the query comprising a plurality of singleton filters, at least one group of the plurality of singleton filters is identified. Each group comprises at least two singleton filters, wherein each group is equivalent to a range filter having a start value and a stop value. The start value and stop value are based on at least two singleton filters of each group. A query plan is generated to process the query based on, at least in part, a range defined by the start value and the stop value of the at least two singleton filters of each group.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: International Business Machines Corporation
    Inventors: Andrey Balmin, Sauraj Goswami
  • Publication number: 20090006314
    Abstract: Various embodiments of a computer-implemented method, computer program product, and data processing system are provided that generate an index plan that produces a superset of data comprising the query result. In some embodiments, a computer-implemented method, computer program product, and data processing system produce a maximal-index-satisfiable query tree.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Applicant: International Business Machines Corporation
    Inventors: Andrey Balmin, Sauraj Goswami
  • Patent number: 7461060
    Abstract: Methods for executing a query on data that has been partitioned into a plurality of partitions are provided. The method includes providing partitioned data including one or more columns and the plurality of partitions. The partitioned data includes a limit key value associated with each column for a given partition. The method further includes receiving a query including a predicate on one of the one or more columns of the partitioned data; and utilizing the predicate on the one of the one or more columns in a pruning decision on at least one of the one or more partitions based on the limit key values associated with the plurality of partitions.
    Type: Grant
    Filed: October 4, 2005
    Date of Patent: December 2, 2008
    Assignee: International Business Machines Corporation
    Inventors: Thomas Abel Beavin, Sauraj Goswami, Terence Patrick Purcell
  • Publication number: 20080294604
    Abstract: A method for estimating a selectivity of a join predicate in an XQuery expression is provided. The method provides for determining a first sequence size of a first sequence in the join predicate, determining a second sequence size of a second sequence in the join predicate, determining a type of comparison operator used between the first sequence and the second sequence, estimating the selectivity of the join predicate based on the first sequence size, the second sequence size, and the type of comparison operator used, selecting an execution plan for the XQuery expression based on the selectivity of the join predicate estimated, and executing the XQuery expression using the execution plan selected.
    Type: Application
    Filed: May 25, 2007
    Publication date: November 27, 2008
    Applicant: International Business Machines
    Inventor: Sauraj Goswami
  • Publication number: 20070226235
    Abstract: A partial index availability system places, in a restricted state, all pages in the index associated with a structure modification, when an error occurs in processing a log of the said structure modification. This maintains traversability of the rest of the index that is not in restricted state. The system locates and marks a left sentinel and a right sentinel associated with a non-leaf page that is in a restricted state preventing an undo of a transaction. The sentinels prevent a transaction from accessing an uncommitted change associated with the non-leaf page. After a recovery procedure is run the entire index is made available. During the period between the placement of the index pages in LPL or rebuild pending to the time of final removal of these pages from their restrictive states as a result of a recovery procedure being run, the users are given access to the non-restricted portion of the index.
    Type: Application
    Filed: March 23, 2006
    Publication date: September 27, 2007
    Applicant: International Business Machines Corporation
    Inventors: You-Chin Fuh, Sauraj Goswami, Jeffrey Josten, James Teng
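
Illustrative sketch: the partition-pruning idea described in the abstracts for patents 7461060 and 7970756 above (using per-partition limit key values to decide which partitions a predicate can touch) might look roughly like the following. This is not the patented method, only a minimal illustration under stated assumptions: a single integer column, partitions sorted by ascending limit key, each limit key treated as the inclusive upper bound of its partition, and a range predicate `low <= key <= high`. The names `Partition` and `prune_partitions` are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    limit_key: int  # assumed: inclusive upper bound of the keys in this partition


def prune_partitions(partitions, low, high):
    """Return the partitions that may hold keys in the range [low, high].

    Assumes `partitions` is sorted by ascending limit_key, so partition i
    holds keys in (limit_key[i-1], limit_key[i]].
    """
    limits = [p.limit_key for p in partitions]
    first = bisect_left(limits, low)   # first partition whose limit is >= low
    last = bisect_left(limits, high)   # partition that would hold `high`
    return partitions[first:last + 1]  # empty when the range matches nothing


# Hypothetical data: four partitions with limit keys 100, 200, 300, 400.
parts = [Partition("P1", 100), Partition("P2", 200),
         Partition("P3", 300), Partition("P4", 400)]

# Predicate: key BETWEEN 150 AND 250 only needs P2 and P3.
print([p.name for p in prune_partitions(parts, 150, 250)])  # ['P2', 'P3']
```

Because the limit keys are sorted, the pruning decision in this sketch costs two binary searches rather than a check of every partition.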