Patents by Inventor Eugene J. Shekita

Eugene J. Shekita has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140006380
    Abstract: Embodiments of the present invention provide a database processing system for efficient partitioning of a database table with column-major layout for executing one or more join operations. One embodiment comprises a method for partitioning a database table with column-major layout, partitioning only the join-columns by limiting the partitions by size and number, executing one or more join operations for joining the partitioned columns, and optionally de-partitioning the join result to the original order by sequentially writing and randomly reading table values using P cursors.
    Type: Application
    Filed: August 24, 2012
    Publication date: January 2, 2014
    Applicant: International Business Machines Corporation
    Inventors: Stefan ARNDT, Gopi K. Attaluri, Ronald J. Barber, Guy M. Lohman, Lin Qiao, Vijayshankar Raman, Eugene J. Shekita, Richard S. Sidle
  • Publication number: 20140006379
    Abstract: Embodiments of the present invention provide a database processing system for efficient partitioning of a database table with column-major layout for executing one or more join operations. One embodiment comprises a method for partitioning a database table with column-major layout, partitioning only the join-columns by limiting the partitions by size and number, executing one or more join operations for joining the partitioned columns, and optionally de-partitioning the join result to the original order by sequentially writing and randomly reading table values using P cursors.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Stefan Arndt, Gopi K. Attaluri, Ronald J. Barber, Guy M. Lohman, Lin Qiao, Vijayshankar Raman, Eugene J. Shekita, Richard S. Sidle
  • Publication number: 20130325901
    Abstract: A method for storing database information, including: storing a table having data values in a column major order, wherein the data values are stored in a list of blocks, assigning a tuple sequence number (TSN) to each data value in each column of the table according to a sequence order in the table, wherein data values that correspond to each other across a plurality of columns of the table have equivalent TSNs; assigning each data value to a partition based on a representation of the data value; and assigning a tuple map value to each data value, wherein the tuple map value identifies the partition in which each data value is located.
    Type: Application
    Filed: August 30, 2012
    Publication date: December 5, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ronald J. Barber, Min-Soo Kim, Sam S. Lightstone, Guy M. Lohman, Lin Qiao, Vijayshankar Raman, Eugene J. Shekita, Richard S. Sidle
  • Publication number: 20130325900
    Abstract: A method for storing database information includes storing a table having data values in a column major order. The data values are stored in a list of blocks. The method also includes assigning a tuple sequence number (TSN) to each data value in each column of the table according to a sequence order in the table. The data values that correspond to each other across a plurality of columns of the table have equivalent TSNs. The method also includes assigning each data value to a partition based on a representation of the data value. The method also includes assigning a tuple map value to each data value. The tuple map value identifies the partition in which each data value is located.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 5, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ronald J. Barber, Min-Soo Kim, Sam S. Lightstone, Guy M. Lohman, Lin Qiao, Vijayshankar Raman, Eugene J. Shekita, Richard S. Sidle
  • Publication number: 20120323551
    Abstract: Systems and associated methods for highly parallel processing of parameterized simulations are described. Embodiments permit processing of stochastic data-intensive simulations in a highly parallel fashion in order to distribute the intensive workload. Embodiments utilize methods of seeding records in a database with a source of pseudo-random numbers, such as a compressed seed for a pseudo-random number generator, such that seeded records may be processed independently in a highly parallel fashion. Thus, embodiments provide systems and associated methods facilitating quicker data-intensive simulation by enabling highly parallel asynchronous simulations.
    Type: Application
    Filed: August 27, 2012
    Publication date: December 20, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kevin S. Beyer, Vuk Ercegovac, Peter Haas, Eugene J. Shekita, Fei Xu
  • Publication number: 20120323919
    Abstract: Embodiments of the invention relate to building a distributed reverse semantic index. In one general embodiment a plurality of documents are received with each document having at least one defined rule and or semantic. The documents are distributed among a plurality of nodes of a system. The documents are processed in a generally parallel fashion. Processing the documents includes processing text data of each of the document and breaking each document into fields to index the text data to create index data by deferring how to categorize the text data based upon the defined rule and or semantics. The indexed data is combined back together to create an indexer-agnostic semantic index including a plurality of the semantic index shards and to semantically classify the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
    Type: Application
    Filed: August 27, 2012
    Publication date: December 20, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Eugene J. Shekita, Asim V. Singh, Yuanyuan Tian, Kevin B. Wang
  • Publication number: 20120271795
    Abstract: A method for updating a scalable row-store, including: receiving an update to a key within a range of keys in a database table, wherein the database table is distributed across nodes in a cluster of computing devices; and replicating the update over a group of the nodes using a consensus-based replication algorithm, wherein the replication algorithm includes completing the update in response to receiving acknowledgement messages from a majority of the nodes in the group indicating that the majority has received notification of the update.
    Type: Application
    Filed: April 21, 2011
    Publication date: October 25, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jun Rao, Eugene J. Shekita, Sandeep Tata
  • Patent number: 8296304
    Abstract: Disclosed is a method, system, and program for handling redirects in documents. At least one equivalence class that includes documents that are connected through a redirect. Cycles for each equivalence class are detected, wherein documents in a cycle are marked so that they are not indexed. Incomplete chains for each equivalence class are detected, wherein documents in an incomplete chain are marked so that they are not indexed. A representative for each equivalence class is selected.
    Type: Grant
    Filed: January 26, 2004
    Date of Patent: October 23, 2012
    Assignee: International Business Machines Corporation
    Inventors: Marcus F. Fontoura, Andreas Neumann, Runping Qi, Eugene J. Shekita
  • Publication number: 20120254089
    Abstract: Embodiments of the invention relate to building a distributed reverse semantic index. In one general embodiment a plurality of documents are received with each document having at least one defined rule and or semantic. The documents are distributed among a plurality of nodes of a system. The documents are processed in a generally parallel fashion. Processing the documents includes processing text data of each of the document and breaking each document into fields to index the text data to create index data by deferring how to categorize the text data based upon the defined rule and or semantics. The indexed data is combined back together to create an indexer-agnostic semantic index including a plurality of the semantic index shards and to semantically classify the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
    Type: Application
    Filed: March 31, 2011
    Publication date: October 4, 2012
    Applicant: International Business Machines Corporation
    Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Eugene J. Shekita, Asim V. Singh, Yuanyuan Tian, Kevin B. Wang
  • Patent number: 8260784
    Abstract: Disclosed is a method of encoding JavaScript Object Notation (JSON) documents in an inverted index, wherein a tree representation of a JSON document is first generated, and, next, the JSON document is shredded into a list of <value, path, type, jdewey> tuples for each atom node, n, in the tree, where value is a label associated with n, path is a concatenation of node labels associated with ancestors of n, type is a description of a type of value, and jdewey of n is a partial Dewey code of its closest ancestor array node, if one exists, or empty, otherwise. Lastly, an inverted index is built using <path, type, value> as index term, and jdewey as payload. A method is also described to search the inverted index.
    Type: Grant
    Filed: February 13, 2009
    Date of Patent: September 4, 2012
    Assignee: International Business Machines Corporation
    Inventors: Kevin Scott Beyer, Jun Rao, Eugene J Shekita
  • Publication number: 20120059823
    Abstract: Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the one or more physical partitions to a node in a cluster of nodes; for each received document, assigning an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, determining a cut-off of assignment of new documents to a current virtual-index-epoch comprising a first set of physical partitions and placing the new documents into a new virtual-index-epoch comprising a second set of physical partitions by inserting each new document to a specific one of the physical partitions in the second set using one or more functions that direct the placement based on one of the assigned-doc-id, a field value derived from a set of fields obtained from the document, and a combination of the assigned-doc-id and the field value.
    Type: Application
    Filed: September 3, 2010
    Publication date: March 8, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ronald J. Barber, Harish Deshmukh, Ning Li, Bruce G. Lindsay, Sridhar Rajagopalan, Roger C. Raphael, Eugene J. Shekita, Paul S. Taylor
  • Publication number: 20110320184
    Abstract: Systems and associated methods for highly parallel processing of parameterized simulations are described. Embodiments permit processing of stochastic data-intensive simulations in a highly parallel fashion in order to distribute the intensive workload. Embodiments utilize methods of seeding records in a database with a source of pseudo-random numbers, such as a compressed seed for a pseudo-random number generator, such that seeded records may be processed independently in a highly parallel fashion. Thus, embodiments provide systems and associated methods facilitating quicker data-intensive simulation by enabling highly parallel asynchronous simulations.
    Type: Application
    Filed: June 29, 2010
    Publication date: December 29, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kevin S. Beyer, Vuk Ercegovac, Peter Haas, Eugene J. Shekita, Fei Xu
  • Patent number: 8032532
    Abstract: A method and system for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.
    Type: Grant
    Filed: May 21, 2008
    Date of Patent: October 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: Andrei Z. Broder, Nadav Eiron, Felipe Marcus Fontoura, Ronny Lempel, Ning Li, John Ai McPherson, Jr., Andreas Neumann, Shila Ofek-Koifman, Runping Qi, Eugene J. Shekita
  • Patent number: 7991771
    Abstract: Disclosed is an evaluation technique for text search with black-box scoring functions, where it is unnecessary for the evaluation engine to maintain details of the scoring function. Included is a description of a system for dealing with blackbox searching, proofs of correctness, as well experimental evidence showing that the performance of the technique is comparable in efficiency to those techniques used in custom-built engines.
    Type: Grant
    Filed: November 21, 2006
    Date of Patent: August 2, 2011
    Assignee: International Business Machines Corporation
    Inventors: Kevin Scott Beyer, Robert W. Lyle, Sridhar Rajagopalan, Eugene J. Shekita
  • Publication number: 20110184933
    Abstract: According to one embodiment of the present invention, a method for processing join predicates in full-text indexes is provided. The method includes evaluating local predicates of an outer full text index to generate a first posting list of documents. For each document in the first posting list, the value of a join attribute is determined and an inner full text index is probed to obtain a second posting list of documents containing one of the join attributes determined for each document. Local predicates of an inner full text index are evaluated to generate a third posting list of documents, and the second posting list is merged with the third posting list to generate a merge list of documents. Documents in the first posting list may be paired up with documents in the merge list.
    Type: Application
    Filed: January 28, 2010
    Publication date: July 28, 2011
    Applicant: International Business Machines Corporation
    Inventors: Latha Sankar Colby, Quanzhong Li, Fatma Ozcan, Mir Hamid Pirahesh, Eugene J. Shekita, Zografoula Vagena
  • Publication number: 20110113010
    Abstract: Systems, methods and articles of manufacture are disclosed for synchronizing a primary data system with an auxiliary data system that processes data for the primary data system. In one embodiment, how current the primary data system and the auxiliary data system are may be determined. Requests sent from the primary data system that were not processed by the auxiliary data system may be determined. The requests may be resent to the auxiliary data system for processing.
    Type: Application
    Filed: November 11, 2009
    Publication date: May 12, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ronald J. Barber, Harish Deshmukh, Ning Li, Bruce G. Lindsay, Sridhar Rajagopalan, Roger C. Raphael, Eugene J. Shekita
  • Patent number: 7783626
    Abstract: Provided is a technique for building an index. A new indexi+1 is built and an anchor text tablei+1 and a duplicates tablei+1 are output using a storei, a delta store, and previously generated global analysis computationsi, wherein the previously generated global analysis computationsi include an anchor text tablei, a rank tablei, and a duplicates tablei. New global analysis computationsi+1 are generated using the anchor text tablei+1, the duplicates tablei+1, and the previously generated global analysis computationsi.
    Type: Grant
    Filed: August 17, 2007
    Date of Patent: August 24, 2010
    Assignee: International Business Machines Corporation
    Inventors: Marcus Felipe Fontoura, Reiner Kraft, Tony Kai-Chi Leung, John A. McPherson, Jr., Andreas Neumann, Runping Qi, Sridhar Rajagopalan, Eugene J. Shekita, Jason Yeong Zien
  • Publication number: 20100211572
    Abstract: Disclosed is a method of encoding JavaScript Object Notation (JSON) documents in an inverted index, wherein a tree representation of a JSON document is first generated, and, next, the JSON document is shredded into a list of <value, path, type, jdewey> tuples for each atom node, n, in the tree, where value is a label associated with n, path is a concatenation of node labels associated with ancestors of n, type is a description of a type of value, and jdewey of n is a partial Dewey code of its closest ancestor array node, if one exists, or empty, otherwise. Lastly, an inverted index is built using <path, type, value> as index term, and jdewey as payload. A method is also described to search the inverted index.
    Type: Application
    Filed: February 13, 2009
    Publication date: August 19, 2010
    Applicant: International Business Machines Corporation
    Inventors: Kevin Scott Beyer, Jun Rao, Eugene J. Shekita
  • Patent number: 7743060
    Abstract: Disclosed is a technique for indexing data. For each token in a set of documents, a sort key is generated that includes a document identifier that indicates whether a section of a document associated with the sort key is an anchor text section or a context section, wherein the anchor text section and the context text section have a same document identifier; it is determined whether a data field associated with the token is a fixed width; when the data field is a fixed width, the token is designated as one for which fixed width sort is to be performed; and, when the data field is a variable length, the token is designated as one for which a variable width sort is to be performed. The fixed width sort and the variable width sort are performed. For each document, the sort keys are used to bring together the anchor text section and the context section of that document.
    Type: Grant
    Filed: August 6, 2007
    Date of Patent: June 22, 2010
    Assignee: International Business Machines Corporation
    Inventors: Marcus Felipe Fontoura, Andreas Neumann, Sridhar Rajagopalan, Eugene J. Shekita, Jason Yeong Zien
  • Patent number: 7685138
    Abstract: A system, method, and computer program product to improve XML query processing efficiency with virtual cursors. Structural joins are a fundamental operation in XML query processing, and substantial work exists on index-based algorithms for executing them. Two well-known index features—path indices and ancestor information—are combined in a novel way to replace at least some of the physical index cursors in a structural join with virtual cursors. The position of a virtual cursor is derived from the path and ancestor information of a physical cursor. Virtual cursors can be easily incorporated into existing structural join algorithms. By eliminating index I/O and the processing cost of handling physical inverted lists, virtual cursors can improve the performance of holistic path queries by an order of magnitude or more.
    Type: Grant
    Filed: November 8, 2005
    Date of Patent: March 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: Kevin S. Beyer, Marcus Felipe Fontoura, Sridhar Rajagopalan, Eugene J. Shekita, Beverly Yang