Patents by Inventor Vuk Ercegovac
Vuk Ercegovac has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20150363466
Abstract: In one embodiment, a computer-implemented method includes selecting one or more sub-expressions of a query during compile time. One or more pilot runs are performed by one or more computer processors. The one or more pilot runs include a pilot run associated with each of one or more of the selected sub-expressions, and each pilot run includes at least partial execution of the associated selected sub-expression. The pilot runs are performed during execution time. Statistics are collected on the one or more pilot runs during performance of the one or more pilot runs. The query is optimized based at least in part on the statistics collected during the one or more pilot runs, where the optimization includes basing cardinality and cost estimates on the statistics collected during the pilot runs.
Type: Application
Filed: June 11, 2014
Publication date: December 17, 2015
Inventors: Andrey Balmin, Vuk Ercegovac, Jesse E. Jackson, Konstantinos Karanasos, Marcel Kutsch, Fatma Ozcan, Chunyang Xia
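The pilot-run idea in this abstract can be illustrated with a minimal sketch. It assumes a single filter sub-expression over an in-memory list; the names `pilot_run`, `estimate_cardinality`, and the `orders` data are hypothetical and are not taken from the application.

```python
from itertools import islice


def pilot_run(rows, predicate, limit=1000):
    """Partially execute a filter sub-expression over at most `limit` rows
    and collect simple statistics while doing so."""
    seen = matched = 0
    for row in islice(rows, limit):
        seen += 1
        if predicate(row):
            matched += 1
    selectivity = matched / seen if seen else 0.0
    return {"seen": seen, "matched": matched, "selectivity": selectivity}


def estimate_cardinality(stats, table_size):
    """Base the cardinality estimate on the statistics from the pilot run."""
    return round(stats["selectivity"] * table_size)


# Hypothetical table: 10,000 orders with amounts cycling through 0..99.
orders = [{"id": i, "amount": i % 100} for i in range(10_000)]
stats = pilot_run(orders, lambda r: r["amount"] >= 90, limit=500)
estimated = estimate_cardinality(stats, len(orders))
```

An optimizer could then compare such estimates across candidate plans instead of relying on precomputed table statistics.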
-
Patent number: 9146983
Abstract: A method for creating a semantically aggregated index in an indexer-agnostic index building system includes: extracting documents from a data source, each document including a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.
Type: Grant
Filed: August 24, 2012
Date of Patent: September 29, 2015
Assignee: International Business Machines Corporation
Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Asim V. Singh, Kevin B. Wang
-
Patent number: 9104749
Abstract: A computer program product for an indexer-agnostic index building system includes a computer readable storage medium to store a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to perform operations for creating a semantically aggregated index. The operations include: extracting documents from a data source, wherein each document includes a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.
Type: Grant
Filed: January 12, 2011
Date of Patent: August 11, 2015
Assignee: International Business Machines Corporation
Inventors: Alfredo Alba, Chad E DeLuca, Vuk Ercegovac, Thomas D Griffin, Jun Rao, Asim V Singh, Kevin B Wang
-
Publication number: 20150186493
Abstract: Stratified sampling of a plurality of records is performed. A plurality of records are partitioned into a plurality of splits, wherein each split includes at least a portion of the plurality of records. The split of the plurality of splits is provided to a mapper. The mapper assigns at least a portion the records of the at least one split to a group based on a strata of the assigned records, and filters the records of the group based on a comparison of the weights of the records to a local threshold of the mapper. The mapper updates the local threshold of the mapper by communicating with a coordinator. The mapper shuffles the group to a reducer, where the reducer filters the records of the group based on the weights of the records. The reducer provides a stratified sampling of the plurality of records based on the group.
Type: Application
Filed: December 27, 2013
Publication date: July 2, 2015
Applicant: International Business Machines Corporation
Inventors: Andrey Balmin, Vuk Ercegovac, Peter J. Haas, Liping Peng, John Sismanis
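A rough single-process sketch of the mapper and reducer filtering described in this abstract follows. It assumes each record receives a uniform random weight and that the k lowest-weight records per stratum form the sample (a bottom-k scheme chosen for illustration). The coordinator that shares thresholds between mappers is omitted, and the function names and record layout are hypothetical.

```python
import heapq
import random
from collections import defaultdict


def mapper(split, k, rng):
    """Group records by stratum and keep only the k lowest-weight records
    per stratum. The largest kept weight acts as the local threshold."""
    groups = defaultdict(list)  # stratum -> max-heap of (-weight, record id)
    for record in split:
        weight = rng.random()
        heap = groups[record["stratum"]]
        if len(heap) < k:
            heapq.heappush(heap, (-weight, record["id"]))
        elif weight < -heap[0][0]:  # below the local threshold: keep it
            heapq.heapreplace(heap, (-weight, record["id"]))
    return {s: [(-w, rid) for w, rid in heap] for s, heap in groups.items()}


def reducer(mapper_outputs, k):
    """Merge the groups from all mappers and filter again by weight."""
    merged = defaultdict(list)
    for output in mapper_outputs:
        for stratum, items in output.items():
            merged[stratum].extend(items)
    return {s: sorted(rid for _, rid in heapq.nsmallest(k, items))
            for s, items in merged.items()}


# Hypothetical input: 100 records in two strata, partitioned into 4 splits.
records = [{"id": i, "stratum": "even" if i % 2 == 0 else "odd"}
           for i in range(100)]
splits = [records[i:i + 25] for i in range(0, 100, 25)]
rng = random.Random(0)
sample = reducer([mapper(split, 3, rng) for split in splits], k=3)
```

Filtering in the mapper keeps the shuffled data small: each mapper forwards at most k records per stratum regardless of the split size.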
-
Patent number: 8954967
Abstract: Described herein are methods, systems, apparatuses and products for adaptive parallel data processing. An aspect provides providing a map phase in which at least one map function is applied in parallel on different partitions of input data at different mappers in a parallel data processing system; providing a communication channel between mappers using a distributed meta-data store, wherein said map phase comprises mapper data processing adapted responsive to communication with said distributed meta-data store; and providing data accessible by at least one reduce phase node in which at least one reduce function is applied. Other embodiments are disclosed.
Type: Grant
Filed: May 31, 2011
Date of Patent: February 10, 2015
Assignee: International Business Machines Corporation
Inventors: Andrey Balmin, Kevin Scott Beyer, Vuk Ercegovac, Rares Vernica
-
Publication number: 20140281746
Abstract: Embodiments relate to a method and computer program product for error handling. The method includes performing at least one query operation. The processing of query operation also includes generating error information data based at least an error encountered during performance of the query operation and generating a data result relating to any portion of the query operation successfully completed. The data result is processed together with the error information data based on encountering any errors. The data result and error information are provided together in a package but separated by an indicator to distinguish between them.
Type: Application
Filed: March 15, 2013
Publication date: September 18, 2014
Inventors: Vuk Ercegovac, Carl-Christian Kanne
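One way to picture the package described in this abstract is the sketch below. It assumes the package is a flat list and that a marker string serves as the indicator; `ERROR_MARKER`, `run_query`, and `split_package` are hypothetical names, not the application's terminology.

```python
ERROR_MARKER = "--errors--"  # hypothetical indicator separating the two parts


def run_query(rows, operation):
    """Apply `operation` to every row, keeping partial results and
    recording error information for the rows that fail."""
    results, errors = [], []
    for index, row in enumerate(rows):
        try:
            results.append(operation(row))
        except Exception as exc:
            errors.append({"row": index, "error": repr(exc)})
    package = list(results)
    if errors:  # the error part is attached only when errors occurred
        package.append(ERROR_MARKER)
        package.extend(errors)
    return package


def split_package(package):
    """Separate the data result from the error information."""
    if ERROR_MARKER in package:
        cut = package.index(ERROR_MARKER)
        return package[:cut], package[cut + 1:]
    return package, []


package = run_query(["1", "x", "3"], lambda s: 10 / int(s))
data, errors = split_package(package)
```

The caller receives the successfully computed portion even though one row failed, and can decide separately what to do with the error records.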
-
Publication number: 20140281748
Abstract: An aspect of error handling includes a parsing block for pre-processing a document indexing application, a filtering block for discarding irrelevant documents, a transformation block to clean up and annotate input data by identifying a document language, and a processor configured for grouping inputs to collect documents for a same entity in a single spot. The processor processes a query operation. An aspect of error handling also includes a data package including a data result component that includes data generated based on successful completion of at least a portion of the query operation. The data package also includes an error information data component based on one or more errors encountered during processing of the query operation. An indicator separates the error information data from the data result. The system also includes a memory associated with a distributed file system for storing a final write output relating to the query operation.
Type: Application
Filed: September 13, 2013
Publication date: September 18, 2014
Applicant: International Business Machines Corporation
Inventors: Vuk Ercegovac, Carl-Christian Kanne
-
Publication number: 20120323920
Abstract: A method for creating a semantically aggregated index in an indexer-agnostic index building system includes: extracting documents from a data source, each document including a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.
Type: Application
Filed: August 24, 2012
Publication date: December 20, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Asim V. Singh, Kevin B. Wang
-
Publication number: 20120323551
Abstract: Systems and associated methods for highly parallel processing of parameterized simulations are described. Embodiments permit processing of stochastic data-intensive simulations in a highly parallel fashion in order to distribute the intensive workload. Embodiments utilize methods of seeding records in a database with a source of pseudo-random numbers, such as a compressed seed for a pseudo-random number generator, such that seeded records may be processed independently in a highly parallel fashion. Thus, embodiments provide systems and associated methods facilitating quicker data-intensive simulation by enabling highly parallel asynchronous simulations.
Type: Application
Filed: August 27, 2012
Publication date: December 20, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Kevin S. Beyer, Vuk Ercegovac, Peter Haas, Eugene J. Shekita, Fei Xu
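The record-seeding approach in this abstract might look like the following minimal sketch. It assumes each record stores a plain 64-bit seed rather than a compressed one, and the names `seed_records` and `simulate`, as well as the record fields, are hypothetical.

```python
import random


def seed_records(records, master_seed):
    """Attach an independent pseudo-random seed to every record."""
    master = random.Random(master_seed)
    return [dict(record, seed=master.getrandbits(64)) for record in records]


def simulate(record, trials=1000):
    """Run a stochastic simulation for one record using only its own seed,
    so the result does not depend on when or where the record is processed."""
    rng = random.Random(record["seed"])
    draws = [rng.gauss(record["mean"], record["stddev"])
             for _ in range(trials)]
    return record["id"], sum(draws) / trials


# Hypothetical records describing per-item demand distributions.
records = [{"id": i, "mean": 10.0 * i, "stddev": 2.0} for i in range(1, 6)]
seeded = seed_records(records, master_seed=42)

in_order = dict(simulate(r) for r in seeded)
reversed_order = dict(simulate(r) for r in reversed(seeded))
```

Because every record carries its own generator state, the two processing orders produce identical results, which is the property that lets workers handle records independently.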
-
Publication number: 20120323919
Abstract: Embodiments of the invention relate to building a distributed reverse semantic index. In one general embodiment a plurality of documents are received with each document having at least one defined rule and or semantic. The documents are distributed among a plurality of nodes of a system. The documents are processed in a generally parallel fashion. Processing the documents includes processing text data of each of the document and breaking each document into fields to index the text data to create index data by deferring how to categorize the text data based upon the defined rule and or semantics. The indexed data is combined back together to create an indexer-agnostic semantic index including a plurality of the semantic index shards and to semantically classify the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
Type: Application
Filed: August 27, 2012
Publication date: December 20, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Eugene J. Shekita, Asim V. Singh, Yuanyuan Tian, Kevin B. Wang
-
Publication number: 20120311581
Abstract: Described herein are methods, systems, apparatuses and products for adaptive parallel data processing. An aspect provides providing a map phase in which at least one map function is applied in parallel on different partitions of input data at different mappers in a parallel data processing system; providing a communication channel between mappers using a distributed meta-data store, wherein said map phase comprises mapper data processing adapted responsive to communication with said distributed meta-data store; and providing data accessible by at least one reduce phase node in which at least one reduce function is applied. Other embodiments are disclosed.
Type: Application
Filed: May 31, 2011
Publication date: December 6, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Andrey Balmin, Kevin Scott Beyer, Vuk Ercegovac, Rares Vernica
-
Publication number: 20120254089
Abstract: Embodiments of the invention relate to building a distributed reverse semantic index. In one general embodiment a plurality of documents are received with each document having at least one defined rule and or semantic. The documents are distributed among a plurality of nodes of a system. The documents are processed in a generally parallel fashion. Processing the documents includes processing text data of each of the document and breaking each document into fields to index the text data to create index data by deferring how to categorize the text data based upon the defined rule and or semantics. The indexed data is combined back together to create an indexer-agnostic semantic index including a plurality of the semantic index shards and to semantically classify the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
Type: Application
Filed: March 31, 2011
Publication date: October 4, 2012
Applicant: International Business Machines Corporation
Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Eugene J. Shekita, Asim V. Singh, Yuanyuan Tian, Kevin B. Wang
-
Publication number: 20120179684
Abstract: A computer program product for an indexer-agnostic index building system includes a computer readable storage medium to store a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to perform operations for creating a semantically aggregated index. The operations include: extracting documents from a data source, wherein each document includes a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.
Type: Application
Filed: January 12, 2011
Publication date: July 12, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Asim V. Singh, Kevin B. Wang
-
Publication number: 20110320184
Abstract: Systems and associated methods for highly parallel processing of parameterized simulations are described. Embodiments permit processing of stochastic data-intensive simulations in a highly parallel fashion in order to distribute the intensive workload. Embodiments utilize methods of seeding records in a database with a source of pseudo-random numbers, such as a compressed seed for a pseudo-random number generator, such that seeded records may be processed independently in a highly parallel fashion. Thus, embodiments provide systems and associated methods facilitating quicker data-intensive simulation by enabling highly parallel asynchronous simulations.
Type: Application
Filed: June 29, 2010
Publication date: December 29, 2011
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Kevin S. Beyer, Vuk Ercegovac, Peter Haas, Eugene J. Shekita, Fei Xu
-
Publication number: 20090228528
Abstract: A system, method, and computer program product for updating a partitioned index of a dataset. A document is indexed by separating it into indexable sections, such that different ones of the indexable sections may be contained in different partitions of the partitioned index. The partitioned index is updated using an updated version of the document by updating only those sections of the index corresponding to sections of the document that have been updated in the updated version.
Type: Application
Filed: March 6, 2008
Publication date: September 10, 2009
Applicant: International Business Machines Corporation
Inventors: Vuk Ercegovac, Vanja Josifovski, Ning Li, Mauricio Mediano, Eugene J. Shekita
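The section-level update in this abstract can be sketched as follows. The sketch assumes a document is a mapping from section name to text and, purely for illustration, that each section name maps to its own index partition; `update_index` and the data layout are hypothetical.

```python
def update_index(partitions, doc_id, old_doc, new_doc):
    """Reindex only the sections whose text changed between versions.

    `partitions` maps a section name to {doc_id: set of tokens}.
    Returns the names of the sections that were reindexed."""
    changed = [name for name, text in new_doc.items()
               if old_doc.get(name) != text]
    for name in changed:
        tokens = set(new_doc[name].lower().split())
        partitions.setdefault(name, {})[doc_id] = tokens
    # Sections that disappeared from the document are dropped from the index.
    for name in old_doc.keys() - new_doc.keys():
        partitions.get(name, {}).pop(doc_id, None)
    return changed


partitions = {}
version_1 = {"title": "Index Update", "body": "old text"}
version_2 = {"title": "Index Update", "body": "new text"}

first = update_index(partitions, 1, {}, version_1)          # initial indexing
second = update_index(partitions, 1, version_1, version_2)  # incremental
```

On the second call only the `body` partition is touched; the `title` partition is left exactly as the first call wrote it.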
-
Patent number: 6681222
Abstract: A unified database/text retrieval system converts exact database type queries into text inclusion type queries suitable for text retrieval systems through the use of pseudo keywords. Boolean combination of the text inclusion type query elements may be readily manipulated for optimization and applied to a unified index for rapid search results. Absolute relevance values and relevance multiplier values may be added to the query elements to provide a relevance-based sorting not only of text but also of exact match type search results. Relevance values may be deduced automatically from a variety of sources.
Type: Grant
Filed: July 16, 2001
Date of Patent: January 20, 2004
Assignee: Quip Incorporated
Inventors: Navin Kabra, Raghu Ramakrishnan, Uri Shaft, Vuk Ercegovac
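The pseudo-keyword idea in this abstract can be illustrated with a small sketch. It assumes exact field values are encoded as `field=value` tokens in the same inverted index as ordinary words and that all query terms are combined with AND; relevance values are left out. The functions `index_documents` and `search` and the sample data are hypothetical.

```python
from collections import defaultdict


def index_documents(docs):
    """Build one inverted index holding both words and pseudo keywords."""
    inverted = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["text"].lower().split():
            inverted[word].add(doc_id)
        for field, value in doc["fields"].items():
            inverted[f"{field}={value}"].add(doc_id)  # pseudo keyword
    return inverted


def search(inverted, words=(), exact=None):
    """Turn exact-match conditions into pseudo keywords, then intersect
    the posting sets of all terms."""
    terms = [w.lower() for w in words]
    terms += [f"{field}={value}" for field, value in (exact or {}).items()]
    if not terms:
        return set()
    result = set(inverted.get(terms[0], set()))
    for term in terms[1:]:
        result &= inverted.get(term, set())
    return result


docs = {
    1: {"text": "red running shoes", "fields": {"size": 10}},
    2: {"text": "red dress shoes", "fields": {"size": 9}},
}
inverted = index_documents(docs)
hits = search(inverted, words=["red", "shoes"], exact={"size": 10})
```

Both the text condition and the exact-match condition are answered by the same index lookup, so no separate database query path is needed in this sketch.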
-
Publication number: 20030014396
Abstract: A unified database/text retrieval system converts exact database type queries into text inclusion type queries suitable for text retrieval systems through the use of pseudo keywords. Boolean combination of the text inclusion type query elements may be readily manipulated for optimization and applied to a unified index for rapid search results. Absolute relevance values and relevance multiplier values may be added to the query elements to provide a relevance-based sorting not only of text but also of exact match type search results. Relevance values may be deduced automatically from a variety of sources.
Type: Application
Filed: July 16, 2001
Publication date: January 16, 2003
Inventors: Navin Kabra, Raghu Ramakrishnan, Uri Shaft, Vuk Ercegovac