Query Execution Plan Patents (Class 707/718)
-
Patent number: 11681709
Abstract: A system and method of joining remote tables by a federated server is provided. A method includes receiving a data join request from a client device; sending a first block fetch request to a first data source based on the data join request; receiving a first set of block data from the first data source; sending a second block fetch request to a second data source based on the data join request and a bind array containing the data of join column in the first data source; receiving a second set of block data from the second data source; and sending an output to the client device in response to the data join request in the form of rows from an outer table and an inner table.
Type: Grant
Filed: February 10, 2021
Date of Patent: June 20, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Chang Sheng Liu, Ya Qiong Liu, Lei Cui
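The block-fetch-plus-bind-array flow in this abstract resembles what is often called a bind join. The following is a minimal sketch under stated assumptions, not the patented method: `bind_join`, `fetch_outer`, `fetch_inner`, and `block_size` are hypothetical names, and both data sources are stood in for by plain Python callables.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def bind_join(
    fetch_outer: Callable[[], List[Row]],
    fetch_inner: Callable[[List[object]], List[Row]],
    key: str,
    block_size: int = 2,
) -> Iterator[Row]:
    """Join two remote tables block by block (illustrative only).

    fetch_outer stands in for the first block fetch request.
    fetch_inner stands in for the second request; it receives the bind
    array of join-column values taken from the current outer block.
    """
    outer_rows = fetch_outer()
    for start in range(0, len(outer_rows), block_size):
        block = outer_rows[start:start + block_size]
        # Distinct join-column values of this outer block form the bind array.
        bind_array = sorted({row[key] for row in block})
        # Index the matching inner rows by join key for quick lookup.
        inner_by_key: Dict[object, List[Row]] = {}
        for inner in fetch_inner(bind_array):
            inner_by_key.setdefault(inner[key], []).append(inner)
        # Emit one combined row per (outer, inner) match.
        for outer in block:
            for inner in inner_by_key.get(outer[key], []):
                yield {**outer, **inner}
```

Sending only the bind array to the second source means the inner side returns just the rows that can match the current outer block, rather than the whole inner table.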
-
Patent number: 11681561
Abstract: A computer-implemented method and system for receiving, at a first computing device, an application programming interface (API) request from a second computing device, wherein the API request includes at least a first request field and a second request field, evaluating at least the first request field to determine a first complexity measure, assigning a first field score to at least the first request field based on the first complexity measure, evaluating at least the second request field to determine a second complexity measure, assigning a second field score to at least the second request field based on the second complexity measure, and combining the first field score and the second field score to generate a total field score for the API request for use in an API request complexity model for constraining a processing of the received API request from the second computing device.
Type: Grant
Filed: December 10, 2020
Date of Patent: June 20, 2023
Assignee: Shopify Inc.
Inventors: Evan Jan Huus, Klass Neufeld, Scott Walkinshaw, Christopher John Butcher, Ali Kiyan Azarbar
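As a rough illustration of per-field scoring combined into a total request score, here is a minimal sketch. The scoring rule (one point per field, nested fields added, the sum multiplied by a requested page size) and the names `field_score`, `total_score`, `admit`, `fields`, and `first` are assumptions made for this example; the abstract does not specify the complexity measure.

```python
from typing import Dict, List

Field = Dict[str, object]


def field_score(field: Field) -> int:
    """Score one request field (hypothetical complexity measure)."""
    # Nested selections add their own scores to the parent field.
    nested = sum(field_score(child) for child in field.get("fields", []))
    # A list field requesting `first` items multiplies the cost of its subtree.
    return (1 + nested) * int(field.get("first", 1))


def total_score(request_fields: List[Field]) -> int:
    """Combine the individual field scores into one score for the request."""
    return sum(field_score(f) for f in request_fields)


def admit(request_fields: List[Field], budget: int) -> bool:
    """Constrain processing: accept the request only if it fits the budget."""
    return total_score(request_fields) <= budget
```

A request with a scalar field and a ten-item list of two-field objects would score 1 + (1 + 2) * 10 = 31 under this toy rule.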
-
Patent number: 11663185
Abstract: A method of validating metadata pages that map to user data in a data storage system is provided. The method includes (a) obtaining first information stored for a first metadata page and second information stored for a second metadata page, the first and second metadata pages having a relationship to each other within a hierarchy of metadata pages for accessing user data; (b) performing a consistency check between the first information and the second information, the consistency check producing a first result in response to the relationship being verified and a second result otherwise; and (c) in response to the consistency check yielding the second result, performing a corrective action to restore consistency between the first and second information. An apparatus, system, and computer program product for performing a similar method are also provided.
Type: Grant
Filed: July 31, 2020
Date of Patent: May 30, 2023
Assignee: EMC IP Holding Company LLC
Inventors: Vamsi K. Vankamamidi, Philippe Armangau, Geng Han, Yousheng Liu
-
Patent number: 11663203
Abstract: Optimizing database statements using a query compiler including receiving, by a query compiler from a client computing system, a state specification of a graphical user interface; compiling, by the query compiler, a database statement from the state specification, including: optimizing the database statement by repositioning, within the database statement, a limit clause such that the limit clause is processed by the database before at least one join clause; and sending, by the query compiler, the optimized database statement to a database on a cloud-based data warehouse.
Type: Grant
Filed: October 1, 2020
Date of Patent: May 30, 2023
Assignee: SIGMA COMPUTING, INC.
Inventors: Max H. Seiden, Deepanshu Utkarsh
-
Patent number: 11658996
Abstract: A computer implemented method to detect a data breach in a network-connected computing system, the method including storing, at a trusted secure computing device, at least a portion of network traffic communicated with the computer system; the computing device generating a copy of data distributed across a network; the computing device identifying information about the network attack stored in the copy of the data; the computing device generating a signature for the network attack based on the information about the network attack, the signature including rules for identifying the network attack in network traffic; and identifying an occurrence of the network attack in the stored network traffic based on the signature.
Type: Grant
Filed: December 19, 2017
Date of Patent: May 23, 2023
Assignee: British Telecommunications Public Limited Company
Inventor: Fadi El-Moussa
-
Patent number: 11650990
Abstract: Data tables that are located in a distributed data warehouse are joined with a target join-calculating algorithm that has been selected from a number of table joining algorithms which have been compared to each other based on the execution costs of each of the number of table joining algorithms.
Type: Grant
Filed: February 27, 2017
Date of Patent: May 16, 2023
Assignee: Alibaba Group Holding Limited
Inventors: Dong Xu, Weiguang Sun, Jiehong Lian, Longzhong Wang
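Selecting a join algorithm by comparing estimated execution costs can be sketched as follows. The three candidate algorithms and their cost formulas are toy assumptions chosen for illustration, and `estimate_costs` and `choose_join_algorithm` are hypothetical names; the abstract does not disclose the actual cost model.

```python
from typing import Dict


def estimate_costs(left_rows: int, right_rows: int, nodes: int) -> Dict[str, int]:
    """Return a toy execution-cost estimate for each candidate join algorithm."""
    small, large = sorted((left_rows, right_rows))
    return {
        # Ship the smaller table to every node, then scan the larger one locally.
        "broadcast_hash": small * nodes + large,
        # Repartition both tables over the network, then scan both.
        "shuffle_hash": 2 * (left_rows + right_rows),
        # Compare every pair of rows.
        "nested_loop": left_rows * right_rows,
    }


def choose_join_algorithm(left_rows: int, right_rows: int, nodes: int) -> str:
    """Pick the candidate with the lowest estimated cost."""
    costs = estimate_costs(left_rows, right_rows, nodes)
    return min(costs, key=costs.get)
```

With these formulas, a small table joined to a large one favors the broadcast variant, while two large tables of similar size favor the shuffle variant.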
-
Patent number: 11645305
Abstract: Example resource management systems and methods are described. In one implementation, a resource manager is configured to manage data processing tasks associated with multiple data elements. An execution platform is coupled to the resource manager and includes multiple execution nodes configured to store data retrieved from multiple remote storage devices. Each execution node includes a cache and a processor, where the cache and processor are independent of the remote storage devices. A metadata manager is configured to access metadata associated with at least a portion of the multiple data elements.
Type: Grant
Filed: May 16, 2022
Date of Patent: May 9, 2023
Assignee: Snowflake Inc.
Inventors: Thierry Cruanes, Benoit Dageville, Marcin Zukowski
-
Patent number: 11645280
Abstract: A function reference for a function is identified in a query. A plurality of processing environments that can provide the function is identified. Function costs for the function to process in the processing environments are obtained. Input data transfer costs are acquired for providing input data identified in the query to each of the functions. A specific one of the functions from a specific processing environment is selected based on the function costs and the input data transfer costs. A query execution plan for executing the query with the specific function is generated. The query execution plan is provided to a database engine for execution.
Type: Grant
Filed: December 20, 2017
Date of Patent: May 9, 2023
Assignee: Teradata US, Inc.
Inventors: John Mark Morris, Bhashyam Ramesh
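The selection step, which weighs each environment's function cost against the cost of moving the input data there, can be sketched in a few lines. This is an assumption-laden illustration: `pick_environment`, the environment names, and the linear per-kilobyte transfer model are all hypothetical.

```python
from typing import Dict, Tuple


def pick_environment(
    candidates: Dict[str, Tuple[int, int]], input_kb: int
) -> Tuple[str, int]:
    """Choose the processing environment with the lowest combined cost.

    candidates maps an environment name to
    (function_cost, transfer_cost_per_kb); both are abstract cost units.
    """
    totals = {
        name: function_cost + per_kb * input_kb
        for name, (function_cost, per_kb) in candidates.items()
    }
    best = min(totals, key=totals.get)
    return best, totals[best]
```

A cheap remote function can lose to a slower local one once the input grows large enough for transfer cost to dominate.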
-
Patent number: 11645283
Abstract: Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: receiving an incoming query statement, wherein the incoming query statement comprises a query statement expression that includes an input variable; predicting an input variable value associated to the input variable; selecting an access path for runtime execution of the query statement in dependence on the predicted input variable value; and performing runtime execution of the query statement using the selected access path.
Type: Grant
Filed: April 26, 2021
Date of Patent: May 9, 2023
Assignee: International Business Machines Corporation
Inventors: Li Cao, Shuo Li, Xiaobo Wang, Xin Peng Liu, Sheng Yan Sun
-
Patent number: 11645281
Abstract: The subject technology receives a query, the query including a set of statements for performing the query. The subject technology populates a compilation context based at least in part on the query. The subject technology invokes a compiler to perform a compilation process based on the compilation context. The subject technology performs a lookup operation on a stored plan cache for an exact match based on information from the compilation context. The subject technology, in response to determining an exact match, determines whether the particular query plan requires re-compilation based on a data dependent optimization. The subject technology determines whether a plan cache entry corresponding to the particular query plan includes a data property constraint. The subject technology determines whether the data property constraint still holds based on a set of data properties.
Type: Grant
Filed: August 30, 2022
Date of Patent: May 9, 2023
Assignee: Snowflake Inc.
Inventors: Thierry Cruanes, Xuelai Cui, Sangyong Hwang, Allison Waingold Lee, Boyung Lee, Nicola Dan Onose, William Waddington, Jiaqi Yan, Li Yan, Yongsik Yoon
-
Patent number: 11640617
Abstract: Metric forecasting techniques and systems in a digital medium environment are described that leverage similarity of elements, one to another, in order to generate a forecast value for a metric for a particular element. In one example, training data is received that describes a time series of values of the metric for a plurality of elements. The model is trained to generate the forecast value of the metric, the training using machine learning of a neural network based on the training data. The training includes generating dimensional-transformation data configured to transform the training data into a simplified representation to determine similarity of the plurality of elements, one to another, with respect to the metric over the time series. The training also includes generating model parameters of the neural network based on the simplified representation to generate the forecast value of the metric.
Type: Grant
Filed: March 21, 2017
Date of Patent: May 2, 2023
Assignee: Adobe Inc.
Inventors: Chunyuan Li, Hung Hai Bui, Mohammad Ghavamzadeh, Georgios Theocharous
-
Patent number: 11640610
Abstract: Provided are a system, method, and computer program product for generating synthetic data. The method includes receiving a plurality of data types associated with an environment to be evaluated and receiving a plurality of correlations of one data type to another data type. The method also includes generating a correlation graph of the plurality of data types based on the plurality of correlations and generating a directed acyclic graph of the plurality of data types based on the correlation graph. The method further includes generating a hierarchical graph of the plurality of data types by applying a path traversal technique to the directed acyclic graph and generating a synthetic dataset by repeatedly traversing the hierarchical graph to generate a plurality of records of data.
Type: Grant
Filed: December 29, 2020
Date of Patent: May 2, 2023
Assignee: Visa International Service Association
Inventors: Xiao Tian, Claudia Carolina Barcenas Cardenas, Shi Cao, Chiranjeet Chetia, Jianhua Huang, Marc Corbalan Vila
-
Patent number: 11636108
Abstract: A method builds a regression model for predicting processing times for federated queries using a variety of data sources. The method includes obtaining federated queries (e.g., from benchmarks), and generates a plurality of federated query plans for each federated query. Each federated query plan corresponds to executing a respective federated query using a respective data source as the federation engine. The method includes forming feature vectors for each federated query plan based on cost estimations for executing the respective federated query plan and cost estimations for data transfer. The method further includes training a regression model, using the feature vectors for the plurality of federated query plans, to predict runtimes for executing federated queries using the variety of data sources as a federation engine. Some implementations use the trained regression model to determine a suitable federation engine for a given federated query.
Type: Grant
Filed: September 17, 2019
Date of Patent: April 25, 2023
Assignee: TABLEAU SOFTWARE, LLC
Inventors: Liqi Xu, Richard L. Cole, Daniel Ting
-
Patent number: 11630830
Abstract: A format conversion engine for Apache Hadoop that converts data from its original format to a database-like format at certain time points for use by a low latency (LL) query engine. The format conversion engine comprises a daemon that is installed on each data node in a Hadoop cluster. The daemon comprises a scheduler and a converter. The scheduler determines when to perform the format conversion and notifies the converter when the time comes. The converter converts data on the data node from its original format to a database-like format for use by the low latency (LL) query engine.
Type: Grant
Filed: July 6, 2020
Date of Patent: April 18, 2023
Assignee: Cloudera Inc.
Inventors: Marcel Kornacker, Justin Erickson, Nong Li, Lenni Kuff, Henry Noel Robinson, Alan Choi, Alex Behm
-
Patent number: 11625398
Abstract: A cardinality of a query is estimated by creating a join plan for the query. The join plan is converted to a graph representation. A subtree graph kernel matrix is generated for the graph representation of the join plan. The subtree graph kernel matrix is submitted to a trained model for cardinality prediction which produces a predicted cardinality of the query.
Type: Grant
Filed: December 12, 2018
Date of Patent: April 11, 2023
Assignee: Teradata US, Inc.
Inventor: Dhiren Kumar Bhuyan
-
Patent number: 11620288
Abstract: Systems and methods are disclosed for mapping search nodes to a search head in a data intake and query system based on a tenant identifier in order to execute a query received by the data intake and query system. The mapping may allow same or similar search nodes to be used to execute queries that are associated with a particular tenant identifier, in order to take advantage of caching and local data stored with those search nodes. In some cases, search nodes can be mapped based on the tenant identifier using a hashing algorithm, such as a consistent hashing algorithm.
Type: Grant
Filed: February 25, 2022
Date of Patent: April 4, 2023
Assignee: Splunk Inc.
Inventors: Alexandros Batsakis, Scott Calvert, Alexander Douglas James, Bei Li, Ashish Mathew, James Monschke, Sogol Moshtaghi, Christopher Madden Pride, Xiaowei Wang
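Mapping a tenant identifier to a stable set of search nodes with consistent hashing might look like the sketch below. It shows a generic hash ring with virtual nodes, not the system described in the patent; `HashRing`, `nodes_for`, and the `vnodes` parameter are names invented for this example.

```python
import bisect
import hashlib
from typing import List


class HashRing:
    """A generic consistent-hash ring with virtual nodes (illustrative)."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        # Each physical node is placed on the ring at `vnodes` positions.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._ring]

    @staticmethod
    def _hash(text: str) -> int:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def nodes_for(self, tenant_id: str, count: int) -> List[str]:
        """Return up to `count` distinct nodes, walking clockwise from the tenant."""
        start = bisect.bisect(self._positions, self._hash(tenant_id))
        chosen: List[str] = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == count:
                    break
        return chosen
```

Because a tenant always walks the ring from the same position, repeated queries land on the same nodes, and removing an unrelated node leaves that tenant's assignment unchanged, which is what lets caches and local data stay useful.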
-
Patent number: 11615086
Abstract: Joining data using a disjunctive operator is described. An example computer-implemented method can include generating a query plan for a query, wherein there is a join operator expression for each of a plurality of disjunctive predicates and each join operator expression includes at least a conjunctive predicate and a disjunctive operator. The method may also include generating a bloom filter for each of the plurality of disjunctive operators. The method may further include evaluating each of the plurality of join operator expressions using a corresponding one of the plurality of disjunctive operators and bloom filter for each of the plurality of disjunctive predicates to generate a result set.
Type: Grant
Filed: August 2, 2022
Date of Patent: March 28, 2023
Assignee: Snowflake Inc.
Inventors: Thierry Cruanes, Florian Andreas Funke, Guangyan Hu, Jiaqi Yan
-
Patent number: 11611570
Abstract: A computer implemented method to generate a signature of a network attack for a network-connected computing system, the signature including rules for identifying the network attack, the method including generating, at a trusted secure computing device, a copy of data distributed across a network; the computing device identifying information about the network attack stored in the copy of the data; and the computing device generating the signature for the network attack based on the information about the network attack so as to subsequently identify the network attack occurring on a computer network.
Type: Grant
Filed: December 19, 2017
Date of Patent: March 21, 2023
Assignee: British Telecommunications Public Limited Company
Inventor: Fadi El-Moussa
-
Patent number: 11604795
Abstract: Systems and methods are disclosed for executing a query that includes an indication to process data managed by an external data system. The system identifies the external data system that manages the data to be processed and generates a subquery for the external data system indicating that the results of the subquery are to be sent to one worker node of multiple worker nodes. The system instructs the one worker node to distribute the results received from the external data system to multiple worker nodes for processing.
Type: Grant
Filed: July 31, 2018
Date of Patent: March 14, 2023
Assignee: Splunk Inc.
Inventors: Sourav Pal, Arindam Bhattacharjee
-
Patent number: 11593334
Abstract: An apparatus, method and computer program product for physical database design and tuning in relational database management systems. A relational database management system executes in a computer system, wherein the relational database management system manages a relational database comprised of one or more tables storing data. A Deep Reinforcement Learning based feedback loop process also executes in the computer system for recommending one or more tuning actions for the physical database design and tuning of the relational database management system, wherein the Deep Reinforcement Learning based feedback loop process uses a neural network framework to select the tuning actions based on one or more query workloads performed by the relational database management system.
Type: Grant
Filed: December 27, 2019
Date of Patent: February 28, 2023
Assignee: Teradata US, Inc.
Inventors: Louis Martin Burger, Emiran Curtmola, Sanjay Nair, Frank Roderic Vandervort, Douglas P. Brown
-
Patent number: 11593371
Abstract: A relational database management system (RDBMS) accepts a workload comprised of one or more queries against a relational database. The RDBMS evolves a default cost profile into a plurality of cost profiles using fixed or dynamic evolution, wherein each of the cost profiles captures one or more cost parameters for the workload. The cost profiles are represented by a multi-dimensional matrix that has one or more dimensions, and each of the dimensions represents one of the cost parameters. The RDBMS dynamically determines which of the cost profiles is an optimal cost profile for the workload by mapping the cost profiles to the workload using a random walk scoring algorithm or a biased walk scoring algorithm that searches the multi-dimensional matrix to identify the optimal cost profile. The RDBMS selects and performs one or more query execution plans for the workload based on the optimal cost profile for the workload.
Type: Grant
Filed: August 18, 2020
Date of Patent: February 28, 2023
Assignee: Teradata US, Inc.
Inventors: Wellington Marcos Cabrera Arevalo, Kassem Awada, Mahbub Hasan, Allen N. Diaz, Mohammed Al-Kateb, Awny Kayed Al-Omari
-
Patent number: 11593382
Abstract: A computer-implemented method, a computer program product, and a computer system for detecting an inappropriate data type of a column in a database and correcting an encoding for the column. The computer system detects in a table a candidate column that has a mismatching type definition, using database usage statistics. The computer system determines whether conversion of the candidate column is possible. In response to determining that the conversion of the candidate column is possible, the computer system converts values in the candidate column with a first data type to values in a new column with a second data type. The computer system appends the new column in the table. The computer system registers the new column and the second data type in a metadata catalog. The computer system generates a query plan operator for processing a query for the new column.
Type: Grant
Filed: March 22, 2021
Date of Patent: February 28, 2023
Assignee: International Business Machines Corporation
Inventors: Felix Beier, Knut Stolze, Reinhold Geiselhart, Luis Eduardo Oliveira Lizardo
-
Patent number: 11586604
Abstract: Embodiments of the present disclosure relate to a method, system, and computer program product for generating one or more in-memory data structures for data access. According to the method, target data associated with a database is identified. Further, the method determines at least one data structure for the target data based on at least one access pattern of the target data in a plurality of historical queries against the database, wherein the target data is accessed in execution of the plurality of historical queries. The method further implements the at least one data structure in a memory to store the target data. The at least one data structure is used for further access to the target data in execution of a further query against the database.
Type: Grant
Filed: July 2, 2020
Date of Patent: February 21, 2023
Assignee: International Business Machines Corporation
Inventors: Xiaobo Wang, Shuo Li, Sheng Yan Sun, Peng Hui Jiang
-
Patent number: 11586612
Abstract: Disclosed herein is a system and method for enabling creating and managing structured tables on a blockchain thereby facilitating retrieval of data records with standard database operational commands and in a structured manner. Creating structured tables as per a defined schema provides for data storage in a manner which enables easy integration with existing business applications. Further, the system and method provides for storing unstructured data records in a structured manner in the blockchain.
Type: Grant
Filed: September 26, 2019
Date of Patent: February 21, 2023
Assignee: Innoplexus AG
Inventor: Abhijit Keskar
-
Patent number: 11573960
Abstract: A computer-implemented method provides application-based query transformations. The method includes determining an application is initiated. The method includes identifying a set of execution units included in the application. The execution units are based on a set of queries in the application and a set of actions in the application. The method also includes building a query dependency graph (QDG) comprising a plurality of nodes, wherein each node of the plurality of nodes is correlated to an execution unit, and each node is linked to at least one additional node, the link indicating a relative execution order and a common attribute of each node and the additional node. The method includes merging, based on a performance architecture, two or more of the set of execution units into a section. The method includes processing the application according to the QDG.
Type: Grant
Filed: March 18, 2021
Date of Patent: February 7, 2023
Assignee: International Business Machines Corporation
Inventors: Shuo Li, Xiaobo Wang, Hong Mei Zhang, Sheng Yan Sun
-
Patent number: 11567937
Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.
Type: Grant
Filed: May 12, 2021
Date of Patent: January 31, 2023
Assignee: Oracle International Corporation
Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
-
Patent number: 11567956
Abstract: A format conversion engine for Apache Hadoop that converts data from its original format to a database-like format at certain time points for use by a low latency (LL) query engine. The format conversion engine comprises a daemon that is installed on each data node in a Hadoop cluster. The daemon comprises a scheduler and a converter. The scheduler determines when to perform the format conversion and notifies the converter when the time comes. The converter converts data on the data node from its original format to a database-like format for use by the low latency (LL) query engine.
Type: Grant
Filed: July 6, 2020
Date of Patent: January 31, 2023
Assignee: Cloudera, Inc.
Inventors: Marcel Kornacker, Justin Erickson, Nong Li, Lenni Kuff, Henry Noel Robinson, Alan Choi, Alex Behm
-
Patent number: 11561977
Abstract: According to some embodiments, a system to manage a query plan cache for a Database Management System (“DBMS”) includes a DBMS query plan cache data store. The DBMS query plan cache data store may contain, for example, electronic records representing a plurality of query plans each associated with a set of instructions created in response to a query previously submitted by a user. A DBMS query plan cache management platform may then calculate a utility score for each query plan in the DBMS query plan cache data store. At least one query plan may be evicted from the DBMS query plan cache data store based on the calculated utility score, wherein the evicting is not based on a size of the DBMS query plan cache.
Type: Grant
Filed: May 17, 2021
Date of Patent: January 24, 2023
Assignee: SAP SE
Inventors: Sung Gun Lee, Sanghee Lee, Hyung Jo Yoon, Boyeong Jeon
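Evicting cached plans by a utility score rather than by cache size could be sketched as below. The utility formula (hit count times compile time, discounted by idle time) is an assumption made purely for illustration, as are the names `CachedPlan`, `utility`, and `evict_low_utility`; the abstract does not say how the score is computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CachedPlan:
    sql: str
    hits: int          # times the cached plan was reused
    compile_ms: float  # cost to recompile the plan if it is evicted
    idle_s: float      # seconds since the plan was last used


def utility(plan: CachedPlan) -> float:
    """Hypothetical score: recompilation work saved, discounted by idleness."""
    return plan.hits * plan.compile_ms / (1.0 + plan.idle_s)


def evict_low_utility(cache: Dict[str, CachedPlan], threshold: float) -> List[str]:
    """Remove every plan scoring below the threshold, regardless of cache size."""
    evicted = [key for key, plan in cache.items() if utility(plan) < threshold]
    for key in evicted:
        del cache[key]
    return evicted
```

The threshold test runs the same way whether the cache holds two plans or two million, which mirrors the abstract's point that eviction is not triggered by the cache's size.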
-
Patent number: 11556710
Abstract: A computer system processes a group of inputs. A group of entities that is input for processing is intercepted. The intercepted group is expanded into individual entities. Each of the individual entities is processed to produce results for each individual entity. The results for each individual entity are intercepted and merged to produce results for the group of entities. Embodiments of the present invention further include a method and program product for processing a group of inputs in substantially the same manner described above.
Type: Grant
Filed: May 11, 2018
Date of Patent: January 17, 2023
Assignee: International Business Machines Corporation
Inventors: Brian S. Dreher, Sheng Hua Bao, Xiaoyang Gao, Yanyan Han
-
Patent number: 11556538
Abstract: Methods, systems, and computer-readable storage media for receiving, by a current database system, a query plan file representative of a captured query plan from a source database system, receiving, by the current database system, a set of definitions including one or more definitions, each definition in the set of definitions corresponding to an object that is implicated by the query plan, the object being included in a set of objects, and determining, by the current database system, that each definition in the set of definitions is identical to a respective definition of a corresponding object within the current database system, and in response: executing the captured query plan in the current database system to provide a query result.
Type: Grant
Filed: May 15, 2020
Date of Patent: January 17, 2023
Assignee: SAP SE
Inventors: Youngbin Bok, Jaehyok Chong, Won Jun Chang, Sungguk Lim
-
Patent number: 11550833
Abstract: An architecture for semantic search over encrypted data that improves upon existing encrypted data search techniques by providing a solution that is space-efficient on both the cloud and client sides, considers the semantic meaning of the user's query, and returns a list of documents accurately ranked by their similarity to the query. Different search schemes are presented based on S3C architecture (namely, FKSS, SKSS, and KSWF) that are fine-tuned for different types of datasets. The system requires only a single plaintext query to be entered and is easily portable to thin-clients, making it simple and quick for users to use. The system is also shown to be secure and resistant to attacks.
Type: Grant
Filed: October 24, 2018
Date of Patent: January 10, 2023
Assignee: University of Louisiana at Lafayette
Inventors: Jason Woodworth, Mohsen Amini Salehi
-
Patent number: 11544290
Abstract: Embodiments for providing intelligent data replication and distribution in a computing environment. Data access patterns of one or more queries issued to a plurality of data partitions may be forecasted. Data may be dynamically distributed and replicated to one or more existing data partitions or additional of the plurality of data partitions according to the forecasting.
Type: Grant
Filed: January 13, 2020
Date of Patent: January 3, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Stefano Braghin, Srikumar Venugopal
-
Patent number: 11544261
Abstract: A system for optimizing data requests in an electronic data storage environment may be configured to receive and identify data requests to perform operations on data stored in a data storage environment. The system may further implement tuning algorithms on the data requests upon identifying that the data requests are causing the data storage environment to perform below optimal performance. The present invention may be implemented as a system, a computer program product, or a computer-implemented method.
Type: Grant
Filed: October 1, 2020
Date of Patent: January 3, 2023
Assignee: BANK OF AMERICA CORPORATION
Inventors: Krishna Rangarao Mamadapur, Jigesh Rajendra Safary
-
Patent number: 11537596
Abstract: A method includes registering a type of data file. Registering the type of data file includes storing metadata describing the type of data file, the metadata including a file storage service and a parser for the type of data file. The method includes receiving a first data file of the type from a first domain, the first data file having raw data, storing the first data file, storing one or more access rules and a lineage of the first data file, parsing the first data file using the parser to generate a content from the raw data, storing the content separately from the raw data, providing the first data file and the content to a search service, and automatically updating one or more second data files from one or more other domains based on the content of the first data file using the search service and the lineage.
Type: Grant
Filed: June 22, 2020
Date of Patent: December 27, 2022
Assignee: Schlumberger Technology Corporation
Inventors: Hrvoje Markovic, Hans Eric Klumpen, RajKumar Kannan
-
Patent number: 11537604
Abstract: A system and method for processing of queries including receiving a query including a set operation and a sort operation, wherein the set operation includes a first data structure and a second data structure and the sort operation requests a result set that is sorted based on a column or attribute of the first data structure and a column or attribute of the second data structure; generating a query plan in which a sort operation occurs prior to the set operation; determining a first, partial set of one or more resultant rows responsive to the query; sending the first, partial set of one or more resultant rows responsive to the query to a client; determining a second, partial set of one or more resultant rows responsive to the query; and sending the second, partial set of one or more resultant rows to the client.
Type: Grant
Filed: November 25, 2020
Date of Patent: December 27, 2022
Assignee: PROGRESS SOFTWARE CORPORATION
Inventors: Mohammed Sayeed Akthar, Sunil Jardosh
-
Patent number: 11531704
Abstract: What is disclosed is an improved approach to perform automatic partitioning, without requiring any expertise on the part of the user. A three stage processing pipeline is provided to generate candidate partition schemes, to evaluate the candidate using real table structures that are empty, and to then implement a selected scheme with production data for evaluation. In addition, an improved approach is described to perform automatic interval partitioning, where the inventive concept implements interval partitioning that does not impose these implicit constraints on the partition key column.
Type: Grant
Filed: September 11, 2020
Date of Patent: December 20, 2022
Inventors: George Eadon, Ramesh Kumar, Ananth Raghavan
-
Patent number: 11520780Abstract: Systems and techniques are described for efficient, general-purpose, and potentially decentralized databases, distributed storage systems, version control systems, and/or other types of data repositories. Data is represented in a database system in such a way that any value is represented by a unique identifier which is derived from the value itself. Any database peer in the system will derive an identical identifier from the same logical value. The identifier for a value may be derived using a variety of mechanisms, including, without limitation, a hash function known to all peers in the system. The values may be organized hierarchically as a tree of nodes. Any two peers storing the same logical value will deterministically represent that value with a graph, such as the described “Prolly” tree, having the same topology and hash value, irrespective of possibly differing sequences of mutations which caused each to arrive at the same final value.Type: GrantFiled: May 10, 2021Date of Patent: December 6, 2022Assignee: Salesforce, Inc.Inventors: Aaron Boodman, Rafael Weinstein, Erik Arvidsson, Chris Masone, Dan Willhite, Benjamin Kalman
-
Patent number: 11507577Abstract: Methods, systems, apparatuses, and computer program products are provided for determining a query plan. A query is received that comprises a request for a data result for each of a plurality of original time windows. The plurality of original time windows included in the query are identified. An initial window representation is generated that identifies a set of connections between windows in a window set that includes at least the original time windows. A revised window representation is generated that includes an alternative set of connections between windows in the window set based at least on an execution cost for at least one window. The revised window representation is selected to obtain the data result for each of the plurality of original time windows. A revised query plan based on the revised window representation is provided to obtain the data result for each of the plurality of original time windows.Type: GrantFiled: May 28, 2020Date of Patent: November 22, 2022Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Alexander Raizman, Wentao Wu, Philip A. Bernstein
-
Patent number: 11507590Abstract: Techniques are introduced herein for maintaining geometry-type data on persistent storage and in memory. Specifically, a DBMS that maintains a database table, which includes at least one column storing spatial data objects (SDOs), also maintains metadata for the database table that includes definition data for one or more virtual columns of the table. According to an embodiment, the definition data includes one or more expressions that calculate minimum bounding box values for SDOs stored in the geometry-type column in the table. The one or more expressions in the metadata maintained for the table are used to create one or more in-memory columns that materialize the bounding box data for the represented SDOs. When a query that uses spatial-type operators to perform spatial filtering over data in the geometry-type column is received, the DBMS replaces the spatial-type operators with operators that operate over the scalar bounding box information materialized in memory.Type: GrantFiled: June 17, 2020Date of Patent: November 22, 2022Assignee: Oracle International CorporationInventors: Siva Ravada, Ying Hu, Zhen Hua Liu, Shasank Kisan Chavan, Aurosish Mishra, Vikas Arora
-
Patent number: 11500871Abstract: A computer-implemented method is disclosed that includes operations of receiving a query to be executed, the query including an indication of a data source at which input data is to be obtained, wherein the query is to be executed on the input data, determining a schema of the input data, determining fields of the input data that are required for execution of the query by analyzing a sequence of operators forming the query, determining one or more alterations to the query to improve efficiency of the execution of the query based on the fields of input data required for the execution, and generating an altered query by altering the query in accordance with the one or more alterations. The method may further include converting the query to a directed acyclic graph (DAG) and providing the DAG to a distributed processing engine configured to execute the DAG.Type: GrantFiled: October 19, 2020Date of Patent: November 15, 2022Assignee: SPLUNK Inc.Inventors: Chinmay Madhav Kulkarni, Lin Ma, Amir Malekpour, Mohan Rajagopalan, John C. Reed, Ram Sriharsha
-
Patent number: 11494379Abstract: Disclosed herein are systems and methods for pre-filter deduplication for multidimensional two-sided interval joins. In an embodiment, a data platform receives query instructions for a two-sided N dimensional interval join, where N is an integer greater than 1. The two-sided N dimensional interval join has an interval-join predicate that compares intervals determined from the input relations in each of N dimensions. The data platform implements the two-sided N dimensional interval join as a query-plan section that includes an N dimensional band join that is followed by a deduplication operator that is followed by a filter that applies the interval-join predicate. The N dimensional band join includes a hash join keyed to N dimensional domain cells overlapped at least in part by intervals determined from the input relations in each of the N dimensions. The deduplication operator removes duplicate rows from a potential-duplicates subset of the output of the N dimensional band join.Type: GrantFiled: April 23, 2021Date of Patent: November 8, 2022Assignee: Snowflake Inc.Inventors: Matthias Carl Adams, Spyridon Triantafyllis, Lars Volker, Kevin Wang
-
Patent number: 11487772Abstract: The present disclosure provides a multi-party data joint query method, a device, a server and a storage medium. The multi-party data joint query method executed by a manager includes: analyzing a multi-party joint query sentence to obtain a logical execution plan; processing the logical execution plan according to providers of respective nodes in the logical execution plan to obtain a physical execution plan of each provider; and generating a query instruction of each provider according to the physical execution plan of each provider, and sending the query instruction to respective provider. The query instruction is configured to instruct the providers to perform a query cooperatively.Type: GrantFiled: December 26, 2019Date of Patent: November 1, 2022Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.Inventors: Zhi Feng, Yu Zhang, Sen Zhang
-
Patent number: 11487795Abstract: Disclosed is a template-based automatic question and answer method for software bugs. An entity relationship triple is extracted from a bug corpus and a natural language pattern is acquired; an entity relationship in the triple is determined; a query template corresponding to the natural language pattern is acquired; an entity in a question q proposed by a user is replaced with an entity type to acquire a question q′; then, the entity type in q′ and an entity type in the natural language pattern are compared and searched for and a similarity is calculated; then, a SPARQL query pattern of the question q is acquired according to the similarity and the entity in the question q; and finally, the SPARQL query pattern of the question q is executed so as to acquire an answer to the question q.Type: GrantFiled: August 28, 2019Date of Patent: November 1, 2022Inventors: Xiaobing Sun, Jinting Lu, Bin Li
-
Patent number: 11487708Abstract: Techniques for visual data preparation are described. An interactive visual data preparation service provides a user with a graphical user interface that presents values from a sample taken of a dataset along with statistical information associated with those values. A user uses the graphical user interface to test out various transformations to the sample dataset by applying transformations and viewing near-immediate results of those transformations as applied to the sample. The desired set of transformations is represented as a recipe object, which can be used to perform data preparation against the overall dataset or other datasets on behalf of the user or other users.Type: GrantFiled: November 11, 2020Date of Patent: November 1, 2022Assignee: Amazon Technologies, Inc.Inventors: Surbhi Dangi, Gopinath Duddi, Amit Gul Phagwani, Romi Boimer, Ronald Stephen Kyker
-
Patent number: 11481398Abstract: A system for spilling comprises an interface and a processor. The interface is configured to receive an indication to perform a GROUP BY operation, wherein the indication comprises an input table and a grouping column. The processor is configured to: for each input table entry of the input table, determine a key, wherein the key is based at least in part on the input table entry and the grouping column; add the key to a grouping hash table, wherein adding the key to the grouping hash table comprises last-in, first-out (LIFO) spilling when necessary; create an output table based at least in part on the grouping hash table; and provide the output table.Type: GrantFiled: December 9, 2020Date of Patent: October 25, 2022Assignee: Databricks Inc.Inventors: Alexander Behm, Ankur Dave, Ryan Deng, Shoumik Palkar
-
Patent number: 11468073Abstract: Techniques are provided for gathering statistics in a database system. The techniques involve gathering some statistics using an “on-the-fly” technique, some statistics through a “high-frequency” technique, and yet other statistics using a “prediction” technique. The technique used to gather each statistic is based, at least in part, on the overhead required to gather the statistic. For example, low-overhead statistics may be gathered “on-the-fly” using the same process that is performing the operation that affects the statistic, while statistics whose gathering incurs greater overhead may be gathered in the background, while the database is live, using the high-frequency technique. The prediction technique may be used for relatively-high overhead statistics that can be predicted based on historical data and the current value of predictor statistics.Type: GrantFiled: August 6, 2019Date of Patent: October 11, 2022Assignee: Oracle International CorporationInventors: Mohamed Zait, Yuying Zhang, Hong Su, Jiakun Li
-
Patent number: 11468065Abstract: An information processing apparatus according to the present application includes an acquiring unit and a selecting unit. The acquiring unit acquires a plurality of pieces of second triple information hierarchized based on a conceptual system in a plurality of pieces of first triple information indicating a relationship about three types of elements and statistical information indicating the number of pieces of the first triple information associated with each of the plurality of pieces of the second triple information. The selecting unit selects, based on the statistical information acquired by the acquiring unit and based on a predetermined standard related to the statistical information, from among the plurality of pieces of the second triple information, a plurality of pieces of target triple information to be used for a clustering process.Type: GrantFiled: February 21, 2019Date of Patent: October 11, 2022Assignee: YAHOO JAPAN CORPORATIONInventors: Kiyoshi Nitta, Iztok Savnik
-
Patent number: 11461327Abstract: The subject technology receives a query, the query including a set of statements for performing the query. The subject technology populates a compilation context based at least in part on the query. The subject technology provides the compilation context to a compiler. The subject technology invokes the compiler to perform a compilation process based on the compilation context, the compilation process comprising performing a lookup operation on a stored plan cache for an exact match based on information from the compilation context, the stored plan cache including a set of stored query plans, and determining whether the exact match of a particular query plan is found in the stored plan cache to avoid compiling the query using the compilation context.Type: GrantFiled: April 8, 2022Date of Patent: October 4, 2022Assignee: Snowflake Inc.Inventors: Thierry Cruanes, Xuelai Cui, Sangyong Hwang, Allison Waingold Lee, Boyung Lee, Nicola Dan Onose, William Waddington, Jiaqi Yan, Li Yan, Yongsik Yoon
-
Patent number: 11461195Abstract: A method for processing query fault, where a database server receives a query statement and generates a corresponding query plan tree including multiple layers of operators in a pipeline relationship, and each layer includes operation symbols having logical relationship with each other. The server executes the query statement according to the query plan tree, extracts intermediate status information of a faulty operator when a fault occurs in a process of executing the query statement, updates operation symbols of the faulty operator and a logical relationship among the operation symbols according to the query plan tree and the intermediate status information to obtain a reconstructed query plan tree, and continues to execute the query statement according to the reconstructed query plan tree after the fault is recovered.Type: GrantFiled: November 23, 2020Date of Patent: October 4, 2022Assignee: HUAWEI TECHNOLOGIES CO., LTD.Inventors: Jinwei Zhu, Qingqing Zhou, Pinggao Zhou
-
Patent number: 11461304Abstract: Signature-based cache optimization for data preparation includes: performing a first set of sequenced data preparation operations on one or more sets of data to generate a plurality of transformation results; caching one or more of the plurality of transformation results and one or more corresponding operation signatures, a cached operation signature being derived based at least in part on a subset of sequenced operations that generated a corresponding result; receiving a specification of a second set of sequenced operations; determining an operation signature associated with the second set of sequenced operations; identifying a cached result among the cached results based at least in part on the determined operation signature; and outputting the cached result.Type: GrantFiled: March 10, 2020Date of Patent: October 4, 2022Assignee: DataRobot, Inc.Inventors: Dave Brewster, Victor Tze-Yeuan Tso
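
Illustrative sketch (not taken from patent 11461304): the abstract above describes caching transformation results under signatures derived from the sequence of operations that produced them, then reusing a cached result when a later sequence yields a matching signature. The Python below shows one way that general idea could look. The names `signature`, `OPS`, and `PrepCache`, the SHA-256 hashing scheme, and the longest-cached-prefix lookup are all assumptions made for illustration; the patent's actual signature derivation and matching rules are not given in the abstract.

```python
import hashlib
import json


def signature(source_id, ops):
    """Derive a signature from a data source id and an ordered list of operations."""
    payload = json.dumps([source_id, ops])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical operation registry: name -> function(rows, arg) -> new list of rows.
OPS = {
    "filter_gt": lambda rows, n: [r for r in rows if r > n],
    "scale": lambda rows, k: [r * k for r in rows],
    "sort": lambda rows, _: sorted(rows),
}


class PrepCache:
    """Caches the result of every operation prefix, keyed by its signature."""

    def __init__(self):
        self.results = {}  # signature -> cached rows
        self.applied = 0   # number of operations actually executed

    def run(self, source_id, rows, ops):
        # Look for the longest prefix of `ops` whose result is already cached.
        start, current = 0, rows
        for i in range(len(ops), 0, -1):
            sig = signature(source_id, ops[:i])
            if sig in self.results:
                start, current = i, self.results[sig]
                break
        # Execute only the remaining operations, caching each new prefix result.
        for i in range(start, len(ops)):
            name, arg = ops[i]
            current = OPS[name](current, arg)
            self.applied += 1
            self.results[signature(source_id, ops[: i + 1])] = current
        # Return a copy so callers cannot mutate the cached list.
        return list(current)


cache = PrepCache()
data = [5, 1, 9, 3]
first = cache.run("t1", data, [("filter_gt", 2), ("scale", 10)])
second = cache.run("t1", data, [("filter_gt", 2), ("scale", 10), ("sort", None)])
```

In this sketch the second call reuses the cached two-step result and executes only the added sort step. Keying on the source id together with the operation list is a design assumption that keeps results from different datasets from colliding.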