Patents Assigned to Cloudera, Inc.
  • Patent number: 10599664
    Abstract: Systems and methods for very fast grouping of “similar” SQL queries according to user-supplied similarity criteria. The user-supplied similarity criteria include a threshold quantifying the degree of similarity between SQL queries and common artifacts included in the queries. A similarity-characterizing data structure allows for the very fast grouping of “similar” SQL queries. Because the computation is distributed among multiple compute nodes, a small cluster of compute nodes takes a short time to compute the similarity-characterizing data on a workload of tens of millions of queries. The user can supply the similarity criteria through a UI or a command line tool. Furthermore, the user can adjust the degree of similarity by supplying new similarity criteria. Accordingly, the system can display in real time or near real time, updated SQL groupings corresponding to the newly supplied similarity criteria using the originally computed similarity-characterizing data structure.
    Type: Grant
    Filed: April 24, 2017
    Date of Patent: March 24, 2020
    Assignee: Cloudera, Inc.
    Inventors: Rituparna Agrawal, Anupam Singh, Prithviraj Pandian
  • Patent number: 10572306
    Abstract: Embodiments are disclosed for a utilization-aware approach to cluster scheduling, to address this resource fragmentation and to improve cluster utilization and job throughput. In some embodiments a resource manager at a master node considers actual usage of running tasks and schedules opportunistic work on underutilized worker nodes. The resource manager monitors resource usage on these nodes and preempts opportunistic containers in the event this over-subscription becomes untenable. In doing so, the resource manager effectively utilizes wasted resources, while minimizing adverse effects on regularly scheduled tasks.
    Type: Grant
    Filed: May 15, 2017
    Date of Patent: February 25, 2020
    Assignee: Cloudera, Inc.
    Inventor: Karthik Kambatla
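
The scheduling idea in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the `Worker` class, the two function names, and the 0.8 and 0.95 thresholds are all hypothetical choices made for illustration. The sketch places opportunistic work on the least-utilized worker based on actual usage, and preempts opportunistic containers when over-subscription goes past a limit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Worker:
    """A worker node; all fields are illustrative assumptions."""
    name: str
    capacity: float
    regular_usage: float  # actual usage of regularly scheduled tasks
    opportunistic: List[float] = field(default_factory=list)

    def actual_usage(self) -> float:
        return self.regular_usage + sum(self.opportunistic)


def schedule_opportunistic(workers: List[Worker], size: float,
                           threshold: float = 0.8) -> Optional[Worker]:
    """Place an opportunistic container on the least-utilized worker
    that stays at or below `threshold` of capacity after placement."""
    candidates = [w for w in workers
                  if w.actual_usage() + size <= threshold * w.capacity]
    if not candidates:
        return None
    best = min(candidates, key=lambda w: w.actual_usage() / w.capacity)
    best.opportunistic.append(size)
    return best


def preempt_if_overloaded(worker: Worker, limit: float = 0.95) -> List[float]:
    """Preempt opportunistic containers (newest first) until actual usage
    is at or below `limit` of capacity. Regular tasks are never touched."""
    preempted = []
    while worker.opportunistic and worker.actual_usage() > limit * worker.capacity:
        preempted.append(worker.opportunistic.pop())
    return preempted


workers = [Worker("a", capacity=10, regular_usage=7),
           Worker("b", capacity=10, regular_usage=2)]
placed = schedule_opportunistic(workers, size=3)  # lands on the idle worker "b"
workers[1].regular_usage = 8                      # regular tasks ramp up
preempted = preempt_if_overloaded(workers[1])     # opportunistic work gives way
```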
  • Patent number: 10514948
    Abstract: Techniques are disclosed for inferring design-time information based on run-time artifacts generated by services operating in a distributed computing cluster. In an embodiment, a metadata system extracts metadata including run-time artifacts generated by services in a distributed computing cluster while processing a workflow including multiple jobs. The extracted metadata is processed to identify entities and entity relationships which can then be used to generate lineage information. Using the lineage information, the metadata system can infer design-time information associated with the workflow. The inferred design-time information can then be utilized to, for example, recreate the workflow, recreate previous versions of the workflow, optimize the workflow, etc.
    Type: Grant
    Filed: November 9, 2017
    Date of Patent: December 24, 2019
    Assignee: Cloudera, Inc.
    Inventors: Vikas Singh, Sudhanshu Arora, Philip Zeyliger, Marcelo Masiero Vanzin, Chang She
  • Patent number: 10346432
    Abstract: A compaction policy imposing soft limits to optimize system efficiency is used to select various rowsets on which to perform compaction, each rowset storing keys within an interval called a keyspace. For example, the disclosed compaction policy results in a decrease in a height of the tablet, removes overlapping rowsets, and creates smaller sized rowsets. The compaction policy is based on the linear relationship shared between the keyspace height and the cost associated with performing an operation (e.g., an insert operation) in that keyspace. Accordingly, various factors determining which rowsets are to be compacted, how large the compacted rowsets are to be made, and when to perform the compaction, are considered within the disclosed compaction policy. Furthermore, a system and method for performing compaction on the selected datasets in a log-structured database is also provided.
    Type: Grant
    Filed: March 17, 2016
    Date of Patent: July 9, 2019
    Assignee: Cloudera, Inc.
    Inventor: Todd Lipcon
  • Patent number: 10255335
    Abstract: Techniques are described for analyzing usage of data stored in a data storage system without accessing the stored data. In some embodiments, workload data indicative of queries executed at the data storage system on stored data is received. This workload data can include query logs generated during execution of the queries. The workload data is processed to identify data elements such as tables, columns, and views associated with the stored data as well as information regarding usage of the identified data elements. Usage can include operations performed on the data elements during execution of the queries. Based on this processing, relationships between the identified data elements can be inferred and visualizations generated that convey information regarding usage of the data stored at the data storage system.
    Type: Grant
    Filed: November 7, 2016
    Date of Patent: April 9, 2019
    Assignee: Cloudera, Inc.
    Inventor: Yihua Ding
  • Patent number: 10187461
    Abstract: Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes, identifying a data source in the system from where dataset is to be collected, configuring a machine in the system that generates the dataset to be collected, to send the dataset to the data source, identifying an arrival location where the dataset that is collected is to be aggregated or written, and/or configuring an agent node by specifying a source for the agent node as the data source in the system and specifying a sink for the agent node as the arrival location.
    Type: Grant
    Filed: April 13, 2016
    Date of Patent: January 22, 2019
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Patent number: 10171635
    Abstract: A first event occurs at a first computer at a first time, as measured by a local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by a local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, that is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, and the alternate second time is obtained from the second computer, and the order of the events is determined based on a comparison of the two times.
    Type: Grant
    Filed: August 18, 2014
    Date of Patent: January 1, 2019
    Assignee: Cloudera, Inc.
    Inventors: David Alves, Todd Lipcon
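
The timestamp adjustment described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the `Node` class is hypothetical, clocks are integer ticks, and the "alternate second time" is assumed to be the sender's time plus one tick.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A computer with its own, possibly skewed, local clock (hypothetical)."""
    name: str
    clock: int  # local clock reading in arbitrary ticks

    def local_event(self) -> int:
        """Record an event at the current local clock reading."""
        return self.clock

    def receive(self, sender_time: int) -> int:
        """Record an event triggered by a message stamped with sender_time.

        If the local clock is not ahead of the sender's timestamp, use an
        alternate time derived from that timestamp instead.
        ASSUMPTION: the alternate time is sender_time + 1 tick.
        """
        if sender_time >= self.clock:
            self.clock = sender_time + 1
        return self.clock


first = Node("first", clock=105)
second = Node("second", clock=100)  # this clock runs behind

t1 = first.local_event()  # time of the first event
t2 = second.receive(t1)   # alternate time used for the second event

# A third system comparing the two recorded times sees the causal order.
first_happened_before_second = t1 < t2
```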
  • Patent number: 10120904
    Abstract: Systems and methods are disclosed for resource management in a distributed computing environment. In some embodiments, a resource manager for a large distributed cluster needs to be able to provide resource responses very quickly. But each query may also not be accurate in its initial resource request and will often have to come back to the resource manager multiple times. An artifact may provide low latency query responses by using resource request caching that can handle re-requests of resources. According to some embodiments, a queuing mechanism may take into account resources currently expended and any resource requirement estimates available in order to make queuing decisions that meet policies set by an administrator. In some embodiments, scheduling decisions are distributed across a cluster of computing systems while still maintaining approximate compliance with resource management policies set by an administrator.
    Type: Grant
    Filed: December 31, 2014
    Date of Patent: November 6, 2018
    Assignee: Cloudera, Inc.
    Inventor: Jairam Ranganathan
  • Patent number: 10007864
    Abstract: An image processing system involves a camera, at least one processor associated with the camera, non-transitory storage, a lexical database of terms and image classification software. The image processing system uses the image classification software to assign hyponyms and associated probabilities to an image and then builds a subset hierarchical tree of hypernyms from the lexical database of terms. The processor then scores the hypernyms and identifies at least one hypernym for the image that has a score that is calculated to have a value that is greater than one of: a pre-specified threshold score, or all other calculated level scores within the subset hierarchical tree. The associated methods are also disclosed.
    Type: Grant
    Filed: February 8, 2018
    Date of Patent: June 26, 2018
    Assignee: Cloudera, Inc.
    Inventors: Micha Gorelick, Hilary Mason, Grant Custer
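
A toy sketch of the hypernym scoring described in the abstract above. Everything here is an assumption made for illustration: the tiny `HYPERNYM_OF` table stands in for a real lexical database, a hypernym's score is assumed to be the sum of the probabilities of the hyponyms beneath it, and the fallback to the single top-scoring hypernym is one reading of "greater than all other calculated scores".

```python
from collections import defaultdict
from typing import Dict

# Hypothetical toy lexical database: term -> its direct hypernym.
HYPERNYM_OF = {
    "tabby": "cat",
    "siamese": "cat",
    "beagle": "dog",
    "cat": "animal",
    "dog": "animal",
}


def hypernym_scores(hyponym_probs: Dict[str, float]) -> Dict[str, float]:
    """Build the subset of the hypernym tree reachable from the classifier's
    hyponyms, adding each hyponym's probability to every ancestor."""
    scores = defaultdict(float)
    for term, prob in hyponym_probs.items():
        parent = HYPERNYM_OF.get(term)
        while parent is not None:
            scores[parent] += prob
            parent = HYPERNYM_OF.get(parent)
    return dict(scores)


def pick_hypernyms(hyponym_probs: Dict[str, float],
                   threshold: float) -> Dict[str, float]:
    """Return hypernyms scoring above the threshold; if none do,
    fall back to the single highest-scoring hypernym."""
    scores = hypernym_scores(hyponym_probs)
    above = {h: s for h, s in scores.items() if s > threshold}
    if above or not scores:
        return above
    best = max(scores, key=scores.get)
    return {best: scores[best]}


# Classifier output (hypothetical): hyponyms with probabilities.
probs = {"tabby": 0.5, "siamese": 0.25, "beagle": 0.125}
scores = hypernym_scores(probs)
confident = pick_hypernyms(probs, threshold=0.6)
fallback = pick_hypernyms(probs, threshold=0.9)
```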
  • Patent number: 9990399
    Abstract: A low latency query engine for APACHE HADOOP™ that provides real-time or near real-time, ad hoc query capability, while completing batch-processing of MapReduce. In one embodiment, the low latency query engine comprises a daemon that is installed on data nodes in a HADOOP™ cluster for handling query requests and all internal requests related to query execution. In a further embodiment, the low latency query engine comprises a daemon for providing name service and metadata distribution. The low latency query engine receives a query request via client, turns the request into collections of plan fragments and coordinates parallel and optimized execution of the plan fragments on remote daemons to generate results at a much faster speed than existing batch-oriented processing frameworks.
    Type: Grant
    Filed: May 13, 2016
    Date of Patent: June 5, 2018
    Assignee: Cloudera, Inc.
    Inventors: Marcel Kornacker, Justin Erickson, Nong Li, Lenni Kuff, Henry Noel Robinson, Alan Choi, Alex Behm
  • Patent number: 9977826
    Abstract: A computerized method for generating and evaluating natural language-generated text involves receiving, in a computer, data input by a user, generating, using a natural language generation technique, multiple instances of text stories based upon both contents of a corpus and the received data; analyzing the multiple instances of text stories as a weighted combination of computed geographic scores, distance scores, information content scores, replacement scores and extra aspect scores, providing a ranked set of the generated text stories to a user, receiving a selection of one of the text stories in the ranked set, and storing the selected story.
    Type: Grant
    Filed: October 21, 2015
    Date of Patent: May 22, 2018
    Assignee: Cloudera, Inc.
    Inventors: Micha Gorelick, Hilary Mason, Grant Custer
  • Patent number: 9946958
    Abstract: An image processing system involves a camera, at least one processor associated with the camera, non-transitory storage, a lexical database of terms and image classification software. The image processing system uses the image classification software to assign hyponyms and associated probabilities to an image and then builds a subset hierarchical tree of hypernyms from the lexical database of terms. The processor then scores the hypernyms and identifies at least one hypernym for the image that has a score that is calculated to have a value that is greater than one of: a pre-specified threshold score, or all other calculated level scores within the subset hierarchical tree. The associated methods are also disclosed.
    Type: Grant
    Filed: October 14, 2016
    Date of Patent: April 17, 2018
    Assignee: Cloudera, Inc.
    Inventors: Micha Gorelick, Hilary Mason, Grant Custer
  • Patent number: 9934382
    Abstract: Embodiments of the present disclosure include systems and methods for encrypting a virtual machine image and accessing an encrypted virtual machine image. According to some embodiments an encryption module can encrypt a virtual machine image and place an encryption boot loader. The encryption boot loader may be extracted from the encrypted virtual machine image, be transmitted to, and stored at a key storage system. Upon a request to boot an operating system associated with the encrypted virtual machine image, a pre-boot execution environment may communicate with an image service to retrieve the encryption boot loader from the remote key storage system. The virtual machine image may therefore be decrypted using the encryption boot loader, which may allow booting of the operating system.
    Type: Grant
    Filed: October 28, 2014
    Date of Patent: April 3, 2018
    Assignee: Cloudera, Inc.
    Inventor: Eduardo Garcia
  • Patent number: 9842126
    Abstract: Systems and methods for checking for region consistency and table integrity problems and automatically repairing a corrupted HBase cluster. The methods and systems operate in a diagnostic mode and a diagnostic and repair mode. The methods include fixing table integrity problems, such as backwards table regions, table region holes, table region overlap, and the like to restore table integrity invariant. Once the table integrity has been restored, each row key resolves to exactly one region. The methods further include fixing region inconsistencies, such as bad region assignment, no region present in the meta table, region information not in the Hadoop Distributed File System (HDFS), and the like to restore region consistency invariant. The information in the HDFS is taken as ground truth and any meta table or assignment problems that are inconsistent with the HDFS are deemed wrong and removed.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: December 12, 2017
    Assignee: Cloudera, Inc.
    Inventor: Jonathan Ming-Cyn Hsieh
  • Patent number: 9819491
    Abstract: Embodiments of the present disclosure include systems and methods for secure release of secret information over a network. The server can be configured to receive a request from a client to access the deposit of secret information, send an authorization request to at least one designated trustee in the set of designated trustees for the deposit of secret information, receive responses over the network from one or more of the designated trustees in the set of designated trustees and apply a trustee policy to the responses from the one or more designated trustees in the set of trustees to determine if the request is authorized. If the request is authorized, the server can send the secret information to the client. If the request is not authorized, the server denies access by the client to the secret information.
    Type: Grant
    Filed: May 9, 2016
    Date of Patent: November 14, 2017
    Assignee: Cloudera, Inc.
    Inventors: Dustin C. Kirkland, Eduardo Garcia
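
The trustee-policy check in the abstract above might look like the following sketch. The abstract does not say what the policy is, so a "k of n approvals" rule is assumed here; the function names and the `required_approvals` parameter are hypothetical.

```python
from typing import Dict, Optional, Set


def is_authorized(responses: Dict[str, bool], trustees: Set[str],
                  required_approvals: int) -> bool:
    """Apply an assumed k-of-n trustee policy to the received responses.

    Only responses from designated trustees count; trustees who have not
    answered are simply missing from `responses`.
    """
    approvals = sum(1 for trustee, approved in responses.items()
                    if approved and trustee in trustees)
    return approvals >= required_approvals


def release_secret(secret: str, responses: Dict[str, bool],
                   trustees: Set[str], required_approvals: int) -> Optional[str]:
    """Send the secret if the request is authorized, otherwise deny (None)."""
    if is_authorized(responses, trustees, required_approvals):
        return secret
    return None


trustees = {"alice", "bob", "carol"}

# Two designated trustees approve; "mallory" is not a trustee and is ignored.
granted = release_secret("s3cret", {"alice": True, "bob": True, "mallory": True},
                         trustees, required_approvals=2)
denied = release_secret("s3cret", {"alice": True, "bob": False, "mallory": True},
                        trustees, required_approvals=2)
```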
  • Patent number: 9817859
    Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes, one or more devices that generate log data, the one or more machines each associated with an agent node to collect the log data, wherein, the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device, the collector device being associated with a collector tier having a collector node to which the agent sends the log data; wherein, the collector determines the checksum for the batch of multiple messages received from the agent node.
    Type: Grant
    Filed: June 1, 2016
    Date of Patent: November 14, 2017
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
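
The batch, tag, and checksum flow in the abstract above can be sketched as below. The abstract does not name a checksum algorithm, so CRC-32 from Python's standard `zlib` module is an assumption, as are the `Batch` type and both function names.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Batch:
    """A tagged group of log messages with its checksum (hypothetical type)."""
    tag: str
    messages: Tuple[str, ...]
    checksum: int


def batch_checksum(messages: Iterable[str]) -> int:
    """CRC-32 over all messages; each message is length-prefixed so that
    ("ab", "c") and ("a", "bc") do not produce the same byte stream."""
    crc = 0
    for message in messages:
        data = message.encode("utf-8")
        crc = zlib.crc32(len(data).to_bytes(4, "big"), crc)
        crc = zlib.crc32(data, crc)
    return crc


def make_batch(tag: str, messages: Iterable[str]) -> Batch:
    """Agent side: group messages, assign the tag, compute the checksum."""
    frozen = tuple(messages)
    return Batch(tag=tag, messages=frozen, checksum=batch_checksum(frozen))


def collector_accepts(batch: Batch) -> bool:
    """Collector side: recompute the checksum and compare with the agent's."""
    return batch_checksum(batch.messages) == batch.checksum


good = make_batch("agent1-0001", ["login ok", "disk 91% full"])
# Simulate corruption in transit: a message changed, checksum left as sent.
corrupted = Batch(good.tag, ("login ok", "disk 19% full"), good.checksum)
```

A collector that rejects a batch could ask the agent to resend the batch by its tag, which is one way the tag and checksum together support fault tolerance.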
  • Patent number: 9817867
    Abstract: Systems and methods of dynamically processing an event using an extensible data model are disclosed. One embodiment includes, specifying attributes of the event in a data model; the data model being extensible to add properties to the event as the dataset is streamed from the source to the sink.
    Type: Grant
    Filed: November 16, 2015
    Date of Patent: November 14, 2017
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Patent number: 9753954
    Abstract: Systems and methods for data node fencing in a distributed file system to prevent data inconsistencies and corruptions are disclosed. An embodiment includes implementing a protocol whereby data nodes detect a failover and determine an active name node based on transaction identifiers associated with transaction requests. The data nodes also provide to the active name node block location information and an acknowledgment. The embodiment further includes a protocol whereby a name node refrains from issuing invalidation requests to the data nodes until the name node receives acknowledgments from all data nodes that are functional.
    Type: Grant
    Filed: September 11, 2013
    Date of Patent: September 5, 2017
    Assignee: Cloudera, Inc.
    Inventors: Todd Lipcon, Aaron T. Myers, Eli Collins
  • Patent number: 9747333
    Abstract: A sysSQL technology for querying operating system states of multiple hosts in a cluster using a Structured Query Language (SQL) query is disclosed. An administrator of a cluster can use a graphical or text-based user interface to submit an SQL query to determine the operating system states of multiple hosts in parallel. The technology parses the SQL query to determine the datasets needed to execute the SQL query and aggregates those datasets from the multiple hosts. The technology then creates a temporary database to execute the SQL query and provides the results from the SQL query for display on the user interface.
    Type: Grant
    Filed: October 8, 2014
    Date of Patent: August 29, 2017
    Assignee: Cloudera, Inc.
    Inventor: Philip Zeyliger
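
The steps in the abstract above (determine the needed datasets, aggregate them from the hosts, load a temporary database, run the query) can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the `SCHEMAS` table, the collector callables, and the word-matching stand-in for SQL parsing are all hypothetical, and hosts are polled sequentially here rather than in parallel.

```python
import re
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical dataset schemas: table name -> column names.
SCHEMAS = {
    "processes": ("host", "pid", "name"),
    "mounts": ("host", "path", "fs_type"),
}

# A collector returns one host's rows for a dataset, without the host column.
Collector = Callable[[str], List[Tuple]]


def needed_datasets(sql: str) -> List[str]:
    """Crude stand-in for SQL parsing: known table names used in the query."""
    words = set(re.findall(r"[a-z_]+", sql.lower()))
    return [table for table in SCHEMAS if table in words]


def run_query(sql: str, hosts: Dict[str, Collector]) -> List[Tuple]:
    """Aggregate the needed datasets into a temporary in-memory database,
    execute the query there, and return the result rows."""
    db = sqlite3.connect(":memory:")
    try:
        for table in needed_datasets(sql):
            columns = SCHEMAS[table]
            db.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
            placeholders = ", ".join("?" for _ in columns)
            for host, collect in hosts.items():
                rows = [(host, *row) for row in collect(table)]
                db.executemany(
                    f"INSERT INTO {table} VALUES ({placeholders})", rows)
        return db.execute(sql).fetchall()
    finally:
        db.close()


hosts = {
    "node1": lambda t: [(1, "init"), (42, "java")] if t == "processes" else [],
    "node2": lambda t: [(1, "init")] if t == "processes" else [],
}
result = run_query(
    "SELECT host, COUNT(*) FROM processes GROUP BY host ORDER BY host", hosts)
```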
  • Patent number: 9716624
    Abstract: Systems and methods for centralized configuration of a distributed computing cluster are disclosed. One embodiment of the disclosed technology provides a user environment that facilitates a selection of a service to be run on hosts in the distributed computing cluster and configuration of the service or hosts in the distributed computer cluster. The disclosed technology can further configure each of the hosts in the distributed computing cluster to run the service based on a set of configuration settings.
    Type: Grant
    Filed: October 8, 2014
    Date of Patent: July 25, 2017
    Assignee: Cloudera, Inc.
    Inventors: Philip Zeyliger, Philip Lee Langdale, Patrick David Hunt