Patents Assigned to Cloudera, Inc.
-
Patent number: 9690671
Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which a data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file including a listing of multiple file names in the snapshot manifest and reference information for locating the multiple files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is also disclosed in which log files are marked. The replaying of log entries can reduce the probability of causal consistency in the snapshot.
Type: Grant
Filed: October 29, 2014
Date of Patent: June 27, 2017
Assignee: Cloudera, Inc.
Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
-
Patent number: 9600492
Abstract: Systems and methods of data processing performance enhancement are disclosed. One embodiment includes invoking operating system calls to optimize cache management by an I/O component, wherein the operating system calls are invoked to perform one or more of: proactive triggering of readaheads for sequential read requests of a disk; purging data out of buffer cache after writing to the disk or performing sequential reads from the disk; and/or eliminating a delay between when a write is performed and when written data from the write is flushed to the disk from the buffer cache.
Type: Grant
Filed: August 1, 2016
Date of Patent: March 21, 2017
Assignee: Cloudera, Inc.
Inventor: Todd Lipcon
-
Patent number: 9552165
Abstract: Systems and methods of a memory allocation buffer to reduce heap fragmentation. In one embodiment, the memory allocation buffer structures a memory arena dedicated to a target region that is one of a plurality of regions in a server in a database cluster such as an HBase cluster. The memory arena has a chunk size (e.g., 2 MB) and an offset pointer. Data objects in write requests targeted to the region are received and inserted into the memory arena at a location specified by the offset pointer. When the memory arena is filled, a new one is allocated. When a MemStore of the target region is flushed, all of the memory arenas for the target region are freed. This reduces heap fragmentation that is responsible for long and/or frequent garbage collection pauses.
Type: Grant
Filed: September 4, 2015
Date of Patent: January 24, 2017
Assignee: Cloudera, Inc.
Inventor: Todd Lipcon
-
Patent number: 9477731
Abstract: A format conversion engine for Apache Hadoop that converts data from its original format to a database-like format at certain time points for use by a low latency (LL) query engine. The format conversion engine comprises a daemon that is installed on each data node in a Hadoop cluster. The daemon comprises a scheduler and a converter. The scheduler determines when to perform the format conversion and notifies the converter when the time comes. The converter converts data on the data node from its original format to a database-like format for use by the low latency (LL) query engine.
Type: Grant
Filed: October 1, 2013
Date of Patent: October 25, 2016
Assignee: Cloudera, Inc.
Inventors: Marcel Kornacker, Justin Erickson, Nong Li, Lenni Kuff, Henry Noel Robinson, Alan Choi, Alex Behm
-
Patent number: 9405692
Abstract: Systems and methods of data processing performance enhancement are disclosed. One embodiment includes invoking operating system calls to optimize cache management by an I/O component, wherein the operating system calls are invoked to perform one or more of: proactive triggering of readaheads for sequential read requests of a disk; purging data out of buffer cache after writing to the disk or performing sequential reads from the disk; and/or eliminating a delay between when a write is performed and when written data from the write is flushed to the disk from the buffer cache.
Type: Grant
Filed: March 21, 2012
Date of Patent: August 2, 2016
Assignee: Cloudera, Inc.
Inventor: Todd Lipcon
-
Patent number: 9361203
Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes one or more devices that generate log data, each associated with an agent node to collect the log data, wherein the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device, the collector device being associated with a collector tier having a collector node to which the agent sends the log data, wherein the collector determines the checksum for the batch of multiple messages received from the agent node.
Type: Grant
Filed: July 10, 2015
Date of Patent: June 7, 2016
Assignee: Cloudera, Inc.
Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9342557
Abstract: A low latency query engine for APACHE HADOOP™ that provides real-time or near real-time, ad hoc query capability, while complementing batch-processing of MapReduce. In one embodiment, the low latency query engine comprises a daemon that is installed on data nodes in a HADOOP™ cluster for handling query requests and all internal requests related to query execution. In a further embodiment, the low latency query engine comprises a daemon for providing name service and metadata distribution. The low latency query engine receives a query request via a client, turns the request into collections of plan fragments, and coordinates parallel and optimized execution of the plan fragments on remote daemons to generate results at a much faster speed than existing batch-oriented processing frameworks.
Type: Grant
Filed: March 13, 2013
Date of Patent: May 17, 2016
Assignee: Cloudera, Inc.
Inventors: Marcel Kornacker, Justin Erickson, Nong Li, Lenni Kuff, Henry Noel Robinson, Alan Choi, Alex Behm
-
Patent number: 9338008
Abstract: Embodiments of the present disclosure include systems and methods for secure release of secret information over a network. The server can be configured to receive a request from a client to access the deposit of secret information, send an authorization request to at least one designated trustee in the set of designated trustees for the deposit of secret information, receive responses over the network from one or more of the designated trustees in the set of designated trustees, and apply a trustee policy to the responses from the one or more designated trustees in the set of trustees to determine if the request is authorized. If the request is authorized, the server can send the secret information to the client. If the request is not authorized, the server denies access by the client to the secret information.
Type: Grant
Filed: April 1, 2013
Date of Patent: May 10, 2016
Assignee: Cloudera, Inc.
Inventors: Dustin C. Kirkland, Eduardo Garcia
-
Patent number: 9317572
Abstract: Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes identifying a data source in the system from which a dataset is to be collected, configuring a machine in the system that generates the dataset to be collected to send the dataset to the data source, identifying an arrival location where the dataset that is collected is to be aggregated or written, and/or configuring an agent node by specifying a source for the agent node as the data source in the system and specifying a sink for the agent node as the arrival location.
Type: Grant
Filed: September 8, 2010
Date of Patent: April 19, 2016
Assignee: Cloudera, Inc.
Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9201910
Abstract: Systems and methods of dynamically processing an event using an extensible data model are disclosed. One embodiment includes specifying attributes of the event in a data model, the data model being extensible to add properties to the event as the dataset is streamed from the source to the sink.
Type: Grant
Filed: August 18, 2014
Date of Patent: December 1, 2015
Assignee: Cloudera, Inc.
Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9172608
Abstract: Systems and methods for centralized configuration and monitoring of a distributed computing cluster are disclosed. One embodiment of the disclosed technology enables deployment and central operation of a complete Hadoop stack. The application automates the installation process and reduces deployment time from weeks to minutes. One embodiment further provides a cluster-wide, real-time view of the services running and the status of the host machines in a cluster via a single, central place to enact configuration changes across the computing cluster, which further incorporates reporting and diagnostic tools to optimize cluster performance and utilization.
Type: Grant
Filed: August 3, 2012
Date of Patent: October 27, 2015
Assignee: Cloudera, Inc.
Inventors: Philip Zeyliger, Philip L. Langdale, Patrick D. Hunt
-
Patent number: 9128949
Abstract: Systems and methods of a memory allocation buffer to reduce heap fragmentation. In one embodiment, the memory allocation buffer structures a memory arena dedicated to a target region that is one of a plurality of regions in a server in a database cluster such as an HBase cluster. The memory arena has a chunk size (e.g., 2 MB) and an offset pointer. Data objects in write requests targeted to the region are received and inserted into the memory arena at a location specified by the offset pointer. When the memory arena is filled, a new one is allocated. When a MemStore of the target region is flushed, all of the memory arenas for the target region are freed. This reduces heap fragmentation that is responsible for long and/or frequent garbage collection pauses.
Type: Grant
Filed: January 18, 2013
Date of Patent: September 8, 2015
Assignee: Cloudera, Inc.
Inventor: Todd Lipcon
-
Patent number: 9082127
Abstract: Systems and methods of facilitating collecting and aggregating datasets that are machine or user-generated for analysis are disclosed. One embodiment includes collecting a dataset on a machine on which the dataset is received or generated, wherein the dataset is collected from a data source on the machine, aggregating the dataset collected from the data source at a receiving location, performing analytics on the dataset upon collection or aggregation, and/or writing the dataset aggregated at the receiving location to a storage location.
Type: Grant
Filed: September 8, 2010
Date of Patent: July 14, 2015
Assignee: Cloudera, Inc.
Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9081888
Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes one or more devices that generate log data, each associated with an agent node to collect the log data, wherein the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device, the collector device being associated with a collector tier having a collector node to which the agent sends the log data, wherein the collector determines the checksum for the batch of multiple messages received from the agent node.
Type: Grant
Filed: September 8, 2010
Date of Patent: July 14, 2015
Assignee: Cloudera, Inc.
Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 8880592
Abstract: Systems and methods for user interface implementation for partial display update are disclosed. One embodiment of the method, which may be embodied on a system, includes, in a response received from a web server, identifying, for a web page, a set of elements able to be updated partially as displayed without refreshing the user interface in its entirety, detecting, in the response, updated elements in the set of elements that have been updated from a value displayed in the user interface, and/or partially updating the user interface to reflect changes to the updated elements in the web page without refreshing other portions of the user interface.
Type: Grant
Filed: March 31, 2011
Date of Patent: November 4, 2014
Assignee: Cloudera, Inc.
Inventors: Aaron Newton, Philip Zeyliger
-
Patent number: 8874526
Abstract: Systems and methods of dynamically processing an event using an extensible data model are disclosed. One embodiment includes specifying attributes of the event in a data model, the data model being extensible to add properties to the event as the dataset is streamed from the source to the sink.
Type: Grant
Filed: September 8, 2010
Date of Patent: October 28, 2014
Assignee: Cloudera, Inc.
Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 8812457
Abstract: Systems and methods of dynamically processing an event using an extensible data model are disclosed. One embodiment includes specifying attributes of the event in a data model, the data model being extensible to add properties to the event as the dataset is streamed from the source to the sink.
Type: Grant
Filed: September 8, 2010
Date of Patent: August 19, 2014
Assignee: Cloudera, Inc.
Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Publication number: 20130282668
Abstract: Systems and methods for checking for region consistency and table integrity problems and automatically repairing a corrupted HBase cluster. The methods and systems operate in a diagnostic mode and a diagnostic and repair mode. The methods include fixing table integrity problems, such as backwards table regions, table region holes, table region overlap, and the like to restore the table integrity invariant. Once the table integrity has been restored, each row key resolves to exactly one region. The methods further include fixing region inconsistencies, such as bad region assignment, no region present in the meta table, region information not in the Hadoop Distributed File System (HDFS), and the like to restore the region consistency invariant. The information in the HDFS is taken as ground truth, and any meta table or assignment problems that are inconsistent with the HDFS are deemed wrong and removed.
Type: Application
Filed: March 15, 2013
Publication date: October 24, 2013
Applicant: CLOUDERA, INC.
Inventor: Jonathan Ming-Cyn Hsieh
-
Publication number: 20130204948
Abstract: Systems and methods for centralized configuration and monitoring of a distributed computing cluster are disclosed. One embodiment of the disclosed technology enables deployment and central operation of a complete Hadoop stack. The application automates the installation process and reduces deployment time from weeks to minutes. One embodiment further provides a cluster-wide, real-time view of the services running and the status of the host machines in a cluster via a single, central place to enact configuration changes across the computing cluster, which further incorporates reporting and diagnostic tools to optimize cluster performance and utilization.
Type: Application
Filed: August 3, 2012
Publication date: August 8, 2013
Applicant: Cloudera, Inc.
Inventors: Philip Zeyliger, Philip L. Langdale, Patrick D. Hunt
-
Publication number: 20130185337
Abstract: Systems and methods of a memory allocation buffer to reduce heap fragmentation. In one embodiment, the memory allocation buffer structures a memory arena dedicated to a target region that is one of a plurality of regions in a server in a database cluster such as an HBase cluster. The memory arena has a chunk size (e.g., 2 MB) and an offset pointer. Data objects in write requests targeted to the region are received and inserted into the memory arena at a location specified by the offset pointer. When the memory arena is filled, a new one is allocated. When a MemStore of the target region is flushed, all of the memory arenas for the target region are freed. This reduces heap fragmentation that is responsible for long and/or frequent garbage collection pauses.
Type: Application
Filed: January 18, 2013
Publication date: July 18, 2013
Applicant: CLOUDERA, INC.
Inventor: Cloudera, Inc.
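
Illustrative note: the memory allocation buffer abstract describes a per-region arena with a fixed chunk size and an offset pointer, where a full arena is replaced by a new one and a flush frees every arena at once. The following is a minimal Python sketch of that idea, not the patented implementation. The names `Arena`, `RegionArenaAllocator`, `insert`, `read`, and `flush` are hypothetical, and a Python `bytearray` only models the bookkeeping; it does not reproduce JVM heap or garbage collection behavior.

```python
class Arena:
    """One fixed-size chunk with a bump ("offset") pointer. Hypothetical name."""

    def __init__(self, chunk_size):
        self.buf = bytearray(chunk_size)
        self.offset = 0  # next free position in this chunk

    def try_alloc(self, data):
        """Copy data in at the offset pointer; return its start, or None if full."""
        end = self.offset + len(data)
        if end > len(self.buf):
            return None
        start = self.offset
        self.buf[start:end] = data
        self.offset = end
        return start


class RegionArenaAllocator:
    """Arenas dedicated to a single target region. Hypothetical name."""

    def __init__(self, chunk_size=2 * 1024 * 1024):  # 2 MB, as in the abstract's example
        self.chunk_size = chunk_size
        self.arenas = [Arena(chunk_size)]

    def insert(self, data):
        """Store data in the current arena, allocating a new arena when it is full."""
        if len(data) > self.chunk_size:
            # Assumption of this sketch: oversized objects are simply rejected.
            raise ValueError("object larger than chunk size")
        start = self.arenas[-1].try_alloc(data)
        if start is None:
            self.arenas.append(Arena(self.chunk_size))
            start = self.arenas[-1].try_alloc(data)
        return (len(self.arenas) - 1, start, len(data))

    def read(self, ref):
        """Return the bytes stored at a reference produced by insert()."""
        index, start, length = ref
        return bytes(self.arenas[index].buf[start:start + length])

    def flush(self):
        """Drop every arena for the region at once and start with a fresh one."""
        self.arenas = [Arena(self.chunk_size)]
```

Because objects written to one region sit together in a few large chunks and are released together on flush, the freed space comes back as whole chunks rather than as many small, scattered holes, which is the fragmentation effect the abstract targets.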