Patents by Inventor Jonathan Ming-Cyn Hsieh
Jonathan Ming-Cyn Hsieh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11768739Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which a data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file including a listing of multiple file names in the snapshot manifest and reference information for locating the multiple files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is also disclosed in which log files are marked. The replaying of log entries can reduce the probability of causal consistency in the snapshot.Type: GrantFiled: July 30, 2020Date of Patent: September 26, 2023Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
-
Publication number: 20200356447Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which a data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file including a listing of multiple file names in the snapshot manifest and reference information for locating the multiple files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is also disclosed in which log files are marked. The replaying of log entries can reduce the probability of causal consistency in the snapshot.Type: ApplicationFiled: July 30, 2020Publication date: November 12, 2020Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
-
Publication number: 20200356448Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which a data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file including a listing of multiple file names in the snapshot manifest and reference information for locating the multiple files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is also disclosed in which log files are marked. The replaying of log entries can reduce the probability of causal consistency in the snapshot.Type: ApplicationFiled: July 30, 2020Publication date: November 12, 2020Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
-
Patent number: 10776217Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which a data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file including a listing of multiple file names in the snapshot manifest and reference information for locating the multiple files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is also disclosed in which log files are marked. The replaying of log entries can reduce the probability of causal consistency in the snapshot.Type: GrantFiled: May 25, 2017Date of Patent: September 15, 2020Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
-
Patent number: 10187461Abstract: Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes, identifying a data source in the system from where dataset is to be collected, configuring a machine in the system that generates the dataset to be collected, to send the dataset to the data source, identifying an arrival location where the dataset that is collected is to be aggregated or written, and/or configuring an agent node by specifying a source for the agent node as the data source in the system and specifying a sink for the agent node as the arrival location.Type: GrantFiled: April 13, 2016Date of Patent: January 22, 2019Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9842126Abstract: Systems and methods for checking for region consistency and table integrity problems and automatically repairing a corrupted HBase cluster. The methods and systems operate in a diagnostic mode and a diagnostic and repair mode. The methods include fixing table integrity problems, such as backwards table regions, table region holes, table region overlap, and the like to restore table integrity invariant. Once the table integrity has been restored, each row key resolves to exactly one region. The methods further include fixing region inconsistencies, such as bad region assignment, no region present in the meta table, region information not in the Hadoop Distributed File System (HDFS), and the like to restore region consistency invariant. The information in the HDFS is taken as ground truth and any meta table or assignment problems that are inconsistent with the HDFS is deemed wrong and removed.Type: GrantFiled: March 15, 2013Date of Patent: December 12, 2017Assignee: Cloudera, Inc.Inventor: Jonathan Ming-Cyn Hsieh
-
Patent number: 9817867Abstract: Systems and methods of dynamically processing an event using an extensible data model are disclosed. One embodiment includes, specifying attributes of the event in a data model; the data model being extensible to add properties to the event as the dataset is streamed from the source to the sink.Type: GrantFiled: November 16, 2015Date of Patent: November 14, 2017Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9817859Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes, one or more devices that generate log data, the one or more machines each associated with an agent node to collect the log data, wherein, the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the hatch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device, the collector device being associated with a collector tier having a collector node to which the agent sends the log data; wherein, the collector determines the checksum for the hatch of multiple messages received from the agent node.Type: GrantFiled: June 1, 2016Date of Patent: November 14, 2017Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Publication number: 20170262348Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which a data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file including a listing of multiple file names in the snapshot manifest and reference information for locating the multiple files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is also disclosed in which log files are marked. The replaying of log entries can reduce the probability of causal consistency in the snapshot.Type: ApplicationFiled: May 25, 2017Publication date: September 14, 2017Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
-
Patent number: 9690671Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which a data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file including a listing of multiple file names in the snapshot manifest and reference information for locating the multiple files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is also disclosed in which log files are marked. The replaying of log entries can reduce the probability of causal consistency in the snapshot.Type: GrantFiled: October 29, 2014Date of Patent: June 27, 2017Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
-
Publication number: 20160275136Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes, one or more devices that generate log data, the one or more machines each associated with an agent node to collect the log data, wherein, the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the hatch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device, the collector device being associated with a collector tier having a collector node to which the agent sends the log data; wherein, the collector determines the checksum for the hatch of multiple messages received from the agent node.Type: ApplicationFiled: June 1, 2016Publication date: September 22, 2016Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Publication number: 20160226968Abstract: Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes, identifying a data source in the system from where dataset is to be collected, configuring a machine in the system that generates the dataset to be collected, to send the dataset to the data source, identifying an arrival location where the dataset that is collected is to be aggregated or written, and/or configuring an agent node by specifying a source for the agent node as the data source in the system and specifying a sink for the agent node as the arrival location.Type: ApplicationFiled: April 13, 2016Publication date: August 4, 2016Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9361203Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes, one or more devices that generate log data, the one or more machines each associated with an agent node to collect the log data, wherein, the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device, the collector device being associated with a collector tier having a collector node to which the agent sends the log data; wherein, the collector determines the checksum for the batch of multiple messages received from the agent node.Type: GrantFiled: July 10, 2015Date of Patent: June 7, 2016Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9317572Abstract: Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes, identifying a data source in the system from where dataset is to be collected, configuring a machine in the system that generates the dataset to be collected, to send the dataset to the data source, identifying an arrival location where the dataset that is collected is to be aggregated or written, and/or configuring an agent node by specifying a source for the agent node as the data source in the system and specifying a sink for the agent node as the arrival location.Type: GrantFiled: September 8, 2010Date of Patent: April 19, 2016Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Publication number: 20160070760Abstract: Systems and methods of dynamically processing an event using an extensible data model are disclosed. One embodiment includes, specifying attributes of the event in a data model; the data model being extensible to add properties to the event as the dataset is streamed from the source to the sink.Type: ApplicationFiled: November 16, 2015Publication date: March 10, 2016Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9201910Abstract: Systems and methods of dynamically processing an event using an extensible data model are disclosed. One embodiment includes, specifying attributes of the event in a data model; the data model being extensible to add properties to the event as the dataset is streamed from the source to the sink.Type: GrantFiled: August 18, 2014Date of Patent: December 1, 2015Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Publication number: 20150317231Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes, one or more devices that generate log data, the one or more machines each associated with an agent node to collect the log data, wherein, the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device, the collector device being associated with a collector tier having a collector node to which the agent sends the log data; wherein, the collector determines the checksum for the batch of multiple messages received from the agent node.Type: ApplicationFiled: July 10, 2015Publication date: November 5, 2015Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9081888Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes, one or more devices that generate log data, the one or more machines each associated with an agent node to collect the log data, wherein, the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device, the collector device being associated with a collector tier having a collector node to which the agent sends the log data; wherein, the collector determines the checksum for the batch of multiple messages received from the agent node.Type: GrantFiled: September 8, 2010Date of Patent: July 14, 2015Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Patent number: 9082127Abstract: Systems and methods of facilitating collecting and aggregating datasets that are machine or user-generated for analysis are disclosed. One embodiment includes, collecting a dataset on a machine on which the dataset is received or generated, wherein, the dataset is collected from a data source on the machine, aggregating the dataset collected from the data source at a receiving location, performing analytics on the dataset upon collection or aggregation, and/or writing the dataset aggregated at the receiving location to a storage location.Type: GrantFiled: September 8, 2010Date of Patent: July 14, 2015Assignee: Cloudera, Inc.Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
-
Publication number: 20150127608Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which a data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file including a listing of multiple file names in the snapshot manifest and reference information for locating the multiple files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is also disclosed in which log files are marked. The replaying of log entries can reduce the probability of causal consistency in the snapshot.Type: ApplicationFiled: October 29, 2014Publication date: May 7, 2015Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi