Patents by Inventor Jonathan Ming-Cyn Hsieh

Jonathan Ming-Cyn Hsieh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11768739
    Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which the data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file that lists multiple file names and provides reference information for locating those files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is disclosed, in which log files are marked. Replaying the log entries can reduce the probability of causal inconsistency in the snapshot.
    Type: Grant
    Filed: July 30, 2020
    Date of Patent: September 26, 2023
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
  • Publication number: 20200356447
    Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which the data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file that lists multiple file names and provides reference information for locating those files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is disclosed, in which log files are marked. Replaying the log entries can reduce the probability of causal inconsistency in the snapshot.
    Type: Application
    Filed: July 30, 2020
    Publication date: November 12, 2020
    Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
  • Publication number: 20200356448
    Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which the data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file that lists multiple file names and provides reference information for locating those files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is disclosed, in which log files are marked. Replaying the log entries can reduce the probability of causal inconsistency in the snapshot.
    Type: Application
    Filed: July 30, 2020
    Publication date: November 12, 2020
    Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
  • Patent number: 10776217
    Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which the data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file that lists multiple file names and provides reference information for locating those files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is disclosed, in which log files are marked. Replaying the log entries can reduce the probability of causal inconsistency in the snapshot.
    Type: Grant
    Filed: May 25, 2017
    Date of Patent: September 15, 2020
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
  • Patent number: 10187461
    Abstract: Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes identifying a data source in the system from which a dataset is to be collected; configuring a machine in the system that generates the dataset to send the dataset to the data source; identifying an arrival location where the collected dataset is to be aggregated or written; and/or configuring an agent node by specifying the data source as the agent node's source and the arrival location as its sink.
    Type: Grant
    Filed: April 13, 2016
    Date of Patent: January 22, 2019
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Patent number: 9842126
    Abstract: Systems and methods are disclosed for checking for region consistency and table integrity problems and automatically repairing a corrupted HBase cluster. The methods and systems operate in a diagnostic mode and in a diagnostic-and-repair mode. The methods include fixing table integrity problems, such as backwards table regions, table region holes, and table region overlaps, to restore the table integrity invariant. Once table integrity has been restored, each row key resolves to exactly one region. The methods further include fixing region inconsistencies, such as bad region assignments, regions missing from the meta table, and region information missing from the Hadoop Distributed File System (HDFS), to restore the region consistency invariant. The information in HDFS is taken as ground truth, and any meta table or assignment entries that are inconsistent with HDFS are deemed wrong and removed.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: December 12, 2017
    Assignee: Cloudera, Inc.
    Inventor: Jonathan Ming-Cyn Hsieh
  • Patent number: 9817867
    Abstract: Systems and methods of dynamically processing an event using an extensible data model are disclosed. One embodiment includes specifying attributes of the event in a data model, the data model being extensible so that properties can be added to the event as the dataset is streamed from the source to the sink.
    Type: Grant
    Filed: November 16, 2015
    Date of Patent: November 14, 2017
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Patent number: 9817859
    Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes one or more devices that generate log data, each device being associated with an agent node that collects the log data, wherein the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device associated with a collector tier having a collector node to which the agent node sends the log data, wherein the collector node determines the checksum for the batch of multiple messages received from the agent node.
    Type: Grant
    Filed: June 1, 2016
    Date of Patent: November 14, 2017
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Publication number: 20170262348
    Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which the data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file that lists multiple file names and provides reference information for locating those files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is disclosed, in which log files are marked. Replaying the log entries can reduce the probability of causal inconsistency in the snapshot.
    Type: Application
    Filed: May 25, 2017
    Publication date: September 14, 2017
    Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
  • Patent number: 9690671
    Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which the data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file that lists multiple file names and provides reference information for locating those files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is disclosed, in which log files are marked. Replaying the log entries can reduce the probability of causal inconsistency in the snapshot.
    Type: Grant
    Filed: October 29, 2014
    Date of Patent: June 27, 2017
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
  • Publication number: 20160275136
    Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes one or more devices that generate log data, each device being associated with an agent node that collects the log data, wherein the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device associated with a collector tier having a collector node to which the agent node sends the log data, wherein the collector node determines the checksum for the batch of multiple messages received from the agent node.
    Type: Application
    Filed: June 1, 2016
    Publication date: September 22, 2016
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Publication number: 20160226968
    Abstract: Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes identifying a data source in the system from which a dataset is to be collected; configuring a machine in the system that generates the dataset to send the dataset to the data source; identifying an arrival location where the collected dataset is to be aggregated or written; and/or configuring an agent node by specifying the data source as the agent node's source and the arrival location as its sink.
    Type: Application
    Filed: April 13, 2016
    Publication date: August 4, 2016
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Patent number: 9361203
    Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes one or more devices that generate log data, each device being associated with an agent node that collects the log data, wherein the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device associated with a collector tier having a collector node to which the agent node sends the log data, wherein the collector node determines the checksum for the batch of multiple messages received from the agent node.
    Type: Grant
    Filed: July 10, 2015
    Date of Patent: June 7, 2016
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Patent number: 9317572
    Abstract: Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes identifying a data source in the system from which a dataset is to be collected; configuring a machine in the system that generates the dataset to send the dataset to the data source; identifying an arrival location where the collected dataset is to be aggregated or written; and/or configuring an agent node by specifying the data source as the agent node's source and the arrival location as its sink.
    Type: Grant
    Filed: September 8, 2010
    Date of Patent: April 19, 2016
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Publication number: 20160070760
    Abstract: Systems and methods of dynamically processing an event using an extensible data model are disclosed. One embodiment includes specifying attributes of the event in a data model, the data model being extensible so that properties can be added to the event as the dataset is streamed from the source to the sink.
    Type: Application
    Filed: November 16, 2015
    Publication date: March 10, 2016
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Patent number: 9201910
    Abstract: Systems and methods of dynamically processing an event using an extensible data model are disclosed. One embodiment includes specifying attributes of the event in a data model, the data model being extensible so that properties can be added to the event as the dataset is streamed from the source to the sink.
    Type: Grant
    Filed: August 18, 2014
    Date of Patent: December 1, 2015
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Publication number: 20150317231
    Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes one or more devices that generate log data, each device being associated with an agent node that collects the log data, wherein the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device associated with a collector tier having a collector node to which the agent node sends the log data, wherein the collector node determines the checksum for the batch of multiple messages received from the agent node.
    Type: Application
    Filed: July 10, 2015
    Publication date: November 5, 2015
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Patent number: 9081888
    Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes one or more devices that generate log data, each device being associated with an agent node that collects the log data, wherein the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device associated with a collector tier having a collector node to which the agent node sends the log data, wherein the collector node determines the checksum for the batch of multiple messages received from the agent node.
    Type: Grant
    Filed: September 8, 2010
    Date of Patent: July 14, 2015
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Patent number: 9082127
    Abstract: Systems and methods for facilitating the collection and aggregation of machine-generated or user-generated datasets for analysis are disclosed. One embodiment includes collecting a dataset on a machine on which the dataset is received or generated, wherein the dataset is collected from a data source on the machine; aggregating the dataset collected from the data source at a receiving location; performing analytics on the dataset upon collection or aggregation; and/or writing the dataset aggregated at the receiving location to a storage location.
    Type: Grant
    Filed: September 8, 2010
    Date of Patent: July 14, 2015
    Assignee: Cloudera, Inc.
    Inventors: Jonathan Ming-Cyn Hsieh, Henry Noel Robinson
  • Publication number: 20150127608
    Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which the data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file that lists multiple file names and provides reference information for locating those files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. Additionally, a log roll approach to creating the snapshot is disclosed, in which log files are marked. Replaying the log entries can reduce the probability of causal inconsistency in the snapshot.
    Type: Application
    Filed: October 29, 2014
    Publication date: May 7, 2015
    Inventors: Jonathan Ming-Cyn Hsieh, Matteo Bertozzi
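The manifest-based snapshot summarized for patent 11768739 (and grants 10776217 and 9690671) can be illustrated with a minimal Python sketch. It assumes a table's data lives as named files spread over several region servers and shows only the master-side bookkeeping: find the nodes that store the table, then record each file's name and location in a manifest instead of copying the data. The classes `StoreFileRef`, `SnapshotManifest`, `RegionServer`, the function `create_snapshot`, and the path layout are hypothetical names for illustration, not Cloudera's or HBase's API; online-mode coordination and log-roll marking are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class StoreFileRef:
    """Points at an existing data file; the snapshot records it instead of copying it."""
    file_name: str
    location: str  # hypothetical path layout in the distributed file system


@dataclass
class SnapshotManifest:
    snapshot_name: str
    table: str
    files: list[StoreFileRef] = field(default_factory=list)

    def file_names(self) -> list[str]:
        return [ref.file_name for ref in self.files]


class RegionServer:
    """Hypothetical stand-in for a slave node holding files for one or more tables."""

    def __init__(self, name: str, files_by_table: dict[str, list[str]]):
        self.name = name
        self.files_by_table = files_by_table

    def hosts(self, table: str) -> bool:
        return table in self.files_by_table

    def list_files(self, table: str) -> list[StoreFileRef]:
        return [StoreFileRef(f, f"/{self.name}/{table}/{f}")
                for f in self.files_by_table[table]]


def create_snapshot(snapshot_name: str, table: str,
                    servers: list[RegionServer]) -> SnapshotManifest:
    """Master-side step: find the nodes storing the table and list their files in one manifest."""
    manifest = SnapshotManifest(snapshot_name, table)
    for server in servers:
        if server.hosts(table):
            manifest.files.extend(server.list_files(table))
    return manifest


if __name__ == "__main__":
    servers = [
        RegionServer("rs1", {"users": ["a.hfile", "b.hfile"]}),
        RegionServer("rs2", {"orders": ["c.hfile"]}),
        RegionServer("rs3", {"users": ["d.hfile"]}),
    ]
    manifest = create_snapshot("users-snap-1", "users", servers)
    print(manifest.file_names())  # ['a.hfile', 'b.hfile', 'd.hfile']
```

In this sketch the manifest holds references rather than copies, so building it costs work proportional to the number of files rather than the volume of data.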
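The fault-tolerance check summarized for patent 9817859 (and grants 9361203 and 9081888) can be sketched as follows: the agent node tags a batch of log messages and computes a checksum over it, and the collector node recomputes the checksum on arrival and compares. This minimal sketch assumes SHA-256 as the checksum and length-prefixes each message so that message boundaries affect the digest; the abstract does not name a checksum algorithm, and `Batch`, `batch_checksum`, `make_batch`, and `verify_batch` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Batch:
    tag: str               # identifier the agent assigns to the batch (format is an assumption)
    messages: list[bytes]
    checksum: str          # hex digest computed by the agent


def batch_checksum(messages: list[bytes]) -> str:
    h = hashlib.sha256()
    for msg in messages:
        # Length-prefix each message so [b"ab", b"c"] and [b"a", b"bc"] digest differently.
        h.update(len(msg).to_bytes(8, "big"))
        h.update(msg)
    return h.hexdigest()


def make_batch(tag: str, messages: list[bytes]) -> Batch:
    """Agent side: group log messages into a tagged batch and attach a checksum."""
    return Batch(tag, list(messages), batch_checksum(messages))


def verify_batch(batch: Batch) -> bool:
    """Collector side: recompute the checksum and compare it with the one the agent sent."""
    return batch_checksum(batch.messages) == batch.checksum


if __name__ == "__main__":
    batch = make_batch("agent1-0001", [b"GET /index.html 200", b"GET /missing 404"])
    print(verify_batch(batch))                # True
    batch.messages[1] = b"GET /missing 200"   # simulate corruption in transit
    print(verify_batch(batch))                # False
```

Under this sketch, a collector that gets `False` could ask the agent to resend the batch identified by its tag; the abstract itself states only that the collector determines the checksum.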
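The configuration step summarized for patent 9317572 (and grant 10187461) binds an agent node's source to the data source and its sink to the arrival location. A minimal sketch, assuming the source and sink can be modeled as plain Python callables; `AgentNode` and `configure_agent` are hypothetical names, not the configuration syntax of any particular collection tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class AgentNode:
    name: str
    source: Callable[[], Iterable[str]]  # where the agent reads events from
    sink: Callable[[str], None]          # arrival location the agent writes events to

    def run_once(self) -> int:
        """Drain the source once, forwarding every event to the sink; return the count."""
        count = 0
        for event in self.source():
            self.sink(event)
            count += 1
        return count


def configure_agent(name: str,
                    data_source: Callable[[], Iterable[str]],
                    arrival_location: Callable[[str], None]) -> AgentNode:
    """Specify the data source as the agent's source and the arrival location as its sink."""
    return AgentNode(name=name, source=data_source, sink=arrival_location)


if __name__ == "__main__":
    app_log = ["INFO app started", "ERROR disk full"]  # stands in for a machine's log output
    collected: list[str] = []                           # stands in for the arrival location
    agent = configure_agent("agent1", lambda: list(app_log), collected.append)
    print(agent.run_once())  # 2
    print(collected)         # ['INFO app started', 'ERROR disk full']
```

Keeping the source and sink as separate, swappable fields mirrors the abstract's point that the agent is configured by specifying both ends rather than by hard-coding them.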
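The extensible data model summarized for patent 9201910 (and grant 9817867) lets properties be added to an event while it streams from source to sink. A minimal sketch, assuming an event has a fixed body and timestamp plus an open-ended attribute map, and that each processing stage between source and sink is a function that may attach new attributes; `Event`, `set_attribute`, `tag_errors`, and `stream` are hypothetical names, and the error-tagging rule is invented for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class Event:
    body: bytes
    timestamp: float = field(default_factory=time.time)
    # Extensible part of the model: any stage may attach new properties in transit.
    attributes: dict[str, str] = field(default_factory=dict)


Stage = Callable[[Event], Event]


def set_attribute(key: str, value: str) -> Stage:
    """Build a stage that adds a fixed property, e.g. the host the event passed through."""
    def stage(event: Event) -> Event:
        event.attributes[key] = value
        return event
    return stage


def tag_errors(event: Event) -> Event:
    """Derive a property from the body (a hypothetical rule for illustration)."""
    if b"ERROR" in event.body:
        event.attributes["severity"] = "error"
    return event


def stream(events: Iterable[Event], stages: list[Stage]) -> Iterator[Event]:
    """Pass each event through the stages that sit between the source and the sink."""
    for event in events:
        for stage in stages:
            event = stage(event)
        yield event


if __name__ == "__main__":
    source = [Event(b"ERROR disk full"), Event(b"request ok")]
    out = list(stream(source, [set_attribute("host", "web01"), tag_errors]))
    print([e.attributes for e in out])
    # [{'host': 'web01', 'severity': 'error'}, {'host': 'web01'}]
```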
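The table integrity invariant described for patent 9842126 (every row key resolves to exactly one region) can be checked with a sort-and-sweep over region key ranges. This sketch follows the HBase convention that a region covers the half-open range [start key, end key) and that an empty key marks the start or end of the table; it only diagnoses backwards regions, holes, and overlaps and attempts no repair. `Region` and `check_table_integrity` are hypothetical names, not the code of HBase's hbck tool.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    start: bytes  # inclusive start key; b"" means the start of the table
    end: bytes    # exclusive end key; b"" means the end of the table


def check_table_integrity(regions: list[Region]) -> list[tuple]:
    """Report backwards regions, holes, and overlaps; an empty result means
    every row key falls in exactly one region."""
    problems: list[tuple] = []
    ordered = []
    for r in regions:
        if r.end and r.end < r.start:
            problems.append(("backwards", r.name))
        else:
            ordered.append(r)
    ordered.sort(key=lambda reg: reg.start)

    # Keys below covered_to are already claimed; None means the whole keyspace is claimed.
    covered_to: Optional[bytes] = b""
    for r in ordered:
        if covered_to is None or r.start < covered_to:
            problems.append(("overlap", r.name))
        elif r.start > covered_to:
            problems.append(("hole", covered_to, r.start))
        if covered_to is not None:
            covered_to = None if r.end == b"" else max(covered_to, r.end)
    if covered_to is not None:
        problems.append(("hole", covered_to, None))  # keys after the last region are unassigned
    return problems


if __name__ == "__main__":
    healthy = [Region("r1", b"", b"m"), Region("r2", b"m", b"")]
    print(check_table_integrity(healthy))  # []
    broken = [Region("r1", b"", b"f"), Region("r2", b"h", b"p"),
              Region("r3", b"k", b""), Region("r4", b"z", b"q")]
    print(check_table_integrity(broken))
    # [('backwards', 'r4'), ('hole', b'f', b'h'), ('overlap', 'r3')]
```

Sorting by start key first means a single pass can detect both gaps and double coverage, because each region only needs to be compared with how far the earlier regions already reach.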
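The four stages summarized for patent 9082127 (collect on the machine, aggregate at a receiving location, perform analytics, write to a storage location) can be laid out as a small pipeline. In this minimal sketch the data sources are in-memory lists keyed by machine name, the analytic is a hypothetical count of log levels taken from each line's first word, and the storage location is a plain list; `collect`, `aggregate`, `analyze`, and `write` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Iterator


def collect(machine_logs: dict[str, list[str]]) -> Iterator[tuple[str, str]]:
    """Collect on each machine: read lines from its local data source, tagged with the machine name."""
    for machine, lines in machine_logs.items():
        for line in lines:
            yield machine, line


def aggregate(records: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Aggregate at the receiving location: merge records from every machine into one list."""
    return list(records)


def analyze(records: list[tuple[str, str]]) -> Counter:
    """Hypothetical analytic: count records by log level, taken as the first word of each line."""
    return Counter(line.split(maxsplit=1)[0] for _, line in records if line.strip())


def write(records: list[tuple[str, str]], storage: list) -> None:
    """Write the aggregated records to a storage location (a plain list here)."""
    storage.extend(records)


if __name__ == "__main__":
    logs = {"web01": ["INFO start", "ERROR disk full"], "web02": ["INFO start"]}
    records = aggregate(collect(logs))
    print(analyze(records))  # Counter({'INFO': 2, 'ERROR': 1})
    storage: list = []
    write(records, storage)
    print(len(storage))      # 3
```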