Patents Assigned to StreamSets, Inc.
  • Patent number: 11966316
    Abstract: Systems and methods herein describe receiving identification from a data pipeline, accessing first data offset information for a first data origin and second data offset information for a second data origin, bisecting the first data origin using the first data offset information, processing the data pipeline with the bisected first data offset information and the second data offset information, receiving a notification indicating a data pipeline status, and causing presentation of the notification on a graphical user interface of a computing device.
    Type: Grant
    Filed: November 22, 2022
    Date of Patent: April 23, 2024
    Assignee: StreamSets, Inc.
    Inventor: Hari Shreedharan
  • Patent number: 11947779
    Abstract: Systems and methods herein describe accessing a data processing pipeline, causing presentation of the data processing pipeline on a graphical user interface of a computing device, receiving a selection of a first user interface element within the graphical user interface, generating a datafile representing the data processing pipeline, submitting the datafile and an application to a software framework using an application programming interface, receiving, from the application, the generated datasets, applying the data operations to the data processing pipeline, collecting performance data metrics from the data processing pipeline, and dynamically updating the graphical user interface with the collected performance data metrics.
    Type: Grant
    Filed: April 24, 2023
    Date of Patent: April 2, 2024
    Assignee: StreamSets, Inc.
    Inventors: Hari Shreedharan, Arvind Prabhakar
  • Patent number: 11775371
    Abstract: Systems and methods are directed to remote validation and preview. An example system receives an indication of a portion of the data pipeline to be processed, generates a data pipeline configuration file describing operations in the portion of the data pipeline, causes a software framework to perform operations corresponding to the portion of the data pipeline, receives results of the operations corresponding to the portion of the data pipeline, and causes presentation of the results on a graphical user interface of a computing device.
    Type: Grant
    Filed: May 22, 2020
    Date of Patent: October 3, 2023
    Assignee: StreamSets, Inc.
    Inventor: Madhukar Devaraju
  • Patent number: 11734235
    Abstract: In various example embodiments, a system, computer-readable medium, and method are described for a schema update engine that dynamically updates a target data storage system. Incoming data records are received. A front-end schema of the incoming data records is identified. The front-end schema and the current target schema are compared. Based on identifying a difference between the front-end schema and the current target schema, the current target schema is updated to be identical to the front-end schema. The current target data file is closed, and the incoming data records are stored in a new target data file according to the updated target schema.
    Type: Grant
    Filed: May 27, 2021
    Date of Patent: August 22, 2023
    Assignee: StreamSets, Inc.
    Inventors: Arvind Prabhakar, Alejandro Abdelnur, Madhukar Devaraju
  • Patent number: 11662882
    Abstract: Systems and methods herein describe accessing a data processing pipeline, causing presentation of the data processing pipeline on a graphical user interface of a computing device, receiving a selection of a first user interface element within the graphical user interface, generating a datafile representing the data processing pipeline, submitting the datafile and an application to a software framework using an application programming interface, receiving, from the application, the generated datasets, applying the data operations to the data processing pipeline, collecting performance data metrics from the data processing pipeline, and dynamically updating the graphical user interface with the collected performance data metrics.
    Type: Grant
    Filed: April 22, 2020
    Date of Patent: May 30, 2023
    Assignee: StreamSets, Inc.
    Inventors: Hari Shreedharan, Arvind Prabhakar
  • Patent number: 11630840
    Abstract: Systems and methods herein describe embodiments for handling data drift. An example system accesses a data pipeline comprising a plurality of stages. For each stage of the plurality of stages in the data pipeline, the system identifies stage schema fields for processing data in the data pipeline and generates a set of stage schema fields comprising the identified stage schema fields in the stage. In response to detecting an origin stage, the system generates a set of pipeline schema fields, whereby the set of pipeline schema fields comprises a union of the generated sets of stage schema fields. The set of pipeline schema fields is then stored.
    Type: Grant
    Filed: May 22, 2020
    Date of Patent: April 18, 2023
    Assignee: StreamSets, Inc.
    Inventor: Hari Shreedharan
  • Patent number: 11526415
    Abstract: Systems and methods herein describe receiving identification from a data pipeline, accessing first data offset information for a first data origin and second data offset information for a second data origin, bisecting the first data origin using the first data offset information, processing the data pipeline with the bisected first data offset information and the second data offset information, receiving a notification indicating a data pipeline status, and causing presentation of the notification on a graphical user interface of a computing device.
    Type: Grant
    Filed: April 22, 2020
    Date of Patent: December 13, 2022
    Assignee: StreamSets, Inc.
    Inventor: Hari Shreedharan
  • Patent number: 11200208
    Abstract: Systems and methods herein describe accessing an original change data capture (CDC) dataset comprising information describing changes to a source database, the original CDC dataset comprising a plurality of entries; identifying a first entry of the plurality of entries comprising a primary-key, a first operation and entry data; identifying a set of entries in the plurality of entries that includes the primary-key; comparing the first operation of the first entry with a second operation of a second entry in the set of entries; updating the first operation and the entry data based on the comparison; generating a new entry based on the updating of the first operation and the entry data; storing the new entry in a consolidated CDC dataset; and applying the consolidated CDC dataset to a target database.
    Type: Grant
    Filed: January 9, 2020
    Date of Patent: December 14, 2021
    Assignee: StreamSets, Inc.
    Inventor: Alejandro Humberto Abdelnur
  • Patent number: 11048673
    Abstract: In various example embodiments, a system, computer-readable medium, and method are described for a schema update engine that dynamically updates a target data storage system. Incoming data records are received. A front-end schema of the incoming data records is identified. The front-end schema and the current target schema are compared. Based on identifying a difference between the front-end schema and the current target schema, the current target schema is updated to be identical to the front-end schema. The current target data file is closed, and the incoming data records are stored in a new target data file according to the updated target schema.
    Type: Grant
    Filed: June 15, 2018
    Date of Patent: June 29, 2021
    Assignee: StreamSets, Inc.
    Inventors: Arvind Prabhakar, Alejandro Abdelnur, Madhukar Devaraju
  • Patent number: 10678660
    Abstract: In various example embodiments, a system, computer-readable medium, and method detect and dynamically correct a transformation drift in a data pipeline. The method comprises detecting a change in a transformation performed by an upstream subsystem of the data pipeline on a data field of an output dataset of the upstream subsystem; classifying the data field as an impacted data field; identifying, based on topology information, a downstream subsystem of the data pipeline downstream of the upstream subsystem; identifying an input dataset of the downstream subsystem including the impacted data field; and performing a corrective transformation on the impacted data field of the input dataset of the downstream subsystem.
    Type: Grant
    Filed: June 26, 2018
    Date of Patent: June 9, 2020
    Assignee: StreamSets, Inc.
    Inventor: Rupal Jatinkumar Shah
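
Illustrative note on patent 11200208 (CDC consolidation): the abstract describes grouping change data capture entries by primary key, comparing the operations of entries that share a key, and emitting one consolidated entry per key. The Python sketch below shows one way such a consolidation could work. It is not the patented method. The names CdcEntry, MERGE_RULES, and consolidate are hypothetical, and the merge rules themselves (for example, treating an INSERT followed by a DELETE as cancelling out) are assumptions chosen for illustration, since the abstract does not state them.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

# Hypothetical merge rules (an assumption, not taken from the patent):
# (earlier operation, later operation) -> net operation,
# or None when the two operations cancel each other out.
MERGE_RULES = {
    ("INSERT", "UPDATE"): "INSERT",
    ("INSERT", "DELETE"): None,
    ("UPDATE", "UPDATE"): "UPDATE",
    ("UPDATE", "DELETE"): "DELETE",
    ("DELETE", "INSERT"): "UPDATE",
}


@dataclass
class CdcEntry:
    """One change record: the row's key, the operation, and the changed fields."""
    primary_key: str
    operation: str
    data: dict = field(default_factory=dict)


def consolidate(entries: Iterable[CdcEntry]) -> List[CdcEntry]:
    """Collapse an ordered CDC log into at most one net entry per primary key."""
    net: Dict[str, Optional[CdcEntry]] = {}
    for entry in entries:
        pk = entry.primary_key
        previous = net.get(pk)
        if previous is None:
            # First entry for this key, or the earlier entries cancelled out.
            net[pk] = CdcEntry(pk, entry.operation, dict(entry.data))
            continue
        pair = (previous.operation, entry.operation)
        if pair not in MERGE_RULES:
            raise ValueError(f"unsupported sequence {pair} for key {pk!r}")
        net_op = MERGE_RULES[pair]
        if net_op is None:
            net[pk] = None  # e.g. INSERT then DELETE: nothing to apply
        elif net_op == "DELETE":
            net[pk] = CdcEntry(pk, "DELETE")
        elif previous.operation == "DELETE":
            # Row was removed and re-created: keep only the new row's data.
            net[pk] = CdcEntry(pk, net_op, dict(entry.data))
        else:
            # Later field values override earlier ones.
            net[pk] = CdcEntry(pk, net_op, {**previous.data, **entry.data})
    return [e for e in net.values() if e is not None]


if __name__ == "__main__":
    log = [
        CdcEntry("1", "INSERT", {"name": "a"}),
        CdcEntry("1", "UPDATE", {"name": "b"}),
        CdcEntry("2", "INSERT", {"qty": 5}),
        CdcEntry("2", "DELETE"),
    ]
    for e in consolidate(log):
        print(e.primary_key, e.operation, e.data)
```

Under these assumed rules, the consolidated list can be applied to a target database with fewer writes than replaying the full log, which matches the final step the abstract describes.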