Patents by Inventor Yeh-Heng Sheng

Yeh-Heng Sheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11386108
    Abstract: Mining data transformation flows in spreadsheets includes identifying operations defined in a spreadsheet, identifying source data, in the spreadsheet, on which the operations operate, automatically creating an extract, transform, load (ETL) data transformation flow, and executing the created ETL data transformation flow. Creating the ETL data transformation flow includes selecting, in the ETL system, source data endpoint(s) for data extraction, selecting target data endpoint(s) for data loading, mapping at least one of the identified operations to ETL operation(s) for data transformation, and building the ETL data transformation flow, which defines extraction from the selected source data endpoint(s), transformation based on the ETL operation(s), and loading to the selected target data endpoint(s).
    Type: Grant
    Filed: December 4, 2018
    Date of Patent: July 12, 2022
    Assignee: International Business Machines Corporation
    Inventors: Yeh-Heng Sheng, Xiaoyan Pu, Yong Li, Ryan Pham
  • Patent number: 11144566
    Abstract: A distributed network of participating Extract Transform Load (ETL) servers is received. Data source mappings are generated for the distributed network, where the data source mappings indicate which participating ETL servers in the distributed network have access to which tables in data sources. Network metrics are obtained that indicate, for each pair of participating ETL servers, an average data transmission speed and a unit cost. Data source metrics are obtained for the tables in the data sources. A link mappings table is generated that lists mappings of each link to a network in between participating ETL servers. A plurality of distributed execution plans are generated using the network metrics, the data source metrics, and the link mappings table. An execution plan is selected from the plurality of execution plans according to an optimization criteria. The selected execution plan is executed.
    Type: Grant
    Filed: September 12, 2018
    Date of Patent: October 12, 2021
    Assignee: International Business Machines Corporation
    Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
  • Patent number: 10929417
    Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: February 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
  • Patent number: 10915544
    Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
    Type: Grant
    Filed: August 3, 2016
    Date of Patent: February 9, 2021
    Assignee: International Business Machines Corporation
    Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
  • Patent number: 10769300
    Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: September 8, 2020
    Assignee: International Business Machines Corporation
    Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Patent number: 10762234
    Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
    Type: Grant
    Filed: March 8, 2018
    Date of Patent: September 1, 2020
    Assignee: International Business Machines Corporation
    Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20200175027
    Abstract: Mining data transformation flows in spreadsheets includes identifying operations defined in a spreadsheet, identifying source data, in the spreadsheet, on which the operations operate, automatically creating an extract, transform, load (ETL) data transformation flow, and executing the created ETL data transformation flow. Creating the ETL data transformation flow includes selecting, in the ETL system, source data endpoint(s) for data extraction, selecting target data endpoint(s) for data loading, mapping at least one of the identified operations to ETL operation(s) for data transformation, and building the ETL data transformation flow, which defines extraction from the selected source data endpoint(s), transformation based on the ETL operation(s), and loading to the selected target data endpoint(s).
    Type: Application
    Filed: December 4, 2018
    Publication date: June 4, 2020
    Inventors: Yeh-Heng Sheng, Xiaoyan Pu, Yong Li, Ryan Pham
  • Patent number: 10606939
    Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on user input to identify one or more candidate transformation operations within a transformation operation data structure that matches the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations is provided that includes the one or more candidate transformation operations to the user via the data processing system.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: March 31, 2020
    Assignee: International Business Machines Corporation
    Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20190318123
    Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
    Type: Application
    Filed: June 26, 2019
    Publication date: October 17, 2019
    Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20190278938
    Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
    Type: Application
    Filed: March 8, 2018
    Publication date: September 12, 2019
    Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20190258703
    Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on user input to identify one or more candidate transformation operations within a transformation operation data structure that matches the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations is provided that includes the one or more candidate transformation operations to the user via the data processing system.
    Type: Application
    Filed: February 19, 2018
    Publication date: August 22, 2019
    Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20190258705
    Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on user input to identify one or more candidate transformation operations within a transformation operation data structure that matches the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations is provided that includes the one or more candidate transformation operations to the user via the data processing system.
    Type: Application
    Filed: October 31, 2018
    Publication date: August 22, 2019
    Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Patent number: 10387554
    Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on user input to identify one or more candidate transformation operations within a transformation operation data structure that matches the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations is provided that includes the one or more candidate transformation operations to the user via the data processing system.
    Type: Grant
    Filed: February 19, 2018
    Date of Patent: August 20, 2019
    Assignee: International Business Machines Corporation
    Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20190079981
    Abstract: A distributed network of participating Extract Transform Load (ETL) servers is received. Data source mappings are generated for the distributed network, where the data source mappings indicate which participating ETL servers in the distributed network have access to which tables in data sources. Network metrics are obtained that indicate, for each pair of participating ETL servers, an average data transmission speed and a unit cost. Data source metrics are obtained for the tables in the data sources. A link mappings table is generated that lists mappings of each link to a network in between participating ETL servers. A plurality of distributed execution plans are generated using the network metrics, the data source metrics, and the link mappings table. An execution plan is selected from the plurality of execution plans according to an optimization criteria. The selected execution plan is executed.
    Type: Application
    Filed: September 12, 2018
    Publication date: March 14, 2019
    Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
  • Patent number: 10120918
    Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments. The job segments each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems. The job segments are distributed to the participating ETL servers based on the mappings for parallel execution. Also, the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems.
    Type: Grant
    Filed: June 7, 2016
    Date of Patent: November 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
  • Patent number: 10108683
    Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments. The job segments each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems. The job segments are distributed to the participating ETL servers based on the mappings for parallel execution. Also, the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems.
    Type: Grant
    Filed: April 24, 2015
    Date of Patent: October 23, 2018
    Assignee: International Business Machines Corporation
    Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
  • Publication number: 20170075966
    Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
    Type: Application
    Filed: August 3, 2016
    Publication date: March 16, 2017
    Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20170075964
    Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
    Type: Application
    Filed: September 11, 2015
    Publication date: March 16, 2017
    Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20160314175
    Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems. Each of the job segments is distributed to the participating ETL servers based on the mappings for parallel execution.
    Type: Application
    Filed: April 24, 2015
    Publication date: October 27, 2016
    Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
  • Publication number: 20160314176
    Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems. Each of the job segments is distributed to the participating ETL servers based on the mappings for parallel execution.
    Type: Application
    Filed: June 7, 2016
    Publication date: October 27, 2016
    Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng