Patents by Inventor Yeh-Heng Sheng

Yeh-Heng Sheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Mining data transformation flows in spreadsheets

Patent number: 11386108

Abstract: Mining data transformation flows in spreadsheets includes identifying operations defined in a spreadsheet, identifying source data, in the spreadsheet, on which the operations operate, automatically creating an extract, transform, load (ETL) data transformation flow, and executing the created ETL data transformation flow. Creating the ETL data transformation flow includes selecting, in the ETL system, source data endpoint(s) for data extraction, selecting target data endpoint(s) for data loading, mapping at least one of the identified operations to ETL operation(s) for data transformation, and building the ETL data transformation flow, which defines extraction from the selected source data endpoint(s), transformation based on the ETL operation(s), and loading to the selected target data endpoint(s).

Type: Grant

Filed: December 4, 2018

Date of Patent: July 12, 2022

Assignee: International Business Machines Corporation

Inventors: Yeh-Heng Sheng, Xiaoyan Pu, Yong Li, Ryan Pham
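
The abstract names the steps of the method but not how each one is carried out. The sketch below is a rough illustration only, not the patented implementation. It assumes cells hold simple single-function formulas such as `=SUM(A1:A10)`. The mapping table `SPREADSHEET_TO_ETL`, the ETL operation names, the function `mine_flow`, and the endpoint strings are all hypothetical. It covers identifying operations and the source ranges they read, then mapping them to ETL steps between a chosen source and target endpoint. It does not execute the flow.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mapping from spreadsheet functions to generic ETL operation names.
SPREADSHEET_TO_ETL = {
    "SUM": "aggregate_sum",
    "AVERAGE": "aggregate_avg",
    "UPPER": "to_upper",
    "TRIM": "trim",
}

# Matches formulas of the form =FUNC(A1) or =FUNC(A1:B10) only.
FORMULA = re.compile(r"^=([A-Z]+)\(([A-Z]+\d+(?::[A-Z]+\d+)?)\)$")


@dataclass
class EtlFlow:
    source: str  # endpoint data is extracted from
    target: str  # endpoint data is loaded into
    steps: list = field(default_factory=list)  # ordered transformation steps


def mine_flow(formulas, source_endpoint, target_endpoint):
    """Build an ETL flow description from a {cell: formula} dictionary."""
    flow = EtlFlow(source=source_endpoint, target=target_endpoint)
    for cell, formula in formulas.items():
        match = FORMULA.match(formula)
        if not match:
            continue  # constants or complex formulas are skipped in this sketch
        func, source_range = match.groups()
        etl_op = SPREADSHEET_TO_ETL.get(func)
        if etl_op is not None:
            flow.steps.append({"op": etl_op, "input": source_range, "output": cell})
    return flow


flow = mine_flow(
    {"B1": "=SUM(A1:A10)", "C1": "=UPPER(D2)", "E1": "42"},
    source_endpoint="db://sales",
    target_endpoint="warehouse://summary",
)
```

Here the constant in `E1` is ignored, while the two formulas become an aggregation step over `A1:A10` and an upper-casing step over `D2`.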

Distributed balanced optimization for an Extract, Transform, and Load (ETL) job

Patent number: 11144566

Abstract: A distributed network of participating Extract Transform Load (ETL) servers is received. Data source mappings are generated for the distributed network, where the data source mappings indicate which participating ETL servers in the distributed network have access to which tables in data sources. Network metrics are obtained that indicate, for each pair of participating ETL servers, an average data transmission speed and a unit cost. Data source metrics are obtained for the tables in the data sources. A link mappings table is generated that maps each link to a network between participating ETL servers. A plurality of distributed execution plans are generated using the network metrics, the data source metrics, and the link mappings table. An execution plan is selected from the plurality of execution plans according to an optimization criterion. The selected execution plan is executed.

Type: Grant

Filed: September 12, 2018

Date of Patent: October 12, 2021

Assignee: International Business Machines Corporation

Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
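
The plan generation and selection steps in the abstract can be illustrated with a small brute-force sketch. It is an illustration under stated assumptions, not the patented algorithm. The data source mappings, table sizes (standing in for data source metrics), and per-pair speed and unit cost are invented values. A "plan" here is simply a choice of which server reads each table plus one server that runs the job. The link mappings table is left out. The optimization criterion is either total transfer time or total transfer cost.

```python
from itertools import product

# Hypothetical data source mappings: table -> servers that can read it.
ACCESS = {"orders": ["s1", "s2"], "customers": ["s2"]}
# Hypothetical data source metrics: table size in MB.
TABLE_MB = {"orders": 800, "customers": 50}
# Hypothetical network metrics: (from, to) -> (avg speed in MB/s, unit cost per MB).
NETWORK = {("s1", "s2"): (100.0, 0.01), ("s2", "s1"): (100.0, 0.01)}
SERVERS = ["s1", "s2"]


def plan_metrics(readers, executor, table_mb, network):
    """Return (transfer seconds, transfer cost) for one candidate plan."""
    time_s = cost = 0.0
    for table, reader in readers.items():
        if reader == executor:
            continue  # local read: no network transfer
        speed, unit_cost = network[(reader, executor)]
        time_s += table_mb[table] / speed
        cost += table_mb[table] * unit_cost
    return time_s, cost


def generate_plans(access, servers):
    """Enumerate every combination of reader servers and executing server."""
    tables = sorted(access)
    for choice in product(*(access[t] for t in tables)):
        for executor in servers:
            yield dict(zip(tables, choice)), executor


def select_plan(access, table_mb, network, servers, criterion="time"):
    """Pick the plan that minimises the chosen criterion ("time" or "cost")."""
    index = 0 if criterion == "time" else 1
    return min(
        generate_plans(access, servers),
        key=lambda plan: plan_metrics(plan[0], plan[1], table_mb, network)[index],
    )
```

Exhaustive enumeration is only practical for a handful of tables and servers. A real optimizer would need pruning or heuristics, which the abstract does not describe.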

Transforming and loading data utilizing in-memory processing

Patent number: 10929417

Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.

Type: Grant

Filed: September 11, 2015

Date of Patent: February 23, 2021

Assignee: International Business Machines Corporation

Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
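
One idea in this abstract is optimizing a data flow so that some operations run on the source data store, while the rest run in memory on partitioned data sets. The sketch below shows that idea as a simple predicate pushdown. It is an assumption-laden illustration, not the patented method. `Op`, `push_down`, `run_flow`, the `source_sql` strings, and the fake `load_source` are all hypothetical. Python threads over list partitions stand in for distributed in-memory data sets.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Op:
    kind: str                         # "filter" or "map"
    fn: Callable                      # applied in memory if not pushed down
    source_sql: Optional[str] = None  # equivalent predicate the source store can evaluate


def push_down(ops):
    """Move leading filters that have a source-side equivalent to the source."""
    pushed, i = [], 0
    while i < len(ops) and ops[i].kind == "filter" and ops[i].source_sql:
        pushed.append(ops[i].source_sql)
        i += 1
    return pushed, ops[i:]


def apply_ops(partition, ops):
    """Apply the remaining operations to one in-memory partition."""
    for op in ops:
        if op.kind == "filter":
            partition = [row for row in partition if op.fn(row)]
        else:
            partition = [op.fn(row) for row in partition]
    return partition


def run_flow(load_source, ops, partitions=2):
    pushed, remaining = push_down(ops)
    rows = load_source(pushed)  # the source evaluates the pushed predicates
    chunks = [rows[i::partitions] for i in range(partitions)]
    # Threads stand in for parallel workers; output order is not preserved.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = pool.map(apply_ops, chunks, [remaining] * partitions)
    return [row for chunk in results for row in chunk]


# Hypothetical source table and a source that understands one predicate.
TABLE = [{"id": i, "amount": i * 10} for i in range(6)]


def load_source(predicates):
    rows = TABLE
    if "amount >= 20" in predicates:
        rows = [r for r in rows if r["amount"] >= 20]
    return rows


ops = [
    Op("filter", lambda r: r["amount"] >= 20, source_sql="amount >= 20"),
    Op("map", lambda r: r["id"]),
]
```

Only leading filters are pushed down, so the operation order of the original flow is preserved.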

Transforming and loading data utilizing in-memory processing

Patent number: 10915544

Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.

Type: Grant

Filed: August 3, 2016

Date of Patent: February 9, 2021

Assignee: International Business Machines Corporation

Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng

Data processing in a hybrid cluster environment

Patent number: 10769300

Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.

Type: Grant

Filed: June 26, 2019

Date of Patent: September 8, 2020

Assignee: International Business Machines Corporation

Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
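
The registration and deployment sequence in the abstract can be modeled roughly as follows. This is a hypothetical sketch. The abstract does not say how the target cluster is chosen, so the rule used here is an assumption: deploy to a registered private cluster that holds the job's data and belongs to the same customer, otherwise use the public cluster. `Registration`, `HybridEnvironment`, and the injected `transfer` callable are invented names. `transfer` stands in for shipping the job through the file transfer server and collecting job status and logs.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    customer_id: str
    cluster_name: str
    file_transfer_server: str  # host used to ship jobs and return status/logs


@dataclass
class HybridEnvironment:
    public_cluster: str
    private: dict = field(default_factory=dict)  # cluster name -> Registration

    def register(self, reg):
        """Add a new private cluster to the hybrid environment."""
        self.private[reg.cluster_name] = reg

    def choose_target(self, job):
        # Assumed rule: run where the data lives, if that cluster is the customer's.
        reg = self.private.get(job["data_location"])
        if reg is not None and reg.customer_id == job["customer_id"]:
            return reg.cluster_name
        return self.public_cluster

    def deploy(self, job, transfer):
        """Deploy the job and return (target cluster, status reported back)."""
        target = self.choose_target(job)
        if target == self.public_cluster:
            return target, None  # public-cluster execution is out of scope here
        reg = self.private[target]
        return target, transfer(reg.file_transfer_server, job)


env = HybridEnvironment(public_cluster="public")
env.register(Registration("cust-1", "onprem-a", "sftp.example.com"))


def fake_transfer(server, job):
    """Stand-in for uploading the job and fetching status and logs."""
    return {"server": server, "job": job["name"], "state": "FINISHED", "logs": []}
```

Injecting `transfer` keeps the sketch independent of any particular file transfer protocol.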

Data processing in a hybrid cluster environment

Patent number: 10762234

Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.

Type: Grant

Filed: March 8, 2018

Date of Patent: September 1, 2020

Assignee: International Business Machines Corporation

Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng

MINING DATA TRANSFORMATION FLOWS IN SPREADSHEETS

Publication number: 20200175027

Abstract: Mining data transformation flows in spreadsheets includes identifying operations defined in a spreadsheet, identifying source data, in the spreadsheet, on which the operations operate, automatically creating an extract, transform, load (ETL) data transformation flow, and executing the created ETL data transformation flow. Creating the ETL data transformation flow includes selecting, in the ETL system, source data endpoint(s) for data extraction, selecting target data endpoint(s) for data loading, mapping at least one of the identified operations to ETL operation(s) for data transformation, and building the ETL data transformation flow, which defines extraction from the selected source data endpoint(s), transformation based on the ETL operation(s), and loading to the selected target data endpoint(s).

Type: Application

Filed: December 4, 2018

Publication date: June 4, 2020

Inventors: Yeh-Heng Sheng, Xiaoyan Pu, Yong Li, Ryan Pham

Applying matching data transformation information based on a user's editing of data within a document

Patent number: 10606939

Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on a data processing system is received. A matching algorithm is executed based on the user input to identify one or more candidate transformation operations, within a transformation operation data structure, that match the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations that includes the one or more candidate transformation operations is provided to the user via the data processing system.

Type: Grant

Filed: October 31, 2018

Date of Patent: March 31, 2020

Assignee: International Business Machines Corporation

Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
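
A minimal sketch of the matching step, assuming the user's edits are captured as (before, after) value pairs. It also assumes the transformation operation data structure is a plain name-to-function dictionary. The catalog entries and the names `TRANSFORMATIONS`, `match_transformations`, and `suggest` are hypothetical. The matching rule shown is a simple exact-output check, which may differ from the patented algorithm. An operation is a candidate only if it reproduces every edit the user made. When nothing matches, an indication is returned instead of a list.

```python
# Hypothetical transformation operation catalog: name -> function.
TRANSFORMATIONS = {
    "trim": str.strip,
    "upper": str.upper,
    "lower": str.lower,
    "title_case": lambda s: s.strip().title(),
    "remove_spaces": lambda s: s.replace(" ", ""),
}

NO_MATCH = "No candidate transformation operations found."


def match_transformations(examples, catalog=TRANSFORMATIONS):
    """Return catalog names whose output equals every (before, after) user edit."""
    return [
        name
        for name, fn in catalog.items()
        if all(fn(before) == after for before, after in examples)
    ]


def suggest(examples):
    """Return candidate operations, or an indication that none were found."""
    candidates = match_transformations(examples)
    return candidates if candidates else NO_MATCH
```

For example, a user who changes `"  alice smith "` to `"Alice Smith"` would be offered `title_case`. Each additional edit narrows the list further.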

DATA PROCESSING IN A HYBRID CLUSTER ENVIRONMENT

Publication number: 20190318123

Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.

Type: Application

Filed: June 26, 2019

Publication date: October 17, 2019

Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng

DATA PROCESSING IN A HYBRID CLUSTER ENVIRONMENT

Publication number: 20190278938

Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.

Type: Application

Filed: March 8, 2018

Publication date: September 12, 2019

Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng

APPLYING MATCHING DATA TRANSFORMATION INFORMATION BASED ON A USER'S EDITING OF DATA WITHIN A DOCUMENT

Publication number: 20190258703

Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on a data processing system is received. A matching algorithm is executed based on the user input to identify one or more candidate transformation operations, within a transformation operation data structure, that match the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations that includes the one or more candidate transformation operations is provided to the user via the data processing system.

Type: Application

Filed: February 19, 2018

Publication date: August 22, 2019

Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng

Applying Matching Data Transformation Information Based on a User's Editing of Data within a Document

Publication number: 20190258705

Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on a data processing system is received. A matching algorithm is executed based on the user input to identify one or more candidate transformation operations, within a transformation operation data structure, that match the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations that includes the one or more candidate transformation operations is provided to the user via the data processing system.

Type: Application

Filed: October 31, 2018

Publication date: August 22, 2019

Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng

Applying matching data transformation information based on a user's editing of data within a document

Patent number: 10387554

Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on a data processing system is received. A matching algorithm is executed based on the user input to identify one or more candidate transformation operations, within a transformation operation data structure, that match the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations that includes the one or more candidate transformation operations is provided to the user via the data processing system.

Type: Grant

Filed: February 19, 2018

Date of Patent: August 20, 2019

Assignee: International Business Machines Corporation

Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng

DISTRIBUTED BALANCED OPTIMIZATION FOR AN EXTRACT, TRANSFORM, AND LOAD (ETL) JOB

Publication number: 20190079981

Abstract: A distributed network of participating Extract Transform Load (ETL) servers is received. Data source mappings are generated for the distributed network, where the data source mappings indicate which participating ETL servers in the distributed network have access to which tables in data sources. Network metrics are obtained that indicate, for each pair of participating ETL servers, an average data transmission speed and a unit cost. Data source metrics are obtained for the tables in the data sources. A link mappings table is generated that maps each link to a network between participating ETL servers. A plurality of distributed execution plans are generated using the network metrics, the data source metrics, and the link mappings table. An execution plan is selected from the plurality of execution plans according to an optimization criterion. The selected execution plan is executed.

Type: Application

Filed: September 12, 2018

Publication date: March 14, 2019

Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng

Distributed balanced optimization for an extract, transform, and load (ETL) job

Patent number: 10120918

Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments. The job segments each include a subset of the links and stages and each map to one participating ETL server from the distributed systems to meet an optimization criterion across the distributed systems. The job segments are distributed to the participating ETL servers based on the mappings for parallel execution. Also, the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems.

Type: Grant

Filed: June 7, 2016

Date of Patent: November 6, 2018

Assignee: International Business Machines Corporation

Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
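
Breaking a data flow graph into per-server job segments can be illustrated with a simple greedy heuristic. This is an assumption-based sketch, not the patented planner. Source stages are pinned to a hypothetical server that can read their data. Every other stage runs on the same server as its largest input, using estimated row counts as the "statistics", to reduce rows shipped over the network. Workload balancing is not modeled. `build_segments` and all stage and server names are illustrative. `graphlib` requires Python 3.9 or later.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def build_segments(links, source_server, row_counts):
    """
    links: (upstream_stage, downstream_stage) pairs of the data flow graph
    source_server: stage -> server, for stages that read a data source
    row_counts: stage -> estimated output rows (the statistics)
    Assumes every non-source stage has at least one input link.
    """
    preds = defaultdict(set)
    for up, down in links:
        preds[down].add(up)
        preds.setdefault(up, set())

    assignment = {}
    for stage in TopologicalSorter(preds).static_order():
        if stage in source_server:
            assignment[stage] = source_server[stage]
        else:
            # Co-locate with the largest input to minimise data movement.
            biggest = max(preds[stage], key=lambda s: row_counts[s])
            assignment[stage] = assignment[biggest]

    # Group stages by server: each group is one job segment.
    segments = defaultdict(list)
    for stage, server in assignment.items():
        segments[server].append(stage)
    return dict(segments)


links = [
    ("read_orders", "join"),
    ("read_customers", "join"),
    ("join", "aggregate"),
    ("aggregate", "write"),
]
source_server = {"read_orders": "s1", "read_customers": "s2"}
row_counts = {"read_orders": 1_000_000, "read_customers": 10_000,
              "join": 1_000_000, "aggregate": 100}
segments = build_segments(links, source_server, row_counts)
```

With these numbers, only the small customer table crosses the network. The join, aggregation, and write stages all land in the `s1` segment next to the large orders table.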

Distributed balanced optimization for an extract, transform, and load (ETL) job

Patent number: 10108683

Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments. The job segments each include a subset of the links and stages and each map to one participating ETL server from the distributed systems to meet an optimization criterion across the distributed systems. The job segments are distributed to the participating ETL servers based on the mappings for parallel execution. Also, the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems.

Type: Grant

Filed: April 24, 2015

Date of Patent: October 23, 2018

Assignee: International Business Machines Corporation

Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng

TRANSFORMING AND LOADING DATA UTILIZING IN-MEMORY PROCESSING

Publication number: 20170075966

Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.

Type: Application

Filed: August 3, 2016

Publication date: March 16, 2017

Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng

TRANSFORMING AND LOADING DATA UTILIZING IN-MEMORY PROCESSING

Publication number: 20170075964

Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.

Type: Application

Filed: September 11, 2015

Publication date: March 16, 2017

Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng

DISTRIBUTED BALANCED OPTIMIZATION FOR AN EXTRACT, TRANSFORM, AND LOAD (ETL) JOB

Publication number: 20160314175

Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criterion across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems. Each of the job segments is distributed to the participating ETL servers based on the mappings for parallel execution.

Type: Application

Filed: April 24, 2015

Publication date: October 27, 2016

Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng

DISTRIBUTED BALANCED OPTIMIZATION FOR AN EXTRACT, TRANSFORM, AND LOAD (ETL) JOB

Publication number: 20160314176

Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criterion across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems. Each of the job segments is distributed to the participating ETL servers based on the mappings for parallel execution.

Type: Application

Filed: June 7, 2016

Publication date: October 27, 2016

Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng