Patents by Inventor Yeh-Heng Sheng
Yeh-Heng Sheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11386108
Abstract: Mining data transformation flows in spreadsheets includes identifying operations defined in a spreadsheet, identifying source data, in the spreadsheet, on which the operations operate, automatically creating an extract, transform, load (ETL) data transformation flow, and executing the created ETL data transformation flow. Creating the ETL data transformation flow includes selecting, in the ETL system, source data endpoint(s) for data extraction, selecting target data endpoint(s) for data loading, mapping at least one of the identified operations to ETL operation(s) for data transformation, and building the ETL data transformation flow, which defines extraction from the selected source data endpoint(s), transformation based on the ETL operation(s), and loading to the selected target data endpoint(s).
Type: Grant
Filed: December 4, 2018
Date of Patent: July 12, 2022
Assignee: International Business Machines Corporation
Inventors: Yeh-Heng Sheng, Xiaoyan Pu, Yong Li, Ryan Pham
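The mapping step in the abstract above can be illustrated with a minimal sketch. It is not taken from the patent: the formula-to-operation table, the `mine_flow` function, and the endpoint names are hypothetical, and the sketch only builds a flow description without executing it.

```python
import re

# Hypothetical table mapping spreadsheet functions to ETL operation names.
FORMULA_TO_ETL = {
    "SUM": "aggregate_sum",
    "AVERAGE": "aggregate_avg",
    "VLOOKUP": "lookup_join",
    "UPPER": "transform_upper",
}

# Matches simple formulas of the form =NAME(arguments).
FORMULA_RE = re.compile(r"^=([A-Z]+)\((.*)\)$")


def mine_flow(cells, source_endpoint, target_endpoint):
    """Build a toy ETL flow description from a dict of cell address -> content."""
    steps = []
    for address, content in sorted(cells.items()):
        match = FORMULA_RE.match(content) if isinstance(content, str) else None
        if match is None:
            continue  # plain data cell, not an operation
        function, arguments = match.groups()
        etl_op = FORMULA_TO_ETL.get(function)
        if etl_op is None:
            continue  # no ETL counterpart known to this sketch
        steps.append({"cell": address, "op": etl_op, "inputs": arguments})
    return {
        "extract": source_endpoint,
        "transform": steps,
        "load": target_endpoint,
    }


if __name__ == "__main__":
    sheet = {"A1": 10, "A2": 20, "A3": "=SUM(A1:A2)", "B1": "=FOO(A1)"}
    print(mine_flow(sheet, "sales.xlsx", "warehouse.sales_totals"))
```

Only formulas with an entry in the table become transformation steps; a fuller treatment would also resolve the cell ranges to columns in the selected source endpoint.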
-
Patent number: 11144566
Abstract: A distributed network of participating Extract Transform Load (ETL) servers is received. Data source mappings are generated for the distributed network, where the data source mappings indicate which participating ETL servers in the distributed network have access to which tables in data sources. Network metrics are obtained that indicate, for each pair of participating ETL servers, an average data transmission speed and a unit cost. Data source metrics are obtained for the tables in the data sources. A link mappings table is generated that lists mappings of each link to a network in between participating ETL servers. A plurality of distributed execution plans are generated using the network metrics, the data source metrics, and the link mappings table. An execution plan is selected from the plurality of execution plans according to an optimization criteria. The selected execution plan is executed.
Type: Grant
Filed: September 12, 2018
Date of Patent: October 12, 2021
Assignee: International Business Machines Corporation
Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
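As a rough illustration of the plan-selection step in the abstract above, the sketch below scores candidate plans using per-pair network metrics and picks the best one under a chosen criterion. The `LinkMetric` type, the plan representation (a list of transfers), and the two criteria are assumptions made for this example, not details from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkMetric:
    """Assumed network metrics for one ordered pair of ETL servers."""
    speed_mb_per_s: float
    cost_per_mb: float


def plan_totals(plan, metrics):
    """Return (total seconds, total cost) for a plan.

    A plan is a list of (source_server, target_server, size_mb) transfers.
    """
    seconds = 0.0
    cost = 0.0
    for source, target, size_mb in plan:
        metric = metrics[(source, target)]
        seconds += size_mb / metric.speed_mb_per_s
        cost += size_mb * metric.cost_per_mb
    return seconds, cost


def select_plan(plans, metrics, criterion="time"):
    """Pick the plan name that minimizes the chosen criterion."""
    index = {"time": 0, "cost": 1}[criterion]
    return min(plans, key=lambda name: plan_totals(plans[name], metrics)[index])


if __name__ == "__main__":
    metrics = {
        ("a", "b"): LinkMetric(100.0, 0.02),
        ("a", "c"): LinkMetric(10.0, 0.001),
        ("c", "b"): LinkMetric(10.0, 0.001),
    }
    plans = {
        "direct": [("a", "b", 1000)],
        "relay": [("a", "c", 1000), ("c", "b", 1000)],
    }
    print(select_plan(plans, metrics, "time"))
    print(select_plan(plans, metrics, "cost"))
```

In this toy data the direct link is faster but more expensive per megabyte, so the selected plan depends on which criterion is supplied.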
-
Patent number: 10929417
Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
Type: Grant
Filed: September 11, 2015
Date of Patent: February 23, 2021
Assignee: International Business Machines Corporation
Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
-
Patent number: 10915544
Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
Type: Grant
Filed: August 3, 2016
Date of Patent: February 9, 2021
Assignee: International Business Machines Corporation
Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
-
Patent number: 10769300
Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
Type: Grant
Filed: June 26, 2019
Date of Patent: September 8, 2020
Assignee: International Business Machines Corporation
Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
-
Patent number: 10762234
Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
Type: Grant
Filed: March 8, 2018
Date of Patent: September 1, 2020
Assignee: International Business Machines Corporation
Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
-
Publication number: 20200175027
Abstract: Mining data transformation flows in spreadsheets includes identifying operations defined in a spreadsheet, identifying source data, in the spreadsheet, on which the operations operate, automatically creating an extract, transform, load (ETL) data transformation flow, and executing the created ETL data transformation flow. Creating the ETL data transformation flow includes selecting, in the ETL system, source data endpoint(s) for data extraction, selecting target data endpoint(s) for data loading, mapping at least one of the identified operations to ETL operation(s) for data transformation, and building the ETL data transformation flow, which defines extraction from the selected source data endpoint(s), transformation based on the ETL operation(s), and loading to the selected target data endpoint(s).
Type: Application
Filed: December 4, 2018
Publication date: June 4, 2020
Inventors: Yeh-Heng Sheng, Xiaoyan Pu, Yong Li, Ryan Pham
-
Patent number: 10606939
Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on user input to identify one or more candidate transformation operations within a transformation operation data structure that matches the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations is provided that includes the one or more candidate transformation operations to the user via the data processing system.
Type: Grant
Filed: October 31, 2018
Date of Patent: March 31, 2020
Assignee: International Business Machines Corporation
Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
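One simple way to picture the matching step described in the abstract above is to treat each user edit as a (before, after) pair and keep the operations that reproduce every pair. This is an illustrative sketch only: the `TRANSFORMATIONS` table, the exact-match rule, and the function names are assumptions, and the patent's actual matching algorithm is not specified in this listing.

```python
# Hypothetical transformation operation data structure: name -> callable.
TRANSFORMATIONS = {
    "uppercase": str.upper,
    "lowercase": str.lower,
    "trim_whitespace": str.strip,
    "title_case": str.title,
}


def match_transformations(edits):
    """Return names of operations that reproduce every (before, after) edit."""
    return [
        name
        for name, operation in TRANSFORMATIONS.items()
        if all(operation(before) == after for before, after in edits)
    ]


def suggest(edits):
    """Return a candidate list, or a message when nothing matches."""
    candidates = match_transformations(edits)
    if not candidates:
        return "No candidate transformation operations identified."
    return candidates


if __name__ == "__main__":
    print(suggest([("ibm", "IBM")]))
    print(suggest([("abc", "xyz")]))
```

Requiring an operation to explain all observed edits keeps the candidate list short; a ranking by partial agreement would be a natural alternative.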
-
Publication number: 20190318123
Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
Type: Application
Filed: June 26, 2019
Publication date: October 17, 2019
Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
-
Publication number: 20190278938
Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
Type: Application
Filed: March 8, 2018
Publication date: September 12, 2019
Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
-
Publication number: 20190258705
Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on user input to identify one or more candidate transformation operations within a transformation operation data structure that matches the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations is provided that includes the one or more candidate transformation operations to the user via the data processing system.
Type: Application
Filed: October 31, 2018
Publication date: August 22, 2019
Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
-
Publication number: 20190258703
Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on user input to identify one or more candidate transformation operations within a transformation operation data structure that matches the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations is provided that includes the one or more candidate transformation operations to the user via the data processing system.
Type: Application
Filed: February 19, 2018
Publication date: August 22, 2019
Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
-
Patent number: 10387554
Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on user input to identify one or more candidate transformation operations within a transformation operation data structure that matches the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations is provided that includes the one or more candidate transformation operations to the user via the data processing system.
Type: Grant
Filed: February 19, 2018
Date of Patent: August 20, 2019
Assignee: International Business Machines Corporation
Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
-
Publication number: 20190079981
Abstract: A distributed network of participating Extract Transform Load (ETL) servers is received. Data source mappings are generated for the distributed network, where the data source mappings indicate which participating ETL servers in the distributed network have access to which tables in data sources. Network metrics are obtained that indicate, for each pair of participating ETL servers, an average data transmission speed and a unit cost. Data source metrics are obtained for the tables in the data sources. A link mappings table is generated that lists mappings of each link to a network in between participating ETL servers. A plurality of distributed execution plans are generated using the network metrics, the data source metrics, and the link mappings table. An execution plan is selected from the plurality of execution plans according to an optimization criteria. The selected execution plan is executed.
Type: Application
Filed: September 12, 2018
Publication date: March 14, 2019
Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
-
Patent number: 10120918
Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments. The job segments each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems. The job segments are distributed to the participating ETL servers based on the mappings for parallel execution. Also, the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems.
Type: Grant
Filed: June 7, 2016
Date of Patent: November 6, 2018
Assignee: International Business Machines Corporation
Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
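The segmentation idea in the abstract above can be sketched as follows. The sketch assumes that a stage-to-server assignment has already been chosen (the optimization that produces it is not shown), and the `build_segments` function and the stage and server names are hypothetical.

```python
from collections import defaultdict


def build_segments(links, assignment):
    """Group stages and links into one job segment per server.

    links: list of (upstream_stage, downstream_stage) pairs.
    assignment: dict mapping each stage to the server chosen for it.
    Returns (segments, cross_links); cross_links are the links whose
    endpoints sit on different servers and therefore move data.
    """
    segments = defaultdict(lambda: {"stages": set(), "links": []})
    cross_links = []
    for stage, server in assignment.items():
        segments[server]["stages"].add(stage)
    for upstream, downstream in links:
        if assignment[upstream] == assignment[downstream]:
            segments[assignment[upstream]]["links"].append((upstream, downstream))
        else:
            cross_links.append((upstream, downstream))
    return dict(segments), cross_links


if __name__ == "__main__":
    links = [
        ("read_orders", "filter"),
        ("filter", "join"),
        ("read_customers", "join"),
        ("join", "write"),
    ]
    assignment = {
        "read_orders": "s1",
        "filter": "s1",
        "read_customers": "s2",
        "join": "s2",
        "write": "s2",
    }
    segments, cross_links = build_segments(links, assignment)
    print(segments)
    print(cross_links)
```

Counting the cross-server links gives a crude proxy for data movement, which is one of the quantities the abstract says the plan tries to reduce.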
-
Patent number: 10108683
Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments. The job segments each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems. The job segments are distributed to the participating ETL servers based on the mappings for parallel execution. Also, the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems.
Type: Grant
Filed: April 24, 2015
Date of Patent: October 23, 2018
Assignee: International Business Machines Corporation
Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
-
Publication number: 20170075966
Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
Type: Application
Filed: August 3, 2016
Publication date: March 16, 2017
Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
-
Publication number: 20170075964
Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
Type: Application
Filed: September 11, 2015
Publication date: March 16, 2017
Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
-
Publication number: 20160314175
Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems. Each of the job segments is distributed to the participating ETL servers based on the mappings for parallel execution.
Type: Application
Filed: April 24, 2015
Publication date: October 27, 2016
Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng
-
Publication number: 20160314176
Abstract: Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems. Each of the job segments is distributed to the participating ETL servers based on the mappings for parallel execution.
Type: Application
Filed: June 7, 2016
Publication date: October 27, 2016
Inventors: Raghavendra R. Dhayapule, Jean-Claude Mamou, Yeh-Heng Sheng