Patents by Inventor Xiaoyan Pu

Xiaoyan Pu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11386108
    Abstract: Mining data transformation flows in spreadsheets includes identifying operations defined in a spreadsheet, identifying source data, in the spreadsheet, on which the operations operate, automatically creating an extract, transform, load (ETL) data transformation flow, and executing the created ETL data transformation flow. Creating the ETL data transformation flow includes selecting, in the ETL system, source data endpoint(s) for data extraction, selecting target data endpoint(s) for data loading, mapping at least one of the identified operations to ETL operation(s) for data transformation, and building the ETL data transformation flow, which defines extraction from the selected source data endpoint(s), transformation based on the ETL operation(s), and loading to the selected target data endpoint(s).
    Type: Grant
    Filed: December 4, 2018
    Date of Patent: July 12, 2022
    Assignee: International Business Machines Corporation
    Inventors: Yeh-Heng Sheng, Xiaoyan Pu, Yong Li, Ryan Pham
  • Patent number: 10929417
    Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: February 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
  • Patent number: 10915544
    Abstract: A system includes at least one processor and processes an ETL job. The system analyzes a specification of the ETL job including one or more functional expressions to load data from one or more source data stores, process the data in memory, and store the processed data to one or more target data stores. One or more data flows are produced from the specification based on the one or more functional expressions. The one or more data flows utilize in-memory distributed data sets generated to accommodate parallel processing for loading and processing the data. The one or more data flows are optimized to assign operations to be performed on the one or more source data stores. The optimized data flows are executed to load the data to the one or more target data stores in accordance with the specification. Present invention embodiments further include methods and computer program products.
    Type: Grant
    Filed: August 3, 2016
    Date of Patent: February 9, 2021
    Assignee: International Business Machines Corporation
    Inventors: Lawrence A. Greene, Yong Li, Xiaoyan Pu, Yeh-Heng Sheng
  • Patent number: 10769300
    Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: September 8, 2020
    Assignee: International Business Machines Corporation
    Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Patent number: 10762234
    Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
    Type: Grant
    Filed: March 8, 2018
    Date of Patent: September 1, 2020
    Assignee: International Business Machines Corporation
    Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20200175027
    Abstract: Mining data transformation flows in spreadsheets includes identifying operations defined in a spreadsheet, identifying source data, in the spreadsheet, on which the operations operate, automatically creating an extract, transform, load (ETL) data transformation flow, and executing the created ETL data transformation flow. Creating the ETL data transformation flow includes selecting, in the ETL system, source data endpoint(s) for data extraction, selecting target data endpoint(s) for data loading, mapping at least one of the identified operations to ETL operation(s) for data transformation, and building the ETL data transformation flow, which defines extraction from the selected source data endpoint(s), transformation based on the ETL operation(s), and loading to the selected target data endpoint(s).
    Type: Application
    Filed: December 4, 2018
    Publication date: June 4, 2020
    Inventors: Yeh-Heng Sheng, Xiaoyan Pu, Yong Li, Ryan Pham
  • Patent number: 10606939
    Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on the user input to identify one or more candidate transformation operations, within a transformation operation data structure, that match the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations that includes the one or more candidate transformation operations is provided to the user via the data processing system.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: March 31, 2020
    Assignee: International Business Machines Corporation
    Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20190318123
    Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
    Type: Application
    Filed: June 26, 2019
    Publication date: October 17, 2019
    Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20190278938
    Abstract: A hybrid cluster environment with a public cloud cluster having nodes storing data and a plurality of private clusters is provided, wherein each of the plurality of private clusters has nodes storing data. Registration data that indicates a customer identifier, a new private cluster, and a file transfer server is received. The new private cluster is added to the plurality of private clusters in the hybrid cluster environment. Input to design a job to process data in the hybrid cluster environment is received. It is determined that the job is to be deployed to the new private cluster. The job is deployed to the new private cluster using the file transfer server, wherein the job is executed at the new private cluster. Job status information and one or more job logs are received with the file transfer server.
    Type: Application
    Filed: March 8, 2018
    Publication date: September 12, 2019
    Inventors: Lawrence A. Greene, Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20190258705
    Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on the user input to identify one or more candidate transformation operations, within a transformation operation data structure, that match the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations that includes the one or more candidate transformation operations is provided to the user via the data processing system.
    Type: Application
    Filed: October 31, 2018
    Publication date: August 22, 2019
    Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Publication number: 20190258703
    Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on the user input to identify one or more candidate transformation operations, within a transformation operation data structure, that match the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations that includes the one or more candidate transformation operations is provided to the user via the data processing system.
    Type: Application
    Filed: February 19, 2018
    Publication date: August 22, 2019
    Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Patent number: 10387554
    Abstract: A mechanism is provided for applying matching data transformation information based on a user's editing of data within a document. User input identifying inputs provided by a user while editing a document within an application executing on the data processing system is received. A matching algorithm is executed based on the user input to identify one or more candidate transformation operations, within a transformation operation data structure, that match the user input. Responsive to failing to identify any candidate transformation operations, an indication is provided that no candidate transformation operations are identifiable. Responsive to one or more candidate transformation operations being identified, a list of transformation operations that includes the one or more candidate transformation operations is provided to the user via the data processing system.
    Type: Grant
    Filed: February 19, 2018
    Date of Patent: August 20, 2019
    Assignee: International Business Machines Corporation
    Inventors: Yong Li, Ryan Pham, Xiaoyan Pu, Yeh-Heng Sheng
  • Patent number: 10380137
    Abstract: A user-defined function (UDF) is received in a central computer system, which causes registration of the UDF and distributes the UDF to a cluster of computer system nodes configured for performing, in volatile memory of the nodes, extract-transform-load processing of data cached in the volatile memory of the nodes. First and second job specifications that include the UDF are received by the central computer system, and the central computer system distributes instructions for the job specifications to the nodes, including at least one instruction that invokes the UDF for loading and executing the UDF in the volatile memory of at least one of the nodes during runtime of the jobs. The central computer system does not cause registration of the UDF again after receiving the first job specification.
    Type: Grant
    Filed: October 11, 2016
    Date of Patent: August 13, 2019
    Assignee: International Business Machines Corporation
    Inventors: Yong Li, Ryan Pham, Xiaoyan Pu
  • Patent number: 10333800
    Abstract: Provided are a computer program product, system, and method for allocating physical nodes for processes in an execution plan. An execution plan is generated indicating a plurality of processes. A resource requirement is generated indicating requested physical nodes and an assignment of the processes to execute on the requested physical nodes. A determination is made from the resource requirement of a resource allocation of physical nodes for the requested physical nodes and the processes. The execution plan is updated to generate an updated execution plan indicating the physical nodes on which the processes will execute according to the received resource allocation.
    Type: Grant
    Filed: July 12, 2017
    Date of Patent: June 25, 2019
    Assignee: International Business Machines Corporation
    Inventors: Krishna K. Bonagiri, Eric A. Jacobson, Yong Li, Xiaoyan Pu
  • Patent number: 10296384
    Abstract: An approach for deploying workload in a multi-tenancy computing environment is provided. The approach generates, by one or more computer processors, a tenant ID and a plan ID for a tenant based, at least in part, on a tenant registration request. The approach stores, by one or more computer processors, the tenant ID and the plan ID into a shared system record. The approach receives, by one or more computer processors, a request to update a first tenant service plan. The approach determines, by one or more computer processors, one or more resource pools supporting a second tenant service plan based, at least in part, on an association between the tenant ID and the plan ID. The approach deploys, by one or more computer processors, one or more resources from the one or more resource pools supporting the second tenant service plan.
    Type: Grant
    Filed: February 29, 2016
    Date of Patent: May 21, 2019
    Assignee: International Business Machines Corporation
    Inventors: Yong Li, Jean-Claude Mamou, David T. Meeks, Xiaoyan Pu
  • Patent number: 10031831
    Abstract: At least one application in a computing environment is executed and one or more performance metrics of the application are measured. The measured performance metrics are analyzed and an operational performance regression is detected. The detected operational performance regression is correlated with one or more recorded changes and the correlated changes are identified as a cause of the operational performance regression. The elements of the computing environment are alerted in accordance with the identified changes to adjust operational performance.
    Type: Grant
    Filed: April 23, 2015
    Date of Patent: July 24, 2018
    Assignee: International Business Machines Corporation
    Inventors: Lawrence A. Greene, Eric A. Jacobson, Yong Li, Xiaoyan Pu
  • Patent number: 10025838
    Abstract: A method provides extract, transform, load (ETL) input suggestions for an ETL system in which a current job is being created. The method includes: determining when a new input is made in the current job in the ETL system and dynamically receiving the new input, which includes a connection-between-stages input or a stage-property input; updating stored information relating to the current job with the new input; accessing rules that apply to the current job; analyzing and applying the rules based on the new input and the stored information for the current job to generate one or more suggested next inputs in the current job; providing a weighting for the one or more suggested next inputs based on the analysis and application of the rules; and providing a prompt in the current job in the ETL system with the one or more suggested next inputs and their weightings.
    Type: Grant
    Filed: August 26, 2016
    Date of Patent: July 17, 2018
    Assignee: International Business Machines Corporation
    Inventors: Joseph Bangs, Leonard D. Greenwood, Arron J. Harden, Xiaoyan Pu, Julian J. Vizor
  • Patent number: 9996389
    Abstract: Embodiments presented herein provide techniques for optimizing parallel data flows of a batch processing job using a profile of the processing job. An application retrieves a job profile for a processing job. The processing job has a plurality of processing stages specified in an execution profile. The job profile includes statistical data for at least one of the processing stages obtained during prior executions of the job. The application modifies properties of the execution profile based on the job profile to optimize the execution of the job. The application executes the processing job with the modified execution profile.
    Type: Grant
    Filed: March 11, 2014
    Date of Patent: June 12, 2018
    Assignee: International Business Machines Corporation
    Inventors: Brian K. Caufield, Lawrence A. Greene, Eric A. Jacobson, Yong Li, Xiaoyan Pu
  • Patent number: 9983906
    Abstract: Embodiments presented herein provide techniques for optimizing parallel data flows of a batch processing job using a profile of the processing job. An application retrieves a job profile for a processing job. The processing job has a plurality of processing stages specified in an execution profile. The job profile includes statistical data for at least one of the processing stages obtained during prior executions of the job. The application modifies properties of the execution profile based on the job profile to optimize the execution of the job. The application executes the processing job with the modified execution profile.
    Type: Grant
    Filed: February 13, 2015
    Date of Patent: May 29, 2018
    Assignee: International Business Machines Corporation
    Inventors: Brian K. Caufield, Lawrence A. Greene, Eric A. Jacobson, Yong Li, Xiaoyan Pu
  • Patent number: 9973512
    Abstract: In a method, a workload management (WLM) server receives a first CHECK WORKLOAD command for a workload in the queue of the WLM server. It may be determined whether the workload is ready to run on a WLM client. If the workload is not ready to run, a wait time for the workload is dynamically estimated by the WLM server and sent to the WLM client. If the workload is ready to run, a response is sent to the WLM client indicating that the workload is ready to run.
    Type: Grant
    Filed: January 15, 2016
    Date of Patent: May 15, 2018
    Assignee: International Business Machines Corporation
    Inventors: Yong Li, Hanson Lieu, Ron Liu, Xiaoyan Pu
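The matching step described in the abstract of patent 10606939 (shared by 10387554, 20190258703, and 20190258705) can be illustrated with a minimal Python sketch. The patent does not specify this algorithm. The sketch assumes a hypothetical table of string transformations (`TRANSFORMATIONS`). It treats an operation as a candidate when applying it to the original value reproduces the user's edited value.

```python
from typing import Callable, Dict, List

# Hypothetical transformation-operation data structure: name -> function.
TRANSFORMATIONS: Dict[str, Callable[[str], str]] = {
    "uppercase": str.upper,
    "lowercase": str.lower,
    "title_case": str.title,
    "strip_whitespace": str.strip,
    "remove_spaces": lambda s: s.replace(" ", ""),
}


def match_transformations(before: str, after: str) -> List[str]:
    """Return every operation whose output on `before` equals the user's edit."""
    return [name for name, fn in TRANSFORMATIONS.items() if fn(before) == after]


def suggest(before: str, after: str) -> str:
    candidates = match_transformations(before, after)
    if not candidates:
        # No candidate found: report that nothing could be identified.
        return "No candidate transformation operations identified."
    # One or more candidates: present the list to the user.
    return "Candidate transformations: " + ", ".join(candidates)


print(suggest("  john smith ", "  JOHN SMITH "))  # Candidate transformations: uppercase
print(suggest("abc", "xyz"))  # No candidate transformation operations identified.
```

The sketch returns every matching candidate rather than only the first, which mirrors the list described in the abstract. For example, turning " a " into "a" is explained equally well by `strip_whitespace` and `remove_spaces`.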
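The abstract of patent 9973512 says the WLM server "dynamically" estimates a wait time but does not say how. The following Python sketch shows one plausible estimator, which is an assumption and not the patented method. It simulates first-in, first-out dispatch of the queued workloads onto a fixed number of slots, using hypothetical expected runtimes. The class and field names are illustrative only.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Workload:
    name: str
    expected_runtime_s: float  # assumed to come from historical runs


@dataclass
class WLMServer:
    slots: int  # maximum number of concurrently running workloads
    running_remaining_s: List[float] = field(default_factory=list)
    queue: List[Workload] = field(default_factory=list)

    def check_workload(self, name: str) -> Dict[str, Union[str, float]]:
        """Answer a CHECK WORKLOAD command for a queued workload."""
        pos = [w.name for w in self.queue].index(name)  # ValueError if absent
        free_slots = self.slots - len(self.running_remaining_s)
        if pos < free_slots:
            return {"status": "READY"}
        # Hypothetical estimator: each heap entry is the time at which a slot
        # becomes free (idle slots are free at time 0). Dispatch the workloads
        # ahead of this one in FIFO order onto the earliest free slot.
        slot_free_at = self.running_remaining_s + [0.0] * free_slots
        heapq.heapify(slot_free_at)
        for ahead in self.queue[:pos]:
            start = heapq.heappop(slot_free_at)
            heapq.heappush(slot_free_at, start + ahead.expected_runtime_s)
        return {"status": "WAIT", "wait_s": slot_free_at[0]}
```

Suppose two slots are busy for another 20 s and 50 s. A third queued workload then waits until both earlier workloads have been dispatched. The heap makes that "earliest free slot" bookkeeping cheap.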
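The mapping step in patent 11386108 (publication 20200175027) goes from identified spreadsheet operations to ETL operations. A toy Python sketch of that step follows, under several assumptions:

- The spreadsheet-function-to-ETL-operation table (`SPREADSHEET_TO_ETL`) and its operation names are hypothetical.
- Only single-function formulas over one cell or one range are recognized.
- The result is a plain dictionary describing the extract, transform, and load parts. Executing it is out of scope.

```python
import re
from typing import Dict, List

# Hypothetical mapping from spreadsheet functions to ETL operation names.
SPREADSHEET_TO_ETL: Dict[str, str] = {
    "SUM": "aggregate_sum",
    "AVERAGE": "aggregate_avg",
    "UPPER": "to_upper",
    "TRIM": "trim",
}

# Recognizes simple formulas such as =SUM(B2:B10) or =UPPER(A2);
# group 1 is the function name, group 2 the source column letter(s).
FORMULA = re.compile(r"^=([A-Z]+)\(([A-Z]+)\d+(?::[A-Z]+\d+)?\)$")


def build_flow(formulas: Dict[str, str], source: str, target: str) -> Dict[str, object]:
    """Build a toy ETL flow from cell formulas (cell reference -> formula text)."""
    transforms: List[Dict[str, str]] = []
    for cell, text in formulas.items():
        m = FORMULA.match(text.strip())
        if not m or m.group(1) not in SPREADSHEET_TO_ETL:
            continue  # operation not mappable; skipped in this sketch
        func, column = m.groups()
        transforms.append({"op": SPREADSHEET_TO_ETL[func], "column": column, "output": cell})
    return {"extract": source, "transform": transforms, "load": target}
```

For example, `build_flow({"C11": "=SUM(B2:B10)", "D2": "=UPPER(A2)"}, "sales.csv", "sales_summary")` maps both formulas, while a concatenation such as `=A2&B2` is skipped because it does not match the simple pattern.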
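The abstract of patent 10025838 describes suggesting next inputs with weightings. A minimal Python sketch of that loop is shown below. The rule table (`RULES`), its scores, the stage names, and the "skip stages already in the job" rule are all hypothetical, and only stage additions are modeled (not connections or stage properties). The sketch records the new input, applies the rules, and normalizes the scores into weightings.

```python
from typing import Dict, List, Tuple

# Hypothetical rules: last stage added -> candidate next stages with raw scores.
RULES: Dict[str, Dict[str, float]] = {
    "DatabaseSource": {"Transformer": 3.0, "Filter": 2.0, "Lookup": 1.0},
    "Transformer": {"Aggregator": 2.0, "FileTarget": 2.0},
}


def suggest_next(job_stages: List[str], new_stage: str) -> List[Tuple[str, float]]:
    """Record a new stage in the job and return weighted next-stage suggestions."""
    job_stages.append(new_stage)  # update stored information for the current job
    scores = dict(RULES.get(new_stage, {}))  # copy so RULES is not mutated
    for stage in job_stages:
        scores.pop(stage, None)  # hypothetical rule: skip stages already in the job
    total = sum(scores.values())
    if total == 0:
        return []
    # Highest weight first; ties broken alphabetically for a stable prompt.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, round(score / total, 3)) for name, score in ranked]
```

Normalizing the scores so the weightings sum to roughly 1 keeps them comparable across rules with different raw scales. A real system would presumably weight by richer evidence than a fixed table.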