Patents by Inventor James Xu

James Xu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12293438
    Abstract: In an approach for post-modeling data visualization and analysis, a processor presents a first visualization of a training dataset in a first plot. Responsive to receiving a selection of a data group of the training dataset to analyze, a processor identifies three or fewer key model features of the data group of the training dataset. A processor ascertains a representative record of each key model feature of the three or fewer key model features using a Local Interpretable Model-Agnostic Explanation technique. A processor presents a second visualization of the three or fewer key model features and the representative record of each key model feature in a second plot.
    Type: Grant
    Filed: December 13, 2022
    Date of Patent: May 6, 2025
    Assignee: International Business Machines Corporation
    Inventors: Wen Pei Yu, Xiao Ming Ma, Xue Ying Zhang, Si Er Han, Jing James Xu, Jing Xu, Jun Wang
  • Publication number: 20250139500
    Abstract: Determining whether synthetic data is sufficient for utilization in connection with one or more machine learning models. The computing device accesses a protected batch of data associated with a machine learning model. The computing device accesses a simulated batch of data, the simulated batch of data based upon but anonymizing the protected batch of data. The computing device accesses one or more comparisons of one or more variables in the protected batch of data and the simulated batch of data to obtain a similarity value. The computing device performs a machine learning function utilizing at least in-part the simulated batch of data if the similarity value exceeds a similarity threshold.
    Type: Application
    Filed: October 30, 2023
    Publication date: May 1, 2025
    Inventors: Xiao Ming Ma, Si Er Han, Xue Ying Zhang, Jing James Xu, Jing Xu, Ji Hui Yang, Rui Wang
  • Publication number: 20250131116
    Abstract: An embodiment configures a plurality of parameters, the parameters being usable to generate artificial data from original data, the configuring adjusting a level of privacy in the artificial data. An embodiment fits a distribution type to a variable of the original data. An embodiment adjusts, using a desired level of privacy and the distribution type, a level of noise, wherein the level of noise corresponds to the desired level of privacy. An embodiment generates, using the distribution type and the level of noise, the artificial data, the artificial data achieving the desired level of privacy by including noise data corresponding to the level of noise.
    Type: Application
    Filed: October 20, 2023
    Publication date: April 24, 2025
    Applicant: International Business Machines Corporation
    Inventors: Si Er Han, Jing Xu, Xiao Ming Ma, Jing James Xu, Jiang Bo Kang, Xue Ying Zhang, Jun Wang, Ji Hui Yang
  • Publication number: 20250124052
    Abstract: A computer-implemented method for generating an artificial data set is provided. Aspects include obtaining an input data set, calculating an association between the plurality of categorical variables of the input data set, and creating, based on the association, a plurality of clusters of categorical variables. Aspects also include identifying a key variable for each of the plurality of clusters of categorical variables, creating a key cluster for each of the plurality of clusters, and creating a cluster contingency table for each of the clusters. Aspects further include generating, based on the cluster contingency table for each of the plurality of clusters and for the key cluster, a data set for each of the plurality of clusters and the key cluster and generating the artificial data set based on a combination of the data set for each of the plurality of clusters and the key cluster.
    Type: Application
    Filed: October 12, 2023
    Publication date: April 17, 2025
    Inventors: Si Er Han, Xiao Ming Ma, Rui Wang, Jing James Xu, Jing Xu, Xue Ying Zhang, Lei Tian, Dong Hai Yu
  • Publication number: 20250117443
    Abstract: A computer-implemented method for performing data difference evaluation is provided. Aspects include obtaining a first data set and a second data set, creating a first plurality of feature vectors by inputting the first data set into each of a plurality of models, and creating a second plurality of feature vectors by inputting the second data set into each of the plurality of models. Aspects also include identifying a mapping between elements of the first plurality of feature vectors and elements of the second plurality of feature vectors created by a same model of the plurality of models, calculating, for each of the plurality of models based at least in part on the mapping, a model distance between the first data set and the second data set, and calculating, based at least in part on the model distances, an ensemble distance between the first data set and the second data set.
    Type: Application
    Filed: October 9, 2023
    Publication date: April 10, 2025
    Inventors: Lei Tian, Han Zhang, Jing James Xu, Xue Ying Zhang, Si Er Han
  • Publication number: 20250094267
    Abstract: A time series anomaly detection method, system, and computer program product that processes time series data includes absorbing profiles of the time series data and anomaly types of a model as features, optimizing biased ranks to create optimized ranks through merging initial ranks with new ranks generated by real anomalies, and auto-suggesting the optimized ranks for saving a predetermined amount of data operation.
    Type: Application
    Filed: September 15, 2023
    Publication date: March 20, 2025
    Inventors: Jun Wang, Jing Xu, Xiao Ming Ma, Xue Ying Zhang, Si Er Han, Jing James Xu, Wen Pei Yu
  • Patent number: 12249012
    Abstract: A method, computer system, and a computer program product are provided for post-modeling feature evaluation. In one embodiment, at least one post model visual output and associated data is obtained that at least includes an individual conditional expectation (ICE) plot and a partial dependence (PDP) plot. Using the associated data and the plots, a Feature Importance (PI) plot is provided. A plurality of features is then determined for each of the PI, PDP and ICE plots to calculate at least one Interesting Value for each plot. An overall score is also calculated for each of the plurality of features based on the associated Interesting Values for each of the PDP, ICE and PI plots. At least one top feature is selected based on said scores. A final plot is then generated at least reflecting the top feature. The final plot combines the PI, PDP and ICE plots together.
    Type: Grant
    Filed: November 17, 2022
    Date of Patent: March 11, 2025
    Assignee: International Business Machines Corporation
    Inventors: Xiao Ming Ma, Wen Pei Yu, Jing James Xu, Xue Ying Zhang, Si Er Han, Jing Xu, Jun Wang
  • Patent number: 12242367
    Abstract: Disclosed are a computer-implemented method, a system and a computer program product for model exploration. Model feature importance of each model of a plurality of models can be obtained, the plurality of models can be grouped into a plurality of model clusters based on the model feature importance of each model, and the model feature importance can be presented by box-plot or confidence interval.
    Type: Grant
    Filed: May 15, 2022
    Date of Patent: March 4, 2025
    Assignee: International Business Machines Corporation
    Inventors: Jing Xu, Xue Ying Zhang, Si Er Han, Jing James Xu, Xiao Ming Ma, Jun Wang, Wen Pei Yu
  • Patent number: 12243065
    Abstract: A computing system is configured to generate a predictive model during training of a machine learning program using a training data set including a personal data set of a plurality of first users. The predictive model is configured to generate a predicted assessment score with respect to a second user by correlating a personal data set of the second user to the personal data set of at least one of the first users, with the generating of the predicted assessment score occurring automatically when a data entry of the personal data set of the second user is determined to have changed by the computing system. The computing system is configured to report the automatically generated predicted assessment score to the second user via a user device of the second user.
    Type: Grant
    Filed: July 27, 2022
    Date of Patent: March 4, 2025
    Assignee: Truist Bank
    Inventors: Dontá Lamar Wilson, Jane Moury Kane, Kenneth William Cluff, Peter Councill, Qing Li, James Xu
  • Publication number: 20250053858
    Abstract: In an approach, a processor selects a top N features for a machine learning (ML) model; discretizes values of each continuous feature of the top N features; generates a set of combination values that each represent a unique combination of feature values for a data record; predicts, using the ML model, a target value for each record generating predicted target values; groups the predicted target values based on the combination value for each respective record; fits a distribution for each grouping of the predicted target values associated with a respective combination value generating a set of distributions; clusters and refits the set of distributions using a clustering algorithm resulting in a set of clusters and a refitted distribution for each cluster of the set of clusters; and outputs a visualization of the refitted distribution for each cluster as a distribution curve on a graph along with the associated records.
    Type: Application
    Filed: August 8, 2023
    Publication date: February 13, 2025
    Inventors: Si Er Han, Xiao Ming Ma, Wen Pei Yu, Xue Ying Zhang, Jing Xu, Jing James Xu, Jun Wang, Lei Tian
  • Publication number: 20240427684
    Abstract: A computer-implemented method, a system and a computer program product for abnormal point simulation are disclosed. A processor analyzes a plurality of data blocks in first time series data to determine traits of respective data blocks. For the respective data blocks, a processor simulates one or more abnormal points based on the traits of the respective data blocks.
    Type: Application
    Filed: June 20, 2023
    Publication date: December 26, 2024
    Inventors: Si Er Han, Xiao Ming Ma, Jun Wang, Wen Pei Yu, Xue Ying Zhang, Jing James Xu, Jing Xu
  • Publication number: 20240411783
    Abstract: A computer-implemented method for treating post-modeling data includes computing, sequentially for each category of a feature, a category importance (CI) value. The CI value is based on a model accuracy change when records of a category being examined are reassigned to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categories of the feature, wherein the remaining set of categories includes all categories of the feature, except for the category being examined. A post-modeling category merge is performed for each category having a CI value less than a CI value threshold.
    Type: Application
    Filed: June 12, 2023
    Publication date: December 12, 2024
    Inventors: Xue Ying Zhang, Si Er Han, Jing Xu, Xiao Ming Ma, Wen Pei Yu, Jing James Xu, Jun Wang, Ji Hui Yang
  • Patent number: 12153953
    Abstract: Mechanisms are provided for intelligently identifying an execution environment to execute a computing job. An execution time of the computing job in each execution environment of a plurality of execution environments is predicted by applying a set of existing machine learning models matching execution context information and key parameters of the computing job and execution environment information of the execution environment. The predicted execution time of the machine learning models is aggregated. The aggregated predicted execution times of the computing job are summarized for the plurality of execution environments. Responsive to a selection of an execution environment from the plurality of execution environments based on the summary of the aggregated predicted execution times of the computing job, the computing job is executed in the selected execution environment. Related data during the execution of the computing job in the selected execution environment is collected.
    Type: Grant
    Filed: April 8, 2021
    Date of Patent: November 26, 2024
    Assignee: International Business Machines Corporation
    Inventors: A Peng Zhang, Lei Gao, Jin Wang, Jing James Xu, Jun Wang, Dong Hai Yu
  • Patent number: 12056622
    Abstract: A method for identifying influential effects that contribute most to a status change of a target index for goal seeking analysis. The method includes generating a candidate list of significant changed predictors between the normal and abnormal status time periods in collected data, and building a plurality of regression models from the collected data. The method determines a first value (trend value or Pearson correlation value) for each of the significant changed predictors based on whether at least one of the significant changed predictors has a significant change trend using the regression models. The method obtains a second predictor importance value for each of the significant changed predictors from a single model built on all the collected data. The method generates a final predictor value for each of the significant changed predictors by combining the first value with the second predictor importance value for each of the significant changed predictors.
    Type: Grant
    Filed: February 3, 2021
    Date of Patent: August 6, 2024
    Assignee: International Business Machines Corporation
    Inventors: Jing James Xu, Lei Gao, A Peng Zhang, Rui Wang, Si Er Han, Xiao Ming Ma
  • Publication number: 20240256637
    Abstract: A computer implemented method manages an ensemble model system to classify records. A number of processor units cluster records into groups of records based on classification predictions generated by base models in the ensemble model system for the records. The number of processor units determines sets of weights for the base models that increase a probability that the base models in the ensemble model system correctly predict the groups of records. Each set of weights in the sets of weights is associated with a group of records in the groups of records.
    Type: Application
    Filed: January 27, 2023
    Publication date: August 1, 2024
    Inventors: Si Er Han, Xue Ying Zhang, Jing Xu, Jing James Xu, Xiao Ming Ma, Wen Pei Yu, Jun Wang, Ji Hui Yang
  • Patent number: 12014026
    Abstract: Using a set of menu to key process mappings, historical menu usage data for an application is aggregated into aggregated key process usage data. A set of key process association rules, each comprising a consequent key process given a particular antecedent key process, is generated. From the set of key process association rules and a set of ranked menus by frequency of usage within each key process, a set of model menu recommendations is generated. According to an application usage history, a menu frequency ratio, and a confidence value of a modelled next menu, the set of menu recommendations is scored. A scored menu recommendation having a rank below a threshold rank is pruned from a set of menu items of the application ranked according to their scores. The pruned set of scored menu recommendations is presented for selection instead of the set of menu items.
    Type: Grant
    Filed: April 21, 2023
    Date of Patent: June 18, 2024
    Assignee: International Business Machines Corporation
    Inventors: Long Fan, Yang Yang, Ye Fan, Juan Wu, Qi Mao, Jing James Xu
  • Publication number: 20240193830
    Abstract: In an approach for post-modeling data visualization and analysis, a processor presents a first visualization of a training dataset in a first plot. Responsive to receiving a selection of a data group of the training dataset to analyze, a processor identifies three or fewer key model features of the data group of the training dataset. A processor ascertains a representative record of each key model feature of the three or fewer key model features using a Local Interpretable Model-Agnostic Explanation technique. A processor presents a second visualization of the three or fewer key model features and the representative record of each key model feature in a second plot.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 13, 2024
    Inventors: Wen Pei Yu, Xiao Ming Ma, Xue Ying Zhang, Si Er Han, Jing James Xu, Jing Xu, Jun Wang
  • Publication number: 20240169614
    Abstract: A method, computer system, and a computer program product are provided for post-modeling feature evaluation. In one embodiment, at least one post model visual output and associated data is obtained that at least includes an individual conditional expectation (ICE) plot and a partial dependence (PDP) plot. Using the associated data and the plots, a Feature Importance (PI) plot is provided. A plurality of features is then determined for each of the PI, PDP and ICE plots to calculate at least one Interesting Value for each plot. An overall score is also calculated for each of the plurality of features based on the associated Interesting Values for each of the PDP, ICE and PI plots. At least one top feature is selected based on said scores. A final plot is then generated at least reflecting the top feature. The final plot combines the PI, PDP and ICE plots together.
    Type: Application
    Filed: November 17, 2022
    Publication date: May 23, 2024
    Inventors: Xiao Ming Ma, Wen Pei Yu, Jing James Xu, Xue Ying Zhang, Si Er Han, Jing Xu, Jun Wang
  • Patent number: 11971796
    Abstract: An approach is provided in which the approach builds a combination model that includes a normal status model and an abnormal status model. The normal status model is built from a set of time-sequenced normal status records and the abnormal status model is built from a set of time-sequenced abnormal status records. The approach computes a set of time-sequenced coefficient combination values of the normal status model and the abnormal status model based on applying a set of fitting coefficient characteristics to the normal status model and the abnormal status model. The approach performs goal seek analysis on a system using the combination model and the set of time-sequenced coefficient combination values.
    Type: Grant
    Filed: May 18, 2021
    Date of Patent: April 30, 2024
    Assignee: International Business Machines Corporation
    Inventors: Xiao Ming Ma, Si Er Han, Lei Gao, A Peng Zhang, Chun Lei Xu, Rui Wang, Jing James Xu
  • Patent number: 11966340
    Abstract: To automate time series forecasting machine learning pipeline generation, a data allocation size of time series data may be determined based on one or more characteristics of a time series data set. The time series data may be allocated for use by candidate machine learning pipelines based on the data allocation size. Features for the time series data may be determined and cached by the candidate machine learning pipelines. Predictions of each of the candidate machine learning pipelines using at least the one or more features may be evaluated. A ranked list of machine learning pipelines may be automatically generated from the candidate machine learning pipelines for time series forecasting based upon evaluating predictions of each of the one or more candidate machine learning pipelines.
    Type: Grant
    Filed: March 15, 2022
    Date of Patent: April 23, 2024
    Assignee: International Business Machines Corporation
    Inventors: Long Vu, Bei Chen, Xuan-Hong Dang, Peter Daniel Kirchner, Syed Yousaf Shah, Dhavalkumar C. Patel, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Gregory Bramble, Horst Cornelius Samulowitz, Saket K. Sathe, Wesley M. Gifford, Petros Zerfos