Patents by Inventor Xiao-Ming Ma

Xiao-Ming Ma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DETERMINING TIME SERIES MODEL STABILITY AND ROBUSTNESS IN REFRESHMENT

Publication number: 20250148350

Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to determining time series model stability and robustness in refreshment. The computer-implemented system can comprise a memory that can store computer executable components. The computer-implemented system can further comprise a processor that can execute the computer executable components stored in the memory, wherein the computer executable components can comprise a computation component that can employ weighted model evaluation to compute stability of time series pipelines over respective holdout datasets and a determination component that can select, based on the computed pipeline stabilities, a most stable time series pipeline.

Type: Application

Filed: November 2, 2023

Publication date: May 8, 2025

Inventors: Jiang Bo Kang, Dong Hai Yu, Jun Wang, Yao Dong Liu, Bo Song, Xiao Ming Ma
Visualize data and significant records based on relationship with the model

Patent number: 12293438

Abstract: In an approach for post-modeling data visualization and analysis, a processor presents a first visualization of a training dataset in a first plot. Responsive to receiving a selection of a data group of the training dataset to analyze, a processor identifies three or fewer key model features of the data group of the training dataset. A processor ascertains a representative record of each key model feature of the three or fewer key model features using a Local Interpretable Model-Agnostic Explanation technique. A processor presents a second visualization of the three or fewer key model features and the representative record of each key model feature in a second plot.

Type: Grant

Filed: December 13, 2022

Date of Patent: May 6, 2025

Assignee: International Business Machines Corporation

Inventors: Wen Pei Yu, Xiao Ming Ma, Xue Ying Zhang, Si Er Han, Jing James Xu, Jing Xu, Jun Wang
SYNTHETIC DATA TESTING IN MACHINE LEARNING APPLICATIONS

Publication number: 20250139500

Abstract: Determining whether synthetic data is sufficient for utilization in connection with one or more machine learning models. The computing device accesses a protected batch of data associated with a machine learning model. The computing device accesses a simulated batch of data, the simulated batch of data based upon but anonymizing the protected batch of data. The computing device accesses one or more comparisons of one or more variables in the protected batch of data and the simulated batch of data to obtain a similarity value. The computing device performs a machine learning function utilizing at least in-part the simulated batch of data if the similarity value exceeds a similarity threshold.

Type: Application

Filed: October 30, 2023

Publication date: May 1, 2025

Inventors: Xiao Ming Ma, Si Er Han, Xue Ying Zhang, Jing James Xu, Jing Xu, Ji Hui Yang, Rui Wang
ANOMALY DETECTION FOR TIME SERIES DATA

Publication number: 20250130919

Abstract: A computer-implemented method for anomaly detection for a time series data is provided. Aspects include receiving a time series data including a plurality of sequential data points, calculating an expected next value for the time series data based on the plurality of sequential data points, and receiving an actual next value corresponding to the time series data. Aspects also include calculating an anomaly strength estimate based on the expected next value and the actual next value, identifying one of a plurality of anomaly detection pipelines based on the anomaly strength estimate and a portrait associated with each of the plurality of anomaly detection pipelines, and obtaining an anomaly prediction by inputting the time series data and the actual next value into the one of the plurality of anomaly detection pipelines.

Type: Application

Filed: October 19, 2023

Publication date: April 24, 2025

Inventors: Si Er Han, Jing Xu, Xue Ying Zhang, Xiao Ming Ma, Jun Wang, Ji Hui Yang
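The anomaly strength estimate mentioned in the abstract above can be illustrated with a small Python sketch. This is not the method claimed in the application: the abstract does not say how the expected next value or the strength is computed, so the moving-average forecast, the standard-deviation scaling, and the names `expected_next_value` and `anomaly_strength` are all assumptions made for illustration.

```python
import statistics
from typing import Sequence


def expected_next_value(history: Sequence[float], window: int = 5) -> float:
    """Forecast the next point as the mean of the last `window` points (assumed rule)."""
    if not history:
        raise ValueError("history must contain at least one point")
    return statistics.fmean(history[-window:])


def anomaly_strength(history: Sequence[float], actual: float, window: int = 5) -> float:
    """Scale the forecast error by the recent spread (assumed definition of strength)."""
    recent = history[-window:]
    expected = expected_next_value(history, window)
    spread = statistics.pstdev(recent)  # population standard deviation of the window
    error = abs(actual - expected)
    if spread == 0:
        # A flat window gives no scale: any deviation is treated as unbounded.
        return 0.0 if error == 0 else float("inf")
    return error / spread
```

A larger value means the actual point sits further from the forecast relative to recent variability; how such a value would map onto a particular detection pipeline is left open here, as the abstract does not specify it.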
ARTIFICIAL DATA GENERATION FOR DIFFERENTIAL PRIVACY

Publication number: 20250131116

Abstract: An embodiment configures a plurality of parameters, the parameters being usable to generate artificial data from original data, the configuring adjusting a level of privacy in the artificial data. An embodiment fits a distribution type to a variable of the original data. An embodiment adjusts, using a desired level of privacy and the distribution type, a level of noise, wherein the level of noise corresponds to the desired level of privacy. An embodiment generates, using the distribution type and the level of noise, the artificial data, the artificial data achieving the desired level of privacy by including noise data corresponding to the level of noise.

Type: Application

Filed: October 20, 2023

Publication date: April 24, 2025

Applicant: International Business Machines Corporation

Inventors: Si Er Han, Jing Xu, Xiao Ming Ma, Jing James Xu, Jiang Bo Kang, Xue Ying Zhang, Jun Wang, Ji Hui Yang
Query performance discovery and improvement

Patent number: 12282480

Abstract: Embodiments analyze a query pattern of an incoming query on a database, perform a semantic analysis of the query pattern of the incoming query, generate a re-write query that has an improved query performance in comparison to a query performance of the incoming query based on the analyzed query pattern and the semantic analysis; build a query model using machine learning based on at least one of the query pattern and the semantic analysis; and apply the re-write query by performing the re-write query on the database to provide the improved query performance.

Type: Grant

Filed: September 6, 2023

Date of Patent: April 22, 2025

Assignee: International Business Machines Corporation

Inventors: Sheng Yan Sun, Peng Hui Jiang, Xiao Ming Ma, Xue Ying Zhang
GENERATING AN ARTIFICIAL DATA SET

Publication number: 20250124052

Abstract: A computer-implemented method for generating an artificial data set is provided. Aspects include obtaining an input data set, calculating an association between the plurality of categorical variables of the input data set, and creating, based on the association, a plurality of clusters of categorical variables. Aspects also include identifying a key variable for each of the plurality of clusters of categorical variables, creating a key cluster for each of the plurality of clusters, and creating a cluster contingency table for each of the clusters. Aspects further include generating, based on the cluster contingency table for each of the plurality of clusters and for the key cluster, a data set for each of the plurality of clusters and the key cluster and generating the artificial data set based on a combination of the data set for each of the plurality of clusters and the key cluster.

Type: Application

Filed: October 12, 2023

Publication date: April 17, 2025

Inventors: Si Er Han, Xiao Ming Ma, Rui Wang, Jing James Xu, Jing Xu, Xue Ying Zhang, Lei Tian, Dong Hai Yu
OPTIMIZING DETECTION OF ABNORMAL DATA POINTS IN TIME SERIES DATA

Publication number: 20250103948

Abstract: In an approach for optimizing abnormal point detection, a processor receives a set of data, wherein the set of data is partially labeled time series data; determines a data block size for the set of data; splits the set of data into data blocks based on the data block size; computes trait measurements for traits for each data block; assigns a tag to each data block, wherein the tag is selected from the group consisting of a normal tag, an abnormality tag, and an unknown tag; uses the respective data blocks with either the normal tag or the abnormality tag as training data; updates the training data with artificial abnormalities; trains a detection model with the updated training data; and utilizes the trained detection model to predict whether the respective data blocks with the unknown tag have an abnormality or no abnormality.

Type: Application

Filed: September 27, 2023

Publication date: March 27, 2025

Inventors: Jing Xu, Si Er Han, Xue Ying Zhang, Xiao Ming Ma
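The block-splitting and tagging steps in the abstract above can be sketched in Python as follows. The tagging rule is an assumption, since the abstract does not state one: a block is tagged "abnormality" if any point in it is labeled abnormal, "normal" if every point is labeled normal, and "unknown" otherwise. The label encoding (`1`, `0`, `None`) and the helper names are likewise hypothetical.

```python
from typing import List, Optional, Sequence

Label = Optional[int]  # assumed encoding: 1 = abnormal, 0 = normal, None = unlabeled


def split_into_blocks(items: Sequence, block_size: int) -> List[list]:
    """Split a sequence into consecutive blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(items[i:i + block_size]) for i in range(0, len(items), block_size)]


def tag_block(labels: Sequence[Label]) -> str:
    """Assign one of the three tags named in the abstract (assumed rule)."""
    if any(label == 1 for label in labels):
        return "abnormality"
    if all(label == 0 for label in labels):
        return "normal"
    return "unknown"


def training_blocks(labels: Sequence[Label], block_size: int) -> List[int]:
    """Return the indices of blocks usable as training data (normal or abnormality)."""
    tags = [tag_block(block) for block in split_into_blocks(labels, block_size)]
    return [i for i, tag in enumerate(tags) if tag != "unknown"]
```

Blocks tagged "unknown" would be the ones passed to the trained detection model for prediction; trait computation and artificial-abnormality injection are not shown.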
INTELLIGENT RECOMMENDATION OF TIME SERIES ANOMALY DETECTION MODEL PIPELINES

Publication number: 20250094267

Abstract: A time series anomaly detection method, system, and computer program product that processes time series data includes absorbing profiles of the time series data and anomaly types of a model as features, optimizing biased ranks to create optimized ranks through merging initial ranks with new ranks generated by real anomalies, and auto-suggesting the optimized ranks for saving a predetermined amount of data operation.

Type: Application

Filed: September 15, 2023

Publication date: March 20, 2025

Inventors: Jun Wang, Jing Xu, Xiao Ming Ma, Xue Ying Zhang, Si Er Han, Jing James Xu, Wen Pei Yu
POST-MODELING CATEGORY MERGING

Publication number: 20250094831

Abstract: An embodiment identifies, by a post-modeling category merging engine, a plurality of valid pairs associated with a categorical predictor, the plurality of valid pairs representing potential mergers of categories associated with a categorical predictor of a predictive model. The embodiment tests, by the post-modeling category merging engine, a merge strategy for the plurality of valid pairs to determine a merger that minimizes a loss in accuracy of the predictive model. The embodiment merges, by the post-modeling category merging engine based on the testing, a valid pair in the plurality of valid pairs to form a hybrid category.

Type: Application

Filed: September 20, 2023

Publication date: March 20, 2025

Applicant: International Business Machines Corporation

Inventors: Xue Ying Zhang, Si Er Han, Xiao Ming Ma, Jing Xu
Visual representation using post modeling feature evaluation

Patent number: 12249012

Abstract: A method, computer system, and a computer program product are provided for post-modeling feature evaluation. In one embodiment, at least one post model visual output and associated data is obtained that at least includes an individual conditional expectation (ICE) plot and a partial dependence (PDP) plot. Using the associated data and the plots, a Feature Importance (PI) plot is provided. A plurality of features is then determined for each PI, PDP and ICE plots to calculate at least one Interesting Value for each plot. An overall score is also calculated for each plurality of features based on the associated Interesting Values for each PDP, ICE and PI plots. At least one top feature is selected based on said scores. A final plot is then generated at least reflecting the top feature. The final plot combines the PI, PDP and ICE plots together.

Type: Grant

Filed: November 17, 2022

Date of Patent: March 11, 2025

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Xiao Ming Ma, Wen Pei Yu, Jing James Xu, Xue Ying Zhang, Si Er Han, Jing Xu, Jun Wang
QUERY PERFORMANCE DISCOVERY AND IMPROVEMENT

Publication number: 20250077515

Abstract: Embodiments analyze a query pattern of an incoming query on a database, perform a semantic analysis of the query pattern of the incoming query, generate a re-write query that has an improved query performance in comparison to a query performance of the incoming query based on the analyzed query pattern and the semantic analysis; build a query model using machine learning based on at least one of the query pattern and the semantic analysis; and apply the re-write query by performing the re-write query on the database to provide the improved query performance.

Type: Application

Filed: September 6, 2023

Publication date: March 6, 2025

Inventors: Sheng Yan Sun, Peng Hui Jiang, Xiao Ming Ma, Xue Ying Zhang
Feature importance based model optimization

Patent number: 12242367

Abstract: Disclosed are a computer-implemented method, a system and a computer program product for model exploration. Model feature importance of each model of a plurality of models can be obtained, the plurality of models can be grouped into a plurality of model clusters based on the model feature importance of each model, and the model feature importance can be presented by box-plot or confidence interval.

Type: Grant

Filed: May 15, 2022

Date of Patent: March 4, 2025

Assignee: International Business Machines Corporation

Inventors: Jing Xu, Xue Ying Zhang, Si Er Han, Jing James Xu, Xiao Ming Ma, Jun Wang, Wen Pei Yu
Efficient serverless method and system of serving artificial intelligence models

Patent number: 12231491

Abstract: A method for forecasting server demand includes collecting a historical number of scoring requests from a network using a serverless architecture. A scoring request capacity per server is determined using the historical number of scoring requests. A prediction model predicts a first future value of scoring requests for a first future time span. A current number of servers in a pool of servers handling the scoring requests is determined. Using the prediction model, a determination of whether the current number of servers is capable of handling the first future value of scoring requests for the first future time span is made. Upon determining that the current number of servers is incapable of handling the first future value of scoring requests, one or more additional servers are warmed up. The warmed-up additional servers are added to the pool of servers prior to an arrival of the first future time span.

Type: Grant

Filed: November 14, 2023

Date of Patent: February 18, 2025

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bo Song, Jun Wang, Dong Hai Yu, Yao Dong Liu, Xiao Ming Ma, Jiang Bo Kang
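The capacity check described in the abstract above amounts to simple arithmetic, sketched below in Python. This is an illustration under stated assumptions, not the patented implementation: the function name `servers_to_warm_up` is hypothetical, and the rule that the required server count is the predicted request volume divided by per-server capacity, rounded up, is assumed rather than taken from the patent.

```python
import math


def servers_to_warm_up(predicted_requests: float,
                       capacity_per_server: float,
                       current_servers: int) -> int:
    """Return how many extra servers to warm up before the forecast window.

    Assumed rule: needed servers = ceil(predicted requests / capacity per server).
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    needed = math.ceil(predicted_requests / capacity_per_server)
    # Never return a negative number: surplus servers are simply left alone.
    return max(0, needed - current_servers)
```

For example, with a forecast of 1050 scoring requests, a capacity of 100 requests per server, and 8 servers in the pool, the sketch returns 3, so three servers would be warmed up ahead of the time span.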
POST-MODELING VISUALIZATION

Publication number: 20250053858

Abstract: In an approach, a processor selects a top N features for a machine learning (ML) model; discretizes values of each continuous feature of the top N features; generates a set of combination values that each represent a unique combination of feature values for a data record; predicts, using the ML model, a target value for each record generating predicted target values; groups the predicted target values based on the combination value for each respective record; fits a distribution for each grouping of the predicted target values associated with a respective combination value generating a set of distributions; clusters and refits the set of distributions using a clustering algorithm resulting in a set of clusters and a refitted distribution for each cluster of the set of clusters; and outputs a visualization of the refitted distribution for each cluster as a distribution curve on a graph along with the associated records.

Type: Application

Filed: August 8, 2023

Publication date: February 13, 2025

Inventors: Si Er Han, Xiao Ming Ma, Wen Pei Yu, Xue Ying Zhang, Jing Xu, Jing James Xu, Jun Wang, Lei Tian
ABNORMAL POINT SIMULATION

Publication number: 20240427684

Abstract: A computer-implemented method, a system and a computer program product for abnormal point simulation are disclosed. A processor analyzes a plurality of data blocks in first time series data to determine traits of respective data blocks. For the respective data blocks, a processor simulates one or more abnormal points based on the traits of the respective data blocks.

Type: Application

Filed: June 20, 2023

Publication date: December 26, 2024

Inventors: Si Er Han, Xiao Ming Ma, Jun Wang, Wen Pei Yu, Xue Ying Zhang, Jing James Xu, Jing Xu
KEY CATEGORY IDENTIFICATION AND VISUALIZATION

Publication number: 20240411783

Abstract: A computer-implemented method for treating post-modeling data includes computing, sequentially for each category of a feature, a category importance (CI) value. The CI value is based on a model accuracy change when records of a category being examined are reassigned to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categories of the feature, wherein the remaining set of categories includes all categories of the feature, except for the category being examined. A post-modeling category merge is performed for each category having the CI value less than a CI value threshold.

Type: Application

Filed: June 12, 2023

Publication date: December 12, 2024

Inventors: Xue Ying Zhang, Si Er Han, Jing Xu, Xiao Ming Ma, Wen Pei Yu, Jing James Xu, Jun Wang, Ji Hui Yang
Machine Learning Model Deployment in Inference System

Publication number: 20240320543

Abstract: Deploying machine learning models is provided. A new machine learning model is received for a given problem that corresponds to a service running in a container. A cluster of machine learning models of a plurality of clusters of machine learning models corresponding to the given problem is selected. A cluster performance score is determined for the cluster based on combining a model performance score of each machine learning model in the cluster in accordance with a corresponding weight of each machine learning model. It is determined whether the cluster performance score of the cluster is greater than a minimum cluster performance score threshold. The new machine learning model is added to the cluster to increase predictive accuracy for the given problem while the service is running without interruption in response to determining that the cluster performance score of the cluster is greater than the minimum cluster performance score threshold.

Type: Application

Filed: March 22, 2023

Publication date: September 26, 2024

Inventors: Bo Song, Dong Hai Yu, Jun Wang, Jiang Bo Kang, Yao Dong Liu, Xiao Ming Ma
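The cluster performance score in the abstract above combines per-model scores according to per-model weights. A minimal Python sketch follows, assuming the combination is a weighted average; the abstract does not specify the formula, and the names `cluster_performance_score` and `should_add_model` are hypothetical.

```python
from typing import Sequence


def cluster_performance_score(scores: Sequence[float],
                              weights: Sequence[float]) -> float:
    """Combine model scores into one cluster score (assumed: weighted average)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


def should_add_model(scores: Sequence[float],
                     weights: Sequence[float],
                     min_threshold: float) -> bool:
    """Add the new model only if the cluster score exceeds the minimum threshold."""
    return cluster_performance_score(scores, weights) > min_threshold
```

With scores of 0.9 and 0.8 and weights of 1 and 3, the cluster score is about 0.825, so the new model would be added under a threshold of 0.8 but not under a threshold of 0.9.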
Privacy protection in a search process

Patent number: 12099628

Abstract: The present disclosure relates to privacy protection in a search process. According to a method, a target emotion vector is extracted from a search interaction, the target emotion vector representing emotional information in the search interaction. Respective emotion distances between the target emotion vector and respective emotion vectors associated with a plurality of text clusters are determined. The plurality of text clusters is clustered from a dictionary of text elements. A first number of text clusters are selected from the plurality of text clusters based on the determined respective emotion distances. The first number of text clusters have emotion distances larger than at least one unselected text cluster among the plurality of text clusters. A plurality of confused search interactions are constructed for the search interaction based on the first number of text clusters, and the plurality of confused search interactions are performed.

Type: Grant

Filed: May 3, 2022

Date of Patent: September 24, 2024

Assignee: International Business Machines Corporation

Inventors: Jin Wang, Lei Gao, A Peng Zhang, Kai Li, Jun Wang, Xiao Ming Ma, Xin Feng Zhu, Geng Wu Yang
Chaining version data bi-directionally in data page to avoid additional version data accesses

Patent number: 12086118

Abstract: A computer-implemented method, system and computer program product for improving performance of a distributed database. A query is received to store version data in the distributed database. Upon receiving the query to store the version data, the version data is stored in a row of a data page of a main table of a heap organized table/index organized table of the distributed database, where the row of the data page of the main table of the heap organized table/index organized table of the distributed database contains a pointer pointing to a later/previous version of the version data if the later/previous version of the version data is stored in the data page thereby chaining version data bi-directionally.

Type: Grant

Filed: November 15, 2021

Date of Patent: September 10, 2024

Assignee: International Business Machines Corporation

Inventors: Sheng Yan Sun, Shuo Li, Xiaobo Wang, Xiao Ming Ma
