Patents by Inventor Djallel Bouneffouf

Djallel Bouneffouf has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Constrained decision-making and explanation of a recommendation

Patent number: 12645961

Abstract: Systems, computer-implemented methods, and computer program products to facilitate constrained decision-making and explanation of a recommendation are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a recommendation component that can recommend a decision based on one or more decision policies. The decision can comply with one or more constraints of a constrained decision policy. The computer executable components can further comprise an explanation component that can generate an explanation of the decision. The explanation can comprise one or more factors contributing to the decision.
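The flow in the abstract above (recommend a decision that satisfies constraints, then report the factors behind it) can be illustrated with a minimal sketch. It is not the patented system: the linear scoring rule, the option and feature names, and the function `recommend_with_explanation` are all assumptions made for illustration.

```python
def recommend_with_explanation(options, weights, constraints):
    """Pick the best feasible option and list the factors behind its score.

    options:     dict mapping option name -> dict of feature values (hypothetical)
    weights:     dict mapping feature name -> weight in an assumed linear score
    constraints: list of predicates over a feature dict; all must hold
    """
    # Keep only options that comply with every constraint.
    feasible = {
        name: feats
        for name, feats in options.items()
        if all(ok(feats) for ok in constraints)
    }
    if not feasible:
        return None, []

    def score(feats):
        return sum(weights[k] * feats[k] for k in weights)

    best = max(feasible, key=lambda name: score(feasible[name]))
    # Explanation: per-feature contributions to the score, largest magnitude first.
    contributions = sorted(
        ((k, weights[k] * feasible[best][k]) for k in weights),
        key=lambda kv: abs(kv[1]),
        reverse=True,
    )
    return best, contributions


if __name__ == "__main__":
    options = {
        "plan_a": {"benefit": 0.9, "cost": 0.8},
        "plan_b": {"benefit": 0.6, "cost": 0.3},
    }
    weights = {"benefit": 1.0, "cost": -0.5}
    constraints = [lambda f: f["cost"] <= 0.5]
    print(recommend_with_explanation(options, weights, constraints))
```

Here the constraint acts as a hard filter before scoring, so an option that violates it is never recommended however well it scores.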

Type: Grant

Filed: July 31, 2018

Date of Patent: June 2, 2026

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Avinash Balakrishnan, Djallel Bouneffouf, Nicholas Mattei, Francesca Rossi
Computing robust policies in offline reinforcement learning

Patent number: 12619911

Abstract: According to one embodiment, a method, computer system, and computer program product for reinforcement learning is provided. The present invention may include training, using an offline dataset, a plurality of diverse reward models, and creating a policy based on an output of the reward models and a robustness operator of the reward models.
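As a rough illustration of the idea in the abstract above, the sketch below trains several reward models on bootstrap resamples of an offline dataset and then applies a robustness operator to their outputs. Everything specific here is an assumption: the stateless (bandit-style) setting, the bootstrap as the source of diversity, the minimum as the robustness operator, and the names `train_reward_models` and `robust_policy`.

```python
import random
from statistics import mean


def train_reward_models(dataset, n_models=5, seed=0):
    """Fit n_models reward tables (action -> mean reward) on bootstrap resamples.

    dataset: list of (action, reward) pairs collected offline.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Resampling with replacement gives each model a different view of the data.
        sample = [rng.choice(dataset) for _ in dataset]
        by_action = {}
        for action, reward in sample:
            by_action.setdefault(action, []).append(reward)
        models.append({a: mean(rs) for a, rs in by_action.items()})
    return models


def robust_policy(models, actions):
    """Return the action whose worst-case predicted reward across models is highest."""

    def worst_case(action):
        # A model that never saw the action contributes -inf (assumed pessimism).
        return min(m.get(action, float("-inf")) for m in models)

    return max(actions, key=worst_case)


if __name__ == "__main__":
    data = [("a", 1.0), ("a", -1.0), ("b", 0.2), ("b", 0.1)]
    models = train_reward_models(data)
    print(robust_policy(models, ["a", "b"]))
```

Taking the minimum over the ensemble favors actions that every model agrees are acceptable, which is one common way to guard against overestimating rewards that the offline data covers poorly.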

Type: Grant

Filed: June 17, 2022

Date of Patent: May 5, 2026

Assignee: International Business Machines Corporation

Inventors: Radu Marinescu, Parikshit Ram, Djallel Bouneffouf, Tejaswini Pedapati, Paulito Palmes
CONTEXTUAL MORAL VALUE ALIGNMENT THROUGH CONTEXT-BASED AGGREGATION

Publication number: 20260119963

Abstract: Mechanisms are provided for aligning responses with a user moral profile. The mechanisms train one or more classifiers to generate a reward output. The trained classifier(s) evaluate an output from a language model (LM) according to whether the output aligns with one or more moral values corresponding to the trained classifier(s). The mechanisms train at least one moral value agent, based on the trained classifiers, to generate one or more responses to inputs that are aligned with at least one moral value. The mechanisms generate, for an input, an aligned output that is aligned with a user moral profile based on a processing of the input via the at least one moral value agent. The user moral profile encodes a moral value configuration specifying a user adherence to moral values in the predefined set of moral values.

Type: Application

Filed: October 24, 2024

Publication date: April 30, 2026

Inventors: Djallel Bouneffouf, Pierre L. Dognin, Inkit Padhi, Jesus Maria Rios Aliaga, Ronny Luss, Prasanna Sattigeri, Miao Liu, Kush Raj Varshney, Manish Nagireddy, Matthew D Riemer
Feedback driven decision support in partially observable settings

Patent number: 12468920

Abstract: A novel formulation called the Context-Attentive Bandit with Observations (CABO) is described, where only a limited number of features can be accessed by the learner. The present invention is applicable to many problems, including problems arising in clinical settings and dialog systems where it is not possible to reveal the whole feature set. The present invention adapts the standard contextual bandit algorithm known as Thompson Sampling into a novel algorithm called Context-Attentive Thompson Sampling with Observations (CATSO). Experimental results are included to demonstrate its effectiveness, including a regret analysis and an empirical evaluation demonstrating advantages of the disclosed novel approach on several real-life datasets.

Type: Grant

Filed: July 29, 2019

Date of Patent: November 11, 2025

Assignee: International Business Machines Corporation

Inventors: Sohini Upadhyay, Yasaman Khazaeni, Djallel Bouneffouf, Mayank Agarwal
MULTI-ARMED BANDIT WITH OPTIMUM EXPLORATION-EXPLOITATION DISTRIBUTION PARAMETER

Publication number: 20250253014

Abstract: A method, computer program product, and computer system for triggering actions within a multi-armed bandit process. In a current iteration of an iterative process: a vector cV(t) of values of V features is generated; a distribution parameter ?t is selected by maximizing a function that depends on ?t and a measure of a probability of success ??; a set CU(t) of U features is selected by maximizing a function that depends on cV(t) and ?t; values cU+V(t) of respective features in CU+V(t) are received; an arm k(t) is selected by maximizing a function that depends on cU+V(t) and ?t; an electromagnetic signal is sent to a hardware machine directing the hardware machine to perform an action of the selected arm k(t); a reward rk(t) resulting from the hardware machine having performed the action is received; and updates are performed for the next iteration.

Type: Application

Filed: February 5, 2024

Publication date: August 7, 2025

Inventor: Djallel BOUNEFFOUF
Optimization of multiple molecules

Patent number: 12334195

Abstract: A computer implemented method of modifying molecular structures constrained by a budget is provided. The computer implemented method includes receiving from a user a subset of molecules, where each molecule is represented as a generation path, and receiving from the user an allotted budget for modifying a selection of molecules from the subset of molecules. The computer implemented method further includes testing a first molecule, and reducing the allotted budget based on the resources expended to test the first molecule. The computer implemented method further includes testing a second molecule, and reducing the allotted budget based on the resources expended to test the second molecule. The computer implemented method further includes determining a remaining amount of the allotted budget, and testing additional molecules from the subset of molecules until the allotted budget is exhausted. The computer implemented method further includes presenting the tested molecules to the user.
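The budget loop described in the abstract above can be sketched as follows. This is only an illustration under assumptions: molecules are plain strings rather than generation paths, the cost of a test is known only after running it, and `run_budgeted_tests` and the `test_fn` callback are hypothetical names.

```python
def run_budgeted_tests(molecules, budget, test_fn):
    """Test molecules in order until the allotted budget is exhausted.

    test_fn(molecule) must return (result, cost), where cost is the amount
    of budget the test consumed. Returns (tested, remaining_budget).
    """
    tested = []
    remaining = budget
    for molecule in molecules:
        if remaining <= 0:
            break  # budget exhausted: stop before starting another test
        result, cost = test_fn(molecule)
        remaining -= cost  # reduce the budget by the resources actually spent
        tested.append((molecule, result))
    return tested, remaining


if __name__ == "__main__":
    # Hypothetical test: the "result" is the string length, each test costs 4 units.
    def fake_test(molecule):
        return len(molecule), 4

    print(run_budgeted_tests(["CCO", "CO", "C"], budget=8, test_fn=fake_test))
```

Because the cost is charged after each test, the last test may overshoot the budget; a variant that knows costs in advance could check affordability before testing.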

Type: Grant

Filed: December 22, 2021

Date of Patent: June 17, 2025

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Akihiro Kishimoto, Toshiyuki Hama, Hsiang Han Hsu, Djallel Bouneffouf
COMPUTER-BASED QUESTION-ANSWERING SYSTEM USING MULTIPLE TYPES OF USER FEEDBACK

Publication number: 20250111402

Abstract: A computer-based question-answering system is capable of receiving a user input specifying a noisy reward and a sparse reward. The noisy reward and the sparse reward are received responsive to an initial recommendation generated by a computer-based recommendation system. A filtered noisy reward is generated by filtering the noisy reward based on an upper bound for the sparse reward or a lower bound for the sparse reward. A final reward is generated based on the filtered noisy reward and the sparse reward. An expected reward and a confidence interval for each of a plurality of candidate recommendations are updated based on the final reward. A subsequent recommendation generated by the computer-based recommendation system is provided based on the expected reward as updated and the confidence interval as updated for each candidate recommendation of the plurality of candidate recommendations.
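One possible reading of the reward pipeline in the abstract above is sketched here. The abstract does not give formulas, so the specifics are assumptions: the noisy reward is filtered by clamping it to the sparse reward's bounds, the final reward is a weighted average, and the confidence interval uses a UCB1-style radius. All names are hypothetical.

```python
import math


def filter_noisy_reward(noisy, sparse_lower, sparse_upper):
    """Clamp the noisy reward into [sparse_lower, sparse_upper] (assumed filter)."""
    return max(sparse_lower, min(sparse_upper, noisy))


def final_reward(filtered_noisy, sparse, weight=0.5):
    """Combine the two signals as a convex combination (assumed rule)."""
    return weight * filtered_noisy + (1 - weight) * sparse


class CandidateStats:
    """Running expected reward and confidence radius for one candidate."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, reward):
        # Incremental mean update.
        self.count += 1
        self.mean += (reward - self.mean) / self.count

    def confidence_radius(self, total_rounds):
        # UCB1-style radius; infinite until the candidate has been tried once.
        if self.count == 0:
            return float("inf")
        return math.sqrt(2 * math.log(total_rounds) / self.count)


if __name__ == "__main__":
    stats = CandidateStats()
    filtered = filter_noisy_reward(1.7, sparse_lower=0.0, sparse_upper=1.0)
    stats.update(final_reward(filtered, sparse=0.0))
    print(stats.mean, stats.confidence_radius(total_rounds=1))
```

A subsequent recommendation would then typically pick the candidate with the largest `mean + confidence_radius`, which balances exploiting good answers against exploring rarely shown ones.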

Type: Application

Filed: June 28, 2023

Publication date: April 3, 2025

Inventor: Djallel Bouneffouf
CONTEXTUAL THOMPSON SAMPLING WITH CORRUPTED AND MISSING CONTEXT

Publication number: 20250103928

Abstract: A method, computer program product, and computer system for triggering actions within a multi-armed bandit process with corrupted context. In a current time step: a context vector c(t) is received; a weight a is sampled, from a first normal probability distribution, to maximize a function f? of c(t) and ?a; functions f1k and f2k respectively having and not having a functional dependence on c(t) are determined for each arm k; an arm k(t) is selected to maximize [?(t)f1k+(1−?(t))f2k]; an electromagnetic signal is sent to a hardware machine directing the hardware machine to perform an action of the selected arm k(t); a dynamic reward rk(t) resulting from the hardware machine having performed the action is received; and updates are performed for the next time step, including updating the first normal probability distribution for ?=?(t) as a function of c(t) and rk(t) for the selected arm k(t).

Type: Application

Filed: September 26, 2023

Publication date: March 27, 2025

Inventor: Djallel BOUNEFFOUF
EVOLUTIONARY CONTEXTUAL BANDITS

Publication number: 20250068932

Abstract: A method for solving a contextual bandit problem using an Evolution Linear Thompson Sampling (ELINTS) algorithm is provided, wherein the method includes identifying a contextual bandit problem having exploration parameters and feature subsets, initializing a population of genomes for use with the exploration parameters and the feature subset, initializing exploration parameter values and a random feature subset, calculating an expected reward using the exploration parameters and the feature subsets, choosing an action arm A(t), observing a reward R(t) and updating a cumulative reward, selecting a subset of existing genomes based on the cumulative reward, and replacing one or more of the existing genomes with newly created offspring genomes.

Type: Application

Filed: August 23, 2023

Publication date: February 27, 2025

Inventor: Djallel BOUNEFFOUF
ONLINE SYSTEM WITH BANDIT FEATURE AND AUTO-REGRESSIVE TEMPORAL STRUCTURE

Publication number: 20250068969

Abstract: A multi-armed bandit (MAB) problem is obtained and a per-round regret lower bound is determined, wherein a corresponding regret is measured against a benchmark. The multi-armed bandit problem is provided to an algorithm that has a per-round regret that is close to the determined per-round regret lower bound, wherein the algorithm dynamically adapts to changes and discards irrelevant past information by alternating between recently pulled arms and unpulled arms having potential, wherein the alternating comprises updating an estimate of an expected reward of each arm within each epoch and an estimate for an error bound that captures an amount of error contained in the estimate of the expected reward for each arm within each epoch based on the auto-regressive temporal structure with trend components, and restarting the algorithm.

Type: Application

Filed: April 5, 2024

Publication date: February 27, 2025

Inventors: Qinyi Chen, Negin Golrezaei, Djallel Bouneffouf
ONLINE SYSTEM AND METHOD FOR SOLVING CONTEXT-ATTENTIVE COMBINATORIAL BANDIT WITH OBSERVATIONS

Publication number: 20250068696

Abstract: A method for solving a Context-Attentive Combinatorial Bandit with Observations (CACBO) problem using a Context-Attentive Combinatorial Thompson Sampling with Observations (CACTSO) algorithm is provided where the method includes identifying a Context-Attentive Combinatorial Bandit with Observations problem having multiple arms, identifying a plurality of parameters including a total number of features N, a number of initially observed features V, an initially observed features set CV, a number of observed additional features U, a distribution parameter, and a function ?(t) which is computed differently for stationary and nonstationary cases, initializing the initially observed features, the initially observed features set and the observed additional features, identifying a plurality of subsets CV(t) for each time t from a plurality of predetermined times t, sampling a vector parameter for each context feature for a plurality of context features, identifying a best subset of features and selecting an arm based

Type: Application

Filed: August 23, 2023

Publication date: February 27, 2025

Inventor: Djallel BOUNEFFOUF
ADAPTIVE SPINAL CORD STIMULATION POLICY GENERATION

Publication number: 20250058119

Abstract: An embodiment collects a first set of patient data and a first set of treatment data associated with a patient population treated with spinal cord stimulation. The embodiment clusters the patient population into a plurality of cohorts. The embodiment generates a plurality of states using a second set of patient data associated with a cohort in the plurality of cohorts. The embodiment generates a plurality of actions using a second set of treatment data associated with the cohort. The embodiment determines, based on the plurality of actions, a plurality of probabilities associated with a transition from a first state to a second state in the plurality of states. The embodiment generates, based on the plurality of probabilities, a stimulator action policy for a patient in the cohort.
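The last two steps of the abstract above (estimating state-transition probabilities from actions, then deriving an action policy) can be illustrated with a small sketch. It is purely illustrative and not clinical guidance: the cohort clustering is omitted, the state and action labels are made up, and the one-step greedy rule with a hand-assigned `state_value` table is an assumption rather than the method of the application.

```python
from collections import Counter, defaultdict


def estimate_transitions(records):
    """Estimate P(next_state | state, action) from observed triples.

    records: iterable of (state, action, next_state) for one cohort.
    Returns a dict: (state, action) -> {next_state: probability}.
    """
    counts = defaultdict(Counter)
    for state, action, next_state in records:
        counts[(state, action)][next_state] += 1
    return {
        key: {s2: n / sum(c.values()) for s2, n in c.items()}
        for key, c in counts.items()
    }


def greedy_policy(transitions, state_value):
    """For each state, pick the action with the best expected next-state value."""
    best = {}
    for (state, action), dist in transitions.items():
        expected = sum(p * state_value[s2] for s2, p in dist.items())
        if state not in best or expected > best[state][1]:
            best[state] = (action, expected)
    return {state: action for state, (action, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical cohort data with made-up state and action labels.
    records = [
        ("high_pain", "low_amp", "high_pain"),
        ("high_pain", "high_amp", "low_pain"),
        ("high_pain", "high_amp", "high_pain"),
    ]
    transitions = estimate_transitions(records)
    print(greedy_policy(transitions, {"high_pain": 0.0, "low_pain": 1.0}))
```

A fuller treatment would solve the resulting Markov decision process over a longer horizon instead of looking one step ahead.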

Type: Application

Filed: August 14, 2023

Publication date: February 20, 2025

Applicant: International Business Machines Corporation

Inventors: Tigran Tigran Tchrakian, Mykhaylo Zayats, Sergiy Zhuk, Djallel Bouneffouf
ADAPTIVE NEUROMODULATOR ACTION POLICY GENERATION

Publication number: 20250062009

Abstract: An embodiment collects a first set of patient data and a first set of treatment data associated with a patient population treated with neuromodulation. The embodiment clusters the patient population into a plurality of cohorts. The embodiment generates a plurality of states using a second set of patient data associated with a cohort in the plurality of cohorts. The embodiment generates a plurality of actions using a second set of treatment data associated with the cohort. The embodiment determines, based on the plurality of actions, a plurality of probabilities associated with a transition from a first state to a second state in the plurality of states. The embodiment generates, based on the plurality of probabilities, a neuromodulator action policy for a patient in the cohort.

Type: Application

Filed: August 14, 2023

Publication date: February 20, 2025

Applicant: International Business Machines Corporation

Inventors: Tigran Tigran Tchrakian, Mykhaylo Zayats, Sergiy Zhuk, Djallel BOUNEFFOUF
ONLINE LEARNING SYSTEM WITH CONTEXTUAL BANDITS FEEDBACK AND LATENT STATE DYNAMICS

Publication number: 20250045570

Abstract: A method, computer program product, and computer system for triggering actions in a sequence of time steps within a multi-armed bandit process. In a current time step: a context input is received; a hidden Markov model (HMM) parameter transformation is executed to compute a latent state probability vector and HMM parameters using a conditional probability distribution, context input, values of latent state probability vector, and HMM parameters from a previous time step; an action is selected; an electromagnetic signal is sent to a hardware machine directing the hardware machine to perform the action; a dynamic reward resulting from the hardware machine having performed the action is received; a mean reward estimate as a function of the dynamic reward and the latent state probability is updated; and an update of the latent state probability vector in dependence on the dynamic reward, the action, and the mean reward estimate vector is computed.
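The latent state probability update mentioned in the abstract above resembles a standard hidden Markov model filtering step, sketched below. This is the textbook predict-then-correct recursion, not the claimed parameter transformation; how the per-state reward likelihoods are obtained from the mean reward estimates is left to the caller, and the function names are assumptions.

```python
def predict_latent(belief, transition):
    """Propagate the belief one step: b'[j] = sum_i b[i] * transition[i][j].

    belief[i] is P(latent state = i); transition[i][j] is P(next = j | current = i).
    """
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def correct_latent(predicted, likelihood):
    """Reweight by the likelihood of the observed reward under each latent state."""
    unnormalized = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        # The observation is impossible under every state: keep the prediction.
        return list(predicted)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    belief = [1.0, 0.0]
    transition = [[0.9, 0.1], [0.2, 0.8]]
    predicted = predict_latent(belief, transition)
    # Hypothetical likelihoods: the observed reward is far more likely in state 1.
    print(correct_latent(predicted, [0.1, 0.9]))
```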

Type: Application

Filed: August 1, 2023

Publication date: February 6, 2025

Inventors: Elliot Nelson, Djallel Bouneffouf, Debarun Bhattacharjya, Tian Gao, Miao Liu
CONTEXTUAL BANDIT WITH TRENDING REWARD FUNCTION

Publication number: 20250005099

Abstract: A method for solving a contextual bandit problem having a trending reward function including identifying a contextual bandit (MAB) problem having multiple arms i and a known trend, wherein the shape of a reward function for each of the multiple arms is known, and wherein a distribution of the reward function is unknown, and implementing a Linear Upper Confidence Bound Contextual Bandit (ALINUCB) algorithm to take advantage of the shape of the reward function by causing each of the multiple arms to be independently drawn by the agent responsive to a sequence during a predetermined time period, identifying a preferred arm from the multiple arms, wherein the preferred arm has the best reward during the predetermined time period, engaging the preferred arm during the predetermined time period, detecting the expiration of the predetermined time period and testing each of the multiple arms for a subsequent predetermined time period.

Type: Application

Filed: June 28, 2023

Publication date: January 2, 2025

Inventor: Djallel Bouneffouf
PATIENT TREATMENT RECOMMENDATIONS

Publication number: 20250006335

Abstract: Data associated with a patient undergoing medical treatment can be received. The data can pertain to a use of the medical treatment and a state of the patient. A subset of the data pertaining to the state of the patient can be identified as context. A function can be learned that relates the context to a reward derived from the medical treatment. The function can be used, based on a current state of the patient, to identify a type of treatment to deliver to the patient.

Type: Application

Filed: June 29, 2023

Publication date: January 2, 2025

Inventors: Tigran Tigran Tchrakian, Sergiy Zhuk, Djallel BOUNEFFOUF, Mykhaylo Zayats, JEFFREY L. ROGERS
Machine Learning Using Robust Stochastic Multi-Armed Bandits with Historical Data

Publication number: 20240330741

Abstract: Mechanisms are provided for training a machine learning computer model. The mechanisms execute a first initialization of machine learning training logic based on a determination of propensity scores for each output, of a plurality of predetermined outputs, of a machine learning computer model, the propensity scores being determined from historical data. The mechanisms execute a second initialization of the machine learning training logic by performing a trimmed optimization of the machine learning training logic, based on the historical data, to estimate initial parameters of the machine learning computer model. The resulting initialized machine learning training logic is executed on the machine learning computer model to train the machine learning computer model which is then deployed.

Type: Application

Filed: March 30, 2023

Publication date: October 3, 2024

Inventors: Sarah Boufelja-Yacoubi, Djallel Bouneffouf, Sergiy Zhuk
Online machine learning with immediate rewards when real rewards are delayed

Patent number: 12056584

Abstract: An online machine learning model such as an autonomous agent predicts an action. A processor associated with, or running, the online machine learning model observes an environment for an interval of time for a real reward associated with the action. Responsive to determining that the real reward is not received within the interval of time, the processor determines based on a criterion whether to allocate an immediate reward received within the interval of time to the online machine learning model, where the immediate reward is an approximation of the real reward. Responsive to determining that the immediate reward is to be allocated, the processor allocates the immediate reward to the online machine learning model. The online machine learning model further learns or retrains itself based on the immediate reward.
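The decision described in the abstract above (use the real reward if it arrives within the interval, otherwise decide whether to substitute the immediate reward) can be written as a small function. This is a sketch under assumptions: rewards are floats, a missing reward is represented by `None`, and the criterion is passed in as a hypothetical callback `allocate_criterion`.

```python
def choose_reward(real_reward, immediate_reward, allocate_criterion):
    """Decide which reward, if any, to give the learner after the waiting interval.

    real_reward:        the real reward, or None if it did not arrive in time
    immediate_reward:   a proxy observed within the interval, or None
    allocate_criterion: predicate deciding whether the proxy is usable
    Returns (reward, source) where source is "real", "immediate", or "none".
    """
    if real_reward is not None:
        return real_reward, "real"
    if immediate_reward is not None and allocate_criterion(immediate_reward):
        # The proxy approximates the real reward and passes the criterion.
        return immediate_reward, "immediate"
    return None, "none"


if __name__ == "__main__":
    # Hypothetical criterion: only trust proxies of at least 0.2.
    def trusted(reward):
        return reward >= 0.2

    print(choose_reward(None, 0.3, trusted))
```

The caller would then update or retrain the online model only when the returned reward is not `None`.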

Type: Grant

Filed: November 16, 2020

Date of Patent: August 6, 2024

Assignee: International Business Machines Corporation

Inventors: Oznur Alkan, Djallel Bouneffouf, Bei Chen, Elizabeth Daly
REINFORCEMENT LEARNING WITH MULTIPLE OBJECTIVES AND TRADEOFFS

Publication number: 20240232682

Abstract: A method for computing possibly optimal policies in reinforcement learning with multiple objectives and tradeoffs includes receiving a dataset comprising state, action, and reward information for objectives in a multiple objective environment. Tradeoff information indicating that a first vector comprising first values of the objectives in the multiple objective environment is preferred to a second vector comprising second values of the objectives in the multiple objective environment is received. A set of possibly optimal policies for the multiple objective environment is produced based on the dataset and the tradeoff information, where the set of possibly optimal policies indicates actions for an intelligent agent operating in the multiple objective environment to take.

Type: Application

Filed: October 24, 2022

Publication date: July 11, 2024

Inventors: Radu Marinescu, Parikshit Ram, Djallel Bouneffouf, Tejaswini Pedapati, Paulito Palmes
REINFORCEMENT LEARNING WITH MULTIPLE OBJECTIVES AND TRADEOFFS

Publication number: 20240135234

Abstract: A method for computing possibly optimal policies in reinforcement learning with multiple objectives and tradeoffs includes receiving a dataset comprising state, action, and reward information for objectives in a multiple objective environment. Tradeoff information indicating that a first vector comprising first values of the objectives in the multiple objective environment is preferred to a second vector comprising second values of the objectives in the multiple objective environment is received. A set of possibly optimal policies for the multiple objective environment is produced based on the dataset and the tradeoff information, where the set of possibly optimal policies indicates actions for an intelligent agent operating in the multiple objective environment to take.

Type: Application

Filed: October 23, 2022

Publication date: April 25, 2024

Inventors: Radu Marinescu, Parikshit Ram, Djallel Bouneffouf, Tejaswini Pedapati, Paulito Palmes
