Patents by Inventor Djallel Bouneffouf
Djallel Bouneffouf has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12645961
Abstract: Systems, computer-implemented methods, and computer program products to facilitate constrained decision-making and explanation of a recommendation are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a recommendation component that can recommend a decision based on one or more decision policies. The decision can comply with one or more constraints of a constrained decision policy. The computer executable components can further comprise an explanation component that can generate an explanation of the decision. The explanation can comprise one or more factors contributing to the decision.
Type: Grant
Filed: July 31, 2018
Date of Patent: June 2, 2026
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Avinash Balakrishnan, Djallel Bouneffouf, Nicholas Mattei, Francesca Rossi
-
Patent number: 12619911
Abstract: According to one embodiment, a method, computer system, and computer program product for reinforcement learning is provided. The present invention may include training, using an offline dataset, a plurality of diverse reward models, and creating a policy based on an output of the reward models and a robustness operator of the reward models.
Type: Grant
Filed: June 17, 2022
Date of Patent: May 5, 2026
Assignee: International Business Machines Corporation
Inventors: Radu Marinescu, Parikshit Ram, Djallel Bouneffouf, Tejaswini Pedapati, Paulito Palmes
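The abstract above does not say what the robustness operator is or how the policy is built from it. As a minimal sketch only, assuming the operator is a worst-case `min` over the reward models and the policy is a one-step greedy choice, the idea could look like the following. The names `robust_value` and `robust_greedy_action` are hypothetical, and training the reward models from the offline dataset is left out.

```python
def robust_value(reward_models, state, action, operator=min):
    """Combine the reward models' outputs with a robustness operator.

    Assumption: the operator defaults to `min` (worst case across models);
    the abstract does not specify which operator is used.
    """
    return operator(model(state, action) for model in reward_models)


def robust_greedy_action(reward_models, state, actions, operator=min):
    """Pick the action with the best robust value (a one-step greedy policy)."""
    return max(
        actions,
        key=lambda action: robust_value(reward_models, state, action, operator),
    )


# Two hypothetical reward models that disagree about action "x".
models = [
    lambda state, action: {"x": 1.0, "y": 0.6}[action],
    lambda state, action: {"x": 0.0, "y": 0.5}[action],
]
chosen = robust_greedy_action(models, state=None, actions=["x", "y"])
```

With a worst-case operator, the action the models agree on is preferred over the one a single model rates highly.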
-
Publication number: 20260119963
Abstract: Mechanisms are provided for aligning responses with a user moral profile. The mechanisms train one or more classifiers to generate a reward output. The trained classifier(s) evaluate an output from a language model (LM) according to whether the output aligns with one or more moral values corresponding to the trained classifier(s). The mechanisms train at least one moral value agent, based on the trained classifiers, to generate one or more responses to inputs that are aligned with at least one moral value. The mechanisms generate, for an input, an aligned output that is aligned with a user moral profile based on a processing of the input via the at least one moral value agent. The user moral profile encodes a moral value configuration specifying a user adherence to moral values in the predefined set of moral values.
Type: Application
Filed: October 24, 2024
Publication date: April 30, 2026
Inventors: Djallel Bouneffouf, Pierre L. Dognin, Inkit Padhi, Jesus Maria Rios Aliaga, Ronny Luss, Prasanna Sattigeri, Miao Liu, Kush Raj Varshney, Manish Nagireddy, Matthew D Riemer
-
Patent number: 12468920
Abstract: A novel formulation called the Context-Attentive Bandit with Observations (CABO) is described, where only a limited number of features can be accessed by the learner. The present invention is applicable to many problems, including problems arising in clinical settings and dialog systems where it is not possible to reveal the whole feature set. The present invention adapts the standard contextual bandit algorithm known as Thompson Sampling with a novel algorithm called Context-Attentive Thompson Sampling with Observations (CATSO). Experimental results are included to demonstrate its effectiveness, including a regret analysis and an empirical evaluation demonstrating advantages of the disclosed novel approach on several real-life datasets.
Type: Grant
Filed: July 29, 2019
Date of Patent: November 11, 2025
Assignee: International Business Machines Corporation
Inventors: Sohini Upadhyay, Yasaman Khazaeni, Djallel Bouneffouf, Mayank Agarwal
-
Publication number: 20250253014
Abstract: A method, computer program product, and computer system for triggering actions within a multi-armed bandit process. In a current iteration of an iterative process: a vector cV(t) of values of V features is generated; a distribution parameter ?t is selected by maximizing a function that depends on ?t and a measure of a probability of success ??; a set CU(t) of U features is selected by maximizing a function that depends on cV(t) and ?t; values cU+V(t) of respective features in CU+V(t) are received; an arm k(t) is selected by maximizing a function that depends on cU+V(t) and ?t; an electromagnetic signal is sent to a hardware machine directing the hardware machine to perform an action of the selected arm k(t); a reward rk(t) resulting from the hardware machine having performed the action is received; and updates are performed for the next iteration.
Type: Application
Filed: February 5, 2024
Publication date: August 7, 2025
Inventor: Djallel BOUNEFFOUF
-
Patent number: 12334195
Abstract: A computer implemented method of modifying molecular structures constrained by a budget is provided. The computer implemented method includes receiving from a user a subset of molecules, where each molecule is represented as a generation path, and receiving from the user an allotted budget for modifying a selection of molecules from the subset of molecules. The computer implemented method further includes testing a first molecule, and reducing the allotted budget based on the resources expended to test the first molecule. The computer implemented method further includes testing a second molecule, and reducing the allotted budget based on the resources expended to test the second molecule. The computer implemented method further includes determining a remaining amount of the allotted budget, and testing additional molecules from the subset of molecules until the allotted budget is exhausted. The computer implemented method further includes presenting the tested molecules to the user.
Type: Grant
Filed: December 22, 2021
Date of Patent: June 17, 2025
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Akihiro Kishimoto, Toshiyuki Hama, Hsiang Han Hsu, Djallel Bouneffouf
-
Publication number: 20250111402
Abstract: A computer-based question-answering system is capable of receiving a user input specifying a noisy reward and a sparse reward. The noisy reward and the sparse reward are received responsive to an initial recommendation generated by a computer-based recommendation system. A filtered noisy reward is generated by filtering the noisy reward based on an upper bound for the sparse reward or a lower bound for the sparse reward. A final reward is generated based on the filtered noisy reward and the sparse reward. An expected reward and a confidence interval for each of a plurality of candidate recommendations are updated based on the final reward. A subsequent recommendation generated by the computer-based recommendation system is provided based on the expected reward as updated and the confidence interval as updated for each candidate recommendation of the plurality of candidate recommendations.
Type: Application
Filed: June 28, 2023
Publication date: April 3, 2025
Inventor: Djallel Bouneffouf
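To make the reward pipeline in this abstract concrete, here is a hedged sketch. The abstract does not give the filtering rule, the way the two rewards are combined, or the form of the confidence interval, so the sketch assumes clipping to the sparse reward's bounds, a weighted average, and a UCB1-style radius. All names (`filter_noisy_reward`, `final_reward`, `Candidate`, `recommend`) are hypothetical.

```python
import math


def filter_noisy_reward(noisy, sparse_lower, sparse_upper):
    """Assumed filter: clip the noisy reward into the sparse reward's bounds."""
    return max(sparse_lower, min(sparse_upper, noisy))


def final_reward(noisy, sparse, sparse_lower, sparse_upper, weight=0.5):
    """Assumed combination: weighted average of filtered noisy and sparse rewards."""
    filtered = filter_noisy_reward(noisy, sparse_lower, sparse_upper)
    return weight * filtered + (1.0 - weight) * sparse


class Candidate:
    """Running expected reward and confidence radius for one recommendation."""

    def __init__(self):
        self.pulls = 0
        self.mean = 0.0

    def update(self, reward):
        self.pulls += 1
        self.mean += (reward - self.mean) / self.pulls  # incremental mean

    def confidence_radius(self, total_rounds):
        if self.pulls == 0:
            return math.inf  # untried candidates are explored first
        return math.sqrt(2.0 * math.log(max(total_rounds, 2)) / self.pulls)


def recommend(candidates, total_rounds):
    """Index of the candidate with the highest upper confidence bound."""
    return max(
        range(len(candidates)),
        key=lambda i: candidates[i].mean
        + candidates[i].confidence_radius(total_rounds),
    )
```

A candidate that has never been recommended has an infinite radius in this sketch, so it is chosen before any candidate with a finite bound.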
-
Publication number: 20250103928
Abstract: A method, computer program product, and computer system for triggering actions within a multi-armed bandit process with corrupted context. In a current time step: a context vector c(t) is received; a weight a is sampled, from a first normal probability distribution, to maximize a function f? of c(t) and ?a; functions f1k and f2k, respectively having and not having a functional dependence on c(t), are determined for each arm k; an arm k(t) is selected to maximize [?(t)f1k+(1??(t))f2k]; an electromagnetic signal is sent to a hardware machine directing the hardware machine to perform an action of the selected arm k(t); a dynamic reward rk(t) resulting from the hardware machine having performed the action is received; and updates are performed for the next time step, including updating the first normal probability distribution for ?=?(t) as a function of c(t) and rk(t) for the selected arm k(t).
Type: Application
Filed: September 26, 2023
Publication date: March 27, 2025
Inventor: Djallel BOUNEFFOUF
-
Publication number: 20250068932
Abstract: A method for solving a contextual bandit problem using an Evolution Linear Thompson Sampling (ELINTS) algorithm is provided, wherein the method includes identifying a contextual bandit problem having exploration parameters and feature subsets, initializing a population of genomes for use with the exploration parameters and the feature subsets, initializing exploration parameter values and a random feature subset, calculating an expected reward using the exploration parameters and the feature subsets, choosing an action arm A(t), observing a reward R(t) and updating a cumulative reward, selecting a subset of existing genomes based on the cumulative reward, and replacing one or more of the existing genomes with newly created offspring genomes.
Type: Application
Filed: August 23, 2023
Publication date: February 27, 2025
Inventor: Djallel BOUNEFFOUF
-
Publication number: 20250068969
Abstract: A multi-armed bandit (MAB) problem is obtained and a per-round regret lower bound is determined, wherein a corresponding regret is measured against a benchmark. The multi-armed bandit problem is provided to an algorithm that has a per-round regret that is close to the determined per-round regret lower bound, wherein the algorithm dynamically adapts to changes and discards irrelevant past information by alternating between recently pulled arms and unpulled arms having potential, wherein the alternating comprises updating an estimate of an expected reward of each arm within each epoch and an estimate for an error bound that captures an amount of error contained in the estimate of the expected reward for each arm within each epoch based on the auto-regressive temporal structure with trend components, and restarting the algorithm.
Type: Application
Filed: April 5, 2024
Publication date: February 27, 2025
Inventors: Qinyi Chen, Negin Golrezaei, Djallel Bouneffouf
-
Publication number: 20250068696
Abstract: A method for solving a Context-Attentive Combinatorial Bandit with Observations (CACBO) problem using a Context-Attentive Combinatorial Thompson Sampling with Observations (CACTSO) algorithm is provided where the method includes identifying a Context-Attentive Combinatorial Bandit with Observations problem having multiple arms, identifying a plurality of parameters including a total number of features N, a number of initially observed features V, an initially observed features set CV, a number of observed additional features U, a distribution parameter, and a function ?(t) which is computed differently for stationary and nonstationary cases, initializing the initially observed features, the initially observed features set and the observed additional features, identifying a plurality of subsets CV(t) for each time t from a plurality of predetermined times t, sampling a vector parameter for each context feature for a plurality of context features, identifying a best subset of features and selecting an arm based
Type: Application
Filed: August 23, 2023
Publication date: February 27, 2025
Inventor: Djallel BOUNEFFOUF
-
Publication number: 20250058119
Abstract: An embodiment collects a first set of patient data and a first set of treatment data associated with a patient population treated with spinal cord stimulation. The embodiment clusters the patient population into a plurality of cohorts. The embodiment generates a plurality of states using a second set of patient data associated with a cohort in the plurality of cohorts. The embodiment generates a plurality of actions using a second set of treatment data associated with the cohort. The embodiment determines, based on the plurality of actions, a plurality of probabilities associated with a transition from a first state to a second state in the plurality of states. The embodiment generates, based on the plurality of probabilities, a stimulator action policy for a patient in the cohort.
Type: Application
Filed: August 14, 2023
Publication date: February 20, 2025
Applicant: International Business Machines Corporation
Inventors: Tigran Tigran Tchrakian, Mykhaylo Zayats, Sergiy Zhuk, Djallel Bouneffouf
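The step from states and actions to transition probabilities and then to a policy can be sketched as follows. This is an illustration under stated assumptions, not the embodiment itself: it assumes the cohort's data has already been reduced to (state, action, next state) records, estimates probabilities by simple counting, and builds a policy that picks the action most likely to reach a chosen target state. The function names and the state and action labels are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(records):
    """Empirical P(next_state | state, action) from (state, action, next_state) records."""
    counts = defaultdict(Counter)
    for state, action, next_state in records:
        counts[(state, action)][next_state] += 1
    return {
        key: {nxt: n / sum(counter.values()) for nxt, n in counter.items()}
        for key, counter in counts.items()
    }


def action_policy(transitions, target_state):
    """Assumed policy: per state, the action most likely to lead to target_state."""
    best = {}
    for (state, action), distribution in transitions.items():
        probability = distribution.get(target_state, 0.0)
        if state not in best or probability > best[state][1]:
            best[state] = (action, probability)
    return {state: action for state, (action, _) in best.items()}


# Hypothetical cohort records: pain state, stimulation setting, next pain state.
records = [
    ("high", "setting_a", "low"),
    ("high", "setting_a", "high"),
    ("high", "setting_b", "low"),
    ("high", "setting_b", "low"),
]
transitions = estimate_transitions(records)
policy = action_policy(transitions, target_state="low")
```

In this toy data, `setting_b` always leads from "high" to "low", so the policy selects it for the "high" state.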
-
Publication number: 20250062009
Abstract: An embodiment collects a first set of patient data and a first set of treatment data associated with a patient population treated with neuromodulation. The embodiment clusters the patient population into a plurality of cohorts. The embodiment generates a plurality of states using a second set of patient data associated with a cohort in the plurality of cohorts. The embodiment generates a plurality of actions using a second set of treatment data associated with the cohort. The embodiment determines, based on the plurality of actions, a plurality of probabilities associated with a transition from a first state to a second state in the plurality of states. The embodiment generates, based on the plurality of probabilities, a neuromodulator action policy for a patient in the cohort.
Type: Application
Filed: August 14, 2023
Publication date: February 20, 2025
Applicant: International Business Machines Corporation
Inventors: Tigran Tigran Tchrakian, Mykhaylo Zayats, Sergiy Zhuk, Djallel BOUNEFFOUF
-
Publication number: 20250045570
Abstract: A method, computer program product, and computer system for triggering actions in a sequence of time steps within a multi-armed bandit process. In a current time step: a context input is received; a hidden Markov model (HMM) parameter transformation is executed to compute a latent state probability vector and HMM parameters using a conditional probability distribution, context input, values of latent state probability vector, and HMM parameters from a previous time step; an action is selected; an electromagnetic signal is sent to a hardware machine directing the hardware machine to perform the action; a dynamic reward resulting from the hardware machine having performed the action is received; a mean reward estimate as a function of the dynamic reward and the latent state probability is updated; and an update of the latent state probability vector in dependence on the dynamic reward, the action, and the mean reward estimate vector is computed.
Type: Application
Filed: August 1, 2023
Publication date: February 6, 2025
Inventors: Elliot Nelson, Djallel Bouneffouf, Debarun Bhattacharjya, Tian Gao, Miao Liu
-
Publication number: 20250005099
Abstract: A method for solving a contextual bandit problem having a trending reward function, including: identifying a contextual bandit (MAB) problem having multiple arms i and a known trend, wherein the shape of a reward function for each of the multiple arms is known, and wherein a distribution of the reward function is unknown; and implementing a Linear Upper Confidence Bound Contextual Bandit (ALINUCB) algorithm to take advantage of the shape of the reward function by causing each of the multiple arms to be independently drawn by the agent responsive to a sequence during a predetermined time period, identifying a preferred arm from the multiple arms, wherein the primary arm has the best reward during the predetermined time period, engaging the preferred arm during the predetermined time period, detecting the expiration of the predetermined time period, and testing each of the multiple arms for a subsequent predetermined time period.
Type: Application
Filed: June 28, 2023
Publication date: January 2, 2025
Inventor: Djallel Bouneffouf
-
Publication number: 20250006335
Abstract: Data associated with a patient undergoing medical treatment can be received. The data can pertain to a use of the medical treatment and a state of the patient. A subset of the data pertaining to the state of the patient can be identified as context. A function can be learned that relates the context to a reward derived from the medical treatment. The function can be used, based on a current state of the patient, to identify a type of treatment to deliver to the patient.
Type: Application
Filed: June 29, 2023
Publication date: January 2, 2025
Inventors: Tigran Tigran Tchrakian, Sergiy Zhuk, Djallel BOUNEFFOUF, Mykhaylo Zayats, JEFFREY L. ROGERS
-
Publication number: 20240330741
Abstract: Mechanisms are provided for training a machine learning computer model. The mechanisms execute a first initialization of machine learning training logic based on a determination of propensity scores for each output, of a plurality of predetermined outputs, of a machine learning computer model, the propensity scores being determined from historical data. The mechanisms execute a second initialization of the machine learning training logic by performing a trimmed optimization of the machine learning training logic, based on the historical data, to estimate initial parameters of the machine learning computer model. The resulting initialized machine learning training logic is executed on the machine learning computer model to train the machine learning computer model which is then deployed.
Type: Application
Filed: March 30, 2023
Publication date: October 3, 2024
Inventors: Sarah Boufelja-Yacoubi, Djallel Bouneffouf, Sergiy Zhuk
-
Patent number: 12056584
Abstract: An online machine learning model such as an autonomous agent predicts an action. A processor associated with, or running, the online machine learning model observes an environment for an interval of time for a real reward associated with the action. Responsive to determining that the real reward is not received within the interval of time, the processor determines based on a criterion whether to allocate an immediate reward received within the interval of time to the online machine learning model, where the immediate reward is an approximation of the real reward. Responsive to determining that the immediate reward is to be allocated, the processor allocates the immediate reward to the online machine learning model. The online machine learning model further learns or retrains itself based on the immediate reward.
Type: Grant
Filed: November 16, 2020
Date of Patent: August 6, 2024
Assignee: International Business Machines Corporation
Inventors: Oznur Alkan, Djallel Bouneffouf, Bei Chen, Elizabeth Daly
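The decision between the real reward and its immediate approximation can be shown as a small sketch. The abstract leaves the criterion unspecified, so it is passed in here as a caller-supplied predicate. The function name `reward_for_learning` and the threshold used in the example are hypothetical.

```python
def reward_for_learning(real_reward, immediate_reward, accept_proxy):
    """Choose the reward to allocate to the online model after the waiting interval.

    real_reward is None when no real reward arrived within the interval.
    accept_proxy is an assumed, caller-supplied criterion deciding whether the
    immediate reward may stand in for the missing real reward.
    """
    if real_reward is not None:
        return real_reward
    if immediate_reward is not None and accept_proxy(immediate_reward):
        return immediate_reward
    return None  # nothing is allocated for this action


# Hypothetical criterion: accept the proxy only if it is at least 0.2.
allocated = reward_for_learning(
    real_reward=None,
    immediate_reward=0.3,
    accept_proxy=lambda reward: reward >= 0.2,
)
```

When the function returns `None`, the model would simply skip the update for that action in this sketch.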
-
Publication number: 20240232682
Abstract: A method for computing possibly optimal policies in reinforcement learning with multiple objectives and tradeoffs includes receiving a dataset comprising state, action, and reward information for objectives in a multiple objective environment. Tradeoff information indicating that a first vector comprising first values of the objectives in the multiple objective environment is preferred to a second vector comprising second values of the objectives in the multiple objective environment is received. A set of possibly optimal policies for the multiple objective environment is produced based on the dataset and the tradeoff information, where the set of possibly optimal policies indicates actions for an intelligent agent operating in the multiple objective environment to take.
Type: Application
Filed: October 24, 2022
Publication date: July 11, 2024
Inventors: Radu Marinescu, Parikshit Ram, Djallel Bouneffouf, Tejaswini Pedapati, Paulito Palmes
-
Publication number: 20240135234
Abstract: A method for computing possibly optimal policies in reinforcement learning with multiple objectives and tradeoffs includes receiving a dataset comprising state, action, and reward information for objectives in a multiple objective environment. Tradeoff information indicating that a first vector comprising first values of the objectives in the multiple objective environment is preferred to a second vector comprising second values of the objectives in the multiple objective environment is received. A set of possibly optimal policies for the multiple objective environment is produced based on the dataset and the tradeoff information, where the set of possibly optimal policies indicates actions for an intelligent agent operating in the multiple objective environment to take.
Type: Application
Filed: October 23, 2022
Publication date: April 25, 2024
Inventors: Radu Marinescu, Parikshit Ram, Djallel Bouneffouf, Tejaswini Pedapati, Paulito Palmes