Patents by Inventor Kilian Q. Weinberger

Kilian Q. Weinberger has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9298172
    Abstract: The present invention is a method and an apparatus for reward-based learning of policies for managing or controlling a system or plant. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance metric and a distance-based function approximator estimating long-range expected value are then initialized, where the distance metric computes a distance between two (state, action) pairs, and the distance metric and function approximator are adjusted such that a Bellman error measure of the function approximator on the set of exemplars is minimized. A management policy is then derived based on the trained distance metric and function approximator.
    Type: Grant
    Filed: October 11, 2007
    Date of Patent: March 29, 2016
    Assignee: International Business Machines Corporation
    Inventors: Gerald J. Tesauro, Kilian Q. Weinberger
  • Patent number: 8060454
    Abstract: The present invention is a method and an apparatus for reward-based learning of management policies. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation, thereby producing embedded exemplars, wherein one or more parameters of the NLDR are tuned to minimize a cross-validation Bellman error on a holdout set taken from the set of one or more exemplars. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the embedded exemplars to obtain a learned management policy.
    Type: Grant
    Filed: October 11, 2007
    Date of Patent: November 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Rajarshi Das, Gerald J. Tesauro, Kilian Q. Weinberger
  • Patent number: 7599898
    Abstract: The present invention is a method and an apparatus for improved regression modeling to address the curse of dimensionality, for example for use in data analysis tasks. In one embodiment, a method for analyzing data includes receiving a set of exemplars, where at least two of the exemplars include an input pattern (i.e., a point in an input space) and at least one of the exemplars includes a target value associated with the input pattern. A function approximator and a distance metric are then initialized, where the distance metric computes a distance between points in the input space, and the distance metric is adjusted such that an accuracy measure of the function approximator on the set of exemplars is improved.
    Type: Grant
    Filed: October 17, 2006
    Date of Patent: October 6, 2009
    Assignee: International Business Machines Corporation
    Inventors: Gerald J. Tesauro, Kilian Q. Weinberger
  • Publication number: 20090099985
    Abstract: The present invention is a method and an apparatus for reward-based learning of policies for managing or controlling a system or plant. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance metric and a distance-based function approximator estimating long-range expected value are then initialized, where the distance metric computes a distance between two (state, action) pairs, and the distance metric and function approximator are adjusted such that a Bellman error measure of the function approximator on the set of exemplars is minimized. A management policy is then derived based on the trained distance metric and function approximator.
    Type: Application
    Filed: October 11, 2007
    Publication date: April 16, 2009
    Inventors: Gerald J. Tesauro, Kilian Q. Weinberger
  • Publication number: 20090098515
    Abstract: The present invention is a method and an apparatus for reward-based learning of policies for managing or controlling a system or plant. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the transformed exemplars to obtain a management policy.
    Type: Application
    Filed: October 11, 2007
    Publication date: April 16, 2009
    Inventors: Rajarshi Das, Gerald J. Tesauro, Kilian Q. Weinberger
  • Publication number: 20080154817
    Abstract: The present invention is a method and an apparatus for improved regression modeling to address the curse of dimensionality, for example for use in data analysis tasks. In one embodiment, a method for analyzing data includes receiving a set of exemplars, where at least two of the exemplars include an input pattern (i.e., a point in an input space) and at least one of the exemplars includes a target value associated with the input pattern. A function approximator and a distance metric are then initialized, where the distance metric computes a distance between points in the input space, and the distance metric is adjusted such that an accuracy measure of the function approximator on the set of exemplars is improved.
    Type: Application
    Filed: October 17, 2006
    Publication date: June 26, 2008
    Inventors: Gerald J. Tesauro, Kilian Q. Weinberger
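
The regression abstract above (patent 7599898 and publication 20080154817) describes adjusting a distance metric so that a function approximator becomes more accurate on a set of exemplars. The sketch below is one hedged illustration of that general idea, not the patented method. Everything specific in it is an assumption made for the example: the Gaussian-kernel regressor, the diagonal (per-dimension scaling) metric, the leave-one-out squared error as the accuracy measure, the finite-difference gradient, and all function names.

```python
import numpy as np

def loo_predictions(X, y, w):
    """Leave-one-out kernel regression under a diagonal metric w (assumed form)."""
    Z = X * w                                # rescale each input dimension
    diff = Z[:, None, :] - Z[None, :, :]     # pairwise differences
    d2 = np.sum(diff ** 2, axis=2)           # squared distances under the metric
    K = np.exp(-d2)                          # Gaussian kernel weights
    np.fill_diagonal(K, 0.0)                 # a point may not predict itself
    return (K @ y) / (K.sum(axis=1) + 1e-12)

def loo_error(X, y, w):
    """Mean squared leave-one-out error: the accuracy measure in this sketch."""
    return float(np.mean((loo_predictions(X, y, w) - y) ** 2))

def numerical_gradient(X, y, w, eps=1e-5):
    """Central finite-difference gradient of the error with respect to w."""
    grad = np.zeros_like(w)
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = eps
        grad[k] = (loo_error(X, y, w + step) - loo_error(X, y, w - step)) / (2 * eps)
    return grad

def fit_metric(X, y, w0, lr=1.0, iters=50):
    """Gradient descent on w; a step is kept only if it lowers the error."""
    w = w0.copy()
    err = loo_error(X, y, w)
    for _ in range(iters):
        candidate = w - lr * numerical_gradient(X, y, w)
        cand_err = loo_error(X, y, candidate)
        if cand_err < err:
            w, err = candidate, cand_err
        else:
            lr *= 0.5                        # shrink the step and try again
    return w, err

# Toy exemplars: the target depends only on the first input dimension,
# so the second dimension is pure distraction for a distance-based model.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 2))
y = np.sin(3.0 * X[:, 0])

w0 = np.ones(2)                              # start from the plain Euclidean metric
initial_err = loo_error(X, y, w0)
w, final_err = fit_metric(X, y, w0)
print("initial error:", initial_err)
print("final error:  ", final_err)
print("learned scaling:", w)
```

Because a step is accepted only when it lowers the leave-one-out error, the error never increases during fitting. On this toy data one would expect the learned scaling to favor the informative first dimension over the second, though the exact values depend on the random seed and step size.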