Patents by Inventor Kilian Q. Weinberger

Kilian Q. Weinberger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and apparatus for improved reward-based learning using adaptive distance metrics

Patent number: 9298172

Abstract: The present invention is a method and an apparatus for reward-based learning of policies for managing or controlling a system or plant. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance metric and a distance-based function approximator estimating long-range expected value are then initialized, where the distance metric computes a distance between two (state, action) pairs, and the distance metric and function approximator are adjusted such that a Bellman error measure of the function approximator on the set of exemplars is minimized. A management policy is then derived based on the trained distance metric and function approximator.
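
The steps in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not name the function approximator or the optimizer, so the Gaussian-kernel weighted average, the diagonal metric weights `w`, the finite-difference gradient with step halving, the discount factor `GAMMA`, and the toy exemplars (each also carrying a hypothetical next (state, action) pair) are all choices made for this example.

```python
import math

GAMMA = 0.9  # discount factor (assumed; not given in the abstract)

def distance(w, x, y):
    """Weighted squared distance between two (state, action) vectors."""
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))

def q_value(w, v, centers, x):
    """Distance-based approximator: kernel-weighted average of the values v."""
    k = [math.exp(-distance(w, x, c)) for c in centers]
    return sum(ki * vi for ki, vi in zip(k, v)) / (sum(k) + 1e-12)

def bellman_error(params, exemplars, dim):
    """Mean squared Bellman residual over the exemplar set."""
    w, v = params[:dim], params[dim:]
    centers = [x for x, _, _ in exemplars]
    err = 0.0
    for x, r, x_next in exemplars:
        target = r + GAMMA * q_value(w, v, centers, x_next)
        err += (q_value(w, v, centers, x) - target) ** 2
    return err / len(exemplars)

def numeric_grad(f, p, eps=1e-5):
    """Central finite-difference gradient of f at p."""
    grad = []
    for i in range(len(p)):
        hi = p[:i] + [p[i] + eps] + p[i + 1:]
        lo = p[:i] + [p[i] - eps] + p[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def train(exemplars, dim, steps=30, lr=0.5):
    """Adjust metric weights and values together to reduce the Bellman error."""
    params = [1.0] * dim + [0.0] * len(exemplars)
    f = lambda p: bellman_error(p, exemplars, dim)
    err = f(params)
    for _ in range(steps):
        grad = numeric_grad(f, params)
        step = lr
        while step > 1e-8:  # halve the step until the error decreases
            cand = [pi - step * gi for pi, gi in zip(params, grad)]
            cand[:dim] = [max(0.0, c) for c in cand[:dim]]  # keep weights >= 0
            cand_err = f(cand)
            if cand_err < err:
                params, err = cand, cand_err
                break
            step /= 2
    return params[:dim], params[dim:], err

def greedy_action(w, v, centers, state, actions):
    """Derive a policy: pick the action with the highest estimated value."""
    return max(actions, key=lambda a: q_value(w, v, centers, list(state) + [a]))

# Toy exemplars: ([state, action], immediate reward, next [state, action]).
exemplars = []
for i in range(6):
    for a in (0.0, 1.0):
        s = i / 5
        r = 1.0 if (a == 1.0) == (s > 0.5) else 0.0
        exemplars.append(([s, a], r, [(s + 0.3) % 1.0, 1.0 - a]))

centers = [x for x, _, _ in exemplars]
initial_error = bellman_error([1.0] * 2 + [0.0] * len(exemplars), exemplars, 2)
w, v, final_error = train(exemplars, 2)
print(initial_error, final_error, greedy_action(w, v, centers, [0.2], (0.0, 1.0)))
```

Step halving makes each accepted update reduce the Bellman error, which keeps the toy loop stable without tuning a learning rate. A practical implementation would more likely use analytic gradients.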

Type: Grant

Filed: October 11, 2007

Date of Patent: March 29, 2016

Assignee: International Business Machines Corporation

Inventors: Gerald J. Tesauro, Kilian Q. Weinberger

Method and apparatus for improved reward-based learning using nonlinear dimensionality reduction

Patent number: 8060454

Abstract: The present invention is a method and an apparatus for reward-based learning of management policies. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation, thereby producing embedded exemplars, wherein one or more parameters of the NLDR are tuned to minimize a cross-validation Bellman error on a holdout set taken from the set of one or more exemplars. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the embedded exemplars to obtain a learned management policy.
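
The mapping step in this abstract can be illustrated with a small sketch. The abstract does not say which NLDR technique is used, so the Isomap-style procedure below (k-nearest-neighbor graph, shortest-path distances, classical multidimensional scaling to one dimension via power iteration) is an assumption chosen for illustration, as are the toy (state, action) pairs and the neighborhood size `k`. Tuning `k` against a cross-validation Bellman error, and the reward-based learning on the embedded exemplars, are left out.

```python
import math

def geodesic_distances(points, k):
    """Shortest-path distances over a symmetric k-nearest-neighbor graph."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    g = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Index 0 of the sorted list is the point itself, so skip it.
        nearest = sorted(range(n), key=lambda j: d[i][j])[1:k + 1]
        for j in nearest:
            g[i][j] = g[j][i] = d[i][j]
    # Floyd-Warshall all-pairs shortest paths.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    return g

def embed_1d(geo, iters=200):
    """Classical MDS to one dimension: top eigenvector of the centered matrix."""
    n = len(geo)
    sq = [[geo[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # Double centering: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    vec = [float(i) - (n - 1) / 2 for i in range(n)]
    for _ in range(iters):  # power iteration
        nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(
        vec[i] * sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in vec]

# Toy (state, action) pairs on three quarters of a circle: a 1-D curve in 2-D.
pairs = [
    (math.cos(t), math.sin(t))
    for t in (1.5 * math.pi * i / 19 for i in range(20))
]
embedded = embed_1d(geodesic_distances(pairs, k=2))
print([round(x, 2) for x in embedded])
```

Because the distances are measured along the neighbor graph rather than straight across the plane, the one-dimensional coordinates follow the curve, which is the property a nonlinear mapping is meant to preserve.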

Type: Grant

Filed: October 11, 2007

Date of Patent: November 15, 2011

Assignee: International Business Machines Corporation

Inventors: Rajarshi Das, Gerald J. Tesauro, Kilian Q. Weinberger

Method and apparatus for improved regression modeling

Patent number: 7599898

Abstract: The present invention is a method and an apparatus for improved regression modeling to address the curse of dimensionality, for example for use in data analysis tasks. In one embodiment, a method for analyzing data includes receiving a set of exemplars, where at least two of the exemplars include an input pattern (i.e., a point in an input space) and at least one of the exemplars includes a target value associated with the input pattern. A function approximator and a distance metric are then initialized, where the distance metric computes a distance between points in the input space, and the distance metric is adjusted such that an accuracy measure of the function approximator on the set of exemplars is improved.
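
As a rough illustration of adjusting a distance metric to improve a function approximator's accuracy, the sketch below uses kernel-weighted regression with per-feature metric weights. The abstract does not specify the approximator, the accuracy measure, or the update rule, so the Gaussian kernel, the leave-one-out squared error, the finite-difference gradient with step halving, and the toy data (one informative feature, one irrelevant one) are all assumptions made for this example.

```python
import math

def predict(w, data, x, skip=None):
    """Kernel-weighted average of targets under metric weights w."""
    num = den = 0.0
    for idx, (xi, yi) in enumerate(data):
        if idx == skip:
            continue  # leave this exemplar out
        d = sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, xi))
        k = math.exp(-d)
        num += k * yi
        den += k
    return num / (den + 1e-12)

def loo_error(w, data):
    """Leave-one-out mean squared error, used here as the accuracy measure."""
    total = 0.0
    for i, (x, y) in enumerate(data):
        total += (predict(w, data, x, skip=i) - y) ** 2
    return total / len(data)

def adapt_metric(data, dim, steps=40, lr=1.0, eps=1e-5):
    """Adjust the metric weights so the leave-one-out error decreases."""
    w = [1.0] * dim
    err = loo_error(w, data)
    for _ in range(steps):
        grad = []
        for i in range(dim):
            hi = w[:i] + [w[i] + eps] + w[i + 1:]
            lo = w[:i] + [w[i] - eps] + w[i + 1:]
            grad.append((loo_error(hi, data) - loo_error(lo, data)) / (2 * eps))
        step = lr
        while step > 1e-8:  # halve the step until the error decreases
            cand = [max(0.0, wi - step * gi) for wi, gi in zip(w, grad)]
            cand_err = loo_error(cand, data)
            if cand_err < err:
                w, err = cand, cand_err
                break
            step /= 2
    return w, err

# Toy exemplars: the target depends only on the first input feature.
data = [
    ([i / 14, (i * 7 % 15) / 15], math.sin(3 * i / 14))
    for i in range(15)
]
initial_error = loo_error([1.0, 1.0], data)
weights, final_error = adapt_metric(data, dim=2)
print(weights, initial_error, final_error)
```

Stretching or shrinking individual features in the metric changes which exemplars count as neighbors, which is one way a learned metric can ease the curse of dimensionality the abstract mentions.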

Type: Grant

Filed: October 17, 2006

Date of Patent: October 6, 2009

Assignee: International Business Machines Corporation

Inventors: Gerald J. Tesauro, Kilian Q. Weinberger

METHOD AND APPARATUS FOR IMPROVED REWARD-BASED LEARNING USING NONLINEAR DIMENSIONALITY REDUCTION

Publication number: 20090098515

Abstract: The present invention is a method and an apparatus for reward-based learning of policies for managing or controlling a system or plant. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the transformed exemplars to obtain a management policy.

Type: Application

Filed: October 11, 2007

Publication date: April 16, 2009

Inventors: Rajarshi Das, Gerald J. Tesauro, Kilian Q. Weinberger

METHOD AND APPARATUS FOR IMPROVED REWARD-BASED LEARNING USING ADAPTIVE DISTANCE METRICS

Publication number: 20090099985

Abstract: The present invention is a method and an apparatus for reward-based learning of policies for managing or controlling a system or plant. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance metric and a distance-based function approximator estimating long-range expected value are then initialized, where the distance metric computes a distance between two (state, action) pairs, and the distance metric and function approximator are adjusted such that a Bellman error measure of the function approximator on the set of exemplars is minimized. A management policy is then derived based on the trained distance metric and function approximator.

Type: Application

Filed: October 11, 2007

Publication date: April 16, 2009

Inventors: Gerald J. Tesauro, Kilian Q. Weinberger

METHOD AND APPARATUS FOR IMPROVED REGRESSION MODELING

Publication number: 20080154817

Abstract: The present invention is a method and an apparatus for improved regression modeling to address the curse of dimensionality, for example for use in data analysis tasks. In one embodiment, a method for analyzing data includes receiving a set of exemplars, where at least two of the exemplars include an input pattern (i.e., a point in an input space) and at least one of the exemplars includes a target value associated with the input pattern. A function approximator and a distance metric are then initialized, where the distance metric computes a distance between points in the input space, and the distance metric is adjusted such that an accuracy measure of the function approximator on the set of exemplars is improved.

Type: Application

Filed: October 17, 2006

Publication date: June 26, 2008

Inventors: Gerald J. Tesauro, Kilian Q. Weinberger
