Patents by Inventor Steven Loscalzo

Steven Loscalzo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Approximate value iteration with complex returns by bounding

Patent number: 12169793

Abstract: A system and method for controlling a system, comprising estimating an optimal control policy for the system; receiving data representing sequential states and associated trajectories of the system, comprising off-policy states and associated off-policy trajectories; improving the estimate of the optimal control policy by performing at least one approximate value iteration, comprising: estimating a value of operation of the system dependent on the estimated optimal control policy; using a complex return of the received data, biased by the off-policy states, to determine a bound dependent on at least the off-policy trajectories, and using the bound to improve the estimate of the value of operation of the system according to the estimated optimal control policy; and updating the estimate of the optimal control policy, dependent on the improved estimate of the value of operation of the system.

Type: Grant

Filed: November 16, 2020

Date of Patent: December 17, 2024

Assignee: The Research Foundation for The State University of New York

Inventors: Robert Wright, Lei Yu, Steven Loscalzo

APPROXIMATE VALUE ITERATION WITH COMPLEX RETURNS BY BOUNDING

Publication number: 20210150399

Abstract: A system and method for controlling a system, comprising estimating an optimal control policy for the system; receiving data representing sequential states and associated trajectories of the system, comprising off-policy states and associated off-policy trajectories; improving the estimate of the optimal control policy by performing at least one approximate value iteration, comprising: estimating a value of operation of the system dependent on the estimated optimal control policy; using a complex return of the received data, biased by the off-policy states, to determine a bound dependent on at least the off-policy trajectories, and using the bound to improve the estimate of the value of operation of the system according to the estimated optimal control policy; and updating the estimate of the optimal control policy, dependent on the improved estimate of the value of operation of the system.

Type: Application

Filed: November 16, 2020

Publication date: May 20, 2021

Inventors: Robert Wright, Lei Yu, Steven Loscalzo

Approximate value iteration with complex returns by bounding

Patent number: 10839302

Abstract: A control system and method for controlling a system, which employs a data set representing a plurality of states and associated trajectories of an environment of the system; and which iteratively determines an estimate of an optimal control policy for the system. The iterative process performs the substeps, until convergence, of estimating a long term value for operation at a respective state of the environment over a series of predicted future environmental states; using a complex return of the data set to determine a bound to improve the estimated long term value; and producing an updated estimate of an optimal control policy dependent on the improved estimate of the long term value. The control system may produce an output signal to control the system directly, or output the optimized control policy. The system preferably is a reinforcement learning system which continually improves.

Type: Grant

Filed: November 22, 2016

Date of Patent: November 17, 2020

Assignee: The Research Foundation for The State University of New York

Inventors: Robert Wright, Lei Yu, Steven Loscalzo

APPROXIMATE VALUE ITERATION WITH COMPLEX RETURNS BY BOUNDING

Publication number: 20180012137

Abstract: A control system and method for controlling a system, which employs a data set representing a plurality of states and associated trajectories of an environment of the system; and which iteratively determines an estimate of an optimal control policy for the system. The iterative process performs the substeps, until convergence, of estimating a long term value for operation at a respective state of the environment over a series of predicted future environmental states; using a complex return of the data set to determine a bound to improve the estimated long term value; and producing an updated estimate of an optimal control policy dependent on the improved estimate of the long term value. The control system may produce an output signal to control the system directly, or output the optimized control policy. The system preferably is a reinforcement learning system which continually improves.

Type: Application

Filed: November 22, 2016

Publication date: January 11, 2018

Inventors: Robert Wright, Lei Yu, Steven Loscalzo
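
The iterative process described in the abstracts above (estimate a long-term value, use returns from recorded trajectories to bound that estimate, then update the policy) can be illustrated with a small tabular sketch. This is only one possible reading and not the patented method itself: it assumes the "complex return" is the largest n-step return observed along a logged trajectory and that this return acts as a lower bound on the ordinary one-step backup. The function names, the (state, action, reward, next_state, done) tuple layout, and the parameters gamma, max_n, iters and tol are all hypothetical choices made for illustration.

```python
import numpy as np

def bounded_targets(traj, q, gamma, max_n):
    """One target per transition: the largest n-step return, n = 1..max_n.

    Including n = 1 means the ordinary one-step backup is never lowered;
    longer returns can only raise it (the assumed "bound").
    """
    targets = []
    for t in range(len(traj)):
        best = -np.inf
        ret = 0.0  # discounted sum of rewards collected so far
        for n in range(1, max_n + 1):
            if t + n - 1 >= len(traj):
                break
            _, _, r, s_next, done = traj[t + n - 1]
            ret += gamma ** (n - 1) * r
            # Bootstrap from the current estimate unless the episode ended.
            bootstrap = 0.0 if done else gamma ** n * q[s_next].max()
            best = max(best, ret + bootstrap)
            if done:
                break
        targets.append(best)
    return targets

def bounded_value_iteration(trajs, n_states, n_actions,
                            gamma=0.9, max_n=3, iters=50, tol=1e-8):
    """Repeat bounded backups until the table stops changing."""
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        sums = np.zeros_like(q)
        counts = np.zeros_like(q)
        for traj in trajs:
            ys = bounded_targets(traj, q, gamma, max_n)
            for (s, a, _, _, _), y in zip(traj, ys):
                sums[s, a] += y
                counts[s, a] += 1
        # Tabular "fit": average the targets; unvisited pairs keep their value.
        new_q = np.where(counts > 0, sums / np.maximum(counts, 1), q)
        converged = np.abs(new_q - q).max() < tol
        q = new_q
        if converged:
            break
    policy = q.argmax(axis=1)  # greedy policy from the improved estimate
    return q, policy

# Hypothetical 3-state chain: action 1 moves right, reaching state 2 pays 1.
demo_traj = [(0, 1, 0.0, 1, False), (1, 1, 1.0, 2, True)]
demo_q, demo_policy = bounded_value_iteration([demo_traj], n_states=3, n_actions=2)
```

In this toy setting the table stands in for whatever function approximator a real system would use, and averaging stands in for the regression step of an approximate value iteration.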
