Abstract: There is provided a method for learning a bipedal robot control policy. The method includes (i) learning, by a processing circuit, an action-related corrective policy that, once applied, reduces a gap associated with an initial simulation state transition function and with a real-world state transition function; and (ii) determining a control policy of the bipedal robot in a simulator, using the action-related corrective policy.
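
The two steps can be illustrated with a minimal sketch on a toy one-dimensional system. Everything in it is an assumption made for illustration and is not taken from the method itself: the function names, the time step, the linear "real" dynamics that stand in for a physical robot, the least-squares fit used to learn the action correction, and the grid search used to pick the control policy. A bipedal robot would involve high-dimensional states and, most likely, a learned nonlinear corrective policy.

```python
import numpy as np

DT = 0.05  # integration step (hypothetical value)

def sim_step(s, a):
    """Initial simulator transition: ideal actuator (assumed toy model)."""
    return s + DT * a

def real_step(s, a):
    """Stand-in for the real world: weaker actuator plus a constant bias."""
    return s + DT * (0.8 * a - 0.3)

def features(s, a):
    """Linear features [s, a, 1] for the corrective policy."""
    s, a = np.asarray(s, dtype=float), np.asarray(a, dtype=float)
    return np.stack([s, a, np.ones_like(s)], axis=-1)

def fit_corrective_policy(n=200, seed=0):
    """Step (i): learn an action correction delta(s, a) from real transitions."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(-1.0, 1.0, n)
    a = rng.uniform(-2.0, 2.0, n)
    s_next = real_step(s, a)
    # Correction that would make the simulator reproduce each real transition.
    target_delta = (s_next - s) / DT - a
    w, *_ = np.linalg.lstsq(features(s, a), target_delta, rcond=None)
    return w

def corrected_sim_step(s, a, w):
    """Simulator transition with the corrective policy applied to the action."""
    return sim_step(s, a + features(s, a) @ w)

def rollout_cost(step_fn, k, b, s0=1.0, horizon=100):
    """Sum of squared states when the policy a = -k*s + b drives s toward 0."""
    s, cost = s0, 0.0
    for _ in range(horizon):
        s = float(step_fn(s, -k * s + b))
        cost += s ** 2
    return cost

def tune_policy(step_fn):
    """Step (ii): pick policy parameters (k, b) by grid search in a simulator."""
    grid = [(k, b) for k in np.linspace(1, 20, 20) for b in np.linspace(0, 1, 9)]
    return min(grid, key=lambda p: rollout_cost(step_fn, *p))

w = fit_corrective_policy()

# Gap between simulated and real transitions, before and after correction.
s_test, a_test = np.linspace(-1, 1, 50), np.linspace(-2, 2, 50)
real_next = real_step(s_test, a_test)
gap_before = np.abs(sim_step(s_test, a_test) - real_next).mean()
gap_after = np.abs(corrected_sim_step(s_test, a_test, w) - real_next).mean()

# Control policy determined in the corrected vs. the uncorrected simulator.
policy_corrected = tune_policy(lambda s, a: corrected_sim_step(s, a, w))
policy_plain = tune_policy(sim_step)

print("gap before / after:", gap_before, gap_after)
print("real-world cost, corrected-sim policy:", rollout_cost(real_step, *policy_corrected))
print("real-world cost, plain-sim policy:", rollout_cost(real_step, *policy_plain))
```

In this sketch the corrective policy modifies the action before it reaches the simulator rather than altering the simulator's dynamics directly, which is one plausible reading of "action-related". The control policy is then tuned entirely in simulation and only evaluated on the stand-in real system.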