Abstract: Systems and methods may train neural networks (NNs) and determine when to stop training, so that computing or other resources are not wasted when improvement is no longer likely. After a training period for a NN, a model trained using training data from other NNs may return a probability of improvement in the loss of the NN, or a probability that the likely best loss of the NN is lower than the best loss of the other NNs for which hyperparameters have been chosen. Training may be stopped if the probability is less than a threshold, or if a wait value is greater than a wait threshold.
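
The stopping rule stated in the abstract can be illustrated with a minimal sketch. The function name `should_stop`, its parameter names, and the default threshold values are hypothetical and chosen only for illustration; the probability is assumed to be supplied by the separately trained model, which is not shown here.

```python
def should_stop(
    probability: float,
    wait: int,
    prob_threshold: float = 0.1,  # assumed value, not taken from the source
    wait_threshold: int = 5,      # assumed value, not taken from the source
) -> bool:
    """Return True if training may be stopped.

    probability: the model's estimated probability of improvement
        (or of beating the best loss of the other NNs).
    wait: the current wait value.
    """
    # Stop if improvement is unlikely, or if the wait value exceeds its threshold.
    return probability < prob_threshold or wait > wait_threshold


if __name__ == "__main__":
    print(should_stop(probability=0.05, wait=0))  # low probability
    print(should_stop(probability=0.50, wait=6))  # wait value exceeded
    print(should_stop(probability=0.50, wait=2))  # neither condition met
```

Both comparisons are strict, matching the "less than" and "greater than" wording of the abstract, so a value exactly equal to its threshold does not by itself trigger a stop.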