Abstract: There is provided a computer-implemented method of automatically creating a training dataset comprising a plurality of records, wherein a record includes: an image of a sample of an object; an indication of monitored manipulations by a user of a presentation of the sample; and a ground truth indication of a monitored gaze of the user viewing the sample on a display or via an optical device, the monitored gaze being mapped to pixels of the image of the sample, wherein the monitored gaze comprises at least one location of the sample that the user is viewing and an amount of time spent viewing the at least one location.
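
For illustration only, the record layout described in the abstract might be sketched in Python as follows. The abstract does not specify any data format, so every name here (GazeFixation, TrainingRecord, dwell_time_map), the string labels used for manipulations, and the nested-list image representation are hypothetical assumptions rather than the claimed implementation. The sketch shows one plausible way to hold the three parts of a record and to accumulate viewing time per image pixel.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GazeFixation:
    """One monitored gaze location, assumed already mapped to image pixels."""
    x: int          # pixel column in the sample image
    y: int          # pixel row in the sample image
    seconds: float  # amount of time spent viewing this location


@dataclass
class TrainingRecord:
    """Hypothetical layout of a single record in the training dataset."""
    image: List[List[int]]     # sample image as rows of pixel values (placeholder)
    manipulations: List[str]   # monitored manipulations, e.g. "zoom_in" (assumed labels)
    gaze: List[GazeFixation]   # ground truth gaze mapped to pixels of the image


def dwell_time_map(record: TrainingRecord) -> List[List[float]]:
    """Sum viewing time per pixel; fixations outside the image are skipped."""
    height = len(record.image)
    width = len(record.image[0]) if height else 0
    heat = [[0.0] * width for _ in range(height)]
    for fixation in record.gaze:
        if 0 <= fixation.y < height and 0 <= fixation.x < width:
            heat[fixation.y][fixation.x] += fixation.seconds
    return heat


# Example: a 2x3 image viewed at two distinct locations.
record = TrainingRecord(
    image=[[0, 0, 0], [0, 0, 0]],
    manipulations=["zoom_in", "pan_left"],
    gaze=[
        GazeFixation(x=1, y=0, seconds=0.5),
        GazeFixation(x=1, y=0, seconds=0.25),
        GazeFixation(x=2, y=1, seconds=1.0),
    ],
)
heat = dwell_time_map(record)
```

In this sketch, repeated fixations on the same pixel add up, so the resulting map carries both where the user looked and for how long, which are the two components of the monitored gaze named in the abstract.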