Abstract: A scanning device scans a document and lets a user annotate the scanned document by voice, speaking into a voice pickup located in the device. The scanned data is saved as an image data file in device memory. The device digitizes the speech input, compresses it, and saves it as a voice clip file in device memory, then links the voice clip file to the image data file. When the user uploads the image data file to a host computer, the linked voice clip file is transferred automatically. When the user selects the image data file through a user interface, the voice clip is either played back automatically or the user is notified that a voice clip exists. Playback involves decompressing the speech and then converting it into a format that the host computer's sound card can recognize.
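The store, link, upload, and playback flow described above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the claimed implementation. The names `DeviceMemory`, `store_scan`, `annotate`, `upload`, and `playback_wav` and the `.vc` file extension are hypothetical. `zlib` stands in for whatever speech codec the device actually uses. The "format the sound card can recognize" is assumed to be 16-bit mono PCM in a WAV container at an assumed 8000 Hz sample rate.

```python
import io
import wave
import zlib
from dataclasses import dataclass, field


@dataclass
class DeviceMemory:
    """Hypothetical in-device storage: file name -> bytes, plus image -> clip links."""
    files: dict = field(default_factory=dict)
    links: dict = field(default_factory=dict)  # image file name -> voice clip file name


def store_scan(mem, image_name, image_bytes):
    # Save the scanned data as an image data file in device memory.
    mem.files[image_name] = image_bytes


def annotate(mem, image_name, pcm_samples):
    # Compress the digitized speech (zlib is only a stand-in for a speech codec).
    clip_name = image_name.rsplit(".", 1)[0] + ".vc"  # hypothetical naming scheme
    mem.files[clip_name] = zlib.compress(pcm_samples)
    # Record the link between the voice clip file and the image data file.
    mem.links[image_name] = clip_name
    return clip_name


def upload(mem, image_name, host):
    # Uploading the image data file also transfers its linked voice clip, if any.
    host[image_name] = mem.files[image_name]
    clip_name = mem.links.get(image_name)
    if clip_name is not None:
        host[clip_name] = mem.files[clip_name]
    return clip_name


def playback_wav(compressed_clip, sample_rate=8000):
    # Decompress the clip, then wrap the raw 16-bit mono PCM in a WAV container,
    # assumed here to be a format the host's sound hardware can play.
    pcm = zlib.decompress(compressed_clip)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

Keeping the link in a separate table, rather than embedding the audio inside the image file, lets the image data file stay in its original format while still letting the upload step find and carry the clip along.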