Abstract: A multi-modal system that enables a document to be interpreted as a voice browser-based document, a visual browser-based document, or both, for a thin client such as a portable telephone. Special additions are made to the document so that it can be converted between voice markup and visual markup languages. The document can also be presented simultaneously in both the voice markup and the visual markup languages. Special techniques keep track of the browsing position within the document.
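
By way of illustration only, the following minimal Python sketch shows one way a single document could be rendered into both a voice-style and a visual-style markup while one shared browsing position is tracked. It is not the method of the abstract. The `Section` structure, the `sync_id` marker, the simplified `<block>`/`<prompt>` and `<p>` elements, and the `BrowsingPosition` class are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Section:
    """One unit of the shared document (hypothetical structure)."""
    sync_id: str  # assumed marker added to the document; a plain identifier
    text: str


def to_voice(sections):
    """Render sections as simplified, voice-markup-like text (illustrative)."""
    return "\n".join(
        f'<block id="{s.sync_id}"><prompt>{escape(s.text)}</prompt></block>'
        for s in sections
    )


def to_visual(sections):
    """Render the same sections as simplified, visual-markup-like text."""
    return "\n".join(
        f'<p id="{s.sync_id}">{escape(s.text)}</p>' for s in sections
    )


class BrowsingPosition:
    """Holds one position shared by the voice and visual renderings."""

    def __init__(self, sections):
        if not sections:
            raise ValueError("document has no sections")
        self._ids = [s.sync_id for s in sections]
        self._index = 0

    @property
    def current(self):
        """Return the sync_id of the section currently being browsed."""
        return self._ids[self._index]

    def move_to(self, sync_id):
        """Jump to a section by its marker; raises ValueError if unknown."""
        self._index = self._ids.index(sync_id)

    def advance(self):
        """Move to the next section, staying on the last one at the end."""
        if self._index < len(self._ids) - 1:
            self._index += 1
        return self.current
```

In this sketch both renderings carry the same `sync_id` values, so a position reached in one rendering can be looked up in the other. That shared marker is the design assumption that keeps the two views aligned.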