Abstract: A system for recognizing the content of a communication expressed in a symbolic language and composed of plural glyphs arranged in a predetermined order, each glyph being the smallest (lowest-level) informational unit of the language. The system includes a device for inputting a stream of data indicative of the plural glyphs, such as those formed on a page of text, and the stream is held in a storage means. The stored data is horizontally segmented into discrete lines of text, and each line is then vertically segmented into individual glyphs. Each individual glyph is assigned an identifier unique to its form, whereby all substantially identical glyphs are represented by the same identifier. The identifiers are arranged in a sequence corresponding to the sequence in which the glyphs appeared in the communication, thus representing glyph "words".
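The segmentation and identifier-assignment steps described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the bitmap representation (rows of `#` for ink and `.` for background), the function names `runs`, `segment_lines`, `segment_glyphs`, and `identify`, and the sample `PAGE` are all assumptions made for illustration. In particular, "substantially identical" is approximated here by exact bitmap equality, whereas a practical system would presumably need some tolerance for noise.

```python
from typing import Dict, List, Tuple

Bitmap = List[str]  # rows of '#' (ink) and '.' (background); assumed format


def runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where flags are True."""
    spans, start = [], None
    for i, on in enumerate(flags):
        if on and start is None:
            start = i
        elif not on and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def segment_lines(page: Bitmap) -> List[Bitmap]:
    """Horizontal segmentation: split the page into bands of inked rows."""
    row_has_ink = ["#" in row for row in page]
    return [page[top:bottom] for top, bottom in runs(row_has_ink)]


def segment_glyphs(line: Bitmap) -> List[Tuple[str, ...]]:
    """Vertical segmentation: split one text line at blank columns."""
    width = max(len(row) for row in line)
    padded = [row.ljust(width, ".") for row in line]
    col_has_ink = [any(row[x] == "#" for row in padded) for x in range(width)]
    return [tuple(row[l:r] for row in padded) for l, r in runs(col_has_ink)]


def identify(page: Bitmap) -> List[List[int]]:
    """Assign identifiers in reading order; identical bitmaps share one."""
    table: Dict[Tuple[str, ...], int] = {}  # glyph bitmap -> identifier
    result = []
    for line in segment_lines(page):
        # A glyph not seen before receives the next unused identifier.
        ids = [table.setdefault(g, len(table)) for g in segment_glyphs(line)]
        result.append(ids)
    return result


# Hypothetical two-line page: a blank row separates lines,
# blank columns separate glyphs within a line.
PAGE = [
    "#.##.#",
    "#.#..#",
    "......",
    "##.#.#",
    ".#.#.#",
]

print(identify(PAGE))  # [[0, 1, 0], [2, 0, 0]]
```

The output is one identifier sequence per line of text, in the order the glyphs appeared. The sketch never decides what any glyph means; it only records which glyphs repeat, which is the information the identifier sequence carries.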