Abstract: A method for performing sentiment analysis on Arabic text is described. Training text data may be preprocessed by removing non-Arabic characters, numbers, control characters, and graphics. Because the same Arabic letter may be written in different forms, an embodiment may identify such variant letters and unify them to a common form in order to remove or avoid duplicates. An annotator may label portions of the training data, such as words, terms, or phrases, as positive or negative. A lexicon may be formed from the labeled training data. A bag-of-phrases may also be formed from the training text data and used to analyze the target data for sentiment. Based on the distribution of words or phrases in the target data, a sentiment label may be determined for each portion of the target data.
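The steps summarized above (preprocessing, letter unification, lexicon construction, bag-of-phrases, and scoring) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation. The following are all hypothetical choices: the unification table (alef variants to bare alef, alef maqsura to yeh, teh marbuta to heh, tatweel removed), the Unicode range U+0621..U+064A used to keep Arabic letters, bags of n-grams up to length two, lexicon weights of +1/-1, and the "neutral" label for a zero score. The function names `preprocess`, `build_lexicon`, `bag_of_phrases`, and `classify` are also illustrative.

```python
import re
from collections import Counter

# Hypothetical unification table: map variant letter forms to one form so
# that differently written copies of a word collapse into one entry.
_UNIFY = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh (assumed direction)
    "\u0629": "\u0647",  # teh marbuta -> heh
    "\u0640": None,      # tatweel (elongation mark) -> removed
})

# Anything outside the basic Arabic letters and whitespace is dropped:
# Latin letters, digits (including Arabic-Indic digits), diacritics,
# punctuation, control characters, and symbols.
_NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")

_LABEL_WEIGHT = {"positive": 1, "negative": -1}  # assumed weights


def preprocess(text: str) -> str:
    """Remove non-Arabic content, unify letter variants, normalize spaces."""
    text = _NON_ARABIC.sub(" ", text)
    text = text.translate(_UNIFY)
    return " ".join(text.split())


def build_lexicon(labeled: list[tuple[str, str]]) -> dict[str, int]:
    """Turn annotator-labeled words/phrases into a phrase -> weight map."""
    return {preprocess(phrase): _LABEL_WEIGHT[label] for phrase, label in labeled}


def bag_of_phrases(text: str, max_n: int = 2) -> Counter:
    """Count all word n-grams of length 1..max_n in the preprocessed text."""
    tokens = preprocess(text).split()
    return Counter(
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    )


def classify(text: str, lexicon: dict[str, int], max_n: int = 2) -> tuple[int, str]:
    """Sum lexicon weights over the phrase distribution; return (score, label)."""
    bag = bag_of_phrases(text, max_n)
    score = sum(lexicon.get(phrase, 0) * count for phrase, count in bag.items())
    if score > 0:
        return score, "positive"
    if score < 0:
        return score, "negative"
    return score, "neutral"
```

In the usage below, a lexicon entry written with teh marbuta still matches text written with heh, because both sides go through the same unification step:

```python
lexicon = build_lexicon([("جميل", "positive"), ("ممتازة", "positive"), ("ليس جيد", "negative")])
classify("ممتازه", lexicon)   # (1, "positive")
classify("ليس جيد", lexicon)  # (-1, "negative"), matched as a two-word phrase
```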