We begin by harvesting two Polish corpora from the domains of mobile phone and book reviews, propose a simple automatic annotation method, and manually evaluate the annotation accuracy on a sample. We then compare methods of acquiring a sentiment lexicon by automatic translation of similar English resources. We also propose a sentiment reversal model for opinion words and evaluate its usefulness using a basic rule-based classifier on a subset of the corpus. Finally, we evaluate the influence of different text pre-processing methods and feature sets on the accuracy of sentiment classification in Polish, and show how the baseline bag-of-words model can be improved by incorporating semantic features based on the harvested sentiment lexicon and the Polish WordNet.
Our work delivers a reference and a new suite of tools that make sentiment analysis possible for Polish as well.