Social media communities like Reddit and Twitter allow users to express
their views on topics of interest, and to engage with other users
who may share or oppose these views. This can lead to productive
discussions that converge toward consensus, or to contentious debates in
which disagreements frequently arise. Prior work on such settings has
primarily focused on identifying notable instances of antisocial
behavior such as hate speech and "trolling", which represent possible
threats to the health of a community. These, however, are exceptionally
severe phenomena, and do not encompass controversies stemming from user
debates, differences of opinion, and off-topic content, all of which
can naturally come up in a discussion without going so far as to
compromise its development.

This dissertation proposes a framework for the systematic analysis of
social media discussions that take place in the presence of
controversial themes, disagreements, and mixed opinions from
participating users. We start by building a feature model to
characterize adversarial discussions surrounding political campaigns on
Twitter, with a focus on the factual or sentiment-driven nature of their
topics and the roles played by the different users involved. We then extend
our approach to Reddit discussions, leveraging community feedback
signals to define a new notion of controversy and to highlight
conversational archetypes that arise from frequent and interesting
interaction patterns. We use our feature model to build logistic
regression classifiers that can predict future instances of controversy
in Reddit communities centered on politics, world news, sports, and
personal relationships. Finally, our model provides the basis for a
comparison of different communities in the health domain, where topics
and activity vary considerably despite the communities' shared overall
focus.