Pontificia Universidad Católica de Chile
Mendoza M., Parra D., Soto A. (2020)

GENE: Graph generation conditioned on named entities for polarity and controversy detection in social media

Journal: Information Processing & Management
Pages: 102366
Publication type: ISI


Many of the interactions between users on social networks are controversial, especially in polarized environments. In effect, rather than producing a space for deliberation, these environments foster the emergence of users who disqualify the position of others. On news sites, comments on the news are characterized by such interactions. This is detrimental to the construction of a deliberative and democratic climate, stressing the need for automatic tools that can provide early detection of polarization and controversy. We introduce GENE (graph generation conditioned on named entities), a representation of user networks conditioned on the named entities (personalities, brands, organizations) that users comment upon. GENE models the leaning that each user has concerning entities mentioned in the news. GENE graphs are able to segment the user network according to polarity. Using the segmented network, we study the performance of two controversy indices: the existing Random Walks Controversy (RWC) and a new one that we introduce, Relative Closeness Controversy (RCC). These indices measure the interaction between the network’s poles, providing a metric to quantify the emergence of controversy. To evaluate the performance of GENE, we model the network of users of a popular news site in Chile, collecting data over an observation window of more than three years. A large-scale evaluation using GENE on thousands of news articles allows us to conclude that over 60% of user comments have a predictable polarity. This predictability of the user interaction scenario allows both controversy indices to detect a controversy successfully. In particular, our RCC index shows satisfactory performance in the early detection of controversies using partial information collected during the first hours of the news event, with a sensitivity to the target class exceeding 90%.