Probabilistic Causal Analysis of Social Influence
Mastering the dynamics of social influence requires separating, in a database of information propagation traces, the genuine causal processes from temporal correlation, homophily and other spurious causes. However, most of the studies to characterize social influence and, in general, most data-science analyses focus on correlations, statistical independence, conditional independence etc.; only recently, there has been a resurgence of interest in "causal data science", e.g., grounded on causality theories. In this paper we adopt a principled causal approach to the analysis of social influence from information-propagation data, rooted in probabilistic causal theory. Our approach develops around two phases. In the first step, in order to avoid the pitfalls of misinterpreting causation when the data spans a mixture of several subtypes ("Simpson's paradox"), we partition the set of propagation traces in groups, in such a way that each group is as less contradictory as possible in terms of the hierarchical structure of information propagation. For this goal we borrow from the literature the notion of "agony" and define the Agony-bounded Partitioning problem, which we prove being hard, and for which we develop two efficient algorithms with approximation guarantees. In the second step, for each group from the first phase, we apply a constrained MLE approach to ultimately learn a minimal causal topology. Experiments on synthetic data show that our method is able to retrieve the genuine causal arcs w.r.t. a known ground-truth generative model. Experiments on real data show that, by focusing only on the extracted causal structures instead of the whole social network, we can improve the effectiveness of predicting influence spread.
READ FULL TEXT