The Dynamics Of Information Diffusion On On-Line Social Networks
Although the diffusion of information has long been studied in various social science fields, existing theories are mostly built on direct observations of small networks or on survey responses from large samples. As a result, it is hard to verify or refute these theories empirically on a large scale. In recent years, the abundance of digital records of online interactions has provided, for the first time, both explicit network structure and detailed dynamics, supporting global-scale, quantitative study of diffusion in the real world. Using these large-scale datasets collected from social media sites, we are able to dissect the process of information diffusion and study its three components: people, information, and network. This thesis addresses a few long-standing questions about each component, including: "who influences whom?", "how do different types of information spread?", and "how does the network structure impact the diffusion process?" In searching for answers to these questions, we find that the three components are interconnected, constantly interacting with each other in real-world diffusion processes. Our results on each component should therefore not be taken in isolation but viewed interdependently. To understand who influences whom in today's hybrid communication environment, we study people's influence on social media based on their role in the global media ecosystem. By categorizing Twitter accounts into elite users (i.e., celebrities, media outlets, organizations, and bloggers) and ordinary users, we find a striking concentration of attention on a minority of elite users, and significant homophily within elite categories. On the other hand, following the definition of "opinion leaders" in the classical "two-step flow" theory, we find a large population of opinion leaders who serve as a layer of intermediaries between the elite users and the masses. The next question we ask concerns the role of content in the diffusion process.
In contrast to previous research on the virality of information, we shift our focus to the persistence of information, trying to understand why certain content keeps spreading on social media for a long time while most does not. First, we see an interaction effect of people and content on the lifespan of information. As a result, the lifespan of information differs significantly across the categories of users who broadcast it. Second, we find a strong association between the linguistic style of content and its temporal dynamics: rapidly-fading information contains significantly more words related to negative emotion, actions, and more complicated cognitive processes, whereas persistent information contains more words related to positive emotion, leisure, and lifestyle. Finally, we conduct a longitudinal study of the local and global structure of several large social networks, asking how and where disengagement happens in the social graph. We find that, although there is a significant correlation in both arrival and departure among friends, the dynamics of departure behave differently from the dynamics of arrival. In particular, for the majority of users with a sufficient number (e.g., greater than 20) of friends, departure is best predicted by the overall fraction of active friends within a user's neighborhood, independent of the size of the neighborhood. We also find that active users tend to belong to a core that is densifying and is significantly denser than the set of inactive users, and that the set of inactive users exhibits a higher density and lower conductance than the degree distribution alone can explain. These two findings suggest that nodes at the fringe are more likely to depart and that subsequent departures are correlated among neighboring nodes in tightly-knit communities.
social network analysis; information diffusion; social media
Macy, Michael Walton
Kleinberg, Jon M; Cosley, Daniel R.
Ph.D. of Information Science
Doctor of Philosophy
dissertation or thesis