The Structure And Dynamics Of Large Social Networks
In this thesis, we first explore two different approaches to efficient community detection that address different aspects of community structure. We establish the definition of community fundamentally different from previous literature, where communities were typically assumed to be densely connected internally but sparsely connected to the rest of the network. A community should be considered as a densely connected subgraph in which the probability of an edge between any two vertices is higher than average. Further, a community should also be well connected to the remaining network, that is, the number of edges connecting a community to the rest of the graph should be significant. In order to identify a well-defined community, we provide rigorous definitions of two terms: "whiskers" and the "core". Whiskers correspond to subsets of vertices that are barely connected to the rest of the network, while the core exclusively contains the type of community we are interested in. We prove that detecting whiskers, or equivalently, extracting the core, is an NP-complete problem for both weighted and unweighted graphs. Then, three heuristic algorithms are proposed for finding an approximate core and are evaluated for their performance on large networks, which reveals the common existence of the core structure in both random and real-world graphs. Well-defined communities can be extracted from the core using a number of techniques, and the experimental results not only justify our intuitive notion of community, but also demonstrate the existence of large-scale communities in various networks. An ([alpha], [beta] )-community is a connected subgraph C with each vertex in C connected to at least [beta] vertices of C (self-loops counted) and each vertex outside of C connected to at most [alpha] vertices of C ([alpha]<[beta] ). We present a heuristic algorithm that in practice successfully finds a fundamental community structure. We also explore the structure of ([alpha], [beta] )-communities in various social networks. ([alpha], [beta] )-communities are well clustered into a small number of disjoint groups, and there are no isolated ([alpha], [beta] )-communities scattered between these groups. Two ([alpha], [beta] )-communities in the same group have significant overlap, while those in different groups have extremely small pairwise resemblance. A surprising core structure is discovered by taking the intersection of each group of massively overlapping ([alpha], [beta] )-communities. Similar experiments on random graphs demonstrate that the core structure found in many social networks is due to their underlying social structure, rather than due to high-degree vertices or a particular degree distribution. In many social networks, there exist two types of users that exhibit different influence and different behavior. For instance, statistics have shown that less than 1% of the Twitter users (e.g. entertainers, politicians, writers) produce 50% of its content , while the others (e.g. fans, followers, readers) have much less influence and completely different social behavior. In this thesis, we define and explore a novel problem called community kernel detection in order to uncover the hidden community structure in large social networks. We discover that influential users pay closer attention to those who are more similar to them, which leads to a natural partition into different community kernels. We propose G REEDY and W E BA, two efficient algorithms for finding community kernels in large social networks. G REEDY is based on maximum cardinality search, while W E BA formalizes the problem in an optimization framework. We conduct experiments on three large social networks: Twitter, Wikipedia, and Coauthor, which show that W E BA achieves an average 15-50% performance improvement over the other state-of-the-art algorithms, and W E BA is 6-2,000 times faster on average in detecting community kernels. Cascading processes, such as disease contagion, information diffusion, and viral marketing, are a pervasive phenomenon in many types of networks. The problem of devising intervention strategies to facilitate or inhibit such processes has recently received considerable attention. However, a major challenge is that the underlying network is often unknown. In this thesis, we revisit the problem of inferring latent network structure given observations from a diffusion process, such as the spread of trending topics in social media. We define a family of novel probabilistic models that can explain recurrent cascading behavior, and take into account not only the time differences between events but also a richer set of additional features. We show that MAP inference is tractable and can therefore scale to very large real-world networks. Further, we demonstrate the effectiveness of our approach by inferring the underlying network structure of a subset of the popular Twitter following network by analyzing the topics of a large number of messages posted by users over a 10-month period. Experimental results show that our models accurately recover the links of the Twitter network, and significantly improve the performance over previous models based entirely on time.
social influence; community detection; network inference
Hopcroft, John E
Resnick, Sidney Ira; Chen, Tsuhan
Ph.D. of Electrical Engineering
Doctor of Philosophy
dissertation or thesis