The Structure And Dynamics Of Large Social Networks
Loading...
Files
No Access Until
Permanent Link(s)
Collections
Other Titles
Authors
Abstract
In this thesis, we first explore two different approaches to efficient community detection that address different aspects of community structure. We establish the definition of community fundamentally different from previous literature, where communities were typically assumed to be densely connected internally but sparsely connected to the rest of the network. A community should be considered as a densely connected subgraph in which the probability of an edge between any two vertices is higher than average. Further, a community should also be well connected to the remaining network, that is, the number of edges connecting a community to the rest of the graph should be significant. In order to identify a well-defined community, we provide rigorous definitions of two terms: "whiskers" and the "core". Whiskers correspond to subsets of vertices that are barely connected to the rest of the network, while the core exclusively contains the type of community we are interested in. We prove that detecting whiskers, or equivalently, extracting the core, is an NP-complete problem for both weighted and unweighted graphs. Then, three heuristic algorithms are proposed for finding an approximate core and are evaluated for their performance on large networks, which reveals the common existence of the core structure in both random and real-world graphs. Well-defined communities can be extracted from the core using a number of techniques, and the experimental results not only justify our intuitive notion of community, but also demonstrate the existence of large-scale communities in various networks. An ([alpha], [beta] )-community is a connected subgraph C with each vertex in C connected to at least [beta] vertices of C (self-loops counted) and each vertex outside of C connected to at most [alpha] vertices of C ([alpha]<[beta] ). We present a heuristic algorithm that in practice successfully finds a fundamental community structure. We also explore the structure of ([alpha], [beta] )-communities in various social networks. ([alpha], [beta] )-communities are well clustered into a small number of disjoint groups, and there are no isolated ([alpha], [beta] )-communities scattered between these groups. Two ([alpha], [beta] )-communities in the same group have significant overlap, while those in different groups have extremely small pairwise resemblance. A surprising core structure is discovered by taking the intersection of each group of massively overlapping ([alpha], [beta] )-communities. Similar experiments on random graphs demonstrate that the core structure found in many social networks is due to their underlying social structure, rather than due to high-degree vertices or a particular degree distribution. In many social networks, there exist two types of users that exhibit different influence and different behavior. For instance, statistics have shown that less than 1% of the Twitter users (e.g. entertainers, politicians, writers) produce 50% of its content [1], while the others (e.g. fans, followers, readers) have much less influence and completely different social behavior. In this thesis, we define and explore a novel problem called community kernel detection in order to uncover the hidden community structure in large social networks. We discover that influential users pay closer attention to those who are more similar to them, which leads to a natural partition into different community kernels. We propose G REEDY and W E BA, two efficient algorithms for finding community kernels in large social networks. G REEDY is based on maximum cardinality search, while W E BA formalizes the problem in an optimization framework. We conduct experiments on three large social networks: Twitter, Wikipedia, and Coauthor, which show that W E BA achieves an average 15-50% performance improvement over the other state-of-the-art algorithms, and W E BA is 6-2,000 times faster on average in detecting community kernels. Cascading processes, such as disease contagion, information diffusion, and viral marketing, are a pervasive phenomenon in many types of networks. The problem of devising intervention strategies to facilitate or inhibit such processes has recently received considerable attention. However, a major challenge is that the underlying network is often unknown. In this thesis, we revisit the problem of inferring latent network structure given observations from a diffusion process, such as the spread of trending topics in social media. We define a family of novel probabilistic models that can explain recurrent cascading behavior, and take into account not only the time differences between events but also a richer set of additional features. We show that MAP inference is tractable and can therefore scale to very large real-world networks. Further, we demonstrate the effectiveness of our approach by inferring the underlying network structure of a subset of the popular Twitter following network by analyzing the topics of a large number of messages posted by users over a 10-month period. Experimental results show that our models accurately recover the links of the Twitter network, and significantly improve the performance over previous models based entirely on time.
Journal / Series
Volume & Issue
Description
Sponsorship
Date Issued
2013-01-28
Publisher
Keywords
social influence; community detection; network inference
Location
Effective Date
Expiration Date
Sector
Employer
Union
Union Local
NAICS
Number of Workers
Committee Chair
Hopcroft, John E
Committee Co-Chair
Committee Member
Resnick, Sidney Ira
Chen, Tsuhan
Chen, Tsuhan
Degree Discipline
Electrical Engineering
Degree Name
Ph. D., Electrical Engineering
Degree Level
Doctor of Philosophy
Related Version
Related DOI
Related To
Related Part
Based on Related Item
Has Other Format(s)
Part of Related Item
Related To
Related Publication(s)
Link(s) to Related Publication(s)
References
Link(s) to Reference(s)
Previously Published As
Government Document
ISBN
ISMN
ISSN
Other Identifiers
Rights
Rights URI
Types
dissertation or thesis