Analysis Of Large-Scale Data From Human Activities On The Web

This work focuses on data mining and machine learning using large-scale datasets, with an emphasis on Web information, social computing, and online social networks. These datasets are becoming more numerous, and as the Web's reach grows, it is important to understand them for two reasons. First, a better understanding of the systems generating the data allows us to improve those systems. For example, by looking at where search queries come from, we can better select which results and advertisements to display. Second, an in-depth understanding of the data allows us to leverage it for a variety of purposes. For instance, by looking at the geographic sources of queries, we can discover the reach of various ideas. In particular, we will develop new algorithms to deal with these large datasets, answering subtle and nuanced questions that require a huge amount of data and novel methodology. We will examine large social networks and processes related to these networks, such as group formation and network evolution. We will also look at data from web search, showing that it is a rich source of information which, when combined with IP address geolocation, can tell us a great deal about the geographic extent of various terms. In addition to learning about these systems, we will also design algorithms for improving them. Through the use of server logs, we will show how changing content can be scheduled more effectively on web pages. Finally, we will examine some of the privacy implications of this style of research, showing a negative result that illustrates how careful we must be with our data.
Data Mining
dissertation or thesis