Analysis Of Large-Scale Data From Human Activities On The Web



This work focuses on data mining and machine learning using large-scale datasets, with an emphasis on Web information, social computing, and online social networks. Such datasets are becoming more numerous, and as the Web's reach grows, it is important to understand them for two reasons. First, a better understanding of the systems generating the data allows us to improve those systems. For example, by looking at where search queries come from, we can better select which results and advertisements to display. Second, an in-depth understanding of the data allows us to leverage it for a variety of purposes. For instance, by looking at the geographic sources of queries, we can discover the reach of various ideas. In particular, we will develop new algorithms to deal with these large datasets, answering subtle and nuanced questions that require a huge amount of data and novel methodology. We will examine large social networks and processes related to them, such as group formation and network evolution. We will also look at data from web search, showing that it is a rich source of information which, when combined with IP address geolocation, can tell us a great deal about the geographic extent of various terms. In addition to learning about these systems, we will also design algorithms for improving them. Using server logs, we will show how changing content can be scheduled more effectively on web pages. Finally, we will examine some of the privacy implications of this style of research, presenting a negative result that illustrates how careful we must be with our data.





Data Mining




dissertation or thesis
