
dc.contributor.author: Haque, Asif-ul [en_US]
dc.identifier.other: bibid: 7955398
dc.description.abstract: Ever-increasing participation has made the interaction between information and social systems not only interesting to observe but essential to quantify and analyze. This dissertation presents methods for understanding such interaction through combined analysis of metadata, networks, text, and log data. ArXiv, an open and highly influential scholarly communication system, served as the testbed for these methods. In the first part of this dissertation, we examine in depth phenomena such as self-promotion, procrastination, visibility, and geographic differences. We confirm the predictive power of early readership through regression and discuss undesirable effects of recommendation and the possibilities for new impact metrics. In the second part, we demonstrate extraction of subtopical concepts, characterized by phrases, through a statistical method for vocabulary selection and a network-based ranking. Validation via search query and click logs is advocated as relevant and scalable. A clustering scheme to summarize temporal patterns of topic clicks is also presented. In the last part of this dissertation, we present a name disambiguation algorithm and a novel evaluation method using node-role-based sampling in the context of network analysis. Finally, we provide guidelines on performing large-scale graph computation using the Map-Reduce framework. [en_US]
dc.subject: data mining [en_US]
dc.subject: text mining [en_US]
dc.title: Information And Social System Interaction [en_US]
dc.type: dissertation or thesis [en_US]
Degree: Doctor of Philosophy (Ph.D.), Computer Science
dc.contributor.chair: Friedman, Eric J. [en_US]
dc.contributor.committeeMember: Ginsparg, Paul Henry [en_US]
dc.contributor.committeeMember: Williamson, David P [en_US]
