CITIES AS COMPLEX SYSTEMS: SOCIAL INTERACTIONS, 
AGGLOMERATION, AND ECONOMIC GROWTH 
 
 
 
 
 
 
 
 
A Dissertation 
Presented to the Faculty of the Graduate School 
of Cornell University 
In Partial Fulfillment of the Requirements for the Degree of 
Doctor of Philosophy 
 
 
 
 
 
 
by 
Jaebeum Cho 
May 2018
 
 
 
 
 
 
 
 
 
 
 
 
 
©  2018 Jaebeum Cho
 
 
CITIES AS COMPLEX SYSTEMS: SOCIAL INTERACTIONS, 
AGGLOMERATION, AND ECONOMIC GROWTH 
 
Jaebeum Cho, Ph. D. 
Cornell University 2018 
 
 
A key distinguishing feature of cities is that the population density is high relative to 
non-urban areas. Arising from this density is the frequent contact between various 
socioeconomic actors, which provides for the means of social interactions as well as 
productivity gains accrued through agglomeration economies. This collection of 
papers begins on the premise that social interactions underlie economic forces, which 
constitute the ingredients of the complex system that is the urban economy, jointly 
determining the outcome of cities as a whole. 
 With such a view of the urban economy, this dissertation attempts to answer a 
series of key questions regarding the interface between social interactions, 
agglomeration economies, new firm formation, and economic growth. The first paper 
proposes an agent-based model of social network formation that explicitly considers 
space and untangles the complex relationship between social interaction dynamics and 
inequalities in socioeconomic resources. The second paper builds upon the insight that 
social interactions and economic outcomes are related and addresses the question of 
how social interactions and agglomeration economies jointly determine new firm 
formation in cities. Finally, the last paper attempts to answer the critical question of 
how urban economies should grow, under the premise that growth takes place through 
changes in industrial structure brought about by entrepreneurship in particular 
industries.  
 Knowledge of the underlying mechanisms of social interactions, and of how such 
interactions bring about new firm formation and economic growth, provides both a 
theoretical and an empirical framework within which planning interventions can be made 
in the realm of community and economic development. The findings could be 
used to assist planners in better understanding the workings of the urban economy and 
inform decision making that aims to promote sustained economic growth.
 
BIOGRAPHICAL SKETCH 
 
Jaebeum Cho was born in Seoul, Korea, yet spent most of his childhood living outside 
of his homeland in countries such as the U.S., Canada, and Singapore. He obtained 
bachelor’s and master’s degrees in Urban Planning and Engineering from Yonsei 
University, Korea and worked as an economic development planner for three years 
prior to joining the City and Regional Planning department at Cornell University in the 
Fall of 2012. Since then, his doctoral research has revolved around community 
economic development, with a particular emphasis on regional science and urban 
economics.   
 
 
 
 
 
 
 
 
 
 
 
 
To everyone that has helped me along this journey
 
ACKNOWLEDGMENTS 
 
 
I would like to express my deepest appreciation to my committee chair, Professor 
Kieran P. Donaghy, as well as to my committee members, Professor Yuri S. Mansury, 
Professor M. Diane Burton, and Professor Benjamin T. Cornwell for their continued 
and extensive support all throughout my doctoral studies. I truly was gifted with a 
supportive committee that provided me with both knowledge as well as emotional 
support during good and bad times.  
 I would also like to thank my family for their support, as well as their 
unwavering faith in me even when I sometimes lost faith in myself. In addition, I 
would like to acknowledge the Ewing Marion Kauffman Foundation, which partially 
funded this research. Even with the aid provided by others, I acknowledge 
that the contents of this publication are solely my responsibility. 
 
TABLE OF CONTENTS 
 
CHAPTER 1  
INTRODUCTION .......................................................................................................... 1 
 
CHAPTER 2 
CHURNING, POWER LAWS, AND INEQUALITY 
IN A SPATIAL AGENT-BASED MODEL OF SOCIAL NETWORKS .................... 11 
2.1. Introduction  .......................................................................................................... 11 
2.2. Theoretical framework  ......................................................................................... 15 
2.3. The model  ............................................................................................................ 21 
2.4. Algorithm implementation  ................................................................................... 25 
2.5. Simulation results ................................................................................................. 29 
2.6. Conclusions  .......................................................................................................... 50 
 
CHAPTER 3  
AGGLOMERATION, REGIONAL SOCIAL CAPITAL, 
AND ENTREPRENEURSHIP IN CITIES  ................................................................ 68 
3.1. Introduction  .......................................................................................................... 68 
3.2. Related literature  .................................................................................................. 72 
3.3. Data and variables  ................................................................................................ 79 
3.4. Empirical framework  ........................................................................................... 97 
3.5. Results  ................................................................................................................ 101 
3.6. Conclusions  ........................................................................................................ 113 
 
CHAPTER 4 
PATHWAYS FOR ENTREPRENEURSHIP DRIVEN 
ECONOMIC GROWTH: ENVISIONING THE 
INDUSTRY SPACE  ................................................................................................. 128 
4.1. Introduction  ........................................................................................................ 128 
4.2. Related literature  ................................................................................................ 131 
4.3. The industry space  ............................................................................................. 139 
4.4. Empirical framework  ......................................................................................... 154 
4.5. Results  ................................................................................................................ 156 
4.6. Conclusions  ........................................................................................................ 165 
 
CHAPTER 5 
CONCLUDING REMARKS  .................................................................................... 173 
 
 
LIST OF FIGURES 
 
Figure 2.1. ABM flow chart  ........................................................................................ 27 
Figure 2.2. Degree distributions for select parameter settings  .................................... 30 
Figure 2.3. Relationship between power law fit and network churn parameters  ........ 33 
Figure 2.4. Network formation dynamics under two different parameter 
           configurations  ............................................................................................ 34 
Figure 2.5. Tie formation, decay, and aggregate social capital  .................................. 36 
Figure 2.6. Relationships between tie-formation, decay, and the Gini coefficient  ..... 39 
Figure 2.7. Differences in social capital between high and low human capital 
           agents  ......................................................................................................... 42 
Figure 2.9. Spatial distribution of agents with $\overline{SC}_i > 1$ for representative 
           parameter settings ....................................................................................... 47 
Figure 2.10. Differences in social capital between introverts and extroverts  ............. 49 
Figure 4.1. Network of industries based on Ellison-Glaeser 
           coagglomeration index  ............................................................................ 146 
Figure 4.2. Network of industries  ............................................................................. 147 
Figure 4.3. Entrepreneurship activity for the New York-Northern 
           New Jersey-Long Island MSA (top) and Los Angeles- 
           Long Beach-Santa Ana MSA (bottom)  ................................................... 151 
Figure 4.4. Average weighted centrality versus linear prediction for GDP  .............. 162 
Figure 4.5. Average marginal effects of centrality measure at different 
           levels of population  ................................................................................. 162 
Figure 4.6. MSA groupings by centrality levels  ....................................................... 164 
 
 
 
 
 
 
 
 
 
 
 
 
LIST OF TABLES 
 
Table 3.1. Count of new firms and entry rates for single and all 
          establishment births  .................................................................................... 83 
Table 3.2. Select descriptive statistics for variables  ................................................... 96 
Table 3.3. Births of single (start-up) and all establishments ..................................... 102 
Table 3.4. Births of single (start-up) and all establishments, 
          traded versus local industries, Poisson estimates  ...................................... 108 
Table 3.5. Births of single (start-up) and all establishments, 
          high-tech versus low-tech industries, Poisson estimates  .......................... 110 
Table 3.6. Births of single (start-up) and all establishments, 
          manufacturing versus non-manufacturing industries,  
          Poisson estimates  ...................................................................................... 112 
Table 4.1. Ellison-Glaeser (EG) coagglomeration index values  ............................... 143 
Table 4.2. Weighted degree centrality of 4 digit NAICS industries  ......................... 149 
Table 4.3. Average centrality of MSAs  .................................................................... 153 
Table 4.4. Summary statistics  ................................................................................... 155 
Table 4.5. Regression results – Log employment  ..................................................... 157 
Table 4.6. Regression results – Log GDP  ................................................................. 159 
Table 4.7. Regression results – Log GDP per capita  ................................................ 160 
 
 
 
 
 
 
 
 
 
 
CHAPTER 1 
INTRODUCTION 
 
Traditionally, there has been much debate regarding the exact definition of a city. 
Urban economists usually define a city as “a geographical area that contains a large 
number of people in a relatively small area” (O’Sullivan 2012), while the Economist 
Intelligence Unit defines a city as “the urban agglomeration or metropolitan area it 
holds together” (Economist Intelligence Unit 2013). Other, more specific definitions of 
cities exist; for example, the U.S. Census Bureau considers urban areas to be 
geographical areas with a minimum population of 2,500 people and a minimum 
density of 500 people per square mile, and a Metropolitan Statistical Area (MSA) to 
be a core area with a substantial population nucleus and adjacent areas that are 
economically integrated, with a total population of 50,000 or above.  
Whichever definition is used, a city distinguishes itself from non-urban areas 
in that the population density is high relative to the density of surrounding regions. 
This emphasis on population density is due to an essential feature of an urban area, 
namely the frequent contact between different socioeconomic actors, which is feasible 
only when individuals, firms and households are concentrated in a relatively small 
area. The natural question to ask, then, is why cities exist. Considering that people 
need land to produce food and other essential resources, living in cities is in a sense 
counterintuitive because it separates us from the places where critical commodities are 
produced. Furthermore, cities are noisy, dirty, and crowded. The presence of cities 
despite these drawbacks is due to a number of factors, which relate to the benefits of 
colocation that more than offset the negative effects.  
The fundamental benefits of density are due to increased productivity resulting 
from specialization and agglomeration (Marshall 1920; Smith 1776). Specialization 
allows each person to be more productive by allowing for 1) allocational efficiency, 
and 2) technical efficiency. Allocational efficiency is related to making the best use of 
a particular worker’s skillset by assigning different tasks to workers who possess 
different aptitudes. Technical efficiency arises from the reduction of transition times 
between different tasks. Higher density benefits specialization because a larger pool of 
workers allows for a better skillset match. In addition to specialization, higher density results in 
scale economies, or increasing returns to scale in production. Due to various 
agglomeration externalities, the increase in output is more than proportional to the 
increase in inputs, resulting in a decline in average costs and thus higher productivity.  
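The link between increasing returns and falling average cost can be made concrete with a stylized worked example. The functional form below is an assumption chosen purely for exposition and is not taken from the cited works: a single composite input $x$ with unit price $w$, a productivity constant $A$, and a scale parameter $a > 1$.

```latex
\begin{align*}
Q &= A x^{a}, \qquad A > 0,\; a > 1, \\
C(Q) &= w x = w \left( \frac{Q}{A} \right)^{1/a}, \\
AC(Q) &= \frac{C(Q)}{Q} = w A^{-1/a} \, Q^{\frac{1}{a} - 1}.
\end{align*}
```

Because $1/a - 1 < 0$ whenever $a > 1$, average cost declines as output rises. With $a = 2$, for instance, doubling the input quadruples output and halves average cost.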
 Alfred Marshall (1920) famously noted the underlying mechanisms of 
agglomeration economies, or the economic forces that cause firms to locate close to 
one another. The first is related to the sharing of intermediate inputs, where competing 
firms locate close to one another to share intermediate inputs of production. 
Intermediate inputs are goods and services produced by one firm that are used as inputs 
in the production processes of other firms; the classic example is that of 
dressmaking firms sharing a buttonmaker (Vernon 1972). Due to economies of scale, 
the cost per intermediate input decreases as the quantity increases, leading to lower 
production costs. The second agglomeration economy is related to the sharing of labor 
pools. A large labor market allows for workers to readily shift across employers, thus 
reducing labor market uncertainty, and also facilitates better matches between firms 
and workers (Helsley and Strange 1990). Finally, knowledge spillovers are also a 
source of agglomeration economies, and entail the sharing of knowledge among firms 
in an industry. As a result, the mysteries of the trade “become no mysteries; but are 
as it were in the air” (Marshall 1920). 
 One of the key arguments presented in this dissertation is that agglomeration 
economies exist due to the presence of social interactions (Durlauf and Ioannides 
2010; Glaeser 2008; Ioannides 2013). For example, proximity to customers and 
suppliers may reduce the costs of obtaining inputs or transporting goods to 
downstream consumers (Ellison, Glaeser, and Kerr 2010; Fujita, Krugman, and 
Venables 1999), but it also may embody stronger social ties between similar firms and 
customers that increase trust and information exchange (Dahl and Sorenson 2012). 
Similarly, labor market pooling shields workers from firm-specific shocks (Krugman 
1991) and promotes better worker-firm matches (Helsley and Strange 1990), but it 
also represents social homophily (McPherson, Smith-Lovin, and Cook 2001). 
Especially with knowledge spillovers, the spillover of ideas is possible because 
individuals collocate and gain information through social linkages, which allow the 
knowledge “in the air” (Marshall 1920) to be shared with one another (Saxenian 
1996).  
 The overarching theme of the papers included in this dissertation is that the 
economies of cities comprise a “complex system,” which is defined as a system that 
exhibits adaptive, nontrivial, emergent, and self-organizing behaviors stemming from 
agents with rules of operation and no central control (Arthur 2013). The reasoning 
behind this view stems from the fact that economic agents are faced with fundamental 
uncertainty; they do not know what they face, and thus in any economic situation, 
forecasts, strategies, and actions are being “tested” for survival within a situation those 
beliefs, strategies and actions create. Within an urban economy, people, firms, and 
governments react to the aggregate outcome these agents together create, without a 
mechanism for central control. Furthermore, at the regional level, regional economies 
evolve by adapting to their current circumstances and through competition and 
collaboration with neighboring areas. Thus in this sense, urban economies are the 
perfect example of a complex system. 
 Another key aspect of a complex system is that the individual agents interact 
with one another to bring about emergent outcomes. Considering the agglomeration 
mechanisms discussed above, these types of externalities that occur due to physical 
proximity directly embody interactions within space; otherwise, there simply would be 
no benefit due to spatial proximity. Furthermore, the social interactions that underlie 
agglomeration mechanisms are also manifested due to social networks and the 
interaction of social actors within an area. In order to best represent the urban 
economy as a complex system, I will henceforth stress the importance of 
representing various aspects of the economy as networks of firms, organizations, 
people, and even ideas. Networks, by definition, are the joint set of nodes and their 
linkages, which makes them a perfect vehicle onto which the interactions that occur 
within a spatial economy can be depicted and analyzed.  
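To illustrate the definition of a network as a joint set of nodes and linkages, the following minimal sketch stores an undirected network as an adjacency map and counts each node's links. The node names and the helper functions `build_network` and `degree` are hypothetical and serve only as an illustration; they are not part of the models developed in later chapters.

```python
from collections import defaultdict


def build_network(links):
    """Return an undirected adjacency map {node: set of neighbors} from (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in links:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def degree(adjacency):
    """Return the number of links held by each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


# Hypothetical actors in an urban economy and the linkages among them.
links = [
    ("firm_a", "firm_b"),
    ("firm_a", "supplier_c"),
    ("firm_b", "supplier_c"),
    ("firm_a", "university_d"),
]
network = build_network(links)
degrees = degree(network)
```

In this toy example `firm_a` holds the most linkages, which is the kind of positional information that a network representation makes directly observable.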
 Utilizing this theoretical framework, this dissertation attempts to answer a 
series of key questions regarding the interface between social interactions, 
agglomeration economies, new firm formation, and economic growth. The first paper 
focuses on the question of how the dynamics of social interactions that take place 
within a spatial setting affect the inequality in socioeconomic resources among social 
actors. Utilizing an agent-based model of social network formation that explicitly 
considers space, one of the main contributions of the paper is the addition of the 
spatial dimension to social network analysis. Traditional models of networks such as 
the preferential attachment model of Barabási and Albert (1999), the random graph 
model of Erdős and Rényi (1960), or the small-world model of Watts and Strogatz 
(1998) all assume that geographic space has little to no relevance. Casual empiricism, 
however, suggests that space matters in important ways. Urban dwellers, for example, 
rely much more for mutual support on local neighbors than on acquaintances in other 
cities (Gans 1962; Mansury and Shin 2015). By situating social actors within space 
and varying the dynamics of social interactions, the paper attempts to answer the 
policy relevant question of how to minimize inequalities in socioeconomic resources, 
given a model of social network formation that benefits actors with more connections 
and underlying human capital. 
 The second paper moves on to address the question of how social interactions 
and agglomeration economies jointly determine new firm formation within cities. The 
key argument in this paper is that social interactions, and more broadly social capital 
within the community or region, aid entrepreneurs in the early stages of forming new 
firms. Social aspects of the region have been viewed to be a crucial element of 
regional competitiveness (Kitson, Martin, and Tyler 2004; Porter 2003), where the 
social characteristics of a region are not simple aggregations of firms or individuals. 
Porter (1998) suggests that a key component of cluster formation and success is the 
degree of social embeddedness, the existence of facilitative social networks, social 
capital, and institutional structures. Similarly, Storper (1995, 2013) stresses the 
importance of “untraded interdependencies” such as networks of trust and cooperation 
as well as local norms and conventions when considering the success of regions. Thus, 
the natural question to ask is whether there is a role that regional social capital plays in 
promoting entrepreneurship, over and above the effect of social interactions at the 
micro-level. This paper attempts to unify the treatment of regional social capital and 
agglomeration economies as being part of the broader “entrepreneurial ecosystem” of 
a region, where the ecosystem takes its form in various types of networks and their 
linkages. The aim is to present findings regarding the relative strengths of these 
mechanisms that may aid planners and policy makers in promoting entrepreneurship 
and economic growth within cities. 
 The final paper attempts to answer the critical question of how urban 
economies should grow. Many theories exist as to why economic growth takes place. 
Adam Smith (1776) emphasized capital deepening, or the increase in physical capital 
per worker, while more recent models of endogenous growth (Lucas 1988; Romer 
1986) focus on human capital, technological change, and knowledge economies. Of 
course, the regional science and urban economics literature has emphasized the 
critical role that agglomeration effects play in growth at the urban level. The main 
contribution of this paper is to bridge the gap between the many theories that explain 
the causes of growth and the relative paucity of theories that elucidate how growth 
should take place, given the theoretical background. Integrating new insights from 
complexity science and development economics with more traditional theories of 
economic development that exist in the urban planning and urban economics 
literatures, the paper studies optimal patterns of economic growth, defined as 
structural change (Lewis 1954) that takes place through a shift in the underlying 
industrial structure of cities caused by new firm formation. Such pathways for 
economic growth through new firm formation should prove to be a useful tool for 
planners and policy makers alike in promoting job creation and growth within 
communities and regions.   
  
 
REFERENCES 
 
Arthur, W Brian. 2013. Complexity and the Economy. Oxford University Press. 
Barabási, Albert-László, and Réka Albert. 1999. “Emergence of Scaling in Random 
Networks.” Science 286 (5439): 509–12. 
Dahl, Michael S, and Olav Sorenson. 2012. “Home Sweet Home: Entrepreneurs’ 
Location Choices and the Performance of Their Ventures.” Management 
Science 58 (6): 1059–71. 
Durlauf, Steven N., and Yannis M. Ioannides. 2010. “Social Interactions.” Annual 
Review of Economics 2 (1): 451–78. 
Economist Intelligence Unit. 2013. Hot Spots 2025: Benchmarking the Future 
Competitiveness of Cities. The Economist, London. 
Ellison, Glenn, Edward L Glaeser, and William R Kerr. 2010. “What Causes Industry 
Agglomeration? Evidence from Coagglomeration Patterns.” The American 
Economic Review 100 (3): 1195–1213. 
Erdős, Paul, and Alfréd Rényi. 1960. “On the Evolution of Random Graphs.” 
Publications of the Mathematical Institute of the Hungarian Academy of 
Sciences 5: 17–61. 
Fujita, Masahisa, Paul Krugman, and Anthony J. Venables. 1999. The Spatial 
Economy: Cities, Regions, and International Trade. Cambridge, MA: MIT 
Press. 
Gans, Herbert J. 1962. The Urban Villagers: Group and Class in the Life of 
Italian-Americans. [New York]: Free Press of Glencoe. 
Glaeser, Edward L. 2008. Cities, Agglomeration, and Spatial Equilibrium. Oxford 
University Press. 
Helsley, Robert W, and William C Strange. 1990. “Matching and Agglomeration 
Economies in a System of Cities.” Regional Science and Urban Economics 20 
(2): 189–212. 
Ioannides, Yannis M. 2013. From Neighborhoods to Nations: The Economics of 
Social Interactions. Princeton University Press. 
Kitson, Michael, Ron Martin, and Peter Tyler. 2004. “Regional Competitiveness: An 
Elusive yet Key Concept?” Regional Studies 38 (9): 991–99. 
Krugman, Paul. 1991. Geography and Trade. Cambridge, MA: MIT Press. 
Lewis, W Arthur. 1954. “Economic Development with Unlimited Supplies of 
Labour.” The Manchester School 22 (2): 139–91. 
Lucas, Robert E. 1988. “On the Mechanics of Economic Development.” Journal of 
Monetary Economics 22: 3–42. 
Mansury, Yuri, and JK Shin. 2015. “Size, Connectivity, and Tipping in Spatial 
Networks: Theory and Empirics.” Computers, Environment and Urban Systems 
54: 428–37. 
Marshall, Alfred. 1920. Principles of Economics. London: Macmillan. 
McPherson, Miller, Lynn Smith-Lovin, and James M Cook. 2001. “Birds of a Feather: 
Homophily in Social Networks.” Annual Review of Sociology, 415–44. 
O’Sullivan, Arthur. 2012. Urban Economics. 8th ed. New York, NY: McGraw-
Hill/Irwin. 
Porter, Michael E. 1998. “Location, Clusters, and the New Microeconomics of 
Competition.” Business Economics, 7–13. 
———. 2003. “The Economic Performance of Regions.” Regional Studies 37 (6–7): 
549–78. 
Romer, Paul M. 1986. “Increasing Returns and Long-Run Growth.” The Journal of 
Political Economy, 1002–37. 
Saxenian, AnnaLee. 1996. Regional Advantage: Culture and Competition in Silicon 
Valley and Route 128. Cambridge, MA: Harvard University Press. 
Smith, Adam. 1776. An Inquiry into the Nature and Causes of the Wealth of Nations. 
New York: Bartleby. 
Storper, Michael. 1995. “Competitiveness Policy Options: The Technology‐regions 
Connection.” Growth and Change 26 (2): 285–308. 
———. 2013. Keys to the City: How Economics, Institutions, Social Interaction, and 
Politics Shape Development. Princeton University Press. 
Vernon, Raymond. 1972. “External Economies.” In Readings in Urban Economics, 
27–49. eds. M. Edel and J. Rothenberg. New York: Macmillan. 
Watts, Duncan J, and Steven H Strogatz. 1998. “Collective Dynamics of 
‘Small-World’ Networks.” Nature 393 (6684): 440–42. 
 
CHAPTER 2 
CHURNING, POWER LAWS, AND INEQUALITY IN A SPATIAL AGENT-
BASED MODEL OF SOCIAL NETWORKS 
 
2.1. Introduction 
Regional science since its inception has focused on socioeconomic phenomena with a 
spatial dimension (Nijkamp, Rose, & Kourtit, 2014). Social networks in particular are 
one of the defining issues of our time that have transformed how we think about 
socioeconomic phenomena. A growing body of empirical work measuring different 
aspects of social networks has indeed shown that connections matter for a variety of 
outcomes, such as getting jobs (Lin & Dumin, 1986), becoming more successful 
entrepreneurs (Greve & Salaff, 2003), or maintaining high-performing organizations 
(Borgatti & Cross, 2003). But while it is intuitive to many that social interactions 
should respect Tobler’s (1970) “first law of geography,” ground-breaking models of 
networks such as the preferential attachment model of Barabási and Albert (1999), the 
random graph model of Erdős and Rényi (1960), or the small-world model of Watts 
and Strogatz (1998) all assume that geographic space has little to no relevance. Casual 
empiricism, however, suggests that space matters in important ways. Urban dwellers, for 
example, rely much more for mutual support on local neighbors than on acquaintances 
in other cities (Gans, 1962; Mansury & Shin, 2015). This is in line with a Pew Internet 
study that shows face-to-face contact has remained the dominant means of 
communication even for core users of online social network sites (Hampton, Sessions, 
Her, & Rainie, 2009). 
 
Regional science is well-positioned to contribute to the literature on spatial 
networks as it has long recognized the critical role of physical geography in 
socioeconomic relationships. Cities in particular are manifestations of the dense 
interactions among residents living in close proximity to one another (Batty, 2013). 
Situating entities in space therefore strengthens the empirical basis and sheds new 
light on the nature of social networks. Spatially embedded models of networks indeed 
show the non-trivial impact of geography on network properties (Kosmidis, Havlin, & 
Bunde, 2008) as well as the importance of geographically concentrated networks 
(Browning, Dietz, & Feinberg, 2004). Empirical studies by both sociologists 
(McPherson, Smith-Lovin, & Cook, 2001; Wellman, 1996) as well as regional 
scientists (Cassi & Plunket, 2014; Fritsch & Kauffeld-Monz, 2010; Ioannides & Topa, 
2010) confirm the critical role of space, with social distance (e.g. the frequency of 
contact or the strength of relations) being heavily influenced by geographic 
propinquity. 
An important feature of contemporary social networks is churning brought 
about by social actors that continuously re-evaluate and alter their links. For example, 
it has been observed that more than half of social networking users have “unfriended” 
contacts in their networks, thereby removing members from their inner circle 
(Madden, 2012). Churning is also evident in online dating networks, where members who 
start out as strangers eventually enter committed relationships (Smith & Duggan, 
2013), many of which are later dissolved. From a broader 
perspective, it has been argued that rising crime and disorder in cities are in large part 
brought about by the decay of local social ties (Sampson, 2004). Others have gone so 
far as to suggest that the continuous decline in social connections has impoverished 
our lives and communities (Putnam, 2001). 
The churning dynamics are consistent with the view of networks as complex, 
dynamic, self-organizing systems embedded in space and governed by individual 
interactions. The present study therefore examines the evolution of spatial networks 
using agent-based models (ABMs), a class of complex-systems approximations where 
the abstraction maintains “a close association with real-world agents of interest” 
(Miller & Page, 2009). Regional scientists in particular have used simulation 
techniques to explore the bottom-up processes that drive the emergence of spatial 
patterns (Mansury & Gulyás, 2007; Torrens, 2007, 2010; Xie, Batty, & Zhao, 2007). 
In essence, an ABM is a set of agent-specific rule-based algorithms that shape the 
outcomes for individual agents. The algorithms allow agents to interact directly with 
one another, and in so doing link individual changes to systems dynamics. ABMs’ 
main advantage for modeling social networks is that they utilize a more realistic 
bottom-up approach that gives agency to social actors to collectively generate systems 
behavior.  
This paper addresses the following research questions. The first focuses on 
power law distributions building upon the observation that, while many real-life social 
networks have highly skewed distributions, many others do not. For example, while 
citation networks and actor networks are both networks of collaboration, the former 
tend to follow a power law (Barabási, 2000), whereas the latter exhibit small-
world properties (Watts, 1999). The present study asks when a scale-free 
distribution is sustainable in a spatial network where agents are allowed to re-evaluate 
their ties, and conversely under what conditions churning destabilizes the power law. 
Second, it is argued that social capital is a socioeconomic resource affected by the 
decisions to maintain, dissolve, and form new connections. A natural question then is 
how the unequal distribution of social capital is affected by the differential strength of 
network churn factors, and under what conditions the disparities can be mitigated.  
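One way to quantify the disparities referred to in the second research question is the Gini coefficient. The sketch below is a generic implementation that uses the standard rank-weighted formula for sorted data; treating an agent's number of ties as a stand-in for social capital is an assumption made here for illustration, and the sample values are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no agents, or nothing to distribute
    # Rank-weighted formula for ascending data:
    # G = 2 * sum(rank_i * x_i) / (n * total) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical tie counts for four agents (illustrative proxy for social capital).
equal_ties = [1, 1, 1, 1]
concentrated_ties = [0, 0, 0, 4]
g_equal = gini(equal_ties)
g_concentrated = gini(concentrated_ties)
```

When all agents hold the same number of ties the coefficient is zero, and it rises toward its maximum of $(n - 1)/n$ as ties concentrate in a single agent.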
 This paper contributes to the literature by adding space into an ABM of 
preferential attachment (PA) with network churn, and by showing how such 
refinements result in previously unknown emergent properties and network behavior. 
In contrast to the sole focus on connectivity in the original preferential attachment 
model, here I consider the refinement proposed by Bianconi and Barabási (2001) 
allowing agent fitness to also influence the probability to form a new connection. 
Space then matters in the specification where proximity affects agent fitness, which 
intertwines with churning to embed the agent selection mechanism in an evolving 
spatial landscape. The model is further enriched by the distinction between two types 
of agents–namely introverts and extroverts–in a community where introverts are 
limited in their spatial scope of interactions while extroverts are free to maintain long 
range connections. This is in line with the theory and empirics that reveal that 
psychological traits dictate social interactions (Cuperman & Ickes, 2009) and 
economic outcomes (Tversky & Kahneman, 1981). 
The next section highlights three key concepts, namely PA, power laws, and 
homophily. Section 3 elaborates on the network model, and section 4 on the 
implementation of the ABM in simulation analysis. The penultimate section 5 
analyzes the simulation results focusing on the degree distributions of agents under 
different parameter settings, as well as on the individual and spatial inequalities in 
social capital. The concluding section discusses key policy implications and 
recommendations. 
 
2.2. Theoretical Framework 
2.2.1. Preferential attachment (PA) 
The original PA model proposed by Barabási and Albert (1999) ushered in a boom in 
network studies in the late 1990s. While the model was formalized in their seminal 
paper, the idea that the rate at which a particular agent acquires links is proportional to 
the number of links that the agent already has – thus preferential attachment – had been around 
long before. Notably, de Solla Price (1965) discussed the concept of cumulative 
advantage in the context of scientific citations, while sociologists referred to the same 
phenomenon as the Matthew effect (Merton, 1968), named after the Gospel of Matthew: 
“For whoever has will be given more, and they will have an abundance.” (Mt, 25:29). 
This “rich get richer” mechanism has been observed in many types of networks, 
ranging from the internet and power grids to empirical social networks (see Barabasi, 
2000 for examples). 
The model assumes a network that starts with a small number of nodes that are 
randomly connected. In every succeeding step a new node is added, linking itself to the 
incumbent nodes already in the network. PA is incorporated by the simple rule that 
incoming nodes prefer incumbents that are already well connected, thus awarding an 
incumbent with significant connectivity a higher probability of attracting newcomers. 
The PA model serves as a good starting point for analyses of social networks due to two 
key characteristics. First, it features a network that expands continuously through agent 
and tie addition. Second, new agents connect to others already in the network through a 
process of selection, which favors more connected agents. The first characteristic is 
essential in modeling growing networks that are fundamentally different from static 
networks with a fixed number of nodes and edges (Erdos & Rényi, 1960; Watts & 
Strogatz, 1998). The second characteristic captures the fact that in many social networks, 
agents choose to associate with others that are already well-connected. In theoretical 
terms, selection is introduced by allowing the probability that a node attracts others to 
be proportional to its degree connectivity1, which is in contrast to random network 
models in which the probabilities of attachment are fixed in time.  
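The growth rule described above can be illustrated with a minimal Python sketch. It is not the NetLogo code used for the simulations in this paper; the function name `grow_pa_network` and the choice of exactly one link per newcomer are assumptions made for illustration.

```python
import random


def grow_pa_network(n_nodes, seed=0):
    """Grow a network by degree-proportional (preferential) attachment.

    The network starts from two connected nodes, and each newcomer adds
    exactly one link. Returns the list of degrees indexed by node id.
    """
    rng = random.Random(seed)
    degree = [1, 1]  # two initial nodes joined by a single link
    for _ in range(2, n_nodes):
        # Choose an incumbent with probability proportional to its degree.
        target = rng.choices(range(len(degree)), weights=degree, k=1)[0]
        degree[target] += 1
        degree.append(1)  # the newcomer enters with one link
    return degree


degrees = grow_pa_network(1000)
print(max(degrees), sum(degrees) // 2)  # largest degree and total link count
```

Because every newcomer adds one link, a network grown from two nodes to n nodes in this sketch always ends with n − 1 links.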
The PA model however abstracts away from certain important aspects of real 
networks. The lack of a spatial dimension in particular is one critical omission to 
which I devote section 2.3 below. The PA model also ignores network churn, unlike 
random graph or small-world models that allow for the “rewiring” of links. Thus once 
an agent is connected to an incumbent upon entry, it ceases to seek new connections 
and only passively receives links from subsequent newcomers. Ignoring churn 
however disregards the body of empirical literature confirming that social actors 
constantly reevaluate network ties based on individual decisions and preferences 
(Karnstedt et al., 2010; Sasovova, Mehra, Borgatti, & Schippers, 2010). Furthermore, 
without tie formation and detachment among incumbents, the PA model gives near 
absolute agency to older nodes,2 which is simply not the case in many settings. The 
present study responds to the challenge of developing a spatial agent-based model to 
study the evolution of networks in a churning environment where relationships ebb 
and flow.  
                                                 
1 The degree of a node refers to the number of connections that the node has. 
2 This is simply because nodes that have been present for longer periods of time have more 
opportunities to form links with incoming nodes (see Adamic & Huberman, 2000 for a critique). 
 
2.2.2. Power laws 
The PA model was conceived to explain the World Wide Web in which a few 
influential websites have a very large number of links while the rest harbor only a few 
connections. This highly unequal distribution exhibits a power law, which refers to the 
linear relationship in the log-log plot of the degree distribution.3 Power law 
distributions are scale-free, a notion that is best understood vis-à-vis the scale-
dependent counterparts. Magnitudes such as the length of a town block, the height of a 
building and the number of bedrooms in a housing unit have a characteristic scale, 
which means the mean value is representative of the magnitude that one actually 
observes on the ground. By contrast, power-law distributed urban social networks lack 
characteristic scale, and this means that the average number of acquaintances is not a 
good predictor of the extent to which a city resident is connected. In stochastic terms, 
the degree distribution exhibits “fat tails,” which implies scale independence, that is, 
the scale-free property. 
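One common way to check for the log-log linearity described above is an ordinary least squares fit of log frequency on log degree. The Python sketch below is an illustration under that assumption; the text does not specify the exact fitting procedure, and the function name `fit_power_law` is hypothetical.

```python
import math
from collections import Counter


def fit_power_law(degrees):
    """Fit log10(count) = a - gamma * log10(degree) by least squares.

    Returns (gamma, r_squared). Nodes with degree zero are ignored because
    their logarithm is undefined.
    """
    counts = Counter(d for d in degrees if d > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        raise ValueError("all degrees are equally frequent; R-squared undefined")
    slope = sxy / sxx
    r_squared = sxy * sxy / (sxx * syy)
    return -slope, r_squared
```

A degree distribution that is exactly a power law gives an R-squared of one, while a bell-shaped distribution gives a poor linear fit in log-log space.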
Urban and regional scholars have long been familiar with regularities in the 
size distribution of cities (Zipf, 1949). More relevant for the study here is the presence 
of power law relationships in spatial networks, such as those for commuters (De 
Montis, Chessa, Campagna, Caschili, & Deplano, 2009). The essence of power law 
distributions is captured by the existence of a few nodes with very large degrees, 
acting as “hubs.” Fat-tailed distributions therefore imply wide variations in the extent 
of social contacts–and thus the resources (Granovetter, 2005; Lin, 2001)–that a node 
can tap into. While most have modest connections, a few exert enormous influence by 
virtue of their hundreds or even thousands of contacts. The implied distribution of 
resources is therefore highly skewed.  
                                                 
3 The degree distribution of a network describes the relative frequencies of nodes with different 
degrees (Jackson, 2008). 
Understanding the sources of inequality in space is one of the central 
challenges of the science of regions. But before we can begin addressing spatial 
inequality we must first disentangle the processes that give rise to differential degree 
distributions. Under what conditions do networks sustain fat-tailed distributions, and 
under what conditions does the power law break down? It is important to note that 
many networks are scale dependent. In the small-world networks that Watts and 
Strogatz (1998) consider for example, the degree distribution follows a random 
distribution, and the authors give numerous examples of such networks. Thus while 
the power law distribution is observed for specific networks, there are many networks 
that exhibit characteristic scale with small deviations from the average connectivity.  
The present study addresses this and other questions, including whether the 
power law can be sustained in networks where the arrival of new nodes and links 
occurs at the same time as the elimination of ties among existing nodes. The evidence 
drawn from the ABM simulations hints at the potential tug of war between tie 
formation and tie dissolution. The power law seems to persist when dissolution 
dominates formation, but vanishes in the opposite case.  
 
2.2.3. Homophily, propinquity, and social capital 
As elegant as the preferential attachment (PA) model may be, it leaves out a number 
of important variables. Social ties depend not only on connectivity but also on other 
social and economic indicators. In particular the model abstracts away from 
homophily, which refers to the tendency for people to associate with those sharing a 
wide range of similar attributes. Homophily is one of the most fundamental forces 
known to shape social networks (McPherson et al., 
2001). Numerous studies of social relationships indicate the dominance of homophily 
in social interactions, ranging from ties of marriage (Kalmijn, 1998) or friendship 
(Aral, Muchnik, & Sundararajan, 2009; Verbrugge, 1983), to membership in voluntary 
associations (Cornwell & Dokshin, 2014) or appearing with others in a public space 
(Mayhew, McPherson, Rotolo, & Smith-Lovin, 1995). The PA model is in essence a 
model of heterophily, whereby nodes prefer to connect to others that are as different as 
possible (in the number of links) to themselves. Within the social networks literature, 
whether it is homophilous or heterophilous ties that predominate has long been 
debated. Developments in theory and empirics suggest that both are instrumental in 
explaining social interactions, with one dominating the other under different 
circumstances (Burt, 2005; Mehra, Kilduff, & Brass, 1998). 
An important source of homophily is propinquity, as we are more likely to 
sustain relationships with others that are geographically closer to us rather than with 
others farther away (McPherson et al., 2001). Zipf (1949) states that the importance of 
propinquity stems from the notion of effort, for it takes more energy and effort to 
maintain relationships with distant contacts. Many examples, such as studies of 
neighborhoods (Campbell, 1990), residential proximity (Verbrugge, 1983), or 
immigrant enclaves (Wilson & Portes, 1980) show that more homophilous interactions 
take place when social actors are closer to each other.4 Locations thus not only 
determine physical factors, but also the common traits that forge neighborhoods. It is 
to account for homophily that I give each node a unique location in the network of 
heterophilous interactions. The introduction of space allows us to set the probability 
for tie creation to be inversely related to the relative distance between nodes. 
Spatial embeddedness changes the dynamics of tie formation as locations introduce an 
important new source of heterogeneity. In the preferential attachment model 
heterogeneity stems only from differences in connectivity. The introduction of space 
means that nodes with the same connectivity are not necessarily equally attractive, as 
it depends on where they are located in relation to the evaluating node. As I will show 
below, the introduction of space results in degree distributions that generally lack 
power law properties. While hubs with the highest number of connections are still the 
biggest draw, I do not expect hubs to be secluded in space as isolation would have 
prevented them from becoming a hub to begin with. 
Closely aligned with the notions of homophily and heterophily is the concept 
of social capital. It has been theorized that strong homophilous relationships promote 
higher levels of trust, reciprocity and enforcement of norms (Coleman, 1988). By 
contrast, weak heterophilous relationships matter when people need to “get ahead,” 
with individuals maintaining more such relationships gaining brokerage capacity or 
better access to non-redundant information (Burt, 1992, 2005; Granovetter, 1973). 
Social capital is thus a resource that one can tap into, but unlike physical or human 
capital, it is embedded in the network fabric that one is a part of (Portes, 1998). As I 
will show below, this conception allows us to examine the inequalities in social capital 
emerging from the differential propensities to form and dissolve ties.  
                                                 
4 Proximity and preferences come together in Schelling’s (1969) famous work where people sharing 
similar traits end up in the proximity of each other despite only a mild preference for neighbors who are 
like them. 
 
2.3. The model 
In the following presentation of the model I will use the terms agents and nodes 
interchangeably. I consider a network comprising a set of nodes 𝑁 = {1,2, … , 𝑛} and a 
set of undirected links 𝐿 ⊆ 𝑁 × 𝑁. Multiple links between two given nodes as well as 
self-links are assumed to be absent. The network is initially conceived with two nodes 
that are connected to each other at 𝑡 = 0. In each subsequent period a new node joins 
the network, and must choose which of the preexisting nodes it will connect to. As in 
the preferential attachment model, a preexisting node is chosen with probability that is 
proportional to its degree connectivity (Vega-Redondo, 2007) as well as its individual 
fitness. The probabilistic approach is used to represent sources of variability in 
network formation that are too complex to capture mechanistically. In real networks 
ties are not always formed based on connections because of either bounded rationality 
or factors other than connectivity. 
The novel attachment mechanism here stems from the preferential bias for 
preexisting nodes with higher fitness, captured through the fitness parameter 𝜂, which 
represents individual-specific characteristics such as social affinity, wealth, or human 
capital (Bianconi & Barabási, 2001). Formally, the probability Π𝑖,𝑗 that a new node i 
will connect to an incumbent node j depends on both the connectivity 𝑘𝑗 and the 
fitness parameter 𝜂𝑗 such that: 
\[ \Pi_{i,j} = \frac{\eta_j k_j}{\sum_j \eta_j k_j}. \tag{1} \]
Equation (1) has a straightforward interpretation. Other things equal, a higher 
connectivity raises the likelihood of being linked to an incoming node, but a lower 
fitness would lower this probability. A node searching for a new contact evaluates an 
existing node j’s fitness based on the combination of two factors, namely human 
capital 𝜆𝑗 and the distance between the two nodes 𝑑𝑖𝑗, such that: 
\[ \eta_j = \frac{\lambda_j}{d_{ij}^2}. \tag{2} \]
The level of human capital 𝜆𝑗 is drawn from a uniform random distribution with 
support [0, 1]. Note that equation (2) introduces space by assigning every agent a 
unique location in the network so that no two nodes can occupy the same area. 
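Equations (1) and (2) can be combined in a short Python sketch. This is only an illustration and not the NetLogo implementation; the function names and the dictionary layout of an incumbent (`pos`, `human_capital`, `degree`) are assumptions.

```python
def fitness(human_capital_j, pos_i, pos_j):
    """Equation (2): human capital of j divided by the squared distance to i."""
    d_squared = (pos_i[0] - pos_j[0]) ** 2 + (pos_i[1] - pos_j[1]) ** 2
    # d_squared > 0 because no two agents share a location.
    return human_capital_j / d_squared


def attachment_probabilities(newcomer_pos, incumbents):
    """Equation (1): normalized product of fitness and degree.

    Assumes at least one incumbent has positive human capital, so that the
    normalizing sum is not zero.
    """
    weights = [
        fitness(a["human_capital"], newcomer_pos, a["pos"]) * a["degree"]
        for a in incumbents
    ]
    total = sum(weights)
    return [w / total for w in weights]


incumbents = [
    {"pos": (1.0, 0.0), "human_capital": 0.5, "degree": 2},
    {"pos": (2.0, 0.0), "human_capital": 0.5, "degree": 2},
]
print(attachment_probabilities((0.0, 0.0), incumbents))
```

In this example the two incumbents are identical except for distance. Doubling the distance divides fitness by four, so the nearer incumbent receives four times the attachment probability of the farther one.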
 A central feature of the tie dynamics in the model is the formation and 
dissolution of ties among incumbent nodes. The trajectory is cumulatively driven by a 
combination of tie creation and tie deletion, leading to network churning (Koka, 
Madhavan, & Prescott, 2006). For tie formation I introduce the notion of maximum 
visibility reach that distinguishes extroverts from introverts. The former are capable of 
connecting to other agents anywhere within the spatial grid, while the latter are agents 
that are unable to connect to other agents outside of their visibility range (v). Formally, 
the probability Π𝐹𝑒,𝑗 for an extrovert e to connect to any other node j that is not yet a 
link neighbor is calculated as:  
\[ \Pi^F_{e,j} = \theta_f \frac{\eta_j k_j}{\sum_{j \notin S_e} \eta_j k_j}, \tag{3} \]
where 𝜃𝑓 is the tie formation parameter drawn from the interval [0, 1], and 𝑆𝑒 is the set 
of node e’s link neighbors. By contrast, the tie formation probability for introverts i is 
defined as: 
 
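\[ \Pi^F_{i,j} = \begin{cases} \theta_f \dfrac{\eta_j k_j}{\sum_{j \notin S_i,\, d_{ij} \le v} \eta_j k_j} & \text{when } d_{ij} \le v \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]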
Equation (4) assigns a non-zero probability for an introvert i to connect to agents 
within i's visibility range, 𝑑𝑖𝑗 ≤ 𝑣, or zero otherwise. 
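The two tie formation rules in equations (3) and (4) differ only in the set of candidates, which the following Python sketch makes explicit. It is a hedged illustration rather than the simulation code; the function name, the `visibility` argument (set to `None` for extroverts) and the agent dictionary layout are assumptions.

```python
import math


def formation_probabilities(i, agents, neighbors, theta_f, visibility=None):
    """Tie formation probabilities for agent i over its candidate partners.

    agents: dict id -> {"pos": (x, y), "human_capital": float, "degree": int}
    neighbors: set of ids already linked to i (the set S_i)
    visibility: None for an extrovert, or the range v for an introvert
    Agents outside the candidate set are omitted, i.e. have probability zero.
    """
    xi, yi = agents[i]["pos"]
    weights = {}
    for j, a in agents.items():
        if j == i or j in neighbors:
            continue
        d = math.hypot(a["pos"][0] - xi, a["pos"][1] - yi)
        if visibility is not None and d > visibility:
            continue  # introverts cannot reach beyond their visibility range
        weights[j] = (a["human_capital"] / d ** 2) * a["degree"]
    total = sum(weights.values())
    if total == 0:
        return {j: 0.0 for j in weights}
    return {j: theta_f * w / total for j, w in weights.items()}
```

By construction the probabilities over all candidates sum to 𝜃𝑓, so an introvert with a single candidate in range connects to it with probability 𝜃𝑓.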
For tie decay, the probability that any node i deletes a tie with a current link 
neighbor is determined by the number of links currently maintained by that agent (𝑘𝑖) 
and the distance between the two nodes (𝑑𝑖𝑗). This is based on the notion that 
maintaining ties requires social effort (Lin, 2001), and hence other things equal, nodes 
with a larger number of links lose connectivity at a higher rate compared to nodes with 
only a few links. At the same time, links between nodes farther away have a higher 
chance to dissolve. Formally, I introduce a tie decay parameter that controls how 
ephemeral ties are within the network. The equation that governs the probability for 
node i to dissolve its link to neighbor j is as follows: 
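\[ \Pi^D_{i,j} = \begin{cases} \theta_d k_i \dfrac{d_{ij}^2}{\sum_{j \in S_i} d_{ij}^2} & \text{when } \theta_d k_i d_{ij}^2 < \sum_{j \in S_i} d_{ij}^2 \\ 1 & \text{otherwise,} \end{cases} \tag{5} \]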
where 𝜃𝑑 is the tie dissolution parameter drawn from the interval [0, 1].
5  
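A small Python sketch of equation (5) shows how the cap at one operates. The function name and argument layout are hypothetical, and the code is an illustration rather than the implementation used for the simulations.

```python
def dissolution_probability(theta_d, k_i, d_ij, neighbor_distances):
    """Equation (5): probability that node i dissolves its tie to neighbor j.

    neighbor_distances holds the distances from i to all of its current
    link neighbors (including j). The result is capped at 1.
    """
    sum_squared = sum(d * d for d in neighbor_distances)
    raw = theta_d * k_i * d_ij ** 2 / sum_squared
    return min(1.0, raw)


# A node with two neighbors at distances 3 and 4 and theta_d = 0.5:
print(dissolution_probability(0.5, 2, 3.0, [3.0, 4.0]))  # nearer neighbor
print(dissolution_probability(0.5, 2, 4.0, [3.0, 4.0]))  # farther neighbor
```

Taking the minimum with one is equivalent to the piecewise definition, because the uncapped ratio is below one exactly when the condition in the first branch holds. In the example the farther tie is more likely to dissolve, and for a node with a single neighbor the probability reduces to 𝜃𝑑.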
 To capture individual network resources, I utilize the concept of social capital, 
defined as the instrumental resources that are available to actors through the social ties 
they maintain (Lin, 2001). Formally, I utilize a simple definition of an agent’s 
individual social capital 𝑆𝐶𝑖 as the sum of the human capital (Coleman, 1988; Moretti, 
2004) of an agent’s first-degree link neighbors such that: 
\[ SC_i = \sum_{j \in S_i} \lambda_j, \tag{6} \]
and naturally I define aggregate social capital in the network to be ∑𝑖 𝑆𝐶𝑖. It is 
important to note that social capital – both at the individual and aggregate levels – 
indirectly feeds back into the model in the form of influencing individual decisions of 
tie formation and dissolution. From the setup, the individual social capital measure 
depends on both the number of links (which increases the number of link neighbors 
𝑗 that are included in set 𝑆𝑖) as well as the quality of these links (the human capital that 
a link neighbor is endowed with). Agents with more links will, ceteris paribus, have 
more social capital, which feeds back to them by 1) increasing their chances of being 
linked by other agents, while at the same time 2) increasing their likelihood of losing 
links to others due to higher values of 𝑘𝑖, which increase Π𝐷𝑖,𝑗 (see equation (5)). On 
the other hand, agents with higher quality links, ceteris paribus, will be affected by an 
increased chance of losing links, for their link neighbors will usually harbor a larger 
number of connections which would increase the chance of tie dissolution. Aggregate 
social capital also feeds back to individual agents, with higher levels resulting in lower 
tie formation due to network saturation (i.e. fewer possible links to be formed) as well as 
higher tie dissolution (i.e. more possible links to be severed).   
                                                 
5 Π𝐷𝑖,𝑗 can become greater than 1 when $\theta_d k_i d_{ij}^2 > \sum_{j \in S_i} d_{ij}^2$, in which case the probability defaults to 1. 
Note that there are two mechanisms that act against each other to determine 
levels of social capital; namely 1) the mechanism of preferential attachment based on 
degree, human capital, and distance, and 2) the mechanism of preferential detachment 
based on degree and distance. I will assess how individual and aggregate levels of 
social capital – as well as inequality in social capital – evolve for varying values of the 
tie formation (𝜃𝑓) and tie decay (𝜃𝑑) parameters.  
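The social capital measure in equation (6) and its aggregate can be computed directly from an adjacency structure, as in the following Python sketch. The function name and the dictionary-based data layout are assumptions made for illustration.

```python
def social_capital(neighbors, human_capital):
    """Equation (6): sum of the human capital of each agent's link neighbors.

    neighbors: dict id -> set of neighbor ids (undirected, so symmetric)
    human_capital: dict id -> float in [0, 1]
    """
    return {
        i: sum(human_capital[j] for j in nbrs) for i, nbrs in neighbors.items()
    }


# A three-node star: agent 0 is linked to agents 1 and 2.
neighbors = {0: {1, 2}, 1: {0}, 2: {0}}
human_capital = {0: 0.9, 1: 0.2, 2: 0.5}
sc = social_capital(neighbors, human_capital)
aggregate = sum(sc.values())
print(sc, aggregate)
```

In this example the hub has the most links but not the most social capital, because its neighbors are poorly endowed, while each peripheral agent draws on the hub's high human capital. This illustrates how the measure depends on both the number and the quality of links.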
 
2.4. Algorithm implementation 
I use the standard protocol of “Overview, Design concepts, and Details” (ODD, see 
Railsback & Grimm, 2011) to describe the agent-based algorithm. ODD provides a 
standardized way of presenting the ABM starting with three elements which overview 
the model and how it is designed, followed by specific design concepts that illustrate 
the ABM’s key characteristics, and ending with three elements that describe the 
initialization and implementation details. I highlight the main purpose of the model, as 
well as the entities, variables, scale, and processes below (see Annex D for specific 
details of the model according to the ODD protocol). 
Purpose 
The model seeks to examine how churning affects network resources and system 
characteristics in a spatial setting. Specifically, I consider how the degree distribution 
of agents differs based on varying strengths of tie formation and dissolution, as well as 
how these factors influence individual and spatial inequalities.  
Entities, state variables, and scales 
The model has three types of entities: individual agents, their network connections, 
and square patches of residential locations. The network connections are assumed to 
be binary (i.e. either a link exists or it does not), non-redundant (i.e. no more than one 
link can exist between any two agents), undirected (i.e. the direction of the link is 
indistinguishable), and containing no self-loops (i.e. no links to oneself). Each patch is 
either empty or occupied by at most one agent, and together the patches make up a square 
grid landscape of 100 × 100 (L=100).6 Since opposite edges are disconnected, the 
landscape represents a two-dimensional Euclidean surface. This is to facilitate the 
analysis of spatial inequalities between agents as well as specific core-periphery 
structures that are better represented in two-dimensional space. The patches have no 
state variables other than their relative position within the grid, which dictates the 
distances to other patches, and thus the distances between agents. Agents seek to 
expand the extent of their connections in the endeavor to increase their social capital 
while being constrained by the effort required to maintain their ties. Each individual 
agent is defined by both static and dynamic state variables. The static state variables 
comprise the agent type (whether extrovert or introvert), level of human capital, the 
propensity to form ties among incumbents (the tie formation parameter) and to 
dissolve existing ties (the tie dissolution parameter), and spatial coordinates, which are 
all predetermined at agent birth. The dynamic state variables are the individual degree 
connectivity, relative fitness, and social capital. The global variables are the degree 
distribution, aggregate social capital, and measure of inequality (the Gini coefficient). 
                                                 
6 Sensitivity analysis is run for different world sizes. See Appendix C. 
 
Figure 2.1. ABM flow chart 
  
The model’s temporal scale is such that each simulation is run until the population of 
agents (N) reaches 1,000, which results in a maximum link count of 499,500. 
Process overview and scheduling 
There are three processes in the model: 1) the linking of a newcomer to an incumbent 
member of the network, and the 2) formation and 3) dissolution of ties among 
incumbents. Regarding the first process, in each period a new agent enters the system 
and is assigned a random location on the spatial landscape, and with equal 
probabilities is deemed either an introvert or extrovert. Introverts are assigned a 
visibility range parameter value (v) of 15.7 After each incumbent node is assigned a 
value of Π𝑖,𝑗 (see equation (1)), the newcomer first chooses an incumbent at random 
and creates a link with probability Π𝑖,𝑗. In the event that the newcomer fails to create a 
link, it would continue to draw randomly from the full set of incumbents until a 
connection is formed.8 Tie formation between incumbents and tie dissolution are 
implemented using an exhaustive search method where each incumbent evaluates the 
probabilities for tie formation (see equations (3) and (4)) and tie decay (see equation 
(5)) for all candidate agents (i.e. all agents that are not currently linked for tie 
formation, and all agents that are linked for tie decay) in every time step, forming and 
deleting links accordingly. The sequence is such that first a newcomer is added to the 
network and linked, followed by tie formation and afterwards tie deletion processes 
for the incumbents.9 One time period ends when all three processes have been 
completed for all agents. 
                                                 
7 As with world size, sensitivity analysis is run for differing visibility ranges. See Appendix C. 
8 In other words, the re-selection is done with replacement. 
9 Thus it is possible to delete a link with an incumbent for which an agent has created a link in the 
same time period. 
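The newcomer's linking step described above, a uniform draw followed by acceptance with probability Π𝑖,𝑗 and repeated with replacement until a link forms, can be sketched in Python as follows. The function name is hypothetical and the code is an illustration rather than the NetLogo procedure.

```python
import random


def link_newcomer(probabilities, rng):
    """Return the index of the incumbent the newcomer links to.

    probabilities: list of attachment probabilities, one per incumbent
    rng: a random.Random instance
    """
    if not any(p > 0 for p in probabilities):
        raise ValueError("at least one incumbent must have positive probability")
    while True:
        j = rng.randrange(len(probabilities))  # uniform draw, with replacement
        if rng.random() < probabilities[j]:    # accept with probability p_j
            return j


rng = random.Random(42)
draws = [link_newcomer([0.8, 0.2], rng) for _ in range(10000)]
print(draws.count(0) / len(draws))  # share of links going to incumbent 0
```

Repeating the draw until one is accepted is a form of rejection sampling, so the incumbent finally chosen is selected with probability proportional to its attachment probability.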
I conduct a parameter sweep on the tie formation parameter 𝜃𝑓 and the tie 
dissolution parameter 𝜃𝑑, each of which is allowed to vary between 0 and 1. The 
analyses employ the mean of five simulation runs for each parameter setting with a 
unique random number seed for every configuration.10 All simulations are 
implemented in NetLogo (Wilensky, 1999), a multi-agent programmable modeling 
environment popularly used worldwide. I examine the implications of different 
parameter settings for degree connectivity, network churning, and the presence or 
absence of power-law distributions. I also examine the distributional impact (measured 
by the Gini coefficient) of these settings, calculated by assessing the level of social 
capital obtained by the agents through their network ties. Finally, I consider the spatial 
distribution of agents with high or low levels of social capital to examine the role of 
network churn in a spatial context. 
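The Gini coefficient used as the inequality measure can be computed from the agents' social capital values. The Python sketch below uses the standard rank-based formula for a sorted sample; this particular formula and the function name are assumptions, as the text does not state which estimator is used.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values.

    Uses G = 2 * sum(rank * x) / (n * sum(x)) - (n + 1) / n, with the values
    sorted in ascending order and ranks starting at 1. Returns 0.0 for an
    empty list or when all values are zero.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))  # perfectly equal distribution
print(gini([0, 0, 0, 1]))  # one agent holds all social capital
```

With this formula a perfectly equal distribution yields zero, and a distribution in which a single agent holds everything yields (n − 1)/n.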
2.5. Simulation results 
2.5.1. Degree distributions and the power law 
In seeking to establish the conditions in which the power law prevails, I first verify the 
agent-based algorithm by confirming that the power law holds when 𝜃𝑓 = 𝜃𝑑 = 0, i.e., 
for the original preferential model of Barabási and Albert (1999) in which incumbents 
are not allowed to form or dissolve ties with one another. Figure 2.2a shows that 
indeed the power law prevails in the long-run distribution of degree connectivity with 
this parameter setting, with a coefficient of determination R-squared of 0.9 and a 
power exponent of 1.9 which, as it turns out, roughly coincides with the lower bound 
of the range for many real networks (Barabási & Albert, 1999). 
                                                 
10 I choose five simulations as the optimal level that balances computational burden with potential 
variability, as the results suggest that the overall variability of network characteristics is minimal 
across different random seeds. This is due to the fact that each agent is subject to a stochastic process 
of initial linkage, link formation and decay with every other agent in the network at every time period. 
The enormous amount of stochasticity involved with constructing a final network of 1,000 agents thus 
renders the resulting network as a whole robust to large variations in aggregate characteristics. 
 
Figure 2.2a-h. Degree distributions for select parameter settings. The X-axis corresponds 
to the degree of nodes, while the Y-axis to the number of nodes with such degrees, in log–log scale. 
 
The scale-free property however vanishes as soon as a small but positive 
propensity to form new ties is introduced among incumbent agents while holding 𝜃𝑑 =
0. Figures 2.2b-c reveal that for networks where links never decay, a slight increase in 
𝜃𝑓 results in the complete breakdown of the power law. The shifting of the points on 
the plot to the right as 𝜃𝑓 increases shows how rising propensity to form new ties 
among incumbents benefits the relatively disadvantaged (in the number of links) by 
awarding them with more connections.  
The power law returns when tie formation and decay are both present, but only 
when they are either roughly equal in strength or when the rate of tie dissolution is 
greater than that for tie formation. Figure 2.2d shows the resurgence of the scale-free 
property when 𝜃𝑓 = 0.4 and 𝜃𝑑 = 0.5, accompanied by a highly unequal distribution 
of social capital as evident from the Gini coefficient that is at least twice that for cases 
where churning is absent (i.e., when 𝜃𝑓 = 𝜃𝑑 = 0). This suggests that power law 
distributions that seem ostensibly similar could have vastly different implications for 
social equity. Churning characterizes most real-world networks, and the model 
predicts a much higher concentration of social capital at the top than the canonical PA 
model when both tie formation and decay are present. However, the power law breaks 
down again when incumbents form ties at a rate that far exceeds the rate at which ties 
are dissolved. Figures 2.2e-h show that the degree distribution approaches a bell-shaped 
curve as the ratio 𝜃𝑓/𝜃𝑑 increases. At the same time social capital becomes more 
evenly distributed as the Gini coefficient declines.  
The results also reveal that the power law prevails only in networks with very 
low density. 11 Under the original PA model, a network grown starting from two 
agents up to 1,000 agents in increments of 1 inevitably results in a link count of 999, 
which corresponds to a network density of 0.2%. The power-law network depicted in 
Figure 2.2d has a link count of 236 and a much lower density of 0.047%. As I have 
shown, when tie formation dominates tie dissolution – resulting in a larger number of 
links and higher density – the power law breaks down.  
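The density figures quoted above follow from dividing the link count by the number of possible links n(n − 1)/2. A brief Python check, with hypothetical function names, reproduces them:

```python
def max_links(n):
    """Number of possible undirected links among n nodes: n(n - 1) / 2."""
    return n * (n - 1) // 2


def network_density(links, n):
    """Share of possible undirected links that are actually present."""
    return links / max_links(n)


print(max_links(1000))                    # possible links among 1,000 agents
print(100 * network_density(999, 1000))   # original PA model, in percent
print(100 * network_density(236, 1000))   # network with churn, in percent
```

With 1,000 agents there are 499,500 possible links, so 999 links correspond to a density of about 0.2% and 236 links to about 0.047%.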
 Figure 2.3 depicts the fit of the power law curves under different combinations 
of the churn parameters 𝜃𝑓 (tie formation) and 𝜃𝑑 (tie dissolution). The results suggest 
that in general, higher values of 𝜃𝑑 relative to 𝜃𝑓 result in distributions that more 
closely follow power laws. The upper left portion of the figure is dominated by 
distributions with R-squared greater than 0.8. There is however substantial variation in 
the upper left extremes due to the sparseness of the network in these extremes as well 
as due to the stochastic evolution of the degree distributions. The figure confirms the 
results from Figure 2.2, with the power law fit decreasing substantially as tie 
formation becomes more prominent. In addition, it can be seen that introducing tie 
formation has a much larger effect in breaking the power law for lower values of 𝜃𝑑. 
For example, when 𝜃𝑑 = 0.6, an increase of 𝜃𝑓 from 0 to 0.5 still results in a degree 
distribution with R-squared of roughly 0.9. However, when 𝜃𝑑 = 0.1, similar R-squared 
values can be obtained only when 𝜃𝑓 is less than 0.1.  
                                                 
11 Network density is simply the number of links K divided by the number of possible links K̂, where 
K̂ = 𝑛(𝑛 − 1)/2 and n is the number of nodes in an undirected network. 
 
[Figure 2.3 axes: horizontal axis 𝜃𝑓 (tie formation) and vertical axis 𝜃𝑑 (tie dissolution), each from 0.0 to 1.0; legend levels R-squared = 0.2, 0.4, 0.6, 0.8, 1.0.]
Figure 2.3. Relationship between power law fit (R-squared) and network churn 
parameters. Darker colors represent higher R-squared. Values are the mean over 
five simulation runs each with a different random seed for every parameter 
configuration. 
 
 The ability of ABMs to capture system dynamics is advantageous in studying 
the evolution of networks. Under two different parameter configurations that both 
result in power law distributions, I show how the same system properties may emerge 
albeit with different micro-foundations. Figure 2.4 depicts network formation for 
when 1) 𝜃𝑓 = 𝜃𝑑 = 0 (Figures 2.4a-d), and when 2) 𝜃𝑓 = 0.4, 𝜃𝑑 = 0.5 (Figures 2.4e-
h), both of which exhibit power law properties (see Figure 2.2). In the absence of 
churning the evolution of degree connectivity is path dependent, much like in the 
 
[Figure 2.4 image: network diagrams for the following panels.]
4.a. 𝜃𝑓 = 𝜃𝑑 = 0, N = 50, K = 49          4.e. 𝜃𝑓 = 0.4, 𝜃𝑑 = 0.5, N = 50, K = 7
4.b. 𝜃𝑓 = 𝜃𝑑 = 0, N = 200, K = 199        4.f. 𝜃𝑓 = 0.4, 𝜃𝑑 = 0.5, N = 200, K = 46
4.c. 𝜃𝑓 = 𝜃𝑑 = 0, N = 500, K = 499        4.g. 𝜃𝑓 = 0.4, 𝜃𝑑 = 0.5, N = 500, K = 106
4.d. 𝜃𝑓 = 𝜃𝑑 = 0, N = 1000, K = 999       4.h. 𝜃𝑓 = 0.4, 𝜃𝑑 = 0.5, N = 1000, K = 237
 
Figure 2.4. Network formation dynamics under two different parameter configurations. 
4a-d is where 𝜽𝒇 = 𝜽𝒅 = 𝟎 while 4e-h is where 𝜽𝒇 = 𝟎. 𝟒 and 𝜽𝒅 = 𝟎. 𝟓. Black and white 
nodes are extroverts and introverts, respectively. Nodes are sized proportionately to 
their degree. 
 
canonical PA model that awards higher degrees to incumbents that have been in the 
network longer. Thus the nodes in Figure 2.4a that command relatively higher 
connectivity in the early stages are able to maintain their advantage throughout. 
However, when network churn is present, those that initially had greater connectivity 
(Figure 2.4e) quickly lose their advantage, being replaced by other nodes that were able 
to get ahead at different times. Thus while overall the two systems both converge to 
power law distributions, the trajectories are vastly different. This suggests that the 
original PA model is a special case within the general class of models for which the 
power law prevails. Furthermore, the results reveal that distinct bottom-up dynamics 
likely lead to differential access to social capital (see Figure 2.4) even for systems that 
exhibit similar macro-properties. 
 
2.5.2. Aggregate social capital 
Figure 2.5 and Appendix A display long-run levels of aggregate social capital and 
total link count K for different values of 𝜃𝑓 and 𝜃𝑑. As Figure 2.5 shows, aggregate 
social capital falls monotonically with higher propensity (𝜃𝑑) for incumbents to 
dissolve ties while holding tie formation (𝜃𝑓) constant. Such results are intuitively 
appealing, in that we should expect to see lower levels of trust and therefore social 
capital in a fluid environment where ties are easily broken. Furthermore, the marginal 
impact of higher values of 𝜃𝑑 tends to diminish for larger values of 𝜃𝑓 (see also 
Appendix A, table (a)). For example, an increase in 𝜃𝑑 from 0.5 to 0.6 while holding 
𝜃𝑓 fixed at 0.5 results in a 30 percent decline in aggregate social capital, but the same 
increase in 𝜃𝑑 while holding 𝜃𝑓 fixed at 1.0 cuts the percentage decline to 20 
 
[Figure 2.5 image: aggregate social capital (logarithm, vertical axis, 0 to 6) against 𝜃𝑓 (tie formation, horizontal axis, 0.0 to 1.0), with one line for each 𝜃𝑑 = 0, 0.1, ..., 1.]
Figure 2.5. Tie-formation (𝜽𝒇), decay (𝜽𝒅), and aggregate social capital. Values 
are the mean over five simulation runs each with a different random seed for 
every parameter configuration. 
 
percent. This suggests that, other things equal, greater inclination to form new ties 
increases the resiliency of social capital to forces that dissolve existing ties. 
Conversely, Figure 2.5 also shows that a higher inclination (𝜃𝑓) to connect to 
other incumbents is the rising tide that lifts aggregate social capital across different 
values of 𝜃𝑑. This can be traced to aggregate social capital being proportional to the 
total number of links in the network, which increases as 𝜃𝑓 rises. Less obvious is the 
finding that the marginal effect of tie formation diminishes at higher 𝜃𝑓 values. For 
example, a rise in 𝜃𝑓 from 0.4 to 0.5 holding 𝜃𝑑 constant at 0.5 brings about a 35 
percent increase in total social capital (see Appendix A), but a rise in 𝜃𝑓 from 0.9 to 
1.0 again at 𝜃𝑑 = 0.5 results in a much lower increase of 14 percent. In hindsight this is 
internally consistent with the logic of the model. Everybody is connected to almost 
everybody else when 𝜃𝑓 is already near maximum, and so a further increase in 𝜃𝑓 will 
only have a limited effect on total social capital. 
Unexpectedly however, the marginal impact of tie formation accelerates for 
larger 𝜃𝑑’s. Using part of the above example, an increase in 𝜃𝑓 from 0.4 to 0.5 at 𝜃𝑑 = 
0.5 brings about a 35 percent increase in total social capital, but the same increase at 
𝜃𝑑 = 1.0 triples the increase to over 107 percent. It appears that while rapid decay 
(high 𝜃𝑑) foments a sparse network, it is precisely this limited connectivity that allows 
a higher rate of tie formation to exert greater influence on aggregate social capital. 
This can be explained by the higher chances to connect to agents endowed with higher 
human capital when overall connectivity is low. 
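The percentage changes quoted above can be checked against Appendix A, table (a). The sketch below is only illustrative: the helper `percent_change` and the dictionary layout are assumptions, while the values are copied from the appendix.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Aggregate social capital from Appendix A, table (a), keyed by (theta_f, theta_d).
social_capital = {
    (0.4, 0.5): 288.0, (0.5, 0.5): 390.3,
    (0.9, 0.5): 808.6, (1.0, 0.5): 921.6,
    (0.4, 1.0): 26.8, (0.5, 1.0): 55.6,
}

# Raising theta_f from 0.4 to 0.5 at theta_d = 0.5 (about 35 percent).
gain_mid = percent_change(social_capital[(0.4, 0.5)], social_capital[(0.5, 0.5)])
# Raising theta_f from 0.9 to 1.0 at theta_d = 0.5 (about 14 percent).
gain_high_f = percent_change(social_capital[(0.9, 0.5)], social_capital[(1.0, 0.5)])
# Raising theta_f from 0.4 to 0.5 at theta_d = 1.0 (about 107 percent).
gain_high_d = percent_change(social_capital[(0.4, 1.0)], social_capital[(0.5, 1.0)])
```

The three results illustrate the two patterns in the text: the marginal effect of tie formation shrinks as 𝜃𝑓 approaches its maximum, and it grows when ties decay quickly.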
An important emergent outcome is the non-linear relationship between 
network churn and aggregate social capital. Figure 2.5 reveals a drastic decrease in 
social capital by a factor of 58 and a decrease in the number of connections by a factor 
of 62 (see also Appendix A) when 𝜃𝑑 is raised from 0.0 to 0.1. This is in contrast to 
changes at higher 𝜃𝑑 values where on average the decrease is less than two-fold for 
every 0.1 increment in 𝜃𝑑. These results suggest a phase transition – defined as a 
significant change of state when parameter values cross a certain threshold (Solé, 
Manrubia, Luque, Delgado, & Bascompte, 1996) – occurring in the vicinity of 𝜃𝑑 = 0, 
where the qualitative behavior of the system undergoes a significant alteration. 
Considered a signature of complex networks (Castellano, Marsili, & Vespignani, 
2000; Holme & Newman, 2006), in this case the phase transition is the change from a 
very sparse network with an average network density of 0.12% for non-zero values of 
𝜃𝑑 to a relatively dense network with an average density of 29.4% when 𝜃𝑑 = 0. 
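The density figures follow directly from the definition of network density (links divided by possible links in an undirected network). A minimal Python sketch, using the 𝜃𝑑 = 0 link counts from Appendix A, table (b), is given below; the function name `network_density` is an assumption introduced for illustration.

```python
def network_density(links, nodes):
    """Links divided by the n(n - 1)/2 possible links of an undirected network."""
    possible_links = nodes * (nodes - 1) / 2
    return links / possible_links


# Long-run link counts K for theta_d = 0 and theta_f = 0, 0.1, ..., 1 (Appendix A, table (b)).
links_theta_d_zero = [999, 45244, 73048, 99041, 122936, 150770,
                      173607, 202849, 226062, 250995, 270693]

mean_links = sum(links_theta_d_zero) / len(links_theta_d_zero)
density_theta_d_zero = network_density(mean_links, nodes=1000)  # roughly 0.294
```

With N = 1,000 there are 499,500 possible links, so the mean link count for permanent ties corresponds to a density of about 29.4%.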
 
The simulation results thus highlight the important role of tie dissolution when 
analyzing different types of networks. The findings suggest that networks in which ties 
are permanent are distinct from those in which ties are transient. Networks with 
permanent ties have significantly greater connectivity as the high density suggests, 
which results in higher levels of aggregate social capital. The introduction of even a 
small likelihood for ties to dissolve however shifts the system into one in which 
connectivity is much lower, and this transition occurs suddenly rather than gradually. 
The importance of such a distinction has been empirically observed. For example, 
Wilson and Portes (1980) document the experiences of immigrant minorities 
integrating into the US labor market, with findings suggesting that immigrant groups 
in enclaves characterized by more permanent ethnic and cultural ties frequently 
perform better economically than other minorities that are more fragmented spatially 
(and thus part of more transient social networks). The phase transition in this model 
appears in line with such evidence. 
 
2.5.3. Inequality 
Figure 2.6 and Appendix B show the implications of network churn for the 
distribution of social capital measured by the Gini coefficient. Figure 2.6a reveals that 
tie dissolution and the Gini coefficient are in general positively correlated, suggesting 
that a higher rate of tie decay amplifies inequality. This is an unexpected, emergent 
outcome because ties dissolve more rapidly for highly-connected agents (see equation 
(5)). Since agents with higher human capital also maintain more links, tie decay is 
expected to equalize the distribution of social capital. But while indeed a higher rate of 
 
[Figure 2.6a image: Gini coefficient (vertical axis, 0.0 to 1.0) against 𝜃𝑓 (tie formation, horizontal axis, 0.0 to 1.0), with one line for each 𝜃𝑑 = 0, 0.1, ..., 1.]
[Figure 2.6b image: Gini coefficient (vertical axis, 0.0 to 1.0) in bar panels for 𝜃𝑑 = 0, 0.00125, 0.0025, 0.005, 0.01, 0.015, and 0.02, with one bar for each 𝜃𝑓 = 0, 0.1, ..., 1.]
 
Figure 2.6a-b. Relationships between tie-formation (𝜽𝒇), decay (𝜽𝒅) and the Gini 
coefficient. For 2.6b, bars for each 𝜽𝒅 panel are ordered from the left in 
increasing levels of 𝜽𝒇. Values are the mean over five simulation runs each with a 
different random seed for every parameter configuration. 
 
tie decay causes high human capital agents to lose links faster, the preferential 
attachment (PA) mechanism ensures that the majority of these links are to lower 
human capital agents. As it turns out, agents with lower human capital lose ground in 
relative terms, and inequality rises as a result.  
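For readers who want to reproduce the inequality measure, the Gini coefficient of agents' social capital can be sketched as follows. This is a standard rank-based formula for non-negative values; the section does not say which estimator was used in the simulations, so the function `gini` is an assumption.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    x = sorted(float(v) for v in values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    # Rank-weighted sum over the values sorted in ascending order.
    weighted = sum(rank * value for rank, value in enumerate(x, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


equal_case = gini([1, 1, 1, 1])      # every agent holds the same social capital
unequal_case = gini([0, 0, 0, 1])    # one agent holds all social capital
```

With this formula a fully equal distribution gives 0, and a distribution in which one of n agents holds everything gives (n − 1)/n.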
 On the other hand, a higher rate of tie formation holding 𝜃𝑑 constant generally 
reduces inequality, unless ties are relatively permanent. Here it is important to 
recognize the two opposing forces at work. The first is the mechanism of preferential 
attachment (PA) initiated by the agent itself (or active PA). For an agent endowed with 
relatively low human capital, a new tie to a high human capital agent increases the 
former’s social capital more than the latter’s, and active PA thus tends to reduce 
inequality. The second is being preferentially attached to by other agents within 
the network (passive PA). Passive PA favors the well-endowed, and thus tends 
to accentuate the gap between the highly connected and the relatively isolated. It turns 
out that which one dominates depends on the magnitude of the decay parameter 𝜃𝑑. 
When ties are permanent (𝜃𝑑 = 0), the effect of increasing 𝜃𝑓 follows an inverted U 
pattern where the Gini increases at first and then declines after a certain threshold is 
surpassed. Thus passive PA is the dominant force initially for lower values of 𝜃𝑓. This 
is followed by reduced inequality as the network nears saturation (i.e. full 
connectivity) when 𝜃𝑓 is comparatively large and active PA dominates. When some 
ties are transient however, equalization through the active mechanism begins to take 
full effect and higher 𝜃𝑓’s lower inequality across the board.  
The simulation results highlight the phase transition near 𝜃𝑑 = 0, marking the 
changing nature of the impact of incumbent tie formation. To shed light on the 
transition, I run simulations with very small values of 𝜃𝑑 in the neighborhood of 𝜃𝑑 = 0. 
Figure 2.6b shows that for sufficiently low, non-zero rates of tie decay within the 
range 0 < 𝜃𝑑 ≤ 0.01 (panels 2 to 5), the relationship between 𝜃𝑓 and inequality 
roughly follows a U-shaped curve. In this regime, an increase in 𝜃𝑓 decreases 
inequality to a certain minimum, after which a further increase in 𝜃𝑓 does the opposite 
and actually increases inequality. As 𝜃𝑑 approaches 0, this pattern gradually reverts to 
the shape that characterizes that for 𝜃𝑑 = 0. The relationship between tie formation 
and inequality is thus highly nonlinear depending on the tie decay parameter. The 
implication is that when ties are relatively permanent, any policy aimed at promoting 
connectivity could inadvertently privilege the connected even more. 
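The U-shaped pattern can be read off the Gini values in Appendix B. The sketch below copies three columns of that table and locates the 𝜃𝑓 with the lowest Gini in each; the helper names are hypothetical, and the check is only a rough illustration of the shape, not a statistical test.

```python
# theta_f grid used in Appendix B: 0, 0.1, ..., 1.
THETA_F = [i / 10 for i in range(11)]

# Gini coefficients by theta_d (columns of Appendix B), ordered by theta_f.
GINI_BY_THETA_D = {
    0.005: [0.90, 0.25, 0.19, 0.15, 0.13, 0.14, 0.16, 0.18, 0.20, 0.22, 0.23],
    0.01:  [0.95, 0.28, 0.22, 0.20, 0.17, 0.15, 0.13, 0.12, 0.12, 0.12, 0.13],
    0.1:   [0.99, 0.68, 0.52, 0.44, 0.39, 0.35, 0.31, 0.29, 0.28, 0.25, 0.25],
}


def minimizing_theta_f(ginis):
    """theta_f at which the Gini is lowest (first such value if there are ties)."""
    best_index = min(range(len(ginis)), key=lambda i: ginis[i])
    return THETA_F[best_index]


def rises_after_minimum(ginis):
    """True if the Gini at the largest theta_f exceeds the minimum (U-shaped pattern)."""
    return ginis[-1] > min(ginis)


u_shaped = {theta_d: rises_after_minimum(g) for theta_d, g in GINI_BY_THETA_D.items()}
```

For 𝜃𝑑 = 0.005 the Gini bottoms out at 𝜃𝑓 = 0.4 and then rises again, whereas for 𝜃𝑑 = 0.1 it never rises above its minimum as 𝜃𝑓 increases.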
In addition to overall inequality captured by the Gini coefficient, I also 
examine the gap between higher and lower human capital agents to shed light on how 
differential access to human capital affects the trajectories for social capital.12 Figure 
2.7 depicts the evolution of agent inequality based on human capital levels for four 
different parameter settings, each embodying distinct trajectories. Here, the difference 
in social capital between high (75th percentile or above) and low (25th percentile or 
below) human capital agents is measured relative to the latter.13 Comparing the case 
for 𝜃𝑓 = 𝜃𝑑 = 0 and that for 𝜃𝑓 = 0.4, 𝜃𝑑 = 0.5, it can be seen that the two settings 
converge to similar values of inequality in the long run, yet the trajectory for the  
                                                 
12 Dynamic analyses of the Gini coefficient (not included here) suggest that the Gini is relatively stable 
for all parameter settings across time, after the network has evolved to include a sufficient number of 
agents for reliable Gini calculation. 
13 Specifically, inequality is measured at each time point as the difference in average social capital 
between high (75th percentile and above) and low (25th percentile and below) human capital agents, 
divided by the average social capital of the low human capital agents. 
 
[Figure 2.7 image: social capital inequality {(high H.C. - low H.C.) / low H.C.} (vertical axis, 0.0 to 3.0) against time in ticks (horizontal axis, 50 to 1000), with one line for each of: 𝜃𝑓 = 0.4, 𝜃𝑑 = 0.5; 𝜃𝑓 = 𝜃𝑑 = 0; 𝜃𝑓 = 0.8, 𝜃𝑑 = 0.01; 𝜃𝑓 = 0.4, 𝜃𝑑 = 0.]
Figure 2.7. Differences in social capital between high (75th percentile and above) 
and low (25th percentile and below) human capital agents across time, calculated 
by dividing the raw difference in average social capital between high and low 
human capital agents by the average social capital of low human capital agents. 
The figure omits values for which 𝒕 < 𝟓𝟎 due to excessive volatility in values for a 
small set of agents. Values are the mean over five simulation runs each with a 
different random seed for every parameter configuration. 
 
network with churning is much more volatile. Both configurations yield scale-free 
degree distributions, but the results reveal that power laws also produce a highly 
skewed distribution of social capital with the higher human capital agents 
commanding levels of social capital twice that of the lower human capital agents. The 
oscillations for the churning case (𝜃𝑓 = 0.4, 𝜃𝑑 = 0.5) are due to the constant 
rewiring of links, and it is churning that allows lower human capital agents to improve 
their social standing vis-à-vis the more privileged ones. 
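The gap measure described in footnote 13 can be sketched in Python as follows. The function name `relative_gap`, the use of inclusive quartiles, and the toy data are assumptions made for illustration; the text does not specify how percentile cut-offs were computed.

```python
from statistics import mean, quantiles


def relative_gap(human_capital, social_capital):
    """(mean SC of high-HC agents - mean SC of low-HC agents) / mean SC of low-HC agents."""
    # Quartile cut points of human capital (25th, 50th, and 75th percentiles).
    q1, _, q3 = quantiles(human_capital, n=4, method="inclusive")
    high = [sc for hc, sc in zip(human_capital, social_capital) if hc >= q3]
    low = [sc for hc, sc in zip(human_capital, social_capital) if hc <= q1]
    low_mean = mean(low)
    if low_mean == 0:
        raise ValueError("low human capital agents hold no social capital")
    return (mean(high) - low_mean) / low_mean


# Hypothetical agents: human capital 1..8 with social capital rising in human capital.
toy_human_capital = [1, 2, 3, 4, 5, 6, 7, 8]
toy_social_capital = [1, 1, 2, 2, 2, 2, 3, 3]
toy_gap = relative_gap(toy_human_capital, toy_social_capital)  # 2.0
```

A value of 2 means that high human capital agents hold on average three times the social capital of low human capital agents, because the measure is the difference relative to the latter.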
 
2.5.4. Aggregate social capital and agent inequality 
The non-monotonic relationship shown in Figure 2.8 between distribution (measured 
by the Gini) and overall connectivity (measured by total link count) merits an 
explanation.14 As discussed in Section 2.5.2, what matters for connectivity is the ratio 
𝜃𝑓/𝜃𝑑 since aggregate social capital is strictly increasing in 𝜃𝑓 while strictly 
decreasing in 𝜃𝑑. The trajectory for inequality on the other hand is dependent on both  
[Figure 2.8 image: Gini coefficient (vertical axis, 0.0 to 1.0) against Log (links) (horizontal axis, 0.0 to 5.5), with points grouped by 𝜃𝑑 and the plot divided into Phase 1, Phase 2, and Phase 3.]
Figure 2.8. Relationship between link count and the Gini coefficient. Values are 
the mean over five simulation runs each with a different random seed for every 
parameter configuration. 
                                                 
14 I use link count instead of aggregate social capital to make the results more intuitive, recognizing that 
the two are qualitatively similar in their behavior across parameter values (see Appendix A). 
 
the magnitude of 𝜃𝑑 and its relative strength. Hence starting from a sparse network, an 
increase in the ratio 𝜃𝑓/𝜃𝑑 generally accomplishes both higher connectivity and a 
more equal distribution. But if the ratio is increased further by lowering the tie decay rate 
𝜃𝑑 from a level already close to zero, then a tradeoff ensues where stronger 
connectivity accompanies greater concentration of social capital. However, a reversal 
is observed at sufficiently high levels of connectivity with the return of the inverse 
relationship. 
 More specifically, there are two transitions occurring at total link count 𝑘 ≈
35,000 and at 𝑘 ≈ 126,000, respectively, resulting in three distinct phases. To 
explain these results recall that an agent acquires social capital through two opposing  
mechanisms, namely active and passive PA. Which one dominates here hinges on the 
level of network activity and network density, which in turn depend on the 
combination of 𝜃𝑓 and 𝜃𝑑. Phase I represents sparse networks with low network 
densities between 0 and 7% where the fall in inequality is driven by three factors. 
First, low human capital agents maintain a very small number of – if any – links. An 
increase in 𝜃𝑓 therefore renders their connection gains through active PA 
comparatively larger. Second, the gains for higher human capital agents being 
passively linked to are not as large as the gains for lower human capital agents actively 
initiating connections. Finally, even in sparse networks higher human capital agents 
command more links and thus have a higher probability for their ties to decay as long 
as 𝜃𝑑 > 0. 
Phase II includes networks with moderate density between 7 and 25%. Here 
inequality increases with aggregate social capital because in moderately dense 
networks, the gains for lower human capital agents are not as large as in sparse 
networks, since a significant number of high-benefit links have already been exploited. 
This is in contrast to higher human capital agents who benefit from both active and 
passive attachment mechanisms. Finally, Phase III represents very dense networks 
nearing link saturation for which inequality falls once again as connectivity increases 
further. This is due to most incumbents in this regime already having links with most 
others, and thus a new tie simply becomes an equalizing vehicle that closes the 
gap between lower and higher human capital agents.  
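The three phases can be summarized as a simple classification by link count. The sketch below uses the approximate transition points given in the text (about 35,000 and 126,000 links for N = 1,000); the function `inequality_phase` and the treatment of the thresholds as sharp cut-offs are assumptions for illustration.

```python
def inequality_phase(links, nodes=1000, first_transition=35_000, second_transition=126_000):
    """Return (phase label, network density) for a given total link count."""
    density = links / (nodes * (nodes - 1) / 2)
    if links < first_transition:
        phase = "I"    # sparse: inequality falls as connectivity rises
    elif links < second_transition:
        phase = "II"   # moderately dense: inequality rises with connectivity
    else:
        phase = "III"  # near saturation: inequality falls again
    return phase, density


# Link counts taken from the Figure 2.9 panel labels.
phase_sparse, _ = inequality_phase(4234)
phase_moderate, _ = inequality_phase(101317)
phase_dense, _ = inequality_phase(270756)
_, density_at_first_transition = inequality_phase(35_000)  # about 0.07
```

The two thresholds correspond to densities of roughly 7% and 25%, which is where the text places the boundaries between the phases.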
 
2.5.5. Spatial inequalities 
Space matters in the model as agents’ relative positions within the spatial landscape 
influence the level of social capital that they are able to maintain. I turn now to the 
locational patterns of agents commanding higher than average social capital, and 
compare these with those for the rest. Since the amount of aggregate social capital 
differs across different parameter settings, for comparison purposes I calculate the 
relative amount of social capital $\overline{\overline{SC}}_i$ an agent has as: 
 
$$\overline{\overline{SC}}_i = \frac{SC_i / \sum_i SC_i}{1/N},$$
where N is the total number of agents within the network. Thus an agent with a value 
of $\overline{\overline{SC}}_i$ greater than 1 maintains more social capital than it would if aggregate 
social capital were uniformly distributed among all agents. I then divide agents into two 
categories, based on whether they have values of $\overline{\overline{SC}}_i$ greater than 1, or less than or 
equal to 1. 
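The relative measure amounts to dividing each agent's share of aggregate social capital by the uniform share 1/N. A minimal Python sketch, with a hypothetical function name and toy values, is:

```python
def relative_social_capital(social_capital):
    """Each agent's share of aggregate social capital divided by the uniform share 1/N."""
    total = sum(social_capital)
    n = len(social_capital)
    return [(sc / total) / (1.0 / n) for sc in social_capital]


# Hypothetical social capital of four agents.
toy_relative = relative_social_capital([1, 1, 2, 4])
# Agents above 1 hold more than they would under a uniform distribution.
toy_high = [value > 1 for value in toy_relative]
```

Because the measure is each agent's social capital divided by the mean, it is comparable across parameter settings with very different aggregate levels.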
 Figure 2.9 shows the long-run spatial distribution of agents with $\overline{\overline{SC}}_i > 1$ for 
select parameter settings, where the nodes are color-coded based on whether they are 
extroverts or introverts.15 Recall that extroverts are unencumbered in connecting with 
others while introverts are constrained by their spatial reach.16 The original PA model 
with 𝜃𝑓 = 𝜃𝑑 = 0 is shown in Figure 2.9a, where high social capital agents are 
randomly distributed, with a significant number of agents with $\overline{\overline{SC}}_i$ greater than 1 on the outer 
edges of the grid. By contrast, Figures 2.9b-d indicate that a higher number of links 
(K) accompanies the clustering of the high social capital agents towards the center, 
which in turn implies the lower social capital agents primarily occupying the fringe. 
Figures 2.9e-f show that high social capital agents again disperse spatially for even 
larger values of K. What is striking is that the spatial distribution of high social capital 
agents closest to the center (Figure 2.9d) occurs precisely at the parameter setting for 
which the Gini coefficient is the lowest (see Figure 2.6b). This implies that, overall, 
spatial inequality and social capital inequality are inversely related. Other parameter 
settings with higher Gini coefficients indeed produce a lesser concentration towards 
the center.  
 The results can be explained when I take into consideration the distinct 
trajectories of link formation as K increases. For sparse networks, the paucity of links 
lowers the probability of linking to agents further away, since the model prioritizes 
agents that are close by. This results in agents occupying locally central positions 
                                                 
15 While not shown, due to the random spatial distribution of all agents the blank areas in Figure 2.9 are 
occupied by low social capital agents. 
16 Analysis of the spatial distribution of agents based on human capital levels (not shown here) suggests 
that the spatial distribution of social capital is independent from that of human capital, with no clear 
emergent pattern. 
 
[Figure 2.9 image: spatial maps for the following panels.]
9.a 𝜃𝑓 = 𝜃𝑑 = 0, Gini = 0.412, K = 999                9.b 𝜃𝑓 = 0.9, 𝜃𝑑 = 0.1, Gini = 0.259, K = 4234
9.c 𝜃𝑓 = 0.7, 𝜃𝑑 = 0.015, Gini = 0.15, K = 22383       9.d 𝜃𝑓 = 0.8, 𝜃𝑑 = 0.01, Gini = 0.103, K = 35123
9.e 𝜃𝑓 = 0.3, 𝜃𝑑 = 0, Gini = 0.367, K = 101317         9.f 𝜃𝑓 = 1, 𝜃𝑑 = 0, Gini = 0.263, K = 270756
Figure 2.9a-f. Spatial distribution of agents with $\overline{\overline{SC}}_i > 1$ for representative 
parameter settings. Black and grey nodes are extroverts and introverts 
respectively. 
 
being able to maintain higher levels of social capital. As the network becomes denser, 
long-distance ties become increasingly likely, and as a result agents commanding globally 
central positions have higher chances of connecting to others or being connected to. 
Centrally positioned agents then 
end up accumulating more social capital than agents on the fringe who have fewer 
options. However as density increases even further, the network becomes saturated to 
the point where central agents lose their advantage, for agents on the fringe are also 
able to maintain links with distant nodes. 
  Another key observation is the distinct pattern in the composition of high 
social capital agents relative to their breed as the number of links increases. In Figures 
2.9a-d the distribution of extroverts versus introverts is relatively even suggesting that 
in low density networks, introverts ceteris paribus have equal opportunities to acquire 
high levels of social capital. However, Figures 2.9e-f reveal that as the number of links 
crosses a certain threshold, introverts rapidly vanish from the set of high social capital 
agents. Figure 2.10a reveals that indeed there is a sharp increase in the absolute 
difference in average social capital between extroverts and introverts near Log (links) ≈ 4.55 
(K ≈ 35,000), which is precisely the point at which social capital inequality enters the first 
phase transition (see Figure 2.8). While not shown here, this sharp increase in social 
capital inequality between extroverts and introverts persists even after controlling for 
the overall higher levels of aggregate social capital in denser networks. This suggests 
that the transition from phase 1 to 2 in Figure 2.8 not only entails a sharp increase in 
aggregate inequality, but high levels of inequality between extroverts and introverts as 
well. At a link count of roughly 100,000, extroverts command levels of social capital 
more than twice that of the introverts, which is substantial considering that for lower 
values of K they were virtually identical.  
It turns out that disparities increase with network density because of the limited visibility  
[Figure image: social capital inequality (extrovert - introvert) (vertical axis, 0 to 250) against Log (links) (horizontal axis, 0.0 to 5.5).]
Figure 2.10. Differences in social capital between introverts and extroverts as a 
function of link count, measured as raw differences (extrovert s.c. – introvert 
s.c.). Values are the mean over five simulation runs each with a different random 
seed for every parameter configuration. 
 
reach (v) of introverts. In the beginning, both extroverts and introverts mainly connect 
with those close by since pairs that are in the vicinity have a higher chance to form 
connections. As the network crosses a certain density threshold however, both types 
exhaust their local possibilities, yet extroverts continue to connect with other agents 
farther away while introverts are no longer able to establish new ties due to their 
spatial constraints.   
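The spatial constraint that separates the two breeds can be sketched as a reachability check. The sketch assumes Euclidean distance on a non-wrapping grid and uses v = 15 as in the appendices; the function `within_reach`, the breed labels, and the coordinate representation are hypothetical and do not reproduce the model's actual tie formation rule.

```python
import math


def within_reach(breed, own_xy, other_xy, visibility=15.0):
    """True if an agent of the given breed could initiate a tie to the other agent."""
    if breed == "extrovert":
        return True  # extroverts are not limited by distance
    # Introverts only see agents within their visibility reach v.
    return math.dist(own_xy, other_xy) <= visibility


introvert_near = within_reach("introvert", (0, 0), (9, 12))    # distance 15
introvert_far = within_reach("introvert", (0, 0), (10, 12))    # distance about 15.6
extrovert_far = within_reach("extrovert", (0, 0), (90, 90))
```

Once an introvert has linked to everyone inside its reach, no further candidates remain, which is the mechanism the text identifies behind the widening gap at high densities.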
Overall, the tradeoff between spatial and individual inequality suggests that 
policies geared towards reducing one could adversely affect the other. In addition, the 
model offers one plausible explanation for the existence of agglomeration economies 
within a social interactions setting. The essential sources of agglomeration are off-
market knowledge exchanges, input and output linkages, as well as labor market 
pooling (Marshall, 1920), which are all driven to some extent by social interactions 
(Ioannides, 2013). The model suggests that spatial agglomeration benefits individuals 
in the core by allowing them to maintain more social connections than those in the 
periphery, which in turn may result in better economic outcomes.  
 
2.6. Conclusions  
I have examined the evolution of degree distributions under different parameter 
configurations to establish conditions in which the power law is sustained and the 
cases in which it breaks down. While the presentation has focused on a few select 
parameter settings, sensitivity analyses reveal that the results are robust across 
different configurations for world size L and introvert visibility v (see Appendix C). 
Generally, I find that networks in which ties are scarce and the rate of tie dissolution is 
relatively high exhibit power law degree distributions. I also find that networks with 
link dissolution are fundamentally different from those in which ties are relatively 
permanent, underscoring the importance of distinguishing between the two types of 
networks. As a classic example, Watts (1999) shows that many social networks with 
tie decay are characterized by “small-world” properties, and these result in a very 
different degree distribution from that of networks where ties are permanent, which 
Barabási and Albert (1999) find to exhibit the power law. In fact, the results suggest 
that the power law distributions found for networks grown under PA are just a special 
case of a broader class of networks that take into consideration churning dynamics.  
Of particular interest is the concentration of social resources, which I find to be 
closely related to network density as it evolves in three distinct phases. Sparse 
networks exhibit a decrease in social capital inequality as network density increases, 
moderately dense networks exhibit increases in inequality with higher density, and 
very dense networks exhibit a decrease in inequality as the network reaches full 
saturation. In a complete (i.e. fully connected) network no inequality would exist and 
the level of aggregate social capital would be maximal. However, such networks are 
extremely rare in real life. The model suggests that due consideration of the relative 
strength of tie formation and dissolution is warranted when aiming to mitigate 
disparities over control of network resources. For example, when considering the 
spread of tacit information, encouraging more networking activity in an ethnic enclave 
where ties are relatively permanent and dense would have very different results than 
encouraging such activity among trade association members where ties are weaker and 
more transient. This relationship between network density and inequality among 
agents is further complicated when considering the spatial aspects of inequality. I find 
that spatial inequality is greater – in the form of higher social capital agents being 
distributed near the core – when inequality among agents overall is low. The results 
suggest that we should acknowledge the potential tradeoff between spatial inequality 
and individual inequality with respect to social resources.  
 In this paper, I have conceptualized social capital as a network resource rather 
than an individual resource that an agent is endowed with. This view is motivated by 
distributional considerations. The poor in particular – due to resource deprivation – are 
more likely to utilize membership in community networks that exchange help in crises 
than to resort to individual coping mechanisms (Gans, 1962). Counterintuitively, the 
results highlight that simply encouraging more networking activity among 
individuals may not result in the disadvantaged benefiting from the increased intensity 
of social interactions.  
Although this study does not pursue it, the current specification has enabled a 
new kind of analysis for furthering the understanding of network dynamics. The model 
establishes the impact of connectivity on social capital through equation (6). The 
accumulation of social capital however likely results in higher betweenness centrality 
(Freeman, 1977), which directly triggers additional rounds of tie decay and formation. 
An examination of closed-loop feedback effects of this type is beyond the scope of the 
present study but should be attempted in the future as an extension. 
Finally, while social capital has mainly been studied within the context of 
social networks, I note that it can be generalized to other domains – such as economic 
networks – in which resources obtained from network ties are of central importance. 
The model’s simple definition of social capital as the sum of the human capital of link 
neighbors allows the framework to be readily extended to other types of networks, 
possibly by substituting human capital with a different resource of importance. I hope 
that this demonstration highlighting the complexity of network behavior in space and 
in the presence of churning dynamics will stimulate further investigation of the 
implications for differential spatial patterns and social inequality.  
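The definition of social capital as the summed human capital of link neighbors is simple enough to sketch directly. The function below and its toy network are hypothetical, and the resource dictionary could hold any other node attribute in place of human capital, which is the kind of substitution suggested above.

```python
def social_capital(human_capital, links):
    """Social capital of each agent: the sum of its link neighbors' human capital."""
    totals = {agent: 0.0 for agent in human_capital}
    for a, b in links:  # undirected ties, each listed once
        totals[a] += human_capital[b]
        totals[b] += human_capital[a]
    return totals


# Hypothetical network: a well-endowed agent "a" tied to two less endowed agents.
toy_human_capital = {"a": 5.0, "b": 1.0, "c": 2.0}
toy_links = [("a", "b"), ("a", "c")]
toy_social_capital = social_capital(toy_human_capital, toy_links)
```

In this toy case the low human capital agents each gain 5 from a single tie to the well-endowed agent, while that agent gains only 3 from both ties combined, which mirrors the equalizing effect attributed to active PA.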
 
  
 
APPENDIX A. 
Relationship between 𝜃𝑓, 𝜃𝑑, and (a) aggregate social capital ∑𝑖 𝑆𝐶𝑖 and (b) link count (K). 
Δ is the average change (in multiples) in (a) social capital and (b) link count 
compared to the next highest 𝜃𝑑 value. N = 1,000, L = 100, v = 15. Values are 
averaged over 5 simulations. 
(a) Aggregate social capital (∑𝑖 𝑆𝐶𝑖) 
𝜃𝑓 \ 𝜃𝑑 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 
0 1,215.4 5.0 2.1 1.8 0.6 -a - - - - - 
0.1 49,813.1 578.3 251.0 150.7 93.8 67.6 42.7 29.8 20.2 8.2 - 
0.2 80,618.1 1,106.5 508.4 306.1 208.1 137.0 95.7 71.5 49.3 22.4 1.7 
0.3 109,908 1,649.8 734.5 474.6 326.3 223.3 163.9 125.6 90.3 42.6 9.9 
0.4 136,943 2,199.3 990.4 618.1 435.0 288.0 227.0 173.6 115.8 71.8 26.8 
0.5 168,187 2,703.4 1,267.0 780.0 542.6 390.3 295.5 230.9 169.7 105.9 55.6 
0.6 189,633 3,280.7 1,513.8 915.7 680.7 492.3 394.0 288.7 219.6 166.8 99.6 
0.7 219,314 3,821.6 1,783.3 1,137.7 795.3 603.4 480.3 380.8 274.0 214.3 135.2 
0.8 242,426 4,307.1 2,044.5 1,279.0 932.3 706.1 550.8 445.6 349.5 265.1 185.7 
0.9 267,520 4,883.1 2,309.4 1,485.6 1,094.1 808.6 648.7 541.8 420.6 331.5 262.7 
1 285,562 5,348.8 2,547.3 1,631.3 1,227.8 921.6 763.8 601.1 517.2 434.4 334.1 
ΔSC 57.6 1.1 0.6 0.4 0.4 0.3 0.3 0.3 0.3 0.5 - 
a  For lower values of 𝜃𝑓, sufficiently high values of 𝜃𝑑 prohibit the network from maintaining any links, 
resulting in no social capital.  
 
(b) Link count (K)  
𝜃𝑓 \ 𝜃𝑑 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 
0 999 4 2 2 1 - - - - - - 
0.1 45,244 469 207 126 78 57 38 25 17 7 - 
0.2 73,048 930 412 249 172 112 81 61 43 21 1 
0.3 99,041 1,384 615 389 263 183 133 109 74 37 8 
0.4 122,936 1,853 827 512 356 239 192 146 98 63 24 
0.5 150,770 2,305 1,065 652 446 326 243 191 143 90 47 
0.6 173,607 2,815 1,296 780 565 409 329 240 185 140 83 
0.7 202,849 3,252 1,514 961 662 505 394 319 232 180 120 
0.8 226,062 3,718 1,748 1,092 790 586 467 374 292 229 160 
0.9 250,995 4,251 1,978 1,264 927 681 549 453 355 282 224 
1 270,693 4,683 2,232 1,398 1,036 776 640 517 434 366 283 
ΔK 62.0 1.2 0.6 0.4 0.4 0.3 0.3 0.3 0.3 0.5 - 
 
APPENDIX B. 
Relationship between 𝜃𝑓, 𝜃𝑑, and the Gini coefficient. ΔGini is the average change in 
the Gini coefficient compared to the next highest 𝜃𝑑 value. N = 1,000, L = 100, v = 
15. Values are averaged over 5 simulations. 
𝜃𝑓 \ 𝜃𝑑 0 0.00125 0.0025 0.005 0.01 0.015 0.02 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 
0 0.41 0.69 0.82 0.90 0.95 0.96 0.97 0.99 0.99 0.99 0.79 -  -  -  -  -  -  
0.1 0.32 0.22 0.24 0.25 0.28 0.31 0.35 0.68 0.82 0.89 0.92 0.94 0.96 0.97 0.98 0.99 -  
0.2 0.36 0.21 0.17 0.19 0.22 0.24 0.27 0.52 0.70 0.79 0.84 0.89 0.92 0.94 0.96 0.98 0.39 
0.3 0.37 0.26 0.17 0.15 0.20 0.21 0.23 0.44 0.61 0.71 0.78 0.84 0.88 0.90 0.93 0.96 0.99 
0.4 0.37 0.28 0.20 0.13 0.17 0.20 0.21 0.39 0.55 0.65 0.72 0.79 0.83 0.87 0.90 0.94 0.97 
0.5 0.36 0.31 0.23 0.14 0.15 0.18 0.19 0.35 0.49 0.59 0.66 0.74 0.79 0.83 0.87 0.91 0.95 
0.6 0.34 0.32 0.25 0.16 0.13 0.17 0.18 0.31 0.44 0.55 0.63 0.70 0.74 0.80 0.84 0.88 0.92 
0.7 0.32 0.32 0.27 0.18 0.12 0.15 0.17 0.29 0.41 0.50 0.58 0.65 0.71 0.75 0.80 0.84 0.90 
0.8 0.30 0.33 0.28 0.20 0.12 0.14 0.16 0.28 0.37 0.47 0.54 0.62 0.67 0.71 0.77 0.81 0.86 
0.9 0.28 0.33 0.30 0.22 0.12 0.13 0.15 0.25 0.36 0.43 0.50 0.57 0.63 0.67 0.73 0.78 0.82 
1 0.26 0.32 0.30 0.23 0.13 0.12 0.14 0.25 0.34 0.41 0.47 0.53 0.59 0.64 0.69 0.72 0.78 
ΔGini 0.02 0.11 0.17 0.05 -0.07 -0.06 -0.36 -0.21 -0.13 -0.06 -0.08 -0.05 -0.04 -0.04 -0.04 0.03 - 
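
For reference, the inequality measure tabulated above can be illustrated with a minimal Python sketch of the Gini coefficient applied to agents' social capital. The function name `gini` and the rank-based estimator are assumptions made for illustration only; the text does not state which estimator underlies the reported values, and this is not the simulation code. Returning NaN when no social capital exists is likewise an assumed convention, chosen to mirror the "-" entries in the table.

```python
import math


def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Assumed estimator: G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n,
    where x_i are the values sorted in ascending order and i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # No social capital in the network: inequality is undefined.
        return math.nan
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical social capital levels for four agents.
equal = gini([1, 1, 1, 1])       # every agent holds the same amount
unequal = gini([0, 0, 0, 1])     # one agent holds everything
```

With this estimator, a population in which one of n agents holds all social capital yields (n - 1) / n rather than exactly 1, which is worth keeping in mind when comparing small networks.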
 
APPENDIX C. 
Robustness checks for world size L and neighborhood visibility of introverts v, at 
select parameter values in which 𝜃𝑓 = 𝜃𝑑. 
  
  
  
  
 
  
 
APPENDIX D. 
Details of the model according to the ODD protocol. 
Design concepts 
Basic principles: see section 2.3 above.  
Emergence: I look for the emergence of a power law distribution under different 
specifications of the tie formation and dissolution parameters, as well as under what 
circumstances the power law breaks down. I also look for the emergence of high
and low inequality in social capital among agents as well as spatial patterns of 
inequality based on different parameter settings. 
Adaptation: There is no explicit adaptation of the agents in terms of changes in static 
state variables. Nonetheless the agents adapt their behavior with respect to their levels 
of social capital, with higher social capital agents deleting more ties due to the limited 
social effort they can exert.  
Objective: The agents’ objective is to maximize their social capital in each time frame
by preferentially connecting to agents with higher levels of human capital
(which would imply that these agents also have higher connectivity). However, the
agents are constrained by distance, being able to connect more easily to others who are 
closer to themselves. Introverts are also constrained by their neighborhood visibility, 
only being able to connect to others within a fixed threshold distance.  
Sensing: Agents use probabilities in deciding which new connection to form and 
which tie to dissolve. The probabilities are calculated based on the information set that 
reveals the human capital and connectivity levels of other agents, as well as their 
relative distance to these agents. Introverts are assumed to be able to sense this 
information only for agents within their visibility reach, while extroverts are assumed 
to be fully knowledgeable of all information regarding all agents in the network. 
Interaction: Agents interact directly with each other through the formation and the 
dissolution of links, which in the real world would represent the re-evaluation of social 
connections. Introverts interact only with other agents within their visibility reach, 
while extroverts interact with all other agents. All agents interact with all possible 
others in each time frame. 
Stochasticity: Stochastic processes are used to assign the spatial position of each agent 
at birth. In addition, the human capital of each agent is randomly drawn from a 
uniform distribution within the interval [0, 1]. Stochasticity also appears in all tie 
formation and dissolution processes, for these processes are determined based on 
probabilities which may yield different results based on different random number 
seeds.  
Collectives: The collective – or aggregate – level of social capital, the degree 
distribution, average levels of social capital for extroverts versus introverts, as well as 
spatial patterns of social capital inequality are represented in the model, and emerge 
from the behavioral characteristics of agents.  
Observation: I track the evolution of degree distribution, individual and aggregate 
levels of social capital, and inequality in social capital among agents as well as 
spatially. I use numeric data representations as well as graphs to present these outputs 
of interest, and also utilize maps of agents to present spatial patterns. 
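
The sensing and interaction rules above can be sketched as a simple filter over potential partners. The following Python fragment is illustrative only: the dictionary layout, the function name `visible_agents`, and the use of Euclidean distance on an unwrapped plane are assumptions, not details taken from the model code.

```python
import math


def visible_agents(agent, others, visibility):
    """Return the agents that `agent` can sense and interact with.

    Extroverts are assumed to know about every agent in the network;
    introverts only see agents within their visibility reach.
    """
    if agent["breed"] == "extrovert":
        return list(others)
    return [
        other for other in others
        # Assumption: straight-line (Euclidean) distance between positions.
        if math.dist(agent["pos"], other["pos"]) <= visibility
    ]


# Hypothetical agents on a plane; positions are (x, y) coordinates.
near = {"breed": "extrovert", "pos": (12, 10)}
far = {"breed": "introvert", "pos": (90, 90)}
introvert = {"breed": "introvert", "pos": (10, 10)}
extrovert = {"breed": "extrovert", "pos": (10, 10)}

seen_by_introvert = visible_agents(introvert, [near, far], visibility=15)
seen_by_extrovert = visible_agents(extrovert, [near, far], visibility=15)
```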
Initialization 
At the beginning of every simulation (t = 0), two agents linked to each other are each placed
on a patch chosen uniformly at random. These two agents are assigned to the
introvert or extrovert breed randomly with equal probability, and their human capital 
is drawn from a uniform distribution with [0, 1] support. Introverts are assigned a 
specific neighborhood visibility reach value. In addition, the tie formation and 
dissolution parameters are set globally at specific values, and these values are assumed 
to be constant across agents.  
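
A minimal Python sketch of this initialization step is given below, under stated assumptions. The `Agent` class, the `initialize` function, the square L x L grid of patches, the requirement that the two agents occupy distinct patches, and the default parameter values are all hypothetical; they serve only to make the sequence of steps concrete and are not the model's actual implementation.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    patch: tuple                  # (x, y) coordinates of the agent's patch
    breed: str                    # "introvert" or "extrovert"
    human_capital: float          # uniform draw (random() returns values in [0, 1))
    visibility: Optional[float]   # neighborhood visibility reach; introverts only
    links: set = field(default_factory=set)


def initialize(L=100, v=15, theta_f=0.5, theta_d=0.5, seed=None):
    """Create the two linked founding agents and the global parameters."""
    rng = random.Random(seed)
    # Assumption: the world is an L x L grid and the agents sit on distinct patches.
    cells = rng.sample(range(L * L), 2)
    agents = []
    for cell in cells:
        breed = rng.choice(["introvert", "extrovert"])  # equal probability
        agents.append(Agent(
            patch=divmod(cell, L),
            breed=breed,
            human_capital=rng.random(),
            visibility=v if breed == "introvert" else None,
        ))
    # The two founding agents start out linked to each other.
    agents[0].links.add(1)
    agents[1].links.add(0)
    # Tie formation and dissolution parameters are global and constant across agents.
    params = {"theta_f": theta_f, "theta_d": theta_d}
    return agents, params
```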
Input data 
The environment is assumed to be generic, and thus the model has no input data. 
Submodels 
See the process overview and scheduling section for details on each procedure. 
  
 
REFERENCES 
 
Adamic, L. A., & Huberman, B. A. (2000). Power-law distribution of the world wide 
web. Science, 287(5461), 2115-2115.  
Aral, S., Muchnik, L., & Sundararajan, A. (2009). Distinguishing influence-based 
contagion from homophily-driven diffusion in dynamic networks. Proceedings 
of the National Academy of Sciences, 106(51), 21544-21549.  
Barabási, A.-L. (2000). Linked: how everything is connected to everything else and
what it means. Plume Editors.  
Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. 
Science, 286(5439), 509-512.  
Batty, M. (2013). The new science of cities: MIT Press.
Bianconi, G., & Barabási, A.-L. (2001). Competition and multiscaling in evolving 
networks. Europhysics Letters, 54(4), 436.  
Borgatti, S. P., & Cross, R. (2003). A relational view of information seeking and 
learning in social networks. Management Science, 49(4), 432-445.  
Browning, C. R., Dietz, R. D., & Feinberg, S. L. (2004). The paradox of social 
organization: networks, collective efficacy, and violent crime in urban 
neighborhoods. Social Forces, 83(2), 503-534.  
Burt, R. S. (1992). Structural holes: The social structure of competition: Harvard
University Press.
Burt, R. S. (2005). Brokerage and closure: an introduction to social capital. Oxford:
Oxford University Press. 
Campbell, K. E. (1990). Networks past: a 1939 Bloomington neighborhood. Social 
Forces, 69(1), 139-155.  
Cassi, L., & Plunket, A. (2014). Proximity, network formation and inventive 
performance: in search of the proximity paradox. The Annals of Regional 
Science, 53(2), 395-422.  
Castellano, C., Marsili, M., & Vespignani, A. (2000). Nonequilibrium phase transition 
in a model for social influence. Physical Review Letters, 85(16), 3536.  
Coleman, J. S. (1988). Social capital in the creation of human capital. American 
Journal of Sociology, 94, S95-S120.  
Cornwell, B., & Dokshin, F. A. (2014). The power of integration: affiliation and 
cohesion in a diverse elite network. Social Forces. doi: 10.1093/sf/sou068 
Cuperman, R., & Ickes, W. (2009). Big five predictors of behavior and perceptions in 
initial dyadic interactions: personality similarity helps extraverts and introverts, 
but hurts “disagreeables”. Journal of Personality and Social Psychology, 
97(4), 667.  
De Montis, A., Chessa, A., Campagna, M., Caschili, S., & Deplano, G. (2009).
Complex networks analysis of commuting. In Complexity and spatial networks
(pp. 239-255): Springer.
de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510-
515. doi: 10.1126/science.149.3683.510 
Erdős, P., & Rényi, A. (1960). On the evolution of random graphs. Publications of the
Mathematical Institute of the Hungarian Academy of Sciences, 5, 17-61.  
Fritsch, M., & Kauffeld-Monz, M. (2010). The impact of network structure on 
knowledge transfer: an application of social network analysis in the context of 
regional innovation networks. The Annals of Regional Science, 44(1), 21-38.  
Gans, H. J. (1962). The Urban Villagers: Group and Class in the Life of Italian-Americans.
New York: Free Press of Glencoe.
Granovetter, M. (1973). The strength of weak ties. American Journal of Sociology, 
78(6), 1360-1380.  
Granovetter, M. (2005). The impact of social structure on economic outcomes. The 
Journal of Economic Perspectives, 19(1), 33-50.  
Greve, A., & Salaff, J. W. (2003). Social networks and entrepreneurship. 
Entrepreneurship Theory and Practice, 28(1), 1-22.  
Hampton, K., Sessions, L., Her, E., & Rainie, L. (2009). Social isolation and new 
technology. Pew Internet & American Life Project: Washington. 
Holme, P., & Newman, M. E. (2006). Nonequilibrium phase transition in the 
coevolution of networks and opinions. Physical Review E, 74(5), 056108.  
Ioannides, Y. M. (2013). From neighborhoods to nations: the economics of social
interactions. Princeton University Press. 
Ioannides, Y. M., & Topa, G. (2010). Neighborhood effects: accomplishments and 
looking beyond them. Journal of Regional Science, 50(1), 343-362.  
Jackson, M. O. (2008). Social and economic networks (Vol. 3). Princeton: Princeton 
University Press. 
Kalmijn, M. (1998). Intermarriage and homogamy: causes, patterns, trends. Annual 
Review of Sociology, 395-421.  
Karnstedt, M., Hennessy, T., Chan, J., Basuchowdhuri, P., Hayes, C., & Strufe, T.
(2010). Churn in social networks. In Handbook of social network technologies
and applications (pp. 185-220): Springer.
Koka, B. R., Madhavan, R., & Prescott, J. E. (2006). The evolution of interfirm 
networks: Environmental effects on patterns of network change. Academy of 
Management Review, 31(3), 721-737.  
Kosmidis, K., Havlin, S., & Bunde, A. (2008). Structural properties of spatially 
embedded networks. Europhysics Letters, 82(4), 48005.  
Lin, N. (2001). Social capital: A theory of social structure and action. Cambridge, 
UK: Cambridge Univ. Press. 
Lin, N., & Dumin, M. (1986). Access to occupations through social ties. Social 
Networks, 8(4), 365-385.  
Freeman, L. (1977). A set of measures of centrality based on betweenness.
Sociometry, 40(1), 35-41.
Madden, M. (2012). Privacy management on social media sites. Pew Internet Report, 
1-20.  
Mansury, Y., & Gulyás, L. (2007). The emergence of Zipf's Law in a system of cities: 
An agent-based simulation approach. Journal of Economic Dynamics and 
Control, 31(7), 2438-2460.  
Mansury, Y., & Shin, J. (2015). Size, connectivity, and tipping in spatial networks: 
Theory and empirics. Computers, Environment and Urban Systems, 54, 428-
437.  
Marshall, A. (1920). Principles of Economics. London: MacMillan. 
Mayhew, B. H., McPherson, M., Rotolo, T., & Smith-Lovin, L. (1995). Sex and ethnic 
heterogeneity in face-to-face groups in public places: an ecological perspective 
on social interaction. Social Forces, 74, 15-52.  
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: homophily 
in social networks. Annual Review of Sociology, 415-444.  
Mehra, A., Kilduff, M., & Brass, D. J. (1998). At the margins: A distinctiveness 
approach to the social identity and social networks of underrepresented groups. 
Academy of Management Journal, 41(4), 441-452.  
Merton, R. K. (1968). The Matthew effect in science. Science, 159(3810), 56-63.  
Miller, J. H., & Page, S. E. (2009). Complex adaptive systems: An introduction to
computational models of social life: Princeton University Press.
Moretti, E. (2004). Estimating the Social Return to Higher Education: Evidence from 
Longitudinal and Repeated Cross-Sectional Data. Journal of Econometrics, 
121(1-2), 175-212. 
Nijkamp, P., Rose, A., & Kourtit, K. (2014). Regional science matters: studies 
dedicated to Walter Isard: Springer. 
Portes, A. (1998). Social capital: Its origins and applications in modern sociology. 
Annual Review of Sociology, 24, 25.  
Putnam, R. D. (2001). Bowling alone: The collapse and revival of American 
community: Simon and Schuster. 
Railsback, S. F., & Grimm, V. (2011). Agent-based and individual-based modeling: a
practical introduction: Princeton University Press.
Sampson, R. J. (2004). Networks and neighbourhoods: The implications of 
connectivity for thinking about crime in the modern city. Demos Collection, 
155-166.  
Sasovova, Z., Mehra, A., Borgatti, S. P., & Schippers, M. C. (2010). Network churn: 
the effects of self-monitoring personality on brokerage dynamics. 
Administrative Science Quarterly, 55(4), 639-670.  
Smith, A., & Duggan, M. (2013). Online dating & relationships. Pew Internet & 
American Life Project.  
Solé, R. V., Manrubia, S. C., Luque, B., Delgado, J., & Bascompte, J. (1996). Phase 
transitions and complex systems: simple, nonlinear models capture complex 
systems at the edge of chaos. Complexity, 1(4), 13-26.  
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit
region. Economic Geography, 46, 234-240.  
Torrens, P. M. (2007). A geographic automata model of residential mobility. 
Environment and Planning B: Planning and Design, 34(2), 200-222.  
Torrens, P. M. (2010). Agent‐based models and the spatial sciences. Geography 
Compass, 4(5), 428-448.  
Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of 
choice. Science, 211(4481), 453-458.  
Vega-Redondo, F. (2007). Complex social networks: Cambridge University Press. 
Verbrugge, L. M. (1983). A research note on adult friendship contact: a dyadic 
perspective. Social Forces, 62, 78-83.  
Watts, D. J. (1999). Small worlds: the dynamics of networks between order and 
randomness: Princeton University Press. 
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks.
Nature, 393(6684), 440-442.  
Wellman, B. (1996). Are personal communities local? A Dumptarian reconsideration. 
Social Networks, 18(4), 347-354.  
Wilensky, U. (1999). NetLogo. Center for Connected Learning and Computer-Based 
Modeling, Northwestern University. Evanston, IL.  
Wilson, K. L., & Portes, A. (1980). Immigrant enclaves: an analysis of the labor 
market experiences of Cubans in Miami. American Journal of Sociology, 295-
319.  
Xie, Y., Batty, M., & Zhao, K. (2007). Simulating emergent urban form using agent-
based modeling: Desakota in the Suzhou-Wuxian region in China. Annals of 
the Association of American Geographers, 97(3), 477-495.  
Zipf, G. K. (1949). Human behavior and the principle of least effort. New York: 
Hafner. 
 
  
 
CHAPTER 3  
AGGLOMERATION, REGIONAL SOCIAL CAPITAL, AND 
ENTREPRENEURSHIP IN CITIES 
 
3.1. Introduction 
Entrepreneurship research, while still relatively new, has provided multiple models 
regarding the link between new firm formation and economic growth and 
development. For example, entrepreneurship has been touted to generate channels of 
creative destruction (Akcigit and Kerr 2010), where it allows the means of production 
to be used in newer and more efficient combinations (Schumpeter 1934). Others have 
suggested that entrepreneurship drives innovation by transforming general knowledge 
into economic knowledge that can be exploited for personal gain (Audretsch and 
Keilbach 2004). While many theories exist, most agree that entrepreneurship is the 
result of individuals or groups perceiving and acting upon economic opportunities that 
manifest in their surrounding environment. However, research regarding how exactly 
entrepreneurs perceive and act upon such opportunities is still in its infancy.  
 In this paper, I argue that social interactions, and more broadly social capital 
within the community or region, aid entrepreneurs in the early stages of forming new
firms. Such a view that economic outcomes are driven by social forces is certainly not 
new. Marshall (1920) emphasized how “the mysteries of the trade become no mystery, 
but are, as it were, in the air” when elaborating on his theory of intellectual spillovers 
within agglomerations. Implicit in this statement is that intellectual spillovers are
possible because individuals collocate and gain information through social linkages, 
which allow the knowledge “in the air” to be shared with one another. Saxenian
(1996) describes how new firms in Silicon Valley – due to the horizontal integration of
small firms in the region – benefited from such social linkages, as opposed to Route
128, which was dominated by large, inward-looking incumbent firms. The benefits of
social capital are not confined to knowledge exchange. For example, entrepreneurs are 
aided in their search for financial capital and qualified recruits through the social ties 
they maintain (Stuart and Sorenson 2003), and it has even been argued that social 
capital aids entrepreneurs in building self-confidence (Sorenson and Audia 2000). 
 More recently, the literature on agglomeration has embraced the idea that the 
traditional factors that have been thought to cause agglomeration of economic activity 
also represent – at least in part – social interactions (Glaeser 2008; Ioannides 2013). 
Among others, proximity to customers and suppliers may reduce the costs of obtaining 
inputs or transporting goods to downstream consumers (Ellison, Glaeser, and Kerr 
2010; Fujita, Krugman, and Venables 1999), but it also may embody stronger social 
ties between similar firms and customers that increase trust and information exchange
(Dahl and Sorenson 2012). Similarly, labor market pooling shields workers from firm-
specific shocks (Krugman 1991) and promotes better worker-firm matches (Helsley 
and Strange 1990), but it also represents social homophily (McPherson, Smith-Lovin, 
and Cook 2001). Critically, such representations of social interactions are inherently 
ego-centric, in that they view interactions as being shaped solely by one’s own 
network of relations. This leaves out the role of regions and geography in directly 
shaping and influencing the interactions and social capital that are available to its 
members. 
 
A considerable body of literature within the economic geography and economics fields 
has thus developed which considers social interactions and social capital within the 
regional domain. Social aspects of the region have been viewed as a crucial element
of regional competitiveness (Kitson, Martin, and Tyler 2004; Porter 2003), where the 
social characteristics of a region are not simple aggregations of firms or individuals. 
Porter (1998) suggests that a key component of cluster formation and success is the 
degree of social embeddedness, the existence of facilitative social networks, social 
capital, and institutional structures. Similarly, Storper (1995, 2013) stresses the 
importance of “untraded interdependencies” such as networks of trust and cooperation 
as well as local norms and conventions when considering the success of regions. Thus, 
the natural question to ask is whether there is a role that regional social capital plays in 
promoting entrepreneurship, over and above the effect of social interactions at the 
micro-level.  
This paper attempts to unify the treatment of regional social capital and 
agglomeration economies as being part of the broader “entrepreneurial ecosystem” of 
a region, where the ecosystem takes its form in various types of networks and their 
linkages. Inasmuch as the main Marshallian forces – namely customer-supplier
linkages, labor market pooling, and knowledge spillovers – are also manifestations of 
interactions of a certain form, I argue that they along with social capital can naturally 
be analyzed using the characteristics of suitably defined networks of industries, social 
organizations, and knowledge. Utilizing network analysis and a detailed dataset of 
nonprofit organizations within the US, I focus on aspects of social capital that are 
embodied within the characteristics of networks of nonprofit organization 
classifications within a region. The focus on nonprofit organizations follows that of 
previous studies of community social capital (Feld 1981; R. D. Putnam 2001), where it 
is viewed that many opportunities for social interaction within a community emerge 
within the context of voluntary associational activity. In concordance, customer and 
supplier links are viewed as linkages within an inter-industry network defined by 
input-output relationships, and labor market pooling is defined using network 
relationships based on industrial occupational composition. Finally, knowledge 
spillovers are defined using networks that consider the distribution of patents as well 
as institutions that promote knowledge creation within a given region. 
The main goal of this paper is to compare the relative effects of different 
Marshallian economies and community social capital on entrepreneurship, and to 
assess whether these forces can explain the large variations in regional 
entrepreneurship rates that have been previously documented in the literature (Acs and 
Armington 2004, 2006). In addition, I move beyond the traditional emphasis on 
specific industries (such as manufacturing) and attempt to identify whether the effects 
of these forces differ across a diverse set of industries. In particular, distinctions are 
made based on being traded, local, high-tech, or low-tech (Delgado, Porter, and Stern 
2016), as well as between manufacturing and non-manufacturing sectors.  
The paper is organized as follows. I begin by discussing the links between 
agglomeration theory and social interactions, and how network analysis can be utilized 
to measure these forces fruitfully within a unified framework. Section 3 presents the 
calculation of the various indices used in the empirical estimation, along with key 
descriptive statistics. Section 4 presents the empirical framework, while Section 5 
discusses the main findings. I conclude with a final discussion of key insights and 
relevant policy implications. 
 
3.2. Related literature 
3.2.1 Entrepreneurship, agglomeration, and social capital 
The geographic concentration of economic activity has long been of central interest to 
urban economists, economic geographers, and regional scientists (Glaeser et al. 1992; 
Krugman 1991; Porter 2003; Storper and Christopherson 1987; Storper and Venables 
2004). Indeed, many studies have shown how aggregate activity is concentrated in 
large urban areas, with estimates suggesting that in the US, 2% of the land area in the 
lower 48 states is home to roughly 75% of the population (Rosenthal and Strange 
2004). Generally, agglomeration economies – or aggregate urban external effects – are 
thought to arise as the sum of a large number of individual externalities, both between 
establishments and individuals. Ultimately, external economies within cities arise due 
to productivity gains accrued through proximity, which in turn reduces transport costs, 
allows better access to specialized inputs and labor, and allows for “the mysteries of 
the trade” to be shared with one another (Marshall 1920). Even with the remarkable 
decrease in transportation costs and the development of knowledge diffusion 
mechanisms that readily traverse geographic boundaries, if anything this pattern of 
agglomeration has become stronger over the years (Storper and Venables 2004). It is a 
continuing trend of urban economies that employment and firms are geographically 
concentrated, with more successful regions experiencing a resurgence of 
agglomerative activity (Scott et al. 2001). 
 
A key argument that is proposed in this paper is that social interactions and 
social capital have a geographic dimension, over and above that of individual 
networks of relations, whether they be between firms or individuals. Moreover, I argue 
that such social forces are an underlying agglomerative mechanism, much like the 
Marshallian microfoundations. It is a well-documented fact that individuals primarily 
have connections with others that reside or work in the same region, with the odds of 
maintaining relationships sharply declining with distance (McPherson, Smith-Lovin, 
and Cook 2001; Zipf 1949). Chen et al. (2010) document how the high concentration
of venture capital in select regions may confer advantages on entrepreneurs in
locations where venture capital is abundant: even for similar projects, an
entrepreneur located within such a region would enjoy an advantage over those
farther away. As the venture capital market is highly influenced by social connections
(Sorenson and Stuart 2001), such an example is a case where regional social capital 
benefits the entrepreneur, regardless of individual networks of relations. A similar 
argument can be made for immigrant entrepreneurs, who benefit extensively from 
regional social capital that is driven by the high concentration of homogenous ethnic
groups (Wilson and Portes 1980).  
The economic geography literature in particular has paid much attention to the 
geography of social interactions across a broad variety of applications. Currid and 
Williams (2010) study the spatial and geographic dimensions of the social milieu 
within the context of cultural industries, and find that social geography exhibits 
nonrandom spatial clustering and that such clusters tend to reinforce themselves. 
Bürker and Minerva (2014) consider the variability of civic capital both across and 
within regions, and find that variable endowments of civicness affect economic
outcomes, in particular the size distribution of plants. Many others have studied the 
relationship between geography and social interactions within different contexts, from 
the knowledge spillover activities of mobile inventors (Agrawal, Cockburn, and 
McHale 2006) to knowledge flows across European regions (Caragliu and Nijkamp 
2016), and even for manufacturing industry networks in Tanzania (Murphy 2003). The 
overwhelming consensus is that social interactions and social capital yield economic
benefits, and most critically that this link is fundamental to theories of agglomeration 
(Kemeny et al. 2016).  
Like social capital, entrepreneurship is also geographically concentrated. 
Fairlie (2014) finds that entrepreneurship rates (calculated as the percentage of 
individuals age twenty to sixty-four who report owning a new business) differ 
significantly across states, with California, Montana, and South Dakota exhibiting 
higher and the Northeast and Midwest states showing lower levels of firm formation. 
The discrepancy is apparent at the Metropolitan Statistical Area (MSA) level as well, 
with Los Angeles-Long Beach-Santa Ana reporting entrepreneurship rates greater than 
0.5% of the labor force while Detroit-Warren-Livonia experiences rates lower than
0.2%. Since entrepreneurship is also a form of economic activity that benefits from 
productivity gains, it is intuitive to theorize that it too will benefit from agglomeration 
externalities of the type suggested by Marshall. Broadly, it has been theorized that the 
difference in rates of entrepreneurship can be explained by (1) differential returns to 
entrepreneurship, (2) differential availability of inputs and human capital, (3) 
differential supplies of ideas, and (4) differences in the local culture (Glaeser, 
Rosenthal, and Strange 2010). 
 It is remarkable how the theories that hypothesize causes of differential rates of 
entrepreneurship parallel well-known theories of agglomeration. Thus as mentioned 
above, there is reason to believe that agglomeration externalities that result in 
productivity gains for the entrepreneur would also be influenced by social interactions 
and social capital at the regional level. As virtually all economic behavior is embedded 
in networks of social relations (Granovetter 1995; Ioannides 2013), it stands to
reason that social capital (of the positive sort) positively affects entrepreneurial
outcomes. An extensive literature has documented how the social capital of
entrepreneurs impacts the success of their ventures (for example, Sorenson 2005; 
Stuart and Sorenson 2005), with personal and professional relationships with critical 
actors that act as brokers of valuable entrepreneurial resources being instrumental in 
aiding business creation (Hoang and Antoncic 2003).  
Within the literature, there exists a conceptual distinction between two types of 
community social capital, namely bonding and bridging (de Souza Briggs 1998; R. D.
Putnam, Leonardi, and Nanetti 1993; R. D. Putnam 2001). While here I focus on the 
aggregate level, this distinction closely mirrors that of social capital theory at the 
individual level, which differentiates between strong ties that are characteristic of 
homophilous interactions (Coleman 1988) and weak ties that traverse greater social 
distance and connect otherwise disparate groups (Burt 2005; Granovetter 1973). At the 
regional level, bonding social capital is characterized by strong, repeated interactions
among individuals or groups who are like one another, and is thought to promote
trust, reciprocity, and enforcement of social norms. Storper and Venables (2004),
while not explicitly coining the term, talk about how such repeated interactions 
(termed face to face contact, or buzz) are beneficial by not only promoting trust, but 
also by allowing for rapid feedback in communication and reducing the tendency for 
free riding, among other factors. In contrast, bridging social capital has been
commonly thought of as aiding individuals in “getting ahead,” with widely dispersed 
ties to people and institutions with different backgrounds helping in gaining non-
redundant information and unique insights (Burt 2004; de Souza Briggs 1998; R. 
Putnam et al. 2004). Most famously, Jacobs (1969) promoted the closely related view 
that new ideas and innovation come from diversity, arguing that innovations are the 
product of cross-industry fertilization made possible by contact with individuals with 
different perspectives. Under this theoretical lens, in the context of entrepreneurship, 
bonding social capital at the regional level may help entrepreneurs by, for example, 
promoting easier access to resources (especially club goods), or by reducing 
transaction costs otherwise incurred when dealing with unfamiliar contacts. In
contrast, bridging social capital would benefit entrepreneurs who seek
information on promising business ventures or innovative knowledge. This paper 
contributes to the growing literature on entrepreneurship and social interactions by 
distinguishing between these two forces, and accounting explicitly for their effects on 
entrepreneurship across a variety of different industries.  
3.2.2 A network theoretic approach to the entrepreneurial ecosystem 
As theories of agglomeration and social capital both embed the concept of interactions 
(whether among firms or individuals), I argue that they can be characterized under a 
unifying lens that considers these interactions as linkages that are part of broader 
networks of industries, organizations, and individuals. A network theoretic approach 
to defining agglomeration externalities as well as social capital is appealing in that 
networks – by definition – are the joint set of nodes and their linkages, however these 
nodes are defined. Without explicitly considering network topologies, agglomeration
scholars have already introduced network theoretic concepts in relating each of the 
Marshallian factors with various economic outcomes. For example, Ellison et al. 
(2010) derive a metric of coagglomeration that measures the strength of agglomerative 
forces between industry pairs. The index can be thought of as representing the strength 
of linkages between industries, defined over a geographic space. Furthermore, their 
metrics of proximity to suppliers and consumers, labor market pooling, and 
technology spillovers bear a noteworthy resemblance to proximity metrics that have 
been calculated in network settings (Hidalgo et al. 2007; Hidalgo and Hausmann 
2009). Similarly, Glaeser and Kerr (2009) also define Marshallian economies using 
indices that consider pairwise proximities of industries based on input-output, labor 
market, and technological proximities. The key distinction between these studies and 
those that explicitly consider network topologies is that the former only consider either 
dyadic relationships (i.e. between any two given industries) or direct linkages (i.e. 
between a particular industry and its first-degree neighbors), while the latter consider 
all linkages and the aggregate properties of networks. 
 Recent work on complex networks has focused on analytical methods and 
concepts which consider not only the linkages of individual nodes but also the 
aggregate characteristics of the network as a whole (Hausmann and Klinger 2006; 
Hidalgo et al. 2007; Hidalgo and Hausmann 2009; Jackson 2008). In particular, 
Hidalgo et al. (2007) consider the concept of product space that is characterized by the 
revealed linkages between products in global trade, where the linkages are defined 
based on joint trade patterns between products across countries. In related work, 
Hidalgo and Hausmann (2009) develop a method to calculate the competitiveness of 
countries based on these revealed trade patterns that takes into account both direct and 
indirect linkages (i.e. links of neighbors, links of neighbors of neighbors, etc.) between
products and countries. Within the economic geography literature, firms, industries 
and products have been viewed within a network paradigm to map global cluster 
networks of high-tech industries, and common network techniques such as community 
structure detection have been used to explain increased geographic concentration in 
particular industries (Turkina, Van Assche, and Kali 2016).  
 A key issue when considering networks is how to define nodes and their 
linkages. I argue that each of the Marshallian economies as well as regional social 
capital should be represented by distinct networks of nodes and links based on their 
theoretical underpinnings, but under a unifying framework that considers them as 
characteristics of networks that represent the broader entrepreneurial ecosystem of a 
region. For example, proximity to suppliers and consumers can readily be 
characterized as a network of industries (the “industry space”), where the industries 
are related based on the extent to which they share inputs or outputs (the linkages). However,
the degree of proximity (i.e. strength of linkages) between industries in a region may 
also differ based on regional patterns of specialization (i.e. the overall position of the 
region within the industry space). Consider the case of the health care and
manufacturing industries, for which overall input-output linkages would be relatively
weak. If a region is highly specialized in manufacturing, even if the linkages between
the two industries are weak, the health care industry will be more susceptible to shocks
in the manufacturing sector than, say, the health care sector of a region in which
manufacturing is relatively absent. Furthermore, if the manufacturing sector is itself 
strongly linked to the retail trade sector, and this sector is also highly specialized, 
shocks in the trade sector should have strong repercussions on both the manufacturing 
and the health care industry even if the direct link between health care and trade is 
relatively weak. It is because of this interconnectedness that the aggregate network 
characteristics of a region as a whole should also be considered in addition to 
individual pairwise proximities when measuring the strength of supplier and consumer 
linkages. The same argument applies for the other factors as well, differing only in 
how the nodes and linkages are defined. The metrics introduced in the following 
section are an attempt to merge and capture the key aspects of both the pairwise 
proximities and regional characteristics within an index that can be readily measured 
using available data sources.  
 
3.3. Data and variables 
The unit of analysis for the study is the MSA-industry pair, where MSAs are defined 
using the 2009 MSA definitions taken from the US Office of Management and Budget 
(OMB)17, and industries are defined at the 4-digit level using 2007 definitions of the 
                                                 
17 I focus on the MSA level for a number of reasons. First, data for some variables were not available 
for lower levels of geography, and in many cases, available data were too noisy for proper estimation. 
This is especially pronounced for patent data, where the locations assigned to the inventors have been
noted as a source of random measurement error at lower levels of geography (Agrawal et al. 2014). 
 
North American Industry Classification System (NAICS)18. I only considered MSAs 
within the lower 48 states, and further excluded 7 MSAs that did not have reliable 
demographic information from the American Community Survey (ACS).19 Due to the 
panel nature of the data, concordances between 2002, 2007, and 2012 NAICS 
classifications were made.20 For industries, I excluded agriculture, private households, 
and public administration, as well as some industries for which entrepreneurship data
were not available.21 This resulted in a dataset that included 356 MSAs and 282
industries, for a total of 100,392 MSA-industry pairs. The dataset spans the years 2005
to 2013, where entrepreneurship data were collected for the years 2006 to 2013 while
the underlying variables were for the years 2005 to 2012 (a 1-year lag). This resulted
in 8 years of data, for a total of 803,136 observations. 
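As a quick check, the sample dimensions follow directly from the counts reported above:
\[ 356 \times 282 = 100{,}392 \ \text{MSA-industry pairs}, \qquad 100{,}392 \times 8 = 803{,}136 \ \text{observations}. \]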
3.3.1 Entrepreneurship 
Entrepreneurship data were drawn from the Statistics of U.S. Businesses (SUSB), an 
annual dataset produced by the US Census Bureau that provides detailed geographic 
and industry level data on the number of establishments and employment levels, as 
                                                 
Furthermore, the metrics that proxy for the entrepreneurial ecosystem were deemed more accurate 
when considered at the MSA level, as opposed to the county or zip-code level, where regional 
characteristics may not be fully represented. 
18 The 4-digit NAICS level was used to strike a balance between appropriate granularity and error due 
to constructing concordances between different classifications. In addition, occupation data for 
industries below the 4-digit level suffered from a high percentage of non-disclosure, which rendered 
noisy estimates for the calculated metrics. 
19 The seven MSAs were Cape Girardeau-Jackson, MO-IL (16020), Carson City, NV (16180), Hinesville-
Fort Stewart, GA (25980), Lewiston, ID-WA (30300), Manhattan, KS (31740), Mankato-North Mankato, 
MN (31860), and Steubenville-Weirton, OH-WV (44600). 
20 The differences were minimal across these classifications at the 4-digit level. In cases where 4-digit 
codes for 2002 and 2012 NAICS classifications did not fully map onto the 2007 definitions, a 
concordance was made based on relative employment levels for industries at the 6-digit level that 
shared a 4-digit industry. Roughly 10 4-digit industries were affected by this mapping scheme. 
21 These included postal services (NAICS 4911), rail transportation (NAICS 4821), and insurance and 
employee benefit funds (NAICS 5251). 
 
well as information on firm births, deaths, expansions, and contractions.22 The SUSB 
provides data for the universe of US establishments with paid employees, using the 
Census Bureau’s Business Register as its underlying source.23 In this sense, it is 
similar to other databases such as the Longitudinal Business Database (LBD), while 
having the advantage of being more readily available as its use is not restricted to 
qualified researchers through Census data centers. The SUSB distinguishes between 
single-unit start-ups (i.e. new firms) and start-ups that are part of a multi-unit 
enterprise. The primary focus of this paper is on single-unit start-ups, yet comparisons 
are made with start-up activity including those that are expansions of existing 
enterprises.  
 Counts of new firms for a given MSA-industry pair are calculated at the 4-digit 
industry level for the years 2006 to 2013, and are used as the main outcome variable. 
I focus on counts of new establishments instead of employment counts, as 
employment counts at detailed geographies and industries suffered from disclosure 
issues while establishment counts remained uncensored. Nonetheless, previous studies 
have shown that empirical results obtained from considering establishment counts are 
very similar to those where employment counts were considered (Rosenthal and 
Strange 2003), and as such it is expected that the results will be little affected by the 
choice of establishments over employment.24 Considering that this timeline 
                                                 
22 While the publicly available data for the SUSB only provide establishment and employment change 
data at the state level, special tabulations are available at the county level for a reasonable cost. These 
tabulations were used to construct the dataset, aggregating county data to the MSA level using 2009 
MSA definitions. 
23 As such, the SUSB excludes non-employer businesses such as sole proprietors with no paid 
employees. 
24 Furthermore, start-ups typically begin with a very small number of employees, which should also 
make the difference between considering establishments instead of employment minimal. 
 
encompasses the recent recession years, I was able to distinguish patterns of 
entrepreneurship before and after the recession.  
Table 3.1 provides a summary of the count of new firms as well as entry rates 
(calculated as new firm births divided by the number of incumbent establishments) for 
all industries as well as for select industry groups.25 It can be seen that 
entrepreneurship in the US is dominated by the local, low-tech, and non-
manufacturing sectors, with higher overall counts of new firms as well as higher entry 
rates. However, research suggests that the traded, high-tech, and manufacturing 
industries account for a disproportionate level of employment despite their small share 
of total establishments, and usually pay higher wages to their employees (Delgado, 
Porter, and Stern 2016), and as such disentangling the determinants of 
entrepreneurship for these industries is of great importance. Furthermore, when 
examining entrepreneurship rates before and after the recession, overall these 
industries remained relatively resilient, which suggests that these industries are less 
susceptible to shocks in the business cycle and may be more reliable sources of job 
creation. Overall, the manufacturing sector was the most resilient in terms of 
entrepreneurship rates, while the locally traded industries (which are highly dependent 
on local demand) were the hardest hit.  
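
As an illustration of how an entry rate and its relative change across two periods can be computed, the following is a minimal Python sketch. The counts are hypothetical (they are not SUSB figures), and the function names are assumptions introduced only for this example.

```python
def entry_rate(births, incumbents):
    """Entry rate in percent: new firm births divided by incumbent establishments."""
    if incumbents <= 0:
        raise ValueError("incumbents must be positive")
    return 100.0 * births / incumbents


def percent_change(rate_before, rate_after):
    """Relative change between two rates, in percent of the earlier rate."""
    return 100.0 * (rate_after - rate_before) / rate_before


# Hypothetical counts for one industry group (illustrative only).
rate_pre = entry_rate(births=1200, incumbents=10000)   # earlier period
rate_post = entry_rate(births=900, incumbents=10000)   # later period
change = percent_change(rate_pre, rate_post)
print(rate_pre, rate_post, change)
```

A negative change indicates that the entry rate fell between the two periods.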
3.3.2 Labor market pooling  
One of the main reasons firms agglomerate is to benefit from scale economies 
associated with a large labor pool (Ellison, Glaeser, and Kerr 2010). The benefits of  
                                                 
25 I use the classification scheme developed by Delgado, Porter, and Stern (2016) to distinguish 
between traded, local, high-tech, and low-tech industries. Manufacturing refers to the industries 
classified under sectors 31-33 in the 2007 NAICS. 
 
Table 3.1. Count of new firms and entry rates for single and all establishment births

Single (start-up) establishment births
Category        Total births 2006-2013  Rate,      Rate,      Rate,      ∆s
                (% of total)            all years  2006-2009  2010-2013
All industries  4,974,638               8.8        9.3        8.3        -10.0
Traded          1,162,522 (23.4)        9.0        9.3        8.7        -6.1
Local           3,812,116 (76.6)        8.8        9.3        8.2        -11.2
High-tech       24,409 (0.5)            5.0        5.2        4.8        -8.3
Low-tech        4,950,229 (99.5)        8.8        9.3        8.4        -10.0
Manuf.          147,034 (3.0)           6.1        6.2        6.0        -3.2
Non-manuf.      4,827,604 (97.0)        8.9        9.4        8.4        -10.3

All establishment births
Category        Total births 2006-2013  Rate,      Rate,      Rate,      ∆a
                (% of total)            all years  2006-2009  2010-2013
All industries  6,206,174               11.0       11.6       10.4       -10.7
Traded          1,560,027 (25.1)        12.0       12.5       11.6       -7.4
Local           4,646,147 (74.9)        10.7       11.3       10.0       -11.8
High-tech       43,241 (0.7)            8.7        9.5        8.0        -15.5
Low-tech        6,162,933 (99.3)        11.0       11.6       10.4       -10.7
Manuf.          164,234 (2.6)           6.8        7.0        6.6        -5.3
Non-manuf.      6,041,940 (97.4)        11.2       11.8       10.5       -11.0

Notes: Single establishment births refers to births excluding those that are part of a multi-unit
enterprise, while all establishment births includes all types of births. Entry rates are calculated as
the average across the years of new firms divided by incumbent firms, in percentages. ∆s refers to
the change in entry rates, calculated as the difference between rates for 2010-2013 and 2006-2009
divided by the rate for 2006-2009, in percentages. 
 
such labor pools are well documented, with Marshall (1920) suggesting that a large 
labor market allows for workers to readily shift across employers, thus reducing labor 
market uncertainty. Helsley and Strange (1990) suggest that in addition to these 
benefits, a large labor pool facilitates better matches between firms and workers, 
which would increase firm productivity, while Combes and Duranton (2006) suggest 
that entrepreneurs start firms in agglomerated areas due to better access to a suitable 
labor force. To measure the extent to which a particular industry within a given MSA 
is closely matched to the labor market characteristics of the region, I first construct a 
network of industries that are linked based on similarities in occupational composition. 
To do this, I use detailed data on the occupational composition of employment in 
industries taken from the Occupational Employment Statistics (OES) program 
administered by the Bureau of Labor Statistics, pooled across the panel years.26 This 
dataset provides detailed employment patterns for all industries across roughly 800 
occupations, and serves as the baseline for calculating the pairwise proximity between 
two given industries. This proximity measure is analogous to that calculated by Ellison 
et al. (2010), where the pairwise correlation between industries i and j across 
occupations is calculated based on employment shares as 
\[ \phi^{labor}_{ij} = \mathrm{Corr}_s\left(\mathit{Employment}_{io}, \mathit{Employment}_{jo}\right) \]
where $\mathit{Employment}_{io}$ is the fraction of industry i’s employment in occupation o, and 
$\mathrm{Corr}_s$ refers to Spearman’s rank correlation.27 The mean value of the proximity 
metric is 0.529. The lowest proximity value is -0.061, between Leather and 
Hide Tanning and Finishing (NAICS 3161) and Colleges, Universities, and 
Professional Schools (NAICS 6113), and the highest is 0.952, between Electrical 
and Electronic Goods Merchant Wholesalers (NAICS 4236) and Hardware, and 
Plumbing and Heating Equipment and Supplies Merchant Wholesalers (NAICS 4237).  
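
The pairwise proximity can be illustrated with a small, self-contained Python sketch. It computes Spearman's rank correlation as the Pearson correlation of average ranks; the occupational shares below are hypothetical, and the function names are assumptions made for this example rather than the routine used for the reported values.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical employment shares of three industries over five occupations.
shares = {
    "A": [0.50, 0.20, 0.15, 0.10, 0.05],
    "B": [0.40, 0.30, 0.10, 0.15, 0.05],
    "C": [0.05, 0.10, 0.15, 0.20, 0.50],
}
phi_ab = spearman(shares["A"], shares["B"])  # similar occupational mix
phi_ac = spearman(shares["A"], shares["C"])  # reversed occupational mix
print(phi_ab, phi_ac)
```

Because only ranks enter the calculation, a few very large occupational shares do not dominate the proximity value.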
                                                 
26 I pool across the panel years to reduce the chance of variability due to external factors that are not 
related to actual similarities in employment patterns.  
27 I use Spearman’s correlations over Pearson correlations due to the high skewness of employment 
patterns and to mitigate the effect of outliers in the data. 
 
 As mentioned previously, in order to more accurately consider the aggregate 
labor market characteristics of a region and the indirect linkages among 
neighbors, I calculate the eigenvector centrality of each industry based on the pairwise 
proximity measure.28 Eigenvector centrality has a long history within both the 
economics and sociology literature, going back to at least Leontief (1941). In essence, 
this measure of network centrality is based on the simple idea that a node is important 
if it is linked to other important nodes. It differs from degree centrality (i.e. the link 
count) in that a node may have a high (low) centrality if it is linked to others that 
are more (less) important, even if its degree is low (high). If a node’s centrality (i.e. 
importance) is proportional to the sum of its neighbors’ centralities, this relationship can 
be represented simply as  
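\[ c^{labor}_i = \frac{1}{\lambda} \sum_{j \neq i} \phi_{ij} \, c^{labor}_j \quad \text{for } \lambda \neq 0 \]
where $c^{labor}_i$ denotes centrality and $\lambda$ is a constant. In matrix form, this can be 
represented as 
\[ \lambda \mathbf{c} = \mathbf{\Phi} \mathbf{c} . \]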
Hence the centrality vector 𝐜 is the eigenvector of the adjacency matrix 𝚽 associated 
with the eigenvalue λ, which gives the centrality measure its name.29 It is important to 
note that this centrality vector is calculated using the system of equations that are 
represented by this relationship, and thus considers both direct and indirect linkages.30  
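
The eigenvector relationship can be illustrated with a simple power iteration. The Python sketch below is only an illustration on a hypothetical three-industry proximity matrix; it is not the igraph routine used for the reported estimates, and the function name and the scaling (largest entry set to 1) are assumptions.

```python
def eigenvector_centrality(phi, iterations=1000, tol=1e-12):
    """Power iteration for the leading eigenvector of a non-negative matrix.

    phi is a square list of lists of link weights. Diagonal entries are
    skipped, matching the sum over j != i. Returns (lam, c), where c is
    scaled so that its largest entry equals 1.
    """
    n = len(phi)
    c = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        new = [sum(phi[i][j] * c[j] for j in range(n) if j != i) for i in range(n)]
        lam = max(new)                 # eigenvalue estimate under max-scaling
        new = [v / lam for v in new]   # rescale so the largest entry is 1
        converged = max(abs(a - b) for a, b in zip(new, c)) < tol
        c = new
        if converged:
            break
    return lam, c


# Hypothetical symmetric proximities between three industries.
phi = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.5],
    [0.2, 0.5, 0.0],
]
lam, c = eigenvector_centrality(phi)
print(lam, c)
```

Each entry of the resulting vector depends on the centralities of all other nodes, so links of neighbors enter the measure in addition to direct links.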
                                                 
28 The correlations are rescaled to be between 0 and 1 in order to facilitate calculation of the centrality 
metric. 
29 The standard procedure is to choose λ as the largest eigenvalue such that the eigenvector 
centralities take non-negative values. 
30 I utilize the igraph package available for the R software environment to calculate the centrality 
measure. 
 
Finally, in order to map the aggregate characteristics of the region onto the network, I 
compute the standard location quotient for each MSA-industry pair as 
\[ \mathrm{LQ}_{irt} = \frac{E_{irt}}{E_{it}} \]
where $E_{irt}$ is the share of establishments for industry $i$ in region $r$ at time $t$, and $E_{it}$ is 
the share of establishments for industry $i$ at time $t$ for the US. Then, the index for 
labor market proximity for a particular MSA-industry pair is calculated as  
\[ \mathrm{PROX}^{labor}_{irt} = \frac{\sum_{j \neq i} \mathrm{LQ}_{jrt} \cdot c^{labor}_j \cdot \phi^{labor}_{ij}}{\sum_{j \neq i} \mathrm{LQ}_{jrt} \cdot c^{labor}_j} . \]
This metric can be thought of as the weighted average proximity between industry i 
and the rest of the industry space, where the weights are proportional to both industry 
specialization patterns in the region and the centrality of the other industries.31 The 
highest calculated labor market proximity was 0.901 for Other Textile Product Mills 
(NAICS 3149) in the Dalton, GA MSA in year 2012, and the lowest proximity was 
0.540 for Colleges, Universities, and Professional Schools (NAICS 6113) in the same 
MSA in year 2010.  
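
To make the construction of the index concrete, the following minimal Python sketch computes location quotients and the weighted average proximity for one industry. All numbers are hypothetical, and the function and variable names are assumptions introduced for illustration.

```python
def location_quotient(region_counts, national_counts):
    """LQ_j: regional establishment share of j over its national share."""
    region_total = sum(region_counts.values())
    national_total = sum(national_counts.values())
    return {
        j: (region_counts[j] / region_total) / (national_counts[j] / national_total)
        for j in region_counts
    }


def weighted_proximity(i, lq, centrality, phi):
    """PROX_i: average of phi[i][j] over j != i, weighted by LQ_j * c_j."""
    others = [j for j in lq if j != i]
    numerator = sum(lq[j] * centrality[j] * phi[i][j] for j in others)
    denominator = sum(lq[j] * centrality[j] for j in others)
    return numerator / denominator


# Hypothetical establishment counts for one region and for the nation.
region_counts = {"A": 50, "B": 30, "C": 20}
national_counts = {"A": 200, "B": 500, "C": 300}
lq = location_quotient(region_counts, national_counts)

# Hypothetical centralities (scaled to [0, 1]) and pairwise proximities.
centrality = {"A": 1.0, "B": 0.8, "C": 0.5}
phi = {
    "A": {"B": 0.9, "C": 0.6},
    "B": {"A": 0.9, "C": 0.7},
    "C": {"A": 0.6, "B": 0.7},
}
prox_a = weighted_proximity("A", lq, centrality, phi)
print(lq, prox_a)
```

Since the weights are normalized by their sum, the index always lies between the smallest and largest pairwise proximity of the reference industry.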
3.3.3 Customer-supplier linkages 
Another key reason why firms agglomerate is to reduce transportation costs in 
obtaining inputs or in shipping goods to customers. The concentration of firms in a 
region enables them to share a pool of suppliers while at the same time being closer to 
customers. Following previous work (Ellison, Glaeser, and Kerr 2010; Jofre-Monseny, 
Marín-López, and Viladecans-Marsal 2011), I utilize data from the 2007 Benchmark 
Input-Output Accounts published by the Bureau of Economic Analysis (BEA) 
                                                 
31 The eigenvector centrality is scaled such that it takes values between 0 and 1. 
 
aggregated to the 4-digit industry level to calculate the pairwise proximity metric 
between two given industries based on customer-supplier relations.32 Specifically, the 
input-output proximity between two industries is calculated as 
\[ \phi^{IO}_{ij} = \max\left\{ \mathit{Input}_{i \leftarrow j},\ \mathit{Input}_{j \leftarrow i},\ \mathit{Output}_{i \rightarrow j},\ \mathit{Output}_{j \rightarrow i} \right\} \]
where $\mathit{Input}_{i \leftarrow j}$ is the share of industry i’s inputs that come from industry j, and 
$\mathit{Output}_{i \rightarrow j}$ is the share of industry i’s outputs sold to industry j.33 The maximum of 
the four values corresponding to input and output flows is used due to asymmetries 
within and across input and output measures, as well as the fact that many industries 
report output sales only to themselves (especially non-manufacturing industries), which 
in such cases creates isolates (i.e. nodes with no links) within the network. Even so, the 
bulk of the pairwise industry proximities exhibited a value close to zero, while the 
highest proximity value was 0.742 between Iron and Steel Mills and Ferroalloy 
Manufacturing (NAICS 3311) and Steel Product Manufacturing from Purchased Steel 
(NAICS 3312). 
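
A minimal Python sketch of the maximum-of-four-shares rule is given below. The flow table is hypothetical, the `FINAL` column standing in for final demand is an assumed simplification, and total inputs here cover only the purchases listed in the table.

```python
def io_proximity(flows, i, j):
    """phi_IO(i, j): the largest of the four input and output shares.

    flows[a][b] is the value of goods that industry a sells to purchaser b.
    """
    def total_output(a):
        # Everything industry a sells, including the assumed FINAL column.
        return sum(flows[a].values())

    def total_input(b):
        # Everything industry b buys from the industries in the table.
        return sum(row.get(b, 0.0) for row in flows.values())

    input_i_from_j = flows[j].get(i, 0.0) / total_input(i)
    input_j_from_i = flows[i].get(j, 0.0) / total_input(j)
    output_i_to_j = flows[i].get(j, 0.0) / total_output(i)
    output_j_to_i = flows[j].get(i, 0.0) / total_output(j)
    return max(input_i_from_j, input_j_from_i, output_i_to_j, output_j_to_i)


# Hypothetical flows between three industries plus final demand.
flows = {
    "A": {"A": 10.0, "B": 60.0, "C": 5.0, "FINAL": 25.0},
    "B": {"A": 5.0, "B": 5.0, "C": 20.0, "FINAL": 70.0},
    "C": {"A": 5.0, "B": 15.0, "C": 0.0, "FINAL": 180.0},
}
phi_ab = io_proximity(flows, "A", "B")
phi_ac = io_proximity(flows, "A", "C")
print(phi_ab, phi_ac)
```

Taking the maximum makes the measure symmetric in i and j even though the underlying input and output shares are not.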
 The eigenvector centralities are calculated in an analogous manner to that of 
the labor market pooling metric, using the proximities defined above. Then the index 
for input-output proximity for a particular MSA-industry pair is calculated as 
 
\[ \mathrm{PROX}^{IO}_{irt} = \frac{\sum_{j \neq i} \mathrm{LQ}_{jrt} \cdot c^{IO}_j \cdot \phi^{IO}_{ij}}{\sum_{j \neq i} \mathrm{LQ}_{jrt} \cdot c^{IO}_j} \]
                                                 
32 Specifically, I use the Make and Use tables (after redefinitions) to calculate the input-output 
proximities. 
33 These shares are calculated based on total industry input and output, which includes final demand 
and the government. 
 
where $c^{IO}_j$ is the eigenvector centrality based on input-output linkages. Again, this 
metric can be thought of as the weighted average proximity between a reference 
industry i and the rest of the industry space. The highest observed input-output 
proximity was 0.400 for Petroleum and Coal Products Manufacturing in the Midland, 
TX MSA in year 2005, while the lowest observed proximity was 0.0004 for Other 
Investment Pools and Funds (NAICS 5259) in the Morristown, TN MSA for the same 
year. 
3.3.4 Knowledge spillovers 
Despite their importance, knowledge spillovers are notoriously difficult to measure. 
They encompass many different areas of both economics and sociology, including 
growth theory and theories related to human capital accumulation. Unlike input 
sharing, knowledge spillovers are inherently a non-market exchange where the product 
is not bought or sold, and even in the case where there is an exchange, it is most likely 
to be a complicated venture between a variety of institutions (Rosenthal and Strange 
2004). Previous studies have mainly relied on direct evidence of knowledge spillovers 
through patent citation data (for example Glaeser and Kerr 2009; Jaffe, Trajtenberg, 
and Henderson 1993), or through Scherer’s (1984) technology matrix which measures 
R&D activity flows between industries. However, such sources only reflect flows of 
knowledge at the highest level, and arguably do not well represent the idea Marshall 
had in mind when mentioning how Sheffield cutlery workers took advantage of the 
secrets of their trade available as local public goods (Rosenthal and Strange 2004). 
Furthermore, in the case of patent citation data, the classification of industries only 
encompasses manufacturing industries, which renders related metrics useless when 
considering industries in other sectors such as services or trade. The use of Scherer’s 
technology matrix is also difficult to justify in that it is based on data that predates the 
current study by over 30 years. 
 To overcome these challenges, I calculate two related metrics for knowledge 
spillovers that take into consideration the opportunities for knowledge exchange 
within a given region. As such, the metrics are not region-industry specific, but rather 
are defined for a particular region as a whole.34 I argue that such regional 
metrics are more closely related to both Marshall’s and Jacobs’ views of how knowledge 
spillover takes place. It is important to note that, as mentioned previously, knowledge 
spillover theories overlap considerably with theories of community social capital, 
especially the bridging type, which emphasizes a diverse set of interactions 
within a region that are conducive to the accumulation of a wide variety of 
non-redundant information. As such, I view the derived metrics for knowledge spillovers 
also as a proxy for the bridging social capital of a region.  
 For the first metric, I make use of a detailed dataset on the universe of 
nonprofit organizations within the US collected as a collaboration between the 
National Center for Charitable Statistics (NCCS) and the Internal Revenue Service 
(IRS).35 This dataset provides detailed classifications as well as geographic data for all 
nonprofit organizations that are catalogued with the IRS for which forms 501(c) are 
filed. Nonprofits are classified into roughly 600 mutually exclusive groups based on 
                                                 
34 Other studies, such as Rauch (1993), consider average levels of education as a proxy for knowledge 
spillover capacity. 
35 The data are not publicly available, but are disclosed to qualified researchers at a minimal cost. 
Detailed information on the dataset as well as the classification scheme is provided at 
http://nccs.urban.org/Learn-About-NCCS-Data.cfm. 
 
the National Taxonomy of Exempt Entities (NTEE) and encompass a broad variety of 
organizations ranging from arts and humanities organizations to religious 
congregations. To measure knowledge spillover capacity, within the set of nonprofits I 
only consider organizations that are classified within the education, medical research, 
science and technology, and social science groups, as well as those that report as being 
research institutes or organizations that conduct public policy analysis.36 This resulted 
in a total of 127 organization types being included. While this set of organizations is 
admittedly limited, most educational institutions (including those that are outside the 
formal educational system) are nonprofits, and nonprofits that primarily engage in 
research activity are likely to be involved in activities within a region through which 
informal knowledge spillover takes place. Furthermore, these organizations include 
those that primarily engage in charitable giving (such as grants) within the knowledge 
sector, which is another important form of spillover activity. As such, the presence of 
these organizations in a region should proxy for informal knowledge spillovers that 
are not captured by higher level measures such as patents, which are mainly filed for 
commercial gain.37 
 I define an organizational network for the informal knowledge sector where the 
different organization types are the nodes, and the linkages are defined as the revealed 
agglomeration patterns of these organizations across the MSAs. Specifically, I define 
                                                 
36 These correspond to NTEE major codes B (education), H (medical research), U (science and 
technology), and V (social science), as well as nonprofits that are classified under the subgroup 
“Research Institutes & Public Policy Analysis” under all major codes. 
37 Nonetheless, the metrics for knowledge spillover are weaker than those for input-output linkages or 
labor pooling, and as such it is anticipated that their relationship with entrepreneurship will be 
weaker. Furthermore, these metrics are not mutually exclusive with the input-output or labor pooling 
metrics, as knowledge spillovers may occur through input-output or labor market interactions. 
 
the pairwise proximity between any two organization groups as the correlation across 
regions of establishment shares, or 
\[ \phi^{inf}_{ij} = \mathrm{Corr}_s\left(E_{ir}, E_{jr}\right) \]
where $E_{ir}$ is the establishment share of group i in region r.38 The idea is that if two 
organization types are closely related due to similar interests or requirements of 
physical factors, information, or technology, they will tend to be located in tandem, 
while dissimilar types will be less likely to be collocated within the same region.39 
Measured this way, the highest calculated pairwise proximity was 0.468 for Eye 
Diseases, Blindness & Vision Impairments Research (NTEE H41) and Surgical 
Specialties Research (NTEE H9B), while the lowest was surprisingly for Scholarships 
& Student Financial Aid (NTEE B82) and Student Sororities & Fraternities (NTEE 
B83) with a value of -0.456.  
 To capture the density of knowledge spillover linkages within a region, I 
calculate a network density measure that captures the average proximity between all 
organization groups, weighted by the relative presence of these organizations 
in the region, where 
\[ \mathrm{DEN}^{inf}_{rt} = \sum_i \sum_j E_{irt} \cdot E_{jrt} \cdot \phi^{inf}_{ij} . \]
Higher density values would suggest that there is less knowledge spillover capacity 
within a region, as the knowledge pool would be limited to relatively like-minded 
                                                 
38 As for the labor pooling metric, I pool across the panel years to reduce variability due to erroneous 
external shocks. 
39 In this sense it is related in concept to the Ellison-Glaeser (EG) index of industry coagglomeration 
(Ellison, Glaeser, and Kerr 2010). 
 
organizations.40 The lowest density value was 0.405 for Fort Collins-Loveland, CO in 
year 2008, while the highest value was 0.542 for St. George, UT in the same year.  
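
The density measure can be sketched in Python as follows. The sketch rests on two assumptions that are not stated in the text: correlations are rescaled linearly from [-1, 1] to [0, 1], and the double sum includes the diagonal terms with a proximity of 1. The organization groups, correlations, and shares are hypothetical.

```python
def rescale(corr):
    """Map a correlation from [-1, 1] to [0, 1] (assumed linear rescaling)."""
    return (corr + 1.0) / 2.0


def network_density(shares, phi):
    """DEN_r: sum over all i and j of E_i * E_j * phi[i][j]."""
    groups = list(shares)
    return sum(shares[i] * shares[j] * phi[i][j] for i in groups for j in groups)


# Hypothetical correlations between three organization groups.
raw_corr = {
    "X": {"X": 1.0, "Y": 0.4, "Z": -0.2},
    "Y": {"X": 0.4, "Y": 1.0, "Z": 0.0},
    "Z": {"X": -0.2, "Y": 0.0, "Z": 1.0},
}
phi = {i: {j: rescale(v) for j, v in row.items()} for i, row in raw_corr.items()}

# Establishment shares in two hypothetical regions (each sums to 1).
concentrated = {"X": 0.5, "Y": 0.5, "Z": 0.0}  # only two closely related groups
diverse = {"X": 0.4, "Y": 0.2, "Z": 0.4}       # spread over dissimilar groups
den_concentrated = network_density(concentrated, phi)
den_diverse = network_density(diverse, phi)
print(den_concentrated, den_diverse)
```

In this example the region whose organizations are spread across dissimilar groups receives the lower density value.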
 The second knowledge spillover metric aims to capture the formal type of 
spillovers that are mainly due to commercialization of knowledge. I utilize annual 
MSA level patenting data from the US Patent and Trademark Office (USPTO), where 
patents are classified based on a detailed system that includes 473 patent 
classifications. To create a measure of the knowledge pool of the region, I constructed 
moving totals for patenting activity beginning with the year 2000, which was the 
earliest year for which MSA level data were available. Thus the sum of accepted 
patents for the years 2000 to 2005 represents the knowledge pool for the year 2005 (the 
beginning of the panel), the sum for years 2001 to 2006 corresponds to 2006 levels, 
and so forth. The pairwise proximity and density ($\mathrm{DEN}^{for}_{rt}$) metrics were calculated 
analogously as for the informal knowledge spillover metric, where the underlying 
network consisted of the patent classes as nodes and correlations (proximities between 
patent classes) as linkages. The highest pairwise proximity was 0.787 for Active solid-
state devices (class 257) and Semiconductor device manufacturing (class 438), while 
the lowest was -0.120 for Static structures (class 052) and Single-crystal, oriented-
crystal, and epitaxy growth processes (class 117). The highest density value was 0.736 
for Burlington-South Burlington, VT in year 2012, and the lowest was 0.557 for 
Valdosta, GA in year 2010. 
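
The moving totals described above can be illustrated with a short Python sketch. The six-year window follows from the 2000 to 2005 example in the text; the patent counts and the function name are hypothetical.

```python
def moving_totals(counts_by_year, window=6):
    """Sum counts over the `window` years ending in each year.

    Years without a full window of preceding data are omitted.
    """
    totals = {}
    for year in sorted(counts_by_year):
        span = range(year - window + 1, year + 1)
        if all(y in counts_by_year for y in span):
            totals[year] = sum(counts_by_year[y] for y in span)
    return totals


# Hypothetical annual patent counts for one MSA and one patent class.
patents = {2000: 10, 2001: 12, 2002: 9, 2003: 15,
           2004: 11, 2005: 13, 2006: 14, 2007: 8}
pool = moving_totals(patents)
print(pool)
```

The first available value is for 2005, which matches the beginning of the panel.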
 
3.3.5 Bonding social capital 
                                                 
40 Again, the correlations are rescaled to take values between 0 and 1. 
 
In order to capture the bonding social capital of a region, I again utilized the NCCS 
nonprofit dataset, in this case excluding the organizations that were used to construct 
the informal knowledge spillover metric. In addition, I excluded some organization 
groups that did not directly relate to community activity, such as public utilities and 
transportation systems, corporate foundations, insurance providers, or other pension or 
retirement funds.41 This resulted in a total of 467 nonprofit groups, for which the 
pairwise proximities were calculated, analogous to the metrics for knowledge 
spillovers. The highest proximity was 0.605 for Christianity (NTEE X20) and 
Protestant (NTEE X21) congregations, while the lowest was -0.459 for private 
independent charities (NTEE T22) and community service clubs (NTEE S80). For the 
density metric ($\mathrm{DEN}^{social}_{rt}$), the highest value was 0.535 for Winston-Salem, NC in 
year 2008, while the lowest value was 0.482 for Augusta-Richmond County, GA-SC 
in year 2012. In contrast to the knowledge spillover metrics, which proxy for the bridging 
social capital of a region, a higher density value here would, according to social capital 
theory, be suggestive of a stronger community in which more social interactions and 
face-to-face contact are present. 
3.3.6 Other factors 
In addition to the key metrics described above, I included other factors that could 
influence entrepreneurship. Most importantly, following prior work (Delgado, Porter, 
                                                 
41 Specifically, corporate foundations (NTEE T21), public transportation systems (NTEE W40), public 
utilities (NTEE W80), insurance providers (NTEE Y20), state sponsored worker compensation 
reinsurance organizations (NTEE Y25), pension and retirement funds (NTEE Y30), teacher’s retirement 
fund associations (NTEE Y33), employee funded pension trusts (NTEE Y34), and multi-employer 
pension plans (NTEE Y35) were excluded. I also excluded unclassified organizations and organizations 
for which the headquarters were outside of the US. 
 
and Stern 2010; Glaeser et al. 1992; Porter 2003), I included the location quotient 
(LQ𝑖𝑟𝑡) of the MSA-industry pair as a measure of specialization. This measure proxies 
for the degree to which the industry is over-represented in the MSA, and is theorized 
to have a strong positive effect on entry rates for a given MSA-industry pair. In 
addition, I included the total number of nonprofit organizations per capita (NPPC𝑟𝑡) as
well as the total number of patents per capita (PATPC𝑟𝑡) to distinguish the aggregate size
effects of nonprofit organization presence and patent activity from the effects of the
metrics described in the previous section.
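As an illustration of the specialization measure, the following minimal sketch computes a location quotient under its standard definition: the industry's share of regional activity divided by its share of national activity. Whether activity is measured by employment or by establishment counts is not stated in this section, so the function name, argument names, and numbers are all hypothetical.

```python
def location_quotient(activity_ir, activity_r, activity_i, activity_total):
    """Standard LQ: (industry share in the region) / (industry share nationally).

    The activity measure (employment or establishment counts) is an assumption;
    the source does not specify it in this section.
    """
    if activity_r <= 0 or activity_i <= 0 or activity_total <= 0:
        raise ValueError("regional, industry, and national totals must be positive")
    regional_share = activity_ir / activity_r
    national_share = activity_i / activity_total
    return regional_share / national_share


# Hypothetical MSA-industry pair: the industry holds 5% of regional activity
# but only 2% of national activity, so it is over-represented in the MSA.
lq = location_quotient(500, 10_000, 20_000, 1_000_000)
```

A value above 1 indicates over-representation of the industry in the MSA, and a value of 0 arises when the industry is absent from the region.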
 I also included several control variables that have been theorized to impact 
new firm entry. I included industrial diversity, calculated as the Hirschman-Herfindahl 
index (HHI𝑟𝑡) based on 4-digit industry establishment counts for a given MSA. Most 
notably, Jacobs (1969) and Glaeser et al. (1992) suggest that industrial diversity may 
be a potential source of knowledge spillovers, in which case it would positively affect 
entrepreneurship. However, higher diversity may also represent the lack of strong 
clusters (Porter 2003) and localization economies, in which case the effect would be 
negative. Thus the direction of the relationship between diversity and entrepreneurship 
is ambiguous. 
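For concreteness, a Hirschman-Herfindahl index over industry establishment counts can be sketched as follows. This uses the conventional sum of squared shares, under which a higher value indicates more concentration (less diversity); any rescaling applied in the analysis is not described in this section, and the counts are hypothetical.

```python
def hhi(establishment_counts):
    """Conventional HHI: sum of squared industry shares within one MSA.

    Ranges from 1/N (N equally sized industries) to 1 (a single industry).
    """
    total = sum(establishment_counts)
    if total <= 0:
        raise ValueError("at least one establishment is required")
    return sum((count / total) ** 2 for count in establishment_counts)


# Hypothetical MSAs with four industries each
even_msa = hhi([25, 25, 25, 25])      # perfectly even structure
skewed_msa = hhi([85, 5, 5, 5])       # one dominant industry
```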
 In addition, I included a metric of market access (MA𝑟𝑡) that proxies for the 
relative size of markets in neighboring regions. Higher market access would mean a 
higher potential for opportunities for interactions with neighboring regions, yet it may 
also result in the crowding out of local industries due to regional competition. As such, 
its relationship with entrepreneurship is also ambiguous. I calculate market access as  
 
$$\mathrm{MA}_{rt} = \sum_{s \neq r} \frac{\mathrm{POP}_{st}}{d_{rs}^{2}}$$
where POP𝑠𝑡 is the population of the neighboring region and $d_{rs}^{2}$ is the square of the
distance between the centroids of the MSAs. I set a threshold value of 300 miles in 
calculating this metric to reflect a reasonable distance for which motor vehicle travel 
could possibly occur within a day’s journey (Mukim 2014).42  
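The market access calculation can be sketched as follows. The sketch assumes that the 300-mile threshold works as a cutoff, so that MSAs farther away contribute nothing to the sum; the source does not spell out how the threshold is applied. The region labels, populations, and distances are hypothetical.

```python
def market_access(region, populations, distances, threshold_miles=300.0):
    """Sum POP_s / d_rs^2 over all other regions s within the distance threshold.

    `distances` maps an unordered pair of regions (a frozenset) to miles.
    Treating the threshold as a hard cutoff is an assumption.
    """
    total = 0.0
    for other, population in populations.items():
        if other == region:
            continue  # the sum runs over s != r
        d = distances[frozenset((region, other))]
        if 0 < d <= threshold_miles:
            total += population / d ** 2
    return total


# Hypothetical three-region example
populations = {"A": 100_000, "B": 400_000, "C": 900_000}
distances = {
    frozenset(("A", "B")): 100.0,
    frozenset(("A", "C")): 400.0,  # beyond the cutoff, so excluded for A and C
    frozenset(("B", "C")): 300.0,
}
ma_a = market_access("A", populations, distances)
ma_b = market_access("B", populations, distances)
```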
I also included homeownership rates (HOME𝑟𝑡) as well as the percentage of 
the population that is foreign born (FOREIGN𝑟𝑡) to control for other possible social 
factors that may confound the key variables of interest. Glaeser (2001) suggests that a 
significant determinant of social capital within a region is homeownership rates, for 
homeownership may create direct financial incentives for investment in social capital. 
The percentage of foreign born population has also been viewed to influence not only 
the social capital of a region (Wilson and Portes 1980), but also to be directly 
correlated with entrepreneurship rates (Reynolds and Curtin 2009), for foreigners are 
documented to exhibit significantly higher rates of entrepreneurship compared to 
native groups. Finally, I included the standard population (POP𝑟𝑡), per capita income 
(PCINC𝑟𝑡), and educational attainment (EDUC𝑟𝑡, measured as the percentage of the 
population 25 to 65 with a bachelor’s degree or higher) variables to control for other 
MSA level characteristics that may impact entrepreneurship rates.43 All demographic 
variables were calculated using American Community Survey (ACS) 1 year 
                                                 
42 The centroids and their distances are calculated using county distance data housed within the 
National Bureau of Economic Research (NBER) database. MSA centroids are calculated as the centroid 
of the county within the MSA that was home to the largest fraction of the MSA population.  
43 Glaeser (2001) also points out that education levels are strong predictors of social capital, for more 
educated individuals are likely to invest more in their social connections. 
 
Table 3.2. Select descriptive statistics for variables (N = 803,136)

Variables                                          Mean       Std. Dev.   Min        Max
-------------------------------------------------------------------------------------------
Dependent variables (count of births)
  Births of single establishments (B^single_irt)   4.877      32.30       0          2,911
  Births of all establishments (B^all_irt)         6.143      36.06       0          3,034

Industry by MSA by year characteristics
  Location quotient (LQ_irt)                       1.069      2.183       0          184.7
  Labor market proximity (PROX^labor_irt)          0.776      0.0568      0.540      0.901
  Input-Output proximity (PROX^IO_irt)             9.32E-03   0.0107      4.23E-04   0.400

MSA by year characteristics
  Informal knowledge spillovers (DEN^inf_rt)       0.475      0.0219      0.405      0.542
  Formal knowledge spillovers (DEN^for_rt)         0.604      0.0257      0.557      0.736
  Bonding social capital (DEN^social_rt)           0.506      0.0041      0.482      0.535
  Nonprofit organizations, per 1,000 (NPPC_rt)     4.691      1.548       1.150      18.02
  Accepted patents, per 1,000 (PATPC_rt)           1.442      2.346       0.0305     28.58

Controls
  Industrial diversity (HHI_rt)                    0.0151     0.00189     0.0120     0.0307
  Market access (MA_rt)                            2,455      2,161       51.09      18,289
  Population (POP_rt)                              710,705    1.58E+06    69,922     1.92E+07
  Per capita income (PCINC_rt)                     36,133     7,343       17,881     97,392
  Educational attainment, % (EDUC_rt)              25.44      8.025       10         59.10
  Foreign born population, % (FOREIGN_rt)          7.695      6.771       0.460      38.82
  Homeownership rate, % (HOME_rt)                  67.19      5.687       47.41      85.24
  Unemployment rate, % (UNEMP_rt)                  7.040      2.987       2.017      28.90

 
estimates44 while per capita income was calculated using BEA Regional Economic 
Accounts. Table 3.2 presents select descriptive statistics for the variables included in 
the analysis (the correlation matrix for the variables is presented in Appendix A).
Notably, the dependent variable (count of firm births) is highly
skewed, with roughly 65% of the MSA-industry-year observations exhibiting birth
counts of zero. This raises technical issues for the econometric specification of
the model: OLS estimation based on the logged values of the dependent variable
cannot be fruitfully carried out, as most observations are dropped when logged values
are used. Furthermore, common transformations, such as adding 1 and subsequently
taking logs, are also problematic given the high percentage of zero values. The
next section highlights these issues and the estimation strategy used for the empirical 
analysis. 
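The problem with logging a zero-heavy count can be seen in a small sketch. The birth counts are hypothetical and chosen only so that most entries are zero; the function name is likewise illustrative.

```python
import math


def log_transform_summary(births):
    """Compare log(y), which drops zeros, with log(y + 1), which keeps them."""
    logged = [math.log(b) for b in births if b > 0]   # log(0) is undefined, so zeros drop out
    shifted = [math.log(b + 1) for b in births]       # zeros are mapped to log(1) = 0
    return {
        "zero_share": births.count(0) / len(births),
        "n_logged": len(logged),
        "n_shifted_at_zero": shifted.count(0.0),
    }


# Hypothetical MSA-industry-year birth counts
births = [0, 0, 3, 0, 12, 0, 1, 0, 0, 45]
summary = log_transform_summary(births)
```

With these counts, the plain log keeps only four of ten observations, while the add-one transformation retains all ten but piles six of them up at exactly zero, so a linear model would be fit mostly to a mass point.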
 
3.4. Empirical framework 
3.4.1 Model specification 
I based the estimation framework on a location choice model of entrepreneurs where 
an establishment is born when it is possible to earn non-negative profits within a 
region, taking the existing economic environment as given (see Rosenthal and Strange 
(2003) for a review of the model). In this sense, regional characteristics which increase 
productivity will result in higher levels of establishment births, and entrepreneurs 
                                                 
44 In cases where 1 year estimates at the county level were unavailable, I utilized 3 year ACS estimates. 
In cases where county level data was not available throughout the panel years, I utilized MSA level 
data. This resulted in a minor source of measurement error, for MSA definitions are not constant for 
the ACS over the panel years. However, the number of MSAs affected was very small, and even in 
these cases the definition changes mostly applied to smaller counties within a given MSA, rendering 
these errors minimal. 
 
compare profitability across locations. It is assumed that location decisions and decisions to
found a new establishment are made at time t – 1, and establishments are born in the
subsequent time period t. Since the main outcome variable of interest is the count of
new establishments in an MSA-industry pair at time t, Poisson estimates of the
coefficients can also be given a random profit maximization interpretation
(Guimaraes, Figueiredo, and Woodward 2003; Jofre-Monseny, Marín-López, and
Viladecans-Marsal 2011). Hence the baseline specification of the model is
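$$
\begin{aligned}
E(\mathrm{B}_{irt}) = \exp\big(&\alpha + \beta_0 \mathit{BDUM}_{ir0} + \beta_1 \mathrm{LQ}_{irt-1} + \beta_2 \mathrm{PROX}^{labor}_{irt-1} + \beta_3 \mathrm{PROX}^{IO}_{irt-1} \\
&+ \beta_5 \mathrm{DEN}^{inf}_{rt-1} + \beta_6 \mathrm{DEN}^{for}_{rt-1} + \beta_4 \mathrm{DEN}^{social}_{rt-1} + \beta_7 \mathrm{NPPC}_{rt-1} \\
&+ \beta_8 \mathrm{PATPC}_{rt-1} + \mathbf{X}_{rt-1}\boldsymbol{\gamma} + \mathrm{I}_i + \mathrm{R}_r + \mathrm{T}_t\big)
\end{aligned}
$$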
where I𝑖, R𝑟, and T𝑡 are the set of industry, MSA, and year fixed effects (for a total of 
646 fixed effects in the model) and X𝑟𝑡−1 is the set of control variables described in 
the previous section. Following Blundell et al. (1995) and Delgado et al. (2010), I also 
include an indicator variable for any pre-existing start-up activity for the years 2003 
and 2004 (𝐵𝐷𝑈𝑀𝑖𝑟0) to control for additional unobservable characteristics of MSA-
industry pairs which may impact establishment births. All of the explanatory variables 
are logged, and standardized to have mean zero and unit standard deviation to aid 
interpretation (Ellison, Glaeser, and Kerr 2010; Glaeser and Kerr 2009).45  
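The transformation of the explanatory variables can be sketched as follows: take logs (adding 1 first where zeros occur, as is done for the location quotient), then rescale to mean zero and unit standard deviation. Whether the population or sample standard deviation is used is not stated, so the population version here is an assumption, and the values are hypothetical.

```python
import math
import statistics


def log_standardize(values, shift=0.0):
    """Log-transform (after an optional shift), then standardize to mean 0 and SD 1."""
    logged = [math.log(v + shift) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # population SD is an assumption; sample SD is equally plausible
    return [(x - mean) / sd for x in logged]


# Hypothetical location quotients, including a zero
lq = [0.0, 0.5, 1.0, 2.0, 4.0]
lq_z = log_standardize(lq, shift=1.0)         # log(LQ + 1), then standardized
lq_is_zero = [1 if v == 0 else 0 for v in lq] # companion indicator for LQ = 0
```

After this rescaling, each coefficient refers to a one standard deviation change in the logged variable, which is what allows magnitudes to be compared across regressors.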
 As mentioned previously, high skewness and the large number of zero births 
for MSA-industry pairs present a problem in linear estimation, which is one reason
                                                 
45 Some of the MSA-industry pairs have zero values for the location quotients, and as such I add 1 to
the location quotient values before the log transformation. A dummy variable that indicates whether
LQ𝑖𝑟𝑡−1 = 0 was also included.
 
why the Poisson model is preferred. Furthermore, other non-linear models such as the 
Tobit or Negative Binomial suffer from the incidental parameters problem, where a 
large number of fixed effects leads to inconsistent estimation of the parameters under 
fixed 𝑇, 𝑁 → ∞ asymptotics, since the number of parameters that need to be estimated 
grows arbitrarily large (Chamberlain 1980; Hsiao 1986). The Poisson model does not
suffer from such bias. Moreover, its consistency does not rest on additional assumptions
concerning the distribution of the dependent variable with respect to the covariates
(unlike the negative binomial), and the mean-variance equality restriction of the
Poisson model may be relaxed using fully robust standard errors clustered at the panel
level (Cameron and Trivedi 2013; Wooldridge 2010). Nonetheless, as has been
previously noted (Rosenthal and Strange 2003), the problem of noisy estimates of 
fixed effects decreases as the number of observations per fixed effect grows large (in 
this case over 2,000), and thus as a robustness check I run the same regression using a 
fixed effects Probit model with a dummy for positive or zero births as the dependent 
variable.46  
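The claim that Poisson consistency rests only on a correctly specified conditional mean can be illustrated with a toy example. The sketch below is not the estimation routine used in this chapter: it fits a one-regressor exponential mean by Newton-Raphson on hypothetical data that are deliberately not Poisson distributed (each count is either zero or twice its group mean), and still recovers the group means.

```python
import math


def fit_poisson(x, y, max_iter=50, tol=1e-12):
    """Newton-Raphson for E[y | x] = exp(a + b * x); returns (a, b)."""
    a = math.log(sum(y) / len(y))  # start from the log of the overall mean count
    b = 0.0
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score of the Poisson pseudo-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix (negative Hessian), 2 x 2
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical counts: mean 1 when x = 0 and mean 3 when x = 1, with many zeros
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 2, 0, 2, 0, 6, 0, 6]
a_hat, b_hat = fit_poisson(x, y)
```

With a binary regressor the fitted means equal the group averages, so the slope equals the log of the ratio of the two means regardless of how dispersed the counts are. Valid inference in such a setting would still require robust standard errors, which this sketch does not compute.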
3.4.2 Endogeneity concerns 
It is important to note that there may be a number of other explanations for variations 
in new establishment counts in a particular MSA-industry pair. Most importantly, 
                                                 
46 I also conducted a preliminary analysis using a zero-inflated negative binomial regression (not 
reported), utilizing the Chamberlain-Mundlak Conditionally Correlated Random Effects (CCRE) model 
with cluster means in place of the fixed effects to check whether the large number of zeros and 
possible overdispersion affected the results. However, results were qualitatively similar to those of the
Poisson model, which suggests that the large number of fixed effects and relevant controls adequately 
explained the excess zeros and apparent overdispersion in the count data. In addition, due to the large 
number of observations and variables as well as the complexity of two-step models, this method 
generated convergence problems in the estimation routines, which is why the Poisson model was 
preferred. 
 
natural advantages of a region (such as proximity to an airport, rivers, or the sea) 
should positively impact establishment births regardless of Marshallian forces 
(Ellison, Glaeser, and Kerr 2010) or the social capital of a region, which may result in 
the agglomeration of new firms being the cause, and not the result, of various inter-industry
relations. Other difficult-to-measure factors such as the business culture of a
region may also impact entry of new businesses. I include the full range of MSA, 
industry, and year fixed effects together to control for such unobservables. MSA fixed 
effects control for time-invariant characteristics such as natural endowments, climate, 
or other geographic features of the region, while industry fixed effects control for 
industry characteristics that are constant over time. Year fixed effects control for time-
specific shocks such as macroeconomic conditions or business cycles, which is 
especially important in this setting due to the most recent recessionary years being 
included in the analysis.  
Using the count of new firms as the dependent variable also partially addresses 
omitted variables and simultaneity biases. Rosenthal and Strange (2003) point out that 
entrepreneurs are unconstrained by previous decisions and make location choices 
taking the existing economic environment as exogenously given. Furthermore, Becker 
and Henderson (2000) point out that time persistent location determinants can be 
successfully controlled for by conditioning firm births on the stock of pre-existing 
firms, which is done in this setting with the inclusion of the location quotient (as well 
as the number of nonprofit organizations). For the social capital variables, I explicitly 
consider the opportunity for social interactions through the existing characteristics of 
nonprofit organizations in the region, rather than through other variables such as 
 
measures of trust or reciprocity (possibly obtained from survey data) which have been
criticized as vague and plagued by endogeneity issues (Durlauf 2002; Glaeser
2001). Thus these social variables can be viewed in the same perspective as other 
industry characteristics, and suffer less from endogeneity. Nonetheless, I lack enough 
variation in the data to include MSA by industry fixed effects, and am unable to fully 
control for omitted variables at the MSA-industry level (such as city policies that favor 
specific industries).47 However, the inclusion of the indicator variable for start-up
activity pre-dating the panel years is intended to absorb a portion of these
unobservables. Overall, even with the careful selection of control variables and arsenal
of fixed effects, I am cautious about interpreting the results as evidence of causality and
instead interpret them as partial correlations.
3.5. Results 
3.5.1 All industries  
I first report the baseline results for the Poisson and Probit models, including all 
industries as well as the full set of fixed effects. The Poisson estimates can be 
interpreted as approximately a 𝛽 × 100 percent increase in the count of new establishments for a 1
standard deviation increase in an explanatory variable. In other words, a 1 standard
deviation increase in labor market proximity ($\mathrm{PROX}^{labor}_{irt}$) for an MSA-industry pair
would increase the count of new establishment births by approximately 16.2% (Table 3.3). The Probit
model reports average marginal effects, and can be interpreted as a 𝛽 change in the 
probability of positive births for a 1 standard deviation increase in an explanatory   
                                                 
47 This also presents a problem in estimation as the inclusion of MSA by industry fixed effects excludes 
nearly half of the observations, for the fixed effects perfectly predict outcomes for MSA-industry pairs 
that experience zero births throughout the panel years. 
 
Table 3.3. Births of single (start-up) and all establishments

                  Poisson                              Probit
Variables         Single start-ups   All start-ups     Single start-ups   All start-ups
                  (1)                (2)               (3)                (4)
----------------------------------------------------------------------------------------
BDUM_ir0          0.125***           0.091***          0.019***           0.023***
                  (0.009)            (0.008)           (0.001)            (0.001)
LQ_irt            0.721***           0.714***          0.062***           0.066***
                  (0.008)            (0.007)           (0.001)            (0.001)
PROX^labor_irt    0.162***           0.115***          0.062***           0.065***
                  (0.030)            (0.023)           (0.003)            (0.003)
PROX^IO_irt       0.004              0.015             0.012***           0.013***
                  (0.013)            (0.010)           (0.001)            (0.001)
DEN^inf_rt        -0.039***          -0.041***         -0.003*            -0.003*
                  (0.008)            (0.008)           (0.002)            (0.002)
DEN^for_rt        -0.018*            -0.006            -0.003             -0.004*
                  (0.010)            (0.010)           (0.002)            (0.002)
DEN^social_rt     0.059***           0.060***          0.003**            0.004***
                  (0.006)            (0.006)           (0.001)            (0.001)
NPPC_rt           -0.056***          -0.065***         -0.006*            -0.008**
                  (0.013)            (0.013)           (0.003)            (0.003)
PATPC_rt          0.079***           0.083***          0.013***           0.013***
                  (0.015)            (0.015)           (0.003)            (0.003)
HHI_rt            0.039***           0.035***          0.003*             0.001
                  (0.006)            (0.007)           (0.002)            (0.002)
MA_rt             -0.574***          -0.440***         -0.026             -0.071**
                  (0.149)            (0.149)           (0.033)            (0.034)
POP_rt            -0.086             -0.106            0.007              0.027
                  (0.106)            (0.105)           (0.026)            (0.026)
PCINC_rt          0.074***           0.089***          0.004              0.007**
                  (0.011)            (0.011)           (0.003)            (0.003)
EDUC_rt           -0.020**           -0.016*           0.000              0.002
                  (0.008)            (0.008)           (0.002)            (0.002)
FOREIGN_rt        -0.051***          -0.044***         -0.001             -0.002
                  (0.009)            (0.009)           (0.002)            (0.002)
HOME_rt           -0.002             -0.001            0.001              0.002
                  (0.006)            (0.006)           (0.001)            (0.001)
UNEMP_rt          -0.018***          -0.010            0.003**            0.003*
                  (0.007)            (0.006)           (0.001)            (0.002)

Fixed effects     Yes                Yes               Yes                Yes
Observations      803,136            803,136           803,136            803,136

Notes: Columns 3 and 4 report average marginal effects. To aid convergence (especially with
likelihoods of very large magnitude), the dependent variables for the Poisson models are divided by
1E+06. This has no effect on the parameter estimates nor the standard errors, and only affects the
absolute magnitudes of the log likelihoods. Relative magnitudes among Poisson models remain
relevant. In parentheses are panel robust standard errors clustered at the MSA by industry level. All
variables are logged, and are standardized to have unit standard deviation to aid interpretation.
*** p<0.01, ** p<0.05, * p<0.1
 
variable. Thus for example, a 1 standard deviation increase in input-output proximity 
would increase the probability of new establishment births by 0.012. The explanatory 
variables are standardized, and thus I am able to compare the relative magnitude of 
their effects on the dependent variables of interest.  
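The 𝛽 × 100 reading of a Poisson coefficient is a first-order approximation; the exact proportional change implied by an exponential mean function is exp(𝛽) − 1. The short sketch below, with a hypothetical helper name, compares the two readings for two coefficients taken from column 1 of Table 3.3.

```python
import math


def exact_percent_change(beta):
    """Exact percent change in the expected count for a one-unit increase in a regressor."""
    return (math.exp(beta) - 1.0) * 100.0


def approx_percent_change(beta):
    """First-order approximation, accurate when beta is close to zero."""
    return beta * 100.0


# Coefficients from Table 3.3, column 1
labor_exact = exact_percent_change(0.162)   # labor market proximity
lq_exact = exact_percent_change(0.721)      # location quotient
```

The two readings are close for small coefficients such as 0.162 (about 17.6% exact against 16.2% approximate) but diverge for large ones such as 0.721, where the exact change is a little over 105%. Comparisons of relative magnitude across variables are unaffected, since exp(𝛽) − 1 is increasing in 𝛽.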
 For both the Poisson and Probit models, it can be seen that specialization 
(LQ𝑖𝑟𝑡) within a given MSA-industry pair is strongly associated with an increase in the 
count of new establishments, regardless of whether only single unit start-ups or all 
start-ups are considered. In the Poisson model, the effect of specialization is much 
stronger than the other variables of interest, and this strong relationship is consistent 
with previous work suggesting that strongly specialized clusters are conducive to start-
up activity (Delgado, Porter, and Stern 2010; Porter 1998). The strong negative impact 
of the market access (MA𝑟𝑡) variable across specifications suggests that in general the 
crowding out effects due to a large market outside of the region overwhelm the
benefits from gaining more opportunities for cross-regional interactions. Surprisingly, 
the percentage of foreign born population in the region (FOREIGN𝑟𝑡) is seen to 
negatively impact new establishment births. However, pooled regressions without 
MSA fixed effects (not reported here) show this variable to be strongly positively 
correlated with new firm formation. This suggests that while MSAs with more 
foreigners do experience more establishment births, when considering within MSA 
variation this effect is negative (i.e. an increase in foreigners within a region results in 
lower establishment birth counts). Overall, while the coefficient signs and relative 
magnitudes are generally similar, the Probit estimates are much
less precise than their Poisson counterparts, suggesting that much
 
information is lost when considering establishment births of any magnitude to be 
equal. 
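The sign reversal for the foreign born variable between pooled and fixed effects regressions can be reproduced in a toy setting. The sketch below uses hypothetical data for two MSAs and a simple linear slope rather than the Poisson model: the pooled slope is driven by differences between MSAs, while demeaning by MSA (the linear analogue of MSA fixed effects) isolates within-MSA variation.

```python
def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(groups, values):
    """Subtract each group's mean, leaving only within-group variation."""
    sums, counts = {}, {}
    for g, v in zip(groups, values):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


# Hypothetical panel: MSA B has more foreign born residents and more births,
# yet within each MSA births fall as the foreign born share rises.
msa = ["A", "A", "A", "B", "B", "B"]
foreign = [1, 2, 3, 11, 12, 13]
births = [5, 4, 3, 25, 24, 23]

pooled = ols_slope(foreign, births)
within = ols_slope(demean_by_group(msa, foreign), demean_by_group(msa, births))
```

Here the pooled slope is positive and the within slope is negative, mirroring the pattern described above.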
 When comparing the relative strength of the Marshallian factors, it can be seen 
that labor market proximity is the dominating force. This is consistent with previous 
work (Jofre-Monseny, Marín-López, and Viladecans-Marsal 2011; Rosenthal and 
Strange 2001), which finds that labor market pooling exerts the most robust effect,
compared to input-output linkages or knowledge spillovers. Surprisingly, input-output 
proximity (PROX𝐼𝑂𝑖𝑟𝑡) is insignificant in the Poisson specification while strongly 
significant in the Probit model, suggesting that different mechanisms for input-output 
linkages govern birth probabilities as opposed to the count of new firm births. The 
insignificance of the input-output proximity variable in the Poisson specification can 
be explained when considering that the dataset encompasses all types of industries. 
Observing the underlying BEA Input-Output accounts data, it can be seen that only the
manufacturing industries are largely dependent on customer-supplier connections, 
while other industries are not so reliant on such linkages. 
 The knowledge spillover and social capital variables are consistent with 
agglomeration and social capital theory. The estimates imply that more diverse 
knowledge pools (more bridging social capital) positively affect new firm formation 
(captured by the negative coefficients for the $\mathrm{DEN}^{inf}_{rt}$ and $\mathrm{DEN}^{for}_{rt}$ variables), and
that strong local ties within the community ($\mathrm{DEN}^{social}_{rt}$) also benefit entrepreneurship.
The effect on new establishment births is stronger for the informal spillover metric, 
which could be due to the weakness of the formal knowledge spillover metric which 
only considers the highest level of knowledge creation captured by patenting activity. 
 
However, an increase in the number of patents per capita (PATPC𝑟𝑡) is seen to be 
strongly correlated with new firm formation, which may reflect the positive effect of 
innovative firms and organizations within the region. The negative coefficient for the 
nonprofit organizations per capita variable (NPPC𝑟𝑡) suggests that the benefits that 
arise due to more associational activity are more than offset by crowding out effects. 
Since most nonprofits have no employees and thus are not included in 
entrepreneurship counts (as the SUSB only considers employer firms), more 
nonprofits within a region may be the result of potential entrepreneurs deciding to 
become self-employed in the nonprofit sector, as opposed to founding new 
establishments with paid employees.  
Considering that the US economy experienced much change during and after 
the recent global recession, it is worth considering whether the relative effects of
the key variables are robust for different time periods. Thus I run a model (Appendix 
B) in which I include a full set of interaction terms between a dummy equal to 1 for 
years 2009 to 2012 and each of the right-hand side variables, excluding the fixed 
effects. This results in the non-interacted variables corresponding to the coefficient 
estimates for the years prior to the recession (2005 to 2008), while the interaction 
terms correspond to the differences in the coefficient estimates for the years prior to 
and after the recession period. The coefficients for the years 2009 to 2012 are 
calculated by summing the coefficients for the non-interacted and interacted terms.48 
The general results are similar to those for the aggregate regression in the previous
                                                 
48 This procedure allows for the relative magnitudes of the coefficients to be compared across 
different groups, as well as testing for the significance in the difference of coefficient estimates across 
these groups. 
 
section. As an additional robustness check, I also ran the aggregate model from Table 
3.3 excluding the years 2009 to 2011, to check if the exclusion of the recessionary 
period affected overall results (Appendix C). Again, the results were qualitatively 
identical to those of the model including all panel years. Thus, the general conclusion
that labor market pooling, knowledge spillovers, and community social capital are 
important for all industries across the board regardless of economic conditions holds.  
3.5.2 Traded versus local industries 
I now turn to a distinction in industry types that has interested many agglomeration 
scholars, namely that between traded and local industries (Delgado, Porter, and
Stern 2016; Porter 2003). Traded industries are those that are theorized to be more 
geographically concentrated, serving outer markets and producing goods and services 
that are sold outside of the region. Local industries mainly serve local markets, and are 
thus driven by local demand factors suggesting that their distribution is more even and 
proportional to the size of local markets. Previous studies suggest that traded 
industries, while being the minority, provide for a disproportionate amount of 
employment while also awarding employees with higher wages (Porter 2003). As 
such, local industries are generally thought to be of lesser significance when 
considering the economic performance of regions. Nonetheless, local industries 
generate the bulk of new firms (see Table 3.1), and as such studying the factors that 
drive entrepreneurship in these industries is also of importance. Furthermore, local 
industries encompass those that perform a supporting role to their traded counterparts, 
and as such a healthy supply of new firms from industries of both types should be 
important for regional growth. 
 
To test whether different factors govern the birth of new establishments in the 
traded and local sectors, I subset the industries into these two categories based on the 
definitions provided by Delgado et al. (2016) and the US Cluster Mapping Project 
(USCMP), which define traded and local clusters based on 6-digit NAICS codes.49 As 
I utilize 4-digit NAICS industry codes, a concordance was made in cases where a 
particular 4-digit industry mapped on to more than one cluster definition. Specifically, 
in such cases I assigned the 4-digit industry to the cluster code for which the majority 
of its employment was situated, based on employment data taken from the County 
Business Patterns (CBP). As traded clusters tend to agglomerate more compared to 
their local counterparts, it is expected that traded industries should be more influenced 
by the Marshallian factors. However, the theorized effects of the social capital 
variables are not as clear. 
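The concordance step can be sketched as follows. The sketch assigns each 4-digit industry to the cluster holding the largest share of its employment, which coincides with the majority rule whenever one cluster holds more than half; the handling of ties and of cases with no true majority is not described in the text. The industry codes, cluster labels, and employment figures are hypothetical.

```python
from collections import defaultdict


def assign_clusters(rows):
    """Map each 4-digit industry to the cluster with the largest employment.

    `rows` holds (naics4, cluster, employment) tuples, one per 6-digit industry.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for naics4, cluster, employment in rows:
        totals[naics4][cluster] += employment
    return {
        naics4: max(by_cluster.items(), key=lambda item: item[1])[0]
        for naics4, by_cluster in totals.items()
    }


# Hypothetical 6-digit rows rolled up to two 4-digit industries
rows = [
    ("3341", "traded_it", 900.0),
    ("3341", "traded_it", 300.0),
    ("3341", "local_services", 200.0),
    ("8121", "local_services", 500.0),
]
concordance = assign_clusters(rows)
```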
 Table 3.4 reports the results for the regressions based on these industry 
definitions, where again I utilize interaction terms between an indicator variable for 
traded versus local industries and the right-hand side variables. I present the results for 
both single-establishment (model 1) as well as all types of establishment (model 2) 
births, and I focus only on the key variables of interest for brevity. The results largely 
coincide with previous studies, where for traded industries both the labor market 
proximity and input-output proximity metrics are highly significant with relatively 
large magnitudes. For local industries, only the labor market proximity metric was 
significant, and the difference in estimated coefficients with the traded sector was 
highly significant at -0.176. The results were qualitatively similar when considering all  
                                                 
49 Details on industry classification methodologies are provided at 
http://clustermapping.us/content/cluster-mapping-methodology.  
 
Table 3.4. Births of single (start-up) and all establishments, traded versus local
industries, Poisson estimates

                  (1) DV: Count of single establishment births   (2) DV: Count of all establishment births
Variables         Traded         Local          Difference       Traded         Local          Difference
-----------------------------------------------------------------------------------------------------------
BDUM_ir0          0.068***       0.119***       0.051***         0.064***       0.074***       0.009
                  (0.012)        (0.013)        (0.018)          (0.010)        (0.011)        (0.015)
LQ_irt            0.644***       0.799***       0.155***         0.659***       0.770***       0.111***
                  (0.009)        (0.013)        (0.016)          (0.009)        (0.011)        (0.014)
PROX^labor_irt    0.262***       0.086**        -0.176***        0.169***       0.072***       -0.097**
                  (0.042)        (0.034)        (0.051)          (0.037)        (0.027)        (0.044)
PROX^IO_irt       0.060***       0.033          -0.027           0.074***       0.023*         -0.050***
                  (0.017)        (0.020)        (0.020)          (0.014)        (0.012)        (0.016)
DEN^inf_rt        -0.050***      -0.035***      0.014**          -0.045***      -0.039***      0.006
                  (0.009)        (0.008)        (0.006)          (0.008)        (0.008)        (0.005)
DEN^for_rt        -0.038***      -0.013         0.025***         -0.014         -0.005         0.009
                  (0.011)        (0.010)        (0.008)          (0.011)        (0.010)        (0.006)
DEN^social_rt     0.083***       0.050***       -0.033***        0.075***       0.054***       -0.021***
                  (0.007)        (0.006)        (0.005)          (0.007)        (0.006)        (0.004)
NPPC_rt           -0.073***      -0.047***      0.025***         -0.070***      -0.061***      0.009
                  (0.014)        (0.013)        (0.007)          (0.014)        (0.013)        (0.006)
PATPC_rt          0.124***       0.064***       -0.060***        0.104***       0.073***       -0.031***
                  (0.017)        (0.015)        (0.011)          (0.016)        (0.014)        (0.009)

Control vars.     Yes                                            Yes
Fixed effects     Yes                                            Yes
Observations      803,136                                        803,136
Log L             -40.33                                         -50.53

Notes: See notes for Table 3.3. Both models include the full set of control variables and MSA, 4-digit
NAICS, and year fixed effects. For both models, the estimates are calculated by adding interaction
terms between a dummy equal to 1 for traded industries and all right-hand side variables excluding the
fixed effects. Thus the coefficients for the non-interacted variables coincide with the parameter
estimates for the reference group (traded), while the coefficients for the interaction terms coincide with
the differences in parameter estimates between the two groups. The coefficients for the comparison
group are obtained by summing the parameter estimates for the interacted and non-interacted terms, and
the standard errors are based on the estimated variance-covariance matrix, using the lincom routine in
Stata.
*** p<0.01, ** p<0.05, * p<0.1
 
types of establishment births (model 2), suggesting that indeed Marshallian factors are 
important determinants of entrepreneurship in the traded industries. 
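The comparison-group coefficients in Table 3.4 are sums of a non-interacted coefficient and an interaction coefficient, so their standard errors depend on the covariance between the two estimates. The sketch below shows the calculation for labor market proximity in model 1. The covariance is not reported in the text; the value used is a hypothetical one, backed out so that the result matches the reported (rounded) standard error of 0.034.

```python
import math


def linear_combination(b_ref, b_int, se_ref, se_int, cov):
    """Estimate and standard error of b_ref + b_int.

    Var(b_ref + b_int) = Var(b_ref) + Var(b_int) + 2 * Cov(b_ref, b_int).
    """
    estimate = b_ref + b_int
    variance = se_ref ** 2 + se_int ** 2 + 2.0 * cov
    return estimate, math.sqrt(variance)


# Labor market proximity, Table 3.4, model 1:
# reference (traded) 0.262 (0.042), difference -0.176 (0.051).
implied_cov = -0.0016045  # hypothetical: implied by the rounded standard errors, not reported
local_est, local_se = linear_combination(0.262, -0.176, 0.042, 0.051, implied_cov)
```

Ignoring the covariance term would overstate the standard error considerably here, since the reference and interaction coefficients are strongly negatively correlated in this illustration.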
When considering knowledge spillovers, again both the informal and formal 
spillover metrics exhibited coefficients of larger magnitude for the traded industries. 
The estimated differences in coefficient magnitude between the traded and local 
industries were also highly significant when considering single establishment births,
although this difference was not so pronounced when considering all types of 
establishment births. This suggests that not only are knowledge spillovers more 
important for the traded industries, but also that bridging social capital in the form of a 
diverse knowledge pool is positively associated with new establishment births. The 
magnitudes of the coefficients for the informal and formal knowledge spillover metrics
are comparable to that for the input-output proximity metric, suggesting that their
relative importance is non-trivial. Surprisingly, the coefficient for the bonding social
capital metric was comparatively large in magnitude, being even larger than that for 
input-output proximity. Furthermore, bonding social capital was estimated to be more 
important for the traded industry, with a statistically significant difference of 0.033. At 
face value, this suggests that the benefits of a strong local community – such as 
reduced transaction costs due to trust and reciprocity or access to club goods – are 
more important for the traded sector, and is consistent with previous studies that 
consider repeated face-to-face contact and social interactions as being instrumental in
shaping economic outcomes (Storper and Venables 2004; Saxenian 1996). 
3.5.3 High-tech versus low-tech entrepreneurship 
 
I also consider whether the relative effects of the Marshallian factors and social capital 
variables differ for the high-tech versus low-tech industries. I again utilize the industry 
definitions provided by Delgado et al. (2016) and the USCMP to index high-tech 
industries, which encompass the aerospace vehicles and defense (cluster code 1), 
biopharmaceuticals (cluster code 5), communications equipment and services (cluster 
code 8), downstream chemical products (cluster code 11), information technology and  
Table 3.5. Births of single (start-up) and all establishments, high-tech versus low-tech
industries, Poisson estimates

                  (1) DV: Count of single establishment births   (2) DV: Count of all establishment births
Variables         High-tech      Low-tech       Difference       High-tech      Low-tech       Difference
-----------------------------------------------------------------------------------------------------------
BDUM_ir0          0.091***       0.125***       0.034            0.183***       0.088***       -0.095***
                  (0.035)        (0.009)        (0.036)          (0.031)        (0.008)        (0.032)
LQ_irt            0.524***       0.722***       0.198***         0.636***       0.714***       0.079***
                  (0.024)        (0.008)        (0.025)          (0.014)        (0.007)        (0.016)
PROX^labor_irt    0.163***       0.162***       -0.001           0.027          0.115***       0.088
                  (0.063)        (0.030)        (0.095)          (0.083)        (0.023)        (0.085)
PROX^IO_irt       0.147***       0.004          -0.143***        -0.029         0.016          0.045
                  (0.044)        (0.013)        (0.044)          (0.037)        (0.010)        (0.038)
DEN^inf_rt        -0.038**       -0.039***      -0.001           -0.060***      -0.040***      0.020
                  (0.017)        (0.008)        (0.015)          (0.015)        (0.008)        (0.013)
DEN^for_rt        -0.020         -0.018*        0.002            0.045***       -0.007         -0.052***
                  (0.020)        (0.010)        (0.017)          (0.017)        (0.010)        (0.014)
DEN^social_rt     0.087***       0.059***       -0.028**         0.077***       0.060***       -0.017
                  (0.013)        (0.006)        (0.012)          (0.012)        (0.006)        (0.011)
NPPC_rt           -0.074***      -0.056***      0.018            -0.091***      -0.065***      0.026
                  (0.024)        (0.013)        (0.021)          (0.022)        (0.013)        (0.019)
PATPC_rt          0.234***       0.078***       -0.156***        0.065**        0.083***       0.018
                  (0.031)        (0.015)        (0.028)          (0.029)        (0.015)        (0.025)

Control vars.     Yes                                            Yes
Fixed effects     Yes                                            Yes
Observations      803,136                                        803,136
Log L             -40.33                                         -50.53

Notes: See notes for Tables 3.3 and 3.4.
*** p<0.01, ** p<0.05, * p<0.1
110 
 
analytical instruments (cluster code 23), and medical devices (cluster code 30) 
clusters. All high-tech clusters are traded industries, and as such it is expected that the 
relative effects should be similar to those for the traded sector, with possible 
differences in the knowledge spillover metrics. 
Table 3.5 presents the results, where again interaction terms are used to 
differentiate between industry groups. As expected, the results are qualitatively similar 
to those for traded versus local industries for most variables when considering single 
establishment births. Surprisingly, however, the formal knowledge spillover metric 
becomes insignificant for high-tech industries, while the number of patents per capita 
(PATPC𝑟𝑡) shows a very strong effect. This suggests that for high-tech industries, which 
rely heavily on R&D and patenting, the overall innovative capacity of a region, which 
is likely to be influenced by an agglomeration of high-tech firms, is more important 
than a diverse knowledge pool. This is consistent with Saxenian’s (1996) 
analysis of Silicon Valley and Route 128, where these regions began to attract high-
tech entrepreneurship through the existence of a large number of incumbent firms in 
similar industries. When considering all types of entrepreneurship (model 2), there is a 
general inconsistency in the estimated coefficients for the labor market proximity, 
input-output proximity, and formal knowledge spillover metrics compared to model 1. 
Considering that many high-tech firms are also high-growth firms with multiple 
establishments, these inconsistencies are likely to be capturing the founding of firm 
branches, and not true entrepreneurship in the sense of new firm formation.  
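
The way the difference columns and significance stars in Table 3.5 fit together can be checked with a few lines of arithmetic. The sketch below assumes that each difference is the low-tech coefficient minus the high-tech coefficient (which matches the reported values up to rounding) and that the stars follow a two-sided normal (Wald) test of the coefficient divided by its standard error. The function names and the tolerance are hypothetical; the numbers are copied from model (1) of Table 3.5.

```python
import math


def wald_stars(coef, se):
    """Map a coefficient and its standard error to the table's star legend,
    assuming a two-sided test against the standard normal distribution."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under N(0, 1)
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


def check_row(high, low, diff, se_diff, tol=0.0015):
    """Return (does low - high match the reported difference?, stars for the difference).
    The tolerance allows for rounding to three decimals in the table."""
    return abs((low - high) - diff) <= tol, wald_stars(diff, se_diff)


# (high-tech coef, low-tech coef, reported difference, SE of the difference)
rows = {
    "LQ": (0.524, 0.722, 0.198, 0.025),
    "PROX_IO": (0.147, 0.004, -0.143, 0.044),
    "DEN_social": (0.087, 0.059, -0.028, 0.012),
    "BDUM": (0.091, 0.125, 0.034, 0.036),
}

for name, values in rows.items():
    consistent, stars = check_row(*values)
    print(f"{name:12s} difference consistent: {consistent}  stars: '{stars}'")
```

For these four rows the implied stars agree with those printed in the table, which suggests, but does not prove, that the reported significance levels come from a test of this kind.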
3.5.4 Manufacturing versus non-manufacturing entrepreneurship 
As a final step, I re-estimate the model, in this case differentiating between 
manufacturing and non-manufacturing industries. Table 3.6 presents the results. 
Previous literature related to agglomeration economies and entrepreneurship has to a 
large extent focused on the manufacturing sector, as Marshall’s microfoundations 
most readily map onto the needs of manufacturing firms. As such, it is expected that 
all three Marshallian externalities will be more significantly correlated with 
entrepreneurship for the manufacturing sector compared to the non-manufacturing 
sector. The estimated coefficients are as expected, with labor market proximity again 
being the dominating force out of the three Marshallian factors. While the labor 
market proximity metric continues to be significant for non-manufacturing industries, 
its coefficient value decreases to nearly one-third of that for the manufacturing sector, 
and input-output proximity becomes insignificant. Similar to the high-tech industries, 
the number of patents per capita continues to exert a strong effect on births for the 
manufacturing industries, which is consistent with previous research that suggests 
patenting activity is a key determinant of entrepreneurship for manufacturing (Akcigit 
and Kerr 2010; Ellison, Glaeser, and Kerr 2010). Overall, the results are robust when 
considering all types of establishment births. The bonding social capital variable is 
again more strongly related to firm births for the manufacturing sector, which suggests 
that manufacturing industries benefit from social externalities similar to those of the 
traded and high-tech industries. 

Table 3.6. Births of single (start-up) and all establishments, manufacturing versus 
non-manufacturing industries, Poisson estimates 

                    (1) DV: Count of single             (2) DV: Count of all
                    establishment births                establishment births
Variables           Manuf.      Non-manuf.  Difference  Manuf.      Non-manuf.  Difference

BDUM_{ir0}          0.088***    0.146***    0.058**     0.068***    0.101***    0.033
                    (0.020)     (0.011)     (0.023)     (0.018)     (0.009)     (0.020)
LQ_{irt}            0.545***    0.751***    0.206***    0.554***    0.736***    0.182***
                    (0.021)     (0.009)     (0.023)     (0.019)     (0.007)     (0.020)
PROX^{labor}_{irt}  0.358***    0.132***    -0.227***   0.379***    0.092***    -0.288***
                    (0.057)     (0.030)     (0.060)     (0.053)     (0.023)     (0.056)
PROX^{IO}_{irt}     0.080***    0.020       -0.060***   0.111***    0.023**     -0.088***
                    (0.019)     (0.014)     (0.023)     (0.018)     (0.011)     (0.021)
DEN^{inf}_{rt}      -0.033***   -0.039***   -0.005      -0.031***   -0.041***   -0.009
                    (0.010)     (0.008)     (0.008)     (0.010)     (0.008)     (0.007)
DEN^{for}_{rt}      -0.031**    -0.017*     0.013       -0.021      -0.006      0.015
                    (0.015)     (0.010)     (0.011)     (0.014)     (0.010)     (0.010)
DEN^{social}_{rt}   0.072***    0.058***    -0.014**    0.080***    0.059***    -0.020***
                    (0.008)     (0.005)     (0.006)     (0.008)     (0.006)     (0.006)
NPPC_{rt}           -0.091***   -0.054***   0.037***    -0.091***   -0.064***   0.027***
                    (0.016)     (0.013)     (0.010)     (0.015)     (0.013)     (0.009)
PATPC_{rt}          0.154***    0.076***    -0.079***   0.153***    0.080***    -0.073***
                    (0.023)     (0.015)     (0.018)     (0.022)     (0.015)     (0.018)

Control vars.       Yes                                 Yes
Fixed effects       Yes                                 Yes
Observations        803,136                             803,136
Log L               -40.33                              -50.53

Notes: See notes for Tables 3 and 4. 
*** p<0.01, ** p<0.05, * p<0.1 
 
3.6. Conclusions 
Overall, this study provides strong support for the role of different types of social 
interactions in promoting entrepreneurship. I find evidence consistent with social 
network and social capital theory, which suggests the importance of both strong 
bonding ties of repeated interactions within communities and weak bridging ties of 
long range connections between different groups. This is a key contribution in that 
previous studies of social interactions within the economic geography literature have 
not distinguished between these different types of interactions. The results taken 
together suggest that these social forces exert a non-trivial impact on the number of 
new firm births in a region-industry pair, over and above the Marshallian forces. 
Considering that the Marshallian forces themselves also represent to some degree 
social factors – such as homophilous interactions and trust gained through repeated 
interactions – the fact that the measures of bonding and bridging social capital 
continue to have a strong effect on entrepreneurship suggests that as a whole, social 
factors may be just as important as economic factors when examining the forces that 
drive entrepreneurship in regions. Further research should clarify whether one set of 
factors dominates the other in promoting entrepreneurship, and whether this 
relationship changes across industries. 
   The broad results are consistent across a range of industry categories. The 
basic conclusion is that both Marshallian economies – with the exception of 
customer-supplier linkages – and social capital are important in promoting 
entrepreneurship regardless of the industry. This result is also non-trivial considering 
that most previous studies have focused on a narrow subset of industries in testing the 
effects of agglomerative forces on entrepreneurship (e.g., Glaeser and Kerr 2009). 
Customer-supplier linkages seem to be significant only for the traded, high-tech, and 
manufacturing industries, while the effect of labor market pooling 
seems to be the strongest and most robust across industry classes. I also find that the 
effects of bonding and bridging social capital are stronger for the traded, high-tech, 
and manufacturing industries compared to their counterparts. This is consistent with 
previous studies (Ellison, Glaeser, and Kerr 2010; Rosenthal and Strange 2001), 
considering that social interactions are closely linked to theories of knowledge 
spillovers. Traded industries – which comprise all of the high-tech industry 
classifications as well as the bulk of the manufacturing sector – are more dependent on 
agglomeration economies (Delgado, Porter, and Stern 2016), and thus the effects of 
both strong repeated homophilous interactions and weak heterophilous interactions 
should be expected to be stronger for these industries compared to local industries 
which benefit less from knowledge spillovers and more from local demand.  
 The relative effects of the Marshallian micro-foundations and social capital are 
consistent when we consider not only the birth of new firms, but also all establishment 
births including new establishments of existing firms. While further research is 
needed, this suggests that the mechanisms that promote entrepreneurship for small 
firms are similar to those that are important for multi-locational firms. Future research 
should further study whether the effects of the key forces described in this study are 
consistent across a wide variety of firm sizes – including multi-national conglomerates 
– and how the relative importance of these factors varies with firm size. 
 
APPENDIX A. 
Pairwise correlation matrix of variables 

[Lower-triangular matrix of pairwise correlations; the cell values could not be recovered from the source layout.] 
Column variables: LQ_{irt}, PROX^{labor}_{irt}, PROX^{IO}_{irt}, DEN^{social}_{rt}, DEN^{inf}_{rt}, DEN^{for}_{rt}, NPPC_{rt}, PATPC_{rt}, HHI_{rt}, MA_{rt}, POP_{rt}, PCINC_{rt}, EDUC_{rt}, FOREIGN_{rt}, HOME_{rt} 
Row variables: PROX^{labor}_{irt}, PROX^{IO}_{irt}, DEN^{social}_{rt}, DEN^{inf}_{rt}, DEN^{for}_{rt}, NPPC_{rt}, PATPC_{rt}, HHI_{rt}, MA_{rt}, POP_{rt}, PCINC_{rt}, EDUC_{rt}, FOREIGN_{rt}, HOME_{rt}, UNEMP_{rt} 
 
APPENDIX B. 
Births of single (start-up) and all establishments, before and after the recession, 
Poisson estimates 

                    (1) DV: Count of single             (2) DV: Count of all
                    establishment births                establishment births
Variables           2005-2008   2009-2012   Difference  2005-2008   2009-2012   Difference

BDUM_{ir0}          0.140***    0.108***    -0.031***   0.105***    0.077***    -0.028***
                    (0.010)     (0.011)     (0.011)     (0.009)     (0.01)      (0.011)
LQ_{irt}            0.715***    0.728***    0.013**     0.704***    0.724***    0.020***
                    (0.008)     (0.009)     (0.006)     (0.007)     (0.008)     (0.005)
PROX^{labor}_{irt}  0.180***    0.146***    -0.034***   0.133***    0.097***    -0.036***
                    (0.030)     (0.030)     (0.003)     (0.023)     (0.023)     (0.003)
PROX^{IO}_{irt}     0.003       0.002       -0.001      0.019*      0.007       -0.011***
                    (0.013)     (0.013)     (0.003)     (0.010)     (0.010)     (0.003)
DEN^{inf}_{rt}      -0.011      -0.025***   -0.015***   -0.018**    -0.027***   -0.009**
                    (0.007)     (0.009)     (0.004)     (0.007)     (0.008)     (0.004)
DEN^{for}_{rt}      -0.045***   -0.043***   0.002       -0.030***   -0.029***   0.000
                    (0.010)     (0.009)     (0.004)     (0.010)     (0.009)     (0.004)
DEN^{social}_{rt}   0.016***    0.027***    0.011***    0.023***    0.032***    0.010***
                    (0.005)     (0.005)     (0.003)     (0.005)     (0.005)     (0.003)
NPPC_{rt}           -0.008      -0.025**    -0.017***   -0.017      -0.029**    -0.011***
                    (0.012)     (0.012)     (0.004)     (0.012)     (0.012)     (0.004)
PATPC_{rt}          0.036**     0.034**     -0.002      0.030**     0.033**     0.002
                    (0.015)     (0.015)     (0.005)     (0.014)     (0.014)     (0.005)

Control vars.       Yes                                 Yes
Fixed effects       Yes                                 Yes
Observations        803,136                             803,136
Log L               -40.33                              -50.53

Notes: See notes for Tables 3.3 and 3.4. 
*** p<0.01, ** p<0.05, * p<0.1 
 
APPENDIX C. 
Aggregate model excluding years 2009 to 2011 

                    Poisson                             Probit
                    Single start-ups  All start-ups     Single start-ups  All start-ups
Variables           (1)               (2)               (3)               (4)

BDUM_{ir0}          0.120***          0.088***          0.020***          0.023***
                    (0.010)           (0.009)           (0.001)           (0.001)
LQ_{irt}            0.717***          0.709***          0.062***          0.065***
                    (0.008)           (0.007)           (0.001)           (0.001)
PROX^{labor}_{irt}  0.155***          0.102***          0.063***          0.065***
                    (0.028)           (0.023)           (0.004)           (0.004)
PROX^{IO}_{irt}     0.005             0.019*            0.012***          0.013***
                    (0.013)           (0.010)           (0.002)           (0.001)
DEN^{inf}_{rt}      -0.043***         -0.046***         -0.002            -0.003
                    (0.012)           (0.012)           (0.002)           (0.002)
DEN^{for}_{rt}      -0.012            0.000             -0.000            -0.001
                    (0.012)           (0.012)           (0.003)           (0.003)
DEN^{social}_{rt}   0.067***          0.067***          0.002             0.004**
                    (0.007)           (0.007)           (0.002)           (0.002)
NPPC_{rt}           -0.069***         -0.073***         -0.005            -0.008*
                    (0.017)           (0.018)           (0.004)           (0.004)
PATPC_{rt}          0.078***          0.083***          0.018***          0.017***
                    (0.017)           (0.017)           (0.004)           (0.004)
HHI_{rt}            0.010             0.013             -0.000            -0.002
                    (0.008)           (0.009)           (0.002)           (0.002)
MA_{rt}             -0.547***         -0.492***         -0.025            -0.048
                    (0.164)           (0.174)           (0.040)           (0.041)
POP_{rt}            -0.010            0.008             0.026             0.045
                    (0.119)           (0.121)           (0.030)           (0.031)
PCINC_{rt}          0.110***          0.122***          0.004             0.010***
                    (0.014)           (0.015)           (0.004)           (0.004)
EDUC_{rt}           -0.023**          -0.022*           -0.005            -0.003
                    (0.011)           (0.011)           (0.003)           (0.003)
FOREIGN_{rt}        -0.042***         -0.032***         -0.004            -0.002
                    (0.012)           (0.012)           (0.003)           (0.003)
HOME_{rt}           0.002             0.001             0.002             0.002
                    (0.007)           (0.007)           (0.002)           (0.002)
UNEMP_{rt}          -0.018**          -0.006            0.001             0.002
                    (0.007)           (0.007)           (0.002)           (0.002)

Fixed effects       Yes               Yes               Yes               Yes
Observations        501,960           501,960           501,960           501,960

Notes: See notes for Table 3. 
*** p<0.01, ** p<0.05, * p<0.1 
 
REFERENCES 
 
Acs, Zoltan, and Catherine Armington. 2004. “Employment Growth and 
Entrepreneurial Activity in Cities.” Regional Studies 38 (8): 911–27. 
———. 2006. Entrepreneurship, Geography, and American Economic Growth. 
Cambridge: Cambridge University Press. 
Agrawal, Ajay, Iain Cockburn, Alberto Galasso, and Alexander Oettl. 2014. “Why 
Are Some Regions More Innovative than Others? The Role of Small Firms in 
the Presence of Large Labs.” Journal of Urban Economics 81: 149–65. 
Agrawal, Ajay, Iain Cockburn, and John McHale. 2006. “Gone but Not Forgotten: 
Knowledge Flows, Labor Mobility, and Enduring Social Relationships.” 
Journal of Economic Geography 6 (5): 571–91. 
Akcigit, Ufuk, and William R Kerr. 2010. “Growth through Heterogeneous 
Innovations.” No. w16443. NBER working paper. 
Audretsch, David B, and Max Keilbach. 2004. “Entrepreneurship and Regional 
Growth: An Evolutionary Interpretation.” Journal of Evolutionary Economics 
14 (5): 605–16. 
Becker, Randy, and Vernon Henderson. 2000. “Effects of Air Quality Regulations on 
Polluting Industries.” Journal of Political Economy 108 (2): 379–421. 
Blundell, Richard, Rachel Griffith, and John Van Reenen. 1995. “Dynamic Count 
Data Models of Technological Innovation.” The Economic Journal, 333–44. 
Bürker, Matthias, and G Alfredo Minerva. 2014. “Civic Capital and the Size 
Distribution of Plants: Short-Run Dynamics and Long-Run Equilibrium.” 
Journal of Economic Geography 14 (4): 797–847. 
Burt, Ronald S. 2004. “Structural Holes and Good Ideas.” American Journal of 
Sociology 110 (2): 349–99. 
———. 2005. Brokerage and Closure: An Introduction to Social Capital. Oxford 
University Press. 
Cameron, A C, and Pravin K Trivedi. 2013. Regression Analysis of Count Data. Vol. 
53. Cambridge, UK: Cambridge University Press. 
Caragliu, Andrea, and Peter Nijkamp. 2016. “Space and Knowledge Spillovers in 
European Regions: The Impact of Different Forms of Proximity on Spatial 
Knowledge Diffusion.” Journal of Economic Geography 16 (3): 749–74. 
Chamberlain, Gary. 1980. “Analysis of Covariance With Qualitative Data.” Review of 
Economic Studies 47: 225–38. 
Chen, Henry, Paul Gompers, Anna Kovner, and Josh Lerner. 2010. “Buy Local? The 
Geography of Venture Capital.” Journal of Urban Economics 67 (1): 90–102. 
Coleman, James S. 1988. “Social Capital in the Creation of Human Capital.” 
American Journal of Sociology 94: S95–120. 
Combes, Pierre-Philippe, and Gilles Duranton. 2006. “Labour Pooling, Labour 
Poaching, and Spatial Clustering.” Regional Science and Urban Economics 36 
(1): 1–28. 
Currid, Elizabeth, and Sarah Williams. 2010. “The Geography of Buzz: Art, Culture 
and the Social Milieu in Los Angeles and New York.” Journal of Economic 
Geography 10 (3): 423–51. 
Dahl, Michael S, and Olav Sorenson. 2012. “Home Sweet Home: Entrepreneurs’ 
Location Choices and the Performance of Their Ventures.” Management 
Science 58 (6): 1059–71. 
Delgado, Mercedes, Michael E Porter, and Scott Stern. 2010. “Clusters and 
Entrepreneurship.” Journal of Economic Geography 10 (4): 495–518. 
———. 2016. “Defining Clusters of Related Industries.” Journal of Economic 
Geography 16 (1): 1–38. 
Durlauf, Steven N. 2002. “On The Empirics Of Social Capital.” The Economic 
Journal 112 (483): F459–79. 
Ellison, Glenn, Edward L Glaeser, and William R Kerr. 2010. “What Causes Industry 
Agglomeration? Evidence from Coagglomeration Patterns.” The American 
Economic Review 100 (3): 1195–1213. 
Fairlie, Robert W. 2014. Kauffman Index of Entrepreneurial Activity 1996 - 2013. 
Kansas City, MO: Kauffman Foundation. 
Feld, Scott L. 1981. “The Focused Organization of Social Ties.” American Journal of 
Sociology, 1015–35. 
Fujita, Masahisa, Paul Krugman, and Anthony J. Venables. 1999. The Spatial 
Economy: Cities, Regions, and International Trade. Cambridge, MA: MIT 
Press. 
Glaeser, Edward L. 2001. “The Formation of Social Capital.” Canadian Journal of 
Policy Research 2 (1): 34–40. 
Glaeser, Edward L. 2008. Cities, Agglomeration, and Spatial Equilibrium. Oxford 
University Press. 
Glaeser, Edward L., H.D. Kallal, J. A. Scheinkman, and A. Shleifer. 1992. “Growth in 
Cities.” Journal of Political Economy 100: 1126–52. 
Glaeser, Edward L, and William R Kerr. 2009. “Local Industrial Conditions and 
Entrepreneurship: How Much of the Spatial Distribution Can We Explain?” 
Journal of Economics & Management Strategy 18 (3): 623–63. 
Glaeser, Edward L, Stuart S Rosenthal, and William C Strange. 2010. “Urban 
Economics and Entrepreneurship.” Journal of Urban Economics 67 (1): 1–14. 
Granovetter, Mark. 1973. “The Strength of Weak Ties.” American Journal of 
Sociology 78 (6): 1360–80. 
———. 1995. “The Economic Sociology of Firms and Entrepreneurs.” In The 
Economic Sociology of Immigration, edited by Alejandro Portes. New York: 
Russell Sage. 
Guimaraes, Paulo, Octávio Figueiredo, and Douglas Woodward. 2003. “A Tractable 
Approach to the Firm Location Decision Problem.” Review of Economics and 
Statistics 85 (1): 201–4. 
Hausmann, Ricardo, and Bailey Klinger. 2006. “The Evolution of Comparative 
Advantage: The Impact of the Structure of the Product Space.” CID Working 
Paper, no. 106. 
Helsley, Robert W, and William C Strange. 1990. “Matching and Agglomeration 
Economies in a System of Cities.” Regional Science and Urban Economics 20 
(2): 189–212. 
Hidalgo, César A, and Ricardo Hausmann. 2009. “The Building Blocks of Economic 
Complexity.” Proceedings of the National Academy of Sciences 106 (26): 
10570–75. 
Hidalgo, César A, Bailey Klinger, A-L Barabási, and Ricardo Hausmann. 2007. “The 
Product Space Conditions the Development of Nations.” Science 317 (5837): 
482–87. 
Hoang, Ha, and Bostjan Antoncic. 2003. “Network-Based Research in 
Entrepreneurship: A Critical Review.” Journal of Business Venturing 18 (2): 
165–87. 
Hsiao, Cheng. 1986. Analysis of Panel Data. New York: Cambridge University Press. 
Ioannides, Yannis M. 2013. From Neighborhoods to Nations: The Economics of 
Social Interactions. Princeton University Press. 
Jackson, Matthew O. 2008. Social and Economic Networks. Vol. 3. Princeton: 
Princeton University Press. 
Jacobs, Jane. 1969. The Economy of Cities. New York: Vintage. 
Jaffe, Adam B, Manuel Trajtenberg, and Rebecca Henderson. 1993. “Geographic 
Localization of Knowledge Spillovers as Evidenced by Patent Citations.” The 
Quarterly Journal of Economics, 577–98. 
Jofre-Monseny, Jordi, Raquel Marín-López, and Elisabet Viladecans-Marsal. 2011. 
“The Mechanisms of Agglomeration: Evidence from the Effect of Inter-Industry 
Relations on the Location of New Firms.” Journal of Urban Economics 70 (2): 
61–74. 
Kemeny, Tom, Maryann Feldman, Frank Ethridge, and Ted Zoller. 2016. “The 
Economic Value of Local Social Networks.” Journal of Economic Geography 
16 (5): 1101–22. 
Kitson, Michael, Ron Martin, and Peter Tyler. 2004. “Regional Competitiveness: An 
Elusive yet Key Concept?” Regional Studies 38 (9): 991–99. 
Krugman, Paul. 1991. Geography and Trade. Cambridge, MA: MIT Press. 
Leontief, Wassily W. 1941. The Structure of American Economy, 1919-1929. 
Cambridge, MA: Harvard University Press. 
Marshall, Alfred. 1920. Principles of Economics. London: MacMillan. 
McPherson, Miller, Lynn Smith-Lovin, and James M Cook. 2001. “Birds of a Feather: 
Homophily in Social Networks.” Annual Review of Sociology, 415–44. 
Mukim, Megha. 2014. “Coagglomeration of Formal and Informal Industry: Evidence 
from India.” Journal of Economic Geography, 329–351. 
Murphy, James T. 2003. “Social Space and Industrial Development in East Africa: 
Deconstructing the Logics of Industry Networks in Mwanza, Tanzania.” 
Journal of Economic Geography 3 (2): 173–98. 
Porter, Michael E. 1998. “Location, Clusters, and the New Microeconomics of 
Competition.” Business Economics, 7–13. 
———. 2003. “The Economic Performance of Regions.” Regional Studies 37 (6–7): 
549–78. 
Putnam, Robert D. 2001. Bowling Alone: The Collapse and Revival of American 
Community. Simon and Schuster. 
Putnam, Robert D, Robert Leonardi, and Raffaella Y Nanetti. 1993. Making 
Democracy Work: Civic Traditions in Modern Italy. Princeton University Press. 
Putnam, Robert, Ivan Light, Xavier de Souza Briggs, William M. Rohe, Avis C. 
Vidal, Judy Hutchinson, Jennifer Gress, and Michael Woolcock. 2004. “Using 
Social Capital to Help Integrate Planning Theory, Research, and Practice.” 
Journal of the American Planning Association 70 (2): 142–92. 
Rauch, James E. 1993. “Productivity Gains from Geographic Concentration of Human 
Capital: Evidence from the Cities.” Journal of Urban Economics, no. 34: 380–
400. 
Reynolds, Paul D, and Richard T Curtin. 2009. New Firm Creation in the United 
States: Initial Explorations with the PSED II Data Set. Vol. 23. Springer. 
Rosenthal, Stuart S, and William C Strange. 2001. “The Determinants of 
Agglomeration.” Journal of Urban Economics 50 (2): 191–229. 
———. 2003. “Geography, Industrial Organization, and Agglomeration.” Review of 
Economics and Statistics 85 (2): 377–93. 
———. 2004. “Evidence on the Nature and Sources of Agglomeration Economies.” 
Handbook of Regional and Urban Economics 4: 2119–71. 
Saxenian, AnnaLee. 1996. Regional Advantage: Culture and Competition in Silicon 
Valley and Route 128. Cambridge, MA: Harvard University Press. 
Scherer, Frederic. 1984. “Using Linked Patent and R&D Data to Measure 
Interindustry Technology Flows.” In R&D, Patents, and Productivity, 417–64. 
University of Chicago Press. 
Schumpeter, Joseph A. 1934. The Theory of Economic Development: An Inquiry into 
Profits, Capital, Credit, Interest, and the Business Cycle. Cambridge, MA: 
Harvard University Press. 
Scott, Allen J, John Agnew, Edward W Soja, and Michael Storper. 2001. Global City-
Regions: An Overview. Oxford University Press. 
Sorenson, Olav. 2005. “Social Networks and Industrial Geography.” In 
Entrepreneurship, the New Economy and Public Policy, 55–69. Springer. 
Sorenson, Olav, and Pino G Audia. 2000. “The Social Structure of Entrepreneurial 
Activity: Geographic Concentration of Footwear Production in the United 
States, 1940–1989.” American Journal of Sociology 106 (2): 424–62. 
Sorenson, Olav, and Toby E Stuart. 2001. “Syndication Networks and the Spatial 
Distribution of Venture Capital Investments.” American Journal of Sociology 
106 (6): 1546–88. 
Souza Briggs, Xavier de. 1998. “Brown Kids in White Suburbs: Housing Mobility and 
the Many Faces of Social Capital.” Housing Policy Debate 9 (1): 177–221. 
Storper, Michael. 1995. “Competitiveness Policy Options: The Technology‐regions 
Connection.” Growth and Change 26 (2): 285–308. 
———. 2013. Keys to the City: How Economics, Institutions, Social Interaction, and 
Politics Shape Development. Princeton University Press. 
Storper, Michael, and Susan Christopherson. 1987. “Flexible Specialization and 
Regional Industrial Agglomerations: The Case of the US Motion Picture 
Industry.” Annals of the Association of American Geographers 77 (1): 104–17. 
Storper, Michael, and Anthony J Venables. 2004. “Buzz: Face-to-Face Contact and the 
Urban Economy.” Journal of Economic Geography 4 (4): 351–70. 
Stuart, Toby E, and Olav Sorenson. 2003. “The Geography of Opportunity: Spatial 
Heterogeneity in Founding Rates and the Performance of Biotechnology 
Firms.” Research Policy 32 (2): 229–53. 
———. 2005. “Social Networks and Entrepreneurship.” In Handbook of 
Entrepreneurship Research, 233–52. Springer. 
Turkina, Ekaterina, Ari Van Assche, and Raja Kali. 2016. “Structure and Evolution of 
Global Cluster Networks: Evidence from the Aerospace Industry.” Journal of 
Economic Geography, August. doi:10.1093/jeg/lbw020. 
Wilson, Kenneth L, and Alejandro Portes. 1980. “Immigrant Enclaves: An Analysis of 
the Labor Market Experiences of Cubans in Miami.” American Journal of 
Sociology, 295–319. 
Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. 
MIT Press. 
Zipf, George Kingsley. 1949. Human Behavior and the Principle of Least Effort. New 
York: Hafner. 
 
 
  
 
CHAPTER 4  
PATHWAYS FOR ENTREPRENEURSHIP DRIVEN ECONOMIC GROWTH: 
ENVISIONING THE INDUSTRY SPACE 
 
4.1. Introduction 
Many theories exist as to why economic growth takes place. Among others, one of the 
oldest theories of economic growth emphasizes capital deepening, which refers to the 
increase in physical capital per worker (Smith 1776). In this simple model, more 
capital per worker increases productivity – and thus wages – by allowing each worker 
to work more efficiently, as opposed to the case where production is labor intensive. 
In more recent years, endogenous growth theory (Lucas 1988; Romer 1986) has 
developed, and suggests that the accumulation of human capital and resulting 
technological advances are the main cause of increased productivity and income per 
worker. Finally, at least with urban economies, agglomeration effects have been 
theorized to play the most critical role in growth, by allowing cities to reap the 
benefits of physical proximity and increasing returns to scale (Marshall 1920; Glaeser 
2008).  
The most fundamental aspect of cities is this prevalence of agglomeration. 
Firms agglomerate to reap the benefits of increasing returns, which are gained 
primarily through proximity and thus a reduction in transport costs. Most famously, 
Marshall (1920) emphasized the importance of three types of agglomeration 
economies, namely those of goods, people, and ideas. Qualitatively, Saxenian’s (1996) 
study of Silicon Valley and Route 128, among others, provided a glimpse as to why 
some regions succeed while others decline, and urban economists and regional 
scientists alike have developed a lengthy literature on why cities develop (Christaller 
1966; Losch 1954; Krugman 1991; Glaeser 2008). Yet while much progress has been 
made on why economic growth takes place, relatively little has been done with regards 
to how cities should promote growth given this theoretical backdrop. 
One possible reason for this paucity is the complexity of the 
urban economy itself. The urban economy is a complex system that encompasses 
countless different elements, including but not limited to firms, local governments, 
institutions, and people and their social networks. Externally, urban economies are 
also influenced by their surrounding environment through competition and 
collaboration with neighboring areas. These different internal and external factors 
constitute an “ecosystem” where a plethora of elements act together to shape the 
overall outcome of the system as a whole (Batty 2013). Naturally, a holistic theory of 
economic development at the local level is extremely difficult to develop, because 
fully understanding all aspects of this ecosystem and how these elements interact with 
one another is almost impossible. Nevertheless, a theory of how economic growth 
should take place would prove valuable to planners and policy makers alike, 
for it would allow public resources to be used efficiently towards 
achieving optimal growth.  
This paper attempts to bridge this gap between what is available and what is 
needed by integrating new insights from complexity science and development 
economics with more traditional theories of economic development that exist in the 
urban planning and urban economics literatures to study optimal patterns of economic 
growth, albeit narrowly defined. Rather than taking the holistic approach, I focus on 
one of the key underlying aspects of growth, namely structural change (W. A. Lewis 
1954). This transformation of economies is characterized by a continuous evolution of 
the underlying technologies, capital, institutions, and social fabric such that markets 
evolve and new products emerge. At the national level, development economics has 
traditionally focused on the shift from agriculture to manufacturing and services 
(Solow 1956), which leads to productivity gains and thus growth. However, at the 
local level, this type of simplistic transformation need not always occur. Due to spatial 
proximity, local economies are much more integrated with each other, and trade 
occurs much more easily than across national borders. Thus for example, places in 
central Iowa and South-central Illinois are still able to rely heavily on corn farming 
and processing as a main industry while maintaining comparable levels of wages.  
  In order to provide evidence for a theory of how local economic growth 
through structural change should take place, I consider the collection of individual 
industries as elements of an economic “ecosystem.” Thus rather than attempting to 
document all aspects of a local economy, I focus just on industries, which makes the 
analysis much more tenable. Nonetheless, when modeling structural change, this focus 
on industries should be suitable, as the change in composition of industries within a 
given local economy should represent well the structural change that is occurring 
within. Another critical aspect of ecosystems that is emphasized throughout this paper 
is the inter-relatedness of its components, which in this particular case would be how 
different industries are related to one another. Thus in order to properly model the 
industry ecosystem, I construct a network of the “industry space,” where industries are 
linked to each other based on how similar they are. This terminology as well as the 
broad conceptual framework follow that of Hidalgo, Klinger, Barabási, and 
Hausmann’s (2007) work on the “product space,” which constructs a similar network 
of products (instead of industries), but differs in its focus on national economies. After 
constructing such a model, I develop a simple measure of a city’s position within the 
industry space that is based on specialization patterns, and conduct empirical analyses 
at the city level to discern how the position of cities within the industry space affects 
economic growth. I also utilize GIS to add a spatial dimension to the general empirical 
results and see if cities’ spatial positions affect growth pathways. 
 
4.2. Related literature  
4.2.1 Traditional theories of economic growth  
The characteristic feature of early growth theories is that production involves three 
inputs: labor, capital, and natural resources. Most famously, Adam Smith’s 
(1776) “The Wealth of Nations” emphasized capital accumulation and increased labor 
productivity as the engine of growth, by stating that income per capita must in every 
nation be regulated by two different circumstances; first, by the skill, dexterity, and 
judgment with which its labour is generally applied; and, secondly, by the proportion 
between the number of those who are employed in useful labour, and that of those who 
are not so employed. 
Accordingly, Smith’s focus was on determining the factors that enhanced labor 
productivity, that is, the factors that affected the skill, dexterity, and judgment of 
workers. The key argument for Smith was that the division of labor – both within firms 
and industries as well as between them –  was critical in achieving this increased 
productivity, which in turn depended largely on capital accumulation. Smith – along 
with later scholars such as Ricardo (1891) and Malthus (1888) – argued that larger 
divisions of labor created more productive processes, which resulted in increasing 
returns and thus larger markets. 
 The ‘neoclassical’ school of economic thought superseded this classical theory 
of growth, asserting that the factors of production – labor, capital, and natural resources 
such as land – were scarce, and thus an increase in capital only had a temporary and 
limited impact on growth due to diminishing returns. Thus for continuous growth to 
take place, other exogenous factors needed to be taken into consideration (Cassel 1932; 
Domar 1947; Harrod 1948). Most famously, the neoclassical model of Solow (1956) 
and Swan (1956) posited that increases in the rate of economic growth depended 
mainly on two factors: increased investment and technological progress. As in the 
classical model, an increased proportion of GDP that 
is invested leads to increases in capital and thus growth, yet diminishing returns due to 
scarcity of resources lead to convergence in growth rates. This is offset by 
technological progress, which is exogenous in the model and is theorized to increase the 
productivity of both labor and capital, resulting in sustained growth rates.  
 Neoclassical theories of growth were met with strong criticisms due to their 
many simplifying assumptions, including a single production function that was assumed 
for all economies, as well as identical trajectories of growth that did not explain 
empirical discrepancies in growth patterns. As such, new growth theory (i.e. 
endogenous growth) emerged in the 1980s, attempting to explain the poor performance 
of many developing countries that had implemented policies aligned with neoclassical 
theories. Unlike neoclassical models, new growth theory considered technological 
progress to be endogenous, emphasizing that economic growth results from increasing 
returns to the use of knowledge rather than labor and capital (Aghion and Howitt 1992; 
Lucas 1988; Romer 1986). Workers with greater knowledge, education, and training 
were theorized to increase rates of technological advancement, which boosted output 
and thus economic growth. The theory argued that the higher rate of returns as expected 
in the Solow-Swan model is greatly eroded by lower levels of complementary 
investments in human capital (education), infrastructure, or research and development. 
In addition, knowledge was theorized to differ from other economic goods 
because 1) it can grow without bound, 2) it can be reused at zero 
additional cost, and 3) it creates spillover benefits for other firms and industries. 
 
4.2.2 Urban economic growth 
Theories of growth at the urban level differ from theories developed within the 
development economics literature mentioned above in that the geographical 
perspective is greatly emphasized. While it is still the case that capital deepening, 
investment, human capital, and technological progress are important in promoting 
sustained economic growth, growth at the urban level is to a large extent dependent on 
agglomeration economies. Within cities, the physical proximity of labor and capital 
reduces transport costs, thus increasing productivity and promoting growth by creating 
externalities related to the sharing of production inputs, the pooling of labor markets, 
and the spillover of knowledge (Marshall 1920). Other than bringing the inputs of the 
production process together, cities also increase productivity and income by 
facilitating face-to-face communication (Ioannides and Topa 2010; Ioannides 2013) 
and thus serving as the engines of economic growth (Lucas 2001). 
 Specialization is one aspect of urban economies that promotes productivity 
gains. Specialization arises because denser aggregations of urban communities with a 
large number of firms producing in proximity can support firms that are more 
specialized in producing intermediate products. The gains from specialization also 
extend to the production of services. For example, specialized legal services – such as 
taxation, copyright law, or secured transactions – can be provided for more efficiently 
by firms that concentrate in specific areas. Specialization increases the opportunities 
for cost reduction through the routinization or automation of production, and the 
specialized firms and industries can provide for a wider spectrum of customers due to 
larger markets in urban areas.  
 Large urban labor markets also facilitate better matches between worker skills 
and job requirements, and help to weather fluctuations in labor demand at the firm 
level. In an urban economy, an increase in the number of workers across the skills 
spectrum increases the probability that workers with a specific skill set will exist in the 
labor market (Helsley and Strange 1990). This in turn reduces costs associated with 
job searches and training, while also generating better skill matches and thus higher 
net wages for workers. This higher wage incentivizes workers to migrate to cities, 
which reinforces the process and creates additional agglomeration externalities. 
 Knowledge spillovers are also a key aspect of agglomeration economies, and are 
among the most studied (Berry and Glaeser 2005; Glaeser, Scheinkman, and Shleifer 
1995; Moretti 2004; Rauch 1993; Shapiro 2006). As is the case for national 
economies, an increase in the education or skills of a worker increases worker 
productivity, which induces competition among employers to increase wages to match 
higher productivity levels. However, a key difference is that at the urban level, face-
to-face interactions – in both formal and informal settings – create a multiplier effect 
by allowing workers with more human capital to better share and communicate their 
skills with peers. Thus if workers with higher human capital generate more and better 
ideas, an increase in human capital at the urban level increases rates of technological 
innovation. Glaeser et al. (1995) show with a cross section of cities that cities with 
higher human capital levels experienced large increases in per-capita income over the 
period 1960 to 1990. In addition, spillovers due to human capital externalities have 
been shown to benefit the lesser skilled, suggesting that human capital also has a 
favorable distribution effect (Moretti 2004). 
 Apart from agglomeration economies, regional economic theory has to a large 
extent been influenced by export-base theory (North 1955; Tiebout 1956). Traditional 
theories of economic growth at the national level mentioned in the previous section are 
to a large extent supply oriented, presuming that factor and product cost adjustments 
boost supply and resource utilization. Export-base theory on the other hand is 
fundamentally demand oriented, essentially making it a Keynesian-type model (W. C. 
Lewis 1972). It argues that an economy may be bifurcated into two sectors: an 
export-oriented sector and a non-export-oriented sector. Here, the export sector trades outside 
the region’s boundaries, bringing money into the local economy and thus providing 
for further growth, while the non-export oriented sector supplies local consumption 
goods and amenities whose activity depends on the sales of the export sector. Thus the 
focus of growth is on promoting production in the export sector, which is theorized to 
create ‘multiplier effects’ by inducing growth in other related sectors that support the 
sector that is exporting its goods outside the region.  
Anecdotally, many regions have seen economic growth by developing their export-
oriented sectors, as in the case of the IT industry in Silicon Valley or the 
biotechnology industry in Boston. Nonetheless, export-base theory has been met with 
much criticism. Primarily, promoting development of the export sector has been touted 
as simply shifting productive activity from one region to another, resulting in a zero-
sum outcome where localities compete for firms by providing extensive tax incentives 
and other benefits that have been proved to be ineffective while degrading the quality 
of local government services (Bartik 1992, 2005; Zheng and Warner 2010). Others 
have argued that the role of urban density in promoting consumption is equally 
important, finding that the local amenities provided and thus the attractiveness of the 
‘consumer city’ promotes growth by retaining workers and attracting migrants 
(Glaeser, Kolko, and Saiz 2001; Glaeser and Gottlieb 2006). Acknowledging this 
ongoing debate, the next section highlights studies that have attempted to provide 
theory and evidence suggesting optimal pathways for economic growth. 
 
4.2.3 Pathways for economic growth 
Within the development economics literature, one of the most prominent theories of 
growth that focuses on patterns of development is the structural change model 
(Chenery 1960; W. A. Lewis 1954). This model demonstrates how an economy 
transforms from the subsistence level concerned with agriculture for personal 
consumption to a modern industrialized economy. The Lewis (1954) model considers 
two sectors in an underdeveloped economy with an overpopulated agricultural sector 
and an urbanized manufacturing economy to which the excess labor migrates. The 
theory suggests that the excess labor migrating to the manufacturing sector brings 
about productivity gains and an expansion of output. The Patterns of Demand model 
of Chenery (1960), while similar to that of Lewis, focuses on the changing 
composition of consumer demand from emphasis on food commodities to multiple 
manufactured goods and services.  
The early models of Lewis and Chenery have been met with criticism 
regarding their underlying restrictions, such as the assumption of unlimited supply of 
rural labor and neglect of agriculture as a viable sector. Such criticisms 
notwithstanding, structural change models are unique in that they focus more on the 
pattern of development, rather than why development takes place. For the purposes of 
the current analysis, this focus on the ‘how’ of economic development is why this 
model is chosen as a starting point, albeit in a general sense. Defining structural 
change broadly as a shift in the basic way a market or economy functions or operates 
(Todaro and Smith 2012), the insights of the early models can be utilized in a more 
contemporary, urban setting. Although not explicitly considering urban economies in 
general, many scholars have already expanded the model to incorporate structural 
changes not only in agriculture and manufacturing, but across all economic functions 
including urbanization, growth of populations, and trade (Chenery and Taylor 1968; 
Chenery, Syrquin, and Elkington 1975; Kuznets 1971).  
Considering structural change as the underlying mechanism in which 
economies grow, recent studies that merge complexity science with economic growth 
provide a good starting point to analyze optimal pathways for growth (Hidalgo et al. 
2007; Hidalgo and Hausmann 2009). Most relevant to this analysis, Hidalgo et al. 
(2007) develop a network model of the ‘product space’ consisting of all products that 
are manufactured for export. Products are linked to each other by a measure of 
proximity based on observed co-production patterns across countries, where the links 
are stronger if the production of two products moves in tandem across countries. The 
study shows that the product space has a distinct core-periphery structure, with less 
advanced products such as agricultural goods constituting the periphery, and the core 
being populated by more advanced goods such as vehicles, machinery, or electronics. 
The authors empirically demonstrate that more advanced countries populate (i.e. 
export) products nearer the core, and conclude that development should ideally aim to 
shift a country’s product mix towards products nearer the core of the network, home to 
the more advanced products that yield higher returns. 
 The current study aims to extend this work by considering the network 
of industries (instead of products), and linking these industries based on agglomeration 
patterns, thus creating what will be referred to as the ‘industry space.’ Considering 
industries allows for the analysis to be better aligned with structural change theory by 
suggesting that a change in the underlying structure of economies is better represented 
by a shift in the underlying industrial structure of economies rather than the products 
that these industries produce. Furthermore, linking these industries based on 
agglomeration patterns merges complexity science and structural change theory with 
the urban economics and regional science literature by allowing the spatial aspects of 
economies to enter into the design of the network itself. Ultimately, the goal of this 
study is to suggest optimal pathways for structural change – through shifts in the 
underlying industry structure of urban economies – that maximize growth. 
 
4.3. The industry space 
4.3.1 The Ellison-Glaeser (EG) index of coagglomeration 
In order to construct a network of industries (i.e. the industry space) and measure the 
relative positions of cities within this constructed network, first a measure of pairwise 
relationships (which correspond to the links in the network) between any two 
industries is needed. This measure should ideally capture both 1) how similar the two 
industries are across a variety of dimensions – such as input-output linkages, 
occupational mix, and the utilization of knowledge – as well as 2) the observed 
location patterns of the two industries, with industries that tend to be collocated 
earning a higher value.  
 An ideal starting point for such a measure is the Ellison and Glaeser (1997, 
hereafter EG) index, which is a single-industry metric based on a “dartboard” model 
of location choice, where firms sequentially choose locations in order to maximize 
profits. The EG index 𝛾𝑖 is such that: 
\[
\gamma_i = \frac{G_i / \left(1 - \sum_{m} x_m^2\right) - H_i}{1 - H_i},
\quad \text{where} \quad
G_i = \sum_{m=1}^{M} (s_{mi} - x_m)^2 .
\]
 
Here 𝑠𝑚𝑖 is the share of industry i’s employment contained in region m, and 𝑥𝑚 is 
a measure of the size of region m, such as the region’s share of population or 
aggregate employment. Thus 𝐺𝑖 can be considered a simple measure of raw 
geographic concentration, while 𝐻𝑖 is the plant-level Herfindahl index of employment 
for industry i. 
The index takes a value of zero when observed employment is only as 
concentrated as when firms choose locations by throwing darts at a map (Ellison and 
Glaeser 1997). The index has several advantageous properties. First, it is an index of 
agglomeration, which embodies the tendency for firms to collocate due to similarities 
across a wide variety of aspects including labor, goods used for production, and 
knowledge. Second, it controls for the lumpiness of employment by accounting for 
plant size through the plant Herfindahl measure. This is advantageous for industries 
for which plant sizes are unusually large, for in these cases the observed 
agglomeration of employment is not all due to agglomeration per se, but also due to 
the fact that large plant sizes preclude a wide dispersion of employment across areas. 
Finally, the index is theoretically invariant to industry size and the granularity of 
geographic data, and thus facilitates comparisons across industries, regions, and time.  
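
As a minimal illustrative sketch of the formula above (not the code used in this study), the single-industry EG index can be computed from regional shares and plant shares. The function name, argument layout, and data are hypothetical and chosen only to make the arithmetic visible.

```python
def eg_index(s, x, plant_shares):
    """Ellison-Glaeser index for a single industry (illustrative sketch).

    s            -- s[m]: share of the industry's employment in region m
    x            -- x[m]: region m's share of aggregate employment
    plant_shares -- each plant's share of the industry's employment
    """
    G = sum((s_m - x_m) ** 2 for s_m, x_m in zip(s, x))  # raw concentration G_i
    H = sum(z ** 2 for z in plant_shares)                 # plant Herfindahl H_i
    sum_x2 = sum(x_m ** 2 for x_m in x)
    return (G / (1.0 - sum_x2) - H) / (1.0 - H)


# Hypothetical data: four equally sized regions, ten equally sized plants.
x = [0.25, 0.25, 0.25, 0.25]
plants = [0.1] * 10

# Employment spread exactly like aggregate activity gives G = 0, so the index
# is slightly negative (-H / (1 - H)). The dartboard benchmark of zero refers
# to the expected value of G under random location, not to G = 0.
gamma_even = eg_index([0.25, 0.25, 0.25, 0.25], x, plants)

# Employment piled into one region yields a clearly positive index.
gamma_concentrated = eg_index([0.70, 0.10, 0.10, 0.10], x, plants)
```

The Herfindahl term is what separates lumpiness caused by a few large plants from concentration caused by agglomeration: raising `H` while holding `G` fixed lowers the index.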
 Nonetheless, the EG index is insufficient for this analysis since it is a single-
industry metric, and not a pairwise metric which measures the relationship between 
any two given industries. Ellison, Glaeser, and Kerr (2010) propose a modified version 
of the EG index, the EG coagglomeration index, which measures the coagglomeration 
of two industries, and it is this metric which is used in this analysis as a measure of the 
strength of relation between two given industries. Ellison, Glaeser, and Kerr (2010) 
show that the EG coagglomeration index is equivalent to the EG index when the 
number of industries equals two, and that the index can be regarded as a measure of 
agglomerative strength in a location choice model. The EG coagglomeration index for 
industries i and j is 
\[
\gamma_{ij} = \frac{\sum_{m=1}^{M} (s_{mi} - x_m)(s_{mj} - x_m)}{1 - \sum_{m=1}^{M} x_m^2} .
\]
Here, 𝑠𝑚𝑖 is the share of industry i’s new establishment births – as opposed to 
employment shares of incumbent firms – within region m, and 𝑥𝑚 is a measure of the 
size of region m, which in this case is the region’s share of all new establishment births 
in the US. The use of establishment counts in lieu of employment levels 
is mainly due to data availability, for employment counts for new establishments at a 
detailed industry classification level are difficult to obtain. I also consider only the 
coagglomeration of new establishments instead of that of existing firms, to better 
account for structural change. Finally, I only consider the new establishment births of 
single establishment businesses and exclude multi-establishment firms to better 
capture true entrepreneurial activity. Since single establishments are usually much 
smaller in size (typically less than five employees), this has an added benefit of 
mitigating the error in considering establishment counts over employment levels. As 
opposed to the original EG index, the EG coagglomeration index for two industries 
does not contain the plant-level employment Herfindahl index 𝐻𝑖, which makes data 
collection much easier because establishment-level employment counts do not need 
to be measured.  
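
The pairwise calculation can be sketched as follows. This is an illustration under stated assumptions rather than the study's actual code: the `births` layout (one row per region, one column per industry) and the numbers in it are hypothetical.

```python
from itertools import combinations


def coagglomeration_index(births, i, j):
    """EG coagglomeration index for industries i and j (illustrative sketch).

    births -- births[m][k]: count of new single-establishment births in
              region m for industry k (hypothetical layout)
    """
    total = sum(sum(row) for row in births)
    ind_i = sum(row[i] for row in births)
    ind_j = sum(row[j] for row in births)
    numerator = 0.0
    sum_x2 = 0.0
    for row in births:
        x_m = sum(row) / total      # region's share of all births
        s_mi = row[i] / ind_i       # region's share of industry i births
        s_mj = row[j] / ind_j       # region's share of industry j births
        numerator += (s_mi - x_m) * (s_mj - x_m)
        sum_x2 += x_m ** 2
    return numerator / (1.0 - sum_x2)


# Hypothetical counts: 3 regions (rows) by 3 industries (columns).
births = [
    [8, 6, 1],
    [1, 2, 7],
    [1, 2, 2],
]

# One index value per unordered industry pair.
gamma = {pair: coagglomeration_index(births, *pair)
         for pair in combinations(range(3), 2)}
```

In this toy example industries 0 and 1 are over-represented in the same region and receive a positive value, while industries 0 and 2 locate in different regions and receive a negative one.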
 Instead of focusing on just the manufacturing industries as has been done in 
previous agglomeration studies (Ellison and Glaeser 1997; Ellison, Glaeser, and Kerr 
2010; Rosenthal and Strange 2001, 2003), I compute pairwise EG coagglomeration 
index values for all industries at the 4-digit industry level using the 2007 North 
American Industry Classification System (NAICS), excluding agriculture, private 
households, public administration, as well as some industries for which 
entrepreneurship data was not available.50 The 4-digit NAICS level is used in order to 
strike a balance between granularity and error in constructing concordances between 
the 2002, 2007, and 2012 NAICS classifications, which needs to be done due to the 
panel nature of the data spanning the years 2006 to 2013. The data for new 
establishment counts is drawn from the Statistics of U.S. Businesses (SUSB), an 
annual dataset produced by the US Census Bureau that provides detailed geographic 
and industry level data on the count of new establishments, as well as firm births, 
deaths, expansions, and contractions. I calculate the EG coagglomeration index at the 
Metropolitan Statistical Area (MSA) level for the 2006 to 2013 panel years, using the 
2009 MSA definitions published by the US Office of Management and Budget 
(OMB).51 The result is a total of 39,340 observations of the EG coagglomeration index 
for 281 industries in each panel year. 
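
As a quick consistency check, and assuming one index value per unordered pair of distinct industries (the index is symmetric in i and j), the observation count follows directly from the number of industries:

```latex
\[
\binom{281}{2} = \frac{281 \times 280}{2} = 39{,}340 .
\]
```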
 Table 4.1 lists the ten most and least coagglomerated industry pairs according 
to the calculated EG coagglomeration measures, averaged across panel years. A 
comparison between the calculated values and those of Ellison et al. (2010) shows a 
striking similarity, with textile and apparel industries exhibiting a very high tendency  
                                                 
50 Entrepreneurship data was not available for postal services (NAICS 4911), rail transportation (NAICS 
4821), monetary authorities - central bank (NAICS 5211), and insurance and employee benefit funds 
(NAICS 5251).  
51 I only considered the MSAs within the lower 48 states, and further excluded some MSAs for which 
boundaries changed significantly during the panel years. This resulted in a total of 348 MSAs being 
included in the analysis. 
 
Table 4.1. Ellison-Glaeser (EG) coagglomeration index values 

4.1.a. Highest ten industries 
Rank  Industry i (4 digit NAICS code)             Industry j (4 digit NAICS code)                      EG index
1     Cut and Sew Apparel Manufacturing (3152)    Independent Artists, Writers, and Performers (7115)  0.147
2     Cut and Sew Apparel Manufacturing (3152)    Apparel Wholesalers (4243)                           0.143
3     Cut and Sew Apparel Manufacturing (3152)    Motion Picture, Video Industries (5121)              0.128
4     Cut and Sew Apparel Manufacturing (3152)    Agents for Artists, Athletes, Entertainers (7114)    0.099
5     Apparel Wholesalers (4243)                  Independent Artists, Writers, and Performers (7115)  0.089
6     Apparel Knitting Mills (3151)               Cut and Sew Apparel Manufacturing (3152)             0.085
7     Motion Picture, Video Industries (5121)     Independent Artists, Writers, and Performers (7115)  0.085
8     Apparel Wholesalers (4243)                  Motion Picture and Video Industries (5121)           0.082
9     Apparel Knitting Mills (3151)               Apparel Wholesalers (4243)                           0.081
10    Apparel Wholesalers (4243)                  Agents for Artists, Athletes, Entertainers (7114)    0.074

4.1.b. Lowest ten industries 
Rank  Industry i (4 digit NAICS code)             Industry j (4 digit NAICS code)                      EG index
1     Coal Mining (2121)                          Cut and Sew Apparel Manufacturing (3152)             -0.036
2     Coal Mining (2121)                          Apparel Wholesalers (4243)                           -0.036
3     Oil and Gas Extraction (2111)               Cut and Sew Apparel Manufacturing (3152)             -0.035
4     Cut and Sew Apparel Manufacturing (3152)    Pipeline Transportation of Natural Gas (4862)        -0.033
5     Oil and Gas Extraction (2111)               Apparel Wholesalers (4243)                           -0.033
6     Support Activities for Mining (2131)        Apparel Wholesalers (4243)                           -0.033
7     Support Activities for Mining (2131)        Cut and Sew Apparel Manufacturing (3152)             -0.033
8     Cut and Sew Apparel Manufacturing (3152)    Agriculture Machinery Manufacturing (3331)           -0.032
9     Agriculture Machinery Manufacturing (3331)  Apparel Wholesalers (4243)                           -0.031
10    Sawmills and Wood Preservation (3211)       Apparel Wholesalers (4243)                           -0.029
 
for coagglomeration. This is in spite of the fact that the Ellison et al. (2010) study 
considers only manufacturing industries, while the current study considers a broader 
industry spectrum, and also that the two studies differ in both industry classification 
schemes (SIC versus NAICS) and time (1987 versus 2006 to 2013). This suggests that 
coagglomeration patterns across industries do not vary much over time. A final 
observation is that the observed maximum and minimum values of the EG 
coagglomeration index are also very similar to those of Ellison et al. (2010), 
suggesting that as theorized, the index allows for the comparison of coagglomeration 
values between different geographic units as well as across time.  
 
4.3.2. The EG coagglomeration index and the industry space 
As mentioned previously, this paper attempts to merge the literature on 
coagglomeration highlighted in the previous section with theories of development, 
especially those highlighted in Hidalgo et al. (2007) and Jacobs (1969). Models 
within the development economics literature largely 
focus on either 1) the mix of productive factors such as physical capital, labor, and 
land (Heckscher and Ohlin 1991), or 2) the transformation of production towards more 
advanced products via technological change (Romer 1986). At the regional level, 
traditional central place theory suggests that there exists a hierarchy of urban areas, 
with large metropolises harboring a greater set of goods and services while smaller 
cities are limited in their production mix (Christaller 1966; Losch 1954).  
Structural change theory’s argument of country level development being a process of 
shifting towards an ever more advanced product mix, while informative, is less 
relevant at the regional level. Most notably, at the subnational level there exists greater 
competition as well as integration among cities and regions due to lower transportation 
costs and the relative ease of moving goods and services across borders. Jacobs (1969) 
and Thompson (1968) both propose similar theories of city growth that take into 
consideration such nuances. In the initial stage, cities export only a few products, 
concentrating on specializing their production mix such that their comparative 
advantages for such products are maximized. In the second stage, a gradual process of 
economic maturation occurs where locally produced goods and services substitute 
imports. The third stage is characterized by connections with other cities and cluster 
economies which together build a diversified regional metropolis, and the final stage 
is one of “new work” where new skills and businesses are created based on the 
enlarged and diversified economy which fuels innovation.  
Considering this theoretical backdrop, there seems to exist conflicting views as 
to how regions should develop. In light of the constructed network of the industry 
space, the development economics literature suggests that development should be 
directed solely towards moving towards the core of the network where the export-
oriented, high-spillover industries reside, while the urban planning and urban 
economics literature suggests that development paths are nonlinear. In order to 
facilitate further analysis of conflicting development theories, I first visualize the 
industry space, constructed by connecting 4-digit NAICS industries by the pairwise  
 
Figure 4.1. Network of industries based on Ellison-Glaeser coagglomeration 
index (traded industries highlighted, nodes sized based on weighted degree) 
  
Figure 4.2. Network of industries. Industries are colored based on their average 
annual pay, and sized based on weighted degree centrality 
 
EG coagglomeration index values.52 Figures 4.1 and 4.2 correspond to this industry 
space, where in Figure 4.1 the dark nodes correspond to the traded industries as 
classified by Delgado, Porter, and Stern (2010, 2016), and in Figure 4.2 the darker 
nodes are industries with higher average annual pay. First it can be observed that much 
like the product space of Hidalgo et al. (2007), the constructed industry space also 
exhibits a clear core-periphery structure, with the nodes within the core mostly 
corresponding to the traded industries. This is as expected, since most of the traded 
industries are manufacturing industries, which tend to coagglomerate 
with each other (Marshall 1920; Delgado, Porter, and Stern 2016). From Figure 4.2 
however, it can be seen that the relationship between network position and wages is 
not as clear, with a significant number of industries occupying the periphery also 
exhibiting relatively high levels of average pay. This suggests that if higher income 
levels are a development objective, the appropriate direction of structural change may 
well differ depending on the circumstances.  
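
The visualization procedure described in footnote 52 can be sketched as follows. The source names only the Maximum Spanning Tree algorithm; Kruskal's procedure is used here as one standard way to compute it, and the link weights and threshold are hypothetical.

```python
def maximum_spanning_tree(n, weights):
    """Kruskal's algorithm on links sorted from strongest to weakest.

    n       -- number of nodes (industries), labelled 0..n-1
    weights -- dict mapping (i, j) with i < j to a link weight
    Returns the n-1 links of the tree, assuming the graph is connected.
    """
    parent = list(range(n))

    def find(a):
        # Follow parent pointers to the component root, compressing the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for (i, j), _ in ranked:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # link joins two separate components
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


def network_skeleton(n, weights, threshold):
    """Spanning-tree skeleton plus every link whose weight exceeds threshold."""
    links = set(maximum_spanning_tree(n, weights))
    links.update(pair for pair, w in weights.items() if w > threshold)
    return sorted(links)


# Hypothetical coagglomeration values for four industries.
weights = {(0, 1): 0.14, (0, 2): 0.09, (0, 3): -0.03,
           (1, 2): 0.08, (1, 3): -0.02, (2, 3): 0.01}
tree = maximum_spanning_tree(4, weights)
links = network_skeleton(4, weights, threshold=0.05)
```

The tree guarantees that even weakly connected industries remain attached to the drawing, while the threshold step adds back the strong links that distinguish the core from the periphery.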
 Table 4.2 lists the weighted degree centrality (i.e. the sum of the link weights 
for any given industry) of the ten most and least central industries. A higher centrality 
value is suggestive of an industry that exhibits strong coagglomerative patterns with 
many other industries, and also suggests that entrepreneurship in such an industry may 
have high potential spillover effects. It can be seen that the industries with the highest 
centrality values are mainly industries within the transportation, mining, and  
                                                 
52 The networks are visualized based on the method used by Hidalgo et al. (2007), where in the first 
step, a “skeleton” of the network is constructed using the Maximum Spanning Tree algorithm. This 
algorithm in essence produces the set of N-1 links (N being the number of industries) that connects all 
nodes in the network while maximizing the sum of link weights. Subsequently, all links above a certain 
threshold value are added to the skeleton to differentiate between more and less central nodes, while 
at the same time keeping the network visualization tractable.  
 
Table 4.2. Weighted degree centrality of 4 digit NAICS industries 

4.2.a. Highest ten industries 
Rank  Industry (4 digit NAICS code)                                         Weighted degree centrality
1     Other Pipeline Transportation (4869)                                  0.709
2     Pipeline Transportation of Crude Oil (4861)                           0.665
3     Pipeline Transportation of Natural Gas (4862)                         0.663
4     Cut and Sew Apparel Manufacturing (3152)                              0.631
5     Oil and Gas Extraction (2111)                                         0.575
6     Independent Artists, Writers, and Performers (7115)                   0.573
7     Aerospace Product and Parts Manufacturing (3364)                      0.506
8     Support Activities for Mining (2131)                                  0.456
9     Agriculture, Construction, and Mining Machinery Manufacturing (3331)  0.411
10    Motion Picture and Video Industries (5121)                            0.375

4.2.b. Lowest ten industries 
Rank  Industry (4 digit NAICS code)                                         Weighted degree centrality
1     Taxi and Limousine Service (4853)                                     -0.575
2     Department Stores (4521)                                              -0.569
3     School and Employee Bus Transportation (4854)                         -0.441
4     Dry cleaning and Laundry Services (8123)                              -0.436
5     Securities and Commodity Exchanges (5232)                             -0.412
6     Grocery Stores (4451)                                                 -0.411
7     Other General Merchandise Stores (4529)                               -0.408
8     Apparel Knitting Mills (3151)                                         -0.350
9     Charter Bus Industry (4855)                                           -0.270
10    Specialty Food Stores (4452)                                          -0.263
 
manufacturing sectors, which is as expected considering that such sectors benefit more 
from the fundamental forces of agglomeration as documented by Marshall (1920). The 
industries with the lowest centrality values tend to be those that are associated with 
locally oriented services, broadly corresponding to local area amenities. 
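
The centrality measure reported in Table 4.2 can be illustrated with a short sketch. It assumes that the sum runs over all pairwise index values for an industry, including negative ones, which is consistent with the negative centralities in the table; the weights themselves are hypothetical.

```python
def weighted_degree_centrality(n, weights):
    """Sum of link weights attached to each of the n industries.

    weights -- dict mapping (i, j) with i < j to the pairwise index value
    """
    centrality = [0.0] * n
    for (i, j), w in weights.items():
        centrality[i] += w      # each link contributes to both endpoints
        centrality[j] += w
    return centrality


# Hypothetical coagglomeration values for four industries.
weights = {(0, 1): 0.14, (0, 2): 0.09, (0, 3): -0.03,
           (1, 2): 0.08, (1, 3): -0.02, (2, 3): 0.01}
centrality = weighted_degree_centrality(4, weights)
```

Here industry 3 ends up with a negative centrality because it tends to locate away from the other three, mirroring the locally oriented services at the bottom of the table.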
 
4.3.3. Cities’ positions within the industry space 
Having constructed the industry space, I loosely follow the method of Hidalgo et al. 
(2007) and first visualize the relative positions of particular cities within the network 
by coloring the industries based on the location quotient of entrepreneurship. Figure 
4.3 depicts the positions of New York and Los Angeles within the industry space, 
where darker nodes correspond to industries with higher location quotients for 
entrepreneurship. The two cities are chosen in this case because they represent the two 
MSAs with the largest populations. It can be seen that the positions of the two cities 
differ, with New York being positioned more towards the periphery of the network 
compared to Los Angeles. New York scores a location quotient for entrepreneurship 
close to 4.9 for both Securities and Commodity Exchanges (NAICS 5232) and 
Apparel Knitting Mills (NAICS 3151), while Los Angeles exhibits a location quotient 
of 8.1 for Cut and Sew Apparel Manufacturing (NAICS 3152), 5.7 for Independent 
Artists, Writers, and Performers (NAICS 7115), and 5.0 for Motion Picture and Video 
Industries (NAICS 5121). 
Figure 4.3. Entrepreneurship activity for the New York-Northern New Jersey-
Long Island MSA (top) and Los Angeles-Long Beach-Santa Ana MSA (bottom). 
Nodes are colored based on the location quotient of entrepreneurial activity. 

 While visualizing the positions of individual cities within the industry 
space is revealing, the analysis also benefits from a general measure of 
the position of any given city within the network. I construct a metric which 
corresponds to the weighted average centrality of industries for a particular MSA such 
that 
$$C_m = \frac{\sum_i c_i \times \frac{b_{im}}{B_m}}{\sum_i \frac{b_{im}}{B_m}}$$
where 𝑐𝑖 is the weighted degree centrality of industry i, 𝑏𝑖𝑚 is the count of new 
establishment births for industry i in region m, and 𝐵𝑚 is the aggregate count of new 
establishment births in region m. The metric is thus simply the average centrality of all 
the industries in which entrepreneurship takes place within the region, weighted by 
the share of entrepreneurship in each particular industry. A higher value suggests 
that an MSA is located nearer to the core of the network, where the centrality values 
of individual industries are higher, and a lower value suggests that an MSA is located 
nearer to the periphery. 
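The metric can be illustrated with a short computation. The following is a minimal sketch, not the code used in the study: the function name and the two sets of birth counts are hypothetical, and only the four centrality values are taken from Table 4.2. Because only four industries are included, the resulting magnitudes are far larger than the MSA-level values reported in the text.

```python
def average_weighted_centrality(centrality, births):
    """Birth-share-weighted mean of industry centralities for one MSA.

    centrality: {industry: weighted degree centrality c_i}
    births:     {industry: new establishment births b_im in the MSA}
    """
    total_births = sum(births.values())  # B_m
    if total_births == 0:
        raise ValueError("MSA has no establishment births")
    shares = {i: b / total_births for i, b in births.items()}  # b_im / B_m
    numerator = sum(centrality[i] * share for i, share in shares.items())
    denominator = sum(shares.values())
    return numerator / denominator


# Centralities from Table 4.2 (NAICS 4869, 2111, 4521, 4451).
centrality = {"4869": 0.709, "2111": 0.575, "4521": -0.569, "4451": -0.411}

# Hypothetical birth counts, invented purely for illustration.
births_resource_town = {"4869": 10, "2111": 30, "4521": 2, "4451": 8}
births_large_metro = {"4869": 1, "2111": 2, "4521": 40, "4451": 57}

c_resource_town = average_weighted_centrality(centrality, births_resource_town)
c_large_metro = average_weighted_centrality(centrality, births_large_metro)
print(round(c_resource_town, 5), round(c_large_metro, 5))
```

In this toy example the MSA whose births are concentrated in pipeline transportation and oil and gas extraction receives a positive score, while the MSA whose births are concentrated in department and grocery stores receives a negative one.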
 Table 4.3 lists the calculated average centrality values for MSAs based on this 
metric. The MSAs that score the highest and lowest values reveal a distinct pattern. 
The highest-scoring MSAs are smaller cities located in relatively geographically 
isolated areas, less surrounded by other urban areas. The lowest-scoring MSAs are the large 
metropolises, including cities such as New York, Washington DC, Philadelphia, and 
Boston, and smaller MSAs within their regional urban system. Such a striking pattern 
suggests that the smaller, more isolated urban areas 
exhibit higher levels of entrepreneurship in industries that exhibit strong 
agglomeration patterns, such as manufacturing, mining, transportation and other 
traded industries. The larger urban areas that are part of a greater urban system, on the 
contrary, exhibit higher levels of entrepreneurship in industries that are 1) rare, and 2) 
geared towards providing local amenities. This is consistent with central place theory 
(Christaller 1966; Losch 1954) and theories of urban development outlined by Jacobs 
(1969) and Thompson (1968), which hypothesize that large urban areas in the latter 
phases of development are able to produce goods and services that require a greater 
local market to sustain their existence. For example, a high level of new establishment 
births in the department stores or specialty food stores industries is unlikely in small 
urban areas where the demand for such goods and services is relatively scarce. 

Table 4.3. Average centrality of MSAs 
4.3.a. Highest ten MSAs 
Rank  Metropolitan Statistical Area                        Average weighted centrality 
1     Midland, TX                                          0.108 
2     Odessa, TX                                           0.079 
3     Farmington, NM                                       0.049 
4     Wichita Falls, TX                                    0.047 
5     Grand Junction, CO                                   0.045 
6     Abilene, TX                                          0.042 
7     Longview, TX                                         0.041 
8     Lafayette, LA                                        0.040 
9     Houma-Bayou Cane-Thibodaux, LA                       0.040 
10    San Angelo, TX                                       0.039 

4.3.b. Lowest ten MSAs 
Rank  Metropolitan Statistical Area                        Average weighted centrality 
1     Trenton-Ewing, NJ                                    -0.052 
2     New York-Northern New Jersey-Long Island, NY-NJ-PA   -0.050 
3     Atlantic City-Hammonton, NJ                          -0.047 
4     Albany-Schenectady-Troy, NY                          -0.036 
5     Vineland-Millville-Bridgeton, NJ                     -0.035 
6     Ocean City, NJ                                       -0.033 
7     Washington-Arlington-Alexandria, DC-VA-MD-WV         -0.032 
8     Bridgeport-Stamford-Norwalk, CT                      -0.032 
9     Philadelphia-Camden-Wilmington, PA-NJ-DE-MD          -0.031 
10    Boston-Cambridge-Quincy, MA-NH                       -0.030 
 
4.4. Empirical framework 
The main goal of the empirical analysis is to determine whether the positions of MSAs 
within the industry space influence economic growth, and if so, the directionality of 
structural change. Thus it is assumed that the network positions of MSAs at time t − 1 
(i.e. the average centrality 𝐶𝑚) impact the growth of the MSA at time t. The outcome 
of interest is a measure of economic size, and in this case I utilize 1) employment, 2) 
log GDP, and 3) log GDP per capita as the relevant metrics. I also include a host of 
control variables that have been utilized in previous studies of growth. I include 
industrial diversity using the Hirschman-Herfindahl Index in order to differentiate the 
effects of increased diversity from the effects of change in network positions of 
MSAs. I also include a measure of market access that proxies for the relative size of 
neighboring markets, in order to control for spatial clustering effects, where  
$$\mathrm{MA}_{rt} = \sum_{s \neq r} \frac{\mathrm{POP}_{st}}{d_{rs}^2}.$$
Here $\mathrm{POP}_{st}$ is the population of the neighboring region and $d_{rs}^2$ is the square of the 
distance between the centroids of the MSAs. I set a threshold value of 300 miles in 
calculating this metric to reflect a reasonable distance for which a market may be 
defined. In addition, I include the aggregate entrepreneurship rate of the MSA, 
calculated as the number of new establishments per thousand members of the 
labor force, as well as the log of population to account for city size. Finally, I include 
various demographic controls such as the unemployment rate, homeownership rate, 
and educational attainment, as well as the share of manufacturing firms and the number of 
patents per capita.  
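The market access measure can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure used in the study: the text does not specify how centroid distances are measured, so great-circle (haversine) distance is assumed here, and the 300-mile threshold is assumed to exclude more distant neighbors from the sum. The function names, coordinates, and populations are all hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, used for great-circle distance


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def market_access(region, regions, threshold_miles=300.0):
    """Sum of POP_s / d_rs^2 over neighbors s within the distance threshold.

    regions: {name: (centroid latitude, centroid longitude, population)}
    """
    lat_r, lon_r, _ = regions[region]
    total = 0.0
    for name, (lat_s, lon_s, pop_s) in regions.items():
        if name == region:
            continue  # the sum excludes the region itself (s != r)
        distance = haversine_miles(lat_r, lon_r, lat_s, lon_s)
        if 0 < distance <= threshold_miles:  # assumed use of the threshold
            total += pop_s / distance ** 2
    return total


# Hypothetical regions on the equator: B is about 69 miles from A, C about 690.
regions = {
    "A": (0.0, 0.0, 50_000),
    "B": (0.0, 1.0, 100_000),
    "C": (0.0, 10.0, 5_000_000),
}
ma_a = market_access("A", regions)
ma_c = market_access("C", regions)
print(ma_a, ma_c)
```

Only region B contributes to the market access of A in this example, since C lies beyond the threshold despite its much larger population.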
Table 4.4. Summary statistics 
Variables                                  N       Mean       Standard deviation   Min       Max 
Average centrality                         2,784   0.000979   0.0204               -0.0778   0.147 
Patents per capita                         2,784   1.443      2.359                0.0305    28.58 
Diversity (Hirschman-Herfindahl Index)     2,784   0.0150     0.00185              0.0120    0.0307 
Market access                              2,784   2,412      2,120                51.09     18,289 
Log population                             2,784   12.71      1.057                11.16     16.77 
Unemployment rate                          2,784   7.013      2.991                2.017     28.90 
Homeownership rate                         2,784   67.05      5.636                47.41     85.08 
Educational attainment                     2,784   25.54      8.045                10        59.10 
Manufacturing share                        2,784   11.91      6.718                0         53.97 
New establishments / 1,000 labor force     2,784   3.384      1.159                1.486     12.04 

 The specification of the model is a simple OLS regression with fixed effects: 
$$\mathit{Growth}_{mt} = \alpha + \beta_1 C_{mt-1} + X_{mt-1}\beta_c + M_m + T_t + \varepsilon_{mt-1}$$
where 𝐶𝑚𝑡−1 is the average centrality measure for MSAs, 𝑋𝑚𝑡−1 is the set of control 
variables, and 𝑀𝑚 and 𝑇𝑡 are the MSA and year fixed effects, respectively. In later 
specifications, I include a squared term for the centrality measure 𝐶𝑚𝑡−1 in order to test 
the Jacobs (1969) hypothesis that pathways for growth are 
nonlinear. I also include interaction terms between the centrality measure 𝐶𝑚𝑡−1 and 
the log population and manufacturing share variables, to account for differential 
effects of centrality on growth based on varying levels of city size and manufacturing 
intensity. I utilize panel data for the years 2006 to 2013, corresponding to eight panel 
years, for a total of 2,784 observations (348 MSAs × 8 years).  
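The later specifications described above can be written out explicitly. The following is a sketch: the coefficient labels $\beta_2$, $\beta_3$, $\beta_4$ and the variable names $\ln \mathit{Pop}$ and $\mathit{Manuf}$ are assumptions introduced here for illustration, since the text states only that a squared term and two interaction terms are added to the baseline model.

```latex
\begin{aligned}
\mathit{Growth}_{mt} = {} & \alpha + \beta_1 C_{mt-1} + \beta_2 C_{mt-1}^2
  + \beta_3 \left( C_{mt-1} \times \ln \mathit{Pop}_{mt-1} \right) \\
& + \beta_4 \left( C_{mt-1} \times \mathit{Manuf}_{mt-1} \right)
  + X_{mt-1}\beta_c + M_m + T_t + \varepsilon_{mt-1}
\end{aligned}
```

Written this way, a positive $\beta_2$ corresponds to a U-shaped relationship between average centrality and the outcome, which is the form of nonlinearity the squared term is meant to detect.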
It is important to note that there may be many other explanations for the 
variations in growth levels across MSAs. While the list of control variables is far from 
exhaustive, the careful selection of variables coupled with the use of panel data 
(and thus fixed effects) is intended to absorb a large portion of the unobservables. The 
single most pertinent of these is natural advantages: growth has been noted to 
be largely influenced by geographic advantages such as proximity to water bodies or 
other physical features (Ellison, Glaeser, and Kerr 2010). The inclusion of MSA fixed 
effects largely eliminates confounding due to such time-invariant 
characteristics at the MSA level, and the year fixed effects absorb the effects of 
macroeconomic shocks such as the recent recessionary period, which affected the 
nation as a whole. 
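How the two sets of fixed effects remove time-invariant MSA characteristics and common yearly shocks can be seen in a small example. The sketch below is not the estimation code used in the study: it assumes a balanced panel and a single regressor, applies the standard two-way within transformation, and uses synthetic data in which the true slope is 2. All names and numbers are hypothetical.

```python
def two_way_demean(panel):
    """Two-way within transformation for a balanced panel {(unit, period): value}.

    Subtracts the unit mean and the period mean and adds back the grand mean,
    which removes any additive unit effect and any additive period effect.
    """
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})
    grand_mean = sum(panel.values()) / len(panel)
    unit_mean = {u: sum(panel[(u, t)] for t in periods) / len(periods) for u in units}
    period_mean = {t: sum(panel[(u, t)] for u in units) / len(units) for t in periods}
    return {
        (u, t): value - unit_mean[u] - period_mean[t] + grand_mean
        for (u, t), value in panel.items()
    }


def fixed_effects_slope(x, y):
    """OLS slope of y on a single regressor x after two-way demeaning."""
    x_dm, y_dm = two_way_demean(x), two_way_demean(y)
    numerator = sum(x_dm[key] * y_dm[key] for key in x_dm)
    denominator = sum(v * v for v in x_dm.values())
    return numerator / denominator


# Synthetic balanced panel: x plays the role of lagged average centrality.
x = {
    ("A", 1): 0.01, ("A", 2): 0.03, ("A", 3): 0.02,
    ("B", 1): -0.02, ("B", 2): -0.01, ("B", 3): -0.04,
    ("C", 1): 0.05, ("C", 2): 0.02, ("C", 3): 0.06,
}
unit_effect = {"A": 1.0, "B": -0.5, "C": 2.0}   # e.g. natural advantages
period_effect = {1: 0.0, 2: 0.3, 3: -0.2}       # e.g. macroeconomic shocks
y = {(u, t): 2.0 * x[(u, t)] + unit_effect[u] + period_effect[t] for (u, t) in x}

beta_hat = fixed_effects_slope(x, y)
print(beta_hat)
```

Because the unit and period effects enter additively, the transformation removes them exactly and the slope is recovered even though the effects are much larger than the variation in the regressor.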
 
4.5. Results 
4.5.1 OLS estimates 
I first present the main empirical results estimating the effect of MSAs' positions 
within the industry space on economic growth. The average weighted centrality 
measure is mean centered in order to make interpretation of the linear and quadratic 
terms within the empirical model more straightforward. Table 4.5 presents the results 
for the specification in which the outcome variable is the log of employment levels. It 
can be seen that the average centrality measure is only weakly positively significant 
when no fixed effects are included, and changes sign when MSA fixed effects are 
included. When both MSA and year fixed effects are included, the centrality measure 
ceases to be significant, and the addition of the quadratic and interaction terms does 
not change this outcome. When it comes to economic growth in terms of employment 
change, the results suggest that it is not the average centrality of the MSA, but rather 
industrial diversity, city size, manufacturing share, and aggregate entrepreneurship 
levels that are more influential in determining growth.  

Table 4.5. Regression results – Log employment 
                                                (1)         (2)         (3)         (4) 
Dependent variable: Employment 
Average weighted centrality                     0.902*      -0.166**    -0.042      -0.076 
                                                (0.473)     (0.067)     (0.061)     (0.053) 
Average weighted centrality²                                                        1.375 
                                                                                    (0.939) 
Log population                                  1.032***    0.646***    0.558***    0.547*** 
                                                (0.008)     (0.060)     (0.067)     (0.066) 
Average weighted centrality × Log population                                        -0.179*** 
                                                                                    (0.048) 
Manufacturing share                             0.007***    0.003***    0.003***    0.003*** 
                                                (0.002)     (0.000)     (0.000)     (0.000) 
Average weighted centrality × Manuf. share                                          -0.043*** 
                                                                                    (0.009) 
Hirschman-Herfindahl Index                      -26.025***  2.929***    4.655**     4.719** 
                                                (5.095)     (0.718)     (1.909)     (1.950) 
Unemployment rate                               -0.026***   -0.013***   -0.016***   -0.015*** 
                                                (0.002)     (0.000)     (0.001)     (0.001) 
Number of establishment births per labor force  -0.005      0.031***    0.021***    0.022*** 
                                                (0.007)     (0.002)     (0.002)     (0.002) 
Patents per capita                              -0.004      -0.004      -0.005*     -0.005** 
                                                (0.006)     (0.002)     (0.003)     (0.003) 
Homeownership rate                              0.008***    -0.001      -0.000      -0.000 
                                                (0.002)     (0.000)     (0.000)     (0.000) 
Educational attainment                          0.011***    0.001       0.000       0.001 
                                                (0.002)     (0.000)     (0.000)     (0.000) 
Market access                                   -0.000      -0.000      -0.000      0.000 
                                                (0.000)     (0.000)     (0.000)     (0.000) 
Constant                                        11.458***   11.682***   11.663***   11.629*** 
                                                (0.162)     (0.058)     (0.060)     (0.058) 

MSA fixed effects                                           X           X           X 
Year fixed effects                                                      X           X 
N                                               2,784       2,784       2,784       2,784 
R-squared                                       0.681       0.636       0.678       0.689 

Turning to the results where GDP is the outcome of interest, it can be seen 
from columns 1 to 3 in Table 4.6 that the average centrality measure is insignificant 
when not considering the quadratic relationship between network position and 
economic growth. However, in column 4 it can be seen that the quadratic term is 
positive and highly significant, lending strength to Jacobs’ (1969) and Thompson’s 
(1968) argument that the relationship between industry mix and growth is nonlinear. 
The positive coefficient for the quadratic term suggests that for MSAs with average 
centrality values below the tipping point, it is more beneficial for growth to continue 
on a pathway for development that shifts the position of the MSA within the industry 
space towards the network periphery. On the contrary, for MSAs with average 
centrality values above the tipping point, the results suggest that it may be more 
beneficial to continue on a path of concentration of entrepreneurship in the industries 
within the core of the network. Thus according to the results, small cities such as 
Midland Texas or Farmington New Mexico with high average centrality values would 
benefit from more entrepreneurship in highly central industries such as manufacturing, 
mining, or transportation, while large cities such as New York, Philadelphia, or 
Boston would benefit from more entrepreneurship in industries near the network 
periphery, such as those that are geared towards local amenities. Interestingly, this 
result is also in line with the argument for ‘consumer cities,’ which suggests that larger 
urban areas attract people and firms due to the diversity and quality of local amenities. 
Because of mean centering, the negative and marginally significant estimate for the 
linear average centrality term simply means that at the aggregate mean, the effect of 
average centrality on GDP growth is negative. 

Table 4.6. Regression results – Log GDP 
                                                (1)         (2)         (3)         (4) 
Dependent variable: GDP 
Average weighted centrality                     1.098       -0.160      -0.061      -0.188* 
                                                (0.988)     (0.113)     (0.098)     (0.098) 
Average weighted centrality²                                                        7.990*** 
                                                                                    (2.798) 
Log population                                  1.074***    1.106***    0.882***    0.876*** 
                                                (0.011)     (0.128)     (0.116)     (0.110) 
Average weighted centrality × Log population                                        -0.202** 
                                                                                    (0.089) 
Manufacturing share                             0.001       0.001       0.004**     0.003** 
                                                (0.002)     (0.002)     (0.002)     (0.002) 
Average weighted centrality × Manuf. share                                          -0.022 
                                                                                    (0.015) 
Hirschman-Herfindahl Index                      -15.831**   4.308***    8.916***    8.237*** 
                                                (6.997)     (1.274)     (3.007)     (3.006) 
Unemployment rate                               -0.020***   -0.005***   -0.020***   -0.020*** 
                                                (0.003)     (0.001)     (0.002)     (0.002) 
Number of establishment births per labor force  -0.002      0.033***    0.028***    0.027*** 
                                                (0.010)     (0.004)     (0.006)     (0.005) 
Patents per capita                              0.009       0.005       0.006       0.006 
                                                (0.006)     (0.006)     (0.007)     (0.006) 
Homeownership rate                              -0.001      0.000       0.001*      0.001** 
                                                (0.002)     (0.001)     (0.001)     (0.001) 
Educational attainment                          0.013***    0.001       0.000       0.000 
                                                (0.002)     (0.001)     (0.001)     (0.001) 
Market access                                   0.000       -0.000      -0.000      -0.000 
                                                (0.000)     (0.000)     (0.000)     (0.000) 
Constant                                        9.580***    9.448***    9.351***    9.357*** 
                                                (0.201)     (0.099)     (0.088)     (0.086) 

MSA fixed effects                                           X           X           X 
Year fixed effects                                                      X           X 
N                                               2,784       2,784       2,784       2,784 
R-squared                                       0.667       0.258       0.377       0.389 

Table 4.7. Regression results – Log GDP per capita 
                                                (1)         (2)         (3)         (4) 
Dependent variable: GDP per capita 
Average weighted centrality                     1.274       -0.199*     -0.059      -0.188* 
                                                (0.908)     (0.111)     (0.097)     (0.098) 
Average weighted centrality²                                                        8.054*** 
                                                                                    (2.831) 
Log population                                  0.077***    0.257**     0.056       0.052 
                                                (0.010)     (0.121)     (0.117)     (0.111) 
Average weighted centrality × Log population                                        -0.197** 
                                                                                    (0.090) 
Manufacturing share                             0.002       0.001       0.003*      0.003* 
                                                (0.002)     (0.002)     (0.002)     (0.002) 
Average weighted centrality × Manuf. share                                          -0.018 
                                                                                    (0.016) 
Hirschman-Herfindahl Index                      -15.821**   3.639***    8.246***    7.543** 
                                                (6.526)     (1.276)     (2.984)     (2.983) 
Unemployment rate                               -0.019***   -0.005***   -0.019***   -0.018*** 
                                                (0.003)     (0.001)     (0.002)     (0.002) 
Number of establishment births per labor force  -0.008      0.031***    0.023***    0.023*** 
                                                (0.010)     (0.004)     (0.006)     (0.005) 
Patents per capita                              0.010*      0.005       0.006       0.005 
                                                (0.005)     (0.006)     (0.006)     (0.006) 
Homeownership rate                              -0.000      0.000       0.001*      0.001** 
                                                (0.002)     (0.001)     (0.001)     (0.001) 
Educational attainment                          0.013***    0.001       0.000       0.000 
                                                (0.002)     (0.001)     (0.001)     (0.001) 
Market access                                   0.000       -0.000      -0.000      -0.000 
                                                (0.000)     (0.000)     (0.000)     (0.000) 
Constant                                        10.639***   10.565***   10.464***   10.473*** 
                                                (0.186)     (0.095)     (0.086)     (0.084) 

MSA fixed effects                                           X           X           X 
Year fixed effects                                                      X           X 
N                                               2,784       2,784       2,784       2,784 
R-squared                                       0.471       0.250       0.355       0.367 

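The tipping point referred to in this discussion is the vertex of the fitted parabola in the mean-centered centrality measure. The sketch below computes it from the column 4 estimates reported in Tables 4.6 and 4.7. It is an illustration only: it ignores the two interaction terms (equivalently, it evaluates them at zero), which is an assumption made here, and the function name is hypothetical.

```python
def quadratic_vertex(beta_linear, beta_quadratic):
    """Turning point -b1 / (2 * b2) of the curve y = b1 * c + b2 * c**2."""
    if beta_quadratic == 0:
        raise ValueError("no turning point without a quadratic term")
    return -beta_linear / (2.0 * beta_quadratic)


# Column 4 estimates from Table 4.6 (log GDP): linear -0.188, quadratic 7.990.
tipping_point_gdp = quadratic_vertex(-0.188, 7.990)

# Column 4 estimates from Table 4.7 (log GDP per capita): -0.188 and 8.054.
tipping_point_gdp_pc = quadratic_vertex(-0.188, 8.054)

print(round(tipping_point_gdp, 4), round(tipping_point_gdp_pc, 4))
```

Under this simplification both turning points fall at roughly 0.012 on the mean-centered scale, that is, only slightly above the sample mean of average centrality.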
 The interaction terms between average centrality and population and 
manufacturing share, respectively, both exhibit negative coefficients, yet only the 
interaction term for population is significant. The results suggest that a larger city size 
mitigates the effects of the average centrality measure, yet the share of manufacturing 
firms in the region has no clear relationship to the marginal effect of the average 
centrality measure. Similar to the regression with employment levels as the outcome 
variable, controls such as industrial diversity, population, unemployment, and 
aggregate entrepreneurship continue to be significant.  
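The role of the interaction terms can be made explicit by differentiating a specification with a squared term and the two interactions with respect to centrality. This is a sketch: $\beta_2$, $\beta_3$, and $\beta_4$ are labels assumed here for the coefficients on the squared term, the population interaction, and the manufacturing interaction, and the text does not state whether log population and manufacturing share are themselves centered.

```latex
\frac{\partial\, \mathit{Growth}_{mt}}{\partial\, C_{mt-1}}
  = \beta_1 + 2\beta_2 C_{mt-1}
  + \beta_3 \ln \mathit{Pop}_{mt-1}
  + \beta_4 \mathit{Manuf}_{mt-1}
```

With $\beta_3 < 0$, the marginal effect of centrality falls as log population rises, holding the other terms fixed.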
 Table 4.7 presents the results for the specification where the outcome of 
interest is GDP per capita. The coefficient estimates as well as the significance levels 
are very similar to those obtained with GDP as the outcome. This is 
especially the case for the linear and quadratic terms of the average centrality measure, 
suggesting that the estimated tipping point is very similar for both 
specifications. This suggests that, like overall GDP growth, growth in individual 
wealth is also influenced by the network positions of MSAs within the 
industry space.   
As an additional step, I examine the predictive margins as well as the marginal 
effects of the average centrality measure on GDP growth. I omit the same analysis 
for GDP per capita because it is redundant. Figure 4.4 plots the relationship between 
average centrality values and the linear prediction, with 95% confidence intervals, for 
average centrality values within the observed range in the data. Figure 4.5 plots the 
relationship between average centrality values and average marginal effects, at 
representative values of log population. It can be seen visually that the tipping point 
mentioned above occurs very close to 0, which corresponds to the aggregate mean. 
While the large confidence interval around the point estimate of the vertex 
of the parabola calls for caution in directly interpreting the results, the 
overall shape of the curve nonetheless strongly suggests that the relationship between average 
centrality and growth in GDP and GDP per capita is nonlinear. 

Figure 4.4. Average weighted centrality versus linear prediction for GDP, with 
95% confidence intervals 

Figure 4.5. Average marginal effects of centrality measure at different levels of 
population, with 95% confidence intervals 
 
4.5.2. A mapping of average centrality values for MSAs 
As a final step, I consider the geographic distribution of the average centrality values 
for MSAs by grouping MSAs into two categories corresponding to high and low 
average centrality values. Figure 4.6 depicts the geographic location of the MSAs 
included in the analysis, with the darker MSAs corresponding to the cities with 
centrality values above the average.  
Figure 4.6. MSA groupings by centrality levels. 

The mapping is consistent with the previously outlined theories of Jacobs 
(1969) and Thompson (1968). The MSAs with the higher centrality values are (with a 
few exceptions, most notably Los Angeles) mostly small cities that are isolated from 
other urban areas.53 Thus within the scheme of economic development theory, these 
urban areas can be viewed as being in the early to mid stages of development, where 
greater specialization in export-oriented traded industries such as manufacturing, 
transportation, and mining is more beneficial for economic growth. Policy 
prescriptions that focus on fostering entrepreneurship in these highly central 
industries would therefore be more beneficial for these MSAs than policies that 
promote a shift towards industries in the periphery of the industry space.  
On the contrary, most of the MSAs that are below the average are either large 
metropolises or part of a regional urban system surrounded by geographically 
proximal urban areas. Thus for these MSAs, it would be more beneficial to promote 
policies centered on fostering entrepreneurship in more peripheral industries, 
such as those that cater to local amenities or those that are so rare that they require 
a large market in order to be sustainable. Overall, the empirical results suggest that the 
unidirectional development paths outlined in development economics at the country 
level do not translate well to development at the urban and regional level, and that a more 
nuanced approach to development, one which considers the current industrial mix as well 
as spatial patterns, is needed in order to promote sound economic growth.  
 
53 The high average centrality of the Los Angeles area is due to the fact that the city has a high 
specialization in the motion picture and video industry (NAICS 5121) and in independent artists, 
writers, and performers (NAICS 7115), both of which are in the top ten for industries with high 
centrality values.  

4.6. Conclusions 
Overall, this study provides support for structural change theory at the urban level. I 
find consistent evidence that the position of cities within the industry space has a 
significant relationship to growth. After accounting for fixed effects, the results also suggest 
that the optimal growth paths for cities depend on the current position of these cities 
within the industry space. Cities whose establishment birth patterns are more 
geared towards high-spillover industries such as manufacturing should continue on 
this path towards the network core in order to achieve further growth. On the other 
hand, cities whose birth patterns are focused on local-demand-oriented industries 
nearer the periphery of the network should continue on their paths toward the network 
periphery. While this relationship is strong when considering GDP and GDP per 
capita, it does not seem to apply when considering the relationship between structural 
change and employment. Such results suggest that structural change, while benefiting 
the overall growth in production of a city, may not have a significant effect on job 
creation. This may be due to other factors, such as the fact that job creation is a 
gradual process that takes longer to manifest than direct increases in output. Due to the 
limited panel length of the current study, this lagged effect cannot be studied, 
and thus further investigation into the causes of employment growth is warranted. 
Furthermore, this result may also be due to the fact that structural change and growth 
do not directly translate into increased jobs. It may well be the case that output 
growth does not lead to job creation, but rather to increases in the productivity of current 
workers, leading to higher wages. It could even be the case that output growth is the 
result of specialization and automation, which would also dampen the employment 
gains from structural change. 
 When considering the spatial location of cities together with their position 
within the industry space, the basic conclusion is consistent with that of central place 
theory (Christaller 1966; Losch 1954) and the argument of Jacobs (1969). Cities that 
are spatially clustered within a larger urban system generally possess industrial 
structures that are more focused on local amenities and local demand. Examples of 
such industries are department stores, specialty food stores, or amusement parks and 
arcades. According to central place theory, such industries for which per-capita 
demand is low locate in large cities because they require a threshold amount of 
demand in order to exist. Large cities, or cities that are part of a larger urban system, 
have the luxury of being able to harbor such industries, and the results suggest that 
focusing on such industries (near the periphery of the industry space) may be more 
beneficial than promoting growth in high-spillover industries nearer the core of the 
industry space. However, cities that are small and isolated do not have this luxury, and 
must concentrate on the high-spillover industries near the core in order to maintain 
maximal economic growth. Such results contradict the linear stages of growth models 
within the development economics literature (Domar 1947; Harrod 1948; Rostow 
1962), and imply that national growth and subnational growth follow different 
trajectories. 
  
 
 
 
APPENDIX  
Pairwise correlations 
                                  (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)    (10)   (11)   (12) 
(1) Employment 
(2) Log GDP                       0.98 
(3) Log GDP per capita            0.56   0.61 
(4) Avg. weighted centrality      -0.16  -0.17  -0.11 
(5) Log population                0.98   0.97   0.45   -0.16 
(6) Manufacturing share           -0.21  -0.25  -0.11  0.03   -0.26 
(7) Diversity (HHI)               -0.31  -0.27  -0.30  0.04   -0.24  -0.17 
(8) Unemployment rate             -0.08  -0.06  -0.34  0.03   0.01   0.01   0.23 
(9) Establishment births          0.17   0.18   0.14   -0.03  0.16   -0.37  0.05   -0.28 
(10) Patents                      0.20   0.22   0.36   -0.17  0.17   0.00   -0.08  -0.10  0.11 
(11) Homeownership                -0.12  -0.17  -0.10  0.07   -0.17  0.21   -0.11  -0.13  0.03   -0.11 
(12) Educational attainment       0.40   0.43   0.57   -0.27  0.35   -0.31  -0.14  -0.29  0.25   0.53   -0.29 
(13) Market access                0.03   0.04   0.06   -0.39  0.03   0.22   0.01   0.11   -0.26  0.13   0.14   0.05 
  
 
 
 
REFERENCES 
 
Aghion, Philippe, and Peter Howitt. 1992. “A Model of Growth Through Creative 
Destruction.” Econometrica 60 (2): 323–51. https://doi.org/10.2307/2951599. 
Bartik, Timothy. 1992. “The Effects of State and Local Taxes on Economic 
Development: A Review of Recent Research.” Economic Development 
Quarterly 6 (1): 102–11.  
———. 2005. “Solving the Problems of Economic Development Incentives.” Growth 
and Change 36 (2): 139–66. 
Batty, Michael. 2013. The New Science of Cities. Cambridge, MA: MIT Press. 
Berry, Christopher R, and Edward L Glaeser. 2005. “The Divergence of Human 
Capital Levels across Cities.” Papers in Regional Science 84 (3): 407–44. 
Cassel, Gustav. 1932. The Theory of Social Economy. New York: Harcourt Brace. 
Chenery, Hollis B. 1960. “Patterns of Industrial Growth.” The American Economic 
Review 50 (4): 624–54. 
Chenery, Hollis B, Moises Syrquin, and Hazel Elkington. 1975. Patterns of 
Development, 1950-1970. Vol. 75. London: Oxford University Press. 
Chenery, Hollis B, and Lance Taylor. 1968. “Development Patterns: Among Countries 
and over Time.” The Review of Economics and Statistics, 391–416. 
Christaller, Walter. 1966. Central Places in Southern Germany. Prentice-Hall. 
Delgado, Mercedes, Michael E Porter, and Scott Stern. 2010. “Clusters and 
Entrepreneurship.” Journal of Economic Geography 10 (4): 495–518. 
———. 2016. “Defining Clusters of Related Industries.” Journal of Economic 
Geography 16 (1): 1–38. 
Domar, Evsey D. 1947. “Expansion and Employment.” The American Economic 
Review 37 (1): 34–55. 
Ellison, Glenn, and Edward L Glaeser. 1997. “Geographic Concentration in US 
Manufacturing Industries: A Dartboard Approach.” Journal of Political 
Economy 105 (5): 889–927. 
Ellison, Glenn, Edward L Glaeser, and William R Kerr. 2010. “What Causes Industry 
Agglomeration? Evidence from Coagglomeration Patterns.” The American 
Economic Review 100 (3): 1195–1213. 
Glaeser, Edward L. 2008. Cities, Agglomeration, and Spatial Equilibrium. Oxford 
University Press. 
Glaeser, Edward L, and Joshua D Gottlieb. 2006. “Urban Resurgence and the 
Consumer City.” Urban Studies 43 (8): 1275–99. 
Glaeser, Edward L, Jed Kolko, and Albert Saiz. 2001. “Consumer City.” Journal of 
Economic Geography 1 (1): 27–50. 
Glaeser, Edward L, José A Scheinkman, and Andrei Shleifer. 1995. “Economic 
Growth in a Cross-Section of Cities.” Journal of Monetary Economics 36 (1): 
117–43. 
Harrod, Roy Forbes. 1948. Towards a Dynamic Economics, Some Recent 
Developments of Economic Theory and Their Application to Policy. London: 
Macmillan. 
Heckscher, Eli Filip, and Bertil Gotthard Ohlin. 1991. Heckscher-Ohlin Trade Theory. 
The MIT Press. 
Helsley, Robert W, and William C Strange. 1990. “Matching and Agglomeration 
Economies in a System of Cities.” Regional Science and Urban Economics 20 
(2): 189–212. 
Hidalgo, César A, and Ricardo Hausmann. 2009. “The Building Blocks of Economic 
Complexity.” Proceedings of the National Academy of Sciences 106 (26): 
10570–75. 
Hidalgo, César A, Bailey Klinger, A-L Barabási, and Ricardo Hausmann. 2007. “The 
Product Space Conditions the Development of Nations.” Science 317 (5837): 
482–87. 
Ioannides, Yannis M. 2013. From Neighborhoods to Nations: The Economics of 
Social Interactions. Princeton University Press. 
Ioannides, Yannis M, and Giorgio Topa. 2010. “Neighborhood Effects: 
Accomplishments and Looking beyond Them.” Journal of Regional Science 
50 (1): 343–62. 
Jacobs, Jane. 1969. The Economy of Cities. New York: Vintage. 
Krugman, Paul. 1991. Geography and Trade. Cambridge, MA: MIT Press. 
Kuznets, Simon S. 1971. Economic Growth of Nations: Total Output and Production 
Structure. Cambridge: Belknap Press of Harvard University Press. 
Lewis, W Arthur. 1954. “Economic Development with Unlimited Supplies of 
Labour.” The Manchester School 22 (2): 139–91. 
Lewis, William C. 1972. “A Critical Examination of the Export-Base Theory of 
Urban-Regional Growth.” The Annals of Regional Science 6 (2): 15–25. 
Losch, August. 1954. “Economics of Location.” 
Lucas, Robert E. 1988. “On the Mechanics of Economic Development.” Journal of 
Monetary Economics 22: 3–42. 
———. 2001. “Externalities and Cities.” Review of Economic Dynamics 4 (2): 245–
74. 
Malthus, Thomas Robert. 1888. An Essay on the Principle of Population: Or, A View 
of Its Past and Present Effects on Human Happiness. Reeves & Turner. 
Marshall, Alfred. 1920. Principles of Economics. London: Macmillan. 
Moretti, Enrico. 2004. “Human Capital Externalities in Cities.” In Handbook of 
Regional and Urban Economics, 4:2243–91. Elsevier. 
North, Douglass C. 1955. “Location Theory and Regional Economic Growth.” 
Journal of Political Economy 63 (3): 243–58. 
Rauch, James E. 1993. “Productivity Gains from Geographic Concentration of Human 
Capital: Evidence from the Cities.” Journal of Urban Economics 34 (3): 380–
400. 
Ricardo, David. 1891. Principles of Political Economy and Taxation. G. Bell. 
Romer, Paul M. 1986. “Increasing Returns and Long-Run Growth.” The Journal of 
Political Economy, 1002–37. 
Rosenthal, Stuart S, and William C Strange. 2001. “The Determinants of 
Agglomeration.” Journal of Urban Economics 50 (2): 191–229. 
———. 2003. “Geography, Industrial Organization, and Agglomeration.” Review of 
Economics and Statistics 85 (2): 377–93. 
Rostow, Walt W. 1962. The Stages of Economic Growth: A Non-Communist 
Manifesto. Cambridge: Cambridge University Press. 
Saxenian, AnnaLee. 1996. Regional Advantage: Culture and Competition in Silicon 
Valley and Route 128. Cambridge, MA: Harvard University Press. 
Shapiro, Jesse M. 2006. “Smart Cities: Quality of Life, Productivity, and the Growth 
Effects of Human Capital.” The Review of Economics and Statistics 88 (2): 
324–35. 
Smith, Adam. 1776. An Inquiry into the Nature and Causes of the Wealth of Nations. 
New York: Bartleby. 
Solow, Robert M. 1956. “A Contribution to the Theory of Economic Growth.” The 
Quarterly Journal of Economics 70 (1): 65–94. 
Swan, Trevor W. 1956. “Economic Growth and Capital Accumulation.” Economic 
Record 32 (2): 334–61. 
Thompson, Wilbur Richard. 1968. A Preface to Urban Economics. Baltimore: Johns 
Hopkins University Press. 
Tiebout, Charles M. 1956. “A Pure Theory of Local Expenditures.” Journal of 
Political Economy 64 (5): 416–24. 
Todaro, M.P., and S.C. Smith. 2012. Economic Development. Pearson Series in 
Economics. Addison-Wesley.  
Zheng, Lingwen, and Mildred Warner. 2010. “Business Incentive Use among U.S.  
Local Governments: A Story of Accountability and Policy Learning.” 
Economic Development Quarterly 24 (4): 325–36.  
 
 
  
 
 
 
CHAPTER 5 
CONCLUDING REMARKS 
 
The underlying premise of this dissertation has been that urban economies comprise a 
complex system, where various socioeconomic actors interact to jointly create 
emergent outcomes, such as growth and decline. One of the key theoretical arguments 
has been that social interactions underlie economic outcomes such as agglomeration 
economies, inequalities in socioeconomic resources, entrepreneurship, job creation, 
and economic growth. Utilizing this framework, this dissertation has attempted to 
answer a series of key questions regarding the interface between social interactions, 
agglomeration economies, new firm formation, and economic growth.  
The first paper focused on the question of how the dynamics of social 
interactions that take place within a spatial setting affect the inequality in 
socioeconomic resources among social actors. Utilizing an agent-based model of 
social network formation based on a model of preferential attachment within space, 
the paper first examined the evolution of degree distributions under different 
parameter configurations to establish the conditions under which the power law is 
sustained and those under which it breaks down. While the presentation focused on a few select 
parameter settings, sensitivity analysis reveals that the results are robust across 
different configurations for world size and introvert visibility. Generally, it is found 
that networks in which ties are scarce and the rate of tie dissolution is relatively high 
exhibit power law degree distributions. In addition, networks with link dissolution are 
found to be fundamentally different from those in which ties are relatively permanent, 
underscoring the importance of distinguishing between the two types of networks. The 
results suggest that the power law degree distributions of networks grown under 
preferential attachment (PA) are a special case within a broader class of networks that 
accounts for churning dynamics. With regard to inequalities in social resources, it is found that the 
relationship between network density and inequality evolves in three distinct phases. 
Sparse networks exhibit a decrease in social capital inequality as network density 
increases, moderately dense networks exhibit increases in inequality with higher 
density, and very dense networks exhibit a decrease in inequality as the network 
reaches full saturation. The model suggests that due consideration for the relative 
strength of tie formation and dissolution is warranted when aiming to mitigate 
disparities over control of network resources. For example, when considering the 
spread of tacit information, encouraging more networking activity in an ethnic enclave 
where ties are relatively permanent and dense would have very different results than 
encouraging such activity among trade association members where ties are weaker and 
more transient. This relationship between network density and inequality among 
agents is further complicated when considering the spatial aspects of inequality. It is 
found that spatial inequality is greater – in the form of higher social capital agents 
being distributed near the core – when inequality among agents overall is low, 
suggesting that we should acknowledge the potential tradeoff between spatial 
inequality and individual inequality with respect to social resources.  
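The mechanism summarized above can be illustrated with a minimal sketch. It is not the model used in the first paper: the unit-square world, the attachment weight (degree + 1) / (1 + distance)^decay, the per-step dissolution probability, and all function and parameter names are assumptions made only for illustration. Inequality in social resources is proxied here by the Gini coefficient of agent degree.

```python
import random


def gini(values):
    """Gini coefficient of non-negative numbers (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum of the sorted values.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def simulate(n_agents=200, steps=2000, p_dissolve=0.01, decay=1.0, seed=0):
    """Grow a network by spatial preferential attachment with tie dissolution.

    All parameters are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    # Assumed spatial setting: agents placed uniformly on the unit square.
    pos = [(rng.random(), rng.random()) for _ in range(n_agents)]
    ties = set()  # undirected ties stored as (i, j) with i < j
    degree = [0] * n_agents

    for _ in range(steps):
        i = rng.randrange(n_agents)
        # Attachment weight rises with the partner's degree and falls with distance.
        weights = []
        for j in range(n_agents):
            if j == i or (min(i, j), max(i, j)) in ties:
                weights.append(0.0)
                continue
            dist = ((pos[i][0] - pos[j][0]) ** 2 + (pos[i][1] - pos[j][1]) ** 2) ** 0.5
            weights.append((degree[j] + 1) / (1.0 + dist) ** decay)
        if sum(weights) > 0:
            j = rng.choices(range(n_agents), weights=weights, k=1)[0]
            ties.add((min(i, j), max(i, j)))
            degree[i] += 1
            degree[j] += 1
        # Churn: each existing tie dissolves independently with probability p_dissolve.
        for tie in [t for t in sorted(ties) if rng.random() < p_dissolve]:
            ties.remove(tie)
            degree[tie[0]] -= 1
            degree[tie[1]] -= 1

    return degree, ties
```

Running `simulate` over a range of `p_dissolve` and `steps` values and recording `gini(degree)` together with the network density gives a rough way to trace how inequality moves with density under different balances of tie formation and dissolution.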
The second paper builds upon the idea that social interactions and economic 
outcomes are related, by examining the relative strengths of social capital and the 
three Marshallian agglomeration economies in promoting entrepreneurship and new 
firm formation in cities. The key argument has been that social interactions, and more 
broadly social capital within the community or region, aid entrepreneurs in the early 
stages of forming new firms. I argue that social interactions and agglomeration 
economies are better represented as characteristics of a network that consider the 
broader regional entrepreneurial ecosystem, and propose a set of measures based on 
various constructed networks of industries, patents, and nonprofit organizations. 
Using current data on entrepreneurship from the Statistics of US Businesses, I estimate 
a panel model of the count of new firm births in each Metropolitan Statistical 
Area-industry pair as a function of labor market proximity, input-output 
linkages, knowledge spillovers, and community social capital. I find evidence in 
support of all four mechanisms, with labor market proximity being the dominant one. 
However, the relative magnitudes of their effects differ when considering a diverse set 
of industries, including the traded, local, high-tech, low-tech, manufacturing, and non-
manufacturing sectors, suggesting the importance of diversified entrepreneurship 
policies when considering economic development. These results are non-trivial 
considering that most previous studies have focused on a narrow subset of industries 
in testing the effects of agglomerative forces on entrepreneurship. 
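As a reading aid, a count panel specification of this general kind can be written with a log-linear conditional mean. The functional form, the fixed effects, and every symbol below are illustrative assumptions rather than the specification estimated in the second paper: B denotes new firm births in metropolitan area m, industry i, and year t; LP, IO, KS, and SC denote the labor market proximity, input-output linkage, knowledge spillover, and community social capital measures; X collects any controls.

```latex
\[
  \mathbb{E}\left[ B_{mit} \mid \cdot \right]
  = \exp\left(
      \beta_1 \, LP_{mit} + \beta_2 \, IO_{mit} + \beta_3 \, KS_{mit}
      + \beta_4 \, SC_{mt} + X_{mit}' \gamma + \alpha_{mi} + \tau_t
    \right)
\]
```

Under this assumed form, each coefficient is read as the approximate proportional change in expected births for a one-unit change in the corresponding measure, which is what makes the relative magnitudes comparable across industry groups once the measures are put on a common scale.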
Given that many theories exist as to why economic growth takes place but few 
consider how growth should take place given this theoretical background, the final 
paper considers the question of how specifically urban economies should grow. 
Viewing economic growth as a process of structural change, I construct an ‘industry 
space’ that consists of a network of industries linked by coagglomeration patterns. 
Using a measure that quantifies cities’ industrial structure as their position within this 
industry space, I conduct empirical analysis on the relationship between industrial 
structure and economic growth. Results suggest that optimal growth paths vary 
depending on current industrial structure as well as the spatial location of cities. Cities 
that are isolated and more focused on high-spillover industries such as manufacturing 
should continue on this growth path, while those that are clustered within a larger urban 
system and more focused on local amenities should also follow their current patterns 
of specialization. Overall, the results are consistent with central place theory and 
provide support for structural change theory at the urban level. Consistent evidence is 
found that the position of cities within the industry space has a significant relationship 
to growth. Such results contradict the linear stages of growth models within the 
development economics literature, and imply that national growth and subnational 
growth follow different trajectories. 
Overall, the goal of this series of papers has been to provide policy-relevant 
evidence in support of the notion that community development and economic growth 
are interrelated, and to provide planners and policy makers alike with a 
methodology to identify detailed pathways for economic growth that take into account 
the specific socioeconomic and spatial circumstances of the city. Future research 
should emphasize the inseparability of social factors of a region with its economic 
outcomes, for the overarching findings suggest that the two are intricately related and 
mutually reinforcing. Furthermore, future research could benefit from studying the 
urban economy as a complex system, using techniques such as agent-based 
modeling and network analysis, as attempted in this series of papers.  
 