CITIES AS COMPLEX SYSTEMS: SOCIAL INTERACTIONS, AGGLOMERATION, AND ECONOMIC GROWTH

A Dissertation Presented to the Faculty of the Graduate School of Cornell University In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by Jaebeum Cho
May 2018

© 2018 Jaebeum Cho

CITIES AS COMPLEX SYSTEMS: SOCIAL INTERACTIONS, AGGLOMERATION, AND ECONOMIC GROWTH
Jaebeum Cho, Ph. D.
Cornell University 2018

A key distinguishing feature of cities is that the population density is high relative to non-urban areas. Arising from this density is the frequent contact between various socioeconomic actors, which provides for the means of social interactions as well as productivity gains accrued through agglomeration economies. This collection of papers begins on the premise that social interactions underlie economic forces, which constitute the ingredients of the complex system that is the urban economy, jointly determining the outcome of cities as a whole. With such a view of the urban economy, this dissertation attempts to answer a series of key questions regarding the interface between social interactions, agglomeration economies, new firm formation, and economic growth. The first paper proposes an agent-based model of social network formation that explicitly considers space and untangles the complex relationship between social interaction dynamics and inequalities in socioeconomic resources. The second paper builds upon the insight that social interactions and economic outcomes are related and addresses the question of how social interactions and agglomeration economies jointly determine new firm formation in cities. Finally, the last paper attempts to answer the critical question of how urban economies should grow, under the premise that growth takes place through changes in industrial structure brought about by entrepreneurship in particular industries.
Knowledge of the underlying mechanisms of social interactions, and of how such interactions bring about new firm formation and economic growth, provides both a theoretical and an empirical framework within which planning interventions can be made in the realm of community and economic development. The findings could be used to assist planners in better understanding the workings of the urban economy and to inform decision making that aims to promote sustained economic growth.

BIOGRAPHICAL SKETCH

Jaebeum Cho was born in Seoul, Korea, yet spent most of his childhood living outside of his homeland in countries such as the U.S., Canada, and Singapore. He obtained bachelor's and master's degrees in Urban Planning and Engineering from Yonsei University, Korea, and worked as an economic development planner for three years prior to joining the City and Regional Planning department at Cornell University in the Fall of 2012. Since then, his doctoral research has revolved around community economic development, with a particular emphasis on regional science and urban economics.

To everyone that has helped me along this journey

ACKNOWLEDGMENTS

I would like to express my deepest appreciation to my committee chair, Professor Kieran P. Donaghy, as well as to my committee members, Professor Yuri S. Mansury, Professor M. Diane Burton, and Professor Benjamin T. Cornwell, for their continued and extensive support throughout my doctoral studies. I truly was gifted with a supportive committee that provided me with both knowledge and emotional support during good and bad times. I would also like to thank my family for their support, as well as their unwavering faith in me even when I sometimes lost faith in myself. In addition, I would like to acknowledge the Ewing Marion Kauffman Foundation, which partially funded this research. Even with the aid provided by others, I acknowledge that the contents of this publication are solely my own responsibility.
TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
CHAPTER 2 CHURNING, POWER LAWS, AND INEQUALITY IN A SPATIAL AGENT-BASED MODEL OF SOCIAL NETWORKS
    2.1. Introduction
    2.2. Theoretical framework
    2.3. The model
    2.4. Algorithm implementation
    2.5. Simulation results
    2.6. Conclusions
CHAPTER 3 AGGLOMERATION, REGIONAL SOCIAL CAPITAL, AND ENTREPRENEURSHIP IN CITIES
    3.1. Introduction
    3.2. Related literature
    3.3. Data and variables
    3.4. Empirical framework
    3.5. Results
    3.6. Conclusions
CHAPTER 4 PATHWAYS FOR ENTREPRENEURSHIP DRIVEN ECONOMIC GROWTH: ENVISIONING THE INDUSTRY SPACE
    4.1. Introduction
    4.2. Related literature
    4.3. The industry space
    4.4. Empirical framework
    4.5. Results
    4.6. Conclusions
CHAPTER 5 CONCLUDING REMARKS

LIST OF FIGURES

Figure 2.1. ABM flow chart
Figure 2.2. Degree distributions for select parameter settings
Figure 2.3. Relationship between power law fit and network churn parameters
Figure 2.4. Network formation dynamics under two different parameter configurations
Figure 2.5. Tie formation, decay, and aggregate social capital
Figure 2.6. Relationships between tie-formation, decay, and the Gini coefficient
Figure 2.7. Differences in social capital between high and low human capital agents
Figure 2.9. Spatial distribution of agents with $\overline{\overline{SC}}_i > 1$ for representative parameter settings
Figure 2.10. Differences in social capital between introverts and extroverts
Figure 4.1. Network of industries based on Ellison-Glaeser coagglomeration index
Figure 4.2. Network of industries
Figure 4.3. Entrepreneurship activity for the New York-Northern New Jersey-Long Island MSA (top) and Los Angeles-Long Beach-Santa Ana MSA (bottom)
Figure 4.4. Average weighted centrality versus linear prediction for GDP
Figure 4.5. Average marginal effects of centrality measure at different levels of population
Figure 4.6. MSA groupings by centrality levels

LIST OF TABLES

Table 3.1. Count of new firms and entry rates for single and all establishment births
Table 3.2. Select descriptive statistics for variables
Table 3.3. Births of single (start-up) and all establishments
Table 3.4. Births of single (start-up) and all establishments, traded versus local industries, Poisson estimates
Table 3.5. Births of single (start-up) and all establishments, high-tech versus low-tech industries, Poisson estimates
Table 3.6. Births of single (start-up) and all establishments, manufacturing versus non-manufacturing industries, Poisson estimates
Table 4.1. Ellison-Glaeser (EG) coagglomeration index values
Table 4.2. Weighted degree centrality of 4 digit NAICS industries
Table 4.3. Average centrality of MSAs
Table 4.4. Summary statistics
Table 4.5. Regression results – Log employment
Table 4.6. Regression results – Log GDP
Table 4.7. Regression results – Log GDP per capita

CHAPTER 1
INTRODUCTION

Traditionally, there has been much debate regarding the exact definition of a city. Urban economists usually define a city as “a geographical area that contains a large number of people in a relatively small area” (O’Sullivan 2012), while the Economist Intelligence Unit defines cities as “the urban agglomeration or metropolitan area it holds together” (Economist Intelligence Unit 2013). Other more specific definitions of cities exist; for example, the U.S. Census Bureau considers urban areas to be geographical areas with a minimum population of 2,500 people and a minimum density of 500 people per square mile, and a Metropolitan Statistical Area (MSA) to be a core area with a substantial population nucleus and adjacent areas that are economically integrated, with a total population of 50,000 or above. Whichever definition is used, a city distinguishes itself from non-urban areas in that the population density is high relative to the density of surrounding regions.
This emphasis on population density is due to an essential feature of an urban area, namely the frequent contact between different socioeconomic actors, which is feasible only when individuals, firms, and households are concentrated in a relatively small area.

The natural question to ask, then, is why cities exist. Considering that people need land to produce food and other essential resources, living in cities is in a sense counterintuitive, for it separates us from the places where critical commodities are produced. Furthermore, cities are noisy, dirty, and crowded. Cities exist despite these drawbacks because the benefits of colocation more than offset the negative effects.

The fundamental benefits of density are due to increased productivity resulting from specialization and agglomeration (Marshall 1920; Smith 1776). Specialization allows each person to be more productive by allowing for 1) allocational efficiency, and 2) technical efficiency. Allocational efficiency comes from making the best use of each worker’s skillset by assigning different tasks to workers who possess different aptitudes. Technical efficiency arises from the reduction of transition times between different tasks. Higher density benefits specialization because a larger pool of workers allows for a better skillset match. In addition to specialization, higher density results in scale economies, or increasing returns to scale in production. Due to various agglomeration externalities, the increase in output is more than proportional to the increase in inputs, resulting in a decline in average costs and thus higher productivity.

Alfred Marshall (1920) famously noted the underlying mechanisms of agglomeration economies, or the economic forces that cause firms to locate close to one another.
The first is related to the sharing of intermediate inputs, where competing firms locate close to one another to share intermediate inputs of production. Intermediate inputs are goods and services that one firm produces and that other firms use as inputs in their production processes; the classic example is that of dressmaking firms sharing a buttonmaker (Vernon 1972). Due to economies of scale, the cost per intermediate input decreases as the quantity increases, leading to lower production costs. The second agglomeration economy is related to the sharing of labor pools. A large labor market allows workers to readily shift across employers, thus reducing labor market uncertainty, and also facilitates better matches between firms and workers (Helsley and Strange 1990). Finally, knowledge spillovers are also a source of agglomeration economies, and entail the sharing of knowledge among firms in an industry. This results in “the mysteries of the trade becoming no mystery; but are as it were in the air” (Marshall 1920).

One of the key arguments presented in this dissertation is that agglomeration economies exist due to the presence of social interactions (Durlauf and Ioannides 2010; Glaeser 2008; Ioannides 2013). For example, proximity to customers and suppliers may reduce the costs of obtaining inputs or transporting goods to downstream consumers (Ellison, Glaeser, and Kerr 2010; Fujita, Krugman, and Venables 1999), but it also may embody stronger social ties between similar firms and customers that increase trust and information exchange (Dahl and Sorenson 2012). Similarly, labor market pooling shields workers from firm-specific shocks (Krugman 1991) and promotes better worker-firm matches (Helsley and Strange 1990), but it also represents social homophily (McPherson, Smith-Lovin, and Cook 2001).
Especially with knowledge spillovers, the spillover of ideas is possible because individuals colocate and gain information through social linkages, which allow the knowledge “in the air” (Marshall 1920) to be shared with one another (Saxenian 1996).

The overarching theme of the papers included in this dissertation is that the economies of cities comprise a “complex system,” which is defined as a system that exhibits adaptive, nontrivial, emergent, and self-organizing behaviors stemming from agents with rules of operation and no central control (Arthur 2013). The reasoning behind this view stems from the fact that economic agents are faced with fundamental uncertainty; they do not know what they face, and thus in any economic situation, forecasts, strategies, and actions are being “tested” for survival within a situation those beliefs, strategies, and actions create. Within an urban economy, people, firms, and governments react to the aggregate outcome these agents together create, without a mechanism for central control. Furthermore, at the regional level, regional economies evolve by adapting to their current circumstances and through competition and collaboration with neighboring areas. In this sense, urban economies are the perfect example of a complex system.

Another key aspect of a complex system is that the individual agents interact with one another to bring about emergent outcomes. Considering the agglomeration mechanisms discussed above, these types of externalities that occur due to physical proximity directly embody interactions within space; otherwise, there simply would be no benefit due to spatial proximity. Furthermore, the social interactions that underlie agglomeration mechanisms are also manifested due to social networks and the interaction of social actors within an area.
In order to best represent the urban economy as a complex system, I will henceforth stress the importance of representing various aspects of the economy as networks of firms, organizations, people, and even ideas. Networks, by definition, are the joint set of nodes and their linkages, which makes them a perfect vehicle onto which the interactions that occur within a spatial economy can be depicted and analyzed.

Utilizing this theoretical framework, this dissertation attempts to answer a series of key questions regarding the interface between social interactions, agglomeration economies, new firm formation, and economic growth. The first paper focuses on the question of how the dynamics of social interactions that take place within a spatial setting affect the inequality in socioeconomic resources among social actors. Utilizing an agent-based model of social network formation that explicitly considers space, one of the main contributions of the paper is the addition of the spatial dimension to social network analysis. Traditional models of networks such as the preferential attachment model of Barabási and Albert (1999), the random graph model of Erdős and Rényi (1960), or the small-world model of Watts and Strogatz (1998) all assume that geographic space has little to no relevance. Casual empiricism however suggests that space matters in important ways. Urban dwellers for example rely much more for mutual support on local neighbors than on acquaintances in other cities (Gans 1962; Mansury and Shin 2015). By situating social actors within space and varying the dynamics of social interactions, the paper attempts to answer the policy relevant question of how to minimize inequalities in socioeconomic resources, given a model of social network formation that benefits actors with more connections and underlying human capital.
The second paper moves on to address the question of how social interactions and agglomeration economies jointly determine new firm formation within cities. The key argument in this paper is that social interactions, and more broadly social capital within the community or region, aid entrepreneurs in the early stages of forming new firms. Social aspects of the region have been viewed as a crucial element of regional competitiveness (Kitson, Martin, and Tyler 2004; Porter 2003), where the social characteristics of a region are not simple aggregations of firms or individuals. Porter (1998) suggests that a key component of cluster formation and success is the degree of social embeddedness, the existence of facilitative social networks, social capital, and institutional structures. Similarly, Storper (1995, 2013) stresses the importance of “untraded interdependencies” such as networks of trust and cooperation as well as local norms and conventions when considering the success of regions. Thus, the natural question to ask is whether there is a role that regional social capital plays in promoting entrepreneurship, over and above the effect of social interactions at the micro-level. This paper attempts to unify the treatment of regional social capital and agglomeration economies as being part of the broader “entrepreneurial ecosystem” of a region, where the ecosystem takes its form in various types of networks and their linkages. The aim is to present findings regarding the relative strengths of these mechanisms that may aid planners and policy makers in promoting entrepreneurship and economic growth within cities.

The final paper attempts to answer the critical question of how urban economies should grow. Many theories exist as to why economic growth takes place.
Adam Smith (1776) emphasized capital deepening, or the increase in physical capital per worker, while more recent models of endogenous growth (Lucas 1988; Romer 1986) focus on human capital, technological change, and knowledge economies. The regional science and urban economics literature, in turn, has emphasized the critical role that agglomeration effects play in growth at the urban level. The main contribution of this paper is to bridge the gap between the many theories that explain the causes of growth and the relative paucity of theories that elucidate how growth should take place, given the theoretical background. Integrating new insights from complexity science and development economics with more traditional theories of economic development that exist in the urban planning and urban economics literatures, the paper studies optimal patterns of economic growth, defined as structural change (Lewis 1954) that takes place through a shift in the underlying industrial structure of cities caused by new firm formation. Such pathways for economic growth through new firm formation should prove to be a useful tool for planners and policy makers alike in promoting job creation and growth within communities and regions.

REFERENCES

Arthur, W. Brian. 2013. Complexity and the Economy. Oxford University Press.
Barabási, Albert-László, and Réka Albert. 1999. “Emergence of Scaling in Random Networks.” Science 286 (5439): 509–12.
Dahl, Michael S., and Olav Sorenson. 2012. “Home Sweet Home: Entrepreneurs’ Location Choices and the Performance of Their Ventures.” Management Science 58 (6): 1059–71.
Durlauf, Steven N., and Yannis M. Ioannides. 2010. “Social Interactions.” Annual Review of Economics 2 (1): 451–78.
Economist Intelligence Unit. 2013. Hot Spots 2025: Benchmarking the Future Competitiveness of Cities. The Economist, London.
Ellison, Glenn, Edward L. Glaeser, and William R. Kerr. 2010. “What Causes Industry Agglomeration? Evidence from Coagglomeration Patterns.” The American Economic Review 100 (3): 1195–1213.
Erdős, Paul, and Alfréd Rényi. 1960. “On the Evolution of Random Graphs.” Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5: 17–61.
Fujita, Masahisa, Paul Krugman, and Anthony J. Venables. 1999. The Spatial Economy: Cities, Regions, and International Trade. Cambridge, MA: MIT Press.
Gans, Herbert J. 1962. The Urban Villagers: Group and Class in the Life of Italian-Americans. [New York]: Free Press of Glencoe.
Glaeser, Edward L. 2008. Cities, Agglomeration, and Spatial Equilibrium. Oxford University Press.
Helsley, Robert W., and William C. Strange. 1990. “Matching and Agglomeration Economies in a System of Cities.” Regional Science and Urban Economics 20 (2): 189–212.
Ioannides, Yannis M. 2013. From Neighborhoods to Nations: The Economics of Social Interactions. Princeton University Press.
Kitson, Michael, Ron Martin, and Peter Tyler. 2004. “Regional Competitiveness: An Elusive yet Key Concept?” Regional Studies 38 (9): 991–99.
Krugman, Paul. 1991. Geography and Trade. Cambridge, MA: MIT Press.
Lewis, W. Arthur. 1954. “Economic Development with Unlimited Supplies of Labour.” The Manchester School 22 (2): 139–91.
Lucas, Robert E. 1988. “On the Mechanics of Economic Development.” Journal of Monetary Economics 22: 3–42.
Mansury, Yuri, and JK Shin. 2015. “Size, Connectivity, and Tipping in Spatial Networks: Theory and Empirics.” Computers, Environment and Urban Systems 54: 428–37.
Marshall, Alfred. 1920. Principles of Economics. London: MacMillan.
McPherson, Miller, Lynn Smith-Lovin, and James M. Cook. 2001. “Birds of a Feather: Homophily in Social Networks.” Annual Review of Sociology, 415–44.
O’Sullivan, Arthur. 2012. Urban Economics. 8th ed. New York, NY: McGraw-Hill/Irwin.
Porter, Michael E. 1998. “Location, Clusters, and the New Microeconomics of Competition.” Business Economics, 7–13.
Porter, Michael E. 2003. “The Economic Performance of Regions.” Regional Studies 37 (6–7): 549–78.
Romer, Paul M. 1986. “Increasing Returns and Long-Run Growth.” The Journal of Political Economy, 1002–37.
Saxenian, AnnaLee. 1996. Regional Advantage: Culture and Competition in Silicon Valley and Route 128. Cambridge, MA: Harvard University Press.
Smith, Adam. 1776. An Inquiry into the Nature and Causes of the Wealth of Nations. New York: Bartleby.
Storper, Michael. 1995. “Competitiveness Policy Options: The Technology-Regions Connection.” Growth and Change 26 (2): 285–308.
Storper, Michael. 2013. Keys to the City: How Economics, Institutions, Social Interaction, and Politics Shape Development. Princeton University Press.
Vernon, Raymond. 1972. “External Economies.” In Readings in Urban Economics, edited by M. Edel and J. Rothenberg, 27–49. New York: Macmillan.
Watts, Duncan J., and Steven H. Strogatz. 1998. “Collective Dynamics of ‘Small-World’ Networks.” Nature 393 (6684): 440–42.

CHAPTER 2
CHURNING, POWER LAWS, AND INEQUALITY IN A SPATIAL AGENT-BASED MODEL OF SOCIAL NETWORKS

2.1. Introduction

Regional science since its inception has focused on socioeconomic phenomena with a spatial dimension (Nijkamp, Rose, & Kourtit, 2014). Social networks in particular are one of the defining issues of our time that have transformed how we think about socioeconomic phenomena. A growing body of empirical work measuring different aspects of social networks has indeed shown that connections matter for a variety of outcomes, such as getting jobs (Lin & Dumin, 1986), becoming more successful entrepreneurs (Greve & Salaff, 2003), or maintaining high-performing organizations (Borgatti & Cross, 2003).
But while it is intuitive to many that social interactions should respect Tobler’s (1970) “first law of geography,” ground-breaking models of networks such as the preferential attachment model of Barabási and Albert (1999), the random graph model of Erdős and Rényi (1960), or the small-world model of Watts and Strogatz (1998) all assume that geographic space has little to no relevance. Casual empiricism however suggests that space matters in important ways. Urban dwellers for example rely much more for mutual support on local neighbors than on acquaintances in other cities (Gans, 1962; Mansury & Shin, 2015). This is in line with a Pew Internet study that shows face-to-face contact has remained the dominant means of communication even for core users of online social network sites (Hampton, Sessions, Her, & Rainie, 2009).

Regional science is well-positioned to contribute to the literature on spatial networks as it has long recognized the critical role of physical geography in socioeconomic relationships. Cities in particular are manifestations of the dense interactions among residents living in close proximity to one another (Batty, 2013). Situating entities in space therefore strengthens the empirical basis and sheds new light on the nature of social networks. Spatially embedded models of networks indeed show the non-trivial impact of geography on network properties (Kosmidis, Havlin, & Bunde, 2008) as well as the importance of geographically concentrated networks (Browning, Dietz, & Feinberg, 2004). Empirical studies by both sociologists (McPherson, Smith-Lovin, & Cook, 2001; Wellman, 1996) as well as regional scientists (Cassi & Plunket, 2014; Fritsch & Kauffeld-Monz, 2010; Ioannides & Topa, 2010) confirm the critical role of space, with social distance (e.g. the frequency of contact or the strength of relations) being heavily influenced by geographic propinquity.
An important feature of contemporary social networks is churning brought about by social actors that continuously re-evaluate and alter their links. For example, it has been observed that more than half of social networking users have “unfriended” contacts in their networks, thereby removing members from their inner circle (Madden, 2012). This is also evident in online dating networks, where members who start out as strangers eventually enter a committed relationship (Smith & Duggan, 2013), and where many of these relationships are eventually dissolved. From a broader perspective, it has been argued that rising crime and disorder in cities are in large part brought about by the decay of local social ties (Sampson, 2004). Others have gone so far as to suggest that the continuous decline in social connections has impoverished our lives and communities (Putnam, 2001).

The churning dynamics are consistent with the view of networks as complex, dynamic, self-organizing systems embedded in space and governed by individual interactions. The present study therefore examines the evolution of spatial networks using agent-based models (ABMs), a class of complex-systems approximations where the abstraction maintains “a close association with real-world agents of interest” (Miller & Page, 2009). Regional scientists in particular have used simulation techniques to explore the bottom-up processes that drive the emergence of spatial patterns (Mansury & Gulyás, 2007; Torrens, 2007, 2010; Xie, Batty, & Zhao, 2007). In essence, an ABM is a set of agent-specific rule-based algorithms that shape the outcomes for individual agents. The algorithms allow agents to interact directly with one another, and in so doing link individual changes to systems dynamics. ABMs’ main advantage for modeling social networks is that they utilize a more realistic bottom-up approach that gives agency to social actors to collectively generate systems behavior.
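To make the idea of agent-specific rules concrete, the following is a minimal sketch of a spatial network with churn. It is not the model developed in this chapter: the function name `simulate_churn`, the parameters `p_form`, `p_decay`, and `reach`, and the uniform placement of agents in a unit square are all assumptions chosen purely for illustration.

```python
import math
import random


def simulate_churn(n_agents=30, steps=200, p_form=0.5, p_decay=0.1,
                   reach=0.3, seed=1):
    """Toy spatial ABM (hypothetical): nearby agents form ties, ties decay."""
    rng = random.Random(seed)
    # Assumption: agents are points placed uniformly in the unit square.
    pos = [(rng.random(), rng.random()) for _ in range(n_agents)]
    ties = set()  # undirected ties stored as (low_id, high_id)
    for _ in range(steps):
        # Formation rule: a random pair may link only if within `reach`.
        i, j = rng.sample(range(n_agents), 2)
        if math.dist(pos[i], pos[j]) <= reach and rng.random() < p_form:
            ties.add((min(i, j), max(i, j)))
        # Decay rule: each existing tie dissolves with probability p_decay.
        ties = {t for t in sorted(ties) if rng.random() >= p_decay}
    return pos, ties


pos, ties = simulate_churn()
print(f"{len(ties)} ties survive among {len(pos)} agents")
```

Raising `p_decay` relative to `p_form` thins the network, which is one simple way to explore how the balance between tie formation and tie decay might shape the distribution of connections.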
This paper addresses the following research questions. The first focuses on power law distributions, building upon the observation that, while many real-life social networks have highly skewed distributions, many others do not. For example, although citation networks and actor networks are both networks of collaboration, the former tend to follow the power law (Barabasi, 2000), whereas the latter exhibit small-world properties (Watts, 1999). The present study queries when a scale-free distribution is sustainable in a spatial network where agents are allowed to re-evaluate their ties, and conversely under what conditions churning destabilizes the power law. Second, it is argued that social capital is a socioeconomic resource affected by the decisions to maintain, dissolve, and form new connections. A natural question then is how the unequal distribution of social capital is affected by the differential strength of network churn factors, and under what conditions the disparities can be mitigated.

This paper contributes to the literature by adding space into an ABM of preferential attachment (PA) with network churn, and by showing how such refinements result in previously unknown emergent properties and network behavior. In contrast to the sole focus on connectivity in the original preferential attachment model, here I consider the refinement proposed by Bianconi and Barabási (2001), which allows agent fitness to also influence the probability of forming a new connection. Space then matters in the specification where proximity affects agent fitness, which intertwines with churning to embed the agent selection mechanism in an evolving spatial landscape. The model is further enriched by the distinction between two types of agents, namely introverts and extroverts, in a community where introverts are limited in their spatial scope of interactions while extroverts are free to maintain long-range connections.
This is in line with theory and empirical evidence showing that psychological traits shape social interactions (Cuperman & Ickes, 2009) and economic outcomes (Tversky & Kahneman, 1981).

The next section highlights three key concepts, namely PA, power laws, and homophily. Section 3 elaborates on the network model, and section 4 on the implementation of the ABM in simulation analysis. The penultimate section 5 analyzes the simulation results, focusing on the degree distributions of agents under different parameter settings, as well as on the individual and spatial inequalities in social capital. The concluding section discusses key policy implications and recommendations.

2.2. Theoretical Framework

2.2.1. Preferential attachment (PA)

The original PA model proposed by Barabási and Albert (1999) ushered in a boom in network studies in the late 1990s. While the model was formalized in their seminal paper, the idea that the rate at which a particular agent acquires links is proportional to the number of links that the agent already has (hence preferential attachment) had been around long before. Notably, de Solla Price (1965) discussed the concept of cumulative advantage in the context of scientific citations, while sociologists referred to the same phenomenon as the Matthew effect (Merton, 1968), named after the Gospel of Matthew: "For whoever has will be given more, and they will have an abundance." (Mt, 25:29). This "rich get richer" mechanism has been observed in many types of networks, ranging from the internet and power grids to empirical social networks (see Barabasi, 2000 for examples).

The model assumes a network that starts with a small number of nodes that are randomly connected. In every succeeding step a new node is added, linking itself to the incumbent nodes already in the network.
PA is incorporated by the simple rule that incoming nodes prefer incumbents that are already well connected, thus awarding an incumbent with significant connectivity a higher probability of attracting newcomers. The PA model serves as a good starting point for analyses of social networks due to two key characteristics. First, it features a network that expands continuously through agent and tie addition. Second, new agents connect to others already in the network through a process of selection, which favors more connected agents. The first characteristic is essential in modeling growing networks that are fundamentally different from static networks with a fixed number of nodes and edges (Erdős & Rényi, 1960; Watts & Strogatz, 1998). The second characteristic captures the fact that in many social networks, agents choose to associate with others that are already well connected. In theoretical terms, selection is introduced by allowing the probability that a node attracts others to be proportional to its degree connectivity,[1] which is in contrast to random network models in which the probabilities of attachment are fixed in time.

The PA model, however, abstracts away from certain important aspects of real networks. The lack of a spatial dimension in particular is one critical omission to which I devote section 2.3 below. The PA model also ignores network churn, unlike random graph or small-world models that allow for the "rewiring" of links. Thus once an agent is connected to an incumbent upon entry, it ceases to seek new connections and only passively receives links from subsequent newcomers. Ignoring churn, however, disregards the body of empirical literature confirming that social actors constantly re-evaluate network ties based on individual decisions and preferences (Karnstedt et al., 2010; Sasovova, Mehra, Borgatti, & Schippers, 2010).
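For reference, the growth rule of the canonical PA model described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this study: the function name `grow_pa_network`, the choice of two initially linked nodes, and the restriction to one link per newcomer are assumptions made for brevity. Note that incumbents never rewire here, which is precisely the absence of churn discussed above.

```python
import random


def grow_pa_network(n_nodes, seed=0):
    """Grow a network in which each newcomer links to one incumbent,
    chosen with probability proportional to the incumbent's degree.

    Assumptions (illustrative only): two initial nodes joined by a
    single link, and exactly one link per newcomer.
    """
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}          # two seed nodes, already linked
    edges = [(0, 1)]
    for new in range(2, n_nodes):
        incumbents = list(degree)
        weights = [degree[j] for j in incumbents]
        # Degree-proportional selection: the "rich get richer" rule.
        target = rng.choices(incumbents, weights=weights, k=1)[0]
        edges.append((new, target))
        degree[target] += 1
        degree[new] = 1
    return degree, edges


degree, edges = grow_pa_network(1000)
print(len(edges), max(degree.values()))
```

Because every newcomer adds exactly one link and no link is ever removed, a network grown this way to n nodes always has n - 1 links.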
Furthermore, without tie formation and detachment among incumbents, the PA model gives near absolute agency to older nodes,[2] which is simply not the case in many settings. The present study responds to the challenge of developing a spatial agent-based model to study the evolution of networks in a churning environment where relationships ebb and flow.

[1] The degree of a node refers to the number of connections that the node has.
[2] This is simply because nodes that have been present for longer periods of time have more opportunities to form links with incoming nodes (see Adamic & Huberman, 2000 for a critique).

2.2.2. Power laws

The PA model was conceived to explain the World Wide Web, in which a few influential websites have a very large number of links while the rest harbor only a few connections. This highly unequal distribution exhibits a power law, which refers to the linear relationship in the log-log plot of the degree distribution.[3] Power law distributions are scale-free, a notion that is best understood vis-à-vis their scale-dependent counterparts. Magnitudes such as the length of a town block, the height of a building, and the number of bedrooms in a housing unit have a characteristic scale, which means the mean value is representative of the magnitude that one actually observes on the ground. By contrast, power-law distributed urban social networks lack a characteristic scale, and this means that the average number of acquaintances is not a good predictor of the extent to which a city resident is connected. In stochastic terms, the degree distribution exhibits "fat tails," which implies scale independence, that is, a scale-free distribution. Urban and regional scholars have long been familiar with regularities in the size distribution of cities (Zipf, 1949). More relevant for the study here is the presence of power law relationships in spatial networks, such as those for commuters (De Montis, Chessa, Campagna, Caschili, & Deplano, 2009).
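The "linear relationship in the log-log plot" can be stated compactly. In the following, the symbols $P(k)$ (the share of nodes with degree $k$), $C$ (a normalizing constant), and $\gamma$ (the power exponent) are notation introduced here for illustration and are not taken from the text above:

```latex
P(k) = C\,k^{-\gamma}
\quad\Longrightarrow\quad
\log P(k) = \log C - \gamma \log k .
```

Plotted on log-log axes, a power law therefore appears as a straight line with slope $-\gamma$. Rescaling the degree by any factor $a$ changes only the constant, since $P(ak) = a^{-\gamma} P(k)$, which is one way to read the term scale-free.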
The essence of power law distributions is captured by the existence of a few nodes with very large degrees, acting as "hubs." Fat-tailed distributions therefore imply wide variations in the extent of social contacts, and thus the resources (Granovetter, 2005; Lin, 2001), that a node can tap into. While most have modest connections, a few exert enormous influence by virtue of their hundreds or even thousands of contacts. The implied distribution of resources is therefore highly skewed.

[3] The degree distribution of a network describes the relative frequencies of nodes with different degrees (Jackson, 2008).

Understanding the sources of inequality in space is one of the central challenges of the science of regions. But before we can begin addressing spatial inequality we must first disentangle the processes that give rise to differential degree distributions. Under what conditions do networks sustain fat-tailed distributions, and under what conditions does the power law break down? It is important to note that many networks are scale dependent. In the small-world networks that Watts and Strogatz (1998) consider, for example, the degree distribution follows a random distribution, and the authors give numerous examples of such networks. Thus while the power law distribution is observed for specific networks, there are many networks that exhibit a characteristic scale with small deviations from the average connectivity. The present study addresses this and other questions, including whether the power law can be sustained in networks where the arrival of new nodes and links occurs at the same time as the elimination of ties among existing nodes. The evidence drawn from the ABM simulations hints at a potential tug of war between tie formation and tie dissolution. The power law seems to persist when dissolution dominates formation, but vanishes in the opposite case.

2.2.3. Homophily, propinquity, and social capital

As elegant as the preferential attachment (PA) model may be, it leaves out a number of important variables. Social ties depend not only on connectivity but also on other social and economic indicators. In particular the model abstracts away from homophily, which refers to the tendency for people to associate with those sharing a wide range of similar attributes. Homophily is one of the most fundamental forces known to shape social networks (McPherson et al., 2001). Numerous studies of social relationships indicate the dominance of homophily in social interactions, ranging from ties of marriage (Kalmijn, 1998) or friendship (Aral, Muchnik, & Sundararajan, 2009; Verbrugge, 1983), to membership in voluntary associations (Cornwell & Dokshin, 2014) or appearing with others in a public space (Mayhew, McPherson, Rotolo, & Smith-Lovin, 1995). The PA model is in essence a model of heterophily, whereby nodes prefer to connect to others that are as different as possible (in the number of links) from themselves. Within the social networks literature, whether homophilous or heterophilous ties predominate has long been debated. Developments in theory and empirics suggest that both are instrumental in explaining social interactions, with one dominating the other under different circumstances (Burt, 2005; Mehra, Kilduff, & Brass, 1998).

An important source of homophily is propinquity, as we are more likely to sustain relationships with others that are geographically closer to us than with others farther away (McPherson et al., 2001). Zipf (1949) states that the importance of propinquity stems from the notion of effort, for it takes more energy and effort to maintain relationships with distant contacts. Many examples, such as studies of
Many examples, such as studies of 19 neighborhoods (Campbell, 1990), residential proximity (Verbrugge, 1983), or immigrant enclaves (Wilson & Portes, 1980) show that more homophilous interactions take place when social actors are closer to each other.4 Locations thus not only determine physical factors, but also the common traits that forge neighborhoods. It is to account for homophily that I give each node a unique location in the network of heterophilous interactions. The introduction of space allows us to set the probability for tie creation to be inversely related to the relative distance between nodes. Spatial embeddedness changes the dynamics of tie formation as locations introduce an important new source of heterogeneity. In the preferential attachment model heterogeneity stems only from differences in connectivity. The introduction of space means that nodes with the same connectivity are not necessarily equally attractive, as it depends on where they are located in relation to the evaluating node. As I will show below, the introduction of space results in degree distributions that generally lack power law properties. While hubs with the highest number of connections are still the biggest draw, I do not expect hubs to be secluded in space as isolation would have prevented them from becoming a hub to begin with. Closely aligned with the notions of homophily and heterophily is the concept of social capital. It has been theorized that strong homophilous relationships promote higher levels of trust, reciprocity and enforcement of norms (Coleman, 1988). By contrast, weak heterophilous relationships matter when people need to β€œget ahead,” with individuals maintaining more such relationships gaining brokerage capacity or 4 Proximity and preferences come together in Schelling’s (1969) famous work where people sharing similar traits end up in the proximity of each other despite only mild preference for neighbors who are like them. 
20 better access to non-redundant information (Burt, 1992, 2005; Granovetter, 1973). Social capital is thus a resource that one can tap into, but unlike physical or human capital, it is embedded in the network fabric that one is a part of (Portes, 1998). As I will show below, this conception allows us to examine the inequalities in social capital emerging from the differential propensities to form and dissolve ties. 2.3. The model In the following presentation of the model I will use the term agents and nodes interchangeably. I consider a network comprised a set of nodes 𝑁 = {1,2, … , 𝑛} and a set of undirected links 𝐿 βŠ† 𝑁 Γ— 𝑁. Multiple links between two given nodes as well as self-links are assumed to be absent. The network is initially conceived with two nodes that are connected to each other at 𝑑 = 0. In each subsequent period a new node joins the network, and must choose which of the preexisting nodes it will connect to. As in the preferential attachment model, a preexisting node is chosen with probability that is proportional to its degree connectivity (Vega-Redondo, 2007) as well as its individual fitness. The probabilistic approach is used to represent sources of variability in network formation that are too complex to capture mechanistically. In real networks ties are not always formed based on connections because of either bounded rationality or factors other than connectivity. The novel attachment mechanism here stems from the preferential bias for preexisting nodes with higher fitness, captured through the fitness parameter πœ‚, which represents individual-specific characteristics such as social affinity, wealth, or human capital (Bianconi & BarabΓ‘si, 2001). Formally, the probability Π𝑖,𝑗 that a new node i 21 will connect to an incumbent node j depends on both the connectivity π‘˜π‘— and the fitness parameter πœ‚π‘— such that: πœ‚π‘—π‘˜π‘— Π𝑖,𝑗 = (1). βˆ‘π‘— πœ‚π‘—π‘˜π‘— Equation (1) has a straightforward interpretation. 
Other things equal, a higher connectivity raises the likelihood of being linked to an incoming node, but a lower fitness would lower this probability. A node searching for a new contact evaluates an existing node $j$'s fitness based on the combination of two factors, namely human capital $\lambda_j$ and the distance between the two nodes $d_{ij}$, such that:

$$\eta_j = \frac{\lambda_j}{d_{ij}^2} \tag{2}$$

The level of human capital $\lambda_j$ is drawn from a uniform random distribution with support [0, 1]. Note that equation (2) introduces space by assigning every agent a unique location in the network so that no two nodes can occupy the same area.

A central feature of the tie dynamics in the model is the formation and dissolution of ties among incumbent nodes. The trajectory is cumulatively driven by a combination of tie creation and tie deletion, leading to network churning (Koka, Madhavan, & Prescott, 2006). For tie formation I introduce the notion of maximum visibility reach that distinguishes extroverts from introverts. The former are capable of connecting to other agents anywhere within the spatial grid, while the latter are unable to connect to other agents outside of their visibility range ($v$). Formally, the probability $\Pi^F_{e,j}$ for an extrovert $e$ to connect to any other node $j$ that is not yet a link neighbor is calculated as:

$$\Pi^F_{e,j} = \theta_f \frac{\eta_j k_j}{\sum_{j \notin S_e} \eta_j k_j} \tag{3}$$

where $\theta_f$ is the tie formation parameter drawn from the interval [0, 1], and $S_e$ is the set of node $e$'s link neighbors. By contrast, the tie formation probability for introverts $i$ is defined as:

$$\Pi^F_{i,j} = \begin{cases} \theta_f \dfrac{\eta_j k_j}{\sum_{j \notin S_i,\; d_{ij} \le v} \eta_j k_j} & \text{when } d_{ij} \le v \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

Equation (4) assigns a non-zero probability for an introvert $i$ to connect to agents within $i$'s visibility range, $d_{ij} \le v$, or zero otherwise.
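Equations (1) through (4) can be illustrated with a minimal Python sketch. It is a hedged reading of the formulas rather than the model's actual code (which is written in NetLogo): the function names, the dictionary-based data layout (`pos`, `lam`, `deg`, `neighbors`), and the convention of returning zero probabilities when every candidate has zero weight are all assumptions introduced here.

```python
import math


def fitness(lam_j, pos_i, pos_j):
    """Equation (2): human capital of j divided by squared distance to i."""
    d2 = (pos_i[0] - pos_j[0]) ** 2 + (pos_i[1] - pos_j[1]) ** 2
    return lam_j / d2  # d2 > 0 because no two agents share a location


def attachment_probs(i, candidates, pos, lam, deg):
    """Equation (1): eta_j * k_j normalized over the candidate set."""
    w = {j: fitness(lam[j], pos[i], pos[j]) * deg[j] for j in candidates}
    total = sum(w.values())
    if total == 0:  # assumed convention: no attractive candidate
        return {j: 0.0 for j in candidates}
    return {j: wj / total for j, wj in w.items()}


def formation_probs(i, nodes, neighbors, pos, lam, deg, theta_f, v=None):
    """Equation (3) for extroverts (v is None) and (4) for introverts."""
    cands = [j for j in nodes if j != i and j not in neighbors[i]]
    if v is not None:  # introverts only see agents within range v
        cands = [j for j in cands if math.dist(pos[i], pos[j]) <= v]
    base = attachment_probs(i, cands, pos, lam, deg)
    return {j: theta_f * p for j, p in base.items()}


# Hypothetical three-agent example.
pos = {0: (0, 0), 1: (3, 4), 2: (0, 10)}
lam = {0: 0.5, 1: 0.5, 2: 1.0}
deg = {0: 1, 1: 2, 2: 1}
neighbors = {0: set(), 1: set(), 2: set()}
print(formation_probs(0, [0, 1, 2], neighbors, pos, lam, deg, theta_f=0.5))
print(formation_probs(0, [0, 1, 2], neighbors, pos, lam, deg, theta_f=0.5, v=5))
```

In this example agent 1 is nearer and better connected than agent 2, so it receives most of the probability mass; when agent 0 is treated as an introvert with range 5, agent 2 drops out of the candidate set entirely.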
For tie decay, the probability that any node $i$ deletes a tie with a current link neighbor is determined by the number of links currently maintained by that agent ($k_i$) and the distance between the two nodes ($d_{ij}$). This is based on the notion that maintaining ties requires social effort (Lin, 2001), and hence, other things equal, nodes with a larger number of links lose connectivity at a higher rate compared to nodes with only a few links. At the same time, links between nodes farther away have a higher chance of dissolving. Formally, I introduce a tie decay parameter that controls how ephemeral ties are within the network. The equation that governs the probability for node $i$ to dissolve its link to neighbor $j$ is as follows:

$$\Pi^D_{i,j} = \begin{cases} \theta_d k_i \dfrac{d_{ij}^2}{\sum_{j \in S_i} d_{ij}^2} & \text{when } \theta_d k_i d_{ij}^2 < \sum_{j \in S_i} d_{ij}^2 \\ 1 & \text{otherwise} \end{cases} \tag{5}$$

where $\theta_d$ is the tie dissolution parameter drawn from the interval [0, 1].[5]

To capture individual network resources, I utilize the concept of social capital, defined as the instrumental resources that are available to actors through the social ties they maintain (Lin, 2001). Formally, I utilize a simple definition of an agent's individual social capital $SC_i$ as the sum of the human capital (Coleman, 1988; Moretti, 2004) of an agent's first-degree link neighbors such that:

$$SC_i = \sum_{j \in S_i} \lambda_j \tag{6}$$

and naturally I define aggregate social capital in the network to be $\sum_i SC_i$. It is important to note that social capital, at both the individual and aggregate levels, indirectly feeds back into the model by influencing individual decisions of tie formation and dissolution. From the setup, the individual social capital measure depends on both the number of links (which increases the number of link neighbors $j$ that are included in set $S_i$) and the quality of these links (the human capital that a link neighbor is endowed with).
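A small Python sketch of equations (5) and (6) may help make the decay rule concrete. The function names and the dictionary layout below are hypothetical, and the cap at 1 is written as `min(1.0, ...)`, which is equivalent to the two-branch form of equation (5).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two grid locations."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def dissolution_prob(i, j, neighbors, pos, theta_d):
    """Equation (5): probability that i drops its tie to neighbor j."""
    k_i = len(neighbors[i])
    total = sum(sq_dist(pos[i], pos[m]) for m in neighbors[i])
    ratio = theta_d * k_i * sq_dist(pos[i], pos[j]) / total
    return min(1.0, ratio)  # the probability defaults to 1 above the cap


def social_capital(i, neighbors, lam):
    """Equation (6): summed human capital of i's direct neighbors."""
    return sum(lam[j] for j in neighbors[i])


# Hypothetical example: agent 0 has a near neighbor (1) and a far one (2).
pos = {0: (0, 0), 1: (1, 0), 2: (0, 2)}
lam = {0: 0.2, 1: 0.3, 2: 0.6}
neighbors = {0: {1, 2}, 1: {0}, 2: {0}}
print(dissolution_prob(0, 1, neighbors, pos, theta_d=0.5))
print(dissolution_prob(0, 2, neighbors, pos, theta_d=0.5))
print(social_capital(0, neighbors, lam))
```

Read this way, the ratio in equation (5) equals $\theta_d$ times the squared length of the tie relative to the mean squared length of all of $i$'s ties, so an agent with a single tie dissolves it with probability exactly $\theta_d$.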
Agents with more links will, ceteris paribus, have more social capital, which feeds back to them by 1) increasing their chances of being linked by other agents, while at the same time 2) increasing their likelihood of losing links to others due to higher values of $k_i$, which increases $\Pi^D_{i,j}$ (see equation (5)). On the other hand, agents with higher quality links, ceteris paribus, will be affected by an increased chance of losing links, for their link neighbors will usually harbor a larger number of connections, which would increase the chance of tie dissolution. Aggregate social capital also feeds back to individual agents, with higher levels resulting in lower tie formation due to network saturation (i.e. fewer possible links to be formed) as well as higher tie dissolution (i.e. more possible links to be severed). Note that there are two mechanisms that act against each other to determine levels of social capital, namely 1) the mechanism of preferential attachment based on degree, human capital, and distance, and 2) the mechanism of preferential detachment based on degree and distance. I will assess how individual and aggregate levels of social capital, as well as inequality in social capital, evolve for varying values of the tie formation ($\theta_f$) and tie decay ($\theta_d$) parameters.

[5] $\Pi^D_{i,j}$ can become greater than 1 when $\theta_d k_i d_{ij}^2 > \sum_{j \in S_i} d_{ij}^2$, in which case the probability defaults to 1.

2.4. Algorithm implementation

I use the standard protocol of "Overview, Design concepts, and Details" (ODD, see Railsback & Grimm, 2011) to describe the agent-based algorithm. ODD provides a standardized way of presenting the ABM, starting with three elements that give an overview of the model and how it is designed, followed by specific design concepts that illustrate the ABM's key characteristics, and ending with three elements that describe the initialization and implementation details.
I highlight the main purpose of the model, as well as the entities, variables, scale, and processes below (see Annex D for specific details of the model according to the ODD protocol).

Purpose

The model seeks to examine how churning affects network resources and system characteristics in a spatial setting. Specifically, I consider how the degree distribution of agents differs based on varying strengths of tie formation and dissolution, as well as how these factors influence individual and spatial inequalities.

Entities, state variables, and scales

The model has three types of entities: individual agents, their network connections, and square patches of residential locations. The network connections are assumed to be binary (i.e. either a link exists or it does not), non-redundant (i.e. no more than one link can exist between any two agents), undirected (i.e. the direction of the link is indistinguishable), and free of self-loops (i.e. no links to oneself). Each patch is either empty or occupied by at most one agent, and together the patches make up a square grid landscape of 100 × 100 (L = 100).[6] Since opposite edges are disconnected, the landscape represents a two-dimensional Euclidean surface. This is to facilitate the analysis of spatial inequalities between agents as well as specific core-periphery structures that are better represented in two-dimensional space. The patches have no state variables other than their relative position within the grid, which dictates the distances to other patches, and thus the distances between agents.

Agents seek to expand the extent of their connections in the endeavor to increase their social capital while being constrained by the effort required to maintain their ties. Each individual agent is defined by both static and dynamic state variables.
The static state variables comprise the agent type (whether extrovert or introvert), level of human capital, the propensity to form ties among incumbents (the tie formation parameter) and to dissolve existing ties (the tie dissolution parameter), and spatial coordinates, which are all predetermined at agent birth. The dynamic state variables are the individual degree connectivity, relative fitness, and social capital. The global variables are the degree distribution, aggregate social capital, and measure of inequality (the Gini coefficient).

[6] Sensitivity analysis is run for different world sizes. See Appendix C.

Figure 2.1. ABM flow chart

The model's temporal scale is such that each simulation is run until the population of agents (N) reaches 1,000, which results in a maximum link count of 499,500.

Process overview and scheduling

There are three processes in the model: 1) the linking of a newcomer to an incumbent member of the network, and the 2) formation and 3) dissolution of ties among incumbents. Regarding the first process, in each period a new agent enters the system and is assigned a random location on the spatial landscape, and with equal probabilities is deemed either an introvert or an extrovert. Introverts are assigned a visibility range parameter value ($v$) of 15.[7] After each incumbent node is assigned a value of $\Pi_{i,j}$ (see equation (1)), the newcomer first chooses an incumbent at random and creates a link with probability $\Pi_{i,j}$. In the event that the newcomer fails to create a link, it continues to draw randomly from the full set of incumbents until a connection is formed.[8] Tie formation between incumbents and tie dissolution are implemented using an exhaustive search method where each incumbent evaluates the probabilities for tie formation (see equations (3) and (4)) and tie decay (see equation (5)) for all candidate agents (i.e.
all agents that are not currently linked for tie formation, and all agents that are linked for tie decay) in every time step, forming and deleting links accordingly. The sequence is such that first a newcomer is added to the network and linked, followed by the tie formation and then the tie deletion processes for the incumbents.[9] One time period ends when all three processes have been completed for all agents.

[7] As with world size, sensitivity analysis is run for differing visibility ranges. See Appendix C.
[8] In other words, the re-selection is done with replacement.
[9] Thus it is possible to delete a link with an incumbent for which an agent has created a link in the same time period.

I conduct a parameter sweep on the tie formation parameter $\theta_f$ and the tie dissolution parameter $\theta_d$, each of which is allowed to vary between 0 and 1. The analyses employ the mean of five simulation runs for each parameter setting with a unique random number seed for every configuration.[10] All simulations are implemented in NetLogo (Wilensky, 1999), a widely used multi-agent programmable modeling environment. I examine the implications of different parameter settings for degree connectivity, network churning, and the presence or absence of power-law distributions. I also examine the distributional impact (measured by the Gini coefficient) of these settings, calculated by assessing the level of social capital obtained by the agents through their network ties. Finally, I consider the spatial distribution of agents with high or low levels of social capital to examine the role of network churn in a spatial context.

[10] I choose five simulations as the optimal level that balances computational burden with potential variability, as the results suggest that the overall variability of network characteristics is minimal across different random seeds. This is due to the fact that each agent is subject to a stochastic process of initial linkage, link formation, and decay with every other agent in the network at every time period. The enormous amount of stochasticity involved in constructing a final network of 1,000 agents thus renders the resulting network as a whole robust to large variations in aggregate characteristics.

2.5. Simulation results

2.5.1. Degree distributions and the power law

In seeking to establish the conditions in which the power law prevails, I first verify the agent-based algorithm by confirming that the power law holds when $\theta_f = \theta_d = 0$, i.e., for the original preferential attachment model of Barabási and Albert (1999) in which incumbents are not allowed to form or dissolve ties with one another. Figure 2.2a shows that indeed the power law prevails in the long-run distribution of degree connectivity with this parameter setting, with a coefficient of determination R-squared of 0.9 and a power exponent of 1.9 which, as it turns out, roughly coincides with the lower bound of the range for many real networks (Barabási & Albert, 1999).

Figure 2.2a-h. Degree distributions for select parameter settings. The X-axis corresponds to the degree of nodes, while the Y-axis corresponds to the number of nodes with such degrees, in log-log scale.

The scale-free property, however, vanishes as soon as a small but positive propensity to form new ties is introduced among incumbent agents while holding $\theta_d = 0$. Figures 2.2b-c reveal that for networks where links never decay, a slight increase in $\theta_f$ results in the complete breakdown of the power law.
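One common way to obtain a power exponent and an R-squared of the kind reported above is an ordinary least-squares fit on the log-log frequency plot. The sketch below assumes that procedure; the text does not spell out the exact fitting method used, and the function name `loglog_fit` is hypothetical.

```python
import math
from collections import Counter


def loglog_fit(degrees):
    """OLS fit of log10(count) on log10(degree).

    Returns (exponent, r_squared), where exponent is the negated slope.
    Assumes at least two distinct positive degrees with differing counts.
    """
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c) for c in counts.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return -slope, r_squared


# Synthetic degrees whose counts fall exactly on a line of slope -2.
sample = [1] * 10000 + [10] * 100 + [100]
print(loglog_fit(sample))
```

On the synthetic sample the counts drop by a factor of 100 for each tenfold increase in degree, so the fitted exponent is 2 with a perfect fit. Fits of this kind are sensitive to noise in the sparse upper tail, which is worth bearing in mind when reading R-squared values for very sparse networks.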
The shifting of the points on the plot to the right as $\theta_f$ increases shows how a rising propensity to form new ties among incumbents benefits the relatively disadvantaged (in the number of links) by awarding them more connections.

The power law returns when tie formation and decay are both present, but only when they are either roughly equal in strength or when the rate of tie dissolution is greater than that of tie formation. Figure 2.2d shows the resurgence of the scale-free property when $\theta_f = 0.4$ and $\theta_d = 0.5$, accompanied by a highly unequal distribution of social capital, as evident from a Gini coefficient that is at least twice that for cases where churning is absent (i.e., when $\theta_f = \theta_d = 0$). This suggests that power law distributions that seem ostensibly similar could have vastly different implications for social equity. Churning characterizes most real-world networks, and the model predicts a much higher concentration of social capital at the top than the canonical PA model when both tie formation and decay are present.

However, the power law breaks down again when incumbents form ties at a rate that far exceeds the rate at which ties are dissolved. Figures 2.2e-h show that the degree distribution approaches a bell-shaped curve as the ratio $\theta_f/\theta_d$ increases. At the same time social capital becomes more evenly distributed as the Gini coefficient declines.

The results also reveal that the power law prevails only in networks with very low density.[11] Under the original PA model, a network grown starting from two agents up to 1,000 agents in increments of 1 inevitably results in a link count of 999, which corresponds to a network density of 0.2%. The power-law network depicted in Figure 2.2d has a link count of 236 and a much lower density of 0.047%. As I have shown, when tie formation dominates tie dissolution, resulting in a larger number of links and higher density, the power law breaks down.
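Since the Gini coefficient is the inequality measure used throughout, a short sketch of one standard way to compute it from a list of individual social capital values is given here. The rank-based formula and the function name `gini` are illustrative choices; the text does not state which formula the simulations use, and the convention of returning 0 for an empty or all-zero list is an assumption.

```python
def gini(values):
    """Gini coefficient of non-negative values via the rank formula.

    G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n,
    with values sorted in ascending order and ranks starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # assumed convention for degenerate inputs
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))   # perfectly equal distribution
print(gini([0, 0, 0, 1]))   # one agent holds everything
```

With four agents, complete equality gives 0 and complete concentration gives 0.75, the maximum attainable for n = 4 under this formula, namely (n - 1)/n.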
Figure 2.3 depicts the fit of the power law curves under different combinations of the churn parameters $\theta_f$ (tie formation) and $\theta_d$ (tie dissolution). The results suggest that in general, higher values of $\theta_d$ relative to $\theta_f$ result in distributions that more closely follow power laws. The upper left portion of the figure is dominated by distributions with R-squared greater than 0.8. There is, however, substantial variation in the upper left extremes due to the sparseness of the network in these extremes as well as the stochastic evolution of the degree distributions. The figure confirms the results from Figure 2.2, with the power law fit decreasing substantially as tie formation becomes more prominent. In addition, it can be seen that introducing tie formation has a much larger effect in breaking the power law for lower values of $\theta_d$. For example, when $\theta_d = 0.6$, an increase of $\theta_f$ from 0 to 0.5 still results in a degree distribution with R-squared of roughly 0.9. However, when $\theta_d = 0.1$, similar R-squared values can be obtained only when $\theta_f$ is less than 0.1.

[11] Network density is simply the number of links $K$ divided by the number of possible links $\hat{K}$, where $\hat{K} = n(n - 1)/2$ and $n$ is the number of nodes in an undirected network.

[Figure 2.3: contour plot with $\theta_f$ (tie formation) on the horizontal axis and $\theta_d$ (tie dissolution) on the vertical axis, both ranging from 0.0 to 1.0; contour levels at R-squared = 0.2, 0.4, 0.6, 0.8, and 1.0.]

Figure 2.3. Relationship between power law fit (R-squared) and network churn parameters. Darker colors represent higher R-squared. Values are the mean over five simulation runs each with a different random seed for every parameter configuration.

The ability of ABMs to capture system dynamics is advantageous in studying the evolution of networks. Under two different parameter configurations that both result in power law distributions, I show how the same system properties may emerge albeit with different micro-foundations.
Figure 2.4 depicts network formation for 1) $\theta_f = \theta_d = 0$ (Figures 2.4a-d), and 2) $\theta_f = 0.4$, $\theta_d = 0.5$ (Figures 2.4e-h), both of which exhibit power law properties (see Figure 2.2). In the absence of churning the evolution of degree connectivity is path dependent, much like in the canonical PA model that awards higher degrees to incumbents that have been in the network longer. Thus the nodes in Figure 2.4a that initially command relatively higher connectivity in the early stages are able to maintain their advantage throughout. However, when network churn is present, those that initially had greater connectivity (Figure 2.4e) quickly lose their advantage, being replaced by other nodes that were able to get ahead at different times. Thus while overall the two systems both converge to power law distributions, the trajectories are vastly different. This suggests that the original PA model is a special case within the general class of models for which the power law prevails. Furthermore, the results reveal that distinct bottom-up dynamics likely lead to differential access to social capital (see Figure 2.4) even for systems that exhibit similar macro-properties.

[Figure 2.4 panels:
4.a. $\theta_f = \theta_d = 0$, N = 50, K = 49
4.b. $\theta_f = \theta_d = 0$, N = 200, K = 199
4.c. $\theta_f = \theta_d = 0$, N = 500, K = 499
4.d. $\theta_f = \theta_d = 0$, N = 1000, K = 999
4.e. $\theta_f = 0.4$, $\theta_d = 0.5$, N = 50, K = 7
4.f. $\theta_f = 0.4$, $\theta_d = 0.5$, N = 200, K = 46
4.g. $\theta_f = 0.4$, $\theta_d = 0.5$, N = 500, K = 106
4.h. $\theta_f = 0.4$, $\theta_d = 0.5$, N = 1000, K = 237]

Figure 2.4. Network formation dynamics under two different parameter configurations. 4a-d is where $\theta_f = \theta_d = 0$ while 4e-h is where $\theta_f = 0.4$ and $\theta_d = 0.5$. Black and white nodes are extroverts and introverts, respectively. Nodes are sized proportionately to their degree.

2.5.2. Aggregate social capital

Figure 2.5 and Appendix A display long-run levels of aggregate social capital and total link count K for different values of $\theta_f$ and $\theta_d$. As Figure 2.5 shows, aggregate social capital falls monotonically with a higher propensity ($\theta_d$) for incumbents to dissolve ties while holding tie formation ($\theta_f$) constant. Such results are intuitively appealing, in that we should expect to see lower levels of trust and therefore social capital in a fluid environment where ties are easily broken. Furthermore, the marginal impact of higher values of $\theta_d$ tends to diminish for larger values of $\theta_f$ (see also Appendix A, table (a)). For example, an increase in $\theta_d$ from 0.5 to 0.6 while holding $\theta_f$ fixed at 0.5 results in a 30 percent decline in aggregate social capital, but the same increase in $\theta_d$ while holding $\theta_f$ fixed at 1.0 cuts the percentage decline to 20 percent. This suggests that, other things equal, a greater inclination to form new ties increases the resiliency of social capital to forces that dissolve existing ties.

[Figure 2.5: line plot with $\theta_f$ (tie formation) on the horizontal axis, from 0.0 to 1.0, and aggregate social capital (logarithm) on the vertical axis, from 0 to 6; one line for each value of $\theta_d$ from 0 to 1 in steps of 0.1.]

Figure 2.5. Tie formation ($\theta_f$), decay ($\theta_d$), and aggregate social capital. Values are the mean over five simulation runs each with a different random seed for every parameter configuration.

Conversely, Figure 2.5 also shows that a higher inclination ($\theta_f$) to connect to other incumbents is the rising tide that lifts aggregate social capital across different values of $\theta_d$. This can be traced to aggregate social capital being proportional to the total number of links in the network, which increases as $\theta_f$ rises. Less obvious is the finding that the marginal effect of tie formation diminishes at higher $\theta_f$ values.
For example, a rise in $\theta_f$ from 0.4 to 0.5 holding $\theta_d$ constant at 0.5 brings about a 35 percent increase in total social capital (see Appendix A), but a rise in $\theta_f$ from 0.9 to 1.0, again at $\theta_d = 0.5$, results in a much lower increase of 14 percent. In hindsight this is internally consistent with the logic of the model. Everybody is connected to almost everybody else when $\theta_f$ is already near its maximum, and so a further increase in $\theta_f$ will only have a limited effect on total social capital. Unexpectedly, however, the marginal impact of tie formation accelerates for larger values of $\theta_d$. Using part of the above example, an increase in $\theta_f$ from 0.4 to 0.5 at $\theta_d = 0.5$ brings about a 35 percent increase in total social capital, but the same increase at $\theta_d = 1.0$ triples the increase to over 107 percent. It appears that while rapid decay (high $\theta_d$) produces a sparse network, it is precisely this limited connectivity that allows a higher rate of tie formation to exert greater influence on aggregate social capital. This can be explained by the higher chances of connecting to agents endowed with higher human capital when overall connectivity is low.

An important emergent outcome is the non-linear relationship between network churn and aggregate social capital. Figure 2.5 reveals a drastic decrease in social capital by a factor of 58 and a decrease in the number of connections by a factor of 62 (see also Appendix A) when $\theta_d$ is raised from 0.0 to 0.1. This is in contrast to changes at higher $\theta_d$ values, where on average the decrease is less than two-fold for every 0.1 increment in $\theta_d$.
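The percentage changes quoted for tie formation can be recomputed directly from Appendix A, table (a). The short sketch below copies six entries from that table and computes the percentage changes; the dictionary layout and the helper names `percent_change` and `tie_formation_effect` are illustrative choices, not part of the model.

```python
# Aggregate social capital from Appendix A, table (a),
# keyed by (theta_f, theta_d).
aggregate_sc = {
    (0.4, 0.5): 288.0,
    (0.5, 0.5): 390.3,
    (0.9, 0.5): 808.6,
    (1.0, 0.5): 921.6,
    (0.4, 1.0): 26.8,
    (0.5, 1.0): 55.6,
}


def percent_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


def tie_formation_effect(theta_f_low, theta_f_high, theta_d):
    """Percentage change in aggregate social capital when theta_f rises
    from theta_f_low to theta_f_high at a fixed theta_d."""
    return percent_change(
        aggregate_sc[(theta_f_low, theta_d)],
        aggregate_sc[(theta_f_high, theta_d)],
    )


for low, high, decay in [(0.4, 0.5, 0.5), (0.9, 1.0, 0.5), (0.4, 0.5, 1.0)]:
    effect = tie_formation_effect(low, high, decay)
    print(f"theta_f {low} -> {high} at theta_d = {decay}: {effect:.1f}%")
```

The three results are approximately 35.5, 14.0, and 107.5 percent, in line with the rounded figures in the text.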
These results suggest a phase transition, defined as a significant change of state when parameter values cross a certain threshold (Solé, Manrubia, Luque, Delgado, & Bascompte, 1996), occurring in the vicinity of $\theta_d = 0$, where the qualitative behavior of the system undergoes a significant alteration. Phase transitions are considered a signature of complex networks (Castellano, Marsili, & Vespignani, 2000; Holme & Newman, 2006). In this case the phase transition is the change from a very sparse network, with an average network density of 0.12% for non-zero values of $\theta_d$, to a relatively dense network with an average density of 29.4% when $\theta_d = 0$.

The simulation results thus highlight the important role of tie dissolution when analyzing different types of networks. The findings suggest that networks in which ties are permanent are distinct from those in which ties are transient. Networks with permanent ties have significantly greater connectivity, as the high density suggests, which results in higher levels of aggregate social capital. The introduction of even a small likelihood for ties to dissolve, however, shifts the system into one in which connectivity is much lower, and this transition occurs suddenly rather than gradually. The importance of such a distinction has been empirically observed. For example, Wilson and Portes (1980) document the experiences of immigrant minorities integrating into the US labor market, with findings suggesting that immigrant groups in enclaves characterized by more permanent ethnic and cultural ties frequently perform better economically than other minorities that are more fragmented spatially (and thus part of more transient social networks). The phase transition in this model appears in line with such evidence.

2.5.3. Inequality

Figure 2.6 and Appendix B show the implications of network churn for the distribution of social capital measured by the Gini coefficient.
Figure 2.6a reveals that tie dissolution and the Gini coefficient are in general positively correlated, suggesting that a higher rate of tie decay amplifies inequality. This is an unexpected, emergent outcome because ties dissolve more rapidly for highly connected agents (see equation (5)). Since agents with higher human capital also maintain more links, tie decay is expected to equalize the distribution of social capital. But while a higher rate of tie decay does cause high human capital agents to lose links faster, the preferential attachment (PA) mechanism ensures that the majority of these links are to lower human capital agents. As it turns out, agents with lower human capital lose ground in relative terms, and inequality rises as a result.

[Figure 2.6: (6.a) Gini coefficient (vertical axis, 0.0 to 1.0) against $\theta_f$ (tie formation, horizontal axis, 0.0 to 1.0), with one curve for each value of $\theta_d$ from 0 to 1; (6.b) Gini coefficient (vertical axis, 0.0 to 1.0) for $\theta_f$ from 0 to 1, shown in separate panels for $\theta_d$ = 0, 0.00125, 0.0025, 0.005, 0.01, 0.015, and 0.02.]

Figure 2.6a-b. Relationships between tie formation ($\theta_f$), decay ($\theta_d$), and the Gini coefficient. For 2.6b, bars for each $\theta_d$ panel are ordered from the left in increasing levels of $\theta_f$. Values are the mean over five simulation runs each with a different random seed for every parameter configuration.

On the other hand, a higher rate of tie formation holding $\theta_d$ constant generally reduces inequality, unless ties are relatively permanent. Here it is important to recognize the two opposing forces at work. The first is the mechanism of preferential attachment (PA) initiated by the agent itself (or active PA). For an agent endowed with relatively low human capital, a new tie to a high human capital agent increases the former's social capital more than the latter's, and active PA thus tends to reduce inequality.
The second is being preferentially attached to by other agents within the network (or passive PA). Passive PA favors the well-endowed, and thus tends to accentuate the gap between the highly connected and the relatively isolated. It turns out that which one dominates depends on the magnitude of the decay parameter $\theta_d$. When ties are permanent ($\theta_d = 0$), the effect of increasing $\theta_f$ follows an inverted U pattern where the Gini increases at first and then declines after a certain threshold is surpassed. Thus passive PA is the dominant force initially for lower values of $\theta_f$. This is followed by reduced inequality as the network nears saturation (i.e. full connectivity) when $\theta_f$ is comparatively large and active PA dominates. When some ties are transient, however, equalization through the active mechanism begins to take full effect and higher values of $\theta_f$ lower inequality across the board.

The simulation results highlight the phase transition near $\theta_d = 0$, marking the changing nature of the impact of incumbent tie formation. To shed light on the transition, I run simulations with very small values of $\theta_d$ close to 0. Figure 2.6b shows that for sufficiently low, non-zero rates of tie decay within the range $0 < \theta_d \leq 0.01$ (panels 2 to 5), the relationship between $\theta_f$ and inequality roughly follows a U-shaped curve. In this regime, an increase in $\theta_f$ decreases inequality to a certain minimum, after which a further increase in $\theta_f$ does the opposite and actually increases inequality. As $\theta_d$ approaches 0, this pattern gradually reverts to the shape that characterizes $\theta_d = 0$. The relationship between tie formation and inequality is thus highly nonlinear and depends on the tie decay parameter. The implication is that when ties are relatively permanent, any policy aimed at promoting connectivity could inadvertently privilege the connected even more.
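Since the Gini coefficient carries the argument of this section, a minimal sketch of how it can be computed from a list of agent-level social capital values may be useful. The sketch assumes the standard rank-based formula for a finite population; the text does not state which estimator was used in the simulations, and the function name `gini` and the toy inputs are illustrative.

```python
def gini(values):
    """Gini coefficient of non-negative values (rank-based formula).

    G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n,
    with x sorted in ascending order and ranks i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one agent and a positive total")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical social capital values for four agents.
print(gini([1, 1, 1, 1]))  # equal shares
print(gini([0, 0, 0, 1]))  # one agent holds everything
```

With equal shares the coefficient is 0, and when a single agent out of four holds all social capital it is 0.75, the maximum of (n - 1)/n for n = 4. Under this formula the very high Gini values reported for sparse networks in Appendix B correspond to social capital concentrated in a handful of connected agents.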
In addition to overall inequality captured by the Gini coefficient, I also examine the gap between higher and lower human capital agents to shed light on how differential access to human capital affects the trajectories for social capital.12 Figure 2.7 depicts the evolution of agent inequality based on human capital levels for four different parameter settings, each embodying distinct trajectories. Here, the difference in social capital between high (75th percentile or above) and low (25th percentile or below) human capital agents is measured relative to the latter.13 Comparing the case for $\theta_f = \theta_d = 0$ and that for $\theta_f = 0.4$, $\theta_d = 0.5$, it can be seen that the two settings converge to similar values of inequality in the long run, yet the trajectory for the network with churning is much more volatile. Both configurations yield scale-free degree distributions, but the results reveal that power laws also produce a highly skewed distribution of social capital, with the higher human capital agents commanding levels of social capital twice that of the lower human capital agents. The oscillations for the churning case ($\theta_f = 0.4$, $\theta_d = 0.5$) are due to the constant rewiring of links, and it is churning that allows lower human capital agents to improve their social standing vis-à-vis the more privileged ones.

12 Dynamic analyses of the Gini coefficient (not included here) suggest that the Gini is relatively stable for all parameter settings across time, after the network has evolved to include a sufficient number of agents for reliable Gini calculation.

13 Specifically, inequality is measured at each time point as the difference in average social capital between high (75th percentile and above) and low (25th percentile and below) human capital agents, divided by the average social capital of the low human capital agents.

[Figure 2.7: social capital inequality, (high H.C. - low H.C.) / low H.C., on the vertical axis (0.0 to 3.0) against time in ticks (50 to 1000) on the horizontal axis, for four settings: $\theta_f = 0.4$, $\theta_d = 0.5$; $\theta_f = \theta_d = 0$; $\theta_f = 0.8$, $\theta_d = 0.01$; and $\theta_f = 0.4$, $\theta_d = 0$.]

Figure 2.7. Differences in social capital between high (75th percentile and above) and low (25th percentile and below) human capital agents across time, calculated by dividing the raw difference in average social capital between high and low human capital agents by the average social capital of low human capital agents. The figure omits values for which t < 50 due to excessive volatility in values for a small set of agents. Values are the mean over five simulation runs each with a different random seed for every parameter configuration.

2.5.4. Aggregate social capital and agent inequality

The non-monotonic relationship shown in Figure 2.8 between distribution (measured by the Gini) and overall connectivity (measured by total link count) merits an explanation.14

[Figure 2.8: Gini coefficient (vertical axis, 0.0 to 1.0) against log(links) (horizontal axis, 0.0 to 5.5), with Phases 1, 2, and 3 marked and points grouped in the legend by the value of $\theta_d$.]

Figure 2.8. Relationship between link count and the Gini coefficient. Values are the mean over five simulation runs each with a different random seed for every parameter configuration.

14 I use link count instead of aggregate social capital to make the results more intuitive, recognizing that the two are qualitatively similar in their behavior across parameter values (see Appendix A).

As discussed in Section 2.5.2, what matters for connectivity is the ratio $\theta_f/\theta_d$, since aggregate social capital is strictly increasing in $\theta_f$ while strictly decreasing in $\theta_d$. The trajectory for inequality, on the other hand, is dependent on both the magnitude of $\theta_d$ and its relative strength.
Hence, starting from a sparse network, an increase in the ratio $\theta_f/\theta_d$ generally accomplishes both higher connectivity and a more equal distribution. But if the ratio is increased further by lowering the tie decay rate $\theta_d$ from a level already close to zero, then a tradeoff ensues where stronger connectivity accompanies greater concentration of social capital. However, a reversal is observed at sufficiently high levels of connectivity with the return of the inverse relationship. More specifically, there are two transitions occurring at total link count $K \approx 35{,}000$ and at $K \approx 126{,}000$, respectively, resulting in three distinct phases.

To explain these results, recall that an agent acquires social capital through two opposing mechanisms, namely active and passive PA. Which one dominates here hinges on the level of network activity and network density, which in turn depend on the combination of $\theta_f$ and $\theta_d$.

Phase I represents sparse networks with low network densities between 0 and 7%, where the fall in inequality is driven by three factors. First, low human capital agents maintain a very small number of links, if any. An increase in $\theta_f$ therefore renders their connection gains through active PA comparatively larger. Second, the gains for higher human capital agents being passively linked to are not as large as the gains for lower human capital agents actively initiating connections. Finally, even in sparse networks higher human capital agents command more links and thus have a higher probability for their ties to decay as long as $\theta_d > 0$.

Phase II includes networks with moderate density between 7 and 25%. Here inequality increases with aggregate social capital because in moderately dense networks, the gains for lower human capital agents are not as large as in sparse networks, since a significant number of high-benefit links have already been exploited.
This is in contrast to higher human capital agents, who benefit from both active and passive attachment mechanisms.

Finally, Phase III represents very dense networks nearing link saturation, for which inequality falls once again as connectivity increases further. This is because most incumbents in this regime already have links with most others, so a new tie simply becomes the equalizing vehicle that closes the gap between lower and higher human capital agents.

2.5.5. Spatial inequalities

Space matters in the model as agents' relative positions within the spatial landscape influence the level of social capital that they are able to maintain. I turn now to the locational patterns of agents commanding higher than average social capital, and compare these with those for the rest. Since the amount of aggregate social capital differs across parameter settings, for comparison purposes I calculate the relative amount of social capital $\overline{\overline{SC}}_i$ an agent has as:

$$\overline{\overline{SC}}_i = \frac{SC_i / \sum_i SC_i}{1/N},$$

where N is the total number of agents within the network. Thus an agent with a value of $\overline{\overline{SC}}_i$ greater than 1 maintains a greater amount of social capital than it would if aggregate social capital were distributed uniformly among all agents. I then divide agents into two categories, based on whether they have values of $\overline{\overline{SC}}_i$ greater than or less than (or equal to) 1.

Figure 2.9 shows the long-run spatial distribution of agents with $\overline{\overline{SC}}_i > 1$ for select parameter settings, where the nodes are color-coded based on whether they are extroverts or introverts.15 Recall that extroverts are unencumbered in connecting with others while introverts are constrained by their spatial reach.16 The original PA model with $\theta_f = \theta_d = 0$ is shown in Figure 2.9a, where high social capital agents are randomly distributed, with a significant number of agents with $\overline{\overline{SC}}_i$ greater than 1 on the outer edges of the grid.
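The relative social capital measure defined above amounts to an agent's share of aggregate social capital divided by the uniform share 1/N. A minimal sketch follows; the function names `relative_social_capital` and `split_by_relative_sc` and the toy values are illustrative and do not come from the model code.

```python
def relative_social_capital(social_capital):
    """Return each agent's share of aggregate social capital divided by
    the uniform share 1/N, so that the values average to 1."""
    n = len(social_capital)
    total = sum(social_capital)
    if n == 0 or total == 0:
        raise ValueError("need at least one agent and a positive total")
    return [(sc / total) / (1.0 / n) for sc in social_capital]


def split_by_relative_sc(social_capital):
    """Indices of agents with relative social capital above 1, and the rest."""
    relative = relative_social_capital(social_capital)
    high = [i for i, value in enumerate(relative) if value > 1.0]
    rest = [i for i, value in enumerate(relative) if value <= 1.0]
    return high, rest


# Hypothetical social capital values for four agents.
toy_sc = [1.0, 2.0, 3.0, 6.0]
print(relative_social_capital(toy_sc))
print(split_by_relative_sc(toy_sc))
```

In the toy case the mean is 3, so the agent holding 6 has a relative value of 2 and is the only one classified as high, while the agent holding exactly the mean falls into the second category. Because the measure is normalized by the total, it can be compared across parameter settings with very different levels of aggregate social capital.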
By contrast, Figures 2.9b-d indicate that a higher number of links (K) accompanies the clustering of the high social capital agents towards the center, which in turn implies that the lower social capital agents primarily occupy the fringe. Figures 2.9e-f show that high social capital agents again disperse spatially for even larger values of K. What is striking is that the spatial distribution of high social capital agents closest to the center (Figure 2.9d) occurs precisely at the parameter setting for which the Gini coefficient is the lowest (see Figure 2.6b). This implies that, overall, spatial inequality and social capital inequality are inversely related. Other parameter settings with higher Gini coefficients indeed produce a lesser concentration towards the center.

15 While not shown, due to the random spatial distribution of all agents the blank areas in Figure 2.9 are occupied by low social capital agents.

16 Analysis of the spatial distribution of agents based on human capital levels (not shown here) suggests that the spatial distribution of social capital is independent from that of human capital, with no clear emergent pattern.

[Figure 2.9: maps of agents with $\overline{\overline{SC}}_i > 1$. Panel 9.a: $\theta_f = \theta_d = 0$, Gini = 0.412, K = 999. Panel 9.b: $\theta_f = 0.9$, $\theta_d = 0.1$, Gini = 0.259, K = 4234. Panel 9.c: $\theta_f = 0.7$, $\theta_d = 0.015$, Gini = 0.15, K = 22383. Panel 9.d: $\theta_f = 0.8$, $\theta_d = 0.01$, Gini = 0.103, K = 35123. Panel 9.e: $\theta_f = 0.3$, $\theta_d = 0$, Gini = 0.367, K = 101317. Panel 9.f: $\theta_f = 1$, $\theta_d = 0$, Gini = 0.263, K = 270756.]

Figure 2.9a-f. Spatial distribution of agents with $\overline{\overline{SC}}_i > 1$ for representative parameter settings. Black and grey nodes are extroverts and introverts, respectively.

The results can be explained when I take into consideration the distinct trajectories of link formation as K increases. For sparse networks, the paucity of links lowers the probability of linking to agents further away, since the model prioritizes agents that are close by. This results in agents occupying locally central positions being able to maintain higher levels of social capital. As density increases, long-distance ties become increasingly likely, and as a result agents commanding globally central positions have higher chances of connecting to others or being connected to. Centrally positioned agents then end up accumulating more social capital than agents on the fringe, who have fewer options. However, as density increases even further, the network becomes saturated to the point where central agents lose their advantage, for agents on the fringe are also able to maintain links with distant nodes.

Another key observation is the distinct pattern in the composition of high social capital agents relative to their breed as the number of links increases. In Figures 2.9a-d the distribution of extroverts versus introverts is relatively even, suggesting that in low density networks introverts, ceteris paribus, have equal opportunities to acquire high levels of social capital. However, Figures 2.9e-f reveal that as the number of links crosses a certain threshold, introverts rapidly vanish from the set of high social capital agents. Figure 2.10a reveals that indeed there is a sharp increase in the absolute difference in average social capital between extroverts and introverts near a log link count of 4.55 ($K \approx 35{,}000$), which is precisely the point at which social capital inequality enters the first phase transition (see Figure 2.8). While not shown here, this sharp increase in social capital inequality between extroverts and introverts persists even after controlling for the overall higher levels of aggregate social capital in denser networks. This suggests that the transition from Phase I to Phase II in Figure 2.8 entails not only a sharp increase in aggregate inequality, but high levels of inequality between extroverts and introverts as well.
At a link count of roughly 100,000, extroverts command levels of social capital more than twice that of the introverts, which is substantial considering that for lower values of K they were virtually identical. It turns out that disparities increase in network density because of the limited visibility reach (v) of introverts. In the beginning, both extroverts and introverts mainly connect with those close by, since pairs that are in the vicinity have a higher chance to form connections. As the network crosses a certain density threshold, however, both types exhaust their local possibilities, yet extroverts continue to connect with other agents farther away while introverts are no longer able to establish new ties due to their spatial constraints.

[Figure 2.10: social capital inequality (extrovert - introvert) on the vertical axis (0 to 250) against log(links) on the horizontal axis (0.0 to 5.5).]

Figure 2.10. Differences in social capital between introverts and extroverts as a function of link count, measured as raw differences (extrovert s.c. - introvert s.c.). Values are the mean over five simulation runs each with a different random seed for every parameter configuration.

Overall, the tradeoff between spatial and individual inequality suggests that policies geared towards reducing one could adversely affect the other. In addition, the model offers one plausible explanation for the existence of agglomeration economies within a social interactions setting. The essential sources of agglomeration are off-market knowledge exchanges, input and output linkages, as well as labor market pooling (Marshall, 1920), which are all driven to some extent by social interactions (Ioannides, 2013). The model suggests that spatial agglomeration benefits individuals in the core by allowing them to maintain more social connections than those in the periphery, which in turn may result in better economic outcomes.

2.6. Conclusions

I have examined the evolution of degree distributions under different parameter configurations to establish the conditions under which the power law is sustained and the cases in which it breaks down. While the presentation has focused on a few select parameter settings, sensitivity analyses reveal that the results are robust across different configurations for world size L and introvert visibility v (see Appendix C). Generally, I find that networks in which ties are scarce and the rate of tie dissolution is relatively high exhibit power law degree distributions. I also find that networks with link dissolution are fundamentally different from those in which ties are relatively permanent, underscoring the importance of distinguishing between the two types of networks. As a classic example, Watts (1999) shows that many social networks with tie decay are characterized by "small-world" properties, and these result in a very different degree distribution from that of networks where ties are permanent, which Barabási and Albert (1999) find to exhibit the power law. In fact, the results suggest that the power law distributions found for networks grown under PA are just a special case of a broader class of networks that take into consideration churning dynamics.

Of particular interest is the concentration of social resources, which I find to be closely related to network density as it evolves in three distinct phases. Sparse networks exhibit a decrease in social capital inequality as network density increases, moderately dense networks exhibit increases in inequality with higher density, and very dense networks exhibit a decrease in inequality as the network reaches full saturation. In a complete (i.e. fully connected) network no inequality would exist and the level of aggregate social capital would be maximal. However, such networks are extremely rare in real life.
The model suggests that due consideration of the relative strength of tie formation and dissolution is warranted when aiming to mitigate disparities in control over network resources. For example, when considering the spread of tacit information, encouraging more networking activity in an ethnic enclave where ties are relatively permanent and dense would have very different results than encouraging such activity among trade association members where ties are weaker and more transient.

This relationship between network density and inequality among agents is further complicated when considering the spatial aspects of inequality. I find that spatial inequality is greater (in the form of higher social capital agents being distributed near the core) when inequality among agents overall is low. The results suggest that we should acknowledge the potential tradeoff between spatial inequality and individual inequality with respect to social resources.

In this paper, I have conceptualized social capital as a network resource rather than an individual resource that an agent is endowed with. This view is motivated by distributional considerations. The poor in particular, due to resource deprivation, are more likely to utilize membership in community networks that exchange help in crises than to resort to individual coping mechanisms (Gans, 1962). The results highlight, counterintuitively, that simply encouraging more networking activity among individuals may not result in the disadvantaged benefiting from the increased intensity of social interactions.

Although this study does not pursue it, the current specification enables a new kind of analysis for furthering the understanding of network dynamics. The model establishes the impact of connectivity on social capital through equation (6). The accumulation of social capital, however, likely results in higher betweenness centrality (Freeman, 1977), which directly triggers additional rounds of tie decay and formation.
An examination of closed-loop feedback effects of this type is beyond the scope of the present study but should be attempted in the future as an extension.

Finally, while social capital has mainly been studied within the context of social networks, I note that it can be generalized to other domains, such as economic networks, in which resources obtained from network ties are of central importance. The model's simple definition of social capital as the sum of the human capital of link neighbors allows the framework to be readily extended to other types of networks, possibly by substituting human capital with a different resource of importance. I hope that this demonstration highlighting the complexity of network behavior in space and in the presence of churning dynamics will stimulate further investigation of the implications for differential spatial patterns and social inequality.

APPENDIX A. Relationship between $\theta_f$, $\theta_d$, and (a) aggregate social capital $\sum_i SC_i$ and (b) link count (K). Δ is the average change (in multiples) in (a) social capital and (b) link count compared to the next highest $\theta_d$ value. N = 1,000, L = 100, v = 15. Values are averaged over 5 simulations.
(a) Aggregate social capital ($\sum_i SC_i$). Rows: $\theta_f$. Columns: $\theta_d$.

               0       0.1       0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9         1
0        1,215.4       5.0       2.1       1.8       0.6        -a         -         -         -         -         -
0.1     49,813.1     578.3     251.0     150.7      93.8      67.6      42.7      29.8      20.2       8.2         -
0.2     80,618.1   1,106.5     508.4     306.1     208.1     137.0      95.7      71.5      49.3      22.4       1.7
0.3      109,908   1,649.8     734.5     474.6     326.3     223.3     163.9     125.6      90.3      42.6       9.9
0.4      136,943   2,199.3     990.4     618.1     435.0     288.0     227.0     173.6     115.8      71.8      26.8
0.5      168,187   2,703.4   1,267.0     780.0     542.6     390.3     295.5     230.9     169.7     105.9      55.6
0.6      189,633   3,280.7   1,513.8     915.7     680.7     492.3     394.0     288.7     219.6     166.8      99.6
0.7      219,314   3,821.6   1,783.3   1,137.7     795.3     603.4     480.3     380.8     274.0     214.3     135.2
0.8      242,426   4,307.1   2,044.5   1,279.0     932.3     706.1     550.8     445.6     349.5     265.1     185.7
0.9      267,520   4,883.1   2,309.4   1,485.6   1,094.1     808.6     648.7     541.8     420.6     331.5     262.7
1        285,562   5,348.8   2,547.3   1,631.3   1,227.8     921.6     763.8     601.1     517.2     434.4     334.1
ΔSC         57.6       1.1       0.6       0.4       0.4       0.3       0.3       0.3       0.3       0.5         -

a For lower values of $\theta_f$, sufficiently high values of $\theta_d$ prohibit the network from maintaining any links, resulting in no social capital.

(b) Link count (K). Rows: $\theta_f$. Columns: $\theta_d$.

               0       0.1       0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9         1
0            999         4         2         2         1         -         -         -         -         -         -
0.1       45,244       469       207       126        78        57        38        25        17         7         -
0.2       73,048       930       412       249       172       112        81        61        43        21         1
0.3       99,041     1,384       615       389       263       183       133       109        74        37         8
0.4      122,936     1,853       827       512       356       239       192       146        98        63        24
0.5      150,770     2,305     1,065       652       446       326       243       191       143        90        47
0.6      173,607     2,815     1,296       780       565       409       329       240       185       140        83
0.7      202,849     3,252     1,514       961       662       505       394       319       232       180       120
0.8      226,062     3,718     1,748     1,092       790       586       467       374       292       229       160
0.9      250,995     4,251     1,978     1,264       927       681       549       453       355       282       224
1        270,693     4,683     2,232     1,398     1,036       776       640       517       434       366       283
ΔK          62.0       1.2       0.6       0.4       0.4       0.3       0.3       0.3       0.3       0.5         -

APPENDIX B. Relationship between $\theta_f$, $\theta_d$, and the Gini coefficient. ΔGini is the average change in the Gini coefficient compared to the next highest $\theta_d$ value. N = 1,000, L = 100, v = 15. Values are averaged over 5 simulations.

Rows: $\theta_f$. Columns: $\theta_d$.

              0 0.00125  0.0025   0.005    0.01   0.015    0.02     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9       1
0          0.41    0.69    0.82    0.90    0.95    0.96    0.97    0.99    0.99    0.99    0.79       -       -       -       -       -       -
0.1        0.32    0.22    0.24    0.25    0.28    0.31    0.35    0.68    0.82    0.89    0.92    0.94    0.96    0.97    0.98    0.99       -
0.2        0.36    0.21    0.17    0.19    0.22    0.24    0.27    0.52    0.70    0.79    0.84    0.89    0.92    0.94    0.96    0.98    0.39
0.3        0.37    0.26    0.17    0.15    0.20    0.21    0.23    0.44    0.61    0.71    0.78    0.84    0.88    0.90    0.93    0.96    0.99
0.4        0.37    0.28    0.20    0.13    0.17    0.20    0.21    0.39    0.55    0.65    0.72    0.79    0.83    0.87    0.90    0.94    0.97
0.5        0.36    0.31    0.23    0.14    0.15    0.18    0.19    0.35    0.49    0.59    0.66    0.74    0.79    0.83    0.87    0.91    0.95
0.6        0.34    0.32    0.25    0.16    0.13    0.17    0.18    0.31    0.44    0.55    0.63    0.70    0.74    0.80    0.84    0.88    0.92
0.7        0.32    0.32    0.27    0.18    0.12    0.15    0.17    0.29    0.41    0.50    0.58    0.65    0.71    0.75    0.80    0.84    0.90
0.8        0.30    0.33    0.28    0.20    0.12    0.14    0.16    0.28    0.37    0.47    0.54    0.62    0.67    0.71    0.77    0.81    0.86
0.9        0.28    0.33    0.30    0.22    0.12    0.13    0.15    0.25    0.36    0.43    0.50    0.57    0.63    0.67    0.73    0.78    0.82
1          0.26    0.32    0.30    0.23    0.13    0.12    0.14    0.25    0.34    0.41    0.47    0.53    0.59    0.64    0.69    0.72    0.78
ΔGini      0.02    0.11    0.17    0.05   -0.07   -0.06   -0.36   -0.21   -0.13   -0.06   -0.08   -0.05   -0.04   -0.04   -0.04    0.03       -

APPENDIX C. Robustness checks for world size L and neighborhood visibility of introverts v, at select parameter values in which $\theta_f = \theta_d$.

APPENDIX D. Details of the model according to the ODD protocol.

Design concepts

Basic principles: see section 2.3 above.

Emergence: I look for the emergence of a power law distribution under different specifications of the tie formation and dissolution parameters, as well as the circumstances under which the power law breaks down. I also look for the emergence of high and low inequality in social capital among agents, as well as spatial patterns of inequality based on different parameter settings.

Adaptation: There is no explicit adaptation of the agents in terms of changes in static state variables.
Nonetheless the agents adapt their behavior with respect to their levels of social capital, with higher social capital agents deleting more ties due to the limited social effort they can exert.

Objective: The agents' objective is to maximize their social capital in each time frame, by connecting to another agent that preferentially has a higher level of human capital (which would imply that these agents also have higher connectivity). However, the agents are constrained by distance, being able to connect more easily to others who are closer to themselves. Introverts are also constrained by their neighborhood visibility, only being able to connect to others within a fixed threshold distance.

Sensing: Agents use probabilities in deciding which new connection to form and which tie to dissolve. The probabilities are calculated based on the information set that reveals the human capital and connectivity levels of other agents, as well as their relative distance to these agents. Introverts are assumed to be able to sense this information only for agents within their visibility reach, while extroverts are assumed to be fully knowledgeable of all information regarding all agents in the network.

Interaction: Agents interact directly with each other through the formation and the dissolution of links, which in the real world would represent the re-evaluation of social connections. Introverts interact only with other agents within their visibility reach, while extroverts interact with all other agents. All agents interact with all possible others in each time frame.

Stochasticity: Stochastic processes are used to assign the spatial position of each agent at birth. In addition, the human capital of each agent is randomly drawn from a uniform distribution within the interval [0, 1].
Stochasticity also appears in all tie formation and dissolution processes, since these processes are governed by probabilities and may therefore yield different results under different random number seeds.

Collectives: The collective – or aggregate – level of social capital, the degree distribution, average levels of social capital for extroverts versus introverts, as well as spatial patterns of social capital inequality are represented in the model, and emerge from the behavioral characteristics of agents.

Observation: I track the evolution of the degree distribution, individual and aggregate levels of social capital, and inequality in social capital among agents as well as across space. I use numeric data representations as well as graphs to present these outputs of interest, and also utilize maps of agents to present spatial patterns.

Initialization

At the beginning of every simulation (t=0), two agents linked to each other are each placed on a patch chosen uniformly at random. These two agents are assigned to the introvert or extrovert breed randomly with equal probability, and their human capital is drawn from a uniform distribution with [0, 1] support. Introverts are assigned a specific neighborhood visibility reach value. In addition, the tie formation and dissolution parameters are set globally at specific values, and these values are assumed to be constant across agents.

Input data

The environment is assumed to be generic, and thus the model has no input data.

Submodels

See the process overview and scheduling section for details on each procedure.

REFERENCES

Adamic, L. A., & Huberman, B. A. (2000). Power-law distribution of the world wide web. Science, 287(5461), 2115-2115.
Aral, S., Muchnik, L., & Sundararajan, A. (2009). Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544-21549.
Barabási, A.-L. (2000). Linked: how everything is connected to everything else and what it means. Plume Editors.
Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509-512.
Batty, M. (2013). The new science of cities. MIT Press.
Bianconi, G., & Barabási, A.-L. (2001). Competition and multiscaling in evolving networks. Europhysics Letters, 54(4), 436.
Borgatti, S. P., & Cross, R. (2003). A relational view of information seeking and learning in social networks. Management Science, 49(4), 432-445.
Browning, C. R., Dietz, R. D., & Feinberg, S. L. (2004). The paradox of social organization: networks, collective efficacy, and violent crime in urban neighborhoods. Social Forces, 83(2), 503-534.
Burt, R. S. (1992). Structural holes: The social structure of competition. Harvard University Press.
Burt, R. S. (2005). Brokerage and closure: an introduction to social capital. Oxford: Oxford University Press.
Campbell, K. E. (1990). Networks past: a 1939 Bloomington neighborhood. Social Forces, 69(1), 139-155.
Cassi, L., & Plunket, A. (2014). Proximity, network formation and inventive performance: in search of the proximity paradox. The Annals of Regional Science, 53(2), 395-422.
Castellano, C., Marsili, M., & Vespignani, A. (2000). Nonequilibrium phase transition in a model for social influence. Physical Review Letters, 85(16), 3536.
Coleman, J. S. (1988). Social capital in the creation of human capital. American Journal of Sociology, 94, S95-S120.
Cornwell, B., & Dokshin, F. A. (2014). The power of integration: affiliation and cohesion in a diverse elite network. Social Forces. doi: 10.1093/sf/sou068
Cuperman, R., & Ickes, W. (2009). Big five predictors of behavior and perceptions in initial dyadic interactions: personality similarity helps extraverts and introverts, but hurts “disagreeables”. Journal of Personality and Social Psychology, 97(4), 667.
De Montis, A., Chessa, A., Campagna, M., Caschili, S., & Deplano, G. (2009). Complex networks analysis of commuting. In Complexity and spatial networks (pp. 239-255). Springer.
de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510-515. doi: 10.1126/science.149.3683.510
Erdos, P., & Rényi, A. (1960). On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5, 17-61.
Freeman, L. (1977). A set of measures of centrality based on betweenness. Sociometry, 40(1), 35-41.
Fritsch, M., & Kauffeld-Monz, M. (2010). The impact of network structure on knowledge transfer: an application of social network analysis in the context of regional innovation networks. The Annals of Regional Science, 44(1), 21-38.
Gans, H. J. (1962). The Urban Villagers: Group and Class in the Life of Italian-Americans. New York: Free Press of Glencoe.
Granovetter, M. (1973). The strength of weak ties. American Journal of Sociology, 78(6), 1360-1380.
Granovetter, M. (2005). The impact of social structure on economic outcomes. The Journal of Economic Perspectives, 19(1), 33-50.
Greve, A., & Salaff, J. W. (2003). Social networks and entrepreneurship. Entrepreneurship Theory and Practice, 28(1), 1-22.
Hampton, K., Sessions, L., Her, E., & Rainie, L. (2009). Social isolation and new technology. Pew Internet & American Life Project: Washington.
Holme, P., & Newman, M. E. (2006). Nonequilibrium phase transition in the coevolution of networks and opinions. Physical Review E, 74(5), 056108.
Ioannides, Y. M. (2013). From neighborhoods to nations: the economics of social interactions. Princeton University Press.
Ioannides, Y. M., & Topa, G. (2010). Neighborhood effects: accomplishments and looking beyond them. Journal of Regional Science, 50(1), 343-362.
Jackson, M. O. (2008). Social and economic networks (Vol. 3). Princeton: Princeton University Press.
Kalmijn, M. (1998). Intermarriage and homogamy: causes, patterns, trends. Annual Review of Sociology, 395-421.
Karnstedt, M., Hennessy, T., Chan, J., Basuchowdhuri, P., Hayes, C., & Strufe, T. (2010). Churn in social networks. In Handbook of social network technologies and applications (pp. 185-220). Springer.
Koka, B. R., Madhavan, R., & Prescott, J. E. (2006). The evolution of interfirm networks: Environmental effects on patterns of network change. Academy of Management Review, 31(3), 721-737.
Kosmidis, K., Havlin, S., & Bunde, A. (2008). Structural properties of spatially embedded networks. Europhysics Letters, 82(4), 48005.
Lin, N. (2001). Social capital: A theory of social structure and action. Cambridge, UK: Cambridge Univ. Press.
Lin, N., & Dumin, M. (1986). Access to occupations through social ties. Social Networks, 8(4), 365-385.
Madden, M. (2012). Privacy management on social media sites. Pew Internet Report, 1-20.
Mansury, Y., & Gulyás, L. (2007). The emergence of Zipf's Law in a system of cities: An agent-based simulation approach. Journal of Economic Dynamics and Control, 31(7), 2438-2460.
Mansury, Y., & Shin, J. (2015). Size, connectivity, and tipping in spatial networks: Theory and empirics. Computers, Environment and Urban Systems, 54, 428-437.
Marshall, A. (1920). Principles of Economics. London: MacMillan.
Mayhew, B. H., McPherson, M., Rotolo, T., & Smith-Lovin, L. (1995). Sex and ethnic heterogeneity in face-to-face groups in public places: an ecological perspective on social interaction. Social Forces, 74, 15-52.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: homophily in social networks. Annual Review of Sociology, 415-444.
Mehra, A., Kilduff, M., & Brass, D. J. (1998). At the margins: A distinctiveness approach to the social identity and social networks of underrepresented groups. Academy of Management Journal, 41(4), 441-452.
Merton, R. K. (1968). The Matthew effect in science. Science, 159(3810), 56-63.
Miller, J. H., & Page, S. E. (2009). Complex adaptive systems: An introduction to computational models of social life. Princeton University Press.
Moretti, E. (2004). Estimating the Social Return to Higher Education: Evidence from Longitudinal and Repeated Cross-Sectional Data. Journal of Econometrics, 121(1-2), 175-212.
Nijkamp, P., Rose, A., & Kourtit, K. (2014). Regional science matters: studies dedicated to Walter Isard. Springer.
Portes, A. (1998). Social capital: Its origins and applications in modern sociology. Annual Review of Sociology, 24, 25.
Putnam, R. D. (2001). Bowling alone: The collapse and revival of American community. Simon and Schuster.
Railsback, S. F., & Grimm, V. (2011). Agent-based and individual-based modeling: a practical introduction. Princeton University Press.
Sampson, R. J. (2004). Networks and neighbourhoods: The implications of connectivity for thinking about crime in the modern city. Demos Collection, 155-166.
Sasovova, Z., Mehra, A., Borgatti, S. P., & Schippers, M. C. (2010). Network churn: the effects of self-monitoring personality on brokerage dynamics. Administrative Science Quarterly, 55(4), 639-670.
Smith, A., & Duggan, M. (2013). Online dating & relationships. Pew Internet & American Life Project.
Solé, R. V., Manrubia, S. C., Luque, B., Delgado, J., & Bascompte, J. (1996). Phase transitions and complex systems: simple, nonlinear models capture complex systems at the edge of chaos. Complexity, 1(4), 13-26.
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46, 234-240.
Torrens, P. M. (2007). A geographic automata model of residential mobility. Environment and Planning B: Planning and Design, 34(2), 200-222.
Torrens, P. M. (2010). Agent-based models and the spatial sciences. Geography Compass, 4(5), 428-448.
Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211(4481), 453-458.
Vega-Redondo, F. (2007). Complex social networks. Cambridge University Press.
Verbrugge, L. M. (1983). A research note on adult friendship contact: a dyadic perspective. Social Forces, 62, 78-83.
Watts, D. J. (1999). Small worlds: the dynamics of networks between order and randomness. Princeton University Press.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684), 440-442.
Wellman, B. (1996). Are personal communities local? A Dumptarian reconsideration. Social Networks, 18(4), 347-354.
Wilensky, U. (1999). NetLogo. Center for Connected Learning and Computer-Based Modeling, Northwestern University. Evanston, IL.
Wilson, K. L., & Portes, A. (1980). Immigrant enclaves: an analysis of the labor market experiences of Cubans in Miami. American Journal of Sociology, 295-319.
Xie, Y., Batty, M., & Zhao, K. (2007). Simulating emergent urban form using agent-based modeling: Desakota in the Suzhou-Wuxian region in China. Annals of the Association of American Geographers, 97(3), 477-495.
Zipf, G. K. (1949). Human behavior and the principle of least effort. New York: Hafner.

CHAPTER 3

AGGLOMERATION, REGIONAL SOCIAL CAPITAL, AND ENTREPRENEURSHIP IN CITIES

3.1. Introduction

Entrepreneurship research, while still relatively new, has provided multiple models regarding the link between new firm formation and economic growth and development. For example, entrepreneurship has been touted as generating channels of creative destruction (Akcigit and Kerr 2010), where it allows the means of production to be used in newer and more efficient combinations (Schumpeter 1934). Others have suggested that entrepreneurship drives innovation by transforming general knowledge into economic knowledge that can be exploited for personal gain (Audretsch and Keilbach 2004). While many theories exist, most agree that entrepreneurship is the result of individuals or groups perceiving and acting upon economic opportunities that manifest in their surrounding environment.
However, research regarding how exactly entrepreneurs perceive and act upon such opportunities is still in its infancy. In this paper, I argue that social interactions, and more broadly social capital within the community or region, aid entrepreneurs in the early stages of forming new firms.

The view that economic outcomes are driven by social forces is certainly not new. Marshall (1920) emphasized how “the mysteries of the trade become no mystery, but are, as it were, in the air” when elaborating on his theory of intellectual spillovers within agglomerations. Implicit in this statement is that intellectual spillovers are possible because individuals collocate and gain information through social linkages, which allow the knowledge “in the air” to be shared with one another. Saxenian (1996) describes how new firms in Silicon Valley – due to the horizontal integration of small firms in the region – benefited from such social linkages, as opposed to Route 128, which was dominated by large, inward-looking incumbent firms. The benefits of social capital are not confined to knowledge exchange. For example, entrepreneurs are aided in their search for financial capital and qualified recruits through the social ties they maintain (Stuart and Sorenson 2003), and it has even been argued that social capital aids entrepreneurs in building self-confidence (Sorenson and Audia 2000). More recently, the literature on agglomeration has embraced the idea that the traditional factors that have been thought to cause agglomeration of economic activity also represent – at least in part – social interactions (Glaeser 2008; Ioannides 2013).
Among others, proximity to customers and suppliers may reduce the costs of obtaining inputs or transporting goods to downstream consumers (Ellison, Glaeser, and Kerr 2010; Fujita, Krugman, and Venables 1999), but it also may embody stronger social ties between similar firms and customers that increase trust and information exchange (Dahl and Sorenson 2012). Similarly, labor market pooling shields workers from firm-specific shocks (Krugman 1991) and promotes better worker-firm matches (Helsley and Strange 1990), but it also represents social homophily (McPherson, Smith-Lovin, and Cook 2001). Critically, such representations of social interactions are inherently ego-centric, in that they view interactions as being shaped solely by one’s own network of relations. This leaves out the role of regions and geography in directly shaping and influencing the interactions and social capital that are available to their members.

A considerable body of literature within the economic geography and economics fields has thus developed which considers social interactions and social capital within the regional domain. Social aspects of the region have been viewed as a crucial element of regional competitiveness (Kitson, Martin, and Tyler 2004; Porter 2003), where the social characteristics of a region are not simple aggregations of firms or individuals. Porter (1998) suggests that a key component of cluster formation and success is the degree of social embeddedness, the existence of facilitative social networks, social capital, and institutional structures. Similarly, Storper (1995, 2013) stresses the importance of “untraded interdependencies” such as networks of trust and cooperation as well as local norms and conventions when considering the success of regions. Thus, the natural question to ask is whether regional social capital plays a role in promoting entrepreneurship, over and above the effect of social interactions at the micro-level.
This paper attempts to unify the treatment of regional social capital and agglomeration economies as being part of the broader “entrepreneurial ecosystem” of a region, where the ecosystem takes its form in various types of networks and their linkages. Inasmuch as the main Marshallian forces – customer-supplier linkages, labor market pooling, and knowledge spillovers – are also manifestations of interactions of a certain form, I argue that they, along with social capital, can naturally be analyzed using the characteristics of suitably defined networks of industries, social organizations, and knowledge. Utilizing network analysis and a detailed dataset of nonprofit organizations within the US, I focus on aspects of social capital that are embodied within the characteristics of networks of nonprofit organization classifications within a region. The focus on nonprofit organizations follows that of previous studies of community social capital (Feld 1981; Putnam 2001), which hold that many opportunities for social interaction within a community emerge within the context of voluntary associational activity. In the same vein, customer and supplier links are viewed as linkages within an inter-industry network defined by input-output relationships, and labor market pooling is defined using network relationships based on industrial occupational composition. Finally, knowledge spillovers are defined using networks that consider the distribution of patents as well as institutions that promote knowledge creation within a given region.

The main goal of this paper is to compare the relative effects of different Marshallian economies and community social capital on entrepreneurship, and to assess whether these forces can explain the large variations in regional entrepreneurship rates that have been previously documented in the literature (Acs and Armington 2004, 2006).
In addition, I move beyond the traditional emphasis on specific industries (such as manufacturing) and attempt to identify whether the effects of these forces differ across a diverse set of industries. In particular, distinctions are made based on whether industries are traded, local, high-tech, or low-tech (Delgado, Porter, and Stern 2016), as well as between manufacturing and non-manufacturing sectors.

The paper is organized as follows. I begin by discussing the links between agglomeration theory and social interactions, and how network analysis can be utilized to measure these forces fruitfully within a unified framework. Section 3 presents the calculation of the various indices used in the empirical estimation, along with key descriptive statistics. Section 4 presents the empirical framework, while Section 5 discusses the main findings. I conclude with a final discussion of key insights and relevant policy implications.

3.2. Related literature

3.2.1 Entrepreneurship, agglomeration, and social capital

The geographic concentration of economic activity has long been of central interest to urban economists, economic geographers, and regional scientists (Glaeser et al. 1992; Krugman 1991; Porter 2003; Storper and Christopherson 1987; Storper and Venables 2004). Indeed, many studies have shown how aggregate activity is concentrated in large urban areas, with estimates suggesting that in the US, 2% of the land area in the lower 48 states is home to roughly 75% of the population (Rosenthal and Strange 2004). Generally, agglomeration economies – or aggregate urban external effects – are thought to arise as the sum of a large number of individual externalities, between both establishments and individuals.
Ultimately, external economies within cities arise due to productivity gains accrued through proximity, which in turn reduces transport costs, allows better access to specialized inputs and labor, and allows for “the mysteries of the trade” to be shared with one another (Marshall 1920). Even with the remarkable decrease in transportation costs and the development of knowledge diffusion mechanisms that readily traverse geographic boundaries, this pattern of agglomeration has, if anything, become stronger over the years (Storper and Venables 2004). It is a continuing trend of urban economies that employment and firms are geographically concentrated, with more successful regions experiencing a resurgence of agglomerative activity (Scott et al. 2001).

A key argument proposed in this paper is that social interactions and social capital have a geographic dimension, over and above that of individual networks of relations, whether they be between firms or individuals. Moreover, I argue that such social forces are an underlying agglomerative mechanism, much like the Marshallian microfoundations. It is a well-documented fact that individuals primarily have connections with others that reside or work in the same region, with the odds of maintaining relationships sharply declining with distance (McPherson, Smith-Lovin, and Cook 2001; Zipf 1949). Chen et al. (2010) document how the high concentration of venture capital in select regions may confer advantages on entrepreneurs in locations where venture capital is abundant: even for similar projects, an entrepreneur located within the region would enjoy an advantage compared to those farther away. As the venture capital market is highly influenced by social connections (Sorenson and Stuart 2001), this is a case where regional social capital benefits the entrepreneur, regardless of individual networks of relations.
A similar argument can be made for immigrant entrepreneurs, who benefit extensively from regional social capital that is driven by the high concentration of homogeneous ethnic groups (Wilson and Portes 1980).

The economic geography literature in particular has paid much attention to the geography of social interactions across a broad variety of applications. Currid and Williams (2010) study the spatial and geographic dimensions of the social milieu within the context of cultural industries, and find that social geography exhibits nonrandom spatial clustering and that such clusters tend to reinforce themselves. Bürker and Minerva (2014) consider the variability of civic capital both across and within regions, and find that variable endowments of civicness affect economic outcomes, in particular the size distribution of plants. Many others have studied the relationship between geography and social interactions within different contexts, from the knowledge spillover activities of mobile inventors (Agrawal, Cockburn, and McHale 2006) to knowledge flows across European regions (Caragliu and Nijkamp 2016), and even for manufacturing industry networks in Tanzania (Murphy 2003). The overwhelming consensus is that social interactions and social capital yield economic benefits, and most critically that this link is fundamental to theories of agglomeration (Kemeny et al. 2016).

Like social capital, entrepreneurship is also geographically concentrated. Fairlie (2014) finds that entrepreneurship rates (calculated as the percentage of individuals age twenty to sixty-four who report owning a new business) differ significantly across states, with California, Montana, and South Dakota exhibiting higher and the Northeast and Midwest states showing lower levels of firm formation.
The discrepancy is apparent at the Metropolitan Statistical Area (MSA) level as well, with Los Angeles-Long Beach-Santa Ana reporting entrepreneurship rates greater than 0.5% of the labor force while Detroit-Warren-Livonia experiences rates lower than 0.2%. Since entrepreneurship is also a form of economic activity that benefits from productivity gains, it is intuitive to theorize that it too will benefit from agglomeration externalities of the type suggested by Marshall. Broadly, it has been theorized that the difference in rates of entrepreneurship can be explained by (1) differential returns to entrepreneurship, (2) differential availability of inputs and human capital, (3) differential supplies of ideas, and (4) differences in the local culture (Glaeser, Rosenthal, and Strange 2010). It is remarkable how closely the theories that hypothesize causes of differential rates of entrepreneurship parallel well-known theories of agglomeration.

Thus, as mentioned above, there is reason to believe that agglomeration externalities that result in productivity gains for the entrepreneur would also be influenced by social interactions and social capital at the regional level. As virtually all economic behavior is embedded in networks of social relations (Granovetter 1995; Ioannides 2013), it follows that social capital (of the positive sort) should positively affect entrepreneurial outcomes. An extensive literature has documented how the social capital of entrepreneurs impacts the success of their ventures (for example, Sorenson 2005; Stuart and Sorenson 2005), with personal and professional relationships with critical actors that act as brokers of valuable entrepreneurial resources being instrumental in aiding business creation (Hoang and Antoncic 2003).

Within the literature, there exists a conceptual distinction between two types of community social capital, namely bonding and bridging (de Souza Briggs 1998; Putnam, Leonardi, and Nanetti 1993; Putnam 2001). While here I focus on the aggregate level, this distinction closely mirrors that of social capital theory at the individual level, which differentiates between strong ties that are characteristic of homophilous interactions (Coleman 1988) and weak ties that traverse greater social distance and connect otherwise disparate groups (Burt 2005; Granovetter 1973). At the regional level, bonding social capital is characteristic of strong, repeated interactions among individuals or groups who are like one another, and is thought to promote trust, reciprocity, and enforcement of social norms. Storper and Venables (2004), while not explicitly using the term, discuss how such repeated interactions (termed face-to-face contact, or buzz) are beneficial not only by promoting trust, but also by allowing for rapid feedback in communication and reducing the tendency for free riding, among other factors. In contrast, bridging social capital has been commonly thought of as aiding individuals in “getting ahead,” with widely dispersed ties to people and institutions with different backgrounds helping in gaining non-redundant information and unique insights (Burt 2004; de Souza Briggs 1998; Putnam et al. 2004). Most famously, Jacobs (1969) promoted the closely related view that new ideas and innovation come from diversity, arguing that innovations are the product of cross-industry fertilization made possible by contact with individuals with different perspectives.

Under this theoretical lens, in the context of entrepreneurship, bonding social capital at the regional level may help entrepreneurs by, for example, promoting easier access to resources (especially club goods), or by reducing transaction costs otherwise incurred when dealing with unfamiliar contacts. By contrast, bridging social capital would benefit entrepreneurs who seek information on promising business ventures or innovative knowledge.
This paper contributes to the growing literature on entrepreneurship and social interactions by distinguishing between these two forces, and accounting explicitly for their effects on entrepreneurship across a variety of different industries.

3.2.2 A network theoretic approach to the entrepreneurial ecosystem

As theories of agglomeration and social capital both embed the concept of interactions (whether among firms or individuals), I argue that they can be characterized under a unifying lens that considers these interactions as linkages that are part of broader networks of industries, organizations, and individuals. A network theoretic approach to defining agglomeration externalities as well as social capital is appealing in that networks – by definition – are the joint set of nodes and their linkages, however these nodes are defined. Without explicitly considering network topologies, agglomeration scholars have already introduced network theoretic concepts in relating each of the Marshallian factors with various economic outcomes. For example, Ellison et al. (2010) derive a metric of coagglomeration that measures the strength of agglomerative forces between industry pairs. The index can be thought of as representing the strength of linkages between industries, defined over a geographic space. Furthermore, their metrics of proximity to suppliers and consumers, labor market pooling, and technology spillovers bear a noteworthy resemblance to proximity metrics that have been calculated in network settings (Hidalgo et al. 2007; Hidalgo and Hausmann 2009). Similarly, Glaeser and Kerr (2009) also define Marshallian economies using indices that consider pairwise proximities of industries based on input-output, labor market, and technological proximities. The key distinction between these studies and those that explicitly consider network topologies is that the former only consider either dyadic relationships (i.e. between any two given industries) or direct linkages (i.e. between a particular industry and its first-degree neighbors), while the latter consider all linkages and the aggregate properties of networks.

Recent work on complex networks has focused on analytical methods and concepts which consider not only the linkages of individual nodes but also the aggregate characteristics of the network as a whole (Hausmann and Klinger 2006; Hidalgo et al. 2007; Hidalgo and Hausmann 2009; Jackson 2008). In particular, Hidalgo et al. (2007) consider the concept of product space that is characterized by the revealed linkages between products in global trade, where the linkages are defined based on joint trade patterns between products across countries. In related work, Hidalgo and Hausmann (2009) develop a method to calculate the competitiveness of countries based on these revealed trade patterns that takes into account both direct and indirect linkages (i.e. links of neighbors, links of neighbors of neighbors, etc.) between products and countries. Within the economic geography literature, firms, industries, and products have been viewed within a network paradigm to map global cluster networks of high-tech industries, and common network techniques such as community structure detection have been used to explain increased geographic concentration in particular industries (Turkina, Van Assche, and Kali 2016).

A key issue when considering networks is how to define nodes and their linkages. I argue that each of the Marshallian economies as well as regional social capital should be represented by distinct networks of nodes and links based on their theoretical underpinnings, but under a unifying framework that considers them as characteristics of networks that represent the broader entrepreneurial ecosystem of a region.
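The distinction between direct and indirect linkages can be made concrete with a small numerical sketch. The Python snippet below is illustrative only and is not the index used in this chapter: the three industries, their link weights, the `decay` parameter, and the function name `linkage_with_indirect` are all hypothetical, and the decayed sum over walks is a Katz-style device swapped in purely for exposition. It shows how two industries with no direct link can still be connected through a common neighbor.

```python
# Illustrative sketch only: all weights and names below are hypothetical.
INDUSTRIES = ["A", "B", "C"]

# Symmetric matrix of direct linkage strengths (assumed values).
W = [
    [0.0, 0.8, 0.0],  # A is linked only to B
    [0.8, 0.0, 0.5],  # B is linked to both A and C
    [0.0, 0.5, 0.0],  # C is linked only to B
]


def matmul(x, y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def linkage_with_indirect(w, decay=0.5, max_steps=2):
    """Sum linkage strength over walks of length 1..max_steps.

    Walks of length s are discounted by decay**(s - 1), so direct links
    count in full and longer, indirect paths count progressively less.
    """
    n = len(w)
    total = [[0.0] * n for _ in range(n)]
    walk = [row[:] for row in w]  # strengths of walks of the current length
    discount = 1.0
    for _ in range(max_steps):
        for i in range(n):
            for j in range(n):
                total[i][j] += discount * walk[i][j]
        walk = matmul(walk, w)
        discount *= decay
    return total


total = linkage_with_indirect(W)
a, c = INDUSTRIES.index("A"), INDUSTRIES.index("C")
print("direct A-C linkage:", W[a][c])
print("direct plus indirect A-C linkage:", round(total[a][c], 3))
```

Under these assumed weights, industries A and C share no direct link, yet the two-step path through B yields a positive linkage of 0.5 × (0.8 × 0.5) = 0.2. The same logic carries over to each type of network considered here, which differ only in how nodes and links are defined.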
For example, proximity to suppliers and consumers can readily be characterized as a network of industries (the β€œindustry space”), where the industries are related based on how much inputs or outputs they share (the linkages). However, the degree of proximity (i.e. strength of linkages) between industries in a region may also differ based on regional patterns of specialization (i.e. the overall position of the region within the industry space). Consider the case of the health care and 78 manufacturing industry, for which overall input-output linkages would be relatively weak. If a region is highly specialized in manufacturing, even if the linkages between the two industries are weak, the health care industry will be more susceptible to shocks in the manufacturing sector than, say the health care sector of a region for which manufacturing is relatively absent. Furthermore, if the manufacturing sector is itself strongly linked to the retail trade sector, and this sector is also highly specialized, shocks in the trade sector should have strong repercussions on both the manufacturing and the health care industry even if the direct link between health care and trade is relatively weak. It is because of this interconnectedness that the aggregate network characteristics of a region as a whole should also be considered in addition to individual pairwise proximities when measuring the strength of supplier and consumer linkages. The same argument applies for the other factors as well, differing only in how the nodes and linkages are defined. The metrics introduced in the following section are an attempt to merge and capture the key aspects of both the pairwise proximities and regional characteristics within an index that can be readily measured using available data sources. 3.3. 
Data and variables

The unit of analysis for the study is the MSA-industry pair, where MSAs are defined using the 2009 MSA definitions taken from the US Office of Management and Budget (OMB)17, and industries are defined at the 4-digit level using 2007 definitions of the North American Industry Classification System (NAICS)18. I only considered MSAs within the lower 48 states, and further excluded 7 MSAs that did not have reliable demographic information from the American Community Survey (ACS).19 Due to the panel nature of the data, concordances between 2002, 2007, and 2012 NAICS classifications were made.20 For industries, I excluded agriculture, private households, and public administration, as well as some industries for which entrepreneurship data were not available.21 This resulted in a dataset that included 356 MSAs and 282 industries, for a total of 100,392 MSA-industry pairs. The dataset spans the years 2005 to 2013, where entrepreneurship data were collected for the years 2006 to 2013 while the underlying variables were for the years 2005 to 2012 (a 1 year lag). This resulted in 8 years of data, for a total of 803,136 observations.

17 I focus on the MSA level for a number of reasons. First, data for some variables were not available for lower levels of geography, and in many cases, available data were too noisy for proper estimation. This is especially pronounced for patent data, where the locations assigned to the inventors have been noted as a source of random measurement error at lower levels of geography (Agrawal et al. 2014). Furthermore, the metrics that proxy for the entrepreneurial ecosystem were deemed more accurate when considered at the MSA level, as opposed to the county or zip-code level, where regional characteristics may not be fully represented.

18 The 4-digit NAICS level was used to strike a balance between appropriate granularity and error due to constructing concordances between different classifications. In addition, occupation data for industries below the 4-digit level suffered from a high percentage of non-disclosure, which rendered noisy estimates for the calculated metrics.

19 The seven MSAs were Cape Girardeau-Jackson, MO-IL (16020), Carson City, NV (16180), Hinesville-Fort Stewart, GA (25980), Lewiston, ID-WA (30300), Manhattan, KS (31740), Mankato-North Mankato, MN (31860), and Steubenville-Weirton, OH-WV (44600).

20 The differences were minimal across these classifications at the 4-digit level. In cases where 4-digit codes for 2002 and 2012 NAICS classifications did not fully map onto the 2007 definitions, a concordance was made based on relative employment levels for industries at the 6-digit level that shared a 4-digit industry. Roughly 10 4-digit industries were affected by this mapping scheme.

21 These included postal services (NAICS 4911), rail transportation (NAICS 4821), and insurance and employee benefit funds (NAICS 5251).

3.3.1 Entrepreneurship

Entrepreneurship data were drawn from the Statistics of U.S. Businesses (SUSB), an annual dataset produced by the US Census Bureau that provides detailed geographic and industry level data on the number of establishments and employment levels, as well as information on firm births, deaths, expansions, and contractions.22 The SUSB provides data for the universe of US establishments with paid employees, using the Census Bureau’s Business Register as its underlying source.23 In this sense, it is similar to other databases such as the Longitudinal Business Database (LBD), while having the advantage of being more readily available as its use is not restricted to qualified researchers through Census data centers.

The SUSB distinguishes between single-unit start-ups (i.e. new firms) and start-ups that are part of a multi-unit enterprise. The primary focus of this paper is on single-unit start-ups, yet comparisons are made with start-up activity including those that are expansions of existing enterprises. Counts of new firms for a given MSA-industry pair are calculated at the 4-digit industry level for the years 2006 to 2013, and these counts are used as the main outcome variable. I focus on counts of new establishments instead of employment counts, as employment counts at detailed geographies and industries suffered from disclosure issues while establishment counts remained uncensored. Nonetheless, previous studies have shown that empirical results obtained from considering establishment counts are very similar to those where employment counts were considered (Rosenthal and Strange 2003), and as such it is expected that the results will be little affected by the choice of establishments over employment.24 Considering that this timeline encompasses the recent recession years, I was able to distinguish patterns of entrepreneurship before and after the recession.

22 While the publicly available data for the SUSB only provide establishment and employment change data at the state level, special tabulations are available at the county level for a reasonable cost. These tabulations were used to construct the dataset, aggregating county data to the MSA level using 2009 MSA definitions.

23 As such, the SUSB excludes non-employer businesses such as sole proprietors with no paid employees.

24 Furthermore, start-ups typically begin with a very small number of employees, which should also make the difference between considering establishments instead of employment minimal.

Table 3.1 provides a summary of the count of new firms as well as entry rates (calculated as new firm births divided by the number of incumbent establishments) for all industries as well as for select industry groups.25 It can be seen that entrepreneurship in the US is dominated by the local, low-tech, and non-manufacturing sectors, with higher overall counts of new firms as well as higher entry rates. However, research suggests that the traded, high-tech, and manufacturing industries account for a disproportionate level of employment despite their small share of total establishments, and usually pay higher wages to their employees (Delgado, Porter, and Stern 2016), and as such disentangling the determinants of entrepreneurship for these industries is of great importance. Furthermore, when examining entrepreneurship rates before and after the recession, overall these industries remained relatively resilient, which suggests that these industries are less susceptible to shocks in the business cycle and may be more reliable sources of job creation. Overall, the manufacturing sector was the most resilient in terms of entrepreneurship rates, while the locally traded industries (which are highly dependent on local demand) were the hardest hit.

25 I use the classification scheme developed by Delgado, Porter, and Stern (2016) to distinguish between traded, local, high-tech, and low-tech industries. Manufacturing refers to the industries classified under sectors 31-33 in the 2007 NAICS.

Table 3.1. Count of new firms and entry rates for single and all establishment births

| Category | Single: total births 2006-2013 (% of total) | Single: rate, all years | Single: rate, 2006-2009 | Single: rate, 2010-2013 | Single: ∆ | All: total births 2006-2013 (% of total) | All: rate, all years | All: rate, 2006-2009 | All: rate, 2010-2013 | All: ∆ |
|---|---|---|---|---|---|---|---|---|---|---|
| All industries | 4,974,638 | 8.8 | 9.3 | 8.3 | -10.0 | 6,206,174 | 11.0 | 11.6 | 10.4 | -10.7 |
| Traded | 1,162,522 (23.4) | 9.0 | 9.3 | 8.7 | -6.1 | 1,560,027 (25.1) | 12.0 | 12.5 | 11.6 | -7.4 |
| Local | 3,812,116 (76.6) | 8.8 | 9.3 | 8.2 | -11.2 | 4,646,147 (74.9) | 10.7 | 11.3 | 10.0 | -11.8 |
| High-tech | 24,409 (0.5) | 5.0 | 5.2 | 4.8 | -8.3 | 43,241 (0.7) | 8.7 | 9.5 | 8.0 | -15.5 |
| Low-tech | 4,950,229 (99.5) | 8.8 | 9.3 | 8.4 | -10.0 | 6,162,933 (99.3) | 11.0 | 11.6 | 10.4 | -10.7 |
| Manuf. | 147,034 (3.0) | 6.1 | 6.2 | 6.0 | -3.2 | 164,234 (2.6) | 6.8 | 7.0 | 6.6 | -5.3 |
| Non-manuf. | 4,827,604 (97.0) | 8.9 | 9.4 | 8.4 | -10.3 | 6,041,940 (97.4) | 11.2 | 11.8 | 10.5 | -11.0 |

Notes: Single establishment births refers to births excluding those part of an enterprise, while all establishment births includes all types of births. Entry rates are calculated as the average across the years of new firms divided by incumbent firms, in percentages. ∆s refers to the change in entry rates, calculated as the difference between rates for 2010-2013 and 2006-2009 divided by the rate for 2006-2009, in percentages.

3.3.2 Labor market pooling

One of the main reasons firms agglomerate is to benefit from scale economies associated with a large labor pool (Ellison, Glaeser, and Kerr 2010). The benefits of such labor pools are well documented, with Marshall (1920) suggesting that a large labor market allows for workers to readily shift across employers, thus reducing labor market uncertainty. Helsley and Strange (1990) suggest that in addition to these benefits, a large labor pool facilitates better matches between firms and workers, which would increase firm productivity, while Combes and Duranton (2006) suggest that entrepreneurs start firms in agglomerated areas due to better access to a suitable labor force.
To measure the extent to which a particular industry within a given MSA is closely matched to the labor market characteristics of the region, I first construct a network of industries that are linked based on similarities in occupational composition. To do this, I use detailed data on the occupational composition of employment in industries taken from the Occupational Employment Statistics (OES) program administered by the Bureau of Labor Statistics, pooled across the panel years.26 This dataset provides detailed employment patterns for all industries across roughly 800 occupations, and serves as the baseline for calculating the pairwise proximity between two given industries. This proximity measure is analogous to that calculated by Ellison et al. (2010), where the pairwise correlation between industries i and j across occupations is calculated based on employment shares as

$$\phi^{labor}_{ij} = Corr^{s}\left(Employment_{io},\, Employment_{jo}\right)$$

where $Employment_{io}$ is the fraction of industry i’s employment in occupation o, and $Corr^{s}$ refers to Spearman’s rank correlation.27 The mean value for the proximity metric is 0.529, with the lowest proximity value being -0.061 between Leather and Hide Tanning and Finishing (NAICS 3161) and Colleges, Universities, and Professional Schools (NAICS 6113), and the highest value being 0.952 for Electrical and Electronic Goods Merchant Wholesalers (NAICS 4236) and Hardware, and Plumbing and Heating Equipment and Supplies Merchant Wholesalers (NAICS 4237).

26 I pool across the panel years to reduce the chance of variability due to external factors that are not related to actual similarities in employment patterns.

27 I use Spearman’s correlations over Pearson correlations due to the high skewness of employment patterns and to mitigate the effect of outliers in the data.
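As an illustration of the pairwise labor proximity, the following is a minimal sketch of Spearman's rank correlation applied to occupational employment shares. The function names and the employment counts are hypothetical and are not taken from the OES data; the sketch only shows the mechanics of ranking the shares and correlating the ranks.

```python
def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def occupation_shares(counts):
    """Fraction of an industry's employment in each occupation."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical employment counts for three industries across five occupations.
industry_a = [500, 300, 120, 60, 20]
industry_b = [900, 400, 30, 150, 20]   # similar occupational mix to industry_a
industry_c = [10, 40, 90, 300, 560]    # reversed occupational mix

phi_ab = spearman(occupation_shares(industry_a), occupation_shares(industry_b))
phi_ac = spearman(occupation_shares(industry_a), occupation_shares(industry_c))
```

Because only ranks enter the calculation, a single very large occupation does not dominate the measure, which is the motivation for preferring Spearman's over Pearson correlations with skewed employment data.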
As mentioned previously, in order to more accurately consider the aggregate labor market characteristics of a region and the indirect linkages between link neighbors, I calculate the eigenvector centrality of each industry based on the pairwise proximity measure.28 Eigenvector centrality has a long history within both the economics and sociology literature, going back to at least Leontief (1941). In essence, this measure of network centrality is based on the simple idea that a node is important if it is linked to other important nodes, and differs from degree centrality (i.e. the link counts) in that a node may have a high (low) centrality if it is linked to others who are more (less) important, even if its degree is low (high). If a node’s centrality (i.e. importance) is proportional to the sum of its neighbors’ centralities, this relationship can be represented simply as

$$c^{labor}_{i} = \frac{1}{\lambda} \sum_{j \neq i} \phi^{labor}_{ij}\, c^{labor}_{j} \quad \text{for } \lambda \neq 0$$

where $c^{labor}_{i}$ denotes centrality and $\lambda$ is a constant. In matrix form, this can be represented as

$$\lambda \mathbf{c} = \mathbf{\Phi} \mathbf{c}.$$

Hence the centrality vector $\mathbf{c}$ is the eigenvector of the adjacency matrix $\mathbf{\Phi}$ associated with the eigenvalue $\lambda$, which gives the centrality measure its name.29 It is important to note that this centrality vector is calculated using the system of equations that are represented by this relationship, and thus considers both direct and indirect linkages.30

28 The correlations are rescaled to be between 0 and 1 in order to facilitate calculation of the centrality metric.

29 The standard procedure is to choose $\lambda$ as the largest eigenvalue such that the eigenvector centralities take non-negative values.

30 I utilize the igraph package available for the R software environment to calculate the centrality measure.
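The eigenvector relationship can be illustrated with a short power-iteration sketch. This is not the igraph routine used in the study; it is a simplified stand-in that assumes a symmetric proximity matrix with non-negative entries, and the four-industry matrix below is hypothetical. Centralities are scaled so that the largest value equals 1.

```python
def eigenvector_centrality(phi, iterations=1000, tol=1e-12):
    """Power iteration for the leading eigenvector of a non-negative matrix.

    phi[i][j] is the proximity between nodes i and j; self-links are ignored
    (the sum runs over j != i). The result is scaled so its maximum is 1.
    """
    n = len(phi)
    c = [1.0] * n
    for _ in range(iterations):
        new = [sum(phi[i][j] * c[j] for j in range(n) if j != i) for i in range(n)]
        top = max(new)
        new = [v / top for v in new]
        converged = max(abs(a - b) for a, b in zip(new, c)) < tol
        c = new
        if converged:
            break
    return c


# Hypothetical rescaled proximities among four industries.
phi = [
    [0.0, 0.9, 0.8, 0.2],
    [0.9, 0.0, 0.7, 0.3],
    [0.8, 0.7, 0.0, 0.1],
    [0.2, 0.3, 0.1, 0.0],
]

centrality = eigenvector_centrality(phi)

# Implied eigenvalue for each node: sum_j phi_ij * c_j / c_i.
implied_lambda = [
    sum(phi[i][j] * centrality[j] for j in range(4) if j != i) / centrality[i]
    for i in range(4)
]
```

At convergence the implied eigenvalue is the same for every node, which is the fixed-point condition in the centrality equation. The fourth industry, which is only weakly linked to the others, receives the lowest centrality.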
Finally, in order to map the aggregate characteristics of the region onto the network, I compute the standard location quotient for each MSA-industry pair as

$$LQ_{irt} = \frac{E_{irt}}{E_{it}}$$

where $E_{irt}$ is the share of establishments for industry $i$ in region $r$ at time $t$, and $E_{it}$ is the share of establishments for industry $i$ at time $t$ for the US. Then, the index for labor market proximity for a particular MSA-industry pair is calculated as

$$PROX^{labor}_{irt} = \frac{\sum_{j \neq i} LQ_{jrt} \cdot c^{labor}_{j} \cdot \phi^{labor}_{ij}}{\sum_{j \neq i} LQ_{jrt} \cdot c^{labor}_{j}}.$$

This metric can be thought of as the weighted average proximity between industry i and the rest of the industry space, where the weights are proportional to both industry specialization patterns in the region and the centrality of the other industries.31 The highest calculated labor market proximity was 0.901 for Other Textile Product Mills (NAICS 3149) in the Dalton, GA MSA in year 2012, and the lowest proximity was 0.540 for Colleges, Universities, and Professional Schools (NAICS 6113) in the same MSA in year 2010.

31 The eigenvector centrality is scaled such that it takes values between 0 and 1.

3.3.3 Customer supplier linkages

Another key reason why firms agglomerate is to reduce transportation costs in obtaining inputs or in shipping goods to customers. The concentration of firms in a region enables them to share a pool of suppliers while at the same time being closer to customers. Following previous work (Ellison, Glaeser, and Kerr 2010; Jofre-Monseny, Marín-López, and Viladecans-Marsal 2011), I utilize data from the 2007 Benchmark Input-Output Accounts published by the Bureau of Economic Analysis (BEA)
aggregated to the 4-digit industry level to calculate the pairwise proximity metric between two given industries based on customer-supplier relations.32 Specifically, the input-output proximity between two industries is calculated as

$$\phi^{IO}_{ij} = \max\left\{ Input_{i \leftarrow j},\; Input_{j \leftarrow i},\; Output_{i \rightarrow j},\; Output_{j \rightarrow i} \right\}$$

where $Input_{i \leftarrow j}$ is the share of industry i’s inputs that come from industry j, and $Output_{i \rightarrow j}$ is the share of industry i’s outputs sold to industry j.33 The maximum of the four values corresponding to input and output flows is used due to asymmetries within and across input and output measures, as well as the fact that many industries report output sales only to themselves (especially non-manufacturing industries), which in such cases creates isolates (i.e. nodes with no links) within the network. Even so, the bulk of the pairwise industry proximities exhibited a value close to zero, while the highest proximity value was 0.742 between Iron and Steel Mills and Ferroalloy Manufacturing (NAICS 3311) and Steel Product Manufacturing from Purchased Steel (NAICS 3312).

32 Specifically, I use the Make and Use tables (after redefinitions) to calculate the input-output proximities.

33 These shares are calculated based on total industry input and output, which includes final demand and the government.

The eigenvector centralities are calculated in an analogous manner to that of the labor market pooling metric, using the proximities defined above. Then the index for input-output proximity for a particular MSA-industry pair is calculated as

$$PROX^{IO}_{irt} = \frac{\sum_{j \neq i} LQ_{jrt} \cdot c^{IO}_{j} \cdot \phi^{IO}_{ij}}{\sum_{j \neq i} LQ_{jrt} \cdot c^{IO}_{j}}$$

where $c^{IO}_{j}$ is the eigenvector centrality based on input-output linkages. Again, this metric can be thought of as the weighted average proximity between a reference industry i and the rest of the industry space.
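The two building blocks of the input-output index can be sketched as follows: the symmetric pairwise proximity taken as the maximum of four directed shares, and the regional index as a weighted average of proximities with weights equal to location quotient times centrality. All function names and numbers are hypothetical placeholders rather than values from the BEA tables.

```python
def io_proximity(input_share, output_share, i, j):
    """Maximum of the four directed input and output shares between i and j.

    input_share[a][b]  = share of industry a's inputs that come from industry b
    output_share[a][b] = share of industry a's outputs sold to industry b
    """
    return max(
        input_share[i][j],
        input_share[j][i],
        output_share[i][j],
        output_share[j][i],
    )


def proximity_index(i, lq, centrality, phi):
    """Weighted average proximity between industry i and all other industries.

    Weights are LQ_j * c_j, so specialized and central industries count more.
    """
    numerator = 0.0
    denominator = 0.0
    for j in range(len(lq)):
        if j == i:
            continue
        weight = lq[j] * centrality[j]
        numerator += weight * phi[i][j]
        denominator += weight
    return numerator / denominator


# Hypothetical directed shares for two industries.
input_share = [[0.10, 0.30], [0.05, 0.20]]
output_share = [[0.20, 0.10], [0.40, 0.10]]
phi_01 = io_proximity(input_share, output_share, 0, 1)
phi_10 = io_proximity(input_share, output_share, 1, 0)

# Hypothetical region with three industries.
lq = [1.0, 2.0, 0.5]
centrality = [1.0, 0.5, 0.8]
phi = [
    [0.0, 0.6, 0.2],
    [0.6, 0.0, 0.4],
    [0.2, 0.4, 0.0],
]
prox_0 = proximity_index(0, lq, centrality, phi)
```

Taking the maximum makes the proximity symmetric in i and j by construction. Because the index is a weighted average, it always lies between the smallest and largest pairwise proximity of the reference industry.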
The highest observed input-output proximity was 0.400 for Petroleum and Coal Products Manufacturing in the Midland, TX MSA in year 2005, while the lowest observed proximity was 0.0004 for Other Investment Pools and Funds (NAICS 5259) in the Morristown, TN MSA for the same year.

3.3.4. Knowledge spillovers

Despite their importance, knowledge spillovers are notoriously difficult to measure. They encompass many different areas of both economics and sociology, including growth theory and theories related to human capital accumulation. Unlike input sharing, knowledge spillovers are inherently a non-market exchange where the product is not bought or sold, and even in the case where there is an exchange, it is most likely to be a complicated venture between a variety of institutions (Rosenthal and Strange 2004). Previous studies have mainly relied on direct evidence of knowledge spillovers through patent citation data (for example Glaeser and Kerr 2009; Jaffe, Trajtenberg, and Henderson 1993), or through Scherer’s (1984) technology matrix which measures R&D activity flows between industries. However, such sources only reflect flows of knowledge at the highest level, and arguably do not well represent the idea Marshall had in mind when mentioning how Sheffield cutlery workers took advantage of the secrets of their trade available as local public goods (Rosenthal and Strange 2004). Furthermore, in the case of patent citation data, the classification of industries only encompasses manufacturing industries, which renders related metrics useless when considering industries in other sectors such as services or trade. The use of Scherer’s technology matrix is also difficult to justify in that it is based on data that predates the current study by over 30 years.

To overcome these challenges, I calculate two related metrics for knowledge spillovers that take into consideration the opportunities for knowledge exchange within a given region.
As such, the metric is not region-industry specific, but rather one that is defined for a particular region as a whole.34 I argue that such a regional metric is more closely related to both Marshall’s and Jacobs’ view of how knowledge spillover takes place. It is important to note that, as mentioned previously, knowledge spillover theories overlap considerably with theories of community social capital, especially of the bridging type, which emphasizes a diverse set of interactions within a region that are conducive to the accumulation of a wide variety of non-redundant information. As such, I view the derived metrics for knowledge spillovers also as a proxy for the bridging social capital of a region.

For the first metric, I make use of a detailed dataset on the universe of nonprofit organizations within the US collected as a collaboration between the National Center for Charitable Statistics (NCCS) and the Internal Revenue Service (IRS).35 This dataset provides detailed classifications as well as geographic data for all nonprofit organizations that are catalogued with the IRS for which forms 501(c) are filed. Nonprofits are classified into roughly 600 mutually exclusive groups based on the National Taxonomy of Exempt Entities (NTEE) and encompass a broad variety of organizations ranging from arts and humanities organizations to religious congregations.

34 Other studies, such as Rauch (1993), consider average levels of education as a proxy for knowledge spillover capacity.

35 The data are not publicly available, but are disclosed to qualified researchers at a minimal cost. Detailed information on the dataset as well as the classification scheme is provided at http://nccs.urban.org/Learn-About-NCCS-Data.cfm.
To measure knowledge spillover capacity, within the set of nonprofits I only consider organizations that are classified within the education, medical research, science and technology, and social science groups, as well as those that report as being research institutes or organizations that conduct public policy analysis.36 This resulted in a total of 127 organization types being included. While this set of organizations is admittedly limited, most educational institutions (including those that are outside the formal educational system) are nonprofits, and nonprofits that primarily engage in research activity are likely to be involved in activities within a region for which informal knowledge spillover takes place. Furthermore, these organizations include those that primarily engage in charitable giving (such as grants) within the knowledge sector, which is another important form of spillover activity. As such, the presence of these organizations in a region should proxy for informal knowledge spillovers that are not captured by higher level measures such as patents, which are mainly filed for commercial gain.37

36 These correspond to NTEE major codes B (education), H (medical research), U (science and technology), and V (social science), as well as nonprofits that are classified under the subgroup “Research Institutes & Public Policy Analysis” under all major codes.

37 Nonetheless, the metrics for knowledge spillover are weaker than that for input-output linkages or labor pooling, and as such it is anticipated that their relationship with entrepreneurship will be weaker. Furthermore, these metrics are not mutually exclusive to the input-output or labor pooling metrics, as knowledge spillovers may occur through input-output or labor market interactions.

I define an organizational network for the informal knowledge sector where the different organization types are the nodes, and the linkages are defined as the revealed agglomeration patterns of these organizations across the MSAs. Specifically, I define the pairwise proximity between any two organization groups as the correlation across regions of establishment shares, or

$$\phi^{inf}_{ij} = Corr^{s}\left(E_{ir},\, E_{jr}\right)$$

where $E_{ir}$ is the establishment share of group i in region r.38 The idea is that if two organization types are closely related due to similar interests or requirements of physical factors, information, or technology, they will tend to be located in tandem, while dissimilar types will be less likely to be collocated within the same region.39 Measured this way, the highest calculated pairwise proximity was 0.468 for Eye Diseases, Blindness & Vision Impairments Research (NTEE H41) and Surgical Specialties Research (NTEE H9B), while the lowest was surprisingly for Scholarships & Student Financial Aid (NTEE B82) and Student Sororities & Fraternities (NTEE B83) with a value of -0.456.

To capture the density of knowledge spillover linkages within a region, I calculate a network density measure that captures the average proximity between all organization groups, weighted by the relative regional presence of these organizations in the region, where

$$DEN^{inf}_{rt} = \sum_{i} \sum_{j} E_{irt} \cdot E_{jrt} \cdot \phi^{inf}_{ij}.$$

Higher density values would suggest that there is less knowledge spillover capacity within a region, as the knowledge pool would be limited to relatively like-minded organizations.40 The lowest density value was 0.405 for Fort Collins-Loveland, CO in year 2008, while the highest value was 0.542 for St. George, UT in the same year.

38 As for the labor pooling metric, I pool across the panel years to reduce variability due to erroneous external shocks.

39 In this sense it is related in concept to the Ellison-Glaeser (EG) index of industry coagglomeration (Ellison, Glaeser, and Kerr 2010).

40 Again, the correlations are rescaled to take values between 0 and 1.

The second knowledge spillover metric aims to capture the formal type of spillovers that are mainly due to commercialization of knowledge. I utilize annual MSA level patenting data from the US Patent and Trademark Office (USPTO), where patents are classified based on a detailed system that includes 473 patent classifications. To create a measure of the knowledge pool of the region, I constructed moving totals for patenting activity beginning with the year 2000, which was the earliest year for which MSA level data were available. Thus the sum of accepted patents for the years 2000 to 2005 represents the knowledge pool for the year 2005 (the beginning of the panel), the sum for years 2001 to 2006 corresponds to 2006 levels, and so forth. The pairwise proximity and density ($DEN^{for}_{rt}$) metrics were calculated analogously as for the informal knowledge spillover metric, where the underlying network consisted of the patent classes as nodes and correlations (proximities between patent classes) as linkages. The highest pairwise proximity was 0.787 for Active solid-state devices (class 257) and Semiconductor device manufacturing (class 438), while the lowest was -0.120 for Static structures (class 052) and Single-crystal, oriented-crystal, and epitaxy growth processes (class 117). The highest density value was 0.736 for Burlington-South Burlington, VT in year 2012, and the lowest was 0.557 for Valdosta, GA in year 2010.

3.3.5 Bonding social capital

In order to capture the bonding social capital of a region, I again utilized the NCCS nonprofit dataset, in this case excluding the organizations that were used to construct the informal knowledge spillover metric.
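The density measure and the moving patent totals can be sketched in a few lines. The sketch makes two assumptions that the text does not settle: the double sum is taken over all ordered pairs including i = j, exactly as the formula is written, and the proximities passed in have already been rescaled to lie between 0 and 1. The shares, proximities, and patent counts are hypothetical.

```python
def density(shares, phi):
    """Share-weighted average proximity: sum_i sum_j E_i * E_j * phi_ij.

    Assumption: the sum includes i == j, following the formula as written.
    """
    n = len(shares)
    return sum(shares[i] * shares[j] * phi[i][j] for i in range(n) for j in range(n))


def moving_total(counts_by_year, end_year, window=6):
    """Sum of counts over the `window` years ending in `end_year` (inclusive)."""
    start_year = end_year - window + 1
    return sum(counts_by_year[year] for year in range(start_year, end_year + 1))


# Hypothetical regional shares of three organization groups (sum to 1).
shares = [0.5, 0.3, 0.2]
# Hypothetical rescaled proximities; the diagonal is set to 1 by assumption.
phi = [
    [1.0, 0.6, 0.4],
    [0.6, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]
den = density(shares, phi)

# Hypothetical accepted patent counts for one MSA and patent class.
patents = {2000: 10, 2001: 12, 2002: 9, 2003: 11, 2004: 14, 2005: 13, 2006: 15}
pool_2005 = moving_total(patents, 2005)  # years 2000 to 2005
pool_2006 = moving_total(patents, 2006)  # years 2001 to 2006
```

A region whose organizations are concentrated in closely related groups yields a higher density, whereas a region with a more varied mix yields a lower one.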
In addition, I excluded some organization groups that did not directly relate to community activity, such as public utilities and transportation systems, corporate foundations, insurance providers, or other pension or retirement funds.41 This resulted in a total of 467 nonprofit groups, for which the pairwise proximities were calculated, analogous to the metrics for knowledge spillovers. The highest proximity was 0.605 for Christianity (NTEE X20) and Protestant (NTEE X21) congregations, while the lowest was -0.459 for private independent charities (NTEE T22) and community service clubs (NTEE S80). For the density metric ($DEN^{social}_{rt}$), the highest value was 0.535 for Winston-Salem, NC in year 2008, while the lowest value was 0.482 for Augusta-Richmond County, GA-SC in year 2012. Contrary to the knowledge spillover metrics that calculate the bridging social capital of a region, according to social capital theory, a higher density value would be suggestive of a stronger community for which more social interactions and face-to-face contact are present.

41 Specifically, corporate foundations (NTEE T21), public transportation systems (NTEE W40), public utilities (NTEE W80), insurance providers (NTEE Y20), state sponsored worker compensation reinsurance organizations (NTEE Y25), pension and retirement funds (NTEE Y30), teacher’s retirement fund associations (NTEE Y33), employee funded pension trusts (NTEE Y34), and multi-employer pension plans (NTEE Y35) were excluded. I also excluded unclassified organizations and organizations for which the headquarters were outside of the US.

3.3.6 Other factors

In addition to the key metrics described above, I included other factors that could influence entrepreneurship. Most importantly, following prior work (Delgado, Porter, and Stern 2010; Glaeser et al. 1992; Porter 2003), I included the location quotient ($LQ_{irt}$) of the MSA-industry pair as a measure of specialization.
This measure proxies for the degree to which the industry is over-represented in the MSA, and is theorized to have a strong positive effect on entry rates for a given MSA-industry pair. In addition, I included the total number of nonprofit organizations per capita ($NPPC_{rt}$) as well as the total number of patents per capita ($PATPC_{rt}$) to distinguish aggregate size effects of nonprofit organization presence and patent activity from the effects of the metrics described in the previous section.

I also included several control variables that have been theorized to impact new firm entry. I included industrial diversity, calculated as the Hirschman-Herfindahl index ($HHI_{rt}$) based on 4-digit industry establishment counts for a given MSA. Most notably, Jacobs (1969) and Glaeser et al. (1992) suggest that industrial diversity may be a potential source of knowledge spillovers, in which case it would positively affect entrepreneurship. However, higher diversity may also represent the lack of strong clusters (Porter 2003) and localization economies, in which case the effect would be negative. Thus the direction of the relationship between diversity and entrepreneurship is ambiguous.

In addition, I included a metric of market access ($MA_{rt}$) that proxies for the relative size of markets in neighboring regions. Higher market access would mean a higher potential for opportunities for interactions with neighboring regions, yet it may also result in the crowding out of local industries due to regional competition. As such, its relationship with entrepreneurship is also ambiguous. I calculate market access as

$$MA_{rt} = \sum_{s \neq r} \frac{POP_{st}}{d^{2}_{rs}}$$

where $POP_{st}$ is the population of the neighboring region and $d^{2}_{rs}$ is the square of the distance between the centroids of the MSAs.
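A minimal sketch of the two control variables defined above follows. The Hirschman-Herfindahl index is computed as the sum of squared establishment shares, and market access as a distance-discounted sum of neighboring populations. The sketch assumes that a distance threshold works as a cutoff that drops regions farther away than the threshold; the region names, populations, distances, and the `threshold` parameter are all hypothetical.

```python
def hhi(establishment_counts):
    """Hirschman-Herfindahl index: sum of squared industry shares."""
    total = sum(establishment_counts)
    return sum((count / total) ** 2 for count in establishment_counts)


def market_access(region, population, distance, threshold=None):
    """Sum over other regions s of POP_s / d_rs^2.

    Assumption: when `threshold` is given, regions farther away than the
    threshold are excluded from the sum.
    """
    total = 0.0
    for other, pop in population.items():
        if other == region:
            continue
        d = distance[region][other]
        if threshold is not None and d > threshold:
            continue
        total += pop / d ** 2
    return total


# Hypothetical establishment counts across three industries in one MSA.
hhi_value = hhi([50, 30, 20])

# Hypothetical populations and centroid distances (in miles).
population = {"A": 100_000, "B": 400_000, "C": 900_000}
distance = {
    "A": {"B": 100.0, "C": 400.0},
    "B": {"A": 100.0, "C": 350.0},
    "C": {"A": 400.0, "B": 350.0},
}
ma_all = market_access("A", population, distance)
ma_cutoff = market_access("A", population, distance, threshold=300.0)
```

With the cutoff applied, region C no longer contributes to the market access of region A because it lies beyond the threshold, even though it is the most populous neighbor.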
I set a threshold value of 300 miles in calculating this metric to reflect a reasonable distance for which motor vehicle travel could possibly occur within a day’s journey (Mukim 2014).42 I also included homeownership rates ($HOME_{rt}$) as well as the percentage of the population that is foreign born ($FOREIGN_{rt}$) to control for other possible social factors that may confound the key variables of interest. Glaeser (2001) suggests that a significant determinant of social capital within a region is homeownership rates, for homeownership may create direct financial incentives for investment in social capital. The percentage of foreign born population has also been viewed to influence not only the social capital of a region (Wilson and Portes 1980), but also to be directly correlated with entrepreneurship rates (Reynolds and Curtin 2009), for foreigners are documented to exhibit significantly higher rates of entrepreneurship compared to native groups. Finally, I included the standard population ($POP_{rt}$), per capita income ($PCINC_{rt}$), and educational attainment ($EDUC_{rt}$, measured as the percentage of the population 25 to 65 with a bachelor’s degree or higher) variables to control for other MSA level characteristics that may impact entrepreneurship rates.43 All demographic variables were calculated using American Community Survey (ACS) 1 year estimates44 while per capita income was calculated using BEA Regional Economic Accounts.

42 The centroids and their distances are calculated using county distance data housed within the National Bureau of Economic Research (NBER) database. MSA centroids are calculated as the centroid of the county within the MSA that was home to the largest fraction of the MSA population.

43 Glaeser (2001) also points out that education levels are strong predictors of social capital, for more educated individuals are likely to invest more in their social connections.

Table 3.2. Select descriptive statistics for variables (N = 803,136)

| Variables | Mean | Std Dev. | Min | Max |
|---|---|---|---|---|
| Dependent variables (count of births) | | | | |
| Births of single establishments ($B^{single}_{irt}$) | 4.877 | 32.30 | 0 | 2,911 |
| Births of all establishments ($B^{all}_{irt}$) | 6.143 | 36.06 | 0 | 3,034 |
| Industry by MSA by year characteristics | | | | |
| Location quotient ($LQ_{irt}$) | 1.069 | 2.183 | 0 | 184.7 |
| Labor market proximity ($PROX^{labor}_{irt}$) | 0.776 | 0.0568 | 0.540 | 0.901 |
| Input-Output proximity ($PROX^{IO}_{irt}$) | 9.32E-03 | 0.0107 | 4.23E-04 | 0.400 |
| MSA by year characteristics | | | | |
| Informal knowledge spillovers ($DEN^{inf}_{rt}$) | 0.475 | 0.0219 | 0.405 | 0.542 |
| Formal knowledge spillovers ($DEN^{for}_{rt}$) | 0.604 | 0.0257 | 0.557 | 0.736 |
| Bonding social capital ($DEN^{social}_{rt}$) | 0.506 | 0.0041 | 0.482 | 0.535 |
| Nonprofit organizations, per 1,000 ($NPPC_{rt}$) | 4.691 | 1.548 | 1.150 | 18.02 |
| Accepted patents, per 1,000 ($PATPC_{rt}$) | 1.442 | 2.346 | 0.0305 | 28.58 |
| Controls | | | | |
| Industrial diversity ($HHI_{rt}$) | 0.0151 | 0.00189 | 0.0120 | 0.0307 |
| Market access ($MA_{rt}$) | 2,455 | 2,161 | 51.09 | 18,289 |
| Population ($POP_{rt}$) | 710,705 | 1.58E+06 | 69,922 | 1.92E+07 |
| Per capita income ($PCINC_{rt}$) | 36,133 | 7,343 | 17,881 | 97,392 |
| Educational attainment, % ($EDUC_{rt}$) | 25.44 | 8.025 | 10 | 59.10 |
| Foreign born population, % ($FOREIGN_{rt}$) | 7.695 | 6.771 | 0.460 | 38.82 |
| Homeownership rate, % ($HOME_{rt}$) | 67.19 | 5.687 | 47.41 | 85.24 |
| Unemployment rate, % ($UNEMP_{rt}$) | 7.040 | 2.987 | 2.017 | 28.90 |

Table 3.2 presents select descriptive statistics for the variables included in the analysis (the correlation matrix for the variables is presented in Appendix A). It is worth noting that the dependent variable (count of firm births) is highly skewed, with roughly 65% of the MSA-industry-year observations exhibiting birth counts of zero.
This raises technical issues related to the econometric specification of the model, for OLS estimation based on the logged values of the dependent variable cannot be fruitfully carried out, as most observations are dropped when logged values are used. Furthermore, common transformations – such as adding 1 and subsequently taking logs – are also troublesome due to the high percentage of zero values. The next section highlights these issues and the estimation strategy used for the empirical analysis.

⁴⁴ In cases where 1 year estimates at the county level were unavailable, I utilized 3 year ACS estimates. In cases where county level data were not available throughout the panel years, I utilized MSA level data. This resulted in a minor source of measurement error, for MSA definitions are not constant for the ACS over the panel years. However, the number of MSAs affected was very small, and even in these cases the definition changes mostly applied to smaller counties within a given MSA, rendering these errors minimal.

3.4. Empirical framework

3.4.1 Model specification

I based the estimation framework on a location choice model of entrepreneurs where an establishment is born when it is possible to earn non-negative profits within a region, taking the existing economic environment as given (see Rosenthal and Strange (2003) for a review of the model). In this sense, regional characteristics which increase productivity will result in higher levels of establishment births, and entrepreneurs compare profitability across locations. It is assumed that location choices and decisions to found a new establishment are made at time t – 1, and establishments are born in the subsequent time period t.
Since the main outcome variable of interest is the count of new establishments in an MSA-industry pair at time t, Poisson estimates of the coefficients can also be given a random profit maximization interpretation (Guimaraes, Figueiredo, and Woodward 2003; Jofre-Monseny, Marín-López, and Viladecans-Marsal 2011). Hence the baseline specification of the model is

\[
E(B_{irt}) = \exp\big(\alpha + \beta_0 \mathrm{BDUM}_{ir0} + \beta_1 \mathrm{LQ}_{irt-1} + \beta_2 \mathrm{PROX}^{labor}_{irt-1} + \beta_3 \mathrm{PROX}^{IO}_{irt-1} + \beta_4 \mathrm{DEN}^{social}_{rt-1} + \beta_5 \mathrm{DEN}^{inf}_{rt-1} + \beta_6 \mathrm{DEN}^{for}_{rt-1} + \beta_7 \mathrm{NPPC}_{rt-1} + \beta_8 \mathrm{PATPC}_{rt-1} + \mathbf{X}_{rt-1}\boldsymbol{\gamma} + I_i + R_r + T_t\big)
\]

where I_i, R_r, and T_t are the sets of industry, MSA, and year fixed effects (for a total of 646 fixed effects in the model) and X_{rt-1} is the set of control variables described in the previous section. Following Blundell et al. (1995) and Delgado et al. (2010), I also include an indicator variable for any pre-existing start-up activity for the years 2003 and 2004 (BDUM_{ir0}) to control for additional unobservable characteristics of MSA-industry pairs which may impact establishment births. All of the explanatory variables are logged, and standardized to have mean zero and unit standard deviation to aid interpretation (Ellison, Glaeser, and Kerr 2010; Glaeser and Kerr 2009).⁴⁵

⁴⁵ Some of the MSA-industry pairs have zero values for the location quotients, and as such I add 1 to the location quotient values before log transformation. A dummy variable that indicates whether LQ_{irt-1} = 0 was also included.

As mentioned previously, high skewness and the large number of zero births for MSA-industry pairs present a problem in linear estimation, which is one reason why the Poisson model is preferred.
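To illustrate why zero counts favor a Poisson specification over a log-linear one, the following minimal sketch fits E[y|x] = exp(a + b x) by Newton-Raphson on a small set of made-up counts. The data, the single regressor, and the function name fit_poisson are assumptions for illustration only; the model in this chapter includes the full set of covariates and fixed effects and was not estimated with this code.

```python
import math

# Hypothetical data: one regressor and birth counts, half of which are zero.
x = [-1.5, -1.2, -0.9, -0.6, -0.3, 0.0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8]
y = [0, 0, 0, 1, 0, 0, 1, 0, 2, 1, 3, 4]

# A log-linear OLS model can only use strictly positive counts.
usable_for_log = [yi for yi in y if yi > 0]
share_dropped = 1.0 - len(usable_for_log) / len(y)


def fit_poisson(x, y, max_iter=100, tol=1e-12):
    """Newton-Raphson for the Poisson model E[y|x] = exp(a + b*x)."""
    a = math.log(sum(y) / len(y))  # start at the log of the sample mean
    b = 0.0
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood.
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        h_aa = sum(mu)
        h_ab = sum(mi * xi for xi, mi in zip(x, mu))
        h_bb = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 linear system by Cramer's rule.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


a_hat, b_hat = fit_poisson(x, y)
fitted = [math.exp(a_hat + b_hat * xi) for xi in x]
```

All twelve observations, zeros included, enter the Poisson likelihood, whereas the log-linear alternative would discard the six zero counts. With an intercept in the model, the fitted means sum to the observed counts at the maximum.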
Furthermore, other non-linear models such as the Tobit or Negative Binomial suffer from the incidental parameters problem, where a large number of fixed effects leads to inconsistent estimation of the parameters under fixed T, N → ∞ asymptotics, since the number of parameters that need to be estimated grows arbitrarily large (Chamberlain 1980; Hsiao 1986). The Poisson model does not suffer from such bias, its consistency does not rest on additional assumptions concerning the distribution of the dependent variable with respect to the covariates (unlike the negative binomial), and the mean-variance equality restriction of the Poisson model may be relaxed using fully robust standard errors clustered at the panel level (Cameron and Trivedi 2013; Wooldridge 2010). Nonetheless, as has been previously noted (Rosenthal and Strange 2003), the problem of noisy estimates of fixed effects decreases as the number of observations per fixed effect grows large (in this case over 2,000), and thus as a robustness check I run the same regression using a fixed effects Probit model with a dummy for positive or zero births as the dependent variable.⁴⁶

⁴⁶ I also conducted a preliminary analysis using a zero-inflated negative binomial regression (not reported), utilizing the Chamberlain-Mundlak Conditionally Correlated Random Effects (CCRE) model with cluster means in place of the fixed effects to check whether the large number of zeros and possible overdispersion affected the results. However, results were qualitatively similar to those of the Poisson model, which suggests that the large number of fixed effects and relevant controls adequately explained the excess zeros and apparent overdispersion in the count data. In addition, due to the large number of observations and variables as well as the complexity of two-step models, this method generated convergence problems in the estimation routines, which is why the Poisson model was preferred.

3.4.2 Endogeneity concerns

It is important to note that there may be a number of other explanations for variations in new establishment counts in a particular MSA-industry pair. Most importantly, natural advantages of a region (such as proximity to an airport, rivers, or the sea) should positively impact establishment births regardless of Marshallian forces (Ellison, Glaeser, and Kerr 2010) or the social capital of a region, which may result in the agglomeration of new firms being the cause, and not the result, of various inter-industry relations. Other difficult to measure factors such as the business culture of a region may also impact entry of new businesses. I include the full range of MSA, industry, and year fixed effects together to control for such unobservables. MSA fixed effects control for time-invariant characteristics such as natural endowments, climate, or other geographic features of the region, while industry fixed effects control for industry characteristics that are constant over time. Year fixed effects control for time-specific shocks such as macroeconomic conditions or business cycles, which is especially important in this setting because the most recent recessionary years are included in the analysis.

Using the count of new firms as the dependent variable also partially addresses omitted variables and simultaneity biases. Rosenthal and Strange (2003) point out that entrepreneurs are unconstrained by previous decisions and make location choices taking the existing economic environment as exogenously given. Furthermore, Becker and Henderson (2000) point out that time persistent location determinants can be successfully controlled for by conditioning firm births on the stock of pre-existing firms, which is done in this setting with the inclusion of the location quotient (as well as the number of nonprofit organizations).
For the social capital variables, I explicitly consider the opportunity for social interactions through the existing characteristics of nonprofit organizations in the region, rather than through other variables such as measures of trust or reciprocity (possibly obtained from survey data) which have been criticized as vague and plagued with endogeneity issues (Durlauf 2002; Glaeser 2001). Thus these social variables can be viewed in the same perspective as other industry characteristics, and suffer less from endogeneity. Nonetheless, I lack enough variation in the data to include MSA by industry fixed effects, and am unable to fully control for omitted variables at the MSA-industry level (such as city policies that favor specific industries).⁴⁷ However, the inclusion of the indicator variable for start-up activity pre-dating the panel years is intended to absorb a portion of these unobservables. Overall, even with the careful selection of control variables and arsenal of fixed effects, I am cautious about interpreting the results as evidence of causality and instead interpret them as partial correlations.

3.5. Results

3.5.1 All industries

I first report the baseline results for the Poisson and Probit models, including all industries as well as the full set of fixed effects. The Poisson estimates can be interpreted as approximately a β × 100 percent increase in the count of new establishments for a 1 standard deviation increase in an explanatory variable. In other words, a 1 standard deviation increase in labor market proximity (PROX^{labor}_{irt}) for an MSA-industry pair would increase the count of new establishment births by roughly 16.2% (Table 3.3).
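Because the Poisson model is log-linear in the regressors, the exact proportional change in the expected count implied by a coefficient β is exp(β) − 1, and β × 100 is its first-order approximation. The sketch below compares the two readings for two coefficients taken from column (1) of Table 3.3; the function names are illustrative assumptions.

```python
import math


def pct_change_approx(beta):
    """First-order reading of a log-link coefficient: beta * 100 percent."""
    return 100.0 * beta


def pct_change_exact(beta):
    """Exact percent change in the expected count: (exp(beta) - 1) * 100."""
    return 100.0 * (math.exp(beta) - 1.0)


# Coefficients from column (1) of Table 3.3.
beta_labor = 0.162  # labor market proximity
beta_lq = 0.721     # location quotient

labor_approx = pct_change_approx(beta_labor)
labor_exact = pct_change_exact(beta_labor)
lq_approx = pct_change_approx(beta_lq)
lq_exact = pct_change_exact(beta_lq)
```

For small coefficients such as that on labor market proximity, the two readings are close. For a large coefficient such as that on the location quotient, the exact change is noticeably larger than β × 100, so the approximation is best reserved for comparing small effects.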
The Probit model reports average marginal effects, and can be interpreted as a β change in the probability of positive births for a 1 standard deviation increase in an explanatory variable. Thus for example, a 1 standard deviation increase in input-output proximity would increase the probability of new establishment births by 0.012.

⁴⁷ This also presents a problem in estimation as the inclusion of MSA by industry fixed effects excludes nearly half of the observations, for the fixed effects perfectly predict outcomes for MSA-industry pairs that experience zero births throughout the panel years.

Table 3.3. Births of single (start-up) and all establishments

                      Poisson                   Probit
                      Single       All          Single       All
                      start-ups    start-ups    start-ups    start-ups
Variables             (1)          (2)          (3)          (4)
BDUM_{ir0}            0.125***     0.091***     0.019***     0.023***
                      (0.009)      (0.008)      (0.001)      (0.001)
LQ_{irt}              0.721***     0.714***     0.062***     0.066***
                      (0.008)      (0.007)      (0.001)      (0.001)
PROX^{labor}_{irt}    0.162***     0.115***     0.062***     0.065***
                      (0.030)      (0.023)      (0.003)      (0.003)
PROX^{IO}_{irt}       0.004        0.015        0.012***     0.013***
                      (0.013)      (0.010)      (0.001)      (0.001)
DEN^{inf}_{rt}        -0.039***    -0.041***    -0.003*      -0.003*
                      (0.008)      (0.008)      (0.002)      (0.002)
DEN^{for}_{rt}        -0.018*      -0.006       -0.003       -0.004*
                      (0.010)      (0.010)      (0.002)      (0.002)
DEN^{social}_{rt}     0.059***     0.060***     0.003**      0.004***
                      (0.006)      (0.006)      (0.001)      (0.001)
NPPC_{rt}             -0.056***    -0.065***    -0.006*      -0.008**
                      (0.013)      (0.013)      (0.003)      (0.003)
PATPC_{rt}            0.079***     0.083***     0.013***     0.013***
                      (0.015)      (0.015)      (0.003)      (0.003)
HHI_{rt}              0.039***     0.035***     0.003*       0.001
                      (0.006)      (0.007)      (0.002)      (0.002)
MA_{rt}               -0.574***    -0.440***    -0.026       -0.071**
                      (0.149)      (0.149)      (0.033)      (0.034)
POP_{rt}              -0.086       -0.106       0.007        0.027
                      (0.106)      (0.105)      (0.026)      (0.026)
PCINC_{rt}            0.074***     0.089***     0.004        0.007**
                      (0.011)      (0.011)      (0.003)      (0.003)
EDUC_{rt}             -0.020**     -0.016*      0.000        0.002
                      (0.008)      (0.008)      (0.002)      (0.002)
FOREIGN_{rt}          -0.051***    -0.044***    -0.001       -0.002
                      (0.009)      (0.009)      (0.002)      (0.002)
HOME_{rt}             -0.002       -0.001       0.001        0.002
                      (0.006)      (0.006)      (0.001)      (0.001)
UNEMP_{rt}            -0.018***    -0.010       0.003**      0.003*
                      (0.007)      (0.006)      (0.001)      (0.002)
Fixed effects         V            V            V            V
Observations          803,136      803,136      803,136      803,136

Notes: Columns 3 and 4 report average marginal effects. To aid convergence (especially with likelihoods of very large magnitude), the dependent variables for the Poisson models are divided by 1E+06. This has no effect on the parameter estimates nor the standard errors, and only affects the absolute magnitudes of the log likelihoods. Relative magnitudes among Poisson models remain relevant. In parentheses are panel robust standard errors clustered at the MSA by industry level. All variables are logged, and are standardized to have unit standard deviation to aid interpretation. *** p<0.01, ** p<0.05, * p<0.1

The explanatory variables are standardized, and thus I am able to compare the relative magnitude of their effects on the dependent variables of interest. For both the Poisson and Probit models, it can be seen that specialization (LQ_{irt}) within a given MSA-industry pair is strongly associated with an increase in the count of new establishments, regardless of whether only single unit start-ups or all start-ups are considered. In the Poisson model, the effect of specialization is much stronger than the other variables of interest, and this strong relationship is consistent with previous work suggesting that strongly specialized clusters are conducive to start-up activity (Delgado, Porter, and Stern 2010; Porter 1998). The strong negative impact of the market access (MA_{rt}) variable across specifications suggests that in general the crowding out effects due to a large market outside of the region overwhelm the benefits from gaining more opportunities for cross-regional interactions.
Surprisingly, the percentage of foreign born population in the region (FOREIGN_{rt}) is seen to negatively impact new establishment births. However, pooled regressions without MSA fixed effects (not reported here) show this variable to be strongly positively correlated with new firm formation. This suggests that while MSAs with more foreigners do experience more establishment births, when considering within MSA variation this effect is negative (i.e. an increase in foreigners within a region is associated with lower establishment birth counts). Overall, while the coefficient signs and relative magnitudes are generally similar, the Probit coefficients are much less precisely estimated compared to their Poisson counterparts, suggesting that much information is lost when considering establishment births of any magnitude to be equal.

When comparing the relative strength of the Marshallian factors, it can be seen that labor market proximity is the dominating force. This is consistent with previous work (Jofre-Monseny, Marín-López, and Viladecans-Marsal 2011; Rosenthal and Strange 2001) which finds that labor market pooling exerts the most robust effect, compared to input-output linkages or knowledge spillovers. Surprisingly, input-output proximity (PROX^{IO}_{irt}) is insignificant in the Poisson specification while strongly significant in the Probit model, suggesting that different mechanisms for input-output linkages govern birth probabilities as opposed to the count of new firm births. The insignificance of the input-output proximity variable in the Poisson specification can be explained when considering that the dataset encompasses all types of industries. Observing the underlying BEA Input-Output accounts data, it can be seen that only the manufacturing industries are largely dependent on customer-supplier connections, while other industries are not so reliant on such linkages.
The knowledge spillover and social capital variables are consistent with agglomeration and social capital theory. The estimates imply that more diverse knowledge pools (more bridging social capital) positively affect new firm formation (captured by the negative coefficients for the DEN^{inf}_{rt} and DEN^{for}_{rt} variables), and that strong local ties within the community (DEN^{social}_{rt}) also benefit entrepreneurship. The effect on new establishment births is stronger for the informal spillover metric, which could be due to the weakness of the formal knowledge spillover metric, which only considers the highest level of knowledge creation captured by patenting activity. However, an increase in the number of patents per capita (PATPC_{rt}) is seen to be strongly correlated with new firm formation, which may reflect the positive effect of innovative firms and organizations within the region. The negative coefficient for the nonprofit organizations per capita variable (NPPC_{rt}) suggests that the benefits that arise due to more associational activity are more than offset by crowding out effects. Since most nonprofits have no employees and thus are not included in entrepreneurship counts (as the SUSB only considers employer firms), more nonprofits within a region may be the result of potential entrepreneurs deciding to become self-employed in the nonprofit sector, as opposed to founding new establishments with paid employees.

Considering that the US economy experienced much change during and after the recent global recession, it is worth considering whether the relative effects of the key variables are robust for different time periods. Thus I run a model (Appendix B) in which I include a full set of interaction terms between a dummy equal to 1 for years 2009 to 2012 and each of the right-hand side variables, excluding the fixed effects.
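In an interaction specification of this kind, the coefficient for the interacted group is the sum of the base and interaction coefficients, and its standard error depends on their covariance as well as their variances. The following sketch shows the arithmetic with made-up numbers: the coefficients, standard errors, and covariance below are hypothetical, not estimates from this study, and the function name is an assumption.

```python
import math


def linear_combination(b_base, b_inter, se_base, se_inter, cov):
    """Estimate, standard error, and z-statistic of b_base + b_inter.

    Uses Var(a + b) = Var(a) + Var(b) + 2 * Cov(a, b).
    """
    estimate = b_base + b_inter
    variance = se_base ** 2 + se_inter ** 2 + 2.0 * cov
    std_err = math.sqrt(variance)
    return estimate, std_err, estimate / std_err


# Hypothetical inputs, chosen only to illustrate the calculation.
b_base, se_base = 0.20, 0.04     # coefficient for the reference period
b_inter, se_inter = -0.10, 0.05  # coefficient on the interaction term
cov_base_inter = -0.001          # covariance between the two estimates

estimate, std_err, z_stat = linear_combination(
    b_base, b_inter, se_base, se_inter, cov_base_inter
)
```

Ignoring the covariance term would misstate the standard error of the summed coefficient, which is why the full estimated variance-covariance matrix is needed for this kind of comparison.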
As a result, the non-interacted variables correspond to the coefficient estimates for the years prior to the recession (2005 to 2008), while the interaction terms correspond to the differences in the coefficient estimates between the years prior to and after the recession period. The coefficients for the years 2009 to 2012 are calculated by summing the coefficients for the non-interacted and interacted terms.⁴⁸ The general results are similar to those for the aggregate regression in the previous section. As an additional robustness check, I also ran the aggregate model from Table 3.3 excluding the years 2009 to 2011, to check if the exclusion of the recessionary period affected overall results (Appendix C). Again, the results were qualitatively identical to those of the model including all panel years. Thus, the general conclusion holds that labor market pooling, knowledge spillovers, and community social capital are important for all industries across the board regardless of economic conditions.

⁴⁸ This procedure allows for the relative magnitudes of the coefficients to be compared across different groups, as well as testing for the significance in the difference of coefficient estimates across these groups.

3.5.2 Traded versus local industries

I now turn to a distinction in industry types that has interested many agglomeration scholars, namely that between traded and local industries (Delgado, Porter, and Stern 2016; Porter 2003). Traded industries are those that are theorized to be more geographically concentrated, serving outer markets and producing goods and services that are sold outside of the region. Local industries mainly serve local markets, and are thus driven by local demand factors, suggesting that their distribution is more even and proportional to the size of local markets.
Previous studies suggest that traded industries, while being the minority, provide a disproportionate amount of employment while also paying employees higher wages (Porter 2003). As such, local industries are generally thought to be of lesser significance when considering the economic performance of regions. Nonetheless, local industries generate the bulk of new firms (see Table 3.1), and as such studying the factors that drive entrepreneurship in these industries is also of importance. Furthermore, local industries encompass those that perform a supporting role to their traded counterparts, and as such a healthy supply of new firms from industries of both types should be important for regional growth.

To test whether different factors govern the birth of new establishments in the traded and local sectors, I subset the industries into these two categories based on the definitions provided by Delgado et al. (2016) and the US Cluster Mapping Project (USCMP), which define traded and local clusters based on 6-digit NAICS codes.⁴⁹ As I utilize 4-digit NAICS industry codes, a concordance was made in cases where a particular 4-digit industry mapped on to more than one cluster definition. Specifically, in such cases I assigned the 4-digit industry to the cluster code in which the majority of its employment was situated, based on employment data taken from the County Business Patterns (CBP). As traded clusters tend to agglomerate more compared to their local counterparts, it is expected that traded industries should be more influenced by the Marshallian factors. However, the theorized effects of the social capital variables are not as clear. Table 3.4 reports the results for the regressions based on these industry definitions, where again I utilize interaction terms between an indicator variable for traded versus local industries and the right-hand side variables.
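The majority-employment rule described above for the 6-digit to 4-digit concordance can be sketched as follows. The industry codes, cluster labels, and employment figures are placeholders invented for illustration and are not taken from the CBP or the USCMP. The sketch assigns each 4-digit industry to the cluster holding the largest share of its employment, which coincides with the majority when only two clusters are involved.

```python
from collections import defaultdict


def assign_clusters(records):
    """Map each 4-digit industry to the cluster with the most employment.

    records: iterable of (six_digit_code, cluster_label, employment).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for six_digit_code, cluster_label, employment in records:
        four_digit_code = six_digit_code[:4]
        totals[four_digit_code][cluster_label] += employment
    # For each 4-digit industry, keep the cluster with the largest total.
    return {
        code: max(by_cluster, key=by_cluster.get)
        for code, by_cluster in totals.items()
    }


# Hypothetical records: (6-digit code, cluster label, employment).
records = [
    ("999911", "traded_cluster_a", 1200),
    ("999912", "local_cluster_b", 300),
    ("888810", "local_cluster_b", 5000),
    ("888820", "traded_cluster_a", 800),
    ("888830", "local_cluster_b", 200),
]

concordance = assign_clusters(records)
```

Here the placeholder industry "9999" is assigned to the traded cluster and "8888" to the local cluster, since those clusters hold most of the employment within each 4-digit code.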
I present the results for both single-establishment (model 1) as well as all types of establishment (model 2) births, and I focus only on the key variables of interest for brevity. The results largely coincide with previous studies, where for traded industries both the labor market proximity and input-output proximity metrics are highly significant with relatively large magnitudes. For local industries, only the labor market proximity metric was significant, and the difference in estimated coefficients with the traded sector was highly significant at -0.176. The results were qualitatively similar when considering all types of establishment births (model 2), suggesting that indeed Marshallian factors are important determinants of entrepreneurship in the traded industries.

⁴⁹ Details on industry classification methodologies are provided at http://clustermapping.us/content/cluster-mapping-methodology.

Table 3.4. Births of single (start-up) and all establishments, traded versus local industries, Poisson estimates

Model (1): DV = count of single establishment births
Model (2): DV = count of all establishment births

                      Model (1)                              Model (2)
Variables             Traded       Local        Difference   Traded       Local        Difference
BDUM_{ir0}            0.068***     0.119***     0.051***     0.064***     0.074***     0.009
                      (0.012)      (0.013)      (0.018)      (0.010)      (0.011)      (0.015)
LQ_{irt}              0.644***     0.799***     0.155***     0.659***     0.770***     0.111***
                      (0.009)      (0.013)      (0.016)      (0.009)      (0.011)      (0.014)
PROX^{labor}_{irt}    0.262***     0.086**      -0.176***    0.169***     0.072***     -0.097**
                      (0.042)      (0.034)      (0.051)      (0.037)      (0.027)      (0.044)
PROX^{IO}_{irt}       0.060***     0.033        -0.027       0.074***     0.023*       -0.050***
                      (0.017)      (0.020)      (0.020)      (0.014)      (0.012)      (0.016)
DEN^{inf}_{rt}        -0.050***    -0.035***    0.014**      -0.045***    -0.039***    0.006
                      (0.009)      (0.008)      (0.006)      (0.008)      (0.008)      (0.005)
DEN^{for}_{rt}        -0.038***    -0.013       0.025***     -0.014       -0.005       0.009
                      (0.011)      (0.010)      (0.008)      (0.011)      (0.010)      (0.006)
DEN^{social}_{rt}     0.083***     0.050***     -0.033***    0.075***     0.054***     -0.021***
                      (0.007)      (0.006)      (0.005)      (0.007)      (0.006)      (0.004)
NPPC_{rt}             -0.073***    -0.047***    0.025***     -0.070***    -0.061***    0.009
                      (0.014)      (0.013)      (0.007)      (0.014)      (0.013)      (0.006)
PATPC_{rt}            0.124***     0.064***     -0.060***    0.104***     0.073***     -0.031***
                      (0.017)      (0.015)      (0.011)      (0.016)      (0.014)      (0.009)
Control vars.         V                                      V
Fixed effects         V                                      V
Observations          803,136                                803,136
Log L                 -40.33                                 -50.53

Notes: See notes for Table 3.3. Both models include the full set of control variables and MSA, 4-digit NAICS, and year fixed effects. For both models, the estimates are calculated by adding interaction terms between a dummy equal to 1 for traded industries and all right-hand side variables excluding the fixed effects. Thus the coefficients for the non-interacted variables coincide with the parameter estimates for the reference group (traded), while the coefficients for the interaction terms coincide with the differences in parameter estimates between the two groups. The coefficients for the comparison group are obtained by summing the parameter estimates for the interacted and non-interacted terms, and the standard errors are based on the estimated variance-covariance matrix, using the lincom routine in Stata. *** p<0.01, ** p<0.05, * p<0.1

When considering knowledge spillovers, again both the informal and formal spillover metrics exhibited coefficients of larger magnitude for the traded industries. The estimated differences in coefficient magnitude between the traded and local industries were also highly significant when considering single establishment births, although this difference was not so pronounced when considering all types of establishment births. This suggests that not only are knowledge spillovers more important for the traded industries, but also that bridging social capital in the form of a diverse knowledge pool is positively associated with new establishment births.
The magnitudes of the coefficients for the informal and formal knowledge spillover metrics are comparable to that for the input-output proximity metric, suggesting that their relative importance is non-trivial. Surprisingly, the coefficient for the bonding social capital metric was comparatively large in magnitude, being even larger than that for input-output proximity. Furthermore, bonding social capital was estimated to be more important for the traded industries, with a statistically significant difference of 0.033. At face value, this suggests that the benefits of a strong local community – such as reduced transaction costs due to trust and reciprocity or access to club goods – are more important for the traded sector, and is consistent with previous studies that consider repeated face to face contact and social interactions as being instrumental in shaping economic outcomes (Storper and Venables 2004; Saxenian 1996).

3.5.3 High-tech versus low-tech entrepreneurship

I also consider whether the relative effects of the Marshallian factors and social capital variables differ for the high-tech versus low-tech industries. I again utilize the industry definitions provided by Delgado et al. (2016) and the USCMP to index high-tech industries, which encompass the aerospace vehicles and defense (cluster code 1), biopharmaceuticals (cluster code 5), communications equipment and services (cluster code 8), downstream chemical products (cluster code 11), information technology and analytical instruments (cluster code 23), and medical devices (cluster code 30) clusters. All high-tech clusters are traded industries, and as such it is expected that the relative effects should be similar to those for the traded sector, with possible differences in the knowledge spillover metrics. Table 3.5 presents the results, where again interaction terms are used to differentiate between industry groups.

Table 3.5. Births of single (start-up) and all establishments, high-tech versus low-tech industries, Poisson estimates

Model (1): DV = count of single establishment births
Model (2): DV = count of all establishment births

                      Model (1)                              Model (2)
Variables             High-tech    Low-tech     Difference   High-tech    Low-tech     Difference
BDUM_{ir0}            0.091***     0.125***     0.034        0.183***     0.088***     -0.095***
                      (0.035)      (0.009)      (0.036)      (0.031)      (0.008)      (0.032)
LQ_{irt}              0.524***     0.722***     0.198***     0.636***     0.714***     0.079***
                      (0.024)      (0.008)      (0.025)      (0.014)      (0.007)      (0.016)
PROX^{labor}_{irt}    0.163***     0.162***     -0.001       0.027        0.115***     0.088
                      (0.063)      (0.030)      (0.095)      (0.083)      (0.023)      (0.085)
PROX^{IO}_{irt}       0.147***     0.004        -0.143***    -0.029       0.016        0.045
                      (0.044)      (0.013)      (0.044)      (0.037)      (0.010)      (0.038)
DEN^{inf}_{rt}        -0.038**     -0.039***    -0.001       -0.060***    -0.040***    0.020
                      (0.017)      (0.008)      (0.015)      (0.015)      (0.008)      (0.013)
DEN^{for}_{rt}        -0.020       -0.018*      0.002        0.045***     -0.007       -0.052***
                      (0.020)      (0.010)      (0.017)      (0.017)      (0.010)      (0.014)
DEN^{social}_{rt}     0.087***     0.059***     -0.028**     0.077***     0.060***     -0.017
                      (0.013)      (0.006)      (0.012)      (0.012)      (0.006)      (0.011)
NPPC_{rt}             -0.074***    -0.056***    0.018        -0.091***    -0.065***    0.026
                      (0.024)      (0.013)      (0.021)      (0.022)      (0.013)      (0.019)
PATPC_{rt}            0.234***     0.078***     -0.156***    0.065**      0.083***     0.018
                      (0.031)      (0.015)      (0.028)      (0.029)      (0.015)      (0.025)
Control vars.         V                                      V
Fixed effects         V                                      V
Observations          803,136                                803,136
Log L                 -40.33                                 -50.53

Notes: See notes for Tables 3.3 and 3.4. *** p<0.01, ** p<0.05, * p<0.1
As expected, the results are qualitatively similar to those for traded versus local industries for most variables when considering single establishment births. Surprisingly however, the formal knowledge spillover metric becomes insignificant, being replaced with a very strong effect for the number of patents per capita (PATPC_{rt}) metric. This suggests that for high-tech industries which rely heavily on R&D and patenting, the overall innovative capacity of a region, which is likely to be influenced by an agglomeration of high-tech firms, is more important compared to a diverse knowledge pool. This is consistent with Saxenian’s (1996) analysis of Silicon Valley and Route 128, where these regions began to attract high-tech entrepreneurship through the existence of a large number of incumbent firms in similar industries. When considering all types of entrepreneurship (model 2), there is a general inconsistency in the estimated coefficients for the labor market proximity, input-output proximity, and formal knowledge spillover metrics compared to model 1. Considering that many high-tech firms are also high-growth firms with multiple establishments, these inconsistencies are likely to be capturing the founding of firm branches, and not true entrepreneurship in the sense of new firm formation.

3.5.4 Manufacturing versus non-manufacturing entrepreneurship

As a final step, I re-estimate the model, in this case differentiating between manufacturing and non-manufacturing industries. Table 3.6 presents the results. Previous literature related to agglomeration economies and entrepreneurship has to a large extent focused on the manufacturing sector, as Marshall’s microfoundations most readily map onto the needs of manufacturing firms. As such, it is expected that all three Marshallian externalities will be more significantly correlated with entrepreneurship for the manufacturing sector compared to the non-manufacturing sector.

Table 3.6. Births of single (start-up) and all establishments, manufacturing versus non-manufacturing industries, Poisson estimates

Model (1): DV = count of single establishment births
Model (2): DV = count of all establishment births

                      Model (1)                              Model (2)
Variables             Manuf.       Non-manuf.   Difference   Manuf.       Non-manuf.   Difference
BDUM_{ir0}            0.088***     0.146***     0.058**      0.068***     0.101***     0.033
                      (0.020)      (0.011)      (0.023)      (0.018)      (0.009)      (0.020)
LQ_{irt}              0.545***     0.751***     0.206***     0.554***     0.736***     0.182***
                      (0.021)      (0.009)      (0.023)      (0.019)      (0.007)      (0.020)
PROX^{labor}_{irt}    0.358***     0.132***     -0.227***    0.379***     0.092***     -0.288***
                      (0.057)      (0.030)      (0.060)      (0.053)      (0.023)      (0.056)
PROX^{IO}_{irt}       0.080***     0.020        -0.060***    0.111***     0.023**      -0.088***
                      (0.019)      (0.014)      (0.023)      (0.018)      (0.011)      (0.021)
DEN^{inf}_{rt}        -0.033***    -0.039***    -0.005       -0.031***    -0.041***    -0.009
                      (0.010)      (0.008)      (0.008)      (0.010)      (0.008)      (0.007)
DEN^{for}_{rt}        -0.031**     -0.017*      0.013        -0.021       -0.006       0.015
                      (0.015)      (0.010)      (0.011)      (0.014)      (0.010)      (0.010)
DEN^{social}_{rt}     0.072***     0.058***     -0.014**     0.080***     0.059***     -0.020***
                      (0.008)      (0.005)      (0.006)      (0.008)      (0.006)      (0.006)
NPPC_{rt}             -0.091***    -0.054***    0.037***     -0.091***    -0.064***    0.027***
                      (0.016)      (0.013)      (0.010)      (0.015)      (0.013)      (0.009)
PATPC_{rt}            0.154***     0.076***     -0.079***    0.153***     0.080***     -0.073***
                      (0.023)      (0.015)      (0.018)      (0.022)      (0.015)      (0.018)
Control vars.         V                                      V
Fixed effects         V                                      V
Observations          803,136                                803,136
Log L                 -40.33                                 -50.53

Notes: See notes for Tables 3.3 and 3.4. *** p<0.01, ** p<0.05, * p<0.1

The estimated coefficients are as expected, with labor market proximity again being the dominating force out of the three Marshallian factors. While the labor market proximity metric continues to be significant for non-manufacturing industries, its coefficient value decreases to roughly one-third of that for the manufacturing sector, and input-output proximity becomes insignificant.
Similar to the high-tech industries, the number of patents per capita continues to exert a strong effect on births for the manufacturing industries, which is consistent with previous research suggesting that patenting activity is a key determinant of entrepreneurship for manufacturing (Akcigit and Kerr 2010; Ellison, Glaeser, and Kerr 2010). Overall the results are robust when considering all types of establishment births. The bonding social capital variable is again more strongly related to firm births for the manufacturing sector, which suggests that manufacturing industries benefit from social externalities similar to those of the traded and high-tech industries.

3.6. Conclusions

Overall, this study provides strong support for the role of different types of social interactions in promoting entrepreneurship. I find evidence consistent with social network and social capital theory, which suggests the importance of both strong bonding ties of repeated interactions within communities and weak bridging ties of long range connections between different groups. This is a key contribution in that previous studies of social interactions within the economic geography literature have not distinguished between these different types of interactions. The results taken together suggest that these social forces exert a non-trivial impact on the number of new firm births in a region-industry pair, over and above the Marshallian forces. Considering that the Marshallian forces themselves also represent to some degree social factors – such as homophilous interactions and trust gained through repeated interactions – the fact that the measures of bonding and bridging social capital continue to have a strong effect on entrepreneurship suggests that as a whole, social factors may be just as important as economic factors when examining the forces that drive entrepreneurship in regions.
I hope that further research will clarify whether one dominates the other in promoting entrepreneurship, and whether this relationship changes across industries.

The broad results are consistent across a range of industry categories. The basic conclusion is that both Marshallian economies (with the exception of customer-supplier linkages) and social capital are important in promoting entrepreneurship regardless of the industry. This result is also non-trivial considering that most previous studies have focused on a narrow subset of industries in testing the effects of agglomerative forces on entrepreneurship (e.g., Glaeser and Kerr 2009). Customer-supplier linkages seem to be significant only for the traded, high-tech, and manufacturing industries, while the effect of labor market pooling seems to be the strongest and most robust across industry classes. I also find that the effects of bonding and bridging social capital are stronger for the traded, high-tech, and manufacturing industries than for their counterparts. This is consistent with previous studies (Ellison, Glaeser, and Kerr 2010; Rosenthal and Strange 2001), considering that social interactions are closely linked to theories of knowledge spillovers. Traded industries, which comprise all of the high-tech industry classifications as well as the bulk of the manufacturing sector, are more dependent on agglomeration economies (Delgado, Porter, and Stern 2016), and thus the effects of both strong repeated homophilous interactions and weak heterophilous interactions should be expected to be stronger for these industries than for local industries, which benefit less from knowledge spillovers and more from local demand.

The relative effects of the Marshallian micro-foundations and social capital are consistent when we consider not only the birth of new firms, but also all establishment births, including new establishments of existing firms.
While further research is needed, this suggests that the mechanisms that promote entrepreneurship for small firms are similar to those that are important for multi-locational firms. Future research should examine whether the effects of the key forces described in this study are consistent across a wide range of firm sizes, including multinational conglomerates, and how the relative importance of these factors varies with firm size.

APPENDIX A. Pairwise correlation matrix of variables

[Table: lower-triangular matrix of pairwise correlations among LQ_irt, PROX^labor_irt, PROX^IO_irt, DEN^social_rt, DEN^inf_rt, DEN^for_rt, NPPC_rt, PATPC_rt, HHI_rt, MA_rt, POP_rt, PCINC_rt, EDUC_rt, FOREIGN_rt, HOME_rt, and UNEMP_rt. The cell values are not recoverable from the extracted layout.]

APPENDIX B.
Births of single (start-up) and all establishments, before and after the recession, Poisson estimates

DV (1): Count of single establishment births
DV (2): Count of all establishment births

                 (1)                                 (2)
Variables        2005-2008   2009-2012   Difference  2005-2008   2009-2012   Difference
BDUM_ir0         0.140***    0.108***    -0.031***   0.105***    0.077***    -0.028***
                 (0.010)     (0.011)     (0.011)     (0.009)     (0.01)      (0.011)
LQ_irt           0.715***    0.728***    0.013**     0.704***    0.724***    0.020***
                 (0.008)     (0.009)     (0.006)     (0.007)     (0.008)     (0.005)
PROX^labor_irt   0.180***    0.146***    -0.034***   0.133***    0.097***    -0.036***
                 (0.030)     (0.030)     (0.003)     (0.023)     (0.023)     (0.003)
PROX^IO_irt      0.003       0.002       -0.001      0.019*      0.007       -0.011***
                 (0.013)     (0.013)     (0.003)     (0.010)     (0.010)     (0.003)
DEN^inf_rt       -0.011      -0.025***   -0.015***   -0.018**    -0.027***   -0.009**
                 (0.007)     (0.009)     (0.004)     (0.007)     (0.008)     (0.004)
DEN^for_rt       -0.045***   -0.043***   0.002       -0.030***   -0.029***   0.000
                 (0.010)     (0.009)     (0.004)     (0.010)     (0.009)     (0.004)
DEN^social_rt    0.016***    0.027***    0.011***    0.023***    0.032***    0.010***
                 (0.005)     (0.005)     (0.003)     (0.005)     (0.005)     (0.003)
NPPC_rt          -0.008      -0.025**    -0.017***   -0.017      -0.029**    -0.011***
                 (0.012)     (0.012)     (0.004)     (0.012)     (0.012)     (0.004)
PATPC_rt         0.036**     0.034**     -0.002      0.030**     0.033**     0.002
                 (0.015)     (0.015)     (0.005)     (0.014)     (0.014)     (0.005)
Control vars.    V                                   V
Fixed effects    V                                   V
Observations     803,136                             803,136
Log L            -40.33                              -50.53
Notes: See notes for Tables 3.3 and 3.4. *** p<0.01, ** p<0.05, * p<0.1

APPENDIX C.
Aggregate model excluding years 2009 to 2011

                 Poisson                             Probit
                 Single start-ups  All start-ups     Single start-ups  All start-ups
Variables        (1)               (2)               (3)               (4)
BDUM_ir0         0.120***          0.088***          0.020***          0.023***
                 (0.010)           (0.009)           (0.001)           (0.001)
LQ_irt           0.717***          0.709***          0.062***          0.065***
                 (0.008)           (0.007)           (0.001)           (0.001)
PROX^labor_irt   0.155***          0.102***          0.063***          0.065***
                 (0.028)           (0.023)           (0.004)           (0.004)
PROX^IO_irt      0.005             0.019*            0.012***          0.013***
                 (0.013)           (0.010)           (0.002)           (0.001)
DEN^inf_rt       -0.043***         -0.046***         -0.002            -0.003
                 (0.012)           (0.012)           (0.002)           (0.002)
DEN^for_rt       -0.012            0.000             -0.000            -0.001
                 (0.012)           (0.012)           (0.003)           (0.003)
DEN^social_rt    0.067***          0.067***          0.002             0.004**
                 (0.007)           (0.007)           (0.002)           (0.002)
NPPC_rt          -0.069***         -0.073***         -0.005            -0.008*
                 (0.017)           (0.018)           (0.004)           (0.004)
PATPC_rt         0.078***          0.083***          0.018***          0.017***
                 (0.017)           (0.017)           (0.004)           (0.004)
HHI_rt           0.010             0.013             -0.000            -0.002
                 (0.008)           (0.009)           (0.002)           (0.002)
MA_rt            -0.547***         -0.492***         -0.025            -0.048
                 (0.164)           (0.174)           (0.040)           (0.041)
POP_rt           -0.010            0.008             0.026             0.045
                 (0.119)           (0.121)           (0.030)           (0.031)
PCINC_rt         0.110***          0.122***          0.004             0.010***
                 (0.014)           (0.015)           (0.004)           (0.004)
EDUC_rt          -0.023**          -0.022*           -0.005            -0.003
                 (0.011)           (0.011)           (0.003)           (0.003)
FOREIGN_rt       -0.042***         -0.032***         -0.004            -0.002
                 (0.012)           (0.012)           (0.003)           (0.003)
HOME_rt          0.002             0.001             0.002             0.002
                 (0.007)           (0.007)           (0.002)           (0.002)
UNEMP_rt         -0.018**          -0.006            0.001             0.002
                 (0.007)           (0.007)           (0.002)           (0.002)
Fixed Effects    V                 V                 V                 V
Observations     501,960           501,960           501,960           501,960
Notes: See notes for Table 3. *** p<0.01, ** p<0.05, * p<0.1

REFERENCES

Acs, Zoltan, and Catherine Armington. 2004. “Employment Growth and Entrepreneurial Activity in Cities.” Regional Studies 38 (8): 911–27.
Acs, Zoltan, and Catherine Armington. 2006. Entrepreneurship, Geography, and American Economic Growth. Cambridge: Cambridge University Press.
Agrawal, Ajay, Iain Cockburn, Alberto Galasso, and Alexander Oettl. 2014. “Why Are Some Regions More Innovative than Others? The Role of Small Firms in the Presence of Large Labs.” Journal of Urban Economics 81: 149–65.
Agrawal, Ajay, Iain Cockburn, and John McHale. 2006. “Gone but Not Forgotten: Knowledge Flows, Labor Mobility, and Enduring Social Relationships.” Journal of Economic Geography 6 (5): 571–91.
Akcigit, Ufuk, and William R Kerr. 2010. “Growth through Heterogeneous Innovations.” NBER Working Paper No. w16443.
Audretsch, David B, and Max Keilbach. 2004. “Entrepreneurship and Regional Growth: An Evolutionary Interpretation.” Journal of Evolutionary Economics 14 (5): 605–16.
Becker, Randy, and Vernon Henderson. 2000. “Effects of Air Quality Regulations on Polluting Industries.” Journal of Political Economy 108 (2): 379–421.
Blundell, Richard, Rachel Griffith, and John Van Reenen. 1995. “Dynamic Count Data Models of Technological Innovation.” The Economic Journal, 333–44.
Bürker, Matthias, and G Alfredo Minerva. 2014. “Civic Capital and the Size Distribution of Plants: Short-Run Dynamics and Long-Run Equilibrium.” Journal of Economic Geography 14 (4): 797–847.
Burt, Ronald S. 2004. “Structural Holes and Good Ideas.” American Journal of Sociology 110 (2): 349–99.
Burt, Ronald S. 2005. Brokerage and Closure: An Introduction to Social Capital. Oxford University Press.
Cameron, A C, and Pravin K Trivedi. 2013. Regression Analysis of Count Data. Vol. 53. Cambridge, UK: Cambridge University Press.
Caragliu, Andrea, and Peter Nijkamp. 2016. “Space and Knowledge Spillovers in European Regions: The Impact of Different Forms of Proximity on Spatial Knowledge Diffusion.” Journal of Economic Geography 16 (3): 749–74.
Chamberlain, Gary. 1980. “Analysis of Covariance With Qualitative Data.” Review of Economic Studies 47: 225–38.
Chen, Henry, Paul Gompers, Anna Kovner, and Josh Lerner. 2010. “Buy Local?
The Geography of Venture Capital.” Journal of Urban Economics 67 (1): 90–102.
Coleman, James S. 1988. “Social Capital in the Creation of Human Capital.” American Journal of Sociology 94: S95–120.
Combes, Pierre-Philippe, and Gilles Duranton. 2006. “Labour Pooling, Labour Poaching, and Spatial Clustering.” Regional Science and Urban Economics 36 (1): 1–28.
Currid, Elizabeth, and Sarah Williams. 2010. “The Geography of Buzz: Art, Culture and the Social Milieu in Los Angeles and New York.” Journal of Economic Geography 10 (3): 423–51.
Dahl, Michael S, and Olav Sorenson. 2012. “Home Sweet Home: Entrepreneurs’ Location Choices and the Performance of Their Ventures.” Management Science 58 (6): 1059–71.
Delgado, Mercedes, Michael E Porter, and Scott Stern. 2010. “Clusters and Entrepreneurship.” Journal of Economic Geography 10 (4): 495–518.
Delgado, Mercedes, Michael E Porter, and Scott Stern. 2016. “Defining Clusters of Related Industries.” Journal of Economic Geography 16 (1): 1–38.
Durlauf, Steven N. 2002. “On The Empirics Of Social Capital.” The Economic Journal 112 (483): F459–79.
Ellison, Glenn, Edward L Glaeser, and William R Kerr. 2010. “What Causes Industry Agglomeration? Evidence from Coagglomeration Patterns.” The American Economic Review 100 (3): 1195–1213.
Fairlie, Robert W. 2014. Kauffman Index of Entrepreneurial Activity 1996–2013. Kansas City, MO: Kauffman Foundation.
Feld, Scott L. 1981. “The Focused Organization of Social Ties.” American Journal of Sociology, 1015–35.
Fujita, Masahisa, Paul Krugman, and Anthony J. Venables. 1999. The Spatial Economy: Cities, Regions, and International Trade. Cambridge, MA: MIT Press.
Glaeser, Edward L. 2001. “The Formation of Social Capital.” Canadian Journal of Policy Research 2 (1): 34–40.
Glaeser, Edward L. 2008. Cities, Agglomeration, and Spatial Equilibrium. Oxford University Press.
Glaeser, Edward L., H. D. Kallal, J. A. Scheinkman, and A. Shleifer. 1992. “Growth in Cities.” Journal of Political Economy 100: 1126–52.
Glaeser, Edward L, and William R Kerr. 2009. “Local Industrial Conditions and Entrepreneurship: How Much of the Spatial Distribution Can We Explain?” Journal of Economics & Management Strategy 18 (3): 623–63.
Glaeser, Edward L, Stuart S Rosenthal, and William C Strange. 2010. “Urban Economics and Entrepreneurship.” Journal of Urban Economics 67 (1): 1–14.
Granovetter, Mark. 1973. “The Strength of Weak Ties.” American Journal of Sociology 78 (6): 1360–80.
Granovetter, Mark. 1995. “The Economic Sociology of Firms and Entrepreneurs.” In The Economic Sociology of Immigration, edited by Alejandro Portes. New York: Russell Sage.
Guimaraes, Paulo, Octávio Figueiredo, and Douglas Woodward. 2003. “A Tractable Approach to the Firm Location Decision Problem.” Review of Economics and Statistics 85 (1): 201–4.
Hausmann, Ricardo, and Bailey Klinger. 2006. “The Evolution of Comparative Advantage: The Impact of the Structure of the Product Space.” CID Working Paper, no. 106.
Helsley, Robert W, and William C Strange. 1990. “Matching and Agglomeration Economies in a System of Cities.” Regional Science and Urban Economics 20 (2): 189–212.
Hidalgo, César A, and Ricardo Hausmann. 2009. “The Building Blocks of Economic Complexity.” Proceedings of the National Academy of Sciences 106 (26): 10570–75.
Hidalgo, César A, Bailey Klinger, A-L Barabási, and Ricardo Hausmann. 2007. “The Product Space Conditions the Development of Nations.” Science 317 (5837): 482–87.
Hoang, Ha, and Bostjan Antoncic. 2003. “Network-Based Research in Entrepreneurship: A Critical Review.” Journal of Business Venturing 18 (2): 165–87.
Hsiao, Cheng. 1986. Analysis of Panel Data. New York: Cambridge University Press.
Ioannides, Yannis M. 2013. From Neighborhoods to Nations: The Economics of Social Interactions. Princeton University Press.
Jackson, Matthew O. 2008. Social and Economic Networks. Vol. 3. Princeton: Princeton University Press.
Jacobs, Jane. 1969. The Economy of Cities.
New York: Vintage.
Jaffe, Adam B, Manuel Trajtenberg, and Rebecca Henderson. 1993. “Geographic Localization of Knowledge Spillovers as Evidenced by Patent Citations.” The Quarterly Journal of Economics, 577–98.
Jofre-Monseny, Jordi, Raquel Marín-López, and Elisabet Viladecans-Marsal. 2011. “The Mechanisms of Agglomeration: Evidence from the Effect of Inter-Industry Relations on the Location of New Firms.” Journal of Urban Economics 70 (2): 61–74.
Kemeny, Tom, Maryann Feldman, Frank Ethridge, and Ted Zoller. 2016. “The Economic Value of Local Social Networks.” Journal of Economic Geography 16 (5): 1101–22.
Kitson, Michael, Ron Martin, and Peter Tyler. 2004. “Regional Competitiveness: An Elusive yet Key Concept?” Regional Studies 38 (9): 991–99.
Krugman, Paul. 1991. Geography and Trade. Cambridge, MA: MIT Press.
Leontief, Wassily W. 1941. The Structure of American Economy, 1919-1929. Cambridge, MA: Harvard University Press.
Marshall, Alfred. 1920. Principles of Economics. London: Macmillan.
McPherson, Miller, Lynn Smith-Lovin, and James M Cook. 2001. “Birds of a Feather: Homophily in Social Networks.” Annual Review of Sociology, 415–44.
Mukim, Megha. 2014. “Coagglomeration of Formal and Informal Industry: Evidence from India.” Journal of Economic Geography, 329–351.
Murphy, James T. 2003. “Social Space and Industrial Development in East Africa: Deconstructing the Logics of Industry Networks in Mwanza, Tanzania.” Journal of Economic Geography 3 (2): 173–98.
Porter, Michael E. 1998. “Location, Clusters, and the New Microeconomics of Competition.” Business Economics, 7–13.
Porter, Michael E. 2003. “The Economic Performance of Regions.” Regional Studies 37 (6–7): 549–78.
Putnam, Robert D. 2001. Bowling Alone: The Collapse and Revival of American Community. Simon and Schuster.
Putnam, Robert D, Robert Leonardi, and Raffaella Y Nanetti. 1993. Making Democracy Work: Civic Traditions in Modern Italy. Princeton University Press.
Putnam, Robert, Ivan Light, Xavier de Souza Briggs, William M. Rohe, Avis C. Vidal, Judy Hutchinson, Jennifer Gress, and Michael Woolcock. 2004. “Using Social Capital to Help Integrate Planning Theory, Research, and Practice.” Journal of the American Planning Association 70 (2): 142–92.
Rauch, James E. 1993. “Productivity Gains from Geographic Concentration of Human Capital: Evidence from the Cities.” Journal of Urban Economics 34: 380–400.
Reynolds, Paul D, and Richard T Curtin. 2009. New Firm Creation in the United States: Initial Explorations with the PSED II Data Set. Vol. 23. Springer.
Rosenthal, Stuart S, and William C Strange. 2001. “The Determinants of Agglomeration.” Journal of Urban Economics 50 (2): 191–229.
Rosenthal, Stuart S, and William C Strange. 2003. “Geography, Industrial Organization, and Agglomeration.” Review of Economics and Statistics 85 (2): 377–93.
Rosenthal, Stuart S, and William C Strange. 2004. “Evidence on the Nature and Sources of Agglomeration Economies.” Handbook of Regional and Urban Economics 4: 2119–71.
Saxenian, AnnaLee. 1996. Regional Advantage: Culture and Competition in Silicon Valley and Route 128. Cambridge, MA: Harvard University Press.
Scherer, Frederic. 1984. “Using Linked Patent and R&D Data to Measure Interindustry Technology Flows.” In R&D, Patents, and Productivity, 417–64. University of Chicago Press.
Schumpeter, Joseph A. 1934. The Theory of Economic Development: An Inquiry into Profits, Capital, Credit, Interest, and the Business Cycle. Cambridge, MA: Harvard University Press.
Scott, Allen J, John Agnew, Edward W Soja, and Michael Storper. 2001. Global City-Regions: An Overview. Oxford University Press.
Sorenson, Olav. 2005. “Social Networks and Industrial Geography.” In Entrepreneurship, the New Economy and Public Policy, 55–69. Springer.
Sorenson, Olav, and Pino G Audia. 2000.
“The Social Structure of Entrepreneurial Activity: Geographic Concentration of Footwear Production in the United States, 1940–1989.” American Journal of Sociology 106 (2): 424–62.
Sorenson, Olav, and Toby E Stuart. 2001. “Syndication Networks and the Spatial Distribution of Venture Capital Investments.” American Journal of Sociology 106 (6): 1546–88.
Souza Briggs, Xavier de. 1998. “Brown Kids in White Suburbs: Housing Mobility and the Many Faces of Social Capital.” Housing Policy Debate 9 (1): 177–221.
Storper, Michael. 1995. “Competitiveness Policy Options: The Technology-Regions Connection.” Growth and Change 26 (2): 285–308.
Storper, Michael. 2013. Keys to the City: How Economics, Institutions, Social Interaction, and Politics Shape Development. Princeton University Press.
Storper, Michael, and Susan Christopherson. 1987. “Flexible Specialization and Regional Industrial Agglomerations: The Case of the US Motion Picture Industry.” Annals of the Association of American Geographers 77 (1): 104–17.
Storper, Michael, and Anthony J Venables. 2004. “Buzz: Face-to-Face Contact and the Urban Economy.” Journal of Economic Geography 4 (4): 351–70.
Stuart, Toby E, and Olav Sorenson. 2003. “The Geography of Opportunity: Spatial Heterogeneity in Founding Rates and the Performance of Biotechnology Firms.” Research Policy 32 (2): 229–53.
Stuart, Toby E, and Olav Sorenson. 2005. “Social Networks and Entrepreneurship.” In Handbook of Entrepreneurship Research, 233–52. Springer.
Turkina, Ekaterina, Ari Van Assche, and Raja Kali. 2016. “Structure and Evolution of Global Cluster Networks: Evidence from the Aerospace Industry.” Journal of Economic Geography, August. doi:10.1093/jeg/lbw020.
Wilson, Kenneth L, and Alejandro Portes. 1980. “Immigrant Enclaves: An Analysis of the Labor Market Experiences of Cubans in Miami.” American Journal of Sociology, 295–319.
Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. MIT Press.
Zipf, George Kingsley. 1949.
Human Behavior and the Principle of Least Effort. New York: Hafner.

CHAPTER 4

PATHWAYS FOR ENTREPRENEURSHIP DRIVEN ECONOMIC GROWTH: ENVISIONING THE INDUSTRY SPACE

4.1. Introduction

Many theories exist as to why economic growth takes place. One of the oldest emphasizes capital deepening, which refers to the increase in physical capital per worker (Smith 1776). In this simple model, more capital per worker increases productivity, and thus wages, by allowing each worker to work more efficiently than when production is labor intensive. More recently, endogenous growth theory (Lucas 1988; Romer 1986) has suggested that the accumulation of human capital and the resulting technological advances are the main cause of increased productivity and income per worker. Finally, at least for urban economies, agglomeration effects have been theorized to play the most critical role in growth by allowing cities to reap the benefits of physical proximity and increasing returns to scale (Marshall 1920; Glaeser 2008).

The most fundamental aspect of cities is this prevalence of agglomeration. Firms agglomerate to reap the benefits of increasing returns, which are gained primarily through proximity and thus a reduction in transport costs. Most famously, Marshall (1920) emphasized the importance of three types of agglomeration economies, namely those of goods, people, and ideas. Qualitatively, Saxenian’s (1996) study of Silicon Valley and Route 128, among others, provided a glimpse as to why some regions succeed while others decline, and urban economists and regional scientists alike have developed a lengthy literature on why cities develop (Christaller 1966; Losch 1954; Krugman 1991; Glaeser 2008). Yet while much progress has been made on why economic growth takes place, relatively little has been done with regard to how cities should promote growth given this theoretical backdrop.
One possible reason for this paucity is the complexity of the urban economy itself. The urban economy is a complex system that encompasses countless different elements, including but not limited to firms, local governments, institutions, and people and their social networks. Externally, urban economies are also influenced by their surrounding environment through competition and collaboration with neighboring areas. These internal and external factors constitute an “ecosystem” in which a plethora of elements act together to shape the outcome of the system as a whole (Batty 2013). Naturally, a holistic theory of economic development at the local level is extremely difficult to develop, because fully understanding all aspects of this ecosystem and how its elements interact with one another is almost impossible. Nevertheless, a theory of how economic growth should take place would prove valuable to planners and policy makers alike, for it would allow for the efficient use of public resources towards achieving optimal growth.

This paper attempts to bridge this gap between what is available and what is needed by integrating new insights from complexity science and development economics with more traditional theories of economic development from the urban planning and urban economics literatures to study optimal patterns of economic growth, albeit narrowly defined. Rather than taking the holistic approach, I focus on one of the key underlying aspects of growth, namely structural change (W. A. Lewis 1954). This transformation of economies is characterized by a continuous evolution of the underlying technologies, capital, institutions, and social fabric such that markets evolve and new products emerge. At the national level, development economics has traditionally focused on the shift from agriculture to manufacturing and services (Solow 1956), which leads to productivity gains and thus growth.
However, at the local level, this type of simplistic transformation need not always occur. Due to spatial proximity, local economies are much more integrated with each other, and trade occurs much more easily than across national borders. Thus, for example, places in central Iowa and south-central Illinois are still able to rely heavily on corn farming and processing as a main industry while maintaining comparable wage levels.

In order to provide evidence for a theory of how local economic growth through structural change should take place, I consider the collection of individual industries as elements of an economic “ecosystem.” Thus, rather than attempting to document all aspects of a local economy, I focus only on industries, which makes the analysis much more tractable. When modeling structural change, this focus on industries should be suitable, as the change in the composition of industries within a given local economy should represent well the structural change occurring within it.

Another critical aspect of ecosystems that is emphasized throughout this paper is the inter-relatedness of their components, which in this case is how different industries are related to one another. Thus, in order to properly model the industry ecosystem, I construct a network of the “industry space,” in which industries are linked to each other based on how similar they are. This terminology, as well as the broad conceptual framework, follows Hidalgo, Klinger, Barabási, and Hausmann’s (2007) work on the “product space,” which constructs a similar network of products (instead of industries) but focuses on national economies rather than cities. After constructing such a model, I develop a simple measure of a city’s position within the industry space that is based on specialization patterns, and conduct empirical analyses at the city level to discern how the position of cities within the industry space affects economic growth.
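The network construction described above can be sketched in code. The following is a minimal illustration, not the procedure used in this chapter: it assumes that a city counts as specialized in an industry when its location quotient exceeds 1, and it borrows the proximity definition of Hidalgo et al. (2007), namely the minimum of the two conditional probabilities that a city specialized in one industry is also specialized in the other. The function names, the threshold, and the toy employment matrix are all hypothetical.

```python
import numpy as np


def location_quotients(emp):
    """emp[c, i] is employment of industry i in city c (hypothetical input)."""
    emp = np.asarray(emp, dtype=float)
    city_share = emp / emp.sum(axis=1, keepdims=True)   # industry share within each city
    overall_share = emp.sum(axis=0) / emp.sum()         # industry share across all cities
    return city_share / overall_share


def industry_proximity(emp, threshold=1.0):
    """Proximity of industries i and j: min of P(j | i) and P(i | j),
    where the event is 'the city is specialized' (LQ > threshold).
    The LQ-based specialization rule is an assumption of this sketch."""
    spec = (location_quotients(emp) > threshold).astype(float)  # cities x industries
    both = spec.T @ spec                   # both[i, j]: cities specialized in i and j
    count = np.diag(both).copy()           # count[i]: cities specialized in i
    safe = np.where(count > 0, count, 1.0) # avoid division by zero for empty industries
    cond = both / safe[:, None]            # cond[i, j] = P(specialized in j | specialized in i)
    prox = np.minimum(cond, cond.T)
    np.fill_diagonal(prox, 0.0)            # no self-links in the network
    return prox


# Toy data: 4 cities (rows) by 3 industries (columns).
emp = np.array([
    [90, 10, 5],
    [80, 20, 5],
    [10, 90, 40],
    [20, 80, 50],
])
prox = industry_proximity(emp)
```

In this toy matrix, industries 1 and 2 are specialized in exactly the same cities, so their proximity is 1, while industry 0 shares no specialized city with either and has proximity 0 to both. The resulting symmetric matrix can be read as the weighted adjacency matrix of an industry network.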
I also utilize GIS to add a spatial dimension to the general empirical results and to see whether cities’ spatial positions affect growth pathways.

4.2. Related literature

4.2.1 Traditional theories of economic growth

The characteristic feature of early growth theories is that production involves three inputs, namely labor, capital, and natural resources. Most famously, Adam Smith’s (1776) “The Wealth of Nations” emphasized capital accumulation and increased labor productivity as the engine of growth, stating that income per capita

    must in every nation be regulated by two different circumstances; first, by the skill, dexterity, and judgment with which its labour is generally applied; and, secondly, by the proportion between the number of those who are employed in useful labour, and that of those who are not so employed.

Accordingly, Smith’s focus was on determining the factors that enhanced labor productivity, that is, the factors that affected the skill, dexterity, and judgment of workers. The key argument for Smith was that the division of labor, both within firms and industries and between them, was critical in achieving this increased productivity, which in turn depended largely on capital accumulation. Smith, along with later scholars such as Ricardo (1891) and Malthus (1888), argued that larger divisions of labor created more productive processes, which resulted in increasing returns and thus larger markets.

The ‘neoclassical’ school of economic thought superseded this classical theory of growth, asserting that the factors of production (labor, capital, and natural resources such as land) were scarce, and thus that an increase in capital had only a temporary and limited impact on growth due to diminishing returns. Thus, for continuous growth to take place, other exogenous factors needed to be taken into consideration (Cassel 1932; Domar 1947; Harrod 1948).
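The diminishing-returns argument can be made concrete with a small simulation. This is a textbook-style sketch under stated assumptions, not a model estimated in this dissertation: output per worker is assumed to follow a Cobb-Douglas form y = A * k**alpha with alpha < 1, a fixed share s of output is saved, capital depreciates at rate delta, and productivity A grows at an exogenous rate g. All names and parameter values are hypothetical.

```python
def simulate_output_per_worker(periods, s=0.2, delta=0.05, alpha=0.3, g=0.0, k0=1.0):
    """Return the path of output per worker y_t = A_t * k_t**alpha.

    s, delta, alpha, g and k0 are illustrative assumptions, not estimates.
    """
    k, a = k0, 1.0
    path = []
    for _ in range(periods):
        y = a * k ** alpha
        path.append(y)
        k = k + s * y - delta * k   # capital per worker: saving minus depreciation
        a = a * (1.0 + g)           # exogenous technological progress
    return path


def growth_rate(path, t):
    """Growth rate of output per worker between period t - 1 and period t."""
    return path[t] / path[t - 1] - 1.0


no_tech = simulate_output_per_worker(500, g=0.0)     # capital deepening only
with_tech = simulate_output_per_worker(500, g=0.02)  # plus exogenous progress

early_growth = growth_rate(no_tech, 1)
late_growth_no_tech = growth_rate(no_tech, 499)
late_growth_with_tech = growth_rate(with_tech, 499)
```

With g = 0, growth in output per worker is positive at first but fades toward zero as capital accumulates, which is the temporary effect of capital described above. With g > 0, growth settles at a positive rate governed by g, which is why sustained growth in this framework requires an exogenous factor.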
Most famously, the neoclassical model of Solow (1956) and Swan (1956) posited that increases in the rate of economic growth depend mainly on two factors: increased investment and technological progress. As in the classical model, an increase in the proportion of GDP that is invested leads to increases in capital and thus growth, yet diminishing returns due to the scarcity of resources lead to convergence in growth rates. This is offset by technological progress, which is exogenous in the model and is theorized to increase the productivity of both labor and capital, resulting in sustained growth rates.

Neoclassical theories of growth were met with strong criticism due to their many simplifying assumptions, including a single production function assumed for all economies, as well as identical trajectories of growth that did not explain empirical discrepancies in growth patterns. As such, new growth theory (i.e., endogenous growth theory) emerged in the 1980s, attempting to explain the poor performance of many developing countries that had implemented policies aligned with neoclassical theories. Unlike neoclassical models, new growth theory considered technological progress to be endogenous, emphasizing that economic growth results from increasing returns to the use of knowledge rather than labor and capital (Aghion and Howitt 1992; Lucas 1988; Romer 1986). Workers with greater knowledge, education, and training were theorized to increase rates of technological advancement, which boosted output and thus economic growth. The theory argued that the higher rate of return expected in the Solow-Swan model is greatly eroded by lower levels of complementary investments in human capital (education), infrastructure, or research and development.
In addition, knowledge was theorized to differ from other economic goods because (1) it can grow without bound, (2) it can be reused at zero additional cost, and (3) it creates spillover benefits for other firms and industries.

4.2.2 Urban economic growth

Theories of growth at the urban level differ from the theories developed within the development economics literature discussed above in that the geographical perspective is greatly emphasized. While capital deepening, investment, human capital, and technological progress remain important in promoting sustained economic growth, growth at the urban level depends to a large extent on agglomeration economies. Within cities, the physical proximity of labor and capital reduces transport costs, thus increasing productivity and promoting growth by creating externalities related to the sharing of production inputs, the pooling of labor markets, and the spillover of knowledge (Marshall 1920). Other than bringing the inputs of the production process together, cities also increase productivity and income by facilitating face-to-face communication (Ioannides and Topa 2010; Ioannides 2013), thus serving as the engines of economic growth (Lucas 2001).

Specialization is one aspect of urban economies that promotes productivity gains. Specialization arises because denser aggregations of urban communities, with a large number of firms producing in proximity, can support firms that are more specialized in producing intermediate products. The gains from specialization also extend to the production of services. For example, specialized legal services (such as taxation, copyright law, or secured transactions) can be provided more efficiently by firms that concentrate in specific areas.
Specialization increases the opportunities for cost reduction through the routinization or automation of production, and specialized firms and industries can serve a wider spectrum of customers due to the larger markets in urban areas.

Large urban labor markets also facilitate better matches between worker skills and job requirements, and help to weather firm-level fluctuations in labor demand. In an urban economy, an increase in the number of workers across the skills spectrum increases the probability that workers with a specific skill set will exist in the labor market (Helsley and Strange 1990). This in turn reduces the costs associated with job searches and training, while also generating better skill matches and thus higher net wages for workers. This higher wage incentivizes workers to migrate to cities, which reinforces the process and creates additional agglomeration externalities.

Knowledge spillovers are also a key aspect of agglomeration economies, and are among the most studied (Berry and Glaeser 2005; Glaeser, Scheinkman, and Shleifer 1995; Moretti 2004; Rauch 1993; Shapiro 2006). As is the case for national economies, an increase in the education or skills of a worker increases worker productivity, which induces competition among employers to raise wages to match higher productivity levels. However, a key difference is that at the urban level, face-to-face interactions, in both formal and informal settings, create a multiplier effect by allowing workers with more human capital to better share and communicate their skills with peers. Thus, if workers with higher human capital generate more and better ideas, an increase in human capital at the urban level increases rates of technological innovation. Glaeser et al. (1995) show with a cross section of cities that cities with higher human capital levels experienced large increases in per-capita income over the period 1960 to 1990.
In addition, spillovers due to human capital externalities have been shown to benefit the lesser skilled, suggesting that human capital also has a favorable distributional effect (Moretti 2004).

Apart from agglomeration economies, regional economic theory has to a large extent been influenced by export-base theory (North 1955; Tiebout 1956). The traditional theories of national economic growth discussed in the previous section are largely supply oriented, presuming that factor and product cost adjustments boost supply and resource utilization. Export-base theory, on the other hand, is fundamentally demand oriented, essentially making it a Keynesian-type model (W. C. Lewis 1972). It argues that an economy may be divided into two sectors: an export-oriented sector and a non-export-oriented sector. The export sector trades outside the region’s boundaries, bringing money into the local economy and thus providing for further growth, while the non-export-oriented sector supplies local consumption goods and amenities, and its activity depends on the sales of the export sector. Thus the focus of growth is on promoting production in the export sector, which is theorized to create ‘multiplier effects’ by inducing growth in the related sectors that support the sector exporting its goods outside the region. Anecdotally, many regions have seen economic growth by developing their export-oriented sectors, as in the case of the IT industry in Silicon Valley or the biotechnology industry in Boston. Nonetheless, export-base theory has been met with much criticism.
Primarily, promoting development of the export sector has been criticized as simply shifting productive activity from one region to another, resulting in a zero-sum outcome where localities compete for firms by providing extensive tax incentives and other benefits that have proven ineffective while degrading the quality of local government services (Bartik 1992, 2005; Zheng and Warner 2010). Others have argued that the role of urban density in promoting consumption is equally important, finding that local amenities, and thus the attractiveness of the ‘consumer city’, promote growth by retaining workers and attracting migrants (Glaeser, Kolko, and Saiz 2001; Glaeser and Gottlieb 2006). Acknowledging this ongoing debate, the next section highlights studies that have attempted to provide theory and evidence suggesting optimal pathways for economic growth.

4.2.3 Pathways for economic growth

Within the development economics literature, one of the most prominent theories of growth that focuses on patterns of development is the structural change model (Chenery 1960; W. A. Lewis 1954). This model demonstrates how an economy transforms from a subsistence economy centered on agriculture for personal consumption into a modern industrialized economy. The Lewis (1954) model considers two sectors in an underdeveloped economy: an overpopulated agricultural sector and an urbanized manufacturing sector to which the excess labor migrates. The theory suggests that the excess labor migrating to the manufacturing sector brings about productivity gains and an expansion of output. The Patterns of Demand model of Chenery (1960), while similar to that of Lewis, focuses on the changing composition of consumer demand, from an emphasis on food commodities to a range of manufactured goods and services.
The early models of Lewis and Chenery have been met with criticism regarding their underlying restrictions, such as the assumption of an unlimited supply of rural labor and the neglect of agriculture as a viable sector. Such criticisms notwithstanding, structural change models are unique in that they focus more on the pattern of development than on why development takes place. For the purposes of the current analysis, this focus on the ‘how’ of economic development is why this model is chosen as a starting point, albeit in a general sense. Defining structural change broadly as a shift in the basic way a market or economy functions or operates (Todaro and Smith 2012), the insights of the early models can be utilized in a more contemporary, urban setting. Although they do not explicitly consider urban economies, many scholars have already expanded the model to incorporate structural changes not only in agriculture and manufacturing, but across all economic functions including urbanization, growth of populations, and trade (Chenery and Taylor 1968; Chenery, Syrquin, and Elkington 1975; Kuznets 1971).

Considering structural change as the underlying mechanism through which economies grow, recent studies that merge complexity science with economic growth provide a good starting point to analyze optimal pathways for growth (Hidalgo et al. 2007; Hidalgo and Hausmann 2009). Most relevant to this analysis, Hidalgo et al. (2007) develop a network model of the ‘product space’ consisting of all products that are manufactured for export. Products are linked to each other by a measure of proximity based on observed co-production patterns across countries, where the links are stronger if the production of two products moves in tandem across countries.
The study shows that the product space has a distinct core-periphery structure, with less advanced products such as agricultural goods constituting the periphery, and the core being populated by more advanced goods such as vehicles, machinery, or electronics. The authors empirically demonstrate that more advanced countries populate (i.e. export) products nearer the core, and conclude that development should ideally aim to shift a country’s product mix towards products nearer the core of the network, home to the more advanced products that yield higher returns.

The current study aims to extend this work by considering the network of industries (instead of products), and linking these industries based on agglomeration patterns, thus creating what will be referred to as the ‘industry space.’ Considering industries allows the analysis to be better aligned with structural change theory, on the view that a change in the underlying structure of economies is better represented by a shift in their industrial structure than by a shift in the products that these industries produce. Furthermore, linking these industries based on agglomeration patterns merges complexity science and structural change theory with the urban economics and regional science literature by allowing the spatial aspects of economies to enter into the design of the network itself. Ultimately, the goal of this study is to suggest optimal pathways for structural change – through shifts in the underlying industry structure of urban economies – that maximize growth.

4.3. The industry space

4.3.1 The Ellison-Glaeser (EG) index of coagglomeration

In order to construct a network of industries (i.e. the industry space) and measure the relative positions of cities within this constructed network, a measure of pairwise relationships between any two industries (which correspond to the links in the network) is first needed.
This measure should ideally capture both 1) how similar the two industries are across a variety of dimensions – such as input-output linkages, occupational mix, and the utilization of knowledge – as well as 2) the observed location patterns of the two industries, with industries that tend to be collocated earning a higher value. An ideal starting point for such a measure is the Ellison and Glaeser (1997, hereafter EG) index, which is a single-industry metric based on a “dartboard” model of location choice, where firms sequentially choose locations in order to maximize profits. The EG index $\gamma_i$ is such that:

$$\gamma_i = \frac{G_i / \left(1 - \sum_{m} x_m^2\right) - H_i}{1 - H_i}, \quad \text{where} \quad G_i = \sum_{m=1}^{M} \left(s_{mi} - x_m\right)^2 .$$

Here $s_{mi}$ is the share of industry i’s employment contained in region m, and $x_m$ is another measure of the size of region m, such as the region’s share of population or aggregate employment. Thus $G_i$ can be considered a simple measure of raw geographic concentration, while $H_i$ is the plant-level Herfindahl index of employment for industry i. The index takes a value of zero when observed employment is only as concentrated as when firms choose locations by throwing darts at a map (Ellison and Glaeser 1997).

The index has several advantageous properties. First, it is an index of agglomeration, which embodies the tendency for firms to collocate due to similarities across a wide variety of aspects including labor, goods used for production, and knowledge. Second, it controls for the lumpiness of employment by accounting for plant size through the plant Herfindahl measure. This is advantageous for industries in which plant sizes are unusually large, for in these cases the observed agglomeration of employment is not all due to agglomeration per se, but also due to the fact that large plant sizes preclude a wide dispersion of employment across areas.
Finally, the index is theoretically invariant to industry size and the granularity of geographic data, and thus facilitates comparisons across industries, regions, and time.

Nonetheless, the EG index is insufficient for this analysis since it is a single-industry metric, and not a pairwise metric that measures the relationship between any two given industries. Ellison, Glaeser, and Kerr (2010) propose a modified version of the EG index, the EG coagglomeration index, which measures the coagglomeration of two industries, and it is this metric that is used in this analysis as a measure of the strength of the relation between two given industries. Ellison, Glaeser, and Kerr (2010) show that the EG coagglomeration index is equivalent to the EG index when the number of industries equals two, and that the index can be regarded as a measure of agglomerative strength in a location choice model. The EG coagglomeration index for industries i and j is

$$\gamma_{ij} = \frac{\sum_{m=1}^{M} \left(s_{mi} - x_m\right)\left(s_{mj} - x_m\right)}{1 - \sum_{m=1}^{M} x_m^2} .$$

Here, $s_{mi}$ is the share of industry i’s new establishment births – as opposed to employment shares of incumbent firms – within region m, and $x_m$ is a measure of the size of area m, which in this case is the region’s share of all new establishment births in the US. The use of establishment counts in lieu of employment levels is mainly due to data availability, for employment counts for new establishments at a detailed industry classification level are difficult to obtain. I also consider only the coagglomeration of new establishments instead of that of existing firms, to better account for structural change. Finally, I only consider the new establishment births of single-establishment businesses and exclude multi-establishment firms to better capture true entrepreneurial activity.
Since single establishments are usually much smaller in size (typically fewer than five employees), this has the added benefit of mitigating the error from using establishment counts rather than employment levels. As opposed to the original EG index, the EG coagglomeration index for two industries does not contain the plant-level employment Herfindahl index $H_i$, which makes data collection much easier because establishment-level employment counts do not need to be measured.

Instead of focusing on just the manufacturing industries as has been done in previous agglomeration studies (Ellison and Glaeser 1997; Ellison, Glaeser, and Kerr 2010; Rosenthal and Strange 2001, 2003), I compute pairwise EG coagglomeration index values for all industries at the 4-digit industry level using the 2007 North American Industry Classification System (NAICS), excluding agriculture, private households, public administration, as well as some industries for which entrepreneurship data was not available.[50] The 4-digit NAICS level is used in order to strike a balance between granularity and error in constructing concordances between the 2002, 2007, and 2012 NAICS classifications, which is necessary because the panel data span the years 2006 to 2013. The data for new establishment counts is drawn from the Statistics of U.S. Businesses (SUSB), an annual dataset produced by the US Census Bureau that provides detailed geographic and industry level data on the count of new establishments, as well as firm births, deaths, expansions, and contractions. I calculate the EG coagglomeration index at the Metropolitan Statistical Area (MSA) level for the 2006 to 2013 panel years, using the 2009 MSA definitions published by the US Office of Management and Budget (OMB).[51] The result is a total of 39,340 observations of the EG coagglomeration index in each panel year, one for each unordered pair of the 281 industries (281 × 280 / 2).
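The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the dissertation: the function names `region_shares` and `eg_coagglomeration`, the nested-dictionary layout, and the toy birth counts are all hypothetical, and every industry is assumed to have at least one birth so that its shares are defined.

```python
from itertools import combinations
from math import comb


def region_shares(births):
    """Turn raw birth counts into the shares used by the index.

    births[m][k] is the count of new establishment births in region m for
    industry k. Returns (s, x), where s[k][m] is region m's share of
    industry k's births and x[m] is region m's share of all births.
    """
    regions = list(births)
    industries = sorted({k for counts in births.values() for k in counts})
    total = sum(sum(counts.values()) for counts in births.values())
    x = {m: sum(births[m].values()) / total for m in regions}
    s = {}
    for k in industries:
        industry_total = sum(births[m].get(k, 0) for m in regions)
        s[k] = {m: births[m].get(k, 0) / industry_total for m in regions}
    return s, x


def eg_coagglomeration(s, x, i, j):
    """gamma_ij = sum_m (s_mi - x_m)(s_mj - x_m) / (1 - sum_m x_m^2)."""
    numerator = sum((s[i][m] - x[m]) * (s[j][m] - x[m]) for m in x)
    denominator = 1.0 - sum(share ** 2 for share in x.values())
    return numerator / denominator


# Hypothetical toy counts (three regions, three industries), not SUSB data.
births = {
    "region_a": {"apparel": 8, "film": 6, "mining": 0},
    "region_b": {"apparel": 1, "film": 2, "mining": 1},
    "region_c": {"apparel": 1, "film": 2, "mining": 9},
}

s, x = region_shares(births)
gamma = {
    (i, j): eg_coagglomeration(s, x, i, j)
    for i, j in combinations(sorted(s), 2)
}

for pair, value in gamma.items():
    print(pair, round(value, 3))

# Number of unordered pairs among 281 industries.
print(comb(281, 2))  # prints 39340
```

In the toy data, apparel and film births concentrate in the same region and receive a positive index, while apparel and mining locate apart and receive a negative one. The final line reproduces the count of pairwise observations per panel year.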
[50] Entrepreneurship data was not available for postal services (NAICS 4911), rail transportation (NAICS 4821), monetary authorities - central bank (NAICS 5211), and insurance and employee benefit funds (NAICS 5251).

[51] I only considered the MSAs within the lower 48 states, and further excluded some MSAs for which boundaries changed significantly during the panel years. This resulted in a total of 348 MSAs being included in the analysis.

Table 4.1 lists the ten most and least coagglomerated industry pairs according to the calculated EG coagglomeration measures, averaged across panel years. A comparison between the calculated values and those of Ellison et al. (2010) shows a striking similarity, with textile and apparel industries exhibiting a very high tendency for coagglomeration. This is in spite of the fact that the Ellison et al. (2010) study considers only manufacturing industries, while the current study considers a broader industry spectrum, and also that the two studies differ in both industry classification schemes (SIC versus NAICS) and time (1987 versus 2006 to 2013). This suggests that coagglomeration patterns across industries do not vary much over time. A final observation is that the observed maximum and minimum values of the EG coagglomeration index are also very similar to those of Ellison et al. (2010), suggesting that, as theorized, the index allows for the comparison of coagglomeration values between different geographic units as well as across time.

Table 4.1. Ellison-Glaeser (EG) coagglomeration index values

4.1.a. Highest ten industries

| Rank | Industry i (4 digit NAICS code) | Industry j (4 digit NAICS code) | EG index |
|---|---|---|---|
| 1 | Cut and Sew Apparel Manufacturing (3152) | Independent Artists, Writers, and Performers (7115) | 0.147 |
| 2 | Cut and Sew Apparel Manufacturing (3152) | Apparel Wholesalers (4243) | 0.143 |
| 3 | Cut and Sew Apparel Manufacturing (3152) | Motion Picture and Video Industries (5121) | 0.128 |
| 4 | Cut and Sew Apparel Manufacturing (3152) | Agents for Artists, Athletes, Entertainers (7114) | 0.099 |
| 5 | Apparel Wholesalers (4243) | Independent Artists, Writers, and Performers (7115) | 0.089 |
| 6 | Apparel Knitting Mills (3151) | Cut and Sew Apparel Manufacturing (3152) | 0.085 |
| 7 | Motion Picture and Video Industries (5121) | Independent Artists, Writers, and Performers (7115) | 0.085 |
| 8 | Apparel Wholesalers (4243) | Motion Picture and Video Industries (5121) | 0.082 |
| 9 | Apparel Knitting Mills (3151) | Apparel Wholesalers (4243) | 0.081 |
| 10 | Apparel Wholesalers (4243) | Agents for Artists, Athletes, Entertainers (7114) | 0.074 |

4.1.b. Lowest ten industries

| Rank | Industry i (4 digit NAICS code) | Industry j (4 digit NAICS code) | EG index |
|---|---|---|---|
| 1 | Coal Mining (2121) | Cut and Sew Apparel Manufacturing (3152) | -0.036 |
| 2 | Coal Mining (2121) | Apparel Wholesalers (4243) | -0.036 |
| 3 | Oil and Gas Extraction (2111) | Cut and Sew Apparel Manufacturing (3152) | -0.035 |
| 4 | Cut and Sew Apparel Manufacturing (3152) | Pipeline Transportation of Natural Gas (4862) | -0.033 |
| 5 | Oil and Gas Extraction (2111) | Apparel Wholesalers (4243) | -0.033 |
| 6 | Support Activities for Mining (2131) | Apparel Wholesalers (4243) | -0.033 |
| 7 | Support Activities for Mining (2131) | Cut and Sew Apparel Manufacturing (3152) | -0.033 |
| 8 | Cut and Sew Apparel Manufacturing (3152) | Agriculture Machinery Manufacturing (3331) | -0.032 |
| 9 | Agriculture Machinery Manufacturing (3331) | Apparel Wholesalers (4243) | -0.031 |
| 10 | Sawmills and Wood Preservation (3211) | Apparel Wholesalers (4243) | -0.029 |

4.3.2. The EG coagglomeration index and the industry space

As mentioned previously, this paper attempts to merge the literature on coagglomeration highlighted in the previous section with theories of development, especially those highlighted in Hidalgo et al. (2007) and Jacobs (1969).
As mentioned previously, models within the development economics literature largely focus on either 1) the mix of productive factors such as physical capital, labor, and land (Heckscher and Ohlin 1991), or 2) the transformation of production towards more advanced products via technological change (Romer 1986). At the regional level, traditional central place theory suggests that there exists a hierarchy of urban areas, with large metropolises harboring a greater set of goods and services while smaller cities are limited in their production mix (Christaller 1966; Losch 1954). Structural change theory’s argument that country-level development is a process of shifting towards an ever more advanced product mix, while informative, is less relevant at the regional level. Most notably, at the subnational level there exists greater competition as well as integration among cities and regions due to lower transportation costs and the relative ease of moving goods and services across borders.

Jacobs (1969) and Thompson (1968) both propose similar theories of city growth that take such nuances into consideration. In the initial stage, cities export only a few products, specializing their production mix so as to maximize their comparative advantage in those products. In the second stage, a gradual process of economic maturation occurs in which locally produced goods and services substitute for imports. The third stage is characterized by connections with other cities and cluster economies, which together build a diversified regional metropolis, and the final stage is one of “new work,” in which new skills and businesses are created based on the enlarged and diversified economy, which fuels innovation.

Considering this theoretical backdrop, there seem to exist conflicting views as to how regions should develop.
In light of the constructed network of the industry space, the development economics literature suggests that development should be directed solely towards the core of the network, where the export-oriented, high-spillover industries reside, while the urban planning and urban economics literature suggests that development paths are nonlinear. In order to facilitate further analysis of these conflicting development theories, I first visualize the industry space, constructed by connecting 4-digit NAICS industries by the pairwise EG coagglomeration index values.[52] Figures 4.1 and 4.2 correspond to this industry space, where in Figure 4.1 the dark nodes correspond to the traded industries as classified by Delgado, Porter, and Stern (2010, 2016), and in Figure 4.2 the darker nodes are industries with higher average annual pay. First, it can be observed that, much like the product space of Hidalgo et al. (2007), the constructed industry space also exhibits a clear core-periphery structure, with the nodes within the core mostly corresponding to the traded industries. This is as expected, since most of the traded industries are within the manufacturing sector, and manufacturing industries usually tend to coagglomerate with each other (Marshall 1920; Delgado, Porter, and Stern 2016). From Figure 4.2, however, it can be seen that the relationship between network position and wages is not as clear, with a significant number of industries occupying the periphery also exhibiting relatively high levels of average pay. This suggests that if higher income levels are a development objective, structural change may well point in different directions depending on the circumstances.

Figure 4.1. Network of industries based on Ellison-Glaeser coagglomeration index (traded industries highlighted, nodes sized based on weighted-degree)

Figure 4.2. Network of industries. Industries are colored based on their average annual pay, and sized based on weighted degree centrality
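As a rough illustration of how such a network can be assembled, the sketch below takes a handful of pairwise index values, computes each node's weighted degree (the sum of its link weights), and builds a visualization skeleton from a maximum spanning tree plus all links above a threshold, in the spirit of the Hidalgo et al. (2007) approach. The function names, the use of Kruskal's algorithm, and the threshold of 0.085 are assumptions made for the example. The eight weights are taken from Table 4.1, so the resulting degrees are partial sums over those pairs only and do not reproduce the centrality values reported for the full 281-industry network.

```python
def weighted_degree(weights):
    """Sum of the link weights attached to each node."""
    degree = {}
    for (i, j), w in weights.items():
        degree[i] = degree.get(i, 0.0) + w
        degree[j] = degree.get(j, 0.0) + w
    return degree


def maximum_spanning_tree(weights):
    """Kruskal's algorithm on descending weights, with a small union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    tree = []
    ordered = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    for (i, j), _ in ordered:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the link joins two separate components
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


def visualization_links(weights, threshold):
    """Skeleton (N - 1 links) plus every link stronger than the threshold."""
    links = set(maximum_spanning_tree(weights))
    links.update(pair for pair, w in weights.items() if w > threshold)
    return links


# Eight pairs from Table 4.1, keyed by 4-digit NAICS codes (a small subset).
weights = {
    (3152, 7115): 0.147,
    (3152, 4243): 0.143,
    (3152, 5121): 0.128,
    (4243, 7115): 0.089,
    (5121, 7115): 0.085,
    (4243, 5121): 0.082,
    (2121, 3152): -0.036,
    (2121, 4243): -0.036,
}

degree = weighted_degree(weights)
tree = maximum_spanning_tree(weights)
links = visualization_links(weights, threshold=0.085)  # hypothetical cutoff

print(sorted(degree.items(), key=lambda item: item[1], reverse=True))
print(tree)
print(len(links))
```

With five nodes the skeleton contains four links, and the weakly connected coal mining node (2121) is still attached through its least negative link, which is what keeps every industry visible in the drawing.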
[52] The networks are visualized based on the method used by Hidalgo et al. (2007), where in the first step a “skeleton” of the network is constructed using the Maximum Spanning Tree algorithm. This algorithm in essence produces a set of N-1 links (N being the number of industries) that connects each node in the network with its most proximal neighbor. Subsequently, all links above a certain threshold value are added to the skeleton to differentiate between more and less central nodes, while at the same time keeping the network visualization tractable.

Table 4.2 lists the weighted degree centrality (i.e. the sum of the link weights for any given industry) of the ten most and least central industries. A higher centrality value is suggestive of an industry that exhibits strong coagglomerative patterns with many other industries, and also suggests that entrepreneurship in such an industry may have high potential spillover effects. It can be seen that the industries with the highest centrality values are mainly industries within the transportation, mining, and manufacturing sectors, which is as expected considering that such sectors benefit more from the fundamental forces of agglomeration as documented by Marshall (1920). The industries with the lowest centrality values tend to be those that are associated with locally oriented services, broadly corresponding to local area amenities.

Table 4.2. Weighted degree centrality of 4 digit NAICS industries

4.2.a. Highest ten industries

| Rank | Industry (4 digit NAICS code) | Weighted degree centrality |
|---|---|---|
| 1 | Other Pipeline Transportation (4869) | 0.709 |
| 2 | Pipeline Transportation of Crude Oil (4861) | 0.665 |
| 3 | Pipeline Transportation of Natural Gas (4862) | 0.663 |
| 4 | Cut and Sew Apparel Manufacturing (3152) | 0.631 |
| 5 | Oil and Gas Extraction (2111) | 0.575 |
| 6 | Independent Artists, Writers, and Performers (7115) | 0.573 |
| 7 | Aerospace Product and Parts Manufacturing (3364) | 0.506 |
| 8 | Support Activities for Mining (2131) | 0.456 |
| 9 | Agriculture, Construction, and Mining Machinery Manufacturing (3331) | 0.411 |
| 10 | Motion Picture and Video Industries (5121) | 0.375 |

4.2.b. Lowest ten industries

| Rank | Industry (4 digit NAICS code) | Weighted degree centrality |
|---|---|---|
| 1 | Taxi and Limousine Service (4853) | -0.575 |
| 2 | Department Stores (4521) | -0.569 |
| 3 | School and Employee Bus Transportation (4854) | -0.441 |
| 4 | Dry Cleaning and Laundry Services (8123) | -0.436 |
| 5 | Securities and Commodity Exchanges (5232) | -0.412 |
| 6 | Grocery Stores (4451) | -0.411 |
| 7 | Other General Merchandise Stores (4529) | -0.408 |
| 8 | Apparel Knitting Mills (3151) | -0.350 |
| 9 | Charter Bus Industry (4855) | -0.270 |
| 10 | Specialty Food Stores (4452) | -0.263 |

4.3.3. Cities’ positions within the industry space

Having constructed the industry space, I loosely follow the method of Hidalgo et al. (2007) and first visualize the relative positions of particular cities within the network by coloring the industries based on the location quotient of entrepreneurship. Figure 4.3 depicts the positions of New York and Los Angeles within the industry space, where darker nodes correspond to industries with higher location quotients for entrepreneurship. The two cities are chosen because they are the two MSAs with the largest populations. It can be seen that the positions of the two cities differ, with New York being positioned more towards the periphery of the network compared to Los Angeles.
New York scores a location quotient for entrepreneurship close to 4.9 for both Securities and Commodity Exchanges (NAICS 5232) and Apparel Knitting Mills (NAICS 3151), while Los Angeles exhibits a location quotient of 8.1 for Cut and Sew Apparel Manufacturing (NAICS 3152), 5.7 for Independent Artists, Writers, and Performers (NAICS 7115), and 5.0 for Motion Picture and Video Industries (NAICS 5121).

Figure 4.3. Entrepreneurship activity for the New York-Northern New Jersey-Long Island MSA (top) and Los Angeles-Long Beach-Santa Ana MSA (bottom). Nodes are colored based on the location quotient of entrepreneurial activity

While visualization of the individual positions of cities within the industry space is revealing, the analysis would benefit from a general measure of the position of any given city within the network. I construct a metric which corresponds to the weighted average centrality of industries for a particular MSA such that

$$C_m = \frac{\sum_{i} c_i \times \dfrac{b_{im}}{B_m}}{\sum_{i} \dfrac{b_{im}}{B_m}}$$

where $c_i$ is the weighted degree centrality of industry i, $b_{im}$ is the count of new establishment births for industry i in region m, and $B_m$ is the aggregate count of new establishment births in region m. The metric is thus simply the average centrality of all the industries in which entrepreneurship takes place within the region, weighted by the share of entrepreneurship in each particular industry. A higher value is suggestive of an MSA being located nearer to the core of the network, where the centrality values of individual industries are higher, and a lower value suggests that an MSA is located nearer to the periphery.

Table 4.3 lists the calculated average centrality values for MSAs based on this metric. The MSAs that score the highest and lowest values display a distinct pattern. The highest MSAs are smaller cities located in relatively isolated areas, with few other urban areas nearby.
Table 4.3. Average centrality of MSAs

4.3.a. Highest ten MSAs

| Rank | Metropolitan Statistical Area | Average weighted centrality |
|---|---|---|
| 1 | Midland, TX | 0.108 |
| 2 | Odessa, TX | 0.079 |
| 3 | Farmington, NM | 0.049 |
| 4 | Wichita Falls, TX | 0.047 |
| 5 | Grand Junction, CO | 0.045 |
| 6 | Abilene, TX | 0.042 |
| 7 | Longview, TX | 0.041 |
| 8 | Lafayette, LA | 0.040 |
| 9 | Houma-Bayou Cane-Thibodaux, LA | 0.040 |
| 10 | San Angelo, TX | 0.039 |

4.3.b. Lowest ten MSAs

| Rank | Metropolitan Statistical Area | Average weighted centrality |
|---|---|---|
| 1 | Trenton-Ewing, NJ | -0.052 |
| 2 | New York-Northern New Jersey-Long Island, NY-NJ-PA | -0.050 |
| 3 | Atlantic City-Hammonton, NJ | -0.047 |
| 4 | Albany-Schenectady-Troy, NY | -0.036 |
| 5 | Vineland-Millville-Bridgeton, NJ | -0.035 |
| 6 | Ocean City, NJ | -0.033 |
| 7 | Washington-Arlington-Alexandria, DC-VA-MD-WV | -0.032 |
| 8 | Bridgeport-Stamford-Norwalk, CT | -0.032 |
| 9 | Philadelphia-Camden-Wilmington, PA-NJ-DE-MD | -0.031 |
| 10 | Boston-Cambridge-Quincy, MA-NH | -0.030 |

The lowest MSAs are the large metropolises, including cities such as New York, Washington DC, Philadelphia, and Boston. Such a striking pattern suggests that the smaller, more isolated urban areas exhibit higher levels of entrepreneurship in industries that exhibit strong agglomeration patterns, such as manufacturing, mining, transportation, and other traded industries. The larger urban areas that are part of a greater urban system, on the contrary, exhibit higher levels of entrepreneurship in industries that are 1) rare, and 2) geared towards providing local amenities. This is consistent with central place theory (Christaller 1966; Losch 1954) and the theories of urban development outlined by Jacobs (1969) and Thompson (1968), which hypothesize that large urban areas in the later phases of development are able to produce goods and services that require a greater local market to sustain their existence.
For example, a high level of new establishment births in the department stores or specialty food stores industries is unlikely in small urban areas, where the demand for such goods and services is relatively scarce.

4.4. Empirical framework

The main goal of the empirical analysis is to determine whether the positions of MSAs within the industry space influence economic growth, and if so, the directionality of structural change. Thus it is assumed that the network positions of MSAs at time t-1 (i.e. the average centrality $C_m$) impact the growth of the MSA at time t. The outcome of interest is a measure of economic size, and in this case I utilize 1) employment, 2) log GDP, and 3) log GDP per capita as the relevant metrics.

I also include a host of control variables that have been utilized in previous studies of growth. I include industrial diversity using the Hirschman-Herfindahl Index in order to differentiate the effects of increased diversity from the effects of changes in the network positions of MSAs. I also include a measure of market access that proxies for the relative size of neighboring markets, in order to control for spatial clustering effects, where

$$\mathrm{MA}_{rt} = \sum_{s \neq r} \frac{\mathrm{POP}_{st}}{d_{rs}^{2}} .$$

Here $\mathrm{POP}_{st}$ is the population of the neighboring region and $d_{rs}^{2}$ is the square of the distance between the centroids of the MSAs. I set a threshold value of 300 miles in calculating this metric to reflect a reasonable distance over which a market may be defined. In addition, I also include the aggregate entrepreneurship rate of the MSA, calculated as the number of new establishments per thousand members of the labor force, as well as the log of population to account for city size. Finally, I include various demographic controls such as the unemployment rate, homeownership rate, and educational attainment, as well as the share of manufacturing firms and the number of patents per capita.
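Two of the regressors described above, the average centrality of an MSA and its market access, reduce to short weighted sums. The following sketch is illustrative only: the function names, the dictionary inputs, the populations, and the distances are hypothetical, and treating the 300-mile threshold as inclusive is an assumption. The three centrality values are taken from Table 4.2.

```python
def average_centrality(births_m, centrality):
    """Birth-share-weighted mean of industry centralities for one MSA.

    births_m[i] is the count of new establishment births in industry i and
    centrality[i] is that industry's weighted degree centrality.
    """
    total_births = sum(births_m.values())
    return sum(
        centrality[i] * count / total_births
        for i, count in births_m.items()
    )


def market_access(region, population, distance, cutoff_miles=300.0):
    """Sum of neighbor populations over squared centroid distance.

    Only neighbors within cutoff_miles are counted; whether the cutoff is
    inclusive is an assumption of this sketch.
    """
    return sum(
        pop / distance[region][other] ** 2
        for other, pop in population.items()
        if other != region and distance[region][other] <= cutoff_miles
    )


# Centrality values from Table 4.2; the birth counts are hypothetical.
centrality = {2111: 0.575, 4451: -0.411, 4521: -0.569}
births_m = {2111: 30, 4451: 50, 4521: 20}
c_m = average_centrality(births_m, centrality)

# Hypothetical populations and centroid distances in miles.
population = {"r": 400_000, "s1": 1_000_000, "s2": 250_000, "s3": 5_000_000}
distance = {"r": {"s1": 100.0, "s2": 50.0, "s3": 400.0}}
ma = market_access("r", population, distance)

print(round(c_m, 4))
print(ma)
```

In this toy case the large neighbor at 400 miles is excluded by the cutoff, so market access is driven entirely by the two nearer regions, each contributing 100 to the sum.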
The specification of the model is a simple OLS regression with fixed effects:

$$\mathit{Growth}_{mt} = \alpha + \beta_1 C_{mt-1} + X_{mt-1}\beta_c + M_m + T_t + \varepsilon_{mt-1}$$

where $C_{mt-1}$ is the average centrality measure for MSAs, $X_{mt-1}$ is the set of control variables, and $M_m$ and $T_t$ are the MSA and year fixed effects, respectively. In later specifications, I include a squared term for the centrality measure $C_{mt-1}$ in order to test the Jacobs (1969) hypothesis that pathways for growth are nonlinear. I also include interaction terms between the centrality measure $C_{mt-1}$ and the log population and manufacturing share variables, to account for differential effects of centrality on growth based on varying levels of city size and manufacturing intensity. I utilize panel data for the years 2006 to 2013, corresponding to eight panel years, for a total of 2,784 observations (348 MSAs × 8 years).

Table 4.4. Summary statistics

| Variables | N | Mean | Standard Deviation | Min | Max |
|---|---|---|---|---|---|
| Average centrality | 2,784 | 0.000979 | 0.0204 | -0.0778 | 0.147 |
| Patents per capita | 2,784 | 1.443 | 2.359 | 0.0305 | 28.58 |
| Diversity (Hirschman-Herfindahl Index) | 2,784 | 0.0150 | 0.00185 | 0.0120 | 0.0307 |
| Market access | 2,784 | 2,412 | 2,120 | 51.09 | 18,289 |
| Log population | 2,784 | 12.71 | 1.057 | 11.16 | 16.77 |
| Unemployment rate | 2,784 | 7.013 | 2.991 | 2.017 | 28.90 |
| Homeownership rate | 2,784 | 67.05 | 5.636 | 47.41 | 85.08 |
| Educational attainment | 2,784 | 25.54 | 8.045 | 10 | 59.10 |
| Manufacturing share | 2,784 | 11.91 | 6.718 | 0 | 53.97 |
| New establishments / 1,000 labor force | 2,784 | 3.384 | 1.159 | 1.486 | 12.04 |

It is important to note that there may be many other explanations for the variations in growth levels across MSAs. While the list of control variables is far from exhaustive, the careful selection of variables, coupled with the use of panel data (and thus fixed effects), is expected to absorb a large portion of the unobservables.
The single most pertinent of these is natural advantages, where growth has been noted to be largely influenced by geographic advantages such as proximity to water bodies or other physical features (Ellison, Glaeser, and Kerr 2010). The inclusion of MSA fixed effects largely eliminates the confounding of the results due to such time-invariant characteristics at the MSA level, and the year fixed effects eliminate the effects of macroeconomic shocks, such as the recent recessionary period, which affected the nation as a whole.

4.5. Results

4.5.1 OLS estimates

I first present the main empirical results estimating the effect of MSAs’ positions within the industry space on economic growth. The average weighted centrality measure is mean centered in order to make interpretation of the linear and quadratic terms within the empirical model more straightforward. Table 4.5 presents the results for the specification in which the outcome variable is the log of employment levels. It can be seen that the average centrality measure is only weakly positively significant when no fixed effects are included, and changes sign when MSA fixed effects are included. When both MSA and year fixed effects are included, the centrality measure ceases to be significant, and the addition of the quadratic and interaction terms does not change this outcome. When it comes to economic growth in terms of employment change, the results suggest that it is not the average centrality of the MSA, but rather industrial diversity, city size, manufacturing share, and aggregate entrepreneurship levels that are more influential in determining growth.

Table 4.5. Regression results – Log employment

| Dependent variable: Employment | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Average weighted centrality | 0.902* | -0.166** | -0.042 | -0.076 |
| | (0.473) | (0.067) | (0.061) | (0.053) |
| Average weighted centrality² | | | | 1.375 |
| | | | | (0.939) |
| Log population | 1.032*** | 0.646*** | 0.558*** | 0.547*** |
| | (0.008) | (0.060) | (0.067) | (0.066) |
| Average weighted centrality × Log population | | | | -0.179*** |
| | | | | (0.048) |
| Manufacturing share | 0.007*** | 0.003*** | 0.003*** | 0.003*** |
| | (0.002) | (0.000) | (0.000) | (0.000) |
| Average weighted centrality × Manuf. share | | | | -0.043*** |
| | | | | (0.009) |
| Hirschman-Herfindahl Index | -26.025*** | 2.929*** | 4.655** | 4.719** |
| | (5.095) | (0.718) | (1.909) | (1.950) |
| Unemployment rate | -0.026*** | -0.013*** | -0.016*** | -0.015*** |
| | (0.002) | (0.000) | (0.001) | (0.001) |
| Number of establishment births per labor force | -0.005 | 0.031*** | 0.021*** | 0.022*** |
| | (0.007) | (0.002) | (0.002) | (0.002) |
| Patents per capita | -0.004 | -0.004 | -0.005* | -0.005** |
| | (0.006) | (0.002) | (0.003) | (0.003) |
| Homeownership rate | 0.008*** | -0.001 | -0.000 | -0.000 |
| | (0.002) | (0.000) | (0.000) | (0.000) |
| Educational attainment | 0.011*** | 0.001 | 0.000 | 0.001 |
| | (0.002) | (0.000) | (0.000) | (0.000) |
| Market access | -0.000 | -0.000 | -0.000 | 0.000 |
| | (0.000) | (0.000) | (0.000) | (0.000) |
| Constant | 11.458*** | 11.682*** | 11.663*** | 11.629*** |
| | (0.162) | (0.058) | (0.060) | (0.058) |
| MSA fixed effects | | X | X | X |
| Year fixed effects | | | X | X |
| N | 2,784 | 2,784 | 2,784 | 2,784 |
| R-squared | 0.681 | 0.636 | 0.678 | 0.689 |

Turning to the results where GDP is the outcome of interest, it can be seen from columns 1 to 3 in Table 4.6 that the average centrality measure is insignificant when not considering the quadratic relationship between network position and economic growth.
However, in column 4 it can be seen that the quadratic term is positive and highly significant, lending support to Jacobs' (1969) and Thompson's (1968) argument that the relationship between industry mix and growth is nonlinear. The positive coefficient for the quadratic term suggests that for MSAs with average centrality values below the tipping point, it is more beneficial for growth to continue on a pathway for development that shifts the position of the MSA within the industry space towards the network periphery. On the contrary, for MSAs with average centrality values above the tipping point, the results suggest that it may be more beneficial to continue on a path of concentration of entrepreneurship in the industries within the core of the network. Thus, according to the results, small cities such as Midland, Texas or Farmington, New Mexico with high average centrality values would benefit from more entrepreneurship in highly central industries such as manufacturing, mining, or transportation, while large cities such as New York, Philadelphia, or Boston would benefit from more entrepreneurship in industries near the network periphery, such as those that are geared towards local amenities. Interestingly, this result is also in line with the argument for 'consumer cities,' which suggests that larger urban areas attract people and firms due to the diversity and quality of local amenities.

Table 4.6. Regression results – Log GDP

Dependent variable: GDP

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Average weighted centrality | 1.098 (0.988) | -0.160 (0.113) | -0.061 (0.098) | -0.188* (0.098) |
| Average weighted centrality² | | | | 7.990*** (2.798) |
| Log population | 1.074*** (0.011) | 1.106*** (0.128) | 0.882*** (0.116) | 0.876*** (0.110) |
| Average weighted centrality × Log population | | | | -0.202** (0.089) |
| Manufacturing share | 0.001 (0.002) | 0.001 (0.002) | 0.004** (0.002) | 0.003** (0.002) |
| Average weighted centrality × Manuf. share | | | | -0.022 (0.015) |
| Hirschman-Herfindahl Index | -15.831** (6.997) | 4.308*** (1.274) | 8.916*** (3.007) | 8.237*** (3.006) |
| Unemployment rate | -0.020*** (0.003) | -0.005*** (0.001) | -0.020*** (0.002) | -0.020*** (0.002) |
| Number of establishment births per labor force | -0.002 (0.010) | 0.033*** (0.004) | 0.028*** (0.006) | 0.027*** (0.005) |
| Patents per capita | 0.009 (0.006) | 0.005 (0.006) | 0.006 (0.007) | 0.006 (0.006) |
| Homeownership rate | -0.001 (0.002) | 0.000 (0.001) | 0.001* (0.001) | 0.001** (0.001) |
| Educational attainment | 0.013*** (0.002) | 0.001 (0.001) | 0.000 (0.001) | 0.000 (0.001) |
| Market access | 0.000 (0.000) | -0.000 (0.000) | -0.000 (0.000) | -0.000 (0.000) |
| Constant | 9.580*** (0.201) | 9.448*** (0.099) | 9.351*** (0.088) | 9.357*** (0.086) |
| MSA fixed effects | | X | X | X |
| Year fixed effects | | | X | X |
| N | 2,784 | 2,784 | 2,784 | 2,784 |
| R-squared | 0.667 | 0.258 | 0.377 | 0.389 |

Table 4.7. Regression results – Log GDP per capita

Dependent variable: GDP per capita

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Average weighted centrality | 1.274 (0.908) | -0.199* (0.111) | -0.059 (0.097) | -0.188* (0.098) |
| Average weighted centrality² | | | | 8.054*** (2.831) |
| Log population | 0.077*** (0.010) | 0.257** (0.121) | 0.056 (0.117) | 0.052 (0.111) |
| Average weighted centrality × Log population | | | | -0.197** (0.090) |
| Manufacturing share | 0.002 (0.002) | 0.001 (0.002) | 0.003* (0.002) | 0.003* (0.002) |
| Average weighted centrality × Manuf. share | | | | -0.018 (0.016) |
| Hirschman-Herfindahl Index | -15.821** (6.526) | 3.639*** (1.276) | 8.246*** (2.984) | 7.543** (2.983) |
| Unemployment rate | -0.019*** (0.003) | -0.005*** (0.001) | -0.019*** (0.002) | -0.018*** (0.002) |
| Number of establishment births per labor force | -0.008 (0.010) | 0.031*** (0.004) | 0.023*** (0.006) | 0.023*** (0.005) |
| Patents per capita | 0.010* (0.005) | 0.005 (0.006) | 0.006 (0.006) | 0.005 (0.006) |
| Homeownership rate | -0.000 (0.002) | 0.000 (0.001) | 0.001* (0.001) | 0.001** (0.001) |
| Educational attainment | 0.013*** (0.002) | 0.001 (0.001) | 0.000 (0.001) | 0.000 (0.001) |
| Market access | 0.000 (0.000) | -0.000 (0.000) | -0.000 (0.000) | -0.000 (0.000) |
| Constant | 10.639*** (0.186) | 10.565*** (0.095) | 10.464*** (0.086) | 10.473*** (0.084) |
| MSA fixed effects | | X | X | X |
| Year fixed effects | | | X | X |
| N | 2,784 | 2,784 | 2,784 | 2,784 |
| R-squared | 0.471 | 0.250 | 0.355 | 0.367 |

Because of mean centering, the negative and marginally significant estimate for the linear average centrality term simply means that at the aggregate mean, the effect of average centrality on GDP growth is negative. The interaction terms between average centrality and population and manufacturing share, respectively, both exhibit negative coefficients, yet only the interaction term for population is significant. The results suggest that a larger city size mitigates the effects of the average centrality measure, yet the share of manufacturing firms in the region has no clear relationship with the marginal effects of the average centrality measure.
Similar to the regression with employment levels as the outcome variable, controls such as industrial diversity, population, unemployment, and aggregate entrepreneurship continue to be significant.

Table 4.7 presents the results for the specification where the outcome of interest is GDP per capita. The coefficient estimates and significance levels are very similar to those obtained with GDP as the outcome. This is especially the case for the linear and quadratic terms of the average centrality measure, suggesting that the estimated tipping point is very similar in both specifications. This suggests that, as with overall GDP growth, growth in individual wealth is also influenced by the network positions of MSAs within the industry space.

Figure 4.4. Average weighted centrality versus linear prediction for GDP, with 95% confidence intervals

Figure 4.5. Average marginal effects of centrality measure at different levels of population, with 95% confidence intervals

As an additional step, I examine the predictive margins as well as the marginal effects of the average centrality measure on GDP growth. I exclude the same analysis for GDP per capita due to redundancy. Figure 4.4 plots the relationship between average centrality values and the linear prediction, with 95% confidence intervals, for average centrality values within the observed range in the data. Figure 4.5 plots the relationship between average centrality values and average marginal effects, at representative values of log population. It can be seen visually that the tipping point mentioned above occurs very close to 0, which corresponds to the aggregate mean.
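As a rough arithmetic check on the location of the tipping point, the vertex of a quadratic $\beta_1 C + \beta_2 C^2$ lies where its derivative $\beta_1 + 2\beta_2 C$ equals zero, that is, at $C^* = -\beta_1 / (2\beta_2)$. The sketch below applies this formula to the column 4 point estimates in Tables 4.6 and 4.7. It is a simplification under a stated assumption: the interaction terms with log population and manufacturing share are set aside, so the numbers only illustrate the order of magnitude and are not a substitute for the margins reported in the figures. The function name `quadratic_vertex` is hypothetical.

```python
def quadratic_vertex(beta_linear, beta_quadratic):
    """Turning point of b1*c + b2*c**2, where b1 + 2*b2*c = 0."""
    if beta_quadratic == 0:
        raise ValueError("No turning point: the quadratic coefficient is zero.")
    return -beta_linear / (2.0 * beta_quadratic)

# Column 4 point estimates (linear term, quadratic term).
# Assumption: interaction terms are ignored in this back-of-the-envelope check.
gdp_vertex = quadratic_vertex(-0.188, 7.990)     # Table 4.6, Log GDP
gdp_pc_vertex = quadratic_vertex(-0.188, 8.054)  # Table 4.7, Log GDP per capita

# Observed range of average centrality from the summary statistics table.
centrality_min, centrality_max = -0.0778, 0.147

print(f"GDP vertex: {gdp_vertex:.4f}")
print(f"GDP per capita vertex: {gdp_pc_vertex:.4f}")
print(centrality_min < gdp_vertex < centrality_max)
```

Under this simplification both vertices come out at roughly 0.012, which is small relative to the observed range of the centrality measure and is in keeping with a tipping point near the aggregate mean.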
While the large confidence interval around the point estimate of the vertex of the parabola calls for caution in directly interpreting the results, the overall shape of the curve nonetheless strongly suggests that the relationship between average centrality and growth in GDP and GDP per capita is nonlinear.

4.5.2. A mapping of average centrality values for MSAs

As a final step, I consider the geographic distribution of the average centrality values for MSAs by grouping MSAs into two categories corresponding to high and low average centrality values. Figure 4.6 depicts the geographic location of the MSAs included in the analysis, with the darker MSAs corresponding to the cities with centrality values above the average.

Figure 4.6. MSA groupings by centrality levels.

The mapping is consistent with the previously outlined theories of Jacobs (1969) and Thompson (1968). The MSAs with the higher centrality values are (with a few exceptions, most notably Los Angeles) mostly small cities that are isolated from other urban areas.53 Thus within the scheme of economic development theory, these urban areas can be viewed as being in the early to mid stages of development, where greater specialization in export-oriented traded industries such as manufacturing, transportation, and mining is more beneficial for economic growth. Accordingly, policy prescriptions that focus on fostering entrepreneurship in these highly central industries would be more beneficial for these MSAs than policies that promote a shift towards industries in the periphery of the industry space. On the contrary, most of the MSAs that are below the average are either large metropolises or part of a regional urban system surrounded by geographically proximal urban areas.
Thus for these MSAs, it would be more beneficial to promote policies centered on fostering entrepreneurship in more peripheral industries, such as those that cater to local amenities or those that are so rare that they require a large market in order to be sustainable. Overall, the empirical results suggest that the unidirectional development paths outlined in development economics at the country level do not translate well to development at the urban and regional level, and that a more nuanced approach to development, one that considers the current industrial mix as well as spatial patterns, is needed in order to promote sound economic growth.

53 The high average centrality of the Los Angeles area is due to the fact that the city has a high specialization in the motion picture and video industry (NAICS 5121) and in independent artists, writers, and performers (NAICS 7115), both of which are in the top ten for industries with high centrality values.

4.6. Conclusions

Overall, this study provides support for structural change theory at the urban level. I find consistent evidence that the position of cities within the industry space has a significant relationship to growth. Considering fixed effects, the results also suggest that the optimal growth paths for cities depend on the current position of these cities within the industry space. Cities that harbor establishment birth patterns that are more geared towards high-spillover industries such as manufacturing should continue on this path towards the network core in order to achieve further growth. On the other hand, cities that show birth patterns focused on local-demand-oriented industries nearer the periphery of the network should continue on their paths toward the network periphery. While this relationship is strong when considering GDP and GDP per capita, it does not seem to apply when considering the relationship between structural change and employment.
Such results suggest that structural change, while benefiting the overall growth in production of a city, may not have a significant effect on job creation. This may be due to other factors, such as the fact that job creation is a gradual process that takes longer to manifest than direct increases in output. Due to the limited panel length of the current study, this lagged effect cannot be studied, and thus further investigation into the causes of employment growth is warranted. Furthermore, this result may also be due to the fact that structural change and growth do not directly translate into increased jobs. It may very well be the case that output growth does not lead to job creation, but rather to increases in the productivity of current workers, leading to higher wages. It could even be the case that output growth is the result of specialization and automation, which would also dampen the employment gains from structural change.

When considering the spatial location of cities together with their position within the industry space, the basic conclusion is consistent with that of central place theory (Christaller 1966; Losch 1954) and the argument of Jacobs (1969). Cities that are spatially clustered within a larger urban system generally possess industrial structures that are more focused on local amenities and local demand. Examples of such industries are department stores, specialty food stores, or amusement parks and arcades. According to central place theory, such industries, for which per-capita demand is low, locate in large cities because they require a threshold level of demand in order to exist. Large cities, or cities that are part of a larger urban system, have the luxury of being able to harbor such industries, and the results suggest that focusing on such industries (near the periphery of the industry space) may be more beneficial than promoting growth in high-spillover industries nearer the core of the industry space.
However, cities that are small and isolated do not have this luxury, and must concentrate on the high-spillover industries near the core in order to maintain maximal economic growth. Such results contradict the linear stages of growth models within the development economics literature (Domar 1947; Harrod 1948; Rostow 1962), and imply that national growth and subnational growth follow different trajectories.

APPENDIX

Pairwise correlations

| | Employment | Log GDP | Log GDP per capita | Avg. weighted centrality | Log population | Manuf. share | Diversity (HHI) | Unemp. rate | Estab. births | Patents | Homeownership | Educational attainment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Employment | | | | | | | | | | | | |
| Log GDP | 0.98 | | | | | | | | | | | |
| Log GDP per capita | 0.56 | 0.61 | | | | | | | | | | |
| Avg. weighted centrality | -0.16 | -0.17 | -0.11 | | | | | | | | | |
| Log population | 0.98 | 0.97 | 0.45 | -0.16 | | | | | | | | |
| Manufacturing share | -0.21 | -0.25 | -0.11 | 0.03 | -0.26 | | | | | | | |
| Diversity (HHI) | -0.31 | -0.27 | -0.30 | 0.04 | -0.24 | -0.17 | | | | | | |
| Unemployment rate | -0.08 | -0.06 | -0.34 | 0.03 | 0.01 | 0.01 | 0.23 | | | | | |
| Establishment births | 0.17 | 0.18 | 0.14 | -0.03 | 0.16 | -0.37 | 0.05 | -0.28 | | | | |
| Patents | 0.20 | 0.22 | 0.36 | -0.17 | 0.17 | 0.00 | -0.08 | -0.10 | 0.11 | | | |
| Homeownership | -0.12 | -0.17 | -0.10 | 0.07 | -0.17 | 0.21 | -0.11 | -0.13 | 0.03 | -0.11 | | |
| Educational attainment | 0.40 | 0.43 | 0.57 | -0.27 | 0.35 | -0.31 | -0.14 | -0.29 | 0.25 | 0.53 | -0.29 | |
| Market access | 0.03 | 0.04 | 0.06 | -0.39 | 0.03 | 0.22 | 0.01 | 0.11 | -0.26 | 0.13 | 0.14 | 0.05 |

REFERENCES

Aghion, Philippe, and Peter Howitt. 1992. “A Model of Growth Through Creative Destruction.” Econometrica 60 (2): 323–51. https://doi.org/10.2307/2951599.

Bartik, Timothy. 1992. “The Effects of State and Local Taxes on Economic Development: A Review of Recent Research.” Economic Development Quarterly 6 (1): 102–11.

Bartik, Timothy. 2005. “Solving the Problems of Economic Development Incentives.” Growth and Change 36 (2): 139–66.

Batty, Michael. 2013. The New Science of Cities. Cambridge, MA: MIT Press.

Berry, Christopher R, and Edward L Glaeser. 2005. “The Divergence of Human Capital Levels across Cities.” Papers in Regional Science 84 (3): 407–44.

Cassel, Gustav. 1932. The Theory of Social Economy. New York: Harcourt Brace.

Chenery, Hollis B. 1960. “Patterns of Industrial Growth.” The American Economic Review 50 (4): 624–54.

Chenery, Hollis B, Moises Syrquin, and Hazel Elkington. 1975. Patterns of Development, 1950-1970. Vol. 75. Oxford University Press London.

Chenery, Hollis B, and Lance Taylor. 1968. “Development Patterns: Among Countries and over Time.” The Review of Economics and Statistics, 391–416.

Christaller, Walter. 1966. Central Places in Southern Germany. Prentice-Hall.

Delgado, Mercedes, Michael E Porter, and Scott Stern. 2010. “Clusters and Entrepreneurship.” Journal of Economic Geography 10 (4): 495–518.

Delgado, Mercedes, Michael E Porter, and Scott Stern. 2016. “Defining Clusters of Related Industries.” Journal of Economic Geography 16 (1): 1–38.

Domar, Evsey D. 1947. “Expansion and Employment.” The American Economic Review 37 (1): 34–55.

Ellison, Glenn, and Edward L Glaeser. 1997. “Geographic Concentration in US Manufacturing Industries: A Dartboard Approach.” Journal of Political Economy 105 (5): 889–927.

Ellison, Glenn, Edward L Glaeser, and William R Kerr. 2010. “What Causes Industry Agglomeration? Evidence from Coagglomeration Patterns.” The American Economic Review 100 (3): 1195–1213.

Glaeser, Edward L. 2008. Cities, Agglomeration, and Spatial Equilibrium. Oxford University Press.

Glaeser, Edward L, and Joshua D Gottlieb. 2006. “Urban Resurgence and the Consumer City.” Urban Studies 43 (8): 1275–99.

Glaeser, Edward L, Jed Kolko, and Albert Saiz. 2001. “Consumer City.” Journal of Economic Geography 1 (1): 27–50.

Glaeser, Edward L, José A Scheinkman, and Andrei Shleifer. 1995. “Economic Growth in a Cross-Section of Cities.” Journal of Monetary Economics 36 (1): 117–43.

Harrod, Roy Forbes. 1948. Towards a Dynamic Economics, Some Recent Developments of Economic Theory and Their Application to Policy. London: Macmillan.

Heckscher, Eli Filip, and Bertil Gotthard Ohlin. 1991. Heckscher-Ohlin Trade Theory. The MIT Press.

Helsley, Robert W, and William C Strange. 1990. “Matching and Agglomeration Economies in a System of Cities.” Regional Science and Urban Economics 20 (2): 189–212.

Hidalgo, César A, and Ricardo Hausmann. 2009. “The Building Blocks of Economic Complexity.” Proceedings of the National Academy of Sciences 106 (26): 10570–75.

Hidalgo, César A, Bailey Klinger, A-L Barabási, and Ricardo Hausmann. 2007. “The Product Space Conditions the Development of Nations.” Science 317 (5837): 482–87.

Ioannides, Yannis M. 2013. From Neighborhoods to Nations: The Economics of Social Interactions. Princeton University Press.

Ioannides, Yannis M, and Giorgio Topa. 2010. “Neighborhood Effects: Accomplishments and Looking beyond Them.” Journal of Regional Science 50 (1): 343–62.

Jacobs, Jane. 1969. The Economy of Cities. New York: Vintage.

Krugman, Paul. 1991. Geography and Trade. Cambridge, MA: MIT Press.

Kuznets, Simon S. 1971. Economic Growth of Nations: Total Output and Production Structure. Cambridge: Belknap Press of Harvard University Press.

Lewis, W Arthur. 1954. “Economic Development with Unlimited Supplies of Labour.” The Manchester School 22 (2): 139–91.

Lewis, William C. 1972. “A Critical Examination of the Export-Base Theory of Urban-Regional Growth.” The Annals of Regional Science 6 (2): 15–25.

Losch, August. 1954. “Economics of Location.”

Lucas, Robert E. 1988. “On the Mechanics of Economic Development.” Journal of Monetary Economics 22: 3–42.

Lucas, Robert E. 2001. “Externalities and Cities.” Review of Economic Dynamics 4 (2): 245–74.

Malthus, Thomas Robert. 1888. An Essay on the Principle of Population: Or, A View of Its Past and Present Effects on Human Happiness. Reeves & Turner.

Marshall, Alfred. 1920. Principles of Economics. London: MacMillan.

Moretti, Enrico. 2004. “Human Capital Externalities in Cities.” In Handbook of Regional and Urban Economics, 4:2243–91. Elsevier.

North, Douglass C. 1955. “Location Theory and Regional Economic Growth.” Journal of Political Economy 63 (3): 243–58.

Rauch, James E. 1993. “Productivity Gains from Geographic Concentration of Human Capital: Evidence from the Cities.” Journal of Urban Economics 34 (3): 380–400.

Ricardo, David. 1891. Principles of Political Economy and Taxation. G. Bell.

Romer, Paul M. 1986. “Increasing Returns and Long-Run Growth.” The Journal of Political Economy, 1002–37.

Rosenthal, Stuart S, and William C Strange. 2001. “The Determinants of Agglomeration.” Journal of Urban Economics 50 (2): 191–229.

Rosenthal, Stuart S, and William C Strange. 2003. “Geography, Industrial Organization, and Agglomeration.” Review of Economics and Statistics 85 (2): 377–93.

Rostow, Walt W. 1962. The Stages of Economic Growth: A Non-Communist Manifesto. Cambridge: Cambridge University Press.

Saxenian, AnnaLee. 1996. Regional Advantage: Culture and Competition in Silicon Valley and Route 128. Cambridge, MA: Harvard University Press.

Shapiro, Jesse M. 2006. “Smart Cities: Quality of Life, Productivity, and the Growth Effects of Human Capital.” The Review of Economics and Statistics 88 (2): 324–35.

Smith, Adam. 1776. An Inquiry into the Nature and Causes of the Wealth of Nations. New York: Bartleby.

Solow, Robert M. 1956. “A Contribution to the Theory of Economic Growth.” The Quarterly Journal of Economics 70 (1): 65–94.

Swan, Trevor W. 1956. “Economic Growth and Capital Accumulation.” Economic Record 32 (2): 334–61.

Thompson, Wilbur Richard. 1968. A Preface to Urban Economics. Baltimore: Johns Hopkins University Press.

Tiebout, Charles M. 1956. “A Pure Theory of Local Expenditures.” Journal of Political Economy 64 (5): 416–24.

Todaro, M.P., and S.C. Smith. 2012. Economic Development. Pearson Series in Economics. Addison-Wesley.

Zheng, Lingwen, and Mildred Warner. 2010. “Business Incentive Use among U.S. Local Governments: A Story of Accountability and Policy Learning.” Economic Development Quarterly 24 (4): 325–36.
CHAPTER 5

CONCLUDING REMARKS

The underlying premise of this dissertation has been that urban economies comprise a complex system, in which various socioeconomic actors interact to jointly create emergent outcomes, such as growth and decline. One of the key theoretical arguments has been that social interactions underlie economic outcomes such as agglomeration economies, inequalities in socioeconomic resources, entrepreneurship, job creation, and economic growth. Utilizing this framework, this dissertation has attempted to answer a series of key questions regarding the interface between social interactions, agglomeration economies, new firm formation, and economic growth.

The first paper focused on the question of how the dynamics of social interactions that take place within a spatial setting affect the inequality in socioeconomic resources among social actors. Utilizing an agent-based model of social network formation based on a model of preferential attachment (PA) within space, the paper first examined the evolution of degree distributions under different parameter configurations to establish the conditions under which the power law is sustained and the cases in which it breaks down. While the presentation focused on a few select parameter settings, sensitivity analysis reveals that the results are robust across different configurations for world size and introvert visibility. Generally, it is found that networks in which ties are scarce and the rate of tie dissolution is relatively high exhibit power law degree distributions. In addition, networks with link dissolution are found to be fundamentally different from those in which ties are relatively permanent, underscoring the importance of distinguishing between the two types of networks. The results suggest that the power law distributions found for networks grown under PA are just a special case of a broader class of networks that take into consideration churning dynamics.
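The interplay between tie formation and tie dissolution summarized above can be illustrated with a deliberately simplified sketch. It is not the spatial agent-based model developed in the first paper: it omits space, world size, and introvert visibility entirely, and the function names (`simulate_pa_with_churn`, `gini`), the degree-plus-one attachment weights, and all parameter values are assumptions made purely for illustration. The Gini coefficient of agent degrees is used here as one possible proxy for inequality in network resources.

```python
import random

def simulate_pa_with_churn(n_agents=200, steps=3000, p_dissolve=0.3, seed=0):
    """Grow a network by preferential attachment with random tie dissolution.

    At each step an existing tie is dissolved with probability p_dissolve;
    otherwise a randomly chosen agent forms a tie with a partner drawn with
    probability proportional to (degree + 1). Returns the list of degrees.
    """
    rng = random.Random(seed)
    neighbors = [set() for _ in range(n_agents)]
    edges = []
    for _ in range(steps):
        if edges and rng.random() < p_dissolve:
            # Churn: remove one tie chosen uniformly at random.
            k = rng.randrange(len(edges))
            edges[k], edges[-1] = edges[-1], edges[k]
            i, j = edges.pop()
            neighbors[i].discard(j)
            neighbors[j].discard(i)
        else:
            # Formation: the +1 lets agents without ties still be chosen.
            i = rng.randrange(n_agents)
            weights = [
                0 if (j == i or j in neighbors[i]) else len(neighbors[j]) + 1
                for j in range(n_agents)
            ]
            if sum(weights) == 0:
                continue  # agent i is already tied to everyone else
            j = rng.choices(range(n_agents), weights=weights, k=1)[0]
            neighbors[i].add(j)
            neighbors[j].add(i)
            edges.append((i, j))
    return [len(nb) for nb in neighbors]

def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted_sum = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2.0 * weighted_sum) / (n * total) - (n + 1.0) / n

degrees = simulate_pa_with_churn()
degree_gini = gini(degrees)
print("mean degree:", sum(degrees) / len(degrees))
print("Gini of degrees:", round(degree_gini, 3))
```

Varying `p_dissolve` and `steps` in such a sketch changes how dense the network becomes and how unequal the degrees are, which is the kind of comparison the paper carries out in a richer spatial setting.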
With regard to inequalities in social resources, it is found that the relationship between network density and inequality evolves in three distinct phases. Sparse networks exhibit a decrease in social capital inequality as network density increases, moderately dense networks exhibit increases in inequality with higher density, and very dense networks exhibit a decrease in inequality as the network reaches full saturation. The model suggests that due consideration for the relative strength of tie formation and dissolution is warranted when aiming to mitigate disparities in control over network resources. For example, when considering the spread of tacit information, encouraging more networking activity in an ethnic enclave where ties are relatively permanent and dense would have very different results than encouraging such activity among trade association members, where ties are weaker and more transient. This relationship between network density and inequality among agents is further complicated when considering the spatial aspects of inequality. It is found that spatial inequality, in the form of higher social capital agents being distributed near the core, is greater when inequality among agents overall is low, suggesting that we should acknowledge the potential tradeoff between spatial inequality and individual inequality with respect to social resources.

The second paper builds upon the idea that social interactions and economic outcomes are related, by examining the relative strengths of social capital and the three Marshallian agglomeration economies in promoting entrepreneurship and new firm formation in cities. The key argument has been that social interactions, and more broadly social capital within the community or region, aid entrepreneurs in the early stages of forming new firms.
I argue that social interactions and agglomeration economies are better represented as characteristics of a network that consider the broader regional entrepreneurial ecosystem, and propose a set of measures based on various constructed networks of industries, patents, and nonprofit organizations. Utilizing current data on entrepreneurship taken from the Statistics of US Businesses, a panel model of the count of new firm births in a Metropolitan Statistical Area-industry pair is estimated as a function of labor market proximity, input-output linkages, knowledge spillovers, and community social capital. I find evidence in support of all mechanisms, with labor market proximity being the most dominant. However, the relative magnitudes of their effects differ when considering a diverse set of industries, including the traded, local, high-tech, low-tech, manufacturing, and non-manufacturing sectors, suggesting the importance of diversified entrepreneurship policies when considering economic development. These results are non-trivial considering that most previous studies have focused on a narrow subset of industries in testing the effects of agglomerative forces on entrepreneurship.

Given that many theories exist as to why economic growth takes place but few consider how growth should take place given this theoretical background, the final paper considers the question of how specifically urban economies should grow. Viewing economic growth as a process of structural change, I construct an 'industry space' that consists of a network of industries linked by coagglomeration patterns. Using a measure that quantifies cities' industrial structure as their position within this industry space, I conduct empirical analysis on the relationship between industrial structure and economic growth. Results suggest that optimal growth paths vary depending on current industrial structure as well as the spatial location of cities.
Cities that are isolated and more focused on high-spillover industries such as manufacturing should continue on this growth path, while those that are clustered within a larger urban system and more focused on local amenities should also follow their current patterns of specialization. Overall, the results are consistent with central place theory, and provide support for structural change theory at the urban level. Consistent evidence is found that the position of cities within the industry space has a significant relationship to growth. Such results contradict the linear stages of growth models within the development economics literature, and imply that national growth and subnational growth follow different trajectories.

Overall, the goal of this series of papers has been to provide policy-relevant evidence in support of the notion that community development and economic growth are interrelated, as well as to provide planners and policy makers alike with a methodology to identify detailed pathways for economic growth that take into account the specific socioeconomic and spatial circumstances of the city. Future research should emphasize the inseparability of the social factors of a region from its economic outcomes, for the overarching findings suggest that the two are intricately related and mutually reinforcing. Furthermore, future research could benefit from studying the urban economy as a complex system, through techniques such as the agent-based modeling and network analysis attempted in this series of papers.