EXPLORING THE IMPACT OF BUILT ENVIRONMENT ON BIKE-SHARING IN NEW YORK CITY

A Research Paper Presented to the Faculty of the Graduate School of Cornell University In Partial Fulfillment of the Requirements for the Degree of Master of Regional Planning

by Zhiyuan Shen
May 2023

© 2023 Zhiyuan Shen

ABSTRACT

This study evaluates the impact of the built environment on bike-sharing in New York City using a multiple regression model. It explores how infrastructure characteristics, land use characteristics, and socio-demographic characteristics influence the use of the Citi Bike program. Buffer analysis and geospatial statistics were employed to collect observations and variables. The study also acknowledges the effects of outliers on the accuracy of the model. Based on the findings, it offers policy recommendations to enhance bike-sharing and reflects on methodologies for future research, suggesting the need for more context-specific and time-bound variables to improve precision.

BIOGRAPHICAL SKETCH

Zhiyuan Shen is a free soul. Going wherever the wind leads.

This work is dedicated to my mother. Thank you, the best Sister Lu.

ACKNOWLEDGMENTS

Thank you all for this journey. Though mountains and rivers lie between us, day and night I never let go.

TABLE OF CONTENTS

BIOGRAPHICAL SKETCH
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF MAPS
CHAPTER 1 - INTRODUCTION
CHAPTER 2 - LITERATURE REVIEW
CHAPTER 3 - DATA AND VARIABLES OF INTEREST
    Introduction of Study Area
    Spatial Data Collection
    Social Demographic Data Collection
    Data Preparation and Multiple Linear Regression Model
CHAPTER 4 - EXPLANATORY ANALYSIS AND FINDINGS
    Multiple Linear Regression Model
    Outlier Analysis
    Moving Forward - Policy Recommendations
    Reflection - Drawback of the Model and Improvements
CHAPTER 5 - CONCLUSION
REFERENCES
APPENDIX 1 - TWO-WAY SCATTER PLOTS AND HISTOGRAMS FOR EACH VARIABLE
APPENDIX 2 - MLR MODEL IN STATA
APPENDIX 3 - VIF ANALYSIS

LIST OF FIGURES

Figure 1 - Citi Bike Station in New York City
Figure 2 - NYC Citi Bike Expansion History
Figure 3 - Residuals Versus Fitted Values Plot
Figure 4 - Residuals Versus Fitted Values Plot with Outliers
Figure 5 - Station Clermont Ave & Lafayette Ave in September 2017 and Nearby Construction

LIST OF TABLES

Table 1. Statistic Summary of Data
Table 2. Descriptive Summary of Data
Table 3. Regression Model Result

LIST OF MAPS

Map 1 - Citi Bike Stations in This Research
Map 2 - Spatial Data Processing For One Station

CHAPTER 1 - INTRODUCTION

Bikesharing has become a vital component of urban transportation systems worldwide, offering a convenient, affordable, and environmentally friendly option for short trips and commutes. In New York City, the introduction of Citi Bike in 2013 was a transformative event in the city's transportation landscape. With over 14,000 bikes and 600 stations spread throughout the five boroughs, Citi Bike has become a ubiquitous presence in the city, serving millions of riders annually (Figure 1) (Citi Bike, 2023).
Figure 1 - Citi Bike Station in New York City (Citi Bike, 2023)

However, while the availability of bikes is crucial to the success of bikesharing systems, the built environment of a city is just as important in determining their viability. In particular, the design and infrastructure of streets, sidewalks, and public spaces can significantly affect the usage of bikesharing, either promoting or hindering the adoption of this sustainable mode of transportation. Additionally, social factors such as population density and economic conditions contribute to the overall success of a bikesharing program (Fuller, Gauvin, & Kestens, 2013).

In New York City, the built environment has undergone significant changes in recent years to promote bike-friendly infrastructure. The city has invested heavily in the expansion of bike lanes and dedicated cycling paths, with over 1,400 miles of bike lanes currently installed. Furthermore, the city has introduced bike parking corrals and increased the availability of bike racks and lockers. These improvements have made it easier and safer for people to use bikes for transportation in the city (Yunhe & Xiang, 2023).

However, there are still significant barriers to the adoption of bikesharing in New York City. For instance, some neighborhoods lack sufficient bike lanes or dedicated cycling infrastructure, making it difficult for residents to access bikesharing stations or use bikes for transportation. Furthermore, concerns over safety and traffic congestion can dissuade some people from cycling, particularly in areas with high traffic volumes.

Given these challenges, it is essential to understand the relationship between bikesharing and the built environment in New York City. By analyzing the ways in which the built environment affects the adoption of bikesharing, we can identify opportunities for improving the city's infrastructure and promoting more sustainable transportation practices.
This essay explores the relationship between bikesharing and the built environment in New York City. Using Citi Bike trip data, I examine the existing condition of bikesharing infrastructure in the city and analyze the relationship between built environment factors (infrastructure characteristics, land use characteristics and socio-demographic characteristics) and bikesharing usage.

Ultimately, the goal of this essay is to contribute to a broader understanding of how the built environment can promote sustainable transportation and improve the livability of cities. By highlighting the importance of the built environment in promoting bike-sharing and sustainable transportation, I hope to encourage policymakers, city planners, and other stakeholders to prioritize the development of bike-friendly infrastructure and support the growth of bikesharing systems in New York City and beyond.

CHAPTER 2 - LITERATURE REVIEW

Bike sharing has become an increasingly popular mode of transportation in urban areas worldwide. The concept refers to a system in which a fleet of bicycles is made available to the public for shared use, typically through a network of docking stations. The swift growth of bike share initiatives across the globe has attracted significant interest from transportation experts and researchers (Teixeira, Silva, & Moura e Sá, 2020). The connections between station-level bike-sharing usage and cycling infrastructure, land use, public transit amenities, socio-demographic characteristics, and weather conditions have been thoroughly examined.

A common finding among these studies is that placing bike share stations in areas with well-developed biking infrastructure, such as bike lanes, tends to boost bike-sharing usage. This may be because individuals in such areas are already comfortable and familiar with cycling as a mode of transportation, making them more likely to use bike-sharing services.
Faghih-Imani and Eluru (2015) employed a discrete choice modeling framework to investigate the factors that influence the destination choices of users of Chicago's Divvy bike share system. The study found that the presence of bike lanes is positively associated with bike share trip generation. Noland, Smart, and Guo (2018) investigated the trip patterns of bike-sharing users in New York City and their association with land use, subway systems, and bike lanes. The authors found that the availability of bike lanes was positively associated with bike-sharing usage.

Furthermore, several studies have investigated the land-use factors that influence bike-sharing usage in urban areas. Specifically, studies have found that the proximity of bike share stations to areas with cultural, commercial, educational, and recreational land uses is likely to attract bike share users. He et al. (2020) suggest that system operators consider the potential for bike share users to visit these types of places when planning station placement. The presence of these amenities also aligns with the idea that bike share systems can enhance the overall urban experience by providing access to diverse destinations. Wang and Cheng (2021) applied a geographically weighted regression model to capture the spatially varying relationships between bike-sharing demand and its influencing factors. They examined retail, school, and park areas near bike-sharing stations and found that the larger these areas are, the more bike-sharing usage the stations generate.

However, the relationship between bike share usage and public transport facilities has been found to be more complex (Martin & Xu, 2022). Some studies report a complementary effect between public transit and bike-sharing usage. Rixey (2013) examined the factors that influence station-level bike-sharing ridership in three American bike share systems.
The study's findings reveal that greater access to public transit stations, such as bus stops and subway stations, contributes positively to bike-sharing ridership. Furthermore, Rixey identified network effects, such as the connectivity and accessibility of bike share stations within the system, as significant factors influencing ridership levels. In contrast, Campbell and Brakewood (2017) suggest that, in some cases, bike-sharing systems can act as substitutes for traditional public transit options, such as buses, reducing their ridership.

Socio-demographic variables play a crucial role in the adoption of bike-sharing services. Buck and Buehler (2012) studied the factors affecting ridership in the Capital Bikeshare program in Washington, D.C. Their findings showed that higher population density and higher employment density were positively associated with bike-sharing ridership. This can be attributed to the fact that denser areas provide more potential users and trip destinations, making bike-sharing more convenient and accessible for residents and workers.

On the other hand, other studies identified factors that negatively influence bike-sharing usage. For instance, Zhu and Ali (2022) found that lower-income neighborhoods were less likely to use bike-sharing services. This could be due to various reasons, such as affordability, lack of awareness about the service, or insufficient infrastructure catering to the needs of these communities. Furthermore, Böcker and Anderson's (2020) research reveals that bike-sharing is less common in older communities because bike-sharing, both as a stand-alone system and in conjunction with public transport, is less accessible and less suited to older age groups.

Weather and time-related conditions have been found to play a significant role in affecting bikesharing usage.
Favorable weather, such as sunny days and mild temperatures, encourages more people to use bike-sharing. Conversely, adverse weather conditions like rain, snow, and extreme temperatures reduce bike-sharing usage (El-Assi & Nurul, 2015). Faghih-Imani et al. (2014) found that higher temperatures lead to higher bikeshare usage in Montreal, Canada. Rixey (2013) determined that rainfall negatively affects bikesharing ridership. Gebhart and Noland (2014) showed that hourly fluctuations in weather conditions influence not only bikeshare usage but also the duration of the trips taken.

As highlighted in the literature review, the relationship between bikesharing and the built environment is a growing area of research that seeks to understand the factors that contribute to the success and adoption of bikesharing systems in urban areas. The built environment encompasses various aspects of a city, including its infrastructure, land use, and socio-demographic characteristics, all of which play a critical role in determining the usage and adoption of bikesharing systems. Moreover, research has shown that weather and time-related conditions can significantly affect bikesharing usage, adding another layer of complexity to this relationship.

In light of the existing research, this study focuses on the relationship between bikesharing and the built environment in New York City. Using Citi Bike trip data, I investigate the impact of various built environment factors on bikesharing usage, aiming to contribute to a deeper understanding of the conditions that promote or hinder the adoption of bikesharing systems in urban areas. Because the research focuses on New York City, where weather and time-related conditions are the same across the study area, it does not include these factors.
CHAPTER 3 - DATA AND VARIABLES OF INTEREST

Introduction of Study Area

Bike sharing has become an essential mode of transportation in New York City, providing a convenient, affordable, and eco-friendly option for short trips and commutes. Since its inception, Citi Bike has continuously expanded its coverage area to include more neighborhoods and boroughs. Citi Bike's planning and development in New York City can be divided into three main phases, each representing significant milestones and expansion efforts. These phases played a crucial role in shaping the bike-sharing program and extending its reach to serve a broader range of residents and visitors (Figure 2) (Wikimedia, 2023).

Phase 1: Initial Launch and Early Expansion (2013 - 2015)

The initial launch of Citi Bike took place in May 2013, with 6,000 bikes and 332 stations in Manhattan and parts of Brooklyn. This phase marked the beginning of the bike-sharing program in New York City, focusing on establishing a reliable and accessible network of bikes and stations. During this period, Citi Bike gained popularity and increased its ridership, prompting the need for further expansion (NYC DOT, Alta, & Citi, 2014).

Phase 2: Significant Expansion and System Upgrades (2015 - 2018)

In 2015, Citi Bike initiated a significant expansion to increase its service area and the number of bikes and stations. This round of expansion ended around 2018, adding new neighborhoods in Manhattan, Brooklyn, and Queens, and increasing the fleet to over 12,000 bikes and 750 stations. This phase also included system upgrades to improve the user experience, such as the introduction of lighter bikes, improved docking stations, and enhancements to the mobile app (Rivoli, 2018).

Phase 3: Enlarging the Service Area and Introduction of Electric Bikes (2019 - now)

In July 2019, Citi Bike announced its ambitious Phase 3 expansion plan, aiming to promote a more inclusive and equitable approach to bike-sharing.
This phase targeted neighborhoods in Brooklyn, Queens, Upper Manhattan, and the South Bronx (Offenhartz & Corso, 2020). Another notable development during this phase was the introduction of electric pedal-assist bicycles in June 2021. The addition of 4,000 electric bikes to the fleet provided an alternative for users facing physical limitations or challenging topography, making bike-sharing more accessible to a wider audience (Furfaro, 2019).

Figure 2 - NYC Citi Bike Expansion History

To analyze the relationship between NYC bike-sharing and the built environment, this research uses the bike-sharing ride time generated at each station in September 2017 as the dependent variable. The year 2017 is of interest for the following reasons. Firstly, in 2017, Citi Bike was reaching the end of its Phase 2 expansion. This period saw a significant increase in the number of bikes and stations, as well as the expansion of service areas. Analyzing data from 2017 can provide a snapshot of how the program was performing and what factors contributed to its success during this period of growth. Secondly, since 2019, the Citi Bike system has updated its database of bike stations to accommodate the launch of e-bikes. Data from 2017 therefore fits the research focus better because it avoids the influence of e-bikes.

In bike-sharing research in the U.S., September data is widely used because September temperatures are the most suitable for biking of any month in the year. Additionally, the total number of rainy days in September is small. Thus, this research uses ride time data from September to minimize the influence of weather conditions.

Ride time generated at each station reflects bike-sharing usage. Compared to ridership, ride time reduces the impact of extreme cases in which users accidentally unlock a bike and quickly return it. The data is acquired from the Citi Bike NYC website (Citi Bike, 2023).
The raw data includes trip start and end times, dates, and station coordinates. This research does not include stations in Jersey City and Staten Island because of their special location. After the data cleaning process, 688 stations remain for use in this research. Map 1 illustrates the geographic distribution of these Citi Bike stations. As the map shows, the stations are concentrated in high-density areas, such as Midtown and Downtown Manhattan and Downtown Brooklyn, where there is a high demand for transportation options. Apart from the ride time data, the number of docks at each station is also acquired from the Citi Bike NYC website to measure the capacity of each station.

Map 1 - Citi Bike Stations in This Research

Spatial Data Collection

The raw data on built environment elements is gathered from NYC Open Data (NYC Open Data, 2023). Based on the literature review and data availability, this research uses bike route length, parks, universities and colleges, subway entrances, galleries and museums, and sidewalk cafes to represent biking infrastructure, recreation, education, public transportation, tourism and commercial factors, respectively.

To collect the data on the built environment near each bike station, this research creates a 500-meter buffer zone around each station in ArcGIS. Buffer analysis is a spatial analysis technique used in Geographic Information Systems (GIS) to evaluate the area surrounding a particular feature. It involves creating a zone, or "buffer", of a specified distance around a point and capturing all the elements within that distance. A 500-meter buffer is a reasonable size for research on bikesharing station usage because it represents a comfortable walking distance for most people, typically taking around 5-7 minutes to cover on foot.
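The point-counting part of a buffer analysis can be sketched in a few lines. This is a minimal illustration only: the study performed this step in ArcGIS, and the function names (`haversine_m`, `count_within_buffer`) and the coordinates below are hypothetical. The sketch handles point features such as subway entrances; measuring lane length or park area inside a buffer would require a geometry library.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within_buffer(station, features, radius_m=500.0):
    """Count point features whose distance to the station is at most radius_m."""
    lat, lon = station
    return sum(
        1 for f_lat, f_lon in features
        if haversine_m(lat, lon, f_lat, f_lon) <= radius_m
    )


# Hypothetical coordinates for illustration; not taken from the study data.
station = (40.7500, -73.9900)
entrances = [
    (40.7510, -73.9900),  # roughly 111 m north of the station
    (40.7500, -73.9850),  # roughly 420 m east of the station
    (40.7600, -73.9900),  # roughly 1.1 km north, outside the buffer
]
n_inside = count_within_buffer(station, entrances)
```

With these made-up points, two of the three entrances fall inside the buffer. The 500-meter radius is the only tuning parameter in this sketch.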
This distance is generally considered the acceptable range for accessing public transport or shared mobility services, including bikesharing stations. Furthermore, overlay analysis and geospatial statistics are conducted to capture the built environment factors within each buffer zone (Map 2).

Map 2 - Spatial Data Processing For One Station

Social Demographic Data Collection

The social demographic data is gathered from the American Community Survey (ACS) 2016-2020 (5-year estimates) at the census block group level (Social Explorer, 2023). The census block group is the smallest geographic unit of the ACS data. To improve data accuracy, this research chose the census block group as the research unit and used the data of the census block group containing each bike station to represent that station's social demographics. Based on the literature review and data availability, the demographic data include Median Age, Per Capita Income, Ratio of Income Under Poverty Level and Population Density.

Data Preparation and Multiple Linear Regression Model

For explanatory analysis, this research conducts a Multiple Linear Regression (MLR) between bike-sharing ride time at each station and the built environment variables. Multiple linear regression is a widely used statistical method that models the relationship between a dependent variable and multiple independent variables. It is a fundamental technique in predictive modeling and data analysis.

To prepare the data for MLR, I first join and clean the data acquired above. To improve the accuracy of this research, stations with missing values are deleted, leaving 586 stations for further research. The statistical summary of each variable is detailed below.

Table 1. Statistic Summary of Data

Variable                                                      Mean     Std. Dev.  Min.    Max.
Ride time generated at each station                           58570    44591      234     308900
Bikesharing & Bicycling Infrastructure Factors
  Number of docks at each station                             32       10.5       0       67
  Bike lane length (kilometers) in 500-m buffer               2.1      1.1        0       5.17
Land Use Factors
  Area of parks in 500-m buffer (kilometers²)                 0.03     0.05       0       0.28
  Number of universities or colleges in 500-m buffer          0.34     0.76       0       5
  Number of galleries or museums in 500-m buffer              6.29     13.97      0       32
  Number of subway entrances in 500-m buffer                  1.38     1.59       0       11
  Sidewalk cafe length (kilometers) in 500-m buffer           6.99     3.78       0       18.38
Social Demographic Factors
  Population density at census block group level (per mile²)  67889    56672      0       389511
  Poverty rate at census block group level (percentage)       7        8          0       51
  Per capita income at census block group level               90965    58784      4928    587587
  Median age at census block group level                      38.38    9.27       12.5    71.2

After joining and cleaning the data, I create two-way scatter plots and histograms for each variable to understand the relationship between the dependent and independent variables as well as their distributions (Appendix 1). Notably, because total ride time, population density and per capita income are very skewed and exhibit an exponential growth pattern, I apply a natural logarithm transformation to all three. After the transformation, their two-way scatter plots and histograms are better suited to linear regression. The descriptive summary of variables is as follows:

Table 2. Descriptive Summary of Data

Variable  Name     Functional Form  Description                                                 Estimated Sign
y         bikeu    ln               Ride time generated at each station (min)
x1        capac    -                Number of docks at each station                             +
x2        lane     -                Bike lane length (kilometers) in 500-m buffer               +
x3        park     -                Area of parks in 500-m buffer (kilometers²)                 +
x4        uni      -                Number of universities or colleges in 500-m buffer          +
x5        gallery  -                Number of galleries or museums in 500-m buffer              +
x6        subway   -                Number of subway entrances in 500-m buffer                  +/-
x7        cafe     -                Sidewalk cafe length (kilometers) in 500-m buffer           +
x8        popud    ln               Population density at census block group level (per mile²)  +
x9        poor     -                Poverty rate at census block group level (percentage)       -
x10       capti    ln               Per capita income at census block group level               +
x11       mage     -                Median age at census block group level                      -

The Multiple Linear Regression model can be represented by the following equation:

ln(bikeu) = a + b1*capac + b2*lane + b3*park + b4*uni + b5*gallery + b6*subway + b7*cafe + b8*ln(popud) + b9*poor + b10*ln(capti) + b11*mage + ε

where:
ln(bikeu) is the natural logarithm of the dependent variable (total ride time generated at each station);
a is the intercept, which corresponds to the value of ln(bikeu) when all independent variables are equal to 0;
b1, b2, ..., b11 are the coefficients;
capac, lane, park, uni, gallery, subway, cafe, popud, poor, capti and mage are the independent variables, with popud and capti entering in natural logarithms;
ε is the error term, capturing the difference between the observed value and the predicted value of the dependent variable.

This equation shows how the dependent variable (total ride time generated at each station) is related to the independent variables (built environment factors) through a linear combination weighted by their respective coefficients (b1, b2, ..., b11), plus an error term (ε) to account for any discrepancies between the model's predictions and the actual observed values.

CHAPTER 4 - EXPLANATORY ANALYSIS AND FINDINGS

Multiple Linear Regression Model

Table 3 presents the results of the Multiple Linear Regression model.
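As a rough illustration of how a log-linear specification like the one defined in Chapter 3 can be estimated, the sketch below fits ordinary least squares with NumPy on synthetic data. The study itself reports Stata output (Appendix 2), so the function name `fit_ols`, the three-variable subset, the synthetic data, and the `true_beta` values are all assumptions made for illustration, not the author's procedure or results.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = a + X @ b by ordinary least squares; return (coefficients, R-squared)."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return beta, r_squared


# Synthetic data standing in for the station table (NOT the study data).
rng = np.random.default_rng(0)
n = 500
capac = rng.integers(15, 60, size=n).astype(float)   # docks per station
lane = rng.uniform(0.0, 5.0, size=n)                  # bike lane km in buffer
popud = rng.lognormal(mean=10.5, sigma=0.8, size=n)   # people per square mile

# Skewed variables enter in natural logs, as in the model equation.
ln_popud = np.log(popud)

# Generate log ride time from a known rule so the fit can be checked.
true_beta = np.array([6.0, 0.01, 0.15, 0.10])  # a, b_capac, b_lane, b_ln_popud
ln_bikeu = (
    true_beta[0]
    + true_beta[1] * capac
    + true_beta[2] * lane
    + true_beta[3] * ln_popud
    + rng.normal(0.0, 0.3, size=n)  # error term
)

X = np.column_stack([capac, lane, ln_popud])
beta_hat, r2 = fit_ols(X, ln_bikeu)
```

On data generated this way, the estimated slopes in `beta_hat[1:]` should land close to the values in `true_beta[1:]`, which is a quick sanity check that the transformation and design matrix are set up as intended.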
As expected, most measurements of the built environment and demographic information are significantly associated with bike-sharing ride time at the station level. This is not surprising since the relationships between surrounding environments and station-level ridership are well documented in the literature.

Table 3. Regression Model Result

Number of obs = 541    R-squared = 0.3450    Adj R-squared = 0.3314

Variable                                                          Coef.    t-stat  P-value
Constant                                                          6.636    9.05    0.000
Bikesharing & Bicycling Infrastructure Factors
  Number of docks at station                                      0.003    7.42    0.000
  Bike lane length (kilometers) in 500-m buffer                   0.137    5.28    0.000
Land Use Factors
  Area of parks in 500-m buffer (kilometers²)                     2.942    3.04    0.002
  Number of universities or colleges in 500-m buffer              0.091    2.34    0.020
  Number of galleries or museums in 500-m buffer                  0.004    1.81    0.071
  Number of subway entrances in 500-m buffer                      -0.005   -0.21   0.837
  Sidewalk cafe length (kilometers) in 500-m buffer               0.052    4.55    0.000
Social Demographic Factors
  ln(Population density at census block group level (per mile²))  0.094    2.78    0.006
  Poverty rate at census block group level                        0.016    2.68    0.008
  ln(Per capita income at census block group level)               0.125    2.31    0.021
  Median age at census block group level                          -0.002   -0.51   0.612

Overall, the R-squared of the model is 0.3450 (Table 3). That is, the model explains about 34.5% of the variation in log ride time, a moderate fit. To further test the MLR model, we first create a residuals versus fitted values plot (Figure 3). According to the graph, most of the points lie close to the zero-residual line.

Figure 3 - Residuals Versus Fitted Values Plot

Furthermore, we conduct a Variance Inflation Factor (VIF) test of the model. VIF is a measure of the degree of multicollinearity (correlation among independent variables) in a multiple linear regression (MLR) model.
VIF assesses how much the variance of the estimated regression coefficient for a particular independent variable increases because of the correlation of that variable with the other independent variables in the model. A VIF value of 1 indicates that there is no correlation between the independent variable and the other independent variables, and a VIF value greater than 1 indicates the presence of some degree of multicollinearity. A commonly used rule of thumb is that VIF values greater than 5 or 10 indicate high multicollinearity and should be addressed. The VIF of each variable ranges from 1.06 to 2.18, and the mean VIF is 1.34. Thus, this model has no significant multicollinearity issue (Appendix 3).

After testing the model, we now interpret the result of the MLR. As shown in the model, the p-values for the capacity of each bike station and for bike lane length are both 0.000, indicating strong linear relationships with bikesharing usage: areas with more docking points and more bicycle lanes tend to have more bikesharing usage. This supports the idea that planners and system operators should prioritize areas with existing cycling infrastructure and higher capacities for bikeshare installations. Additionally, the positive coefficients for the number of docks at a station (0.003) and bike lane length (0.137) indicate that more docks and longer bike lanes are associated with a slight increase in bike-sharing ride time.

The p-values of parks (0.002), cafes (0.000) and universities (0.020) in the neighborhood are all below the 0.05 threshold, indicating that these variables have a statistically significant relationship with bike-sharing ride time. The p-value for museums and galleries (0.071) suggests that the relationship between this variable and bike-sharing usage is statistically significant only at the 10% significance level.
The coefficients of universities or colleges (0.091), galleries or museums (0.004), sidewalk cafe length (0.052), and parks (2.942) all show positive associations with bike-sharing ride time, which might reflect the expectation of system operators that bikeshare users are likely to visit cultural, commercial, educational and recreational spots in the city. This is in line with the idea that bikeshare systems can enhance the overall urban experience. Notably, the relatively large positive coefficient for the area of parks (2.942) implies that parks play a significant role in attracting bike-sharing users.

On the other hand, the p-value for the number of subway entrances is 0.837, meaning that the linear relationship between subway entrances and bikesharing ride time is not statistically significant. This result echoes the literature's finding that the relationship between bikesharing and subway station distribution is ambiguous.

As for the social demographic factors, the p-values for population density (after natural logarithm transformation) and poverty rate are 0.006 and 0.008, indicating significant relationships at the 1% significance level. Surprisingly, the coefficient for poverty rate is 0.016, showing that poverty rate is positively correlated with bikesharing usage. This contrasts with earlier research, which found that lower-income neighborhoods were less likely to use bike-sharing services. One possible explanation is that people with lower economic status use bikesharing more often than other transportation because of its lower price. Thus, the affordability of bikesharing might be an issue that officials, policy makers and urban planners need to address in the future. The p-value for per capita income (after natural logarithm transformation) is 0.021 with a coefficient of 0.125, indicating a positive relationship that is significant at the 5% level. In contrast, the p-value for median age (0.612) is larger than 0.1, which represents a non-significant relationship with bikesharing ride time. This result indicates that there might be other variables that better capture the relationship between age and bikesharing ride time.
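Because the dependent variable is log-transformed, the coefficients discussed above can also be read as approximate percent changes in ride time. The sketch below applies the standard conversions for log-linear and log-log terms to coefficients reported in Table 3. The helper names are hypothetical, and the results describe associations under the fitted model rather than causal effects.

```python
import math


def pct_change_per_unit(coef):
    """Percent change in y for a one-unit increase in x when ln(y) = a + coef * x."""
    return 100.0 * (math.exp(coef) - 1.0)


def pct_change_per_pct(coef, pct=1.0):
    """Percent change in y for a `pct` percent increase in x when ln(y) = a + coef * ln(x)."""
    return 100.0 * ((1.0 + pct / 100.0) ** coef - 1.0)


# Coefficients as reported in Table 3.
lane_effect = pct_change_per_unit(0.137)         # one more km of bike lane in the buffer
docks_effect = pct_change_per_unit(0.003)        # one more dock at the station
park_effect = pct_change_per_unit(2.942 * 0.01)  # 0.01 km² more park area in the buffer
popud_effect = pct_change_per_pct(0.094)         # 1% higher population density
```

Under these conversions, one additional kilometer of bike lane corresponds to roughly 14.7% more ride time, one additional dock to about 0.3%, an extra 0.01 km² of park area to about 3%, and a 1% increase in population density to about 0.09%. Scaling the park coefficient to 0.01 km² matters because park area in the sample never exceeds 0.28 km², so a full one-unit (1 km²) change lies outside the observed range.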
Outlier Analysis

After interpreting the results of the regression analysis, we have identified several significant relationships between the independent variables and bikesharing usage. However, it is essential to explore the possible presence and impact of outliers in the data, as they can influence the estimated coefficients and weaken the model's explanatory power. Based on the residuals versus fitted values plot (Figure 4), we identified five outliers.

Figure 4 - Residuals Versus Fitted Values Plot with Outliers

Of the five outliers, three are located within or on the edge of Central Park. These observations share similar characteristics: although they have very low values for variables such as population density, number of universities or colleges, and sidewalk cafe length, they have the highest values for area of parks. In the MLR model, the variable area of parks has the highest coefficient. Thus, these observations have the highest residual values. This result suggests that factors beyond population density are contributing to the increased demand for bike-sharing services in these locations. Factors such as the park's popularity as a tourist destination, its appeal for recreational activities, and the desire for alternative, eco-friendly transportation methods may all play a role in driving up usage. This discovery underscores the importance of considering additional variables and local context when evaluating the performance of bike-sharing stations and designing efficient allocation strategies. The finding that bike-sharing stations in large public spaces, such as Central Park, experience high ride time despite low population density also suggests that future expansion strategies should consider incorporating such areas. Central Park sets a great example for introducing a bikesharing system into large public spaces. Central Park Full Loop is a 6.1-mile route that circumnavigates the park.
More than bikesharing stations are installed in the spots that are along the Loop and have good visibility and accessibility, providing flexibility for visitors to start and end the tour.

The other two outliers, station Clermont Ave & Lafayette Ave and station Lexington Ave & E 63 St, have unexpectedly low total ride time. These outliers indicate that seasonal or time-specific factors may influence station-level total ride time. Factors specific to the time period, such as nearby construction, events, or changes in commuting patterns, might affect bikesharing. For station Clermont Ave & Lafayette Ave, the nearby high school was undergoing construction in September 2017, which discouraged the use of bike-sharing (Figure 5). Station Lexington Ave & E 63 St was temporarily removed for New York City road work in September 2017 (Citi Bike Service Update, 2023). These outliers indicate that, for future research, deleting observations that are influenced by seasonal or time-specific factors may improve the accuracy of the model.

Figure 5 - Station Clermont Ave & Lafayette Ave in September 2017 and nearby construction (Google Map, 2023)

Moving Forward - Policy Recommendations

Based on the findings from the regression analysis, the following policy implications and recommendations can be derived to enhance bikesharing usage and improve urban planning:

1 - Prioritize and expand cycling infrastructure: As the capacity of bike stations and the presence of bicycle lanes have strong positive relationships with bikesharing usage, it is recommended that planners and system operators prioritize areas with existing cycling infrastructure for bikeshare installations. Additionally, investing in the expansion of bicycle lanes can further encourage bikesharing usage, promote cycling as a sustainable transportation mode, and improve road safety for cyclists.
2 - Integrate bikesharing with cultural, commercial, educational, and recreational spots: The positive associations between bikesharing usage and the presence of parks, museums, galleries, cafes, and universities suggest that bikeshare users are likely to visit these locations. Urban planners and system operators should consider integrating bikesharing stations with these points of interest, making it more convenient for users to access these spots and enhancing the overall urban experience.

3 - Address affordability and accessibility for low-income populations: The positive correlation between the poverty rate and bikesharing usage implies that low-income individuals may rely on bikesharing as an affordable transportation option. To ensure equitable access to bikesharing, policymakers and planners should consider implementing subsidized membership programs, expanding the bikesharing network to underserved areas, and providing various payment options for users without access to credit or debit cards.

4 - Investigate the ambiguous relationship between bikesharing and public transit: The non-significant relationship between bikesharing usage and proximity to subway stations warrants further investigation. It is crucial to explore how bikesharing can complement public transit systems, reduce congestion, and facilitate last-mile connectivity. Planners and policymakers should consider integrating bikesharing stations with public transit hubs, providing real-time information on bikesharing availability and public transit schedules, and offering joint ticketing or discounted passes for multimodal transportation.

Reflection - Drawback of the Model and Improvements

On reflection, multiple linear regression (MLR) might not be the best model for understanding the relationships between bikesharing usage and the explanatory variables. MLR assumes that the relationships between the dependent and independent variables are constant across the entire study area.
However, in reality, these relationships might be spatially heterogeneous, with different local patterns emerging in different parts of the city. For instance, access to parks might be an important factor in areas like lower Manhattan, where people have little access to green spaces. However, this factor's impact might be weaker in areas like Downtown Brooklyn, where the total area of parks is larger and the accessibility of parks is much higher. Geographically Weighted Regression (GWR) could be a more appropriate method to capture spatial variation in the relationships between usage at docking points and the explanatory variables. GWR is a local modeling technique that allows the regression coefficients to vary across space, accounting for spatial non-stationarity in the relationships between variables. By using GWR, it is possible to identify local hotspots and areas where certain factors have a stronger or weaker influence on usage at bikeshare docking points. This approach could provide a more nuanced understanding of the underlying spatial patterns and help planners and system operators make more informed decisions about bikeshare installations in different neighborhoods.

Another reflection worth considering is the potential to improve data quality in order to obtain more accurate and reliable results. Specifically, this research uses census block groups as the unit for measuring social demographic information. Although this is the smallest unit available in the current database, future research could better capture the relationship if more precise data could be acquired to represent the social demographic characteristics of each station's users. Apart from that, there might also be better data to measure the factors in the model. For example, using jobs within the buffer instead of the unemployment rate to represent the job market, or using shops within the buffer instead of sidewalk cafes to represent commercial development, might be a better fit for this model.
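The GWR idea discussed above can be illustrated with a minimal sketch: one weighted least-squares regression is fitted at each location, with nearby observations weighted more heavily through a Gaussian kernel. Everything in the example is an assumption made for illustration, including the function name `gwr_coefficients`, the synthetic coordinates and data, and the fixed bandwidth of 2.0. It does not use the study's data, and a real application would typically select the bandwidth by cross-validation or an information criterion, for example with a dedicated package such as `mgwr`.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Return one row of [intercept, slopes...] per observation.

    Each row comes from a weighted least-squares fit in which observations
    are down-weighted by a Gaussian kernel of their distance to the focal point.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # prepend intercept column
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        weights = np.exp(-0.5 * (dist / bandwidth) ** 2)
        xtw = design.T * weights  # X^T W without building the n x n matrix
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas


# Synthetic example (hypothetical data): the effect of x grows from west to east.
rng = np.random.default_rng(0)
n_stations = 200
coords = rng.uniform(0.0, 10.0, size=(n_stations, 2))  # hypothetical km grid
x = rng.normal(size=n_stations)
true_slope = 0.5 + 0.2 * coords[:, 0]
y = 1.0 + true_slope * x + rng.normal(scale=0.1, size=n_stations)

local = gwr_coefficients(coords, x.reshape(-1, 1), y, bandwidth=2.0)
west = local[coords[:, 0] < 3.0, 1].mean()
east = local[coords[:, 0] > 7.0, 1].mean()
print(f"mean local slope, west: {west:.2f}, east: {east:.2f}")
```

A single global regression on the same synthetic data would report only one slope, whereas the local fits recover the west-to-east gradient. This is the kind of spatial variation that the park example for lower Manhattan and Downtown Brooklyn describes.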
In summary, this study has highlighted the importance of reflecting on the choice of analytical methods and data quality when investigating the factors influencing bikesharing. While multiple linear regression has provided valuable insights, Geographically Weighted Regression could be a more appropriate method for capturing spatial variation in the relationships between variables. Future research could explore the use of GWR to identify local patterns and better understand the spatial dynamics of bikeshare systems. Additionally, improving data quality through refining variable measurements, incorporating additional factors, and ensuring data accuracy and reliability is essential for obtaining more accurate and reliable results. By addressing these limitations, future research can provide deeper insights into the factors influencing usage at bikeshare docking points and inform more targeted strategies for promoting cycling and active transportation in urban areas.

CHAPTER 5 - CONCLUSION

In conclusion, bike-sharing has proven to be a valuable mode of transportation in New York City, offering residents and visitors an accessible, affordable, and environmentally friendly means of getting around. The Citi Bike program, in particular, has transformed the city's transportation landscape, with its growing network of stations and bikes catering to millions of riders every year. As bike-sharing continues to grow in popularity, it is essential for urban planners, policymakers, and system operators to understand the factors driving its usage and adopt targeted strategies to promote cycling and active transportation. This research has analyzed the impact of the built environment (infrastructure characteristics, land use characteristics and socio-demographic characteristics) on bike-sharing in New York City based on a multiple regression model. Methodologies including buffer analysis and geospatial statistics were adopted to acquire observations and variables.
The results show that the capacity of bike stations and the presence of bicycle lanes, parks, museums, galleries, cafes, and universities were all positively associated with bike-sharing usage, emphasizing the importance of investing in cycling infrastructure and strategically placing bike stations near key attractions. Interestingly, the ratio of people under the poverty level also showed a positive correlation, suggesting that bike-sharing's affordability may be an attractive factor for economically disadvantaged populations. However, factors such as median age and proximity to subway entrances did not exhibit significant relationships with usage, indicating the need for further research to better understand these variables' impacts on bike-sharing. Additionally, outliers affect the model's accuracy. Three outliers in Central Park highlight the importance of considering local context and additional factors, while two others with low ride times suggest that seasonal or time-specific factors should be accounted for in future research to improve model accuracy. Furthermore, based on the MLR model, the study offers policy recommendations, including prioritizing and expanding cycling infrastructure, integrating bikesharing with cultural, commercial, educational, and recreational spots, addressing affordability and accessibility for low-income populations, and investigating the ambiguous relationship between bikesharing and public transit.

This paper ends by acknowledging the limitations of multiple linear regression in capturing spatial variation in the relationships between variables and by proposing the use of Geographically Weighted Regression in future studies to provide a more nuanced understanding of spatial patterns. Additionally, improving data quality through refining variable measurements, incorporating additional factors, and ensuring data accuracy and reliability is essential for obtaining more accurate and reliable results.
By addressing these limitations, future research can provide deeper insights into the factors influencing bikeshare docking points and inform more targeted strategies for promoting cycling and active transportation in urban areas.

REFERENCES

City of New York. (n.d.). NYC Open Data. Retrieved May 1, 2023, from https://opendata.cityofnewyork.us/
Böcker, L., Anderson, E., Uteng, T. P., & Throndsen, T. (2020). Bike sharing use in conjunction to public transport: Exploring spatiotemporal, age and gender dimensions in Oslo, Norway. Transportation Research Part A: Policy and Practice, 138, 389–401. https://doi.org/10.1016/j.tra.2020.06.009
Citi Bike. (n.d.). Citi Bike: NYC's official bike sharing system. Retrieved May 1, 2023, from https://citibikenyc.com/
Cui, Y., Chen, X., Chen, X., & Zhang, C. (2022). Competition, integration, or complementation? Exploring dock-based bike-sharing in New York City. The Professional Geographer, 75(1), 65–75. https://doi.org/10.1080/00330124.2022.2081224
El-Assi, W., Salah Mahmoud, M., & Nurul Habib, K. (2015). Effects of built environment and weather on bike sharing demand: A station level analysis of commercial bike sharing in Toronto. Transportation, 44(3), 589–613. https://doi.org/10.1007/s11116-015-9669-z
Faghih-Imani, A., Eluru, N., El-Geneidy, A. M., Rabbat, M., & Haq, U. (2014). How land-use and urban form impact bicycle flows: Evidence from the bicycle-sharing system (Bixi) in Montreal. Journal of Transport Geography, 41, 306–314.
https://doi.org/10.1016/j.jtrangeo.2014.01.013
Furfaro, D. (2019, February 28). Electric Citi Bikes will be easier to find but more expensive. New York Post. Retrieved May 1, 2023, from https://nypost.com/2019/02/28/electric-citi-bikes-will-be-easier-to-find-but-more-expensive/
Gebhart, K., & Noland, R. B. (2014). The impact of weather conditions on bikeshare trips in Washington, DC. Transportation, 41(6), 1205–1225. https://doi.org/10.1007/s11116-014-9540-7
He, B. Y., Zhou, J., Ma, Z., Chow, J. Y. J., & Ozbay, K. (2020). Evaluation of city-scale built environment policies in New York City with an emerging-mobility-accessible synthetic population. Transportation Research Part A: Policy and Practice, 141, 444–467. https://doi.org/10.1016/j.tra.2020.10.006
Offenhartz, J. (2020, April 29). Citi Bike expands to the Bronx and Upper Manhattan, as Lyft lays off workers. Gothamist. Retrieved May 1, 2023, from https://gothamist.com/news/citi-bike-expands-bronx-and-upper-manhattan-lyft-lays-workers
Martin, R., & Xu, Y. (2022). Is tech-enhanced bikeshare a substitute or complement for public transit? Transportation Research Part A: Policy and Practice, 155, 63–78. https://doi.org/10.1016/j.tra.2021.11.007
Noland, R. B., Smart, M. J., & Guo, Z. (2018). Bikesharing trip patterns in New York City: Associations with land use, subways, and bicycle lanes. International Journal of Sustainable Transportation, 13(9), 664–674. https://doi.org/10.1080/15568318.2018.1501520
Press releases. DOT Press Releases – NYC DOT, Alta and Citi Announce Agreement to Expand and Enhance Citi Bike Program in New York City. (2014, October 28). Retrieved May 1, 2023, from https://web.archive.org/web/20141103185842/http://a841-tfpweb.nyc.gov/dotpress/2014/10/citi-bike-program-in-new-york-city/#more-339#more-339
Pucher, J., & Buehler, R. (2012). Bicycle integration with public transport.
Encyclopedia of Sustainability Science and Technology, 806–821. https://doi.org/10.1007/978-1-4419-0851-3_490
Raviv, T., & Kolka, O. (2013). Optimal inventory management of a bike-sharing station. IIE Transactions, 45(10), 1077–1093. https://doi.org/10.1080/0740817x.2013.770186
Rivoli, D. (2018, April 7). Citi Bike reaches 50 millionth ride milestone as bike-share network expands. New York Daily News. Retrieved May 1, 2023, from https://www.nydailynews.com/new-york/citi-bike-reaches-50-millionth-ride-milestone-article-1.3541251
Social Explorer. (n.d.). Social Explorer. Retrieved May 1, 2023, from https://www.socialexplorer.com/
Teixeira, J. F., Silva, C., & Moura e Sá, F. (2020). Empirical evidence on the impacts of bikesharing: A literature review. Transport Reviews, 41(3), 329–351. https://doi.org/10.1080/01441647.2020.1841328
Wang, X., Cheng, Z., Trépanier, M., & Sun, L. (2021). Modeling bike-sharing demand using a regression model with spatially varying coefficients. Journal of Transport Geography, 93, 103059. https://doi.org/10.1016/j.jtrangeo.2021.103059
Wikimedia Foundation. (2023, April 17). Citi Bike. Wikipedia. Retrieved May 1, 2023, from https://en.wikipedia.org/wiki/Citi_Bike
Zhu, L., Ali, M., Macioszek, E., Aghaabbasi, M., & Jan, A. (2022). Approaching sustainable bike-sharing development: A systematic review of the influence of built environment features on bike-sharing ridership. Sustainability, 14(10), 5795.
https://doi.org/10.3390/su14105795

APPENDIX 1 - TWO-WAY SCATTER PLOTS AND HISTOGRAMS FOR EACH VARIABLE

y - Total ride time generated at each station (before and after logarithm transformation)
x1 - Number of docks at each station
x2 - Bike lane length (kilometers) in 500-m buffer
x3 - Area of parks in 500-m buffer (square kilometers)
x4 - Number of universities or colleges in 500-m buffer
x5 - Number of galleries or museums in 500-m buffer
x6 - Number of subway entrances in 500-m buffer
x7 - Sidewalk cafe length (kilometers) in 500-m buffer
x8 - Population density at census block group level (per square mile), before and after natural logarithm transformation
x9 - Poverty rate at census block group level
x10 - Per capita income at census block group level, before and after natural logarithm transformation
x11 - Median age at census block group level

APPENDIX 2 - MLR MODEL IN STATA

Multiple Regression Model in STATA:

APPENDIX 3 - VIF ANALYSIS

Result of VIF of Multiple Regression Model in STATA:
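As a rough illustration of the calculation behind the VIF values summarized in Chapter 4, the following sketch computes VIF for each column of a predictor matrix as 1 / (1 - R^2), where R^2 comes from regressing that column on all the others. It is not the STATA output: the function name `vif` and the synthetic predictors `a`, `b`, and `c` are assumptions made for the example, and in Stata the same diagnostic is typically requested with `estat vif` after `regress`.

```python
import numpy as np


def vif(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r_squared = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r_squared)
    return out


# Synthetic predictors (hypothetical): c is built from a, b is independent.
rng = np.random.default_rng(0)
n_obs = 1000
a = rng.normal(size=n_obs)
b = rng.normal(size=n_obs)
c = a + 0.5 * rng.normal(size=n_obs)

values = vif(np.column_stack([a, b, c]))
for name, value in zip(["a", "b", "c"], values):
    print(f"VIF({name}) = {value:.2f}")
```

In this synthetic case the independent predictor `b` has a VIF close to 1, while the correlated pair `a` and `c` have clearly inflated values. The model's reported range of 1.06 to 2.18 sits much closer to the first case than to the second.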