Prediction/Causality Tradeoffs and Data Size Issues in Transportation Modeling: The Example of Highway-Safety Analysis
The analysis of transportation data is largely dominated by traditional statistical methods (standard regression-based approaches), advanced statistical methods (such as models that account for unobserved heterogeneity), and data-driven methods (machine learning, neural networks, and so on). In the analysis of highway-safety data, these methods have mostly been applied to data from observed crashes. This can make it difficult to uncover causality, since individuals who are inherently riskier than the population as a whole may be over-represented in the data. In addition, when and where individuals choose to drive could affect analyses that use real-time data, since the population of observed drivers could change over time. This issue, the size of the data (which can often influence the choice of analysis method), and the implementation target of the analysis imply that analysts must often trade off predictive capability (where data-driven methods dominate) against the ability to uncover the underlying causal nature of crash-contributing factors (where statistical and econometric methods dominate). However, the data-analysis method is often selected without full consideration of this tradeoff, even though it has potentially important implications for the development of safety countermeasures and policies. This talk discusses the issues involved in this tradeoff with regard to specific methodological alternatives, and aims to give researchers a better understanding of the tradeoffs they are often implicitly making in their analyses.
U.S. Department of Transportation 69A3551747119
Attribution 4.0 International