MACHINE LEARNING APPLICATIONS IN ECONOMICS A Dissertation Presented to the Faculty of the Graduate School of Cornell University In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by Qilu Yu May 2022 © 2022 Qilu Yu MACHINE LEARNING APPLICATIONS IN ECONOMICS Qilu Yu Ph.D. Cornell University 2022 There are three chapters in this dissertation. Chapter 1 introduces the machine learning and its advantages and disadvantages in the context of economic research. The machine learning algorithms can complement traditional econometric methods and expand the boundaries of research. Chapter 2 uses a novel deep learning approach, “Temporal Causal Discovery Framework (TCDF)” to uncover the causal graph structure on the European countries’ credit default swaps during the 2010-2013 eurozone crisis. TCDF uses attention-based convolutional neural networks combined with a causal validation step to learn the causal relationships and the time delay between a cause and the occurrence of its effect. This study provides a granular report of the eurozone crisis contagion and spillovers, adding new findings to the repository. The benchmark Granger causality tests are implemented by vector autoregression. The comparison between the two methods suggests the TCDF can filter the “real” cause-effect relationships using causality validation. Chapter 3 extends the famous Jordà-Schularick-Taylor Macrohistory database with a new crisis variable by referencing other crisis datasets. This new dataset contains 1570 observations of 17 countries from 1870 to 2016, of which 322 observations are crisis periods. XGBoost, random forest, and the logit model are applied to this dataset to establish early warning systems for financial crisis. Though XGBoost is a popular tool in applied ML, it has rarely been used in previous studies for early warning systems. This chapter shows that XGBoost outperforms the benchmark logit model, its performance is on a par with random forest. The two machine learning methods can achieve excellent prediction performance evaluated by the AUROC. Shapley values of the variables are calculated from the models to rank the variable importance in terms of predictive power. BIOGRAPHICAL SKETCH Qilu Yu was born in Huangmei, China in 1987. She moved to Wuhan at the age of 9 and finished her pre-college education in Wuhan. She entered Wuhan University in 2005 and earned a dual degree of Bachelor of Arts in Economics and Bachelor of Science in Mathematics in 2009. Later, she started her graduate studies in the Department of Economics at Cornell University. She holds a Level 3 Award of Wines from WSET, and is also an enthusiastic graphic designer. v This document is dedicated to my parents, Manli Qi and Jian Yu. vi ACKNOWLEDGMENTS I am truly grateful for my committee members who have helped and supported me throughout my long journey in pursuing my degree. Being an “outlier” student, I must have brought so much troubles, but they never gave up on me, yet always encouraged me during times of difficulties. My advisor, Prof. Yongmiao Hong, has helped me so much in every way, without Prof. Hong’s guidance and support, I would never have the chance to write this thesis. With his vast knowledge, hardworking spirit and altruistic kindness, Prof. Hong has always been the role model I look up to and learn from. I am also deeply indebted to Prof. Nancy Chau, who had supported me unselfishly during my hardest moments. Prof. Chau’s kind words and actions had given me faith and confidence when I was low and in distress. Prof. Panle Barwick’s genuine support has motivated me enormously. With her guidance and encouragement, I was able to finish writing the thesis. I also want to express my gratitude to Prof. Assaf Razin and Prof. Karl Shell, who had taught me not only economics but also perseverance and determination. I want to thank Ms. Robin Hamlisch at Cornell Health, who had brought hope, and given me the whole-hearted understanding. My special thanks go to Prof. Hong’s wife, Xin Wang. Her kindness and cheering character are like the warm sunshine in Ithaca’s winter, that always inspires me. I also want to thank Prof. Jimmy Yu, Dr. Ming-Yi Chou, Cynthia Du, Prof. Paul Koch and many others who helped me along the way. Last, I want to thank my parents for all their love, patience and sacrifice. vii TABLE OF CONTENTS Chapter 1 Introduction 1.1 Development of machine learning ........................................................................ 1 1.2 Why use machine learning in economics? ............................................................ 7 1.2.1 Big data promotes the use of machine learning ............................................. 8 1.2.2 The two research paradigms ........................................................................ 12 1.2.3 Integrative modeling .................................................................................... 17 1.2.4 Some economic problems are predictive ..................................................... 19 1.2.5 On smaller datasets ...................................................................................... 21 1.3 Limitations of machine learning ......................................................................... 22 1.3.1 Lack of causal interpretation ........................................................................ 23 1.3.2 Inference on coefficients .............................................................................. 24 1.3.3 Overfitting .................................................................................................... 27 1.4 Conclusion .......................................................................................................... 28 Reference .................................................................................................................. 30 Chapter 2 Contagion during the eurozone crisis: An empirical study using convolutional neural networks 2.1 Introduction ......................................................................................................... 34 2.2 Background ......................................................................................................... 39 2.2.1 The root and buildup of the crises ................................................................ 39 2.2.2 The causes and nature of the crises .............................................................. 48 2.3 Related literatures ............................................................................................... 55 2.4 Data description .................................................................................................. 61 2.5 Analytical frameworks ........................................................................................ 68 2.5.1 Temporal Causal Discovery Framework (TCDF) ....................................... 69 2.5.2 Granger causality ......................................................................................... 74 2.6 Results ................................................................................................................. 77 2.6.1 TCDF results ................................................................................................ 77 viii 2.6.2 Granger causality results .............................................................................. 84 2.6.3 Comparison .................................................................................................. 89 2.7 Discussion ........................................................................................................... 90 2.8 Conclusion .......................................................................................................... 94 Appendix A ............................................................................................................... 97 Reference ................................................................................................................ 105 Chapter 3 Developing Early Warning Systems for financial crisis using machine learning methods 3.1 Introduction ....................................................................................................... 110 3.2 Related literatures ............................................................................................. 112 3.3 Data description ................................................................................................ 116 3.4 Methodology ..................................................................................................... 124 3.4.1 XGBoost ..................................................................................................... 125 3.4.2 Random forest ............................................................................................ 127 3.4.3 Logit model ................................................................................................ 128 3.4.4 Performance measure ................................................................................. 129 3.5 Results ............................................................................................................... 130 3.6 Variable significance ........................................................................................ 133 3.7 Conclusion ........................................................................................................ 142 Appendix B ............................................................................................................. 143 Reference ................................................................................................................ 145 ix LIST OF FIGURES Figure 2.1. 10-year government bond yields of the European countries ...................... 41 Figure 2.2. Government debt to GDP ratio .................................................................. 43 Figure 2.3. Bank assets/private debt to GDP ratio ....................................................... 43 Figure 2.4. Current account balances of the European countries ................................. 45 Figure 2.5. Current account balances to GDP ratio of the European countries ............ 46 Figure 2.6. Current account balances of the eurozone and EU .................................... 47 Figure 2.7. Time series plot of CDS spreads in basis points ........................................ 64 Figure 2.8. Architecture of the TCDF method ............................................................. 73 Figure 2.9. Temporal causal graph of 12 eurozone countries in phase 2 ..................... 78 Figure 2.10. Temporal causal graph of all 13 countries in phase 3 .............................. 79 Figure 2.11. Temporal causal graph of 12 eurozone countries in phase 2 and 3 ......... 82 Figure 2.12. Temporal causal graph of all 13 countries in phase 4 .............................. 83 Figure 3.1. Principal Coordinate Analysis for crisis and non-crisis subgroups ......... 123 Figure 3.2. Correlations matrix of the explanatory variables ..................................... 124 Figure 3.3. AUROC for XGBoost with GDP included .............................................. 130 Figure 3.4. AUROC for XGBoost with GDP excluded ............................................. 131 Figure 3.5. AUROC for random forest with GDP included ....................................... 131 Figure 3.6. AUROC for random forest with GDP excluded ...................................... 132 Figure 3.7. AUROC for the logit model with GDP excluded .................................... 132 Figure 3.8. Shapley summary plot for XGBoost with GDP excluded ....................... 135 Figure 3.9. Shapley summary plot for random forest with GDP excluded ................ 136 Figure 3.10. Gini index for the random forest with GDP excluded ........................... 137 Figure 3.11. Shapley summary plot for logit model with GDP excluded .................. 138 Figure B.1. Shapley summary plot for XGBoost with GDP included ....................... 143 Figure B.2. Shapley summary plot for random forest with GDP included ................ 144 x LIST OF TABLES Table 1.1. A schematic for organizing empirical modeling along two dimensions ..... 18 Table 2.1. Country groups in the EU ............................................................................ 38 Table 2.2. Breakdown by sector of holdings of marketable debt ................................. 50 Table 2.3. Related studies on eurozone crisis ............................................................... 59 Table 2.4. Crisis phases and data availability ............................................................... 63 Table 2.5. Augmented Dickey-Fuller test statistics for the log-difference of CDS ..... 67 Table 2.6. Augmented Dickey-Fuller test statistics for the log-level of CDS .............. 67 Table 2.7. Optimal lag length for pairwise Granger causality ..................................... 85 Table 2.8. Granger causality test for phase 2 ............................................................... 86 Table 2.9. Granger causality test for phase 3 ............................................................... 86 Table 2.10. Granger causality results for the periphery eurozone countries ................ 87 Table 2.11. Instantaneous causality Wald test .............................................................. 88 Table 2.12. Optimal lag for Hsiao’s version of Granger causality ............................... 88 Table A.1 Timeline of major events in the European Union ....................................... 97 Table A.2. Descriptive statistics of CDS spreads in basis points ................................. 98 Table A.3. Correlation matrix of the log-changes of the CDS spreads ...................... 101 Table A.4. Country codes for the European countries ............................................... 104 Table 3.1. Explanatory variables summary ................................................................ 121 Table 3.2. Descriptive statistics of the explanatory variables .................................... 122 Table 3.3. AUROCs for the three models .................................................................. 133 Table 3.4. Logit regression results ............................................................................. 139 Table 3.5. Variable importance summary across the three methods .......................... 140 xi CHAPTER 1 INTRODUCTION Leo Breiman, one of the founding fathers of modern machine learning, writes in his farsighted 2001 paper: “There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that a given stochastic data model generates the data. The other uses algorithmic models and treats the data mechanism as unknown” (Breiman, 2001). The ideas conveyed in Breiman’s work show the differences in model-driven approach and data-driven approach. Machine learning algorithms are flexible data-driven models, they differ from the traditional econometric and statistical tools in many aspects. In this chapter, we provide a brief overview of machine learning and explain why it is suitable for some economic questions. In Chapter 2 and Chapter 3, we apply machine learning methods to two economic questions. 1.1 Development of machine learning This section will briefly introduce the concepts and terminology of machine learning (ML), its early development and connection with the econometric toolbox, then we lay out the foundations for the burgeoning development of ML in recent decades. The goal of this part is not to conduct a comprehensive investigation on ML, but to provide the background and intuition of applying ML in economic research, especially on the empirical side. 1 The first task is to understand what is ML? ML is a term that often comes with Artificial Intelligence (AI) and Deep Learning (DL). In fact, these three terms can be put in a hierarchical order, that DL is part of ML, and ML belongs to the broader concept of AI. AI refers to systems or machines that mimic the problem-solving and decision-making capabilities of the human intelligence and can iteratively improve their abilities based on the data (Russell, and Norvig, 2021). Narrowing down from the general ability to emulate the human mind, ML, which is a subcategory of AI, primarily deals with pattern identification and decision making through experience and data. ML is a broad topic that encompasses computer science, statistics, engineering and other fields; hence, its definition is context specific. From the perspectives of an applied economist, Athey (2018) provides a relatively “narrow and practical” definition of ML: “machine learning is a field that develops algorithms designed to be applied to datasets, with the main areas of focus being prediction (regression), classification, and clustering or grouping tasks.” Athey confines this definition to two major branches in ML that are most commonly used in economics and other social sciences: supervised learning and unsupervised learning. DL is a subset of ML, which does not fall directly under the two branches of supervised learning and unsupervised learning. DL can be used with or without supervision. DL enables the computer to learn complicated concepts using a hierarchy of concepts, with each concept defined or learned through previous simpler concepts (Goodfellow et al., 2016). Recent advancements in ML are mostly in the DL area (LeCun et al., 2015). ML tools can be divided into three subcategories: supervised learning, unsupervised learning, and reinforcement learning. The differences of the subcategories lie in the goal of optimization and data type. In supervised learning and unsupervised learning, the goal is to minimize a loss function associated with prediction performance, whereas, in reinforcement learning, the goal is to maximize the reward in a dynamic environment. Between supervised learning and 2 unsupervised learning, the key difference is the data type. Supervised learning uses labeled datasets to train or “supervise” the algorithm. The machine can measure its prediction accuracy using the labeled input-output pairs, thus learn from its experience. Unsupervised learning analyzes and groups unlabeled data by detecting similarities and patterns in the data. The machine needs to discover the hidden structure of the data by itself, hence “unsupervised”. Supervised learning and unsupervised learning are the major tools that are employed in recent economic studies, and are also the focus of this paper. Supervised machine learning tries to identify a function that maps a known input to an unknown output based on example input- output pairs (Russell and Norvig, 2021). A supervised learning algorithm infers a function by analyzing a training data set containing labeled examples, the function can produce an output from unseen data (Mohri et al., 2012). The nature of supervised learning is optimization with regularization using computer algorithms (Hong and Wang, 2021). Supervised learning mainly deals with two types of problems, regression and classification. In regression problems, the output is a continuous variable (for instance, housing prices), common models include linear regression and polynomial regression with shrinkage methods. In classification problems, the output is a categorical variable (for instance, crisis or no crisis), common models include support vector machine, trees-based models, and neural networks. Unsupervised learning can analyze the unlabeled, unclassified data to detect similarities and patterns, thus cluster or group the data. Unlike supervised learning, unsupervised learning methods cannot be directly applied to a regression or a classification problem, because the unlabeled data does not have input-output pairs. Common usages of unsupervised learning are clustering, anomaly detection and dimensionality reduction. Clustering techniques such as the K-means algorithm are used to partition the feature space into subspaces, then the clusters can be used to create new features based on subspace membership (Athey, 2019). Anomaly detection is to examine the data to discover atypical data points (e.g., fraud detection in 3 insurance claims). Dimensionality reduction is an important technique in preprocessing high- dimensional data (e.g., images, texts, video footage, etc.) with a large number of features. It can reduce the number of features to a manageable size without hurting data integrity. For example, principal component analysis (PCA) is a common technique for dimensionality reduction. Unsupervised learning techniques are often used in different stages of complex ML problems, such as reducing data dimension in the preprocessing stage. Some ML algorithms combine both supervised learning and unsupervised learning, for example, the Generative adversarial networks (GANs, Goodfellow et al., 2014) contain two sub-models, a generator model for generating new examples and a discriminator model to classify whether generated examples are real or fake, the discriminator model is unsupervised learning. In such cases, the learning method is called semi-supervised learning. Reinforcement learning is the training of a computer agent to make a sequence of decisions through repeated trial-and-error interactions in a dynamic reward system. The machine is trained without answers or hints. It can only learn from its experience, and its goal is to maximize the total reward. Examples of reinforcement learning include Alpha Go and self-driving cars. This “game simulation” feature of reinforcement learning provides functionality in optimal control problems in economics, game theory, operational research and finance (Charpentier et al., 2020), but it is of less importance to causal inference and prediction problems, which are the emphasis of most economic studies. The above part gives a concise introduction to the concepts and terminology of ML, next we will discuss the early development of ML and its connection with the econometric toolbox. Although ML seems like a new concept, it is not a new trick for researchers. ML has been around since the 1950s. Pioneering ML research was conducted since 1950 (Russell and Norvig, 2021). In 1950, Alan Turing was the first to propose the question “can machines think?” In 1951, Marvin Minsky and Dean Edmonds created the first neural network. In 1959, Arthur Samuel 4 popularized the phrase “machine learning”. Since then, researchers have developed more and more ML algorithms. For example, the nearest neighbor was invented in 1967, recurrent neural network was created in 1982, reinforcement learning was discovered in 1989. In fact, a lot of the well-known ML algorithms are decades old. Econometricians, such as Halbert White, are also pioneers in the development of ML algorithms. White applied neural networks in economic research to predict the IBM stock market prices (White, 1988), and had done extensive research on neural networks. Traditionally, econometricians usually adopt statistical tools to build probabilistic models to describe economic phenomena (Charpentier et al., 2019). The model free ML algorithms are unsuited for this purpose due to its lack of interpretability, therefore ML are rarely used in economic research. In econometrics, there is a line of research that exercises similar methodology to ML, that is nonparametric analysis. Nonparametric analysis uses particular statistical techniques that do not require pre-specified functional forms for objects being estimated, nor require much prior information on the data generating process (DGP, Racine, 2008; Cai and Hong, 2003). If in a regression framework, it is called “nonparametric smoothing”. Nonparametric smoothing includes spline smoothing, kernel smoothing, K-nearest neighbor (KNN) smoothing, and decision trees, to name but a few. Nonparametric analysis is analogous in spirit to ML algorithms, that they both assume the DGP is an unknown stochastic process. They both depend on the data to derive the forms of the estimator or predictor, thus provides little economic interpretability. Nonparametric analysis generally requires a large dataset for precise estimation because there are many unknown parameters, it can suffer from the curse of dimensionality when using local smoothing methods for multivariate data (Hong, 2020). Some ML algorithms (e.g., K-nearest neighbors (KNN), decision trees, neural networks) can be viewed as nonparametric methods with a regularization component based on optimization using computer algorithms. The regularization can afford 5 more explanatory variables, thus “break” the curse of dimensionality (Hong and Wang, 2021). Sarle (1994) has shown that the multilayer perceptrons, which are the most commonly used neural networks, are just nonparametric nonlinear statistical models. In this regard, econometricians are no stranger to the ML concepts and methodology. Just like nonparametric analysis, ML algorithms require large datasets and strong computational power to fulfill their full potential. Up until the 1990s, datasets were small, computers were slow and costly, those impediments hindered the development of ML. Though ML algorithms have been around since the 1960s, their broad applications only start in the early 1990s. This late resurgence is brought by three forces: big data, cheap computational power and algorithmic advances. First, data availability increases unprecedentedly since the early 1990s. With the Internet of Things (IoT), data is created in the forms that are never seen before, such as remote sensing data, high frequency trading data, online sales data, the data generating speed also increased rapidly. In economics, areas such as agricultural economics, environmental economics, and marketing have benefited from abundant data sources to extract information and patterns in all sorts of activities. However, as the traditional statistical tools are mostly developed for smaller datasets, they lack the ability to handle the complex large datasets. The advance in big data demands new methods for data processing and analysis, the ML algorithms can precisely fill this gap. Big data and ML complement each other that ML needs large datasets to train the machines, big data can use ML algorithms for in-depth data mining. We will reserve the more detailed discussion on big data in the next section. Second, computational power has experienced an exponential growth over the past decades1, innovations such as parallel computing increase the speed and efficiency of central processing 1 According to the Moore's Law, which states that the number of transistors on a microchip doubles about every two years. 6 unit (CPU) usage, and lower the cost of computing. Since the early 2000s, the computing power of multi-processor graphic cards (or GPU) is also employed by the ML community (Storm et al., 2020). All these advancements contribute to the rapid growth of ML. Third, the research community of ML from both academia and industry is constantly advancing the frontiers of ML algorithms. The open-source nature of the ML programs (e.g., R, Python, etc.), off-the-shelf ML algorithms and libraries, encourages the broad applications of ML methods (Schmidhuber, 2015). These three forces jointly create a huge pool of ML tools readily available for researchers in all fields of study. The above part summarizes the early development of ML, the next section will discuss the strengths of ML and its comparison with the econometric methods. 1.2. Why use machine learning in economics? ML methods are flexible, rich, data-driven models. This section intends to lay out the five reasons why ML methods are gradually recognized in economic research. For comparison, we will use traditional econometric methods for benchmarking. ML methods and econometric methods have different objectives. ML methods are primarily intended for accuracy in prediction and classification, while the econometric methods are usually developed for deriving the statistical properties of estimators. One important terminology distinction arises from this difference in objectives. In ML, terms such as bias, standard deviation or mean squared error are defined for the prediction process, while in econometrics, those terms are reserved for the coefficient estimators in hypothesis testing, the statistical properties of estimators are usually not obtained in ML (Storm et al., 2020). Despite the differences, one should also acknowledge the similarities between ML methods and econometric methods. They are not competitors, but collaborators. They excel in solving 7 different types of problems, therefore can complement each other to produce high-quality research. The ideas in this sector are drawn from previous works such as Athey and Imbens (2017), Athey (2018), Hong (2021), Hong and Wang (2021) and many others. The reasons provided here are not a comprehensive overview of ML’s merits, but they build the foundation for discussions in chapter 2 and 3 of this paper. 1.2.1 Big data promotes the use of machine learning The first reason is the impact of big data. Traditionally, economists are accustomed to work with data that fits nicely in a spreadsheet, but the big data are often too large and complex for a spreadsheet. The quality and quantity of economic data are expanding rapidly (Einav and Levin, 2014). The use of big data can improve causal inference and provide better prediction of economic phenomena (Harding and Hersh, 2018). Big data is commonly characterized by the 4 V’s: volume, velocity, variety and veracity (Hong and Wang, 2021). Volume comes from the word “big”, which shows the sheer volume. Coming into the information age, humans are creating as much information as once did from the dawn of civilization up until 2003 in every two days (Eric Schmidt, the CEO of Google). We now have data that is several gigabytes in size. In the past, data was usually collected for a specific purpose by a national statistical agency; but as the world becomes increasingly quantified, data are now collected through a vast ecosystem of software and hardware, including phones, Wi-Fi connected appliances, and satellites (Harding and Hersh, 2018). Through collaborations, economists can study many large-scale proprietary and administrative data, for instance, the eBay online audition data and tax records. Velocity means high frequency. In economics, high frequency data (i.e., intraday trading data) has been widely used for prediction problems in the past decades, but the usage is mostly 8 in finance. In other fields such as macroeconomics, the data is usually aggregated and in low frequency, new initiatives are creating more high frequency data in these fields. For example, FRED-MD2 is a large macroeconomic monthly database that includes 127 granular variables such as retail and services sales, it is updated in real time for economic nowcasting. Researchers at MIT and Harvard put together the Billion Prices Project3 to offer high-frequency online retail price, as a proxy for the Consumer Price Index, to “bring big data for macro and international economics”. Variety stands for the numerous types of data. As more information can be digitized, data such as text, audio and video, satellite images, can all be captured and stored in relational databases. Unlike traditional data, those non-traditional data is often unstructured. Lots of studies have used unstructured data in areas such as health economics, agricultural economics, environmental economics. For example, Kleinberg et al. (2015) use ML methods on a dataset that contains 3,305 explanatory variables to study mortality rate of patients after surgery. The 3,305 variables include image data (MRI scans), demographic data (age, geography, etc.), interval data (range of blood pressures), etc. Veracity points at the noise, abnormalities, and inconsistencies in the data. There are two levels of veracity in big data. One is data credibility, meaning the reliability of the data sources and collecting process. The other is the discrepancies in large datasets where the data are sourced differently and are in different structures. Traditional econometric methods have low tolerance towards data veracity, while ML methods can cope well. For example, some ML algorithms, such as the K-nearest neighbors and naïve bayes, are robust to missing values in the dataset. 2 See https://research.stlouisfed.org/econ/mccracken/fred-databases/. FRED-MD and FRED-QD are large macroeconomic databases designed for the empirical analysis of “big data.” 3 See http://www.thebillionpricesproject.com/ 9 There are also ML based imputation methods which can impute missing values without the assumption of normality or specification of a parametric model (Hong and Lynn, 2020). With the 4 V’s, big data has expanded the boundaries of research in many fields of study, such as geology, hydrology, and biology. The enriched data sources enable in-depth analysis and can extend the research topics. But big data is not just expanded larger datasets, it has a number of properties that differ from traditional data, which requires different practice of data storage, management, processing and analyzing. Here, we pick out three properties that are most relevant to ML algorithms. First, big data often comes unstructured. Unlike the traditional data types (time-series, cross section, and panel data) which fit well in spreadsheets, big data has novel date types such as speech, text, images. Those unstructured data types cannot be directly processed using traditional statistical tools to extract useful information. In practice, researchers need to turn those unstructured data into quantifiable indices or metrics based on domain knowledge. However, this data transformation process may cause bias and loss of information. In contrast, ML methods can process unstructured data directly, or use dimensionality reduction techniques to obtain data in lower dimension while preserving data integrity. This automatic data-driven feature extraction is crucial to the avoidance of introduced bias and keeping as much information as possible. For instance, principal component analysis (PCA) is a widely used dimensionality reduction technique. PCA is a ML technique that transforms a large number of features into a smaller set of uncorrelated features called principal components. For complex unstructured data, ML methods have been proven to be of great importance for data preprocessing. Second, big data can have a large number of explanatory variables (𝐾 variables). For example, in marketing, if location data such as zip code is to be used, thousands of dummy variables would be included; in environmental economics, climatological data can store numerous features for one observation. If the number of observations is 𝑁, when 𝐾 is large or 10 𝐾 > 𝑁, the inclusion of a large number of explanatory variables would lead to the curse of dimensionality for traditional econometric methods. For instance, regression or classification become susceptible to error and overfitting as the data space becomes sparse with a large number of explanatory variables. To use big data that has a lot of explanatory variables, the traditional econometric approach focuses on a subset of variables by certain criteria, or uses aggregation based on domain knowledge. Similar to the unstructured data, this process may introduce bias and cause loss of information. ML methods are much more flexible in variable selection and dimensionality reduction through data-driven feature selection. PCA is again a popular tool for this task. Spike-and-slab method in the Bayesian structural time series (bsts) model is another technique to select features that have higher predictive power. Random forest (and other tree-based models) can overcome the curse of dimensionality by building independent decision trees that are only trained on a subset of features. ML methods are more flexible and efficient when dealing with a large number of explanatory variables. Third, big data usually have complex data sources with large heterogeneity, missing data points and noises. Nonlinear relationship and multicollinearity among the explanatory variables are also common. Those issues would render most econometric methods inapplicable, methods such as generalized additive models, nonparametric methods (kernel estimation) can be put to work, but they are subject to a manageable size of explanatory variables. In contrast, ML methods show great tolerance for those data issues. Many ML methods can identify nonlinear and complex linear connections in the data, for example, K-nearest neighbors (KNN) uses the average value of 𝑘 nearest neighbors for the predicted outcome; decision trees follow the tree- like structure to split data at each node; these algorithms are well suited for nonlinear datasets. Multicollinearity is a challenge in traditional multivariate regression and should be avoided for the sake of interpretability. But in ML, when the goal is prediction, multicollinearity does not inhibit ML algorithms’ ability to achieve prediction accuracy. Tree-based models such as 11 random forest and XGBoost are immune to multicollinearity, because the algorithms only pick one of the correlated features for deciding a split at a node. When dealing with noisy data, many noise identification and noise handling techniques in unsupervised learning can be used to filter out the anomalies and improve prediction accuracy (Gupta and Gupta, 2019). Also, many ML algorithms (KNN, naïve bayes, etc.) are robust to missing data points. In a nutshell, with the advent of big data usage in economic research, ML methods quickly gains attention. The flexible ML algorithms are designed for large datasets and can dig into rich data to decipher complex data patterns. Not only can ML methods be used for prediction and classification problems, they can also complement the traditional econometric methods through techniques such as dimensionality reduction and variable selection. This part summarizes the impact of big data on ML. The next section will explain the differences between the two research paradigms and show the merits of data-driven ML methods. 1.2.2 The two research paradigms At the beginning of this chapter, we introduced the two cultures in using statistical modeling to reach conclusions from data (Breiman, 2001), one is model-driven, the other is data-driven. The two cultures are exactly the two scientific research paradigms described in E (2021): the Newtonian paradigm and the Keplerian paradigm. These two paradigms encompass almost all modern scientific research. The Newtonian paradigm, or model-driven approach, discovers the fundamental principles that govern the world and the universe. Researchers distill the relevant factors from countless causes and conditions, build theories and models to explain the observed phenomena, then use data to validate the framework and estimate model parameters. The Keplerian paradigm, or data- 12 driven approach, observes the data to induce and generalize scientific discoveries. Researchers make no assumption of the underlying data generating process, they analyze and approximate the data to develop practical understanding of the world and the universe. In economics, the Newtonian paradigm prevails since Adam Smith explained the theory of the invisible hand in his “Wealth of nations”. Economics is a discipline that aims to study the underlying relationships among economic factors, to discover how the factors interact with each other to bring about an economic phenomenon. The modern economic research in the Newtonian paradigm can be characterized using the following process (see Hong (2007) for a detailed exposition). The first step is to collect data and summarize the empirical stylized facts. For instance, the positive relationship between years of education and lifetime income is a stylized fact. The second step is to build an economic theory or model to fit the stylized fact. Economists usually use mathematical tools to set up models that show how the variables work together to produce the observed outcome. The third step is to test the model using econometric tools, to validate model specification and estimate parameters from data. Finally, the validated model can be used for decision-making, economic forecasting, etc. To sum up, model-driven economic research aims to uncover the economic principles by building theories and models, then use econometric tools to validate the models and quantify the relationships among economic factors from data. An early example of the model-driven approach is the Solow growth model built by Nobel laureate Robert Solow. It uses a Cobb–Douglas production function to show how the long-run economic growth is governed by capital accumulation, population growth, and increases in productivity. Solow (1957) applies this model to the United States’ data and finds that productivity increase brought by the technological progress is the main driver for economic growth. With the advance in mathematical modeling, statistical tools and data availability, new theories and models are constructed with more variables and complex functional forms. For 13 instance, the dynamic stochastic general equilibrium (DSGE) models attempt to explain and predict the co-movements of the macroeconomic time series over the business cycle by applying the micro-foundations of a competitive equilibrium model. A typical DSGE model includes a representative household, a representative firm, the government, and foreign sectors. Each agent has its own set of maximization problems which interdepend on other agents in the model. Shocks are also introduced to inject randomness. The DSGE models are used to make economic predictions about the business cycles and economic growth. The complexity demonstrated by the DSGE models shows the direction of evolvement in the model-driven approach. Though DSGE models enjoy much popularity among macroeconomists, they also received critiques. The many assumptions about a representative household with an infinite lifetime and competitive market are targets of criticism (Stiglitz, 2018). Robert Solow points out that small deviations can be amplified through the complex systems in the DSGE model, which can lead to substantive digression from the true outcomes (Solow, 2010). Some DSGE modelers respond to the critiques by increasing the level of complexity, for example, by introducing heterogeneous agents (Christiano et al., 2018). This trend of increasing complexity is common in many economic models. The Keplerian paradigm, or data-driven approach aims to find the relationships between the explanatory variables and outcome variables of an economic phenomenon, with few or no prior assumptions. The data-driven approach usually has three steps. The first step is to collect data and summarize the empirical stylized facts (the same as the model-driven approach). The second step is to use data-driven models to fit the data. For example, one can use the autoregressive integrated moving average (ARIMA) to model the monthly totals of airline passengers, or train the XGBoost algorithm to predict daily stock market price. The third step is to use the model to make predictions on new data. The model usually lacks the ability to interpret the relationships among variables, and requires a sufficiently large dataset. Since economists are always 14 interested in finding causal relationships, and the data availability is not ideal in the past decades, the data-driven approach has been a less popular subject in economics. Generally, there are two types of data-driven approaches in economics. The first type is statistical or econometric approach. For example, many time series forecasting methods, such as autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA) can predict future values of a time series based on past results without building structural models based on domain knowledge. Nonparametric analysis is another example of model-free statistical tool. The second type is ML methods. ML algorithms can learn from data and experience to identify patterns and relationships between variables with little prior information. To sum up, the model-driven approach aims to identify important variables and estimate the parameters of a model that describes the distribution of a set of variables (Athey and Imbens, 2019), the goal of which is to understand the principles of economics. The data-driven approach puts the emphasis on extracting patterns and information from data to develop practical understanding of the underlying data generating process. If we assume that the economy is governed by certain stochastic data generating processes (this statement itself, is an assumption), one can approximate the real economic activities using mathematical or statistical modeling on a set of variables, and disregard less relevant ones. The simplified models can help to understand the main aspects of the DGP; however, no model can ever provide a completely accurate depiction of the DGP, that “all models are wrong but some models are useful.”4 If the model is a reasonably accurate approximation of the DGP, then model outputs are also approximately true (Hong, 2020). If the model is mis-specified, such as leaving out important factors, including irrelevant ones or choosing inappropriate function forms; the 4 A famous quote by statistician George Box. 15 model outputs can be biased and inconsistent. Model mis-specification can have adverse effects in policy analysis and decision making, this is a major drawback of the model-driven approach (Hong, 2021). Also, we are often very willing to make strong assumptions when constructing a model, but the strong assumptions cannot be directly tested, and they might just be the source of bias. If the model assumptions are to be relaxed, like the case with DSGE models, where modelers introduce heterogeneous agents to relax the representative agent assumption, model complexity increases. If we introduce nonlinearities, interactions or heterogeneity to the model, the model will become complex and susceptible to bias, model interpretation will become more difficult as coefficients cannot be interpreted directly. We often accuse the flexible and complex ML methods for their lack of interpretability, however, this tradeoff between flexibility and interpretability is not exclusive to ML, it is also a tradeoff for the structural models (Storm et al., 2020). Simplified economic structural models cannot capture every aspect of the economy, and building complex models “would make them unwieldy for either theoretical insight or applied analysis” (Low and Meghir, 2017). Another drawback of the model-driven approach is model uncertainty. In practice, researchers would consider many model specifications and perform various specification tests before choosing a preferred model that can produce preferable results. The standard practice is to present estimates of the preferred specification with several other specifications with different functional form, controls or instrument variable. This is to show that the estimate of the parameter of interest is not very sensitive to the choice of the preferred specification (Athey and Imbens, 2017). The fact that an estimated parameter varies with different models represents a simple form of model uncertainty (Varian, 2014). This approach of picking the “best” model out of many model specifications cannot deplete all possible models and functional forms, therefore the results can suffer from model uncertainty (Varian, 2014). In contrast, ML 16 algorithms can afford highly flexible functional forms and use data-driven function selection to avoid this problem. In summary, model-driven approach and data-driven approach differ in their objectives and methodologies, each approach has its advantages and limitations. Model-driven approach is not immune to bias and errors, and is not always superior to data-driven approach. Therefore, it is important that we recognize the virtue and strength of the data-driven approach (especially the ML methods). 1.2.3 Integrative modeling The last section shows the dichotomy of the model-driven approach and the data-driven approach, but there are other classifications as well. Hofman et al. (2021) point out that empirical modeling can be organized along two dimensions, representing different levels of emphasis placed on explanation and prediction (Table 1.1). Descriptive modeling (quadrant 1) refers to activities that define, measure, collect, and describe relationships between variables of interest. Explanatory modeling (quadrant 2) refers to activities to identify and estimate causal effects, not focusing directly on prediction. A lot of model-driven economic research belongs to this quadrant. Predictive modeling (quadrant 3) refers to activities in predicting the outcome of interest directly but do not explicitly deal with the identification of causal effects. Most ML methods belong to this quadrant. Integrative modeling (quadrant 4) combines the explanatory properties of quadrant 2 and the predictive properties of quadrant 3, it refers to activities that attempt to predict unseen outcomes in terms of causal relationships, the focus is on generalizing “out of distribution” outcome that might change naturally, or through human intervention such as controlled experiment (Hofman et al., 2021). 17 Table 1.1 A schematic for organizing empirical modeling along two dimensions, from Hofman et al. (2021) No intervention or Under interventions or distributional changes distributional changes Focus on specific Quadrant 1: Descriptive modeling Quadrant 2: Explanatory modeling features or effects Describe situations in the past or Estimate effects of changing a present (but neither causal nor situation (but many effects are predictive) small) Focus on predicting Quadrant 3: Predictive modeling Quadrant 4: Integrative modeling outcomes Forecast outcomes for similar Predict outcomes and estimate situations in the future (but can break effects in as yet unseen situations under changes) The concept of integrative modeling admits to the tradeoff between explanatory insight and prediction accuracy, but it also recognizes this tradeoff as “an exciting opportunity for new and impactful research” (Hofman et al., 2021). For instance, in estimating average treatment effects, economists have been using semi-parametric methods without making parametric assumptions about how explanatory variables affect the outcomes in the 1990s (Athey, 2018). Since the 2010s, economists introduce ML methods into this framework. Belloni et al. (2014) uses a double-selection method based on LASSO in high-dimensional data with many explanatory variables, to allow inference about economically interesting model parameters. Athey et al. (2018) propose to use “residual balancing” that takes the average of the efficient score as the measure of the treatment effect, which is calculated from the inverse of estimated propensity score and the conditional mean of the outcome variable. The propensity score weights are obtained through quadratic programming and the conditional means are estimated using LASSO. Their main result shows that this procedure is efficient and can achieve the same rate of convergence as a well specified structural model. The above examples are situations when we are interested in estimating a parameter of causal interest, but the tools we use to recover that parameter may contain a prediction component (see 18 Mullainathan and Spiess (2017) for a review). Another approach in integrative modeling is akin to a “coordinate ascent” algorithm, where researchers iteratively alternate between predictive and explanatory models during the experiment stage and data analysis stage (Hofman et al., 2021). In experiments, researchers can use ML algorithms to predict participants’ decisions and identify features of importance, the results can then be used in new rounds of experiments to test the predictions, and so on. In conclusion, explanatory modeling is usually used to identify causal relationships among variable, such as quantifying policy impact and estimating counterfactual outcomes, predictive modeling (mainly the ML methods) can complement the explanatory models through innovative methodology for predictive accuracy, combining those two modeling strategies can expand the horizons of economic research. 1.2.4 Some economic problems are predictive Conventional econometric methods are not designed for prediction problems, these model- driven methods put their emphasis on statistical inference on parameters of causal interest, rather than predictive performance. Though a lot of the economic problems that interest the researchers and other stakeholders are causal, but not all important economic problems are causal, some are predictive in nature. For predictive problems, ML methods are perhaps second to none. The primary focus of supervised learning is on accurate prediction, especially the out-of-sample predictive fit. Unsupervised learning can help with the predictive task through data-driven feature selection and dimensionality reduction techniques. Shrinkage methods, tree-based models, and neural networks are the prevailing ML tools researchers use for predictive regression and classification problems. 19 Kleinberg et al. (2015) argue that many important policy problems are essentially prediction problems, that “causal inference is not central, or even necessary.” They study the impact of a resource allocation problem in joint replacement surgery for population with Medicare in the U.S., the benefit of surgery is an improved quality of life, but the cost is hard to quantify. The post-surgical recovery can be painful and takes several months afterwards. A key challenge is the possible mortality for individual patient after surgery. The policy decision of this study is whether the surgery on the predictably riskiest patients will be futile. The authors use 65,395 observations to fit the models using LASSO regression, then measure the payoff of the surgeries on the remaining 32,695 observations. Results show that, for each year, in the whole Medicare population, replacing the riskiest 10 percent patients with lower-risk patients according to ML predictions would avert 10,512 futile surgeries and reallocate 158 million dollars. This replacement can avoid operating on 38,533 riskiest patients who would have died within one year of surgery. This study shows how improved prediction using ML can have significant policy impacts, without identifying the causal factors. There are many resource allocation problems like this in policy problems (see Athey (2017) for a review). For instance, how to predict the locations that might need food-safety inspections? And how to send firefighters more efficiently? ML methods can also help with pure predictive problems. Early warning system (EWS) is a predictive tool for financial crises prediction or other risk detection, there is an increasing literature on using ML methods to construct EWS for financial crisis. Real-time nowcasting of key economic indicators (mostly in macroeconomics) is another example where ML algorithm can make a difference. In such predictive problems, the dataset under study is often large and contains a lot of unstructured variables, flexible ML methods can help researchers to better generate policy impacts and economic insights than traditional econometric methods. 20 1.2.5 On smaller datasets In the beginning of this paper, we quote Leo Breiman that algorithmic models can be used not only on large datasets but also on small datasets, as a more accurate and informative alternative (Breiman, 2001). In the above sections, we have established the foundations for why ML methods are ideal for big datasets, and why ML methods perform well in larger datasets. This section will introduce some ML techniques that can cope with a limited sample size. In economics, the majority of data in areas such as macroeconomic, is of low frequency (e.g., annual GDP) and a limited size; decades-old historical data remains scarce; and there are special datasets that only have few observations. From the frequency aspect, most weekly or daily data are sufficiently large for ML algorithms, yearly or monthly data can be a stretch. We often tend to think that ML methods are only reserved for big datasets, but researchers (mainly outside of economics) have shown that ML methods can be used on smaller datasets as well. Data augmentation techniques can be used to increase the volume and there are methodologies for special treatment. Bootstrapping is a common statistical technique for data augmentation. In ML, a related technique is bagging, which is essentially bootstrapping aggregation. In bagging, several machines are trained on random subsets of the original dataset, then the results of the machines are aggregated to get the final prediction. The random subsets are drawn with replacement (just like bootstrapping). Bagging is a common practice for tree-based models (it is a built-in technique for random forest), it can improve the prediction accuracy by introducing randomness into the structure, then make an ensemble out of the predictions. Over-sampling is another ML technique for small datasets or imbalanced datasets, it can create synthetic data from the original data. For instance, synthetic minority over-sampling technique (SMOTE) is commonly used to increase data size, especially for minority categories in imbalanced datasets; generative 21 adversarial networks (GANs) can learn patterns from the original datasets and create new data that resemble the original data. Other than data augmentation techniques, ML algorithms are adapting to smaller datasets as well. For instance, in neural networks, Olson et al. (2018) decompose the final layer of the fitted neural networks into ensembles of low-bias sub-networks, those sub-networks are relatively uncorrelated, therefore enabling the implicit regularization mechanism to avoid overfitting (a similar strategy as random forest which is an ensemble of low-bias, uncorrelated trees). They apply this approach to small datasets and show that deep neural networks can achieve superior prediction accuracy with minimal tuning. There are many applications of such ML algorithms outside of economics. For instance, Zhang and Ling (2018) use the degree of freedom (DoF) of the model to mitigate the issue of small data size in material science, Caiafa et al. (2021) review a collection of fifteen papers that use novel ML methods on noisy, incomplete or small datasets on different science subjects. In economics, most studies that use ML methods analyze large datasets, there are few ML applications on smaller datasets. We argue that data size should not be an obstacle to apply ML methods. In chapter 3, we will use a relatively small dataset to establish an early warning system for financial crisis. 1.3 Limitations of machine learning In section 1.2, we have shown the five reasons why ML methods can contribute to economic research. In this sector, we will introduce three limitations of ML, as well as recent developments to overcome them. ML excels at predictive tasks but it is not designed to identify the reasons, the most commonly known drawback is its lack of interpretability. Also, ML 22 models cannot provide coefficient estimates for causal inference. Overfitting is another challenge. 1.3.1 Lack of causal interpretation The lack of interpretation stems from the “black-box” nature of ML algorithms, the relationships learned by the ML algorithms are complex and unreadable, this makes the ML methods untrustworthy for causal questions. For example, ML can be applied in assessing credit card applications, but it cannot provide sound reasons for its rejections. The basic framework of ML is to use historical data to discover patterns using computer algorithms, then use the learned knowledge to predict the outcome for unseen data. Patterns can show correlation, but correlation does not imply causation. On the flip side, the fundamental problem of causal inference is that the real outcome and the counterfactual outcome cannot be observed at the same time. Good identification strategy is required for causal analysis. Randomized controlled experiments are ideal, but they are costly and take time. In quasi-experimental settings, methods like difference in difference and regression discontinuity have their own requirement for identification. Under the experimental settings, the causal models can use past data and control variables to “predict” the counterfactual outcome of the treatment group. In this spirit, ML methods can be employed for the prediction part to get the estimated treatment effect. Several off-the-shell ML algorithms are built for this purpose. For example, researchers at Google develop the CausalImpact algorithm based on Bayesian structural time series (bsts) predictions to study the treatment effect of an intervention (Brodersen et al., 2015), the Facebook Prophet is another tool for time series forecasting and treatment effect estimation. 23 To overcome the lack of interpretability, recent development in the intersection of economics and ML has infused causal inference into developing new ML algorithms. For example, Wager and Athey (2018) develop a causal random forest to estimate heterogeneous treatment effects which extend the original random forest algorithm to allow statistical inference. Farrell et al. (2021) construct novel nonasymptotic high probability bounds for deep feedforward neural networks which allow semi-parametric estimation and statistical inference. Several survey papers have documented this recent development, such as Ghoddusi et al. (2019), Moraffah et al. (2020), and Storm et al. (2020). Outside of economics, interdisciplinary work between ML and causal inference has developed a subfield called causal structure learning or causal discovery. For example, Nauta et al. (2019) use an attention-based convolutional neural networks called Temporal causal discovery framework (TCDF) to detect the causal relationships in observational time series data. They construct temporal causal graphs which can determine the time lag between a cause and the occurrence of its effect. We will use TCDF in chapter 2 to uncover the cause-effect relationships during the eurozone sovereign crises from 2010 to 2013. In the temporal causal discovery literature, many tools are being developed and used in applied works (see Glymour et al. (2019) for review). We will discuss this topic in more details in chapter 2. Though most ML algorithms aim at prediction accuracy and efficiency, we can incorporate more causal elements into the current ML framework. The goal is not to understand the “black- box” procedures, but to use novel approaches to obtain causal structure from the framework. 1.3.2 Inference on coefficients Another criticism points to ML methods’ inability to conduct inference on estimated coefficients. In nonlinear models (e.g., random forest, neural networks) there is no coefficient 24 that can be estimated. In linear ML models (e.g., LASSO and ridge regression), the estimated coefficients do not have the same statistical property as in traditional methods. For example, Mullainathan and Spiess (2017) report that LASSO regression can produce familiar coefficient output like traditional linear regressions, but the coefficients are biased toward zero. LASSO cannot form standard errors and confidence intervals about those coefficients neither. This is indeed a fundamental problem of ML in the light of discovering quantifiable causal relationships. Some workaround attempts have been made to establish confidence intervals with ML algorithms. Studies mentioned in previous sections have sought to modify the algorithms to construct valid confidence intervals. Wager and Athey (2018) introduce their causal random forest, which is an average of causal trees; each causal tree is trained with a different subsample and variable space (similar to the nearest neighbor matching). They build asymptotic normality results for the treatment effect in random forest, and propose a consistent estimator for the variance, thus the confidence intervals can be computed. Brodersen et al. (2015) develop the CausalImpact algorithm based on Bayesian structural time series (bsts) to construct synthetic control and calculate the treatment effect, then use a Markov Chain Monte Carlo algorithm for posterior inference to report the pointwise 95% posterior predictive intervals of the treatment effect, the time series of pointwise intervals provides further information of the temporal evolution of the intervention. Those attempts are still far from ideal for inference on estimated coefficients like the traditional statistical models. In particular, ML lacks the capability to conduct the null- hypothesis significance testing (NHST), but this inability to conduct NHST is not necessarily a shortcoming. Though the NHST is a widely used and well-accepted tool, it is not definitively powerful because the 𝑝-values and 𝑡-tests are designed to show that a theory is not inconsistent with the data, therefore the theory can be used as an explanatory tool. NHST is good at “disproving”, but not “proving” (Hofman et al., 2021). For large enough datasets, trivially small 25 effects can be declared statistically "significant", because the large sample size 𝑁 can reduce the standard error closely to zero, which will falsely push up the 𝑡 statistics and lower the 𝑝- values closely to 0%. In other words, statistical significance is not the same as economic significance for large enough data (Hong and Wang, 2021). Since ML methods are mostly applied with large datasets, it might not be as important to obtain the statistical properties of the estimates, and we should always try to develop new methodology to quantify and test causal relationships in ML. Current ML methods can provide a measure to evaluate the importance of explanatory variables (features) for the trained models. This is called feature importance or variable importance. ML algorithms can calculate and rank the feature importance for each explanatory variable, to show the level of importance of a certain explanatory variable in prediction. Feature importance can also help to determine if the trained model is consistent with domain knowledge. Several approaches exist to calculate feature importance. One approach is feature permutation importance. It evaluates how the model score (or prediction error) changes when an explanatory variable is not included in the prediction. Any scoring metric can be used for measurement of the model score. If the model score shows no change when one explanatory variable is not included, it suggests that the model does not rely on this variable. In contrast, a large model score change indicates a large impact, and the sign of the change also matters (depending on the specific scoring metric). Shapley value is an extension of the feature permutation importance based on the game-theory attribution method; it takes the average of all the model score changes of a variable to all possible combinations (coalitions) with other explanatory variables. Another approach is the impurity-based feature importance, a technique mainly used in tree- based models. Impurity is measured by the splitting criterion using a loss function at each node of the trees. The loss function can be of different forms, such as Gini impurity and entropy. With 26 Gini impurity, a Gini index can be calculated to reflect the average cumulative decrease attributed to a certain variable, then the Gini indices are ranked for feature importance. A third approach is to use deep learning algorithms which can detect complex dependence among the variables, there are techniques such as greedy search and averaged input gradient to obtain feature importance from the dependences (Wojtas and Chen, 2020). 1.2.3 Overfitting The third issue of ML is overfitting. A ML algorithm is trained in a sub-sample of the data aiming to achieve prediction accuracy, but while doing so, it can pick up the noise and irrelevant information in the sub-sample. Overfitting is the situation when the model learns the sub-sample too well and relies on the irrelevant information for prediction, it can fail to generate to unseen data. ML algorithms can learn very specific (e.g., nonlinear) relationships in the sub-sample, thus they are susceptible to overfitting. In ML, limiting overfitting is very important. ML algorithms mainly use regularization and train-validation-test split to avoid overfitting (Storm et al., 2020). The idea of regularization is to discourage complex models by putting a penalty when the model becomes too complex, therefore preventing the model from fitting very specific patterns in the sub-sample. For instance, in linear regression, LASSO regression and Ridge regression are the two prevailing approaches. In neural networks, one can decompose the final layer of the fitted neural networks into uncorrelated sub-networks to enabling the implicit regularization mechanism (Olson et al., 2018). Train-validation-test split is to split the data into three sub-samples, one for training, one for validation, and one for testing. The model is trained on the training set, then its performance is validated and optimized using the validation set, and the model performance is finally evaluated 27 using the test set. When datasets are large, the train-validation-test split approach can be easily implemented. On smaller datasets, an alternative is to use 𝑘-fold cross validation. 1.4 Conclusion The above sections discuss the strength and weakness of ML methods, with an emphasis on the comparison with econometric methods. ML methods and econometric methods have different objectives, but we argue that ML can enrich the economists’ toolbox by complementing existing econometric methods and also bringing in novel insights. ML can increase the flexibility in data options, variables and functional form selection; it can bypass some limitations of the traditional econometric methods; combining ML methods with econometric methods can expand the boundaries of economic research. Though ML methods require little domain knowledge to train a machine, economic theories can help researchers to choose the model family that can best describe the relations among variables. Economic theories are also essential in validating and explaining the findings of a trained model, hence can help to make the model scientifically interpretable. Just as Leo Breiman “predicted” (Breiman, 2001), there is a shift from traditional statistical methods to ML in a wide range of science subjects. Guido Imbens, the 2021 Nobel laureate in economics, writes about ML methods that economists should know about and suggests that ML methods should be included in the core graduate econometrics sequences (Athey and Imbens, 2019). The authors conclude that “being familiar with these methods will allow researchers to do more sophisticated empirical work, and to communicate more effectively with researchers in other fields”. Following this trend, this paper uses data-driven ML methods to study two empirical questions. Chapter 2 will look at a novel deep learning framework to uncover the causal patterns 28 in credit default swaps (CDS) during the eurozone crisis in 2010. Chapter 3 will use two ML methods to establish the early warning system for financial crisis. Our datasets are structured medium size data, therefore traditional econometric methods can be applied as well, thus allowing the comparison between ML methods and econometric methods. 29 REFERENCES Athey, Susan (2015): Machine learning and causal inference for policy evaluation. In KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Athey, Susan (2017): Beyond prediction: using big data for policy problems. In Science 355 (6324), pp. 483–485. Athey, Susan (2018): The impact of machine learning on economics. Athey, Susan; Imbens, Guido (2017): The state of applied econometrics: Causality and policy evaluation. In Journal of Economic Perspectives 31 (2), pp. 3–32. Athey, Susan; Imbens, Guido (2019): Machine learning methods economists should know about. Athey, Susan; Imbens, Guido; Wager, Stefan (2018): Approximate residual balancing: debiased inference of average treatment effects in high dimensions. In J. R. Stat. Soc. B 80 (4), pp. 597–623. Belloni, Alexandre; Chernozhukov, Victor; Hansen, Christian (2014): High-dimensional methods and inference on structural and treatment effects. In Journal of Economic Perspectives 28 (2), pp. 29–50. Breiman, Leo (2001): Statistical modeling: The two cultures. In Statistical science 16 (3), pp. 199–231. Cai, Zongwu; Hong, Yongmiao (2003): Nonparametric methods in continuous-time finance: A selective review Caiafa, Cesar F.; Sun, Zhe; Tanaka, Toshihisa; Marti-Puig, Pere; Solé-Casals, Jordi (2021): Machine learning methods with noisy, incomplete or small datasets. In Applied Sciences 11 (9), p. 4132. Charpentier, Arthur; Elie, Romuald; Remlinger, Carl (2020): Reinforcement learning in economics and finance. Charpentier, Arthur; Flachaire, Emmanuel; Ly, Antoine (2019): Econometrics and machine learning. In Ecostat 505 (505d), pp. 147–169. Christiano, Lawrence J.; Eichenbaum, Martin S.; Trabandt, Mathias (2018): On DSGE models. In Journal of Economic Perspectives 32 (3), pp. 113–140. E, Weinan (2021): The dawning of a new era in applied mathematics. In Notices Amer. Math. Soc. 68 (04), p. 1. 30 Efron, Bradley; Hastie, Trevor (2016): Computer age statistical inference. Algorithms, evidence, and data science; Cambridge University Press. Einav, Liran; Levin, Jonathan (2014): Economics in the age of big data. In Science 346 (6210), p. 1243089. Farrell, Max H.; Liang, Tengyuan; Misra, Sanjog (2021): Deep neural networks for estimation and inference. In ECTA 89 (1), pp. 181–213. Ghoddusi, Hamed; Creamer, Germán G.; Rafizadeh, Nima (2019): Machine learning in energy economics and finance: A review. In Energy Economics 81, pp. 709–727. Glymour, Clark; Zhang, Kun; Spirtes, Peter (2019): Review of causal discovery methods based on graphical models. In Front. Genet. 10, p. 524. Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016): Deep learning. Cambridge, Massachusetts: The MIT Press. Goodfellow, Ian; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil et al. (2014): Generative adversarial nets. In Advances in Neural Information Processing Systems 27. Gu, Shihao; Kelly, Bryan; Xiu, Dacheng (2020): Empirical asset pricing via machine learning. In The Review of Financial Studies 33 (5), pp. 2223–2273. Gupta, Shivani; Gupta, Atul (2019): Dealing with noise problem in machine learning data- sets: A systematic review. In Procedia Computer Science 161, pp. 466–474. Harding, Matthew; Hersh, Jonathan (2018): Big data in economics. In IZA world of labor. Hastie, Trevor.; Friedman, Jerome H.; Tibshirani, Robert. (2009): The elements of statistical learning. Data mining, inference, and prediction. Cham: Springer International Publishing. Hofman, Jake M.; Watts, Duncan J.; Athey, Susan; Garip, Filiz; Griffiths, Thomas L.; Kleinberg, Jon et al. (2021): Integrating explanation and prediction in computational social science. In Nature 595 (7866), pp. 181–188. Hong, Shangzhi; Lynn, Henry S. (2020): Accuracy of random-forest-based imputation of missing data in the presence of non-normality, non-linearity, and interaction. In BMC Medical Research Methodology 2020 (1), p. 199. Hong, Yongmiao (2007): The status, roles and limitations of econometrics. In Economic Research Journal (5), pp. 139–153. Hong, Yongmiao (2020): Foundations of modern econometrics. A unified approach. New Jersey: World Scientific. Hong, Yongmiao (2021): Understanding modern econometrics. In China Journal of Econometrics 1 (2), p. 266. 31 Hong, Yongmiao; Wang, Shouyang (2021): Big data, machine learning and statistics: challenges and opportunities. In China Journal of Econometrics 1 (1), p. 17. Kay H. Brodersen; Fabian Gallusser; Jim Koehler; Nicolas Remy; Steven L. Scott (2015): Inferring causal impact using Bayesian structural time-series models. In The annals of applied statistics 9 (1), pp. 247–274. Kleinberg, Jon; Ludwig, Jens; Mullainathan, Sendhil; Obermeyer, Ziad (2015): Prediction policy problems. In The American economic review 105 (5), pp. 491–495. LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015): Deep learning. In Nature 521 (7553), pp. 436–444. Low, Hamish; Meghir, Costas (2017): The use of structural models in econometrics. In Journal of Economic Perspectives 31 (2), pp. 33–58. Marsland, Stephen (2015): Machine learning. An algorithmic perspective. Second edition. Boca Raton, FL, London, CRC Press. Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012): Foundations of machine learning. The MIT Press. Molnar, Christoph (2022): Interpretable machine learning. A guide for making black box models explainable. Second edition. Moraffah, Raha; Karami, Mansooreh; Guo, Ruocheng; Raglin, Adrienne; Liu, Huan (2020): Causal interpretability for machine learning - problems, methods and evaluation. In SIGKDD Exploring. Newsletters. 22 (1), pp. 18–33. Mullainathan, Sendhil; Spiess, Jann (2017): Machine learning: An applied econometric approach. In Journal of Economic Perspectives 31 (2), pp. 87–106. Olson, Matthew; Wyner, Abraham; Berk, Richard (2018): Modern neural networks generalize on small data Sets. In Advances in Neural Information Processing Systems 31. Racine, Jeffrey S. (2008): Nonparametric econometrics: A Primer. In FNT in Econometrics 3 (1), pp. 1–88. Russell, Stuart J.; Norvig, Peter (2021): Artificial intelligence. A modern approach. Fourth edition. Hoboken: Pearson. Sarle, Warren S. (Ed.) (1994): Neural networks and statistical models. Proceedings of the Nineteenth Annual SAS Users Group International Conference. Schmidhuber, Jürgen (2015): Deep learning in neural networks: an overview. In Neural networks: the official journal of the International Neural Network Society 61, pp. 85–117. Solow, Robert M. (2010): Prepared statement on "Building a science of economics for the real world". 32 Solow, Robert M. (1957): Technical change and the aggregate production function. In The Review of Economics and Statistics 39 (3), p. 312. Stetter, Christian; Mennig, Philipp; Sauer, Johannes (2022): Using machine learning to identify heterogeneous impacts of Agri-environment schemes in the EU: A Case Study. In European Review of Agricultural Economics. Stiglitz, Joseph E. (2018): Where modern macroeconomics went wrong. In Oxford Review of Economic Policy 34 (1-2), pp. 70–106. Storm, Hugo; Baylis, Kathy; Heckelei, Thomas (2020): Machine learning in agricultural and applied economics. In European Review of Agricultural Economics 47 (3), pp. 849–892. Varian, Hal R. (2014): Big data: new tricks for econometrics. In Journal of Economic Perspectives 28 (2), pp. 3–28. Varian, Hal R. (2018).: Artificial intelligence, economics, and industrial organization. Wager, Stefan; Athey, Susan (2018): Estimation and inference of heterogeneous treatment effects using Random forests. In Journal of the American Statistical Association 113 (523), pp. 1228–1242. White, Halbert (1988): Economic prediction using neural networks: the case of IBM daily stock returns. In: IEEE 1988 International conference on neural networks, 451-458 vol.2. Wojtas, Maksymilian; Chen, Ke (2020): Feature importance ranking for deep learning. In Advances in Neural Information Processing Systems 33, pp. 5105–5114. Zhang, Ying; Ling, Chen (2018): A strategy to apply machine learning to small datasets in materials science. In NPJ Computatoinal Materials 4 (1), pp. 1–8. 33 CHAPTER 2 CONTAGION AND SPILLOVERS DURING THE EUROZONE CRISIS: AN EMPIRICAL STUDY USING CONVOLUTIONAL NEURAL NETWORKS 2.1 Introduction The eurozone crisis that started in Greece since 2010 had the most substantial negative impact on Europe’s politics and economies ever since the World War II. The crisis quickly spread to Ireland and Portugal, then Spain, Italy and Cyprus. Five out of the 6 countries had to seek for external bailout because of their sovereigns’ unsustainable fiscal condition. Italy, the only exception, was close to the brink and had the fourth highest public debt in the world in 2013, at 133 percent of its GDP5. During the turmoil, not only the periphery countries were caught in fire, core countries such as Austria, Belgium and France were also enduring high government bond yields and volatility. The eurozone crisis resulted in significant economic and social costs to the whole European Union. Each of the aforementioned 6 countries had its unique crisis features. For Greece, the root of the crisis was the persistent fiscal irresponsibility and lack of reform in its government. For Ireland and Spain, it started with the housing bubble, which led to the banking crises, and subsequently their governments went down while trying to save the banks. Portugal and Italy struggled with a decade of loss of productivity and competitiveness, ending up with large public debts and current account deficits. Cyprus, as a small economy, endured tremendous loss 5 IMF, World Economic Outlook (April 2016). 34 because the Cypriot banks hold too much Greek assets and failed to withstand the waves of the Greek sovereign crisis6. This chapter attempts to study the contagion and spillovers during the 2010–2013 eurozone crises using a novel machine learning (ML) approach. Why is it important to study the contagion and spillovers in the eurozone? The question can be answered threefold. First, it is the first time in modern history to witness sovereign crises breaking out consecutively in so many countries within a monetary union. Under a single currency, member states of the eurozone are unable to shield the economies using independent monetary policies like a standalone country. It is important to investigate the contagion and spillovers among different economies to facilitate structural reform and policy implementation in the eurozone and the broader European Union. Second, the eurozone crisis has great impact on the regional economic and monetary union in Europe, which also affects the globalization and economic integration of other regions. The depth of regional cooperation should be examined carefully to avoid possible channels of crisis transmission. Third, if contagion and spillovers exist, understanding the source of such linkage among crisis countries can provide valuable lessons for crisis management and macro- prudential regulation. To study the contagion and spillovers during the eurozone crisis, there are four critical questions that need to be answered. First, are there contagion and spillovers during the eurozone crisis? Second, if the contagion and spillovers exist, is Greece the origin of all crises in the eurozone and how do the crisis countries affect each other? Third, are there contagion and spillovers from the crisis countries to the non-crisis countries inside the eurozone? Fourth, are there contagion and spillovers from the eurozone countries to other countries in the European Union? These four questions are the core research topics in the existing literature on eurozone sovereign crises, but the answers differ to a great extent. Some studies find no evidence that 6 See Baldwin and Giavazzi (2015) for a broad overview of the eurozone crisis. 35 Greece is the main crisis contributor (e.g., Bhanot et al., 2012; Caporin et al., 2018), while others attribute the regional disturbance fully to Greece (e.g., Arghyrou and Kontonikas, 2012; Missio and Watzka, 2011). Three factors may contribute to the divergence in the empirical findings. The first is data source. Some studies use only credit default swaps (CDS) data, some use CDS and country specific variables, and others use CDS and sovereign bond market data. The second is the different modeling strategies. The commonly used models include the vector autoregression and dynamic factor model. The third is the different definitions of contagion and spillovers used by the researchers. Understanding the difference in the definitions of contagion and spillovers is fundamental to this line of work. Almost every study on contagion and spillovers starts with a definition (so does this one), and the definitions are usually defined to fit the specific modeling strategy of the study, contributing to the differences in empirical results. Existing literature has multiple meanings for contagion and spillovers. Here are three definitions from three seminal papers. Masson (1998) denotes contagion in the context of multiple equilibria, “changes in expectations that are not related to changes in a country's macroeconomic fundamentals” can lead to crisis in a country, the contagion is operated through the change of expectations. Spillovers refer to the interdependence among countries, especially the linkage of macroeconomic fundamentals. Dornbusch (2000) defines contagion as “spread of market disturbances—mostly on the downside—from one country to the other, a process observed through co-movements in exchange rates, stock prices, sovereign spreads, and capital flows.” Spillovers means the co- movements among markets; it may be a result from contagion or from normal interdependence. Kaminsky et al. (2003) define contagion “as an episode in which there are significant immediate effects in a number of countries following an event”, spillovers are the gradual cases that have “protracted effects that may cumulatively have major economic consequences.” These authors’ definitions vary and there is hardly a consensus (see Rigobon (2019) for a review). 36 Specifically, for the eurozone crisis literature, Broto and Pérez-Quirós (2015) document the different definitions and categorize them into three groups. The first group identifies the increased bivariate correlation between country pairs during the stress periods as contagion. The second group, after controlling for common fundamentals, identifies large shock transmission detection during the stress period as contagion. The third group usually identifies lagged volatility spillovers or Granger causality as contagion. Broto and Pérez-Quirós (2015) specify the difference between contagion and spillovers such that “spillovers, which is the lagged transmission of a shock, whereas contagion is simultaneous in nature.” This difference between contagion and spillovers is consistent with Kaminsky et al. (2003) and many others (Mink and Haan, 2013; Bruyckere et al., 2013; Claeys and Vašíček, 2014). In this study, our definition is in line with Kaminsky et al. (2003) and Broto and Pérez-Quirós (2015), the term “contagion and spillovers” includes both simultaneous contagion and lagged spillovers, observed through co-movements in the CDS market, characterized as pairwise cause- effect relationships. While this definition is consistent with previous literature, it is also drawn from our analytical model, which is a deep learning method called Temporal Causal Discovery Framework (TCDF). The results of the TCDF provide answers to the four critical questions mentioned earlier. First, there are contagion and spillovers during the eurozone crisis. Second, Greece is by no means the black sheep of all the eurozone sovereign crises. Greece has a great impact on Cyprus, causing its banking and sovereign crisis, but Greece is not the cause for Ireland, Italy, Portugal and Spain. There is a chain of cause and effect from Ireland to Italy, then to Spain, and there is a confounding factor between Ireland and Portugal. Third, there are spillovers from the crisis countries to the non-crisis countries. Fourth, there is evidence of spillovers from the crisis countries to European Union member state outside of the eurozone. This study contributes to the existing literature in three ways. First, to the author’s knowledge, this is among the first studies that use ML methods to investigate the contagion and 37 spillovers during the eurozone crisis. Second, with the advance of causal machines in the ML community, this paper goes beyond ML’s ability to solve the prediction problem. We use novel causal structure learning in economic research. Third, by adopting the TCDF, we are able to analyze the contagion and spillovers at a larger scale and in higher dimension where the results provide richer information than previous studies. Our results not only agree with some prior works, but also add new findings to the repository, which also confirm the current consensus on the causes of the eurozone crisis. Before going into the details, we want to specify some terms used in this study. We use “crisis countries” to refer to Cyprus, Greece, Ireland, Portugal, and Spain. These five countries joined bailout programs provided by outside parties, such as the International Monetary Fund (IMF), European Stability Mechanism. The bailout programs signify their domestic sovereign crises during the period from May 2010 to March 2013. “Periphery eurozone countries” refers to Cyprus. Greece, Ireland, Italy, Portugal and Spain. This definition reflects their political and geographical status in the eurozone (Bartlett and Prica, 2016). Italy did not receive a bailout program, so it is not in the crisis country group. The six “periphery eurozone countries” are the ones that suffer the most during the eurozone crisis (Baldwin et al., 2015). Table 2.1. Country group in the EU in 2016, adapted from Bartlett and Prica (2016) Country group Countries Core countries within the eurozone and the Austria, Belgium, Finland, France, EU (Inner Core) Germany, Netherlands Core countries outside the eurozone, within Czech Republic, Denmark, Estonia, Latvia, the EU (Outer Core) Lithuania, Poland, Slovakia, Sweden, the United Kingdom Periphery countries within the eurozone and Cyprus, Greece, Ireland, Italy, Portugal, the EU (periphery eurozone countries) Spain, Crisis countries Cyprus, Greece, Ireland, Portugal, Spain 38 “Core countries” are in the inner circle of the European continent, they are also the ones that receive lesser impact during the eurozone crisis. The inner core and outer core are divided by their participation in the European Economic and Monetary Union. Table 2.1 shows the country groups within the EU. This chapter proceeds as follows. Section 2.2 introduces the background of the eurozone crisis. Section 2.3 reviews the current literature on contagion and spillovers during the eurozone crisis. Section 2.4 presents the CDS data used in this paper. Section 2.5 describes the TCDF model and the vector autoregression model. Section 2.6 presents the results of the models. Section 2.7 discusses the findings. Section 2.8 concludes. 2.2 Background This section intends to provide an overview of the eurozone crisis. There are two parts. First, we discuss the institutional setup of the eurozone and the buildup of imbalances in each country. Second, we try to identify the causes of the crises and discern their distinct nature. 2.2.1 The root and buildup of the crises In retrospect, just like every crisis, the root of the eurozone crisis is the disorderly unwinding of economic imbalances (Wanna et al., 2015). The imbalances lead to insupportable public and private debts and foreign lending, and eventually, to crisis. The eurozone countries who suffered the most are the ones who borrowed the most (Baldwin et al., 2015). The source of such imbalances can be traced back to the deficient design of the European Economic and Monetary Union (EMU). A major reason for the imbalances’ buildup is that the imbalances are conceived as a positive sign of financial market integration by the EMU (Peet and La Guardia, 2014). 39 Inside the eurozone, member states surrender the flexibility to wield their own monetary policy, but they are not compensated with sufficient protection against financial crisis from the European Union (EU). EU has insufficient institutions for monetary integration and it is lack of a response mechanism (Frankel, 2015). For example, the no-bailout clause (Article 125 TFEU7) forbids the European Central Bank (ECB) to lend money to member state in times of sovereign insolvency. The structural insufficiency of the EU, coupled with the favorable credit condition before the 2008 subprime crisis, are the main reasons for the large imbalances. Going back in time, the EU and the eurozone all started in 1992 with the Maastricht Treaty, which promised an integrated and cooperative Europe for the European countries. The Treaty put forward the convergence criteria8 for those planning to join the EU, among the criteria was the famous fiscal requirement of a less than 3% government deficit to GDP ratio and a less than 60% government debt to GDP ratio. With more countries meeting the criteria, the Stability and Growth Pact was signed in 1997 in the hope for an optimum currency area. EMU began in 1999 with 11 countries, and the euro was launched in 20029. The creation of the single currency brought faith in stability and growth, and led to convergence of interest rate among members of the eurozone. Figure. 2.1 shows the interest rate of 10-year government bonds for European countries (except Cyprus) from 1990 to 2007. Before 1999, large heterogeneities existed among the countries, the market asked high returns for countries with weaker fiscal fundamentals, such as Italy, Spain and Portugal. Core countries like Germany, France and Belgium paid lower rate, reflecting promising market prospect. Since 1999, heterogeneities in macroeconomic status and fiscal policies of different member states were projected into a homogenous interest rate. Italy, whose rate was 14% in 1990, dropped to 3% in 2001. German rate dropped from 8% to 4%. All countries could access the market at a low rate since 2001. 7 http://data.europa.eu/eli/treaty/tfeu_2016/art_125/oj 8 https://www.ecb.europa.eu/ecb/orga/escb/html/convergence-criteria.en.html 9 https://europa.eu/european-union/about-eu/euro/history-and-purpose-euro_en 40 Figure 2.1. 10-year government bond yields (%) of European countries, source: OECD.stat. Besides the introduction of the euro, another reason for the interest rate convergence is the “global savings glut”, a term coined by Ben Bernanke. It described the situation in which the savings from developing countries (mainly China) were injected into the developed countries such as the U.S. and the eurozone countries10. With the global economic prosperity in the early 2000s, the periphery eurozone countries enjoyed several years of favorable credit conditions just like their core neighbors. In Figure 2.1, the only exception is the U.K., an outsider of the euro area. U.K.’s rate rose above the euro area average rate since 2003, which reflected the true market outlook of U.K.’s fiscal conditions. The unified interest rate between the periphery and the core countries covered up the divergence of their macroeconomic fundamentals. In loose credit conditions, government tend to run sustained budget deficits that are financed by global investors (Cripps et al., 2011), the non-bank private sector becomes euphoric about the economic outlook and over-borrow because of the overly optimistic implicit signal about macroeconomic developments sent by the favorable credit conditions (McKinnon and Pill, 1998). Over-borrowing happens when the lending decisions of foreign financial institutions are 10 https://www.federalreserve.gov/boarddocs/speeches/2005/200503102/ 41 “guided by rough indicators of the country’s macroeconomic performance and not by careful assessment of individual borrowers’ abilities to repay” (Uribe, 2006). Over-borrowing in both the public and private sector of the periphery eurozone countries resulted in large imbalances. The imbalances would have been caught if they happened in a standalone country, but the prosperity of a unified monetary and economic region overshadowed the vulnerability of the member states. There are three components of the imbalances. The first component is the over-accumulation of government debt. The Maastricht Treaty’s threshold of a 60% debt to GDP ratio was clearly not enforced across the eurozone. Figure 2.2 shows the debt to GDP ratios of European countries from 1995 to 2007. The ratios rose above 60% in many countries, including France and Germany. Greece had an upward-sloping curve, with the ratio stood at 104% in 2007 and subsequently rose to 147.5% in 2010. On the left panel, Belgium had an abnormally high debt, which was on a par with Greece and Italy. Germany and France, rose above 60% since 2003, and their budget deficits also surpassed the 3% threshold set by the Maastricht Treaty. However, these two core members of the eurozone successfully evaded any official sanctions and revised the Safety and Growth Pact in favor of themselves in 2005 (Peet and La Guardia, 2014). Since then, the EU bodies no longer had the restraining power over sovereigns’ fiscal conditions. Countries with weak fiscal fundamentals were spared from controlling their budget deficits and instituting necessary structural reforms. As a result, some eurozone countries had accumulated too much debt compared to their sustainable rate of productivity growth. High level of government debts does not suggest an inevitable sovereign crisis. Countries like Belgium and Italy had a government debt of about 100% of their GDP, yet they stayed solvent throughout the eurozone crisis. Ireland and Spain, whose debt to GDP ratios were less than 40%, ended up with bailout programs (for Ireland and Spain, the housing bubble in the private sector had driven up the tax revenues, thus reducing the public debt, see Whelan (2014) for a review on Ireland). 42 Figure 2.2. Government debt to GDP ratio. Left panel: non-crisis country. Right panel: periphery eurozone countries. Source: IMF Global Debt Database. Figure 2.3 Left: Bank assets to GDP ratio in current prices from 1999 to 2009. Right: Private debt to GDP ratio in current prices from 1999 to 2011. Source: OECD.Stat. 43 The second component is the large private debts in almost all eurozone countries. In Figure 2.3, the left panel shows the banks’ total asset to GDP ratio in current prices. Ireland stood out with an astounding percent at 588% of its GDP in 2008. Finland, who had the lowest ratio of all, had a stable and healthy percent around 120%. Italy and Spain were on better terms, following the same trend as most of their northern core neighbors. Due to data availability, Cyprus, Greece and Portugal are not included in the left panel. The right panel of Figure 2.3 shows the private debt to GDP ratio in current prices. Ireland had the highest private debt among all countries, it reached a level of 374.5% in 2011, this is because of the pre-crisis housing bubble (Whelan, 2014). Spain, who also had a pre-crisis housing bubble, had the same issue with large private debt, it reached a percent of 265.3% in 2011. Portugal and the Netherlands had large private debts as well. At the bottom of the right panel, Greece had the smallest private debt to GDP ratio in all European countries. The third component is the current account imbalances that built up inside the eurozone from 2000 to 2008. The current account deficits are a sign of danger for a balance of payment crisis in a standalone country. However, the European Commission never imagined the possibility of a balance of payment crisis inside the eurozone because of the single currency. The large current account deficits in the periphery eurozone countries were seen as “benign reflections of optimizing capital flows, instead of warning signals” (Frankel, 2015). Those current account imbalances were treated as harmless heterogeneity as in federal states in the U.S. However, the European Commission was not entitled to the equivalent administrative power and influence to its member states the same way as the U.S. had over its federal states. Besides the European Commission, other international organizations such as the OECD and the IMF also failed to send the warning signal of unsustainable trade imbalances and loss of competitiveness (Honkapohja, 2014). Eventually, the periphery eurozone countries built up large amounts of current account deficits in the early 2000s. If such imbalances are detected in a standalone 44 country, the large current account deficits would surely trigger the market response, as they are the signal of a loss of competitiveness. Figure 2.4. Current account balances in billions of U.S. dollars of the European countries from 1995 to 2007 in current prices. The left panel is the periphery eurozone countries, the right panel is the non-crisis countries. Source: IMF World Economic Outlook. Figure 2.4 shows the current account balances in billions of U.S. dollars of the European countries from 1995 to 2007 in current prices. Periphery eurozone countries are on the left panel, non-crisis countries are on the right panel. Since 2000, Greece, Ireland, Italy, Portugal and Spain all dropped into large external deficits, while Cyprus seemed to maintain its balances. Spain had the largest external deficits in the eurozone, at 139 billion dollars in 2007. On the right panel, the U.K. ran an external deficit since 1998. However, given the size of its economy, the deficits only accounted for 3% of its GDP (see Figure 2.5). All other countries had external surplus. Figure 2.5 shows the current account balances as a percent of GDP of the European countries. On the left panel, Greece had a large external deficit equaling to 14% of its GDP in 2007. 45 Portugal and Spain had a deficit around 10% of its GDP. Cyprus also built up an external deficit, which was 10.7% of its GDP. The size of the Cypriot economy was small, so the deficit was not obvious in levels in Figure 2.4. All six periphery eurozone countries had large current account deficits, five of them (Greece, Ireland, Portugal, Spain and Cyprus) had a sovereign crisis. Figure 2.5. Current account balances as a percent of GDP of the European countries from 1995 to 2007. The left panel is the periphery eurozone countries, the right panel is the non-crisis countries. Source: IMF World Economic Outlook. Figure 2.6 shows the current account balances in billions of dollars for the euro area and the EU from 1997 to 2016. The current account for the eurozone was above zero from 2001 to 2007, it dropped below zero in 2008 because of the subprime crisis, then it went up again in 2009 and kept growing since then. As a whole, the eurozone’s current account was in balance before and after the crisis (Baldwin et al., 2015). The deficits countries had been importing from their rich neighbors inside the eurozone, mostly from Germany (see Figure 2.4). Hobza and Zeugner (2014) study the buildup of euro area imbalances and show that the current account deficits were linked through intro-eurozone financial inflows rather than trade flow, Germany was the 46 main driver but France played a crucial role for capital flows into the periphery countries as well. Figure 2.6. Current account balances of the euro area and the European Union from 1997 to 2016, in billions of dollars. Source: IMF World Economic Outlook. In the periphery eurozone countries, the foreign capital inflows mostly went to the non- tradable sectors such as real estate and construction (Chen et al., 2013; Hobza and Zeugner, 2014), those sectors could not generate the growth and innovation that was essential to a country’s productivity. What was even worse was that, the incoming investment drove up the housing price and factor prices, leading to a weak position in price competitiveness. The consequence of a decade of external deficits was the loss of productivity and competitiveness. To sum up, the insufficient architecture and mismanagement of the EMU, combined with the loose credit condition before 2007, had allowed large internal and external deficits to build up in the periphery eurozone countries. The deficits were funded by foreign capital inflows (Baldwin et al., 2015; Croci et al., 2016). The EU bodies and the soon-to-be crisis countries turned a blind eye to the deteriorating fiscal fundamentals and loss of competitiveness, that eventually developed into a gap that was too wide to close. 47 2.2.2 The causes and nature of the crises This section looks into the causes and the nature of the crises. First, we will look at the three causes for the crises. The first is the 2008-2009 global financial crisis; the second is the large banks and their close link to the sovereigns; the third is the loss of confidence in countries with high public debt. The insufficient and irresolute response of the EMU served as the catalyzer during the crises. Then, we will discuss the nature of crisis in each periphery eurozone country. The first cause of the eurozone crises is the 2008-2009 global financial crisis. The global financial crisis triggered a series of shocks to the global capital market, causing sudden stop of cross border capital inflow all over the global. The six periphery eurozone countries all had either extensive public debts or private debts with a current account deficit, and foreign capital inflows were vital to sustain their debts. When sudden stops happened to the capital inflow, the imbalances could no longer sustain, triggering market response of a sharp decrease in bond prices (Constâncio, 2013). The periphery eurozone countries could not devalue their currency nor asked the central banks to bail out the government like a standalone country (Eichengreen and Gupta, 2018). Moreover, the eurozone countries did not have a lender of last resort, because the ECB was forbidden to bail out member states by EU Treaties. Grauwe and Ji (2013) points out that, when facing sudden stops, eurozone countries were more prone to self-fulfilling crisis than standalone countries. Taking Greece as an example, the impact of the 2008-2009 global financial crisis on Greece was substantial. Two major Greek industries, shipping and tourism, fell victim to the global recession. The Greek economy contracted and so did the government tax revenue (Peet and La Guardia, 2014). Greece accumulated a 110.3% government debt to GDP ratio (highest in the eurozone), and a 7.1% government deficit to GDP ratio in 2008. Without foreign capital inflows, the huge public debts could collapse at any time. All the six periphery eurozone countries had 48 large debts funded by foreign capital inflows, the 2008 global recession revealed the high risk of such debts. The second cause of the eurozone crisis is the large banks and their debt holdings of their home country’s bonds. The European banks were too large before 2007. Veron (2007) argues that the banks in Europe had grown too large relative to its home country’s GDP since the European integration. The “pan-European banks” spanned their services across the EU, but the bank supervision was only kept at the national level. The monetary policy was controlled by the EU-level bodies; however, no EU-level body had the authority to cope with large-scale bank failure. The author then warns that “faulty cross-border coordination and diverging national views could seriously hamper the ability of the authorities to respond speedily and effectively to an unfolding financial crisis.” This was exactly what happened in Ireland. The Irish banks’ total assets to GDP ratio was 588% in 2008 (see Figure 2.3), far exceeding other eurozone countries. When Irish banks were facing an imminent banking crisis, the sheer size of the banks required the Irish government to save them to avoid further economic disturbance, but the 588% ratio also suggested that the Irish government could not afford to save it. The European banks tend to hold a relatively large share of sovereign debts of their home country. Table 2.2 shows a breakdown by sector of holdings of marketable debt for the European countries and the U.S. in 2007 and 2011 (from Merler and Pisani-Ferry, 2012). Before the subprime crisis, the U.K. and U.S. held -1.6% and 1.4% of their own debt in 2007. In contrast, the continental European banks had a much larger share of their home country’s debts. German and Spanish banks held above 20%, Greece, Italy and Portugal held about 10%. Ireland was the only exception, with just 2.6%. Home bias is a measure that reflects to what extent a bank’s share of domestic government debt exceeds the averages share of other countries’ government debt (Horváth et al., 2015). Many studies have documented a large home bias in European banks’ debt holding before the eurozone crisis (e.g., Horváth et al., 2015; Saka, 2020). 49 Table 2.2. Breakdown by sector of holdings of marketable debt, 2007 and 2011 (billions of national currency and, in parentheses, percent of total stock), modified from Merler and Pisani-Ferry (2012) Domestic Central Other public Other Nonresident Total banks bank institutions residents (excl. ECB) Greece 23.9 3.2 25.4 6.5 166.1 225.1 (10.6) (1.4) (11.3) (2.9) (73.8) Ireland 0.8 n/a 0.1 1.2 28.8 30.9 (2.6) (0.3) (3.95) (93.1) Portugal 10.6 0.0 n/a 17.3 87.7 115.6 (9.1) (0.0) (15.0) (75.9) Italy 159.9 60.3 n/a 450.7 647.1 1317.9 (12.1) (4.6) (34.2) (49.1) Spain 74.3 9.2 26.5 73.3 166.7 349.9 (21.2) (2.6) (7.6) (20.9) (47.7) Germany 456.9 4.4 0.5 317.1 761.5 1540.4 (29.7) (0.3) (0.03) (20.6) (49.4) France 83.3 n/a n/a 205.0 352.4 640.7 (13.0) (32.0) (55.0) Netherlands 18.7 n/a 0.9 44.7 144.6 209.0 (8.9) (0.4) (21.4) (69.2) U.K. -7.9 2.4 0.8 337.3 160.2 492.8 (-1.6) (0.5) (0.2) (68.5) (32.5) U.S. 129.8 754.6 4616.5 1375.1 2353.2 9229.2 (1.4) (8.2) (50.0) (14.9) (25.5) Source: Merler and Pisani-Ferry (2012) This interdependence of banks’ holding in home country’s debt can create a “negative feedback loop” between the distressed banks and the distressed sovereign (Angelini et al., 2014; Saka, 2020). When the market loses faith in a sovereign’s solvency, their banks, who have heavily invested in their own sovereign’s bonds, are also caught in the fire. The underperforming banks depress the economy, thus depressed the already distressed sovereign’s budget situation. The viscous cycle of deteriorating bank and sovereign can aggravate an ongoing crisis. The third cause of the eurozone crisis is the loss of confidence in countries with high public debt and weak fundamentals. The market started to lose faith as the rating agencies (e.g., Moody, S&P) downgraded the sovereign bonds of the periphery eurozone countries, for example, S&P downgraded the Greek 10-year sovereign bond from A in 2009 to CCC in 2011. Many papers 50 study the influence of market assessments during the eurozone crisis. Santis (2012) shows a strong link between the rating downgrade in Greece and bond yields in other eurozone countries with weak fundamentals: Ireland, Portugal, Italy, Spain, Belgium and France. Grauwe and Ji (2013) find that the surges in the sovereign bond yields are “associated with negative self- fulfilling market sentiments that became very strong since the end of 2010”. The European Commission’s inadequate crisis management also aggravated the market outlook. The leaders of the European Commission lacked the strength and the resolution to solve the Greek crisis. For example, France and Germany signed the Deauville Agreement amid the Greek crisis in October 2010 (in their own favor). This agreement served three purposes. First, it delayed sanctions on countries whose debt to GDP ratio was above 60%. Second, it required private-sector involvement in future bailouts, meaning larger debt write-downs for private investors. Third, countries amid a crisis would lose the right to vote in the EU Councils. Agreement like this symbolized the divides between the core countries and periphery countries. The discords in the eurozone exacerbated the economic outlook of the indebted countries and drove investors further away. The above discussion concludes the three causes for the eurozone crisis. The following part will explain the nature and development of crisis in each country. Though the five crisis countries all received bailout loans, one should not consider the eurozone crisis as a pure sovereign debt crisis. Its nature is more akin to a balance of payments crisis (Higgins and Klitgaard, 2014). To help understand the crisis development, in Appendix A, Table A.2 shows a timeline of the eurozone crisis major events. Greece had the highest level of public debt of all eurozone countries before the onset of crisis, its government debt to GDP ratio was 127.8% in 2009. On top of that, the political instability added fuel to the fire. In November 2009, the newly elected government accused their predecessor of “fabricating” the budget deficits, that the true deficits to GDP ratio was 12.7%, 51 almost twice as their predecessor’s claim (Peet and La Guardia, 2014). This shocking revelation led to the downgrading of Greek bonds and soaring bond yields; the huge government debts could no longer be refinanced. The debt to GDP ratio amounted to 147.5% in 2010. Bond yield went from 6% in 2009 to an astounding 29% in 2012. The struck of the 2008-2009 global financial crisis, combined with the revelation of its true debt condition, placed Greece in a classical sovereign debt crisis. The government could no longer stay solvent on its own. At the early stage, the European Commission insisted on the “no bailout clause” and denied the possibility of asking IMF for help (Frankel 2015). These decisions inevitably exacerbated the situation. Speculations about a “Greek exit” emerged, under such pressure, in May 2010, the European Commission, ECB and the IMF (the three entities are named “the Troika”), issued a €110 billion euro bailout loan to Greece, which required the Greek government to implement austerity measures and structural reforms11. This marked the beginning of a series of crises in the eurozone. Ireland, whose government debt was below the required 60% threshold, had a different story. Ireland had the largest banks’ total asset to GDP ratio among all European countries (588% in 2008). It endured a housing bubble in the real estate market that was created by the loose credit conditions prior to the U.S. subprime crisis in 2008 (Whelan, 2014). When the bubble started to burst, the highly leveraged Irish banks started to encounter liquidity risk. The endangered banks were so large (pan-European) that the government must save them to avoid more severe consequences. The EU level authorities who had control over monetary policy failed to contain the emerging banking crisis (due to the lack of proper response mechanism, see Lane (2011) for a discussion). The Irish government, who was unable to maneuver any monetary policy, became insolvent themselves while saving the banks. Ireland had to receive its first bailout loan in 11 See IMF website for bailout program details: https://www.imf.org/en/News/Articles/2015/09/28/04/53/socar050210a 52 November 2010 (six months after Greece). Ireland only had a 42.4% government debt to GDP ratio in 2008, which was the second smallest of all eurozone countries. Ireland did not share the same fiscal distress as Greece. The nature of the Irish crisis was a banking crisis triggered by the global recession, then the banking crisis caused the sovereign crisis. Portugal had long-term slow growth and loss of competitiveness coming into the crisis. It had built up large current account deficits (11.8% of its GDP in 2008). Portugal also had weak fiscal fundamentals, its government debts to GDP ratio had risen from 75.6% in 2008 to 100.2% in 2010. When its neighbors Greece and Ireland fell into the abyss of sovereign insolvency, concerns of the Portuguese sovereign had triggered sudden stops of the capital inflows. As the market sentiment of the Portuguese government bonds shifted, the Portuguese banks faced a backlash of holding too much home country bonds (9.1% of all Portuguese bonds were held by domestic banks). This had resulted in a typical negative feedback loop situation (Angelini et al., 2014). Portugal received a bailout loan in May 2011, and its situation was akin to a non-typical balance of payment crisis with sudden stops of capital inflows, joined by a negative feedback loop between the banks and the sovereign. Spain had the highest level of current account deficits among all eurozone countries, measured at 145,274 billion dollars in 2008. The prolonged huge imbalance showed a loss of competitiveness of the Spanish economy (Baldwin et al., 2015). Similar to Ireland, Spain endured a housing bubble before the crisis, a large portion of banks’ loans went to mortgage (33% of total loans in 2007) and construction (8% of total loans) that fueled the bubble (Quaglia and Royo, 2015). The banks also held too much Spanish sovereign bonds (21.1% of total Spanish bonds in 2007). When the sudden stops of capital inflows struck, the housing bubble started to burst. The negative market sentiment of the struggling neighboring countries also aggravated the economic outlook for Spain. The banks’ large holding of the Spanish bonds also led to a negative feedback loop. The Spanish banks, received a bailout loan through the Spanish 53 government from the Troika in June 2012. The Spanish government stayed solvent throughout. Luckily, Spain had smaller banks, Spanish banks were of a similar size to core countries such as Germany and France, with a total asset to GDP ratio of 189.7% in 2008. It also had a healthy government debt to GDP ratio (39.7% in 2008). The Spanish banking crisis was not a fiscal matter; the culprit was the current account deficits (loss of competitiveness) and the banking sector. Cyprus, the smallest one in the periphery eurozone countries (0.2% of the eurozone economy), was almost impossible to remain intact while all of its neighbors were experiencing an economic downturn. The Cyprus crisis started with the banking sector; Cypriot banks held excessive Greek assets. Zenios (2013) measures that the Cypriot banks face a “total exposure to Greek loans and sovereign debt worth 160% of GDP” in 2011. When the Greek bond was downgraded by rating agencies in 2010, the Cypriot banks were severely impacted. When the Greece installed the private sector involvement (PSI) in 2011, the Cypriot banks endured a loss equivalent to 23.03% of its GDP (Zenios, 2013). The burdens became too heavy to bear for the banks and the Cypriot government, so they sought help from the Troika. Cyprus was granted a bailout loan in March 2013. The Cyprus crisis was the collateral damage of its underperforming neighbors. Their banks’ close linkage with the Greek economy served as the amplifier. Besides the banking sector, Cyprus had large private debts (118% of its GDP in 2010), and current account deficits (10.7% of GDP in 2010), but its public debts (55% of GDP in 2010) were well maintained, suggesting that the Cypriot crisis was certainly not a fiscal one. Italy had never joined a bailout program, but it had its share of the crisis experience. Being the third largest economy inside the eurozone, Italy held a large government debt (119% of GDP in 2009). Italy’s banking sector and the current account deficits were on a reasonable trajectory. But there is another layer of instability, which was Italy’s political weakness. In November 2011, the leadership change in the Italian government “marked the economic and political crisis 54 in Italy” (Romano, 2021). Orsi (2013) explains that, given the adverse fiscal and political environment, “the Italian state went bankrupt in summer 2011.” Though the Italian government never joined a bailout program, it benefited hugely from several EU programs. In August 2011, the ECB, started to purchase Italian and Spanish bond from the market. In December 2011, the ECB launched the Long Term Repo Operations (LTRO) mechanism, which injected lots of liquidity into the Italian financial market. These measures saved Italy from the fate of insolvency. Though Italy did not receive any bailout loans, Di Quirico (2010), Romano (2011) and other authors consider Italy as a crisis country, just as Greece. In 2012, the European Commission and the ECB had taken measures to rescue the sinking euro area. The turning point was December 2012, when the ECB president Mario Draghi made the famous speech that “the ECB is ready to do whatever it takes to preserve the euro”. The speech turned the negative market beliefs into positive. Since then, the collective actions of the crisis countries, EU authorities, and international organizations helped to save the eurozone countries. From the above discussion, we can see that all the six periphery eurozone countries had large current account deficits, they all share the problem of a loss of productivity and competitiveness. The crises resemble a typical balance of payment crisis in a standalone country, which is unimaginable for countries in a currency union. It is natural to raise the question of how those crises link to each other? The next section gives a review of the existing literature on the contagion and spillovers during the eurozone crisis. 2.3 Related literatures The primary focus of this paper is to use ML method to study the interdependences among the European countries during the eurozone sovereign crises. The most commonly used data to 55 study the sovereign crisis is the credit default swap (CDS) of sovereign bonds. This paper uses the daily spreads of CDS of the 5-year sovereign bond. The daily dataset can provide ample data points for a ML setup. Other than the CDS data, the existing literature also uses stock market price, government bond yield, and other macroeconomic data. In order to compare the empirical findings, this literature review only focuses on the studies that use the CDS data. Previous studies mainly use two approaches to study the contagion and spillovers, one is to measure the contagion and spillovers among different countries (e.g., Alter and Beyer, 2014; Glover and Richards-Shubik, 2014). The other is to study the determinants of the CDS (e.g., Beirne and Fratzscher, 2013; Arghyrou and Kontonikas, 2012). Common methodologies include ordinary least squares (OLS), vector autoregression (VAR), event studies, autoregressive conditional heteroskedasticity model (ARCH), nonlinear regressions, etc. Their findings can also be put into two groups, by whether contagion and spillovers are detected. The first group of researchers finds no contagion and spillovers. Caporin et al. (2018) use standard quantile regression and Bayesian quantile regression with heteroskedasticity to study the sovereign risk shift-contagion in major eurozone countries. They find almost no presence of shift-contagion in their sample periods. There is no correlation between risk spillover and the sign of the shock (Greek crisis), meaning that there is little contagion, even during the most intense crisis period. Bhanot et al. (2012) analyze the relations in the CDS spreads using ARCH that include time-varying volatilities and changes in fundamentals, their results show that there is no conditional correlation in the CDS spreads between Greece and PIIGS (Portugal, Ireland, Italy, and Spain). They find no evidence of contagion from Greece to PIIGS, nor to the eurozone core countries. Glover and Richards-Shubik (2014) use the CDS data and macroeconomic data to estimate a network model of credit risk to measure market expectations of the spillovers of sovereign defaults. After the model is calibrated, it is used to conduct simulation for the short-run effect 56 of a default. Their results show only tiny spillovers to other sovereigns, that each $1 of debt directly lost in default is linked with an expected loss of 2 cents from additional defaults in other countries. Koutmos (2018) employs a VAR model to study the dynamic interdependencies of CDS spread among several EU countries between October 2004 and July 2016. The author discovers that the interdependencies vary across different periods of time, but there is no empirical evidence to show that Greece has transmitted the sovereign crisis to other countries. Beirne and Fratzscher (2013) use a comprehensive linear model to study the drivers of sovereign risk (CDS spread) for 31 countries during the eurozone crisis, they found that the country specific variables on macro fundamentals play the biggest role, whereas the regional contagion and spillovers are inconsequential, even for the eurozone countries. The second group of researchers detects certain levels of contagion and spillovers during the eurozone crisis. Arghyrou and Kontonikas (2012) apply a convergence-trade model to study the pricing behaviors of CDS spreads. They discover that most eurozone countries have experienced contagion from Greece during the crisis period. Bampinas et al. (2020) investigate both the sovereign bond market and the CDS market to study cross-border and intra-market linkage in the eurozone countries from 2006 to 2018, they adopt the excess correlation concept of Bekaert et al. (2005) and use the local Gaussian correlation approach to study the contagion. Their results show that contagion occurs during different periods, in particular, Italian and Spanish CDS spreads spill towards all European CDS spreads around November 2011. Broto and Pérez-Quirós (2015) use a dynamic factor model to decompose the sovereign CDS spreads of ten OECD economies, they find three factors: a common factor, a factor driven by peripheral eurozone countries and a country specific factor. By utilizing the three factors in a novel methodology to characterize contagion, they discover that contagion has played a non- negligible role in the peripheral eurozone countries since the onset of the crisis. Buchholz and Tonzera (2016) use a multivariate GARCH model to study the sovereign CDS spreads of 17 57 countries from 2008 to 2012. They find strong evidence for both fundamentals and non- fundamentals based contagion among those countries. Gómez-Puig and Sosvilla-Rivero (2016) use the logit model to study whether the sovereign risk is transmitted through “pure” or “fundamentals-based contagion” during the eurozone crisis. Their findings confirm the coexistence of “pure” and “fundamentals-based contagion”. Kalbaska and Gątkowski (2012) study the CDS spreads of several European countries, they employ the EWMA correlation analysis and the Granger causality test, and find evidence of the contagion effect since August 2007. Missio and Watzka (2011) estimate a dynamic conditional correlation model to assess if contagion is identifiable during the eurozone crisis. Their findings confirm the existence of contagion in the euro area. There are other papers who use the government bond dataset to study the contagion and spillovers between the bond market and CDS market, most of them find evidence supporting the existence of contagion (e.g., Claeys and Vašíček, 2014; Croci et al., 2016; Cronin et al., 2016). Table 2.3 summarizes the related literatures. The second and third columns show the data and model used in the studies, the last four columns show their answers to the four questions. First, are there contagion and spillovers during the Eurozone crisis? Second, is Greece the origin of all the crises? Third, are there contagion and spillovers from the crisis countries to the non- crisis countries inside the eurozone? Fourth, are there contagion and spillovers from the eurozone countries to other countries in the European Union? The top panel shows the results of the first group of researchers, the bottom panel shows the results of the second group. The last row shows the results of this paper. Y stands for existence of contagion and spillovers, N stands for small or no contagion and spillovers, N/A means that this question has not been studied in the paper. 58 59 Table 2.3. Related studies on eurozone crisis contagion and spillovers Paper Data Model Q1 Q2 Q3 Q4 No contagion and spillovers Bhanot et al. (2012) Bond spreads and CDS spreads, daily Autoregressive conditional heteroskedasticity N N N N/A Beirne and Fratzscher (2013) Bond spreads, CDS spreads, and Comprehensive linear model N N N N macroeconomic data, quarterly Glover and Richards-Shubik CDS spreads and macroeconomic Financial network model N N N N (2014) data, quarterly Caporin et al. (2018) Bond spreads and CDS spreads, daily Standard and Bayesian quantile regression N N N N/A Koutmos (2018) CDS spreads, weekly Vector Autoregression Y N Y Y Contagion and spillovers exist Missio and Watzka (2011) Bond spreads, daily Dynamic conditional correlation models Y Y Y N/A Arghyrou and Kontonikas (2012) Bond spreads and macroeconomic Convergence-trade model Y Y Y N/A data, monthly Kalbaska and Gątkowski (2012) CDS spreads, weekly Exponentially Weighted Moving Average and Y Y N N Vector Autoregression Broto and Pérez-Quirós (2015) CDS spreads, weekly Dynamic factor model Y Y N N Buchholz and Tonzera (2016) CDS spreads and macroeconomic Multivariate Generalized Autoregressive Y Y Y Y data, daily Conditional Heteroskedasticity Gómez-Puig and Sosvilla (2016) Bond spreads, daily Logit model Y Y Y N/A Bampinas et al. (2020) Bond spreads and CDS spreads, daily Bootstrap test using local Gaussian correlation Y N Y N/A This paper CDS spreads, daily Temporal Causal Discovery Framework Y N Y Y In Table 2.3, there is an obvious divergence of opinions between the two groups of researchers. For the second question of whether Greece has triggered the crisis in other sovereigns, it is not surprising to see that the authors could not reach a consensus. As explained in section 2.2, the periphery eurozone countries have areas of commonality while going into the crisis (the high external debts and weak fundamentals), but the nature of their crises differs. The difference in the empirical findings has three reasons. First, the datasets are different. Though all the studies use the CDS data, some use the 5-year CDS while others use the 10-year CDS. The frequencies vary from daily, weekly, monthly and quarterly. The CDS data can be the market closing price, or a composite index deduced from prices in several markets. Other than the CDS data, some studies also use macroeconomics data to control for country specific factors, some use the sovereign bond spreads and the CDS spreads to fit a model. Second, the modeling strategies are different, some use the model-driven approach to build structural models (e.g., Glover and Richards-Shubik, 2014; Arghyrou and Kontonikas, 2012), some use regression models (e.g., Caporin et al., 2018; Koutmos 2018; etc.). Third, there are nuances in the measurement of contagion and spillovers across the studies. Some use Granger causality, while some use their self-defined quantifiable measures. All the researchers use traditional econometric methods to study the underlining linkage among European countries. This is partially because of data availability of the macro fundamental variables, most of them are monthly or quarterly data. The low frequency data has limited observations during the crisis periods, making it unsuitable for ML methods. Also, the research question on contagion and spillovers is causal, which is not the expertise of ML. For those reasons, the existing literature only has few applications of ML methods in the eurozone crisis, but their focus is on prediction, rather than contagion and spillovers. With recent development of ML techniques in the causal discovery literature, one can employ the deep learning framework to discover the causal relationships in time series data. This paper employs the Temporal Causal Discovery Framework (TCDF), which uses the 60 convolutional neural network as the core algorithm, to study the causal relationships in the European CDS market. 2.4 Data description It is a common practice to use the sovereign CDS data to study contagion and spillovers during sovereign crises, both in academia and the press (Augustin, 2014). This paper uses the daily 5- year sovereign CDS spreads in 13 European countries from 3 October 2005 to 31 December 2015, totaling country-daily 2,670 observations. The daily CDS data contains only weekday prices, excluding weekends. The CDS data is collected from Markit and Bloomberg. Like most other studies, this paper uses the 5-year sovereign CDS spreads. The 5-year sovereign CDS is the most liquid among all maturities in the market, it has the largest number of transactions from which the daily CDS spreads can be deduced. The CDS spreads are based on the USD-denominated CDS contract (U.S. dollar is the standard currency in the CDS market). The 13 countries include the six periphery eurozone countries: Cyprus Greece, Ireland, Italy, Portugal and Spain; six core eurozone countries: Austria, Belgium, Finland, France, Germany, the Netherlands; and one country outside the eurozone, the United Kingdom. Because the emphasis of this paper is to study the contagion inside the eurozone, and also due to data continuity, other EU countries such Denmark, Sweden and Norway are not included in the analysis. There are some missing data for Greece in the period from March 2012 to June 2013. During this time, the Greek bond yields and default risk were so high that there were almost no CDS transactions in the market. The Markit CDS data was a composite price calculated from market prices, so there were not sufficient transactions to get CDS data for this period. To solve this problem, we use the Bloomberg CDS data on Greek sovereign CDS for the missing data points. 61 Though Bloomberg uses the mid-day quote of the CDS, while Markit uses a composite price, their Greek sovereign CDS data follows almost identical trends in other periods. To observe the dynamic interdependencies between CDS spreads among the EU countries, we split the data into 4 phases: pre-crisis, crisis buildup, euro area recession and post-crisis. In our analytical models (TCDF and VAR), there is no time varying measure for contagion and spillovers, we cannot monitor the spillover dynamics using the whole sample, splitting the data allows us to study the pattern of contagion and spillovers in different crisis periods. Many other papers have also taken this approach (e.g., Kalbaska and Gątkowski, 2012; Koutmos, 2018). The first phase is the pre-crisis period, with the start date of 3 October 2005. This is the earliest date of the data. Most of the literatures on subprime crisis use 9 August 2007 as the starting date for the subprime crisis (e.g., Longstaff, 2010). It is the day when BNP Paribas announced that it would freeze $2.2 billion worth of funds in the U.S. subprime mortgage market. Hence, we choose 31 July 2007 as the pre-crisis end date. During this period, worldwide economies enjoyed the prosperity of stable economic growth. The second phase is the crisis buildup period. The subprime crisis is one of the three causes of the eurozone crisis. Starting from 2007, countries like Greece, Ireland and Spain endured sudden stops of foreign capital inflows, the credit crunch put pressure on the banking sector and the sovereigns. Therefore, we pick the start date of the subprime crisis as the starting date of the eurozone crisis buildup period, that is 9 August 2007. The end date of this period is the day before the first Greek bailout, that is 30 April 2010. The third phase is the euro area recession period, the start date is 3 May 2010, when Greece received its first bailout loan. This bailout marked the beginning of the series of sovereign crises inside the eurozone. The end date of this recession period is 29 March 2013. We pick this date according to the European Commission’s economic forecast in Spring 201312. The start and end 12 https://ec.europa.eu/economy_finance/publications/european_economy/2013/ee2_en.htmt 62 dates of the eurozone crisis are consistent with the OECD recession indicators for the eurozone13. The fourth phase is the post-crisis recovery period. The start date is 1 April 2013, following the last day of the recession period. The European Commission claimed that the euro area had restored its economic vitality back to the 2008 level by the end of the 201514, so we pick the end date of 31 December 2015. Table 2.4. Crisis phases and data availability Time Phase Data availability 3 October 2005 ~ Pre-crisis 12 countries (excl. U.K.), 417 31 July 2007 observations. 1 August 2007 ~ Crisis buildup 12 countries (excl. U.K.), 718 30 April 2010 observations 3 May 2010 ~ Euro area recession 13 countries. 760 observations. 29 March 2013 1 April 2013 ~ Post-crisis 13 countries. 715 observations. 31 December 2015 Our data partition is consistent with the studies that split the data into different crisis periods, but the exact periods vary across studies. For instance, Kalbaska and Gątkowski (2012) choose August 2007 as the start date of the eurozone crisis, while we consider it as the start of crisis buildup period (phase 2). To better compare our results with other studies, we analyze our data not just on phase 2, but also phase 2 and 3 together. Table 2.4 shows the 4 phases with the data availability. For phase 1 and 2, the CDS data of the U.K. is sparse, therefore the U.K. data is excluded from phase 1 and 2. 13 The OECD recession indicator: https://fred.stlouisfed.org/series/EUROREC 14 https://ec.europa.eu/commission/presscorner/detail/en/IP_17_2401 63 Figure 2.7. Time series plot of CDS spreads in basis points for Austria, Belgium, Cyprus, Finland, France, Germany, Greece, Ireland, Italy, Netherlands, Portugal and Spain. 64 Figure 2.7 continued. Figure 2.7 shows the time series plot of CDS spreads in basis points of the 12 eurozone countries from 3 October 2005 to 31 December 2015, the U.K. data is excluded due to data availability. The shaded background shows the 4 phases, respectively. In phase 1, all 12 countries’ CDS spreads remain flat, reflecting the global economic growth and stability during the pre-crisis period. This tranquility in all countries suggests that there is no shock transmission during phase 1. In phase 2, the crisis buildup period encompasses the 2008-2009 global financial crisis. All 12 countries saw a rise in the CDS spreads. The rise in Cyprus, Greece and Portugal is not conspicuous in the figure, but if one looks at the scale on the y-axis, the Cypriot CDS spreads had reached as high as 195 in basis point, 145 for Portugal and 295 for Greece. For all the periphery eurozone countries, the rises in phase 3 dominate the rises in phase 2. 65 In Appendix A, Table A.2 shows the descriptive statistics of the CDS spreads in basis points. Table A.3 shows the correlation matrix of the log first differences of the CDS spreads. In those two tables, Panel A reports results for phase 1, Panel B reports phase 2, Panel C reports results for phase 3, Panel D reports phase 4, Panel E reports results of the full sample. In Table A.2, all the panels show that the periphery eurozone countries have the highest mean and median among the 13 countries, in particular, Greece leads with a large margin. During the pre-crisis periods (phase 1), the CDS spreads of the soon-to-be crisis countries are already higher than that of other countries. In retrospect, this suggests that the CDS spread is a potent measurement of market risk of the sovereign bonds. The pairwise correlation among all the countries is near zero in phase 1, as shown in Table A.3 panel A, confirming the speculation in Figure 2.7 that there is no spillover during phase 1. Then, in phase 2, a significant increase in the correlation coefficients can be observed in panel B, which is an indicator for CDS spreads co-movement. This is unsurprising because all countries were affected by the 2008-2009 global financial crisis concurrently. In phase 3, the coefficient values start to show more variations. The correlation coefficient between Greece and Cyprus turns negative, which is the only negative coefficient in this panel. Greece, at the center of the vortex during phase 3, has an average pairwise correlation of 0.632 with other crisis countries. Its average correlation with non-crisis countries is 0.453, suggesting that the interdependency between Greece and non-crisis countries is lower than that of the crisis countries. In phase 4, the correlations numbers decrease in magnitude, implying stability during this period. Augmented Dickey-Fuller (ADF) test is conducted on the CDS spreads data. We use the log- difference and log-level of the CDS spreads, the lag length is based on Akaike Information Criterion (AIC). The results are shown in Table 2.5 and Table 2.6. The log-difference of CDS 66 spreads is stationary, while the log-level of CDS spreads fail to reject the null hypothesis of a unit root. We use the stationary log-difference CDS spreads in our data analysis. Table 2.5. Augmented Dickey-Fuller test statistics for the log-difference of CDS spreads Country Phase 1 Phase 2 Phase 3 Phase 4 Full sample Austria -5.498* -5.440* -6.975* -7.842* -11.656* Belgium -6.135* -6.053* -7.973* -9.133* -13.226* Cyprus -8.293* -10.985* -7.797* -10.834* -19.721* Finland -7.974* -6.964* -7.659* -10.398* -15.063* France -4.825* -6.782* -10.198* -9.740* -15.104* Germany -7.959* -7.531* -8.662* -8.276* -15.286* Greece -7.046* -9.307* -8.487* -7.101* -15.497* Ireland -6.900* -6.324* -10.021* -9.060* -15.248* Italy -3.230* -6.660* -11.366* -6.025* -14.624* Netherlands -6.527* -7.576* -9.081* -8.795* -15.051* Portugal -4.192* -7.593* -11.344* -7.646* -17.285* Spain -5.953* -6.544* -11.927* -7.042* -15.793* UK -9.817* -8.218* Augmented Dickey-Fuller test statistics for the log-difference of CDS spreads. Lag length is based on the AIC. An asterisk (*) indicates statistical significance at least at 5% level to reject the null hypothesis of a unit root. Table 2.6. Augmented Dickey-Fuller test statistics for the log-level of CDS spreads Country Phase 1 Phase 2 Phase 3 Phase 4 Full sample Austria -1.089 -1.089 -0.883 -0.139 -0.786 Belgium -2.177 -2.177 -1.506 0.301 -1.104 Cyprus -0.936 -0.936 -0.429 -0.355 0.219 Finland -0.595 -0.595 -1.709 -0.123 -0.420 France -0.854 -0.854 -2.384 0.085 -1.532 Germany -0.313 -0.313 -0.177 -0.098 -0.498 Greece -0.543 -0.543 -2.558 -3.571 -3.454 67 Ireland -1.475 -1.475 -1.040 1.532 0.292 Italy -0.717 -0.717 -2.699 0.369 -1.297 Netherlands -1.495 -1.495 -1.682 0.257 -1.735 Portugal -0.092 -0.092 -2.008 0.729 -1.380 Spain -1.509 -1.509 -2.333 1.147 -1.067 UK -0.955 -0.023 Augmented Dickey-Fuller test statistics for the log-level of CDS spreads. Lag length is based on the AIC. All series fail to reject the null hypothesis of a unit root. Some papers who study the CDS spreads data also use other explanatory variables, such as macro fundamental indicators, credit agency rating scores (e.g., Bampinas et al., 2020; Beirne and Fratzscher, 2013). Our paper follows Broto et el. (2015), Bartlett and Prica (2016), and others, who only use the CDS spreads time series data. The reason for excluding the fundamental variables is due to data frequency. We use ML methods (TCDF) to study the CDS market. The ML method requires a relatively larger dataset. All relevant macroeconomic data is compiled monthly or quarterly, which offers too few data points during the crisis periods. Extrapolating a monthly series into a daily series is far from ideal. Also, the TCDF framework fits well with univariate time series data. Therefore, we only use the CDS spreads for data analysis. 2.5 Analytical frameworks We use two methods for analyzing the contagion and spillovers during the eurozone crises. One is a ML method, the TCDF, the other is a traditional econometric method, Granger causality. By using both methods on the same data, we can compare the results and understanding the pros and cons of each method. 68 2.5.1 Temporal Causal Discovery Framework (TCDF) Temporal Causal Discovery Framework (TCDF) is a novel ML algorithm developed by Nauta et al. (2019), it is among the first group of ML algorithms to incorporate deep learning into the causal discovery framework. TCDF uses Attention-based Dilated Depthwise Separable Temporal Convolutional Networks (AD-DSTCNs) to predict time series, then uses the attention scores obtained from the predictions to perform Attention Interpretation. After that, it applies Causal Validation and Delay Discovery to infer cause-effect relationships in the time series. TCDF can be categorized as a deep learning algorithm in the ML methods, as its core algorithms are convolutional neural networks (CNN). At the same time, TCDF also belongs to the broader community of Causal Learning (CL) algorithms. CL is the recent development of combining ML and causal inference. Its goal is to reveal causal information by analyzing purely observational data. Causal learning goes “beyond machine learning due to its power of uncovering data generating processes” (Chen et al., 2022). We have briefly introduced some CL methods in section 1.3 (causal random forest developed by Wager and Athey (2018), etc.), here we will discuss the CL methods in more details. There are two fundamental tasks in CL, one is to learn causal effects, and the other is to learn causal structure. Causal effect learning tasks include average causal effect estimation, heterogenous treatment effect estimation, counterfactual explanation, etc. Those tasks usually include an intervention in natural experiment, RCTs, or quasi-experimental settings. The tasks are intended to understand how the intervention affects the targeted outcome variable when all other things are equal. This is the classical ceteris paribus question in economics extended to the field of ML. For example, EconML15 is a Python toolbox that can estimate heterogeneous 15 https://www.microsoft.com/en-us/research/project/econml/ 69 treatment effects from observational data using ML algorithms, such as orthogonal random forests, doubly robust learners and deep IV. Causal structure learning task, or causal discovery task, is to examine whether a certain set of causal relationships exists among the variables (Chen et al., 2022). Our model TCDF belongs to this group. A simplified description of causal structural learning can be characterized as the following. Given a set of random variables 𝑉 = {𝑋1, ⋯ 𝑋𝑛}, we want to discover the causal graph that represents the causal relation of all variables. If variable 𝑋𝑖 is modified (through techniques such as permutation), another variable 𝑋𝑗 would change significantly when all other variables were fixed at some values, then this implies 𝑋𝑖 is a direct cause of 𝑋𝑗 (Chen et al., 2022; Schölkopf, 2022). The causal graph can be used to identify the effect that would occur in other variables when the value of a variable is changed (a treatment or intervention) on experimental data, it can also be used on observational data to discover a set of cause-effect relations among all the variables. Here are some examples of causal structure learning tools. CausalNex16 is a Python library that leverages Bayesian Networks to identify causal relations, it can encode or augment domain expertise into the graph model to conduct counterfactual analysis. CausalDiscovery17 is another library that includes 17 algorithms for graph skeleton identification and 19 algorithms for causal directed graph prediction, including 10 graphical and 9 pairwise approaches. The goal of CausalDiscovery is “to learn the causal graph and the associated causal mechanisms from the samples of the joint probability distribution of the data” (Kalainathan and Goudet, 2019). The TCDF used in our paper is another example, which can be used on time series data to uncover pairwise cause-effect relations in observational data. 16 https://github.com/quantumblacklabs/causalnex 17 https://github.com/FenTechSolutions/CausalDiscoveryToolbox 70 Causal structure learning is an ideal ML tool for uncovering causal structures in a dataset. This data-driven approach is model-free and assumption-free in most cases. It is suitable for large and complex data, and also traditional structured data. For cross-sectional data, it can be used for counterfactual explanation. For time-series data, it can identify the temporal precedence in pairs of variables, which makes them the perfect candidate for studying instantaneous and lagged interdependences among a set of variables. The research question of this paper is to understand the contagion and spillovers during the eurozone sovereign crises. We define contagion as the instantaneous effect from a source country (in our case, crisis country) to another country, spillovers are the lagged effect. The effects are observed through co-movements in the CDS market, characterized as pairwise cause- effect relationships. Out of all the current ML toolbox, the causal structure learning algorithms fit this question the best. It is by far the most powerful set of tools in ML that have the ability for causal discovery. The reason that we use TCDF out of all other causal structure learning algorithms is because, at the time of writing this paper, TCDF is one of the first algorithms to incorporate deep learning into the temporal causal discovery framework. Previous models such as Bayesian causal structure learning algorithms, do not have the representation power of neural networks. TCDF uses convolutional neural networks (CNN) instead of the commonly used recurrent neural network (RNN), thus it evades the vanishing gradient problem often associated with RNN. Also, CNN can automatically detect the important features in the data through backpropagation (Yamashita et al., 2018). By interpreting the internal parameters of the CNN, TCDF can not only discover the instantaneous cause-effect relationships, but also the delayed cause-effect relationships. Another reason to use TCDF is that it is designed for continuous time-series data, while the majority of causal structure learning algorithms only apply to i.i.d cross-sectional data (e.g., 71 CausalNex, pcalg18). In temporal causal discovery methods, many methods cannot tolerate non- stationarity nor nonlinearity, whereas TCDF can handle those data issues. TCDF has also shown an outstanding performance on discovering causal relations in an experiment using financial time series data (Nauta et al., 2019), making it the perfect candidate to study the CDS data. Comparing to econometric methods, TCDF is model-free and assumption-free, it can pick up complex nonlinear and dynamic relationships in the data, it can also withstand non- stationarity and heteroscedasticity (those are the features of CNN). TCDF can discover the exact time lag between the cause and its effect, and potential confounding factors in the cause-effect relationships. In a nutshell, TCDF is a flexible and powerful ML method, and it aligns perfectly with the research questions of this paper. Figure 2.8 shows the steps of a TCDF algorithm. In order to learn a temporal causal graph from time series data, it first performs a time series prediction with 𝑁 independent CNNs 𝑁1, ⋯ 𝑁𝑛, all having time series 𝑋1, ⋯ 𝑋𝑛 as input. It then uses the attention scores obtained from the prediction to run Attention Interpretation, after that, it will run the following two steps in parallel: Causal Validation and Delay Discovery. In the first step, TCDF uses Attention-based Dilated Depthwise Separable Temporal Convolutional Networks (AD-DSTCNs) to predict the time series, the network 𝑁𝑗 is trained to predict 𝑋𝑗. During this step, one needs to select the number of hidden CNN layers and the kernel width to train the networks. The AD-DSTCNs have incorporated an attention mechanism which can produce a vector of attention scores 𝑎𝑗 for each input time series. In the second step, the attention scores are compared across all input time series to determine the inputs that are the potential causes for each input time series. Each time series now has a set of potential causes. 18 https://cran.r-project.org/web/packages/pcalg/index.html 72 Figure 2.8. Architecture of the TCDF method, from Nauta et al. (2019). In step three, TCDF uses the Permutation Importance Validation Method (PIVM) to run causal validation. For each target time series, PIVM creates an intervened dataset for each potential cause where the values of the potential cause are randomly permuted, then runs the trained network in step one on the intervened dataset to predict the target time series and measures the intervention loss. If loss exists, then it implies that the potential cause is the actual cause for the target time series. Parallelly, in step four, since TCDF uses a depthwise separable architecture, when a real cause is detected, the kernel weights of the AD-DSTCN for the cause time series and effect time series can be used to infer the lag in the cause-effect relationship. Finally, TCDF merges the results from these four steps to create a causal graph that shows the discovered causal relationships and their delays. TCDF can also detect the existence of hidden 73 confounder between two time series (see Nauta et al. (2019) for a detailed explanation of the TCDF algorithms). TCDF can discover cause-effect relationships for both instantaneous and delayed causes in time series data. In the context of the eurozone crisis, this cause-and-effect detection is translated into contagion and spillovers among the European countries. 2.5.2 Granger Causality In order to compare the ML methods with a benchmark econometric model, we also use pairwise Granger causality to study the contagion and spillovers among the European countries. In the eurozone crisis literature, it is common to use vector autoregression (VAR) to study the contagion and spillovers (e.g., Kalbaska and Gątkowski, 2012; Gómez-Puig and Sosvilla- Rivero). Some of them adopt a panel VAR approach, since the macroeconomics fundamental variables are included in their data. In the broader literature on contagion and spillovers, Granger causality is widely used. For example, Nagayasu (2001) uses Granger causality to study the spillovers between the foreign exchange market and the stock market in the Philippines and Thailand. Bekiros (2014) use Granger causality to study the volatility spillovers from the U.S. to the BRIC markets during the 2008-2009 financial crisis. We pick Granger causality because it is the benchmark tool in the contagion and spillovers literature. Granger causality is a term named after Nobel laureate Clive Granger. If one time series 𝑋 is useful in forecasting another time series 𝑌, then one can say that 𝑋 Granger cause 𝑌 (Granger, 1969). The Granger causality test can be conducted by regressing 𝑌 on lagged values of 𝑋 and lagged values of 𝑌, if lagged values of 𝑋 can provide statistically significant information about values of 𝑌, then 𝑋 can Granger cause 𝑌. This paper uses a bivariate vector autoregressive (VAR) model of time series 𝑋1 and 𝑋2 to test the pairwise Granger causality between two countries. 74 In equation 2.1 and 2.2, 𝑋1 and 𝑋2 stands for the two CDS spreads time series in a country pair, 𝑝 is the maximum number of lags (model order), the vectors 𝐴11, 𝐴12, 𝐴21, 𝐴22 contain the coefficients of the model, 𝐸1 and 𝐸2 are the residuals for each time series. 𝑋1 can Granger cause 𝑋2 if the coefficients in 𝐴21 are jointly significantly different from zero, similarly, 𝑋2 can Granger cause 𝑋1 if the coefficients in 𝐴12 are jointly significantly different from zero. After fitting the VAR model, one can implement two causality tests. The first is a 𝐹-test for pairwise Granger causality, whether the coefficient in 𝐴12 or 𝐴21 are jointly zero. The second is a Wald-type test characterized by testing for nonzero correlation between the error processes of the two time series (see Pfaff and Stigler (2021) for details). These two tests are correspondent to the definition of spillovers and contagion in this paper. The Granger causality test can detect whether the past information of a time series is useful in forecasting another. This is commensurable to our definition of spillovers, that is the lagged transmission of a shock. The Wald-type test can detect the instantaneous correlation between the error processes, but not the direction of causality. This instantaneous causality is not a perfect measure for contagion (instantaneous transmission of a shock), but it can supplement the Granger causality results. For VAR, it is important to choose the proper lag length. We estimated the optimal lag lengths using various selection criteria: Akaike information criterion (AIC), Schwarz information criterion (SIC), Hannan-Quinn criterion (HQC) and final prediction error (FPE). The optimal lengths are similar according those criteria, we use the AIC results in this paper. 75 The existing literature has different choices of the optimal lag lengths in VAR. For example, Koutmos (2018) uses weekly CDS data to test for pairwise Granger causality in bivariate VAR, the author report results of all country pairs with lag 1, 2, 3, and 4. Kalbaska and Gątkowski (2012) use a multivariate VAR that includes all countries on weekly CDS, they choose a lag of 3 for pre-crisis and 6 for the crisis period according to AIC. Different from the above papers, our paper will use the optimal lag lengths for each country pairs and report the results for phase 2 and 3. Phase 1 and phase 4 have too many Granger causal relationships so they are not reported. If we allow unequal lag lengths of 𝑋1 and 𝑋2 in the VAR, that is to say, we relax the assumption that 𝑝 = 𝑞 and 𝑟 = 𝑠 in equations 2.3 and 2.4. This becomes the Hsiao’s version of Granger causality (Hsiao, 1981). Hsiao’s version of Granger causality is a step-wise procedure based on Granger's concept of causality, this method can identify the optimal lag for each bivariate autoregression with different lags for the variables. By comparing the AIC or FPE of the univariate regression and bivariate regression, one can infer causality between the variables. This approach is an extension of the original Granger causality, but they differ in the testing methods. The results from the Hsiao’s version of Granger causality test vary greatly from the original Granger causality test. Since the focus of this chapter is on the comparison between TCDF and Granger causality, we will only report the optimal lags between all the country pairs, but not the test results of the Hsiao’s version. 76 2.6 Results 2.6.1 TCDF results In the TCDF, there are two hyperparameters that need to be fine-tuned for the CNN: the number of hidden layers and the kernel width. For the number of hidden layers, generally, a hidden layer of 2 works best if one has a small dataset, increasing the layer number may cause overfitting. For the kernel width, a common suggestion for CNN is 3. After training several networks with different layer numbers and kernel widths, we pick a hidden layer of 3 and kernel size of 3 according to domain knowledge. Specifically, this paper has trained 4 networks with the same level of hidden layer and kernel size to make the results comparable. Though there are 4 phases in the CDS dataset, in the pre-crisis period (phase 1), there is too little data variation, as shown in both Figure 2.7 and Table A.3 panel A, so phase 1 is excluded from the TCDF analysis. The first causal graph (Figure 2.9) shows the results for the eurozone countries from 1 August 2007 to 30 April 2010 (phase 2). This is the crisis buildup period. We have 12 countries for phase 2 (the U.K. is excluded due to data availability). The countries are put in circles, the lines with arrowheads between the circles show pairwise causal direction. The number on the line is the lag of the cause-effect relationship, the unit of the lag is day (our CDS data excludes weekends). A question one might ask is why not train separate networks for different groups of countries? For example, a causal graph for only the 5 crisis countries or the 6 core countries? This is because of the black box nature of ML methods. If one performs TCDF on those subgroups of countries during the same periods, the results between the groups are not comparable to each other. For the purpose of consistency, we only report causal graphs of all countries in a given period. 77 Figure 2.9. Temporal causal graph of 12 Eurozone countries from 1 August 2007 to 30 April 2010 (phase 2). In Figure 2.9, there are four cause-effect relationships. Greece affects Cyprus and the Netherlands, with a lag of 21 and 54. Belgium and Ireland both impacts Finland, the lag length of Belgium is 54, Ireland is 68. There is no detected cause-effect relationship or confounder in the other 8 countries. Within the periphery eurozone countries, only Greece has an impact on Cyprus during the pre-crisis period. It is not surprising that Cyprus is affected by Greece. The bank of Greece reports the net inflows of foreign direct investment into Greece during the period from 2002 to 2010 per country of origin19. Cyprus ranks 2nd out of all countries, the Netherlands ranks 5th. Cyprus also holds a large amount of Greek assets (mainly Greek sovereign bonds) during this period (Zenios, 2013). This may explain the lagged impact of Greek CDS on Cyprus and the Netherlands. About the impact from Belgium and Ireland to Finland, we could not find relevant studies on this subject, a possible channel for the causal impact is through the banking sector. Ireland 19 https://www.bankofgreece.gr/en/statistics/external-sector/direct-investment/direct-investment---flows 78 has the largest banking sector in the EU, Belgium’s banking sector is the 3rd largest (see Figure 2.3). Such pan-European banks can insert an effect on the Finnish CDS through holding of Finnish sovereign bonds. The second result is the causal graph (Figure 2.10) for all 13 countries from 3 May 2010 to 29 March 2013 (phase 3). Phase 3 is the eurozone crisis period; there is a lot more going on in this graph compared to phase 2. This increase in the number of cause-effect relationships during the recession period is an indication of contagion during the crisis period. Figure 2.10. Temporal causal graph of all 13 countries from 3 May 2010 to 29 March 2013 (phase 3). In Figure 2.10, there are 11 cause-effect relationships and 1 hidden confounder, 7 out of the 11 relationships involve a periphery eurozone country. The double arrow-headed line between Ireland and Portugal suggests that the co-movement in their CDS spreads is caused by a hidden confounder, the confounder is not any other countries’ sovereign CDS spread. Greece, who is 79 the center of this graph and the center of the crisis, has impacts on four other countries: Cyprus, Finland, Germany, and the U.K., the lags are 0, 78, 33, 18, respectively. The Greece-Cyprus relationship carries on from phase 2 to phase 3, but the lag decreases from 21 to 0, showing a change from delayed spillovers to an instantaneous contagion from Greece to Cyprus, this is likely due to the holding of Greek sovereign bonds in the Cypriot banks (Zenios, 2013). During the crisis periods, the shocks of the Greek sovereign bonds speed up the transmission, shorten the lag between Greece and Cyprus. Though Greece exhibits its influence in Europe during phase 3, it does not affect the other crisis countries besides Cyprus. Ireland affects Finland, Germany and Italy, with lags of 15, 15 and 3. Ireland’s failing banking sector has an impact on many other eurozone countries because of its size. Ireland has an almost instantaneous influence on Italy. Surprisingly, given the geographical proximity and economic integration between Ireland and the U.K., Ireland does not affect the U.K. during the crisis. This result shows that countries within the eurozone are more vulnerable to shocks from the member states. Italy, who is affected by Ireland, has an influence on Spain with a lag of 3. This is an interesting chain of cause and effect that Ireland affects Italy with a lag of 3, then Italy affects Spain with a lag of 3. The impact from Italy to Spain can also be found in other phases. The existing literature presents numerous evidences of the interdependency between Italy and Spain (e.g., Broto and Pérez-Quirós, 2015). In our CDS data, we can only observe the spillovers from Italy to Spain, but not the other direction. The Netherlands, who has solid fiscal fundamentals during phase 3, affects France instantaneously. It also affects Austria and the U.K. with lags of 26 and 24. The last piece of the causal graph is Austria, who affects Germany with a lag of 57. Those results are new findings to the existing literature, but it is challenging to understand the interconnection among those countries by using the CDS data alone. 80 Out of the 13 countries, Greece has the largest influence, Germany receives the most impact. Being the largest exporter inside the European Union, Germany is affected by Greece and Ireland. One might speculate that the channel of transmission could also be through the export market. Both Greece and Ireland affect Finland. This confirms the transmission from the periphery countries to the core countries inside the euro area. Though this paper only includes one country outside of the eurozone, the U.K. receives influence from Greece as well. This insinuates the transmission from the eurozone countries to the European Union countries outside of the eurozone. All the 6 periphery eurozone countries are in one or more cause-effect relationships. This is certainly the sign of contagion and spillovers among the periphery eurozone countries. In the eurozone crisis literature, an important question is whether Greece is the origin of other countries’ sovereign crises. Our results show that Greece only has an influence on Cyprus, but not on any other periphery eurozone country. This finding concurs with the crisis consensus in section 2.2 of this chapter. The nature of each country’s crisis is different. The only country that has a pure sovereign crisis is Greece. The Cypriot banking crisis and sovereign crisis have their roots in holding too many Greek assets, this explains the instantaneous cause-effect relationship between the two countries during phase 3. Ireland and Spain’s problems root in their banking sectors, which are not directly linked to the falling Greek sovereign. Portugal and Italy have long-term slow economic growth and loss of competitiveness, they have a special “balance of payment crisis in a monetary union”, which is almost independent from what happens in Greece. The third result is the causal graph (Figure 2.11) of all 12 eurozone countries from 1 August 2007 to 29 March 2013 (phase 2 and phase 3). This period encompasses two major crises: the subprime crisis and the eurozone crisis. Therefore, the cause-effect relationships detected in such a long period do not reflect a transmission during a crisis, but rather the close economic 81 interdependences between the countries. Hence, Figure 2.11 can serve as a reference to other causal graphs. Figure 2.11. Temporal causal graph of 12 Eurozone countries from 1 August 2007 to 29 March 2013 (phase 2 and phase 3). Germany and France are the largest and second largest country in the eurozone. As the center of the European Union, these two countries join action in many areas, which explains the instantaneous relationship from France to Germany in Figure 2.11. Another chain of causal relationship is from Italy to Spain, then to Portugal, with a lag of 3 in both. This is an indication of regional integration of the southern peripheral economies. The cause-effect relationship between Italy and Spain is the same as Figure 2.10, that Italy affects Spain with a lag of 3, this relationship carries on to phase 4 as well. The other cause-effect relationships all have larger lags, which makes them less important in this context. 82 Figure 2.12. Temporal causal graph of all 13 countries from 1 April 2013 to 31 December 2015 (phase 4). The fourth result is for the post-crisis recovery period after the eurozone crisis. Figure 2.12 shows the causal graph of all 13 countries from 1 April 2013 to 31 December 2015 (phase 4). Interestingly, there are 11 cause-effect relationships and 1 hidden confounder, just like Figure 2.10. Belgium and the Netherlands affect each other with a lag of 21. This suggests the existence of a hidden confounder for those two countries during the recovery period. In the post-crisis periods, there is another “self-fulfilling” story of recovery. In this round, the previously distressed crisis economies are injecting vigor into Europe. In Figure 2.12, Portugal is the most influential country in phase 4, it affects 4 non-crisis countries: Belgium, Germany, the Netherlands, and the U.K., with lags of 54, 45, 43 and 72. 83 Portugal indeed has a remarkable post-crisis recovering rate (Reis, 2015), the TCDF results show that the bourgeoning Portuguese economy has spilled its vitality into the core European countries with a lag of roughly 2 months. Cyprus has an impact on Greece and the U.K., with lags of 0 and 78. The change of direction between Cyprus and Greece is of particular interest. In phase 2 and phase 3, Greece affects Cyprus during the economic downturns, but Cyprus affects Greece during the economic recovery in phase 4. The lag between the two is 0 in both phase 3 and phase 4, their close link through the Greek assets in the Cypriot banks remains the same. In phase 4, Ireland affects the U.K. with a lag of 54, this is new information since Ireland does not affect the U.K. in phase 2 and 3. Italy affects Spain with a lag of 3 (same result as in Figure 2.10 and Figure 2.11). There are other cause-effect relationships among the core eurozone countries, but they are unrelated to the topic of contagion and spillovers. 2.6.2 Granger causality results This paper uses bivariate VAR to detect the Granger causality using a 𝐹-test, and a Wald-type test for instantaneous causality in country pairs. For each VAR, we use AIC, SIC, HQC and FPE for optimal lag selection, all the criteria give similar results. We choose the AIC results to conduct the 𝐹-test and the Wald test. Table 2.7 shows the optimal lag length for pairwise Granger causality using AIC during the crisis period (phase 3), the optimal lag length for different country pairs varies from 1 to 8. The lags between Greece and other countries are 1 or 2. The lags among the periphery eurozone countries are almost 1, 2 and 3, the only exception is between Ireland and Spain, which is 8. 84 Table 2.7. Optimal lag length for pairwise Granger causality using AIC in phase 3. AT BE CY FI FR DE EL IR IT NL PT ES UK Austria .. 1 2 1 1 1 1 3 3 1 2 5 1 Belgium 1 .. 3 2 3 2 1 3 4 1 2 4 3 Cyprus 2 3 .. 2 3 3 2 2 2 2 2 3 2 Finland 1 2 2 .. 1 1 1 3 3 2 2 3 1 France 1 3 3 1 .. 1 1 3 3 3 3 7 3 Germany 1 2 3 1 1 .. 1 6 4 2 2 6 1 Greece 1 1 2 1 1 1 .. 1 2 1 1 2 1 Ireland 3 3 2 3 3 6 1 .. 3 3 6 8 3 Italy 3 4 2 3 3 4 2 3 .. 2 2 2 3 Netherlands 1 1 2 2 3 2 1 3 2 .. 2 2 2 Portugal 2 2 2 2 3 2 1 6 2 2 .. 2 2 Spain 5 4 3 3 7 6 2 8 2 2 2 .. 3 The U.K. 1 3 2 1 3 1 1 3 3 2 2 3 .. Please see Table A.1 for the country codes in the appendix. Table 2.8 and Table 2.9 report the test results of phase 2 (12 eurozone countries) and phase 3 (all 13 countries). In phase 1 and phase 4, the results show that almost all countries can Granger cause all other countries. For simplicity, the results of phase 1 and phase 4 are not reported. The top row of the tables are the cause countries, the numbers in red indicate an at least 5% significance level that the cause country can Granger cause the countries listed in the first column. For instance, in column 2 of Table 2.8, Austria can Granger cause all other countries, in column 4, Cyprus can only Granger cause Finland. This result echoes with the correlation matrix in Table A.3. Cyprus, being the smallest economy of all, lacks the strengths to affect bigger countries. In phase 2, almost all countries (except Cyprus and Finland) can Granger cause all other countries. In the crisis period (Table 2.9), one of the important research questions is whether Greece is the origin of other countries’ sovereign crises. Our results show that Greece does not Granger cause any other country, including the crisis country and the non-crisis country. Italy and Spain 85 Table 2.8. Granger causality test for the crisis buildup period (phase 2) AU BE CY FI FR DE EL IR IT NL PT ES Austria NA 0.00 0.98 0.16 0.01 0.01 0.01 0.00 0.03 0.29 0.09 0.00 Belgium 0.00 NA 0.56 0.12 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 Cyprus 0.00 0.00 NA 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Finland 0.00 0.00 0.00 NA 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 France 0.00 0.00 0.40 0.15 NA 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Germany 0.00 0.00 0.41 0.03 0.00 NA 0.00 0.00 0.00 0.02 0.00 0.00 Greece 0.00 0.00 0.09 0.08 0.00 0.00 NA 0.00 0.01 0.03 0.00 0.00 Ireland 0.00 0.00 0.58 0.01 0.00 0.00 0.00 NA 0.00 0.01 0.00 0.00 Italy 0.00 0.00 0.12 0.81 0.00 0.00 0.00 0.00 NA 0.06 0.01 0.00 Netherlands 0.00 0.00 0.73 0.00 0.00 0.00 0.00 0.00 0.00 NA 0.00 0.00 Portugal 0.00 0.00 0.36 0.01 0.00 0.00 0.00 0.00 0.00 0.00 NA 0.00 Spain 0.00 0.00 0.73 0.05 0.00 0.00 0.00 0.00 0.01 0.04 0.01 NA Red indicates significance at 5% level. Table 2.9. Granger causality test for the crisis period (phase 3) AU BE CY FI FR DE EL IR IT NL PT ES UK Austria NA 0.00 0.01 0.05 0.00 0.00 0.23 0.04 0.00 0.00 0.31 0.00 0.01 Belgium 0.68 NA 0.02 0.01 0.00 0.11 0.16 0.97 0.02 0.15 0.66 0.00 0.33 Cyprus 0.00 0.00 NA 0.00 0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 Finland 0.01 0.00 0.04 NA 0.00 0.00 0.05 0.22 0.00 0.00 0.02 0.00 0.00 France 0.13 0.13 0.03 0.55 NA 0.31 0.01 0.24 0.03 0.22 0.26 0.00 0.47 Germany 0.30 0.00 0.05 0.32 0.01 NA 0.09 0.91 0.02 0.02 0.02 0.08 0.17 Greece 0.00 0.00 0.25 0.00 0.00 0.01 NA 0.00 0.02 0.00 0.00 0.00 0.01 Ireland 0.01 0.10 0.15 0.21 0.00 0.01 0.97 NA 0.00 0.02 0.03 0.00 0.07 Italy 0.03 0.06 0.12 0.06 0.01 0.00 0.15 0.62 NA 0.85 0.17 0.04 0.09 Netherlands 0.52 0.00 0.06 0.00 0.00 0.00 0.10 0.05 0.00 NA 0.10 0.00 0.24 Portugal 0.02 0.00 0.13 0.39 0.01 0.07 0.48 0.30 0.00 0.04 NA 0.00 0.01 Spain 0.01 0.37 0.60 0.30 0.02 0.13 0.07 0.79 0.00 0.20 0.67 NA 0.00 The U.K. 0.83 0.02 0.00 0.02 0.00 0.00 0.13 0.06 0.00 0.17 0.04 0.00 NA Red indicates significance at 5% level. 86 can Granger cause all other countries including the U.K. This implies that there are spillovers within the periphery eurozone countries, from the periphery eurozone countries to the core eurozone countries, and to outside of the eurozone. The Granger causality results of the periphery eurozone countries are summarized in Table 2.10. Table 2.10. Granger causality results for the periphery eurozone countries in phase 3 Cause Country can Granger cause: Cyprus None Greece None Ireland Cyprus, Greece Italy Cyprus, Greece, Ireland, Portugal, Spain Portugal Cyprus, Greece, Ireland Spain Cyprus, Greece, Ireland, Italy, Portugal From the above tables, we notice that a lot of Granger causality has been detected compared to the TCDF results for the lagged spillovers. Though Granger Causality is not equivalent to causality, it is closely linked with causality in settings such as a VAR (White et al., 2011). It can provide potential points of interest for further investigation; some spurious relationships might be included as well (He and Maekawa, 2001). A multivariate panel VAR with macroeconomic variables would improve upon these preliminary results. To compare the TCDF and the Granger causality regarding contagion, we also report some instantaneous causality results from the VAR. The instantaneous causality is an indicator of immediate transmission of a shock during the crisis. Table 2.11 shows all the instantaneous cause-effect relationships detected in the TCDF, and reports the Wald test results for instantaneous causality. 2 out of the 7 instantaneous cause-effect relationships in the TCDF 87 cannot be found using the VAR. During the crisis period (phase 3), Greece has no instantaneous causality towards Cyprus. Table 2.11. Instantaneous causality Wald tests Phase 3 EL cause CY IR cause PT PT cause IR NL cause FR N Y Y Y Phase 2+3 FR cause DE Y Phase 4 CY cause DE BE cause FI N Y Table 2.12. Optimal lag for Hsiao’s version of Granger causality for the crisis period (phase 3) AU BE CY FI FR DE EL IR IT NL PT ES UK Austria .. 1, 1 5, 2 5, 2 1, 1 1, 1 1, 1 1, 1 1, 1 1, 1 1, 1 1, 1 1, 1 Belgium 1, 1 .. 3, 5 1, 5 1, 1 1, 1 1, 1 1, 1 2, 6 1, 1 1, 1 2, 6 1, 1 Cyprus 1, 5 1, 5 .. 1, 5 1, 5 1, 5 2, 5 1, 5 1, 5 1, 5 1, 5 3, 2 1, 5 Finland 1, 4 1, 5 1, 1 .. 1, 1 1, 1 1, 1 2, 5 5, 2 2, 5 5, 2 1, 3 3, 5 France 3, 1 1, 1 3, 5 1, 5 .. 1, 1 3, 1 1, 1 1, 1 1, 1 5, 2 1, 1 4, 2 Germany 2, 2 2, 2 5, 1 5, 1 1, 1 .. 1, 1 2, 2 4, 2 2, 1 2, 2 4, 1 2, 1 Greece 1, 5 1, 5 1, 5 1, 5 1, 5 1, 5 .. 1, 5 1, 5 1, 5 1, 1 1, 5 1, 5 Ireland 5, 3 3, 5 1, 6 5, 4 5, 2 10,4 5, 1 .. 5, 2 3, 5 3, 2 5, 3 3, 5 Italy 1, 5 3, 6 5, 1 1, 5 5, 1 3, 5 2, 5 3, 3 .. 1, 5 1, 3 2, 5 1, 5 Netherlands 1, 1 1, 1 3, 5 5, 2 1, 1 1, 1 1, 1 1, 2 1, 1 .. 1, 1 1, 1 1, 1 Portugal 1, 9 10,4 1,10 10,5 10,2 10,4 10,1 9, 2 10,2 2,10 .. 5, 2 2, 9 Spain 4, 5 8, 2 5, 1 2, 7 5, 1 7, 4 5, 2 1, 4 1, 4 7, 1 3, 2 .. 5, 3 The U.K. 1, 1 1, 1 7, 1 7, 3 7, 2 7, 1 7, 1 7, 2 1, 1 1, 7 2, 4 2, 4 .. For Hsiao’s version of Granger causality, the optimal lags of country pairs in the crisis period are reported in Table 2.12. The top row of the panels are the cause countries, the columns are the countries that receive the impact. For example, column 2 shows the lag lengths when Austria is the cause country, between Austria and Belgium, both lag length is 1, between Austria and Cyprus, the lag for Austria is 1, while the lag of Cyprus is 5. From Table 2.7 and Table 2.12, 88 we can see that the optimal lag lengths for country pairs vary greatly for Granger causality and Hsiao’s version of Granger causality. The results of the Hsiao’ tests show no causality between most country pairs. Since the results of Hsiao’s test and Granger causality test are quite different, we will focus on Granger causality and not report the Hsiao’s results. 2.6.3 Comparison There are four main differences between the TCDF results and Granger causality results. First, we can detect a lot more Granger causality pairs than the TCDF, especially in phase 2 and phase 4. About 90% of the country pairs show a Granger causality in phase 2, but there are only 4 cause-effect relationships in the TCDF results. An explanation for the large amount of causality is that the bivariate Granger causality test can be affected by an omitted variable, thus showing many spurious relationships (He and Maekawa, 2001). Second, in each phase, the Granger causality results differ greatly from the TCDF results. For example, during the crisis period, there is no Granger causality nor instantaneous causality from Greece to any crisis country. But in the TCDF results, Greece can affect Cyprus instantaneously. Also, Ireland can Granger cause Cyprus and Greece, but the TCDF results show that Ireland has an instantaneous effect on Portugal and a lagged effect on Italy. Third, the lags of the causality are different. The TCDF can detect long lags (e.g., 51, 78) between country pairs where the Granger causality optimal lag lengths are mostly between 1- 10. Fourth, in all lagged cause-effect relationships found in the TCDF, if we use the lag length of the TCDF to conduct a Granger causality test in those country pairs, we can only detect some Granger causality, but not all. In phase 2, all TCDF pairwise cause-effect relationships are significant in Granger causality results. In phase 3, 5 out of the 12 cause-effect relationships are not detected by Granger causality. In phase 2 and 3, 2 out of the 8 cause-effect relationships are not detected by Granger 89 causality. In phase 4, 7 out of the 11 cause-effect relationships are not detected by Granger causality. To sum up, the Granger causality test can pick up a larger amount of causality among the countries, but the larger amount only laps over part of the cause-effect relationships picked up by the TCDF. The TCDF results are more consistent with the domain knowledge. Many factors can contribute to the differences in the test results. For ML algorithms, they are well-known for their ability to pick the nonlinear patterns in the data, while the VAR can only pick up linear relationships. The TCDF algorithm uses the time series of all countries to fit the convolutional neural networks for the prediction task, while the pairwise Granger causality test only uses the time series of two countries. Because of the depthwise separable architecture of the TCDF, it can detect the lag between a cause and an effect, it can detect both the instantaneous (lag = 0) and delayed (lag > 0) causal relationships, while the Granger causality test leans to the lagged impact. The lag length selection of VAR is not automated, whereas the TCDF can calculate lag lengths from 0 up to the length of the input time series automatically. The TCDF can also pick up potential confounder between country pairs. For instance, there is a confounder between Ireland and Portugal during the crisis period, a confounder between Belgium and the Netherlands with a lag of 21 after the crisis. The causal validation step in the TCDF can filter out spurious relationships in country pairs, therefore, the results of TCDF are much more succinct and comprehensive than the Granger causality approach. 2.7 Discussion This chapter aims to study the contagion during the eurozone crisis using a novel ML method. It answers the four critical questions raised in the introduction of this chapter. The first question is “are there contagion and spillovers during the eurozone crisis?” The answer is yes. There is a 90 sharp increase in the number of cause-effect relationships from the crisis buildup period (phase 2) to the actual crisis period (phase 3). According to the first group’s definition of contagion in section 2.1, the increase itself marks the existence of contagion during the eurozone crisis. Also, in Figure 2.10, the 11 pairs of cause-effect relationships and 1 confounding factor exhibit the close links among the European economies, for instance, there is an instantaneous effect from Greece to Cyprus, and delayed effect from Ireland to Italy, then to Spain. The second question is whether Greece is the origin of all the crises in the eurozone and how the crisis countries affect each other. The TCDF results tell that Greece is not the black sheep of all the eurozone crises. Indeed, Greece has a great impact on Cyprus in the crisis buildup and actual crisis period, causing the banking crisis and sovereign crisis in Cyprus. The impact from Greece to Cyprus could also be observed from the decreased number of lags from the crisis buildup period to the actual crisis period. During the crisis period from 3 May 2010 to 29 March 2013 (phase 3), the Greek CDS spread can affect the Cypriot CDS spread instantaneously, showing a contagion between the two. Other than Cyprus, Greece does not impact any other periphery eurozone countries during the buildup and crisis period. This result confirms the finding in the eurozone crisis consensus that the sovereign crises in Ireland, Portugal and Spain are not of a fiscal nature. During the crisis period, contagion and spillovers exist among the 6 periphery eurozone countries and among the 5 crisis countries. There is a two-way causal relationship between Ireland and Portugal, the two countries affect each other instantaneously, suggesting the existence of a confounding factor. Ireland also affects Italy with a lag of 3, Italy then affects Spain with a lag of 3. Ireland was the second country to receive a bailout program in the eurozone, the market risk of the Irish bond was transmitted to Italy and then to Spain. The third question is whether there are contagion and spillovers from the crisis countries to the non-crisis country inside the eurozone. The answer is yes. In Figure 2.10, Greece shows its impact on Germany and Finland. Ireland has an influence on Finland. These findings confirm 91 the spillovers from the crisis countries to the core countries. Contagion and spillovers also exist among the core countries in the eurozone. The Netherlands, who has a high private debt like Portugal, also contributes to the risk spillovers in the CDS market. It affects France instantaneously and affects Austria and the U.K. with time lags. The fourth question is whether there exist contagion and spillovers from the eurozone countries to non-eurozone countries in the European Union. Since the data used in this paper only includes one non-eurozone country (the U.K.), the TCDF results can only provide a partial picture of this question. Figure 2.10 shows that the U.K. is affected by Greece and the Netherlands during the crisis period, the spillovers from Greece confirm the transmission of market risk from the crisis countries to countries outside of the eurozone. The answers to the four questions using TCDF are different with most previous studies, most of the existing literature suggest that there are no contagion and spillovers during the eurozone crisis, or that contagion and spillovers exist but the shock from Greece affects all other crisis countries. Those findings are in contrast with the TCDF results. There are two papers that have similar results to our paper (Koutmos, 2018; Bampinas et al., 2020). These two papers find evidence that Greece can affect some periphery countries but not all crisis countries, and there are contagion and spillovers from the periphery countries to the core countries. Koutmos (2018) uses the weekly CDS spreads to model the Granger Causality between the CDS time series data using a VAR. They find that there exists lots of pairwise Granger causalities among all the European countries. The author then uses the proportional percentage of each country to represent its power to transmit risk, the percentage of a country is defined as the ratio between the number of countries that one can Granger cause over the total number of countries. Their results show that Greece affects 63% of other countries during the crisis period when the lag length of VAR is 1, this percentage is on a par with France, but lower than that of Belgium and Portugal. 92 Bampinas et al. (2020) uses the daily sovereign bond spreads and CDS spreads to study the cross-border and intra-market linkage in the eurozone countries from 2006 to 2018, they use bootstrap test for local Gaussian correlation to determine the existence of contagion between the markets. Their findings show that contagion occurs from the periphery eurozone countries’ CDS spreads to their own bond spreads and to Belgium’s bond market; the shocks in Italian and Spanish CDS spreads spill towards all other European CDS spreads. Our TCDF results have consistent answers to the four questions with the above two papers, but the TCDF results include a lot more information that was not seen in previous literature. Mainly, there are four aspects. First, the TCDF can detect many cause-effect relationships that are not seen in previous works, for instance, we find that during the crisis period, there is a chain of causal effect from Ireland to Italy with a lag of 3, then from Italy to Spain with a lag of 3; the traditional econometric tools are not suitable for such causal discovery task. Second, we can observe the change of direction between some country pairs. The cause-effect relationship between Greece and Cyprus is found in all the causal graphs. Before and during the crisis, the shock of the Greek sovereign impacts Cyprus, but after the crisis, the impact is reversed that Cyprus affects Greece with its economic recovery. Third, the lags of many cause-effect relationships are new to the literature. For example, Austria affects Germany with a lag of 57 (roughly two months) during the eurozone crisis, the Netherlands affects France instantaneously. The lags are less interesting topics in the eurozone crisis literature, we cannot find many relevant studies on the lags of the non-crisis countries. The TCDF results can also find the changes in the lags in different phases. For instance, Greece affects Cyprus with a lag of 21 before the crisis, then the lag becomes 0 during the crisis. Between Ireland and Finland, the lag changes from 68 to 15 before and during the crisis. Some previous studies have similar results showing intensified spillovers during the crisis period (e.g., Bekaert et al., 2005), but the results are mostly for the crisis countries. Fourth, the TCDF can also detect confounding factors between country pairs. The omitted variables can lead to bias in 93 the estimations in traditional econometric methods and there is usually no test for potential confounding factors. The TCDF results show some confounding factors, such as a confounding factor between Ireland and Portugal during the crisis period. To sum up, the novel Temporal Causal Discovery Framework can detect instantaneous and delayed causal relationships in country pairs through a causality validation step. It offers a filtered yet richer understanding of the time series data. A lot of the cause-effect relationships detected by the TCDF are not seen in previous works. The TCDF results offer detailed causal graphs of the direction and lag of contagion and spillovers in the eurozone crisis. In a nutshell, the TCDF results add new findings to the existing literature, and these findings confirm the consensus view of the eurozone crisis. 2.8 Conclusion In the existing eurozone crisis literature, there is no consensus on whether contagion exists during the 2010-2013 crisis. Though it has been a decade since the onset of the eurozone crisis, this question is still of importance in understanding the crisis dynamics inside the eurozone to prevent future crisis. The results from TCDF present evidence of contagion and spillovers during the eurozone crisis. It also shows the lag of spillovers from a cause country to the effect country. The existence of confounder between two countries can also be detected by identifying two-way cause and effect relationships. This paper contributes to the literature in three ways. First, to the author’s knowledge, this is one of the first attempts that use ML methods to study the contagion and spillovers during the eurozone crisis. Second, this paper goes beyond the common prediction problem associated with ML, we use deep learning causal graphs in the field of macroeconomics. Third, by using the novel deep learning framework, this paper provides a granular report of the eurozone crisis 94 contagion and spillovers, adding new findings to the repository, this is of great importance to future crisis management and macro-prudential regulations in the European Union. The application of causal learning methods in the field of macroeconomics is still rare, as an early attempt, this paper has its limitations. Improvement can be made in the following directions. The first direction is data. Because of the lack of availability of high frequency macroeconomic variables, our paper only uses the daily CDS spreads data for causal discovery. The field of macroeconomics has seen more and more high frequency granular dataset, for example, researcher in MIT and Harvard start the Billion prices project20 which can provide daily consumer price and monthly inflation rate in major countries around the world. In the eurozone crisis context, future research could improve by adding more macroeconomic covariates to the TCDF framework. Also, there are many cause-effect relationships that are not presented before, for example, during the eurozone crisis, the Netherlands affects France instantaneously, affects Austria and the U.K. with lags of 26 and 24. Those are interesting data patterns that are picked up by the TCDF, our CDS data cannot support a deep-dive analysis on these findings. Future research can focus on those less visited topics by utilizing more country specific variables. Besides the macroeconomic variables, there are some high frequency data available, such as sovereign bond yield, stock market index. Such data can all be combined with the CDS data to expand the current time series observations into higher dimensional data. Moreover, our paper only studies the effects between two countries, but not among country groups. A further extension is to use country groups as features to analyze the contagion and spillovers from the crisis countries to other country groups (e.g., OECD countries, developing countries). One drawback of the TCDF framework is that the convolutional neural nets cannot identify the exact timing of the cause and effect, one can only observe the lags between the cause-effect pairs. This is the reason why we split the entire sample into four phases to study the crisis 20 http://www.thebillionpricesproject.com/ 95 dynamic. Other studies that do not have a temporal measure of the spillovers also follow this approach of sample splitting. Different splitting strategies could give different results, for instance, our results for phase 2, phase 3 and phase 2+3 display similar but distinct cause-effect relationships, we find a smaller number of cause-effect relationships in the longer period (phase 2+3). The current causal discovery learning tools such as TCDF, CausalNex and pcalg do not have this capability to measure the temporal effect of transmissions, future work can improve the current models by incorporating a temporal measurement. The TCDF belongs to causal structure learning tools, not the causal effect estimation learning tools. One can also apply the causal effect estimation learning on the CDS data. It should require a separate set of time series variable on countries outside of the EU as control group. This allows the discovery of heterogeneous effect of crisis for each country, or the crisis country as a whole. Besides measuring the magnitude of the crisis effect, the temporal dimension of the effects can also be obtained, i.e., how long does the impact last for each country? For instance, the Bayesian structural time series (bsts) model and CausalImpact can construct the counterfactual for each crisis country, then a Markov Chain Monte Carlo algorithm can be used for posterior inference to report the pointwise 95% predictive intervals of the crisis effect, the time series of pointwise intervals can provide further information of the temporal evolution of the crisis. Other than ML methods, traditional econometric tools should be explored on this subject as well, a comparison between non-parametric estimation and the ML estimation of the spillovers can be an interesting topic. Different ML models and econometrics models can be crossed examined to provide a panorama view on this subject. Our paper provides an overview of the eurozone crisis using ML methods, and there are many other topics in macroeconomics that are dominated by the model-driven research paradigm. We believe that with the rapid development of causal learning, more advanced ML techniques could find their place in the field of macroeconomics. 96 Appendix A Table A.1 Timeline of major events in the European Union. Feb 1992 Maastricht treaty signed Sep 1992 European exchange-rate mechanism (ERM) crisis Oct 1993 Maastricht treaty ratified Jun 1997 Stability and Growth Pact signed Jan 1999 European Monetary Union (EMU) begins with 11 countries Jan 2001 Greece joins EMU Jan 2002 Euro notes and coins introduced Nov 2003 Germany and France breach stability pact Aug 2007 ECB liquidity injection begins Jan 2008 Cyprus and Malta join the euro Sep 2008 Lehman Brothers collapses Jan 2009 Greece downgraded Nov 2009 New Greek government admits to a bigger budget deficit May 2010 First Greek bailout June 2010 The temporary European Financial Stability Facility (EFSF) is created Oct 2010 Deauville deal on private-sector involvement Nov 2010 Irish bailout May 2011 Portuguese bailout July 2011 Second Greek bailout Aug 2011 ECB buys Italian and Spanish bonds Oct 2011 Haircut on Greek debt Nov 2011 Mario Draghi becomes ECB president Dec 2011 ECB launches LTRO. Fiscal compact treaty agreed Feb 2012 New Spanish government admits higher budget deficit Jun 2012 Partial bailout for Spanish banks July 2012 Draghi gives “whatever it takes” speech Aug 2012 ECB agrees Outright Monetary Transactions (OMT) program 97 Sep 2012 The European Stability Mechanism is created Nov 2012 Greek debt burden spread out; interest rate cut Feb 2013 Indecisive Italian election Mar 2013 First Cyprus bailout, banks shut May 2013 Second Cyprus bailout Dec 2013 Ireland exits bailout program Jan 2014 Latvia joins euro, Spain exits bailout program June 2014 Portugal exits bailout program Nov 2014 ECB takes over supervision of the most important banks in the eurozone Aug 2015 Third Greek bailout Mar 2016 Cyprus exits bailout program Aug 2018 Greek exits bailout program Timeline of major events in the European Union from 1992 to 2018. Adapted from Peet and La Guardia (2012). Table A.2. Descriptive statistics of CDS spreads in basis points. Country Mean Median Std. Dev. Min Max Austria 2.06 2.00304 .367 1.493 2.698 Belgium 2.45 2.49106 .277 1.926 3.119 Cyprus 7.694 7.8675 1.158 5.485 9.718 Finland 1.792 1.625 .585 1.083 3.088 France 2.02 1.8305 .445 1.47 2.81 Germany 2.028 1.84057 .52 1.292 3.204 Greece 10.29 11.3263 3.6 4.717 15.357 Ireland 2.368 2.39323 .297 1.667 3.137 Italy 9.273 9.42577 2.156 5.291 13.514 Netherlands 1.794 1.81833 .337 1.129 2.575 Portugal 6.221 6.17095 1.575 3.864 8.99 Spain 2.941 2.8709 .42 2.348 5.566 Panel A. Phase 1, from 3 October 2005 to 31 July 2007, observations = 417 98 Country Mean Median Std. Dev. Min Max Austria 59.559 58.37813 56.94 1.92 268.879 Belgium 43.593 35.61889 32.985 2.6 155.526 Cyprus 70.735 63.5 51.096 5.25 196.868 Finland 23.865 21.68106 19.766 1.619 92.231 France 28.215 24.53916 22.055 2.216 97.875 Germany 23.369 22.13789 19.324 2.05 91.375 Greece 136.992 118.3939 122.562 7.584 821.622 Ireland 110.516 120.7518 91.919 3.085 384.344 Italy 76.534 71.83957 51.223 8.1 197.784 Netherlands 32.582 29.84436 30.638 1.727 127.831 Portugal 67.183 55.75266 51.5 6.075 382.591 Spain 67.702 67.64828 43.647 4.705 207.585 Panel B. Phase 2, from 1 August 2007 to 30 April 2010, observations = 718 Country Mean Median Std. Dev. Min Max Austria 103.678 86.82967 50.707 39.155 239.848 Belgium 174.867 154.0847 74.996 69.645 404.419 Cyprus 752.581 903.2707 477.772 124.391 1683.682 Finland 45.963 35.70949 19.762 23.854 90.174 France 120.77 92.96382 51.134 60.136 247.309 Germany 59.533 51.01689 23.466 24.11 115.667 Greece 5037.86 3818.011 4689.62 513.692 21464.406 Ireland 517.074 573.6684 230.018 158.286 1263.406 Italy 305.116 268.8052 130.811 124.84 590.624 Netherlands 68.27 54.03103 29.195 28.405 135.452 Portugal 709.558 562.2519 354.443 198.886 1656.674 Spain 333.239 308.1213 105.668 143.947 633.486 United Kingdom 65.313 65.33173 16.213 27.835 103.562 Panel C. Phase 3, from 3 May 2010 to 29 March 2013, observations = 760 99 Country Mean Median Std. Dev. Min Max Austria 29.736 27.977 6.364 19.996 45.608 Belgium 46.282 43.816 10.448 30.769 79.415 Cyprus 559.066 433.510 286.416 243.238 1356.544 Finland 22.569 22.259 3.043 17.061 32.471 France 46.466 43.765 14.946 23.68 82.587 Germany 20.321 20.242 6.145 11.479 37.053 Greece 1127.01 977.334 748.784 376.968 5622.042 Ireland 80.16 53.588 43.285 36.255 188.826 Italy 143.574 115.273 59.746 81.038 300.402 Netherlands 29.922 28.861 13.49 13.93 58.92 Portugal 234.944 179.417 111.271 105.783 547.749 Spain 123.696 91.111 64.694 54.372 295.584 United Kingdom 24.882 20.323 9.399 15.183 51.945 Panel D. Phase 4, from 1 April 2013 to 31 December 2015, observations = 715 Country Mean Median Std. Dev. Min Max Austria 53.859 34.264 54.442 1.493 268.879 Belgium 74.329 45.512 78.66 1.926 404.419 Cyprus 384.327 186.977 431.465 5.25 1683.682 Finland 25.865 22.966 21.003 1.083 92.231 France 54.768 42.868 53.664 1.47 247.309 Germany 29.034 22.690 26.292 1.292 115.667 Greece 1774.48 532.026 3290.981 4.717 21464.406 Ireland 198.79 117.388 243.772 1.667 1263.406 Italy 147.535 109.996 135.552 5.291 590.624 Netherlands 36.528 31.706 32.583 1.129 135.452 Portugal 284.065 157.163 344.094 3.864 1656.674 Spain 146.71 91.981 142.229 2.348 633.486 Panel E. Full sample, from 3 October 2005 to 31 December 2015, observations = 2,670 100 101 Table A.3. Correlation matrix of the log-changes of the CDS spreads. Country Austria Belgium Cyprus Finland Germany Greece Ireland Italy Netherlands Portugal Spain Austria 1.000 Belgium 0.371 1.000 Cyprus 0.014 -0.107 1.000 Finland 0.011 0.016 -0.187 1.000 Germany 0.105 0.103 -0.006 0.082 1.000 Greece 0.092 0.128 -0.059 0.096 0.265 1.000 Ireland 0.031 0.102 -0.136 0.112 -0.019 0.082 1.000 Italy 0.069 0.183 0.050 0.050 0.153 0.335 0.041 1.000 Netherlands -0.201 0.027 -0.248 0.172 0.074 0.003 0.179 0.063 1.000 Portugal 0.099 0.138 0.063 -0.018 0.168 0.275 0.071 0.304 -0.046 1.000 Spain 0.061 0.037 0.026 0.021 0.132 0.093 0.054 0.080 0.000 0.134 1.000 Panel A. Phase 1, from 3 October 2005 to 31 July 2007, observations = 417 Country Austria Belgium Cyprus Finland Germany Greece Ireland Italy Netherlands Portugal Spain Austria 1.000 Belgium 0.671 1.000 Cyprus 0.229 0.254 1.000 Finland 0.514 0.519 0.159 1.000 Germany 0.646 0.619 0.237 0.499 1.000 Greece 0.613 0.617 0.241 0.435 0.533 1.000 Ireland 0.599 0.618 0.212 0.413 0.520 0.543 1.000 Italy 0.718 0.707 0.214 0.557 0.625 0.742 0.591 1.000 Netherlands 0.596 0.581 0.233 0.516 0.581 0.431 0.502 0.607 1.000 Portugal 0.631 0.679 0.194 0.434 0.579 0.664 0.580 0.725 0.486 1.000 Spain 0.665 0.732 0.256 0.468 0.609 0.662 0.606 0.758 0.574 0.734 1.000 Panel B. Phase 2, from 1 August 2007 to 30 April 2010, observations = 718 102 Country Austria Belgium Cyprus Finland Germany Greece Ireland Italy Netherlands Portugal Spain UK Austria 1.000 Belgium 0.776 1.000 Cyprus 0.073 0.085 1.000 Finland 0.726 0.719 0.118 1.000 Germany 0.763 0.746 0.109 0.724 1.000 Greece 0.249 0.261 -0.049 0.243 0.210 1.000 Ireland 0.628 0.689 0.106 0.594 0.610 0.329 1.000 Italy 0.707 0.793 0.123 0.668 0.695 0.302 0.780 1.000 Netherlands 0.768 0.774 0.089 0.721 0.781 0.240 0.627 0.707 1.000 Portugal 0.559 0.641 0.106 0.542 0.556 0.339 0.816 0.742 0.572 1.000 Spain 0.693 0.788 0.142 0.664 0.667 0.302 0.793 0.913 0.701 0.767 1.000 UK 0.738 0.761 0.069 0.688 0.762 0.241 0.632 0.704 0.753 0.584 0.682 1.000 Panel C. Phase 3, from 3 May 2010 to 29 March 2013, observations = 760 Country Austria Belgium Cyprus Finland Germany Greece Ireland Italy Netherlands Portugal Spain UK Austria 1.000 Belgium 0.501 1.000 Cyprus 0.042 0.131 1.000 Finland 0.335 0.353 0.035 1.000 Germany 0.406 0.509 0.035 0.335 1.000 Greece 0.206 0.238 0.131 0.117 0.162 1.000 Ireland 0.405 0.580 0.186 0.303 0.415 0.362 1.000 Italy 0.316 0.541 0.160 0.264 0.365 0.445 0.703 1.000 Netherlands 0.328 0.343 0.059 0.250 0.328 0.177 0.329 0.238 1.000 Portugal 0.347 0.496 0.151 0.254 0.359 0.416 0.645 0.809 0.270 1.000 Spain 0.309 0.537 0.157 0.256 0.366 0.441 0.690 0.924 0.224 0.797 1.000 UK 0.247 0.334 0.027 0.269 0.334 0.161 0.338 0.289 0.160 0.276 0.290 1.000 Panel D. Phase 4, from 1 April 2013 to 31 December 2015, observations = 715 103 Country Austria Belgium Cyprus Finland Germany Greece Ireland Italy Netherlands Portugal Spain Austria 1.000 Belgium 0.674 1.000 Cyprus 0.161 0.181 1.000 Finland 0.426 0.438 0.090 1.000 Germany 0.601 0.598 0.171 0.424 1.000 Greece 0.322 0.338 0.095 0.214 0.283 1.000 Ireland 0.485 0.540 0.137 0.343 0.421 0.321 1.000 Italy 0.594 0.668 0.155 0.406 0.543 0.440 0.550 1.000 Netherlands 0.504 0.543 0.140 0.440 0.525 0.232 0.441 0.493 1.000 Portugal 0.528 0.603 0.155 0.329 0.495 0.431 0.554 0.718 0.406 1.000 Spain 0.567 0.665 0.187 0.368 0.528 0.414 0.557 0.818 0.471 0.722 1.000 Panel E. Full sample, from 3 October 2005 to 31 December 2015, observations = 2,670 Table A.4. Country codes for European countries Austria AT Belgium BE Cyprus CY Finland FI France FR Germany DE Greece EL Ireland IR Italy IT Netherlands NL Portugal PT Spain ES United Kingdom UK The list of country code is from Eurostat. 104 REFERENCES Alter, Adrian; Beyer, Andreas (2014): The dynamics of spillover effects during the European sovereign debt turmoil. In Journal of Banking & Finance 42, pp. 134–153. Angelini, Paolo; Grande, Giuseppe; Panetta, Fabio (2014): The negative feedback loop between banks and sovereigns. Arghyrou, Michael G.; Kontonikas, Alexandros (2012): The EMU sovereign-debt crisis: Fundamentals, expectations and contagion. In Journal of International Financial Markets, Institutions and Money 22 (4), pp. 658–677. Augustin, Patrick (2014): Sovereign credit default swap premia. In Journal of Investment Management. Baldwin, Richard; Beck, Thorsten; et al (2015): Rebooting the eurozone: Step 1-agreeing a crisis narrative. In CEPR Policy Insight No.85. Baldwin, Richard; Giavazzi Francesco (2015): The eurozone crisis: A consensus view of the causes and a few possible remedies: CEPR. Bampinas, Georgios; Panagiotidis, Theodore; Politsidis, Panagiotis (2020): Sovereign bond and CDS market contagion: A story from the Eurozone crisis. Bartlett, William; Prica, Ivana (2017): Interdependence between core and peripheries of the European economy: Secular stagnation and growth in the western balkans. In The European journal of comparative economics. Beirne, John; Caporale, Guglielmo Maria; Schulze-Ghattas, Marianne; Spagnolo, Nicola (2013): Volatility spillovers and contagion from mature to emerging stock markets. In Review of International Economics 21 (5), pp. 1060–1075. Beirne, John; Fratzscher, Marcel (2013): The pricing of sovereign risk and contagion during the European sovereign debt crisis. In Journal of International Money and Finance 34, pp. 60–82. Bhanot, Karan; Burns, Natasha; Hunter, Delroy; Williams, Michael (2012): Was there contagion in eurozone sovereign bond markets during the Greek debt crisis. Broto, Carmen; Pérez-Quirós, Gabriel (2015): Disentangling contagion among sovereign CDS spreads during the European debt crisis. In Journal of Empirical Finance 32, pp. 165–179. Bruyckere, Valerie de; Gerhardt, Maria; Schepens, Glenn; Vander Vennet, Rudi (2013): Bank/sovereign risk spillovers in the European debt crisis. In Journal of Banking & Finance 37 (12), pp. 4793–4809. Buchholz, Manuel; Tonzer, Lena (2016): Sovereign credit risk co-movements in the eurozone: Simple interdependence or contagion? In International Finance 19 (3), pp. 246–268. 105 Caporin, Massimiliano; Pelizzon, Loriana; Ravazzolo, Francesco; Rigobon, Roberto (2018): Measuring sovereign contagion in Europe. In Journal of Financial Stability 34, pp. 150–181. Chen, Ruo; Milesi-Ferretti, Gian Maria; Tressel, Thierry (2013): External imbalances in the eurozone. In Economic Policy 28 (73), pp. 101–142. Cheng, Lu; Guo, Ruocheng; Moraffah, Raha; Sheth, Paras; Candan, K. Selcuk; Liu, Huan (2022): Evaluation methods and measures for causal learning algorithms. In IEEE Transactions on Artificial Intelligence. Claeys, Peter; Vašíček, Bořek (2014): Measuring bilateral spillover and testing contagion on sovereign bond markets in Europe. In Journal of Banking & Finance 46, pp. 151–165. Constâncio, Vítor (2013): The European crisis and the role of the financial system, updated on 3/10/2021, checked on 3/10/2021. Cripps, Francis; Izurieta, Alex; Singh, Ajit (2011): Global imbalances, under-consumption and over-borrowing: The State of the world economy and future policies. In Development and Change 42 (1), pp. 228–261. Croci, Elisabetta Angelini; Farina, Francesco; Valentini, Enzo (2016): Contagion across eurozone’s sovereign spreads and the core-periphery divide. In Empirica 43 (1), pp. 197–213. Cronin, David; Flavin, Thomas J.; Sheenan, Lisa (2016): Contagion in eurozone sovereign bond markets? The good, the bad and the ugly. In Economics Letters 143, pp. 5–8. Di Quirico, Roberto (2010): Italy and the global economic crisis. In Bulletin of Italian Politics 2 (2). Dornbusch, R.; Park, Y. C.; Claessens, S. (2000): Contagion: Understanding how it spreads. In The World Bank Research Observer 15 (2), pp. 177–197. Eichengreen, Barry; Gupta, Poonam (2018): Managing sudden stops. In Central Banking, Analysis, and Economic Policies Book Series 25, pp. 9–47. Frankel, Jeffrey (2015): The euro crisis: Where to from here? In Journal of Policy Modeling 37 (3), pp. 428–444. Geffner, Hector; Dechter, Rina; Halpern, Joseph Y. (Eds.) (2022): Probabilistic and causal inference. The works of Judea Pearl. Association for Computing Machinery (ACM books, #36). Glover, Brent; Richards-Shubik, Seth: Contagion in the European sovereign debt crisis. In NBER Working Paper. Glymour, Clark; Zhang, Kun; Spirtes, Peter (2019): Review of causal discovery methods based on graphical models. In Frontiers Genetics 10, p. 524. Gómez-Puig, Marta; Sosvilla-Rivero, Simón (2014): Causality and contagion in EMU sovereign debt markets. In International Review of Economics & Finance 33, pp. 12–27. 106 Gómez-Puig, Marta; Sosvilla-Rivero, Simón (2016): Causes and hazards of the euro area sovereign debt crisis: Pure and fundamentals-based contagion. In Economic Modelling 56, pp. 133–147. Granger, C. W. J. (1969): Investigating causal relations by econometric models and cross- spectral methods. In Econometrica 37 (3), p. 424. Grauwe, Paul de; Ji, Yuemei (2013): Self-fulfilling crises in the eurozone: An empirical test. In Journal of International Money and Finance 34, pp. 15–36. Halbert White; Karim Chalak; Xun Lu (2011): Linking granger causality and the Pearl causal model with settable systems. In NIPS Mini-Symposium on Causality in Time Series, pp. 1–29. He, Zonglu; Maekawa, Koichi (2001): On spurious Granger causality. In Economics Letters 73 (3), pp. 307–313. Higgins, Matthew; Klitgaard, Thomas (2014): The balance of payments crisis in the euro area periphery. In Current Issues in Economics and Finance 20. Hobza, Alexander; Zeugner, Stefan (2014): Current accounts and financial flows in the euro area. In Journal of International Money and Finance 48 (Part B), pp. 291–313. Horváth, Bálint L.; Huizinga, Harry; Ioannidou, Vasso (2015): Determinants and valuation effects of the home bias in European banks’ sovereign debt portfolios. Hsiao, Cheng (1981): Autoregressive modelling and money-income causality detection. In Journal of Monetary Economics 7 (1), pp. 85–106. Kalainathan, Diviyan; Goudet, Olivier (2019): Causal discovery toolbox: Uncover causal relationships in Python. Kalbaska, A.; Gątkowski, M. (2012): Eurozone sovereign contagion: Evidence from the CDS market (2005–2010). In Journal of Economic Behavior & Organization 83 (3), pp. 657–673. Kaminsky, Graciela L.; Reinhart, Carmen M.; Végh, Carlos A. (2003): The unholy trinity of financial contagion. In Journal of Economic Perspectives 17 (4), pp. 51–74. Koutmos, Dimitrios (2018): Interdependencies between CDS spreads in the European Union: Is Greece the black sheep or black swan? In Annals of Operations Research 266 (1-2), pp. 441–498. Lane, Philip (2011): The Irish crisis. In The Euro Area and the Financial Crisis. Lane, Philip (2012): The European sovereign debt crisis. In Journal of Economic Perspectives 26 (3), pp. 49–68. Longstaff, Francis A. (2010): The subprime credit crisis and contagion in financial markets. In Journal of Financial Economics 97 (3), pp. 436–450. Masson, Paul R. (1998): Contagion: Monsoonal effects, spillovers, and jumps between 107 multiple equilibria. In IMF Working Papers 98 (142), p. 1. McKinnon, Ronald I.; Pill, Huw (1998): International overborrowing: A decomposition of credit and currency risks. In World Development 26 (7), pp. 1267–1282. Merler, Silvia; Pisani-Ferry, Jean (2012): Who's afraid of sovereign bonds? Brussels: Bruegel (Bruegel Policy Contribution, 2012/02). Mink, Mark; Haan, Jakob de (2013): Contagion during the Greek sovereign debt crisis. In Journal of International Money and Finance 34, pp. 102–113. Missio, Sebastian; Watzka, Sebastian (2011): Financial contagion and the European debt crisis, 9/1/2011. Nagayasu, Jun (2001): Currency crisis and contagion: evidence from exchange rates and sectoral stock indices of the Philippines and Thailand. In Journal of Asian Economics 12 (4), pp. 529–546. Nauta, Meike; Bucur, Doina; Seifert, Christin (2019): Causal discovery with attention-based convolutional neural networks. In Machine learning and knowledge extraction 1 (1), pp. 312– 340. Orsi, Roberto (2013): The quiet collapse of the Italian economy. Available online at https://blogs.lse.ac.uk/eurocrisispress/2013/04/23/the-quiet-collapse-of-the-italian-economy/, updated on 7/4/2015, checked on 7/8/2021. Peet, John; La Guardia, Anton (2014): Unhappy union. How the euro crisis - and Europe - can be fixed. New York: PublicAffairs. Pereira, Paulo T.; Wemans, Lara (2015): Portugal and the global financial crisis: short-sighted politics, deteriorating public finances and the bailout imperative. Pfaff, Bernhard; Stigler, Matthieu (2021): VAR Modelling. Package ‘vars’. Quaglia, Lucia; Royo, Sebastián (2015): Banks and the political economy of the sovereign debt crisis in Italy and Spain. In Review of International Political Economy 22 (3), pp. 485– 507. Reis, Ricardo (2015): Looking for a success in the euro crisis adjustment programs: The case of Portugal. In Brookings Papers on Economic Activity 2015 (2), pp. 433–458. Rigobon, Roberto (2019): Contagion, spillover, and interdependence. In Economía 19 (2), pp. 69–100. Romano, Simone (2021): The 2011 crisis in Italy: A story of deep-rooted (and still unresolved) economic and political weaknesses. Saka, Orkun (2020): Domestic banks as lightning rods? Home bias and information during the eurozone crisis. In Journal of Money, Credit and Banking 52 (S1), pp. 273–305. 108 Santis, Roberto A. de (2012): The euro area sovereign debt crisis: Safe haven, credit rating agencies and the spread of the fever from Greece, Ireland and Portugal. Schölkopf, Bernhard (2022): Causality for machine learning. In Probabilistic and causal inference. The works of Judea Pearl, vol. 27. First Edition, pp. 765–804. Uribe, Martín (2006): On overborrowing. In American Economic Review 96 (2), pp. 417–421. Véron, Nicolas (2007): Is Europe ready for a major banking crisis? In Policy Briefs (234). Wanna, John; Lindquist, Evert; Vries, Jouke de (Eds.) (2015): Ireland's economic crisis: the good, the bad and the ugly: Edward Elgar Publishing. Whelan, Karl (2014): Ireland’s economic crisis: The good, the bad and the ugly. In Journal of Macroeconomics 39, pp. 424–440. Zenios, Stavros A. (2013): The Cyprus debt: Perfect crisis and a way forward. In Cyprus Economic Policy Review. 109 CHAPTER 3 DEVELOPING EARLY WARNING SYSTEMS FOR FINANCIAL CRISIS USING MACHINE LEARNING METHODS 3.1 Introduction Since the creation of the first modern stock trading market in Amsterdam in 1611, there has been a long history of financial crisis (Aliber and Kindleberger, 2015; Reinhart and Rogoff, 2011). Financial crises not only damage the domestic economy of the crisis country, but can also transmit shocks through contagion and spillovers to other economies. Nowadays, it is not surprising to see global financial crisis in the interconnected economies. A recent example is the 2008-2009 global financial crisis, originated from the U.S. subprime crisis, it spread over to the European continent and triggered the eurozone crisis from 2010 to 2013. With the level of integration in today’s global economies, it is imperative to establish Early Warning Systems (hereafter EWS) for financial crisis. An EWS can detect impending financial crisis and give warning information to the authorities. It helps policy makers to prevent or prepare for a potential financial crisis in sufficient time, thus to avoid loss and harm to the economy and individuals. The first EWS is proposed by Kaminsky et el. (1998), who use an indicator signaling approach to predict financial crisis. Since then, there has been a growing literature on designing new EWS. Essentially, an EWS is a prediction mechanism which uses current and past information about the economy to predict whether there will be a crisis in the near future. In chapter 1, we have given an overview of the powerful and flexible ML algorithms. An EWS can be viewed 110 as a supervised learning task. Given the historical data on predictors (indicators) and labeled outcomes (dummy variable on crisis), an EWS can learn from the past and generate into unseen data. The ML methods have demonstrated their premium power in such prediction problems, making them the perfect candidate for building an EWS. There were some studies that applied ML algorithms for EWS in the past few years, but this area remained under-explored. This chapter aims to construct EWS using machine learning (hereafter ML) methods for global financial crisis. Specifically, we want to show the strength of XGBoost (Extreme Gradient Boosting). In existing EWS literatures, the popular ML options are random forest, support vector machines and neural network, etc., XGBoost is seldomly used to establish an EWS. In applied ML, XGBoost is a popular algorithm that has been used on many real-world classification and regression problems (Nobre and Neves, 2019). XGBoost has been the champion of major competitions including the Kaggle Competitions (Chen and Guestrin, 2016). It has received rave from the ML practitioners, especially in analyzing and forecasting the stock market (Gumus and Kiran, 2017). However, in academia, there are very few relevant studies that use XGBoost. In some studies, XGBoost has been proven to be less useful than other ML methods. For example, Bluwstein et al. (2020) show that XGBoost perform less well to the other tree-based methods (e.g., random forest) in predicting financial crisis. In this chapter, our goal is to show the excellent predicting capability of XGBoost, benchmarking with random forest and the conventional logit model. By using XGBoost and random forest on the same set of indicators (predictors), we can also rank the feature importance of the indicators. Random forest has the build in function to obtain Gini coefficient based on impurity measures. We also calculate the Shapley values, a measure borrowed from the game theory literature, to identify the predictors that are guiding the prediction. Although feature importance is not causal importance, it can still provide valuable information on crisis cycles and linkages. 111 Besides using the novel XGBoost algorithm, we have also assembled a crisis dataset, extended from the Jordà-Schularick-Taylor Macrohistory database (Jordà et al. 2017), by referencing several other well-known crisis datasets. Our dataset contains annual data of 17 advanced economies from 1870 to 2016. We use three models to predict crisis using the dataset: two ML methods, random forest and XGBoost, and the traditional logit model. The remaining of this chapter proceeds as follows: section 3.2 reviews the related literature, section 3.3 presents the dataset, section 3.4 describes the three models used in this paper, section 3.5 shows prediction results, section 3.6 presents the variable significance, section 3.7 concludes. 3.2 Related literatures There are three streams of studies that are relevant to the topic of this chapter. The first group is studies and reports on crisis datasets. The second group is development and findings in the EWS. The third one is recent ML applications in the EWS. First, we will look at the recent crisis datasets. The crisis datasets differ in many aspects in terms of region, crisis types, frequency, etc. The most comprehensive dataset is the Global Crises Data by Country maintained by Carmen Reinhart and her colleagues at Harvard University. The Global Crises Data includes over 70 countries from 1800 to present on an annual basis, it has binary variables representing different types of crises, such as sovereign crisis, currency crisis, inflation crisis and banking crisis. This dataset is widely used in the global crisis literature. Reinhart and Rogoff (2011) used an early version of this dataset to provide a quantitative overview of financial crisis over 8 centuries. Laeven and Valencia (2020) present the IMF database on systemic banking crises, which records 151 systemic banking crises episodes around the globe from 1970 to 2017. The authors 112 focus on the banking crisis in modern times. The dataset’s previous versions (Laeven and Valencia 2013) and current version are used in many crises research, such as Schularick and Taylor (2012) and Laeven and Valencia (2018). Other global crisis datasets include Frankel and Saravelos (2012), who construct a dataset consist of 50 annual macroeconomic and financial variables for 96 crises. Compared to global datasets, there are more regional crisis datasets that are available. Most of the regional crisis datasets are about advanced economies. For example, Babecký et al. (2012) develop a quarterly dataset for economic crises in the European Union (EU) and OECD countries from 1970 to 2010. The European Central Bank (ECB) has recently developed the Macro-prudential Database that record the financial crises in European countries (Lo Duca, 2017). The OECD constructs a monthly dataset that shows the recession indicators for the eurozone countries from 1960 to 2021. Another dataset is the Jordà-Schularick-Taylor Macrohistory database (Jordà et al., 2017). “It is the result of an extensive data collection effort over several years. In one place it brings together macroeconomic data that previously had been dispersed across a variety of sources.” It is the main dataset used in this paper, which contains 44 macroeconomic and financial indicators and a binary financial crisis variable for 18 advanced economics for 147 years. The researchers have used their dataset for several empirical papers, such as Jordà et al. (2016) and Jordà et al. (2018). The dataset is also used by many other economists, such as Bluwstein et al. (2020) and Tölö (2020). The above datasets are the ones that are frequently used in the existing literature. Next, we will examine the second stream of papers on the development and findings in the EWS. The early EWS papers mainly use two approaches, the signaling approach and dependent regression analysis (logit or probit regression). Almost all the earliest literature on EWS takes the signaling approach. (Kaminsky et el. (1998) is the pioneer). In this approach, economists construct 113 indicators and the thresholds for such indicators. If an indicator surpasses its threshold, this equals to a warning signal for potential crisis. Papers adopting the signaling approach include Berg and Pattillo (1999), Cooper et al. (2000) and others. The early works set the stage for the regression-based models. With the indicators selected by prior works, the regression models use a binary variable as the outcome variable of a regression (indicative of a crisis), then regress it on potential macroeconomic and financial indicators. Many papers follow this path, to name a few, Borio and Lowe (2002), Reinhart and Rogoff (2007), Borio and Drehmann (2009), Schularick and Taylor (2012), Laeven and Valencia (2013), and Babecký et al. (2013). Demirgüç-Kunt and Detragiache (2005) survey the early works on signaling approach and regression analysis, the authors conclude that the logit regression models are more suitable for an EWS. Davis and Karim (2008) assess the logit regression and the signal extraction approach, their findings suggest that the logit model is the most appropriate for global EWS, while the signaling approach works better with country specific EWS. Recent development in the regression approach is to use multinomial logit model instead of binomial models. For example, Bussiere and Fratzscher (2006) propose a multinomial logit model with a post-crisis bias component and find evidence that it outperforms binomial models. The notion that the multinomial logit model can outperform binomial models in predicting systemic banking crises is supported in later works (e.g., Caggiano et al., 2016). These previous studies provide an abundance of early warning indicators that are proven to have practical prediction power in real-world applications. Tölö et al. (2018) construct a summary table of early warning indicators used in previous papers. Given the large pool of indicators, researchers start to introduce the predictive power of ML methods into the EWS literature. An early example is Ahn et al. (2011), they use support vector machine (SVM) to study the Korean financial market and find that the SVM outperforms the logit model. More recent works 114 are listed below. Alessi and Detken (2018) use random forest to predict banking crisis with an emphasis in excessive credit growth using the quarterly dataset of EU and OECD countries constructed by Babecký et al. (2012), their early warning tree highlights similar indicators as in the earlier signaling EWS literature. Beutel et al. (2019) employ K-nearest neighbor (KNN), decision trees, random forests and other ML methods to predict the banking crisis in 15 advanced economies from 1970 to 2016 (their data is from Leaven (2018)). Even though the ML algorithms can obtain good in-sample fit, they perform poorly out-of-sample, compared to the conventional logit model. Bluwstein et al. (2020) use extreme trees, random forest, neural networks and other machine learning methods to predict financial crisis using the Jordà-Schularick-Taylor Macrohistory database, they find that almost all the ML models outperform the logit model, especially the extreme trees and random forest, but XGBoost performs less well compared with other ML methods. Coulombe et al. (2020) take a somewhat different approach, they first design experiments to identify the “treatment” effect of different ML features, through the experiments, they study the usefulness of the underlying features that drive ML gains over conventional methods. They employ kernel ridge regression and random forest to study the FRED-MD database, which is a monthly macroeconomic database with an indicator for recession period in the U.S. Their results show that ML’s ability to detect non-linearity in the data is the game- changer to the EWS literature. Jarmulska (2020) uses random forest and logit model to study the fiscal stress events in 43 advanced economies from 1992 to 2018. The comparison between the two models suggests a clear advantage of the ML methods in forecasting. Tölö (2020) employs LSTM and GRU neural nets that use the representation power of deep learning algorithms on the Jordà-Schularick- Taylor Macrohistory database, their novel ML methods outperform basic neural networks and the logit model. 115 In this line of research to employ ML methods in EWS, only very few studies have used XGBoost and recognized its importance. Chatzis et al. (2018) build EWS to forecast stock market crisis using daily stock, bond and currency data from 39 countries, they employed ML methods such as support vector machines, random forest, XGBoost and deep neural networks. The authors claim that they are the first one to apply XGBoost and deep learning in the context of financial crisis forecasting. Their results show that all the ML methods demonstrate excellent predictive power, especially the deep neural networks. Huang (2020) applies logit regression, random forest and XGBoost on Germany’s credit default records, the ML methods perform better than the logit model, XGBoost has reached about 80% accuracy. In most of the studies that adopt ML methods in macroeconomic forecasting, researchers find clear evidence to support the use of ML. The only exception is Beutel et al. (2019), in which the authors find the logit model out performs ML methods in out-of-sample fit. Among all the ML methods that are employed, the most commonly used and praised ML algorithm is random forest, whereas XGBoost is rarely used. In the following sections, we will show that XGBoost can perform just as well as random forest in building an EWS. 3.3 Data description The Jordà-Schularick-Taylor Macrohistory database (hereafter JST dataset) is a comprehensive and harmonized dataset on 17 advanced economies from 1870 to 2016, it includes a binary variable for systemic financial crisis, and 44 macroeconomic and financial variables such as GDP, import and export, and equity dividend return. The 17 countries are Australia, Belgium, Canada, Denmark, Finland, France, Germany, Italy, Japan, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, United Kingdom and United States (see Jordà et al., (2017) for details on the dataset). 116 Comparing to other crisis datasets, the systemic financial crisis binary variable in the JST dataset shows some differences, some crisis periods in the JST datasets are categorized as non- crisis periods in other datasets, some non-crisis periods are marked as crisis period in other datasets. Also, out of the 2499 observations, there are only 90 crisis periods (3.6%), such a low incidence rate is much smaller than most other datasets. Therefore, this paper extends the systemic financial crisis binary variable in the JST dataset by referencing other crisis datasets. The crisis variable used in this paper is mostly based on the Global Crises Data by Country maintained (hereafter GCD) by Carmen Reinhart and her colleagues. The crises in the GCD are categorized into 4 groups, which are banking crisis, balance of payment crisis, sovereign crisis and inflation crisis. Because our paper aims to establish an EWS for financial crisis using ML methods, and the three types of crises (banking crisis, balance of payment crisis, sovereign crisis) are collectively recognized as financial crisis (Laeven and Valencia, 2008). Therefore, we dropped the inflation crisis binary variable in the GCD and merged the three binary variables for banking crisis, balance of payment crisis and sovereign crisis into one binary variable for financial crisis. In almost all cases, when there is an inflation crisis in the GCD, the crisis country simultaneously has a sovereign crisis or balance of payment crisis. In rare cases when there is a standalone inflation crisis, we turn to other crisis datasets to determine whether it is a financial crisis. For all crisis incidents, we cross-examine all available datasets to refine the financial crisis binary variable so that it is consistent with the majority of the datasets. In the EWS literature, there are two ways to treat the crisis binary variable for prediction problems. One way is to predict the whole crisis periods with forward looking, i.e., the crisis dummy is indicative of the actual crisis periods, the indicators are lagged 2 years with regard to the crisis dummy (e.g., Jarmulska (2020)). The other way is to predict the crisis one or two years prior to a crisis, the crisis dummy is in fact a signal dummy. The actual crisis periods are 117 excluded from the data (e.g., Bluwstein et al. (2020)). We take the first approach because of our data structure. The JST dataset is highly imbalanced (only 3.6% of all observations are crisis periods), even after we extended the crisis dummy in the JST dataset, there are still too many non-crisis periods versus the crisis periods (only 20% of the data are marked as crisis period). If the ongoing crisis periods are all excluded as described in the second approach, the dataset will be even more imbalanced (only 9% of the data have the positive value of one). Therefore, we follow the first approach by keeping all the crisis periods. Another treatment of the dataset is to exclude the extreme periods, this is a common practice in the EWS literature (Bluwstein et al., 2020). the JST dataset spans from 1870 to 2016, which encompasses the two world wars and the Great Depression during the 1930s. During such extreme economic and political times, financial crises happened in most countries, therefore, the period of 1914-1918 (The World War I), 1933-1939 (The Great Depression) and 1939-1945 (The World War II) are excluded from the data. Also, following the first approach, a two-year lag is introduced to the dataset, so that the crisis can be predicted in advance. For example, for crises in 2010-2012, we use the indicators in 2008-2010 as explanatory variables. The final adjustment to the dataset is to clear out the observations with missing values in one or more macroeconomic variables. Given all the adjustments, the final dataset contains 1570 observations for 17 countries, 322 out of the 1570 observations (20%) are crisis periods. Next, we will discuss the macroeconomic and financial variables in the dataset. There are many indicators that have been proven to be predicative in previous studies (e.g., Tölö et al., 2018). However, because the JST dataset goes back to as far as 1870, it is very difficult to find data in the 19th century. Therefore, the explanatory variable selection is restricted to the variables in the original JST dataset. Following previous studies that also use the JST dataset (e.g., Bluwstein et al., 2020; Tölö, 2020), we pick out 15 explanatory variables, which can be put into three categories: domestic 118 economy, competitiveness and global economy. The 13 variables in the domestic and competitiveness categories are all listed in the table of “survey of Early Warning Indicators” in Tölö (2018). They are widely used in previous studies and at least have shown some predictive power. For the two variables in the global economy category, they are derived from credit and yield curve in the domestic economy category, the global credit is the mean of all other countries’ credits, the global yield curve is the mean of all other countries’ yield curves. The first category, domestic economy, includes 10 variables. They represent the macroeconomic fundamentals of an economy. The nominal GDP and GDP per capita measure the overall output of an economy. A large GDP decline is an indicator for debt crises (Babecký et al., 2014). The reason that we include both GDP and GDP per capita is because GDP is highly correlated with other variables, therefore GDP is excluded from the logit model. CPI is the adjusted domestic consumer price index in U.S. dollars, the growth rate of CPI measures the domestic inflation level, high inflation is an indicator for sovereign crisis and balance of payment crisis (Christofides et al., 2016). Money is the domestic broad money supply, it is one of the monetary policies that can be used to adjust the interest rate, an increase in money supply is a sign of injecting credit into the market in times of economic downturn (Reinhart et al., 1998). Consumption and investment are key indicators of aggregated economic activities, they directly reflect the contraction or expansion of an economy. Credit is the total loans to non- financial private sector created through the private banks, and controlled by the central banks. Credit boom periods tend to be followed by unusually low returns to equities (Davis and Taylor, 2019). The yield curve is defined as the domestic long-term interest rate subtracting the short-term interest rate, this indicator is the same as Bluwstein et al. (2020), the authors show that the yield curve is a key indicator for financial crisis prediction, that the yield curve often steepen at the onset of a recession. Public debt is the growth rate of debt to GDP ratio. High levels of public 119 debt usually signal a deteriorating fiscal condition and a potential sovereign crisis. The debt service ratio is the product of credit and long-term interest rate, then divided by nominal GDP. It measures the economy’s ability to repay its private debts, a high debt service ratio suggests vulnerability in the banking sector. This simple measure of debt service ratio follows Bluwstein et al. (2020), it is constructed only with the data in the JST dataset due to data availability, thus it cannot reflect features such as short-term lending rates or the maturity structure of the debt. The second category is competitiveness, which stands for a country’s competitiveness in both goods and services in international trade. The current account is represented by the growth rate of the current account to GDP ratio, which measures a country’s earnings and spending abroad. Sustained large current account deficit is a sign of loss of competitiveness that might lead to a balance of payment crisis. The level of current account deficits is also robustly associated with the severity of crises (Babecký et al., 2013). Export is an important measure of a country’s overall performance on the global markets, export growth suggests a comparative advantage and an export decline implies a loss of competitiveness. Import is not included here, because it is collinear with export and current account deficits. The USD exchange rate is an important indicator for balance of payment crisis, exchange rate can interact with domestic and foreign prices to determine the capability of a country’s export and import. The third category is the global economy. We follow Bluwstein et al. (2020) and Jarmulska (2020) to include two global indicators: the global yield curve, and the global credit. They are calculated by averaging the values of all other countries. Those two global indicators can represent the cross-border contagion and spillovers in the global market. The limitation of the two indicators is that they only represent the 17 advanced economies in the JST datasets, so the shocks from other countries (e.g., shocks from Greece during the eurozone crisis) cannot be picked up. 120 Table 3.1. Explanatory variables summary Description Domestic economy GDP GDP (nominal, local currency), y-o-y growth GDP per capita Real GDP per capita (PPP), y-o-y growth CPI Consumer prices (index, 1990=100), y-o-y growth Money Broad money (nominal, local currency), y-o-y growth Consumption Real consumption per capita (index, 2006=100), y-o-y growth Investment Investment-to-GDP ratio, y-o-y growth Credit Total loans to non-financial private sector (nominal, local currency), y-o-y growth Yield curve Long-term interest rate - Short-term interest rate, in levels Public debt Public debt-to-GDP ratio, y-o-y growth Debt service ratio Credit × long-term interest rate over GDP, y-o-y growth Competitiveness Current account Current account-to-GDP ratio, y-o-y growth Export Exports (nominal, local currency), y-o-y growth USD exchange rate USD exchange rate (local currency/USD), y-o-y growth Global economy Global yield curve Mean of all other countries’ yield curves Global credit Mean of all other countries’ credit Explanatory variables used in this paper, all data is from the JST dataset, y-o-y growth is the year-to- year growth rate in basis points. Table 3.1 explains the makeup and structure of the variables. 13 variables use the growth rate, 2 variables (yield curve and global yield curve) are in levels. A two-year lag is introduced in the dataset. Table 3.2 shows the descriptive statistics of the variables for the crisis subgroup and the non-crisis subgroup. A 𝑡-test is performed to check the difference in mean in those variables, 9 out of the 15 variables have a significant difference in their means. 121 Table 3.2. Descriptive statistics of the explanatory variables Crisis Crisis Non-crisis Non-crisis Difference Mean Std. Dev. Mean Std. Dev. in mean Domestic economy GDP 6.507 7.929 7.109 7.658 -0.602 GDP per capita 1.52 3.672 2.525 3.215 -1.005*** CPI 4.203 6.238 3.236 5.175 0.966*** Money 0.771 3.754 0.409 3.059 0.361* Consumption 1.241 3.766 2.363 3.331 -1.122*** Investment -0.066 2.293 0.1 1.66 -0.166 Credit 9.651 10.02 9.557 8.621 0.094 Yield curve 0.509 2.219 0.734 1.74 -0.224* Public debt 1.151 5.362 -0.244 5.019 1.395*** Debt service ratio 4.455 3.161 3.488 2.523 0.967*** Competitiveness Current account -0.022 2.636 0.064 2.158 -0.086 Export 9.013 21.337 10.149 40.366 -1.136 USD exchange rate 2.405 16.552 0.527 9.619 1.878*** Global economy Global yield curve 0.74 0.969 .813 .874 -0.073 Global credit 9.259 6.17 10.097 5.379 -0.837** The difference in mean 𝑡-test results. Significant levels are ∗ 𝑝<0.1; ∗∗ 𝑝<0.05; ∗∗∗ 𝑝<0.01. Figure 3.1 shows the result of a Principal Coordinate Analysis (PCoA) of the data for crisis and non-crisis subgroups, the variability in the whole data set is not negligible as the points spread out over the horizontal coordinate, but the points for crisis and non-crisis subgroups display little dissimilarity between them. 122 Figure 3.1. Principal Coordinate Analysis (PCoA) of the data for crisis and non-crisis subgroups. The correlation matrix of the explanatory variables is shown in Figure 3.2. GDP is the only variable that exhibits high correlation with other variables. This is because some variables are calculated by the growth rate of its ratio to the GDP. Therefore, we perform a VIF (variable inflation factor) to measure the amount of multicollinearity among the variables. As expected, the GDP has a VIF above 9, suggesting high collinearity. The two ML methods (random forest and XGBoost) are naturally immune to multicollinearity, because the algorithms only pick one of the collinear variables when deciding a split at a node in the trees, the multicollinearity does not affect prediction performance. But with the detection of multicollinearity, we will report results of the algorithms both with the GDP and without GDP. In the benchmark logit model, the GDP is excluded from the regression. 123 Figure 3.2. Correlations matrix of the explanatory variables. Blue means a positive correlation, red means negative. The magnitude of the correlation is shown by color intensity, the darker the color, the larger the absolute value. 3.4 Methodology This chapter uses three models to establish the EWS, XGBoost, random forest, and the benchmark logit model. The first one is XGBoost, as discussed in section 3.1, XGBoost is a very popular method in applied ML because of its impressive results in real-world applications, specifically in finance. XGBoost combines the advantages of gradient boosting with random forest, making it a powerful and efficient tool for prediction problems. The EWS literature has only seen very few XGBoost applications. The second model is random forest. Previous studies 124 have demonstrated its outstanding prediction performance in crisis detection, random forest also outperforms a lot of other ML methods such as decision trees and support vector machines. Besides the two ML methods, the conventional logit model is used as the benchmark model. Logit regression is the standard model of the EWS in crisis prediction. It has been shown to outperform many ML algorithms including random forest (Beutel et al., 2019). This paper aims to illustrate the predictive power of XGBoost in establishing EWS, we will compare the performance of the three models in the following sections. The next section presents the three models in detail. 3.4.1 XGBoost XGBoost is a recent development in ML built upon decision trees and gradient boosting (Chen and Guestrin, 2016). XGBoost is a tree-based ensemble ML algorithm which uses a gradient boosting framework. It is best suited to smaller panel data (in ML, panel data is also referred as tabular data) for a variety of applications such as regression and classification. XGBoost improves over the previous algorithms through parallel processing, tree pruning and handling missing values and regularizations. XGBoost algorithms starts out with building the base tree predictions using parallelized implementation. After the trees are built, XGBoost specifies the maximum depth of each tree and prune the trees backwards. XGBoost also has a built-in regularization component that penalizes more complex models through both LASSO and ridge to prevent overfitting. In a XGBoost framework, given a dataset with 𝑁 observations, the explanatory variables 𝑥𝑖 has 𝑚 features so that 𝑥 𝑚𝑖 ∈ 𝑅 . For each 𝑥𝑖 , there is an outcome variable 𝑦𝑖 , 𝑦𝑖 ∈ 𝑅. In the classification problems, 𝑦𝑖 ∈ (0, 1). First, a tree ensemble model is performed to predict the outcome variable using the explanatory variables 𝑥𝑖 and the following 𝐾 additive functions 125 (3.1) where 𝑓𝑘 corresponds to an independent tree structure with leaf weights, 𝐹 is the tree space. The goal of the regularized learning is to minimize equation 3.2 (3.2) where 𝑙 is a differentiable convex loss function that measures the difference between the prediction and . is an additional regularization term for penalizing the complexity of the model. (3.3) where 𝑇 is the number of leaves, 𝜔𝑖 is the weight of the 𝑖 𝑡ℎ leaf. Define 𝐼𝑗 as the instance set of leaf 𝑗. The optimal values for 𝜔𝑗 can be obtained by solving equation 3.1 to 3.5. (3.4) (3.5) Normally it is impossible to enumerate all the possible tree structures. Assume that 𝐼 = 𝐼𝐿 ∪ 𝐼𝑅 where 𝐼𝐿 and 𝐼𝑅 are the instance sets of child left and right nodes. The following formula is usually used in practice. 126 (3.6) More details about the XGBoost algorithm can be found in the authors’ paper (Chen and Guestrin, 2016). 3.4.2 Random forest Random forest is a combination of decision tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. None of the trees in the forest can see the entire data, therefore avoiding the problem of overfitting (Breiman, 2001). In a random forest algorithm, the data is recursively split into partitions, at each non-terminal node in the forest, the split is done by asking a question with binary answers, the answer then determines the questions that will be asked at the next node. This process repeats until the data reaches the terminal node, where a categorical outcome is produced. The criterion for splitting the data at each node is based on impurity measures such as Gini impurity and entropy. Here, we will use the Gini impurity, which is a function of measuring the quality of split in each node, to stratify the predictor space during the recursive binary splitting process. The Gini impurity for node 𝑁 is defined as (3.7) 127 where 𝜔𝑖 is the weight for the 𝑖 𝑡ℎ leaf, 𝑃(𝜔𝑖) is the proportion of the population of class 𝑖 in node 𝑁. The goal of random forest is to minimize the impurity by choosing the best split in each node. The best split is thus defined as the highest reduction in impurity or the highest gain in information. The information gain of choosing a split is defined as (3.8) where the decrease in the impurity measure 𝐼(𝑁) equals the current level of 𝐼(𝑁) minus the expectation of the two child nodes of node 𝑁. 𝑃𝐿 is the proportion of data in node 𝑁 that goes to the left child node, correspondingly, 𝑃𝑁 is the proportion of data in node 𝑁 that goes to the right child node, 𝐼(𝑁𝐿) and 𝐼(𝑁𝑅) are the impurity measure for the two child nodes. The above optimization problem at each node set the threshold values for the trees in the random forest. Random forest uses the ensemble methods of bagging (as known as Bootstrap aggregating) to aggregate the decision trees predictors, which is often associated with large variance and low out-of-sample prediction accuracy. To sum up, random forest is a nonparametric way to estimate the set of outcome variables 𝑦𝑖 from a set of explanatory variables 𝑥𝑖. 3.4.3 Logit model In the binomial logit model, the outcome variable 𝛱𝑖 for observation 𝑖 is predicted from a vector of covariates 𝑥1𝑖 , ⋯ , 𝑥𝑚𝑖. The logit model is described as below (3.9) 128 where 𝛱𝑖 is the probability of a crisis period, 𝑥𝑘𝑖 is the value of 𝑘 𝑡ℎcovariates, 𝑘 is from 1 to 𝑚. 𝛽𝑖 is the coefficient of each individual covariate. The value of 𝛽𝑖 is estimated using the method of maximal likelihood. 3.4.4 Performance measure AUROC curve (areas under the receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters, the True Positive Rate (TPR) on the vertical axis and the False Positive Rate (FPR) on the horizontal axis. The area under the ROC curve (AUROC) is a common measure used in the EWS literature. All recent works have adopted this measure in their model assessments (e.g., Tölö, 2020; Jarmulska, 2020). Therefore, we also use the AUROC to evaluate the three models under study. For any thresholds, the two parameters in a ROC curve are defined as (3.10) (3.11) The area under the ROC curve can be interpreted as the probability that the distribution of thresholds during the crisis is stochastically larger than during normal times (Drehmann and Juselius, 2014). Therefore, the AUROC provides a convenient and interpretable measure of the EWSs. The values of AUROC lie between 0 and 1. A pure random prediction which assigns the sample into 2 groups would result in an average AUROC of 0.5. When the AUROC is below 129 0.5, it suggests that the model is uninformative, when the value is above 0.5, the model is informative, when the value is 1, it is fully informative. 3.5 Results This part shows the AUROC of the three models. For XGBoost and random forest, two AUROCs are reported, one is when GDP is included in the explanatory variables, one is without. For the logit model, only one AUROC is reported without GDP as an explanatory variable. Figure 3.3 and Figure 3.4 show the AUROC for XGBoost when GDP is included, and not included, respectively. Both the AUROC have a value above 0.5, the first one has a value of 0.812, while the second one has a value of 0.994, which is almost a fully informative model. Figure 3.3. AUROC for XGBoost with GDP included in the explanatory variables. 130 Figure 3.4. AUROC for XGBoost with GDP excluded in the explanatory variables. Figure 3.5 and Figure 3.6 show the AUROC for random forest when GDP is included, and not included, respectively. Both the AUROC have a value above 0.5, the first one has a value of 0.809, while the second one has a value of 1, which is a fully informative model. Figure 3.5. AUROC for random forest with GDP included in the explanatory variables. 131 Figure 3.6. AUROC for random forest with GDP excluded in the explanatory variables. Figure 3.7 shows the AUROC for the logit model with GDP excluded in the explanatory variables. It has a value of 0.724 which is above 0.5; the model is informative. Table 3.3 summarizes the finding of the AUROCs. Figure 3.7. AUROC for the logit model with GDP excluded in the explanatory variables. 132 Table 3.3. AUROCs for the three models. Model AUROC with GDP AUROC without GDP Random Forests 0.809 1 XGBoost 0.812 0.994 Logit 0.724 The results of the AUROC show that the ML algorithms perform extremely well in the test set. The XGBoost and random forest have reached a value greater than 0.99, while the benchmark logit model has a value of 0.724. This finding shows that the ML methods outperform the logit model in terms of AUROC. In theory, the multicollinearity associated with the GDP should have little effect on the predictive performance of XGBoost and random forest. However, as shown in Table 3.3, the AUROC is higher in the predictions without GDP in both XGBoost and random forest, suggesting that GDP should be removed from the explanatory variables given this specific dataset. Between XGBoost and random forest, the values of AUROC are almost identical, showing equal predictive power in this regard. 3.6 Variable significance In an EWS, besides prediction accuracy, another important task is to understand the importance of indicators and their thresholds. The indicators are not only important in forecasting crisis, they can also point out potential areas of vulnerability of an economy. This section uses the Shapley values borrowed from the game theory literature, which can identify the variables that are guiding the prediction. 133 The Shapley value, introduced in Shapley (1953), is a method for assigning payouts to players depending on their contribution to the total payout in a coalition game. In ML prediction tasks, the Shapley value of a variable is the average of all the marginal contributions of this variable to all possible coalitions with other explanatory variables, that is the difference between the actual prediction of a given observation and the average of other “coalition” predictions with other explanatory variables (Jarmulska, 2020). The Shapley value is an ideal measure for variable importance in the ML prediction tasks (Molnar, 2019), the existing literature has used the Shapley value extensively as a criterion for feature importance (e.g., Bluwstein et al., 2020; Jarmulska, 2020; Tölö et al., 2020). The Shapley value can interpret a model’s prediction with regard to attribution of the various features, dependence on the feature’s value, and the most important features (Lopez de Prado, 2020). For detailed information about how the Shapley values are formulated, please see Strumbelj and Kononenko (2014). Rather than using a typical bar chart to rank the values, this paper reports the Shapley values density scatter plot for each explanatory variable. The plot can show how much impact each variable has on the prediction in the test set. Due to package incompatibility, the Shapley plot for XGBoost differs from that of random forest and the logit model, the difference will be explained below. The Shapley summary plot of the XGBoost reports the Shapley values by a scatter plot, which is the distribution of contributions for each explanatory variable. The mean of each variable’s Shapley value is reported in numbers and ranked from the largest to the smallest. The color of each dot represents the level of impact of the variable on each observation in the test set (Lundberg and Lee, 2017). Figure 3.8 shows that the two most influential variables in XGBoost are the global yield curve and CPI. The global yield curve affects almost all the predictions in the test set with similar impact. The yield curve, ranking 3rd on the list, effects a few predictions with a larger impact. 134 Figure 3.8. Shapley summary plot for XGBoost, GDP is excluded from the model. Figure B.1 in Appendix B shows the Shapley summary plot for XGBoost when GDP is included. In theory, including the GDP should have minor effects on the XGBoost’s prediction. But in practice, the Shapley values change a lot just as the AUROCs. The 2 most influential variables now become the global yield curve and the domestic yield curve, both of them have a sizeable impact, followed by consumption and CPI. 135 Figure 3.9. Shapley summary plot for random forest, GDP is excluded from the model. The second graph (Figure 3.9) is the Shapley plot for random forests when GDP is excluded from the explanatory variables. In this plot, the red and green bars show the means of the Shapley values for each variable, the box plots summarize the distribution of contributions for each explanatory variable (Biecek and Burzykowski, 2020). The list of explanatory variables is sorted by their means (from the largest to the smallest). In Figure 3.9, the two most impactful variables are consumption and the global yield curve. Almost all the variables have negative effects on the prediction, while the domestic yield curve and credit have a positive effect. The domestic yield curve ranks 4th in the list. 136 The Shapley summary plot of the random forest with GDP included in the model is reported in the Appendix. Figure B.2 shows that consumption and GDP per capita are the two most impactful variables. The global yield curve now ranks 6th on the list, though it affects some of the predictions, its effect is minor since its average Shapley value is around 0. The influence of the global yield curve becomes positive in this figure. Two other variables, CPI and the current account, also turn positive. It is interesting to see such dissimilar results in variable importance by introducing a collinear variable. Figure 3.10. Gini index for random forest, excluding GDP. Another variable importance metric is the Gini-based importance (Gini index) obtained by training the random forest algorithm. For each variable, its Gini impurity decreases every time that variable is chosen to split a node across every tree of the forest. The accumulated decrease 137 is divided by the number of trees in the forest to give an average, which is the value of the Gini index. The larger the Gini index, the more important the variable is (the scale of the Gini index is irrelevant). In Figure 3.10, the two most important variables are global yield curve and debt service ratio. The third Shapley summary plot is for the logit model excluding the GDP. This plot has the same attributes as Figure 3.9. In Figure 3.11, the two most influential variables are consumption and GDP per capita. In order to compare the Shapley value results with the traditional variable significance, the logit regression coefficients are also reported in Table 3.4. Figure 3.11. Shapley summary plot for logit model, GDP is excluded from the model. 138 Table 3.4. Logit regression results Variable Estimate Std.Error z statistic Pr(>|z|) GDP per capita -9.829 6.624 -1.484 0.137 CPI 7.180 5.737 1.251 0.210 Money 2.089 4.841 0.432 0.666 Consumption -24.923 6.618 -3.766 0.000*** Investment -4.287 5.005 -0.856 0.391 Credit 11.287 6.135 1.840 0.065* Yield curve -10.173 4.854 -2.096 0.036** Public debt 5.310 4.960 1.071 0.284 Debt service ratio 13.973 4.459 3.133 0.001*** Current account -12.852 4.660 -2.758 0.005*** Export 28.467 13.026 2.185 0.028** USD exchange rate -3.402 4.269 -0.797 0.425 Global yield curve 0.307 4.402 0.070 0.944 Global credit -13.107 5.258 -2.493 0.012 z-test significant levels are ∗ p< 0.1; ∗∗ p<0.05; ∗∗∗ p<0.01. The variables that are significant are consumption, debt service ratio, current account, export, yield curve and credit. The top six variables on the Shapley list are consumption, GDP per capita, yield curve, credit, current account, and global credit. Four of them are the same, including the most significant variable, consumption. In both measures, consumption has a large negative impact on the predictions, which is consistent with the findings in random forest (Figure 3.9). Table 3.5 summarizes the variable importance for all the metrics discussed above. The Shapley value and the Gini index are reported with rank order while the logit coefficients are reported in value with significance level. The two ML methods recognize the global yield curve as an important indicator for predicting a financial crisis. The domestic yield curve is also an important indicator but with a 139 smaller impact. The domestic yield curve is also a significant variable in the logit model. Therefore, the yield curves are the most important indicators in this dataset. Intuitively and empirically, an inverted domestic yield curve is one of the most reliable leading indicators of an impending recession (Reinhart and Rogoff, 2011). Recent studies who use the ML methods to identify the early warnings also find the yield curves as an important measure (e.g., Bluwstein et al., 2020; Tölö, 2020). Table 3.5. Variable importance summary across the three methods Variable Shapley Shapley Shapley Coefficient Gini index R. forests XGBoost Logit Logit R. forests GDP per capita 5 11 2 -9.829 9 CPI 8 2 11 7.180 4 Money 14 14 14 2.089 14 Consumption 1 5 1 -24.923*** 3 Investment 13 10 7 -4.287 13 Credit 11 4 4 11.287* 10 Yield curve 4 3 3 -10.173** 6 Public debt 9 8 10 5.310 8 Debt service ratio 3 7 8 13.973*** 2 Current account 10 6 5 -12.852*** 11 Export 12 13 9 28.467** 12 USD ex. rate 7 9 12 -3.402 7 Global yield curve 2 1 13 0.307 1 Global credit 6 12 6 -13.107 5 Logit regression results, significant levels are ∗ p< 0.1; ∗∗ p<0.05; ∗∗∗ p<0.01. The global yield curve is a global indicator for all countries, reflecting changes in the global economies, which serves as a warning of potential contagion and spillovers during a global financial crisis. Our dataset contains only 17 advanced economies in this dataset, 13 of them are 140 European countries, 11 are eurozone countries. Given the economic and political interdependencies among the European countries, the domestic economic condition of a country would have effects on its neighbors. However, without further investigation, this finding should not be generalized in macroeconomic forecasting to a larger scale, due to limitation of the dataset. Besides the yield curves, the second group of important indicators are consumption and CPI. Consumption is the most important variable in random forest and the logit model. It ranks 5th for the XGBoost. For the logit model, the importance of consumption is confirmed by both the Shapley value and the coefficient significance. Just like the yield curve, abrupt change in consumption is a classic indicator for imminent financial crisis, CPI is a measure of inflation and is often seen as the pressure indicator for overall economic health. The three variables in the competitiveness category, current account, USD exchange rate and export have low prediction power in all three models. This could be due to the structure of the dataset, in the 19th century and the early 20th century, the volume of global international trade only makes up a small share of the total economy, and some of the countries have a peg with the U.S. dollar, hence there is not much variation in the data. Table 3.2 confirms that the difference between crisis periods and non-crisis periods in the current account and export are not significant. A consistency exists in all three models, that money has the lowest prediction power, there could be many reasons why this is the case, one explanation is the 2-year lag in the explanatory variables. Since the money supply can be controlled by the authorities, this policy-oriented metric is not a barometer for the economic health, but rather reflects the expectation of the authorities. Another reason is the formation of the European Union. Eleven out of the 17 countries in our dataset joined the eurozone in 1999 with a unified monetary policy, therefore the money supply indicator in our dataset may not capture the variations in these economies. 141 3.7 Conclusion This paper uses two popular ML algorithms to study the extended JST dataset with comparison to a benchmark logit model. Our goal is to show the predictive power of the ML algorithms, especially the XGBoost. XGBoost not only outperforms the logit model, its predictions are almost fully informative, suggesting high levels of accuracy. The performance of XGBoost is on par with the random forest. The variable importance is measured by the Shapley value in this study and the results show coherent findings with the existing literature. Three major contributions are delivered in this study. First, we extended the JST dataset, referencing other famous crisis datasets. Second, through comparing with random forest and the logit model, we have shown that the XGBoost has excellent prediction accuracy for EWS. Between the two ML methods, random forest has been proven to have excellent predictive power in previous studies (e.g., Chatzis et al., 2018; Coulombe et al., 2020). XGBoost, which is rarely used in previous literatures, also shows outstanding prediction performance using our dataset. Third, by using the Shapley value, the variable importance can be identified. The yield curves and consumption have the most influence on the prediction, where some of these findings might only reflect the features of the JST dataset. For future work, with the growing availability of big data in the macroeconomics, a larger dataset can help to train the algorithms and provide richer information about the economies. Also, ML is a fast-growing field, newly developed algorithms are emerging every day, so future works can better leverage the more advanced ML algorithms. Lastly, even though the early warning system for financial crisis is a pure prediction problem, causal inference could still be introduced to the framework. 142 Appendix B Figure B.1. Shapley summary plot for XGBoost, GDP is included. 143 Figure B.2. Shapley summary plot for random forest, GDP included. 144 REFERENCES Ahn, Jae Joon; Oh, Kyong Joo; Kim, Tae Yoon; Kim, Dong Ha (2011): Usefulness of support vector machine to develop an early warning system for financial crisis. In Expert Systems with Applications 38 (4), pp. 2966–2973. Alessi, Lucia; Antunes, Antonio; Babeckk, Jan; Baltussen, Simon; Behn, Markus; Bonfim, Diana et al. (2015): Comparing different early warning systems: Results from a horse race competition among members of the macro-prudential research Network. In SSRN Journal. Alessi, Lucia; Detken, Carsten (2018): Identifying excessive credit growth and leverage. In Journal of Financial Stability 35, pp. 215–225. Aliber, Robert Z.; Kindleberger, Charles P. (2015): Manias, panics, and crashes. A history of financial crises, seventh edition. Seventh edition 2015. London: Palgrave Macmillan. Babecký, Jan; Havranek, Tomas; Mateju, Jakub; Rusnák, Marek; Smidkova, Katerina; Vasicek, Borek (2012): Banking, debt and currency crises: early warning indicators for developed countries. In ECB Working Paper Series. Babecký, Jan; Havránek, Tomáš; Matějů, Jakub; Rusnák, Marek; Šmídková, Kateřina; Vašíček, Bořek (2013): Leading indicators of crisis incidence: Evidence from developed countries. In Journal of International Money and Finance 35, pp. 1–19. Babecký, Jan; Havránek, Tomáš; Matějů, Jakub; Rusnák, Marek; Šmídková, Kateřina; Vašíček, Bořek (2014): Banking, debt, and currency crises in developed countries: Stylized facts and early warning indicators. In Journal of Financial Stability 15, pp. 1–17. Berg Andrew; Pattillo, Catherine (1999): Are currency crises predictable? A test. In IMF Economic Review 46 (2), pp. 107–138. Beutel, Johannes; List, Sophia; Schweinitz, Gregor von (2019): Does machine learning help us predict banking crises? In Journal of Financial Stability 45, p. 100693. Biecek, Przemyslaw; Burzykowski, Tomasz (2020): Explanatory model analysis. Explore, explain, and examine predictive models. With examples in R and Python. Bluwstein, Kristina; Buckmann, Marcus; Joseph, Andreas; Kang, Miao; Kapadia, Sujit; Simsek, Özgür (2020): Credit growth, the yield curve and financial crisis prediction: evidence from a machine learning approach. In Bank of England working papers (848). Borio, Claudio; Drehmann, Mathias (2009): Assessing the risk of banking crises - revisited. In BIS Quarterly Review. Borio, Claudio; Lowe, Philip (2002): Assessing the risk of banking crises. In BIS Quarterly Review. 145 Breiman, Leo (2001): Random forests. In Machine Learning 45 (1), pp. 5–32. Bussiere, Matthieu; Fratzscher, Marcel (2006): Towards a new early warning system of financial crises. In Journal of International Money and Finance 25 (6), pp. 953–973. Caggiano, Giovanni; Calice, Pietro; Leonida, Leone; Kapetanios, George (2016): Comparing logit-based early warning systems: Does the duration of systemic banking crises matter? In Journal of Empirical Finance 37, pp. 104–116. Chatzis, Sotirios P.; Siakoulis, Vassilis; Petropoulos, Anastasios; Stavroulakis, Evangelos; Vlachogiannakis, Nikos (2018a): Forecasting stock market crisis events using deep and statistical machine learning techniques. In Expert Systems with Applications 112, pp. 353–371. Chen, Tianqi; Guestrin, Carlos (2016): XGBoost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016, pp. 785– 794. Christofides, Charis; Eicher, Theo S.; Papageorgiou, Chris (2016): Did established early warning signals predict the 2008 crises? In European Economic Review 81, pp. 103–114. Cooper, Richard N.; Goldstein, Morris; Kaminsky, Graciela L.; Reinhart, Carmen M. (2000): Assessing financial vulnerability: An early warning system for emerging markets. In Foreign Affairs 79 (6), p. 176. Coulombe, Philippe Goulet; Leroux, Maxime; Stevanovic, Dalibor; Surprenant, Stéphane. (2020): How is machine learning useful for macroeconomic forecasting? Davis, E. Philip; Karim, Dilruba (2008): Comparing early warning systems for banking crises. In Journal of Financial Stability 4 (2), pp. 89–120. Davis, Josh; Taylor, Alan (2019): The leverage factor: Credit cycles and asset returns. Cambridge, MA. Demirgüç-Kunt, Asli; Detragiache, Enrica (2005): Cross-country empirical studies of systemic bank distress: A survey. In National Institute of economic review 192, pp. 68–83. Domenico; Reichlin, Lucrezia; Small, David (2008): Nowcasting: The real-time informational content of macroeconomic data. In Journal of Monetary Economics 55 (4), pp. 665–676. Drehmann, Mathias; Juselius, Mikael (2014): Evaluating early warning indicators of banking crises: Satisfying policy requirements. In International Journal of Forecasting 30 (3), pp. 759–780. Erik; Kononenko, Igor (2014): Explaining prediction models and individual predictions with feature contributions. In Knowledge and Information Systems 41 (3), pp. 647–665. Frankel, Jeffrey; Saravelos, George (2012): Can leading indicators assess country vulnerability? Evidence from the 2008–09 global financial crisis. In Journal of International 146 Economics 87 (2), pp. 216–231. Gumus, Mesut; Kiran, Mustafa S. (2017): Crude oil price forecasting using XGBoost. Jarmulska, Barbara (2020): Random forest versus logit models: which offers better early warning of fiscal stress? In ECB Working Paper Series (2408). Jordà, Òscar; Schularick, Moritz; Taylor, Alan; Ward, Felix (2018): Global financial cycles and risk premiums. In National Bureau of Economic Research. Jordà, Òscar; Schularick, Moritz; Taylor, Alan M. (2016): Sovereigns versus banks: credit, crises, and consequences. In Journal of the European Economic Association 14 (1), pp. 45–79. Jordà, Òscar; Schularick, Moritz; Taylor, Alan M. (2017): Macrofinancial history and the new business cycle facts. In NBER Macroeconomics Annual 31 (1), pp. 213–263. Junyu, Huang (2020): Prediction of financial crisis based on machine learning. Kaminsky, Graciela; Lizondo, Saul; Reinhart, Carmen M. (1998): Leading indicators of currency crises. In Staff Papers - International Monetary Fund 45 (1), p. 1. Laeven, Luc; Valencia, Fabian: Systemic banking crises: A new database. In IMF Working Papers (2008/224). Laeven, Luc; Valencia, Fabian (2018): Systemic banking crises databases revisited. In IMF Working Papers 18 (206), p. 1. Laeven, Luc; Valencia, Fabian (2020): Systemic banking crises database: A timely update in COVID-19 times. In CEPR Discussion Papers (14569). Laeven, Luc; Valencia, Fabián (2013): Systemic banking crises database. In IMF Economic Review 61 (2), pp. 225–270. Lo Duca, Marco; Koban, Anne; Basten, Marisa; Bengtsson, Elias; Klaus, Benjamin; Kusmierczyk, Piotr et al. (2017): A new database for financial crises in European countries: ECB/ESRB EU crises database. In ECB Occasional Paper Series. Lopez de Prado, Marcos (2020): Interpretable machine earning: Shapley. Lundberg, Scott M.; Lee, Su-In (2017): A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30. Molnar, Christoph (2019): Interpretable machine learning. A guide for making black box models explainable. Available online at https://christophm.github.io/interpretable-ml-book/. Reinhart, Carmen M.; Rogoff, Kenneth S. (2011): This time is different. Eight centuries of financial folly. Princeton University Press. 147 Schularick, Moritz; Taylor, Alan M. (2012): Credit booms gone bust: Monetary policy, leverage cycles, and financial crises, 1870–2008. In American Economic Review 102 (2), pp. 1029–1061. Shapley, L. S. (1953): Stochastic games. In Proceedings of the National Academy of Sciences 39 (10), pp. 1095–1100. Strumbelj, Erik; Kononenko, Igor (2010): An efficient explanation of individual classifications using game theory. In The Journal of Machine Learning Research 11, pp. 1–18. Tölö, Eero (2020): Predicting systemic financial crises with recurrent neural networks. In Journal of Financial Stability 49, p. 100746. Tölö, Eero; Laakkonen, Helinä; Kalatie, Simo (2018): Evaluating indicators for use in setting the countercyclical capital buffer. In International Journal of Central Banking 14 (2), pp. 51– 112. 148