MODELING AND MACHINE LEARNING STUDIES OF STRUCTURE-PROPERTY RELATIONSHIP IN ORGANIC SYSTEMS A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by Nikita Sengar August 2021 © 2021 Nikita Sengar ALL RIGHTS RESERVED MODELING AND MACHINE LEARNING STUDIES OF STRUCTURE-PROPERTY RELATIONSHIP IN ORGANIC SYSTEMS Nikita Sengar, Ph.D. Cornell University 2021 Organic materials with a judicious choice of functionalization have emerged as attractive candidates for use as active layers in new electronic technologies. This includes applications such as flexible displays, wearable electronics and storage devices for gas separation and the capture of solutes such as chemi- cal warfare agents. However, their use in the electronics industry is somewhat limited due to their tendency to pack into multiple, structurally distinct forms (a phenomenon known as polymorphism). For applications in energy storage technologies, due to the versatility of synthesis of organic framework materials, there remains an ongoing need to both elucidate and optimize the principles that govern performance with respect to size- and chemical- selectivity towards organic solutes. To address these challenges, we conducted detailed compu- tational studies to develop a better understanding of the relationship between nanoscale structure and macroscale properties. Understanding polymorphism in organic semiconductors (OS) is critically important since any slight variation in π orbital overlap can lead to drastic dif- ferences in the charge carrier mobility. But finding polymorphs is a challeng- ing task, because they are prone to structural reversibility, and have tradition- ally involved an iterative sampling with the possible structural space driven by those structures that lead to the lowest energy polymorphs. [1] We have ad- dressed this issue here by incorporating Bayesian Optimization into Molecu- lar Dynamics (MD) simulations to predict polymorphs. Our test case was a high-performing organic semiconductor, bis(trimethylsilyl) [1]benzothieno[3,2- b]- benzothiophene (diTMS-BTBT). Our novel approach uncovered the relationship between minimizing the to- tal energy as a function of a chosen design parameter, and allowed us to identify the optimal structures by running time-consuming, expensive simulations for only a fraction (∼15-20 percent) of the entire set of possible candidates (con- sisting of over 500 structures). Next, we expanded our investigation to use density functional theory to elucidate the molecular-scale mechanism behind the polymorphic transition in two related organic semiconductors, ditert-butyl [1]benzothieno[3,2-b]-benzothiophene (ditBu-BTBT) and diTMS-BTBT. By com- paring their packing environment, we established a molecular ”design rule” for selectively accessing both the so-called “nucleation and growth” and “coop- erative” transition pathways in organic crystals. Finally, we characterized the structural and physical properties of two exemplars of organic woven materials, COF-506 and HKUST-1 MOF functionalized with large (10 nm-dia.) Palladium nanoparticles. Using MD, we explored the propensity of both these materials to be suitable for small-molecule gas diffusion within their densely interwoven matrix of structural entities. Our multiscale computational studies improve our current understanding of structure-property relationships in organic systems, providing key insight into the accelerated development of next-generation electronic materials. BIOGRAPHICAL SKETCH Nikita Sengar was born and raised in India, where she earned her Bachelors’s degree in Chemical Engineering from Birla Institute of Technology and Science, Pilani, in 2014. During the summer of her junior year of undergrad, she got the opportunity to intern with Prof. Paulette Clancy’s group at Cornell. During this internship, she developed a particular interest in scientific research, especially in the field of computational nanoscience, and decided to shift her focus to pursue research in this field. She moved to Ithaca in 2014 and earned her Master’s degree in Chemical Engineering in 2016, and continued her research into the doctoral program under the tutelage of Prof. Paulette Clancy. While studying at Cornell, Nikita was selected to engage in highly competi- tive Machine Learning oriented summer schools. In 2015, she received an NSF travel grant to attend the CECAM Macromolecular Simulation workshop in Juelich, Germany, where she won a simulation-based “Hackathon” event. In the Fall of 2018, she participated in the data-driven “MATERIALS 4.0” summer school at TU Dresden, Germany as the invited speaker. In the summer of 2019, she engaged in Uber’s 2nd Science symposium in San Francisco, CA. During her time at Cornell, Nikita conducted internships with Global- Foundries and Corning Inc. and developed an interest in pursuing research in an industrial setting. Post Ph.D., she will be joining the Optical Lithogra- phy group at Intel Corp. as an Imaging Data Scientist to develop AI tools for accelerated predictions of defects in wafers. Outside research, Nikita has been involved in multiple outreach programs aimed at increasing diversity and inclusion at the workplace. She led the team to formulate and successfully launch a travel grant aimed at promoting diverse participation in conferences and workshops. iii To my parents, who taught me the values of hard work and integrity. To my brother, who continues to inspire me. Without their love and support, I would not have made it through. iv ACKNOWLEDGEMENTS First and foremost, I would like to express my deep-felt gratitude to my advisor, Prof. Paulette Clancy, for her unwavering support and incredible mentorship throughout the length of this work. Her open-mindedness and positive outlook towards complex problems have been instrumental in helping me grow as a researcher. Throughout my graduate career, she has time and again enabled me to pursue my research passions and follow my career aspirations, for which I will forever be grateful. I am very fortunate to have pursued my Ph.D. program under her leadership. I would also like to thank my committee members, Prof. Perrine Pepiot and Prof. Tobias Hanrath, for their continued support and mentorship over the last few years. Further, I would like to thank my experimental collaborators: Prof. Ying Diao (University of Illinois Urbana-Champaign) and Prof. Jimenez (Uni- versity of Cambridge, UK), who performed complementary studies related to my theoretical simulations; I found my discussions with them to be inspiring and enlightening. Finally, I would like to thank Prof. Peter Frazier and Dr. Matthias Poloczek, who helped me view things from a Machine Learning per- spective and expanded my knowledge of Bayesian approaches. Next, I would like to thank all the members of the Clancy group who main- tained a warm and welcoming environment within the group, allowing me to feel at home at work. I would especially like to thank Dr. Yaset Acevedo, Dr. Victoria Sorg, Dr. Mardochee Reveil, Dr. Henry Herbol, Ryan Heden, Andrew Ruttinger, Blaire Sorenson, Divya Sharma, Aaron Chen, Seun Romiluyi, and Isaiah Chen for countless helpful discussions, exchange of ideas, caring, and tremendous emotional support throughout this research effort. I will forever cherish this camaraderie. v I would also like to acknowledge the Institute of Computational Science and Engineering, Cornell University, and Maryland Advanced Research Computing Center for providing me with the necessary computational resources required for my research endeavor. I have been fortunate to have made wonderful friends along the way, and I am grateful for their tremendous moral support, care, and encouragement throughout my Ph.D. studies. Special thanks to Dr. Neeraj Kulkarni, Dr. Pragya Shah, Dr. Prateek Sehgal, Dr. Arzoo Katiyar, Dr. Pragya Sharma, Mayuri Ukidwe, Harnish Naik, Ameya Harmalkar, and Pranay Ladiwala. Last but not least, I would like to thank my parents and brother for being my rock-solid support. Without their continued guidance and motivation, I never would have made it through. vi TABLE OF CONTENTS Biographical Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x 1 Introduction 1 1.1 Advancement in Organic Electronics . . . . . . . . . . . . . . . . . 1 1.2 Polymorphism in Organic Semiconductors . . . . . . . . . . . . . 2 1.3 Organic Materials for Storage Applications . . . . . . . . . . . . . 6 1.3.1 Covalent & Metal Organic Framework Materials . . . . . . 7 1.4 Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2 A Bayesian Optimization-Accelerated Route to Predicting Polymor- phism in Small-Molecule Organic Semiconductors 10 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 Methods Employed . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2.1 Molecular Dynamics . . . . . . . . . . . . . . . . . . . . . . 16 2.2.2 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . 19 2.3 System and Workflow Details . . . . . . . . . . . . . . . . . . . . . 20 2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.4.1 Polymorph Prediction . . . . . . . . . . . . . . . . . . . . . 27 2.4.2 Identifying Polymorphs . . . . . . . . . . . . . . . . . . . . 31 2.4.3 Correlation with Energy . . . . . . . . . . . . . . . . . . . . 40 2.4.4 Correlation among Lattice Constants . . . . . . . . . . . . 45 2.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3 Molecular Design Rules for Accessing Different Polymorphic Transi- tion Mechanisms in Organic Electronic Crystals 52 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.2 Transition Pathways of Nucleation and Growth versus Coopera- tive Transition in Organic Electronics . . . . . . . . . . . . . . . . . 54 3.3 Computational Techniques . . . . . . . . . . . . . . . . . . . . . . . 58 3.3.1 Density Functional Theory . . . . . . . . . . . . . . . . . . 58 3.3.2 System Details . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 vii 4 Elucidation of Structure-Property Relationships in a Woven Class of Materials 71 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.2 Molecular Morphology of COF-506 . . . . . . . . . . . . . . . . . . 73 4.2.1 Computational Strategy and System Details . . . . . . . . 74 4.2.2 Multiple Time Origin-based Calculations of Self-Diffusion in COF-506 . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.2.3 Autocorrelation Functions . . . . . . . . . . . . . . . . . . . 81 4.2.4 Diffusion Results . . . . . . . . . . . . . . . . . . . . . . . . 84 4.2.5 Pore Size Distribution Algorithm . . . . . . . . . . . . . . . 94 4.2.6 Structure of COF-506-Cu . . . . . . . . . . . . . . . . . . . . 102 4.3 Structural Characterization of the HKUST-1 MOF Functionalized with Pd Nanoparticles as a Composite . . . . . . . . . . . . . . . . 103 4.3.1 System Details . . . . . . . . . . . . . . . . . . . . . . . . . 104 4.3.2 Pore Size Distribution of the Pd-MOF Composite . . . . . 105 4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 5 Conclusions and Future Directions 113 5.1 Accomplishments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 5.2.1 Bayesian Optimization for Polymorph Prediction . . . . . 119 5.2.2 Covalent and Metal Organic Framework Materials . . . . 121 A 3D t-SNE and k-means Clustering of Polymorphs 123 B Python Code for Pore Size Distribution Algorithm Discussed in Sec- tion 4.2.5 124 Bibliography 151 viii LIST OF TABLES 2.1 Lattice dimensions a, b, c, are given in Å; angles α, β, γ are given in degrees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.1 Unit cell information for polymorphs of diTMS and ditBu. . . . . 57 4.1 Comparison of self-diffusivities of He, CO2, and THF within metallated COF-506 over a range of temperatures at 1 bar pressure. 89 4.2 Comparison of self-diffusivities of He, CO2, and THF within metallated COF-506 over a range of temperatures at 1000 bar pressure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.3 Activation energy and pre-exponential factor calculated for dif- fusion of Helium in metallated COF-506. A negative value in the activation energy for 1 bar is attributed to artifacts in simulation. 93 4.4 Activation energy and pre-exponential factor calculated for dif- fusion of CO2 in metallated COF-506. . . . . . . . . . . . . . . . . 93 4.5 Activation energy and pre-exponential factor calculated for dif- fusion of THF in metallated COF-506. . . . . . . . . . . . . . . . . 93 4.6 Atomic radii of each atom type in the COF-506-Cu system used to calculate the porous spaces. . . . . . . . . . . . . . . . . . . . . 95 ix LIST OF FIGURES 1.1 Publications on polymorphism in the fields of chemistry, mate- rials science, and physics from the Web of Science. Reprinted from [2]. Copyright 2016 Royal Society of Chemistry. . . . . . . . 4 2.1 (a) The diTMS-BTBT molecule featuring two bulky tri-methyl side-chains. (b) and (c) depict molecular arrangements of atoms inside periodic unit cells of two experimentally identified poly- morphs of diTMS-BTBT: (b) the alpha phase is the themodynam- ically metastable one, while (c) beta phase is the thermodynami- cally stable polymorph. These polymorphs are characterized by a six-dimensional vector x = {a, b, c, α, β, γ}where a, b, c, α, β, γ are lattice parameters whose values are shown in Table 2.1. Data is obtained from Chung et al. [3] Periodic boundary conditions are used on the simulation cell to emulate the experimental situa- tion by modeling an infinite crystal. Color key: Carbon in blue, sulphur in yellow, silicon in gray, and hydrogen in white. . . . . 13 2.2 Flowchart describing the process that combines MD and BayesOpt to make a predictive distribution of the energy for each candidate structure. The algorithm starts by randomly creating a set of input parameters, x. Structures characterized by these in- puts, x, are then sent as input to the MD simulations. The output from these MD runs, in the form of the total energy (of the train- ing set), is used to train the Bayesian model, which calculates the posterior distribution for the energy function. If the errors in the prediction are less than Eerror (a user-defined value), this process stops. Otherwise, this predictive distribution suggests the next best candidate to test based on a criterion known as “expected improvement.” [4] . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.3 Predicted distribution of the total energy for all possible candi- dates of diTMS-BTBT polymorphs. The solid blue line represents energy values predicted using the BO. The black squares indicate candidate points that were used to train the model. The blue shaded region represents one standard deviation above and be- low the BayesOpt-predicted values. . . . . . . . . . . . . . . . . . 29 2.4 Probability distribution of the standard deviations observed in the values of energy for diTMS-BTBT polymorphs shown in Fig- ure 1.3. The sum of all the bar heights equals 1.0 . . . . . . . . . . 29 x 2.5 Diagnostic plots obtained via (a) ‘leave one out or n-fold’ and (b) ‘4- fold’ cross-validation. The x-axis contains the energy values di- rectly obtained from MD simulations. The y-axis denotes values predicted from the Bayesian model. The vertical bars indicate the error (one standard deviation) in predicting the energy val- ues. The correspondence of actual to predicted energies shows that our model works well, with a low root mean squared error, around 0.75 - 0.80 kcal/mol and mean absolute error less than 1 kcal/mol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.6 (a) Use of the elbow method to determine the appropriate num- ber of clusters in the data set. The plot shows the variation within the cluster as a function of the number of clusters. The elbow of the curve is located at the point where the variation starts to level off and represents the optimal number of clusters to use. (b) “Silhouette analysis” is used to measure how similar a data point is to its own cluster (“cohesion”) compared to other clusters (“separation”). Black dotted lines show the average sil- houette score. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.7 Student’s t-distributed stochastic neighbor embedding (t-SNE) of our six-dimensional data represented on a two-dimensional plane for different values of perplexities. Gray dots represent candidate crystal structures, red and yellow dots, respectively, denote the thermodynamically metastable alpha and thermody- namically stable beta phases. . . . . . . . . . . . . . . . . . . . . . 37 2.8 t-SNE low-dimensional embedding of our six-dimensional data, color coded using the same scheme applied to clusters discov- ered by k-means clustering algorithm: (a) with two clusters and (b) with three clusters. . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.9 Energy distribution in kcal/mole for candidate structures plot- ted using t-SNE for two-dimensional embedding. . . . . . . . . . 42 2.10 Probability density of total energies for the case in which we use the number of clusters to be 2 in the k-means clustering algo- rithm. The top figure shows results for the cluster (polymorph) 1 and the bottom shows that for the cluster (polymorph) 2. . . . . 43 2.11 Probability density of total energies for the case in which we use the number of clusters as 3 in our k-means clustering algorithm. Top, middle, and bottom figures show results for the derivative of polymorph 1, as well as for polymorphs 1 and polymorph 2, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 2.12 Distribution of lattice constants of polymorphs found by the k- means clustering algorithm using the number of clusters as two. (a) Cluster 1 (top subfigure) contains the experimentally found stable beta phase, and (b) cluster 2 (bottom subfigure) contains the experimentally found metastable alpha phase. . . . . . . . . . 47 xi 2.13 Distribution of lattice constants of polymorphs found by the k-means clustering algorithm using the number of clusters as three. (a) Derivative polymorph (top subfigure), (b) cluster 1 (middle subfigure) that contains the experimentally found sta- ble beta phase and (c) cluster 2 (bottom subfigure) contains the experimentally found metastable alpha phase. . . . . . . . . . . . 49 3.1 Two polymorphic transition pathways of nucleation and growth and cooperative transition in single crystals of diTMS and ditBu. (a) DSC of diTMS powder showing reversible polymorphic tran- sition. The molecular structure is shown in the inset. (b) DSC of ditBu powder showing reversible polymorphic transition. The molecular structure is shown in the inset. (c) Cross-polarized optical microscopy (CPOM) images of a single crystal of diTMS α form. The α form transitions to the β form upon heating after approximately 415 s. Upon cooling, the β form does not transi- tion back. (d) CPOM images of a single crystal of ditBu LT form (blue birefringence). The LT form rapidly transitions to the HT form (brown birefringence) upon heating in approximately 1.25 s. Upon cooling, the HT form reversibly transitions to the LT form in 0.38 s. Reprinted from [3]. Copyright 2019 American Chemical Society. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.2 Energy barrier associated with rotating a side-chain in a sin- gle crystal of diTMS-α. Simulation snapshots of key rotational points are shown on the graph. Color key in the snapshots: red, cyan and orange reflect the carbon atoms in the side-chain that were rotated in this study; gray, yellow, pink, and white indicate carbon, sulphur, silicon, and hydrogen atoms, respectively. . . . 64 3.3 Energy barrier associated with rotating a side chain in a sin- gle crystal of diTMS-β. Simulation snapshots of key rotational points are shown on the graph. Color key in the snapshots: red, cyan and orange reflect the carbon atoms in the side-chain that were rotated in this study; gray, yellow, pink, and white indicate carbon, sulphur, silicon, and hydrogen atoms, respectively. . . . 65 3.4 Energy barrier associated with rotating a side-chain in a sin- gle crystal of ditBu-LT. Simulation snapshots of key rotational points are shown on the graph. Color key in the snapshots: red, cyan and orange reflect the carbon atoms in the side-chain that were rotated in this study; gray, yellow, and white indicate car- bon, sulphur, and hydrogen atoms, respectively. . . . . . . . . . . 66 xii 3.5 Energy barrier associated with rotating a side-chain in a sin- gle crystal of ditBu-HT. Simulation snapshots of key rotational points are shown on the graph. Color key in the snapshots: red, cyan and orange reflect the carbon atoms in the side-chain that were rotated in this study; gray, yellow, and white indicate car- bon, sulphur, and hydrogen atoms, respectively. . . . . . . . . . . 67 3.6 Comparison of energy barrier of rotation for diTMS α and β phases (top) and ditBu low temperature (LT) and high tempera- ture (HT) forms (bottom). The more interlocked side-chain envi- ronment of diTMS (which has a Si atom in the side-chain) shows a much higher energy barrier compared to the more flexible en- vironment of ditBu (which has C in the side-chain). . . . . . . . . 68 4.1 Pictorial decomposition of the two interwoven strands compris- ing COF-505. Each color represents a different strand. Copper atoms are shown as red spheres. In the diagrams labeled “mesh 1” and “mesh 2” there are eight independent organic chains, leading to sixteen chains in the image labeled as “COF-505”. [5] . 75 4.2 (a)Metallated COF-505. Snapshot was taken from an MD simu- lation. (b) Unit cell of COF-506 achieved by replicating the unit cell twice in the b- and c- directions. [5] . . . . . . . . . . . . . . . 76 4.3 Representation of a data set storing values at a total number of L times, ti → {i = 0, L − 1}. The method of multiple time origins loops over all available time origins, tk → {k = 1,M} to calcu- late the time correlation function. This case only represents the schematic of one time delay of t = 3 ∆t. However, in practise, the method averages data for multiple values of time delay ranging from ∆t = {1, L − 1} [6]. . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.4 Mean squared displacements (MSD) of solutes (a) Helium, (b) CO2 and (c) THF with respect to time. The red curves indicate the MSD measured using a single time-origin, known not to be rep- resentative of the average result. The blue graphs improve the statistics by measuring the MSD using multiple (100s of) time- origins (explained in section 4.2.2). The drop-off at the end of the simulations reflects the fewer origins left by that time and are not representative of accurate values at that point. . . . . . . 86 4.5 (a) Self-diffusivity of He in COF-506-Cu from 200 to 500 K at two pressures (1 atm (blue) and 1000 atm (green)). (b) Self-diffusivity of CO2 in COF-506-Cu from 200 to 500 K and two pressure values as denoted by the inset color key. (c) Self-diffusivities of THF in COF-506-Cu at 1 atm as a function of temperature (200 to 500 K). Error bars represent the standard error. Dot-dashed lines join the data points as a guide to the eye. . . . . . . . . . . . . . . . . . . . 88 xiii 4.6 Plots depicting the Arrhenius relation between the self-diffusion coefficient and the reciprocal of temperature for (a) He, (b) CO2, and (c) THF diffusing in COF-506-Cu. The temperature varies from 200 to 500 K in 100 K intervals. Dotted lines denote the best fit to the Arrhenius data. . . . . . . . . . . . . . . . . . . . . . . . . 92 4.7 (a) A simulation snapshot of COF-506-Cu from MD calculations showing patches of porous spaces inside the weave. Color key: Carbon in red, hydrogen in blue, nitrogen in yellow, copper in white, boron in cyan, fluorine in green, helium in orange. (b) A schematic depicting a probe in blue that rasters the simulation box with atoms (in yellow) to identify the porous regions. . . . . 96 4.8 A two-dimensional representation of the probe mesh-grid used in this study. Probes of radius 1.5 Å are depicted as grids 1 and 2 shown in colors blue and red, respectively. The mesh grid is designed this way to ensure that no region inside the simulation box is left untested while searching for porous spaces. . . . . . . 97 4.9 This Figure shows a small section of a bigger (actual) simulation box. (a) A single porous sphere of radius 1.5 Å. This porous space is identified by the probe as a volume not occupied by any atom. (b) Next, adjacent and overlapping porous spaces are identified, as shown in the red, yellow, blue, and green spheres. Since the probe size chosen for this method is 1.5 Å, these identi- fied spheres have equal size and have a radius of 1.5 Å. (c) Then the extent of the space covered in aggregate by the overlapping and adjacent porous spaces (found in (b)) is identified. Yellow dots define the outer surface of this enclosed porous volume. (d) Once this region has been identified, as in (c), a sphere is fitted inside this space in order to calculate the radius of this spherical pore. Radius from all such discontiguous regions are calculated, analyzed, and plotted as a histogram in Fig. 4.10. . . . . . . . . . 100 4.10 (a) Representation of pores located inside a COF-506-Cu weave predicted by computer simulation. (b) Pore size distribution for the same system as illustrated in (a). Pore diameters were esti- mated assuming spherical pores. The histogram shows that most pores are less than 10 Angstroms in diameter. . . . . . . . . . . . 101 4.11 (a) Side and (b) top views of a simulation snapshot of the HKUST-1 and Pd nanoparticle system. Palladium shown in blue is placed inside a HKUST-1 crystal to mimic the experimental setup. Color key: Carbon shown in gray, copper in brown, oxy- gen in red, hydrogen in white, and palladium in blue. . . . . . . 106 xiv 4.12 Colors representing the pores identified by the PSD algorithm created in this thesis and described in Section 4.2.5. Due to the exceedingly penetrable nature of the HKUST-1 and Pd nanocomposite, the algorithm incorrectly combines the contigu- ous porous spaces. Therefore, in this study, we have employed a more robust approach based on Monte Carlo sampling to con- duct pore size calculations. . . . . . . . . . . . . . . . . . . . . . . 108 4.13 Geometric representation of the polyhedral surface mesh (shown in white) around HKUST-1 and a 5 nm-dia. Pd nanoparticle. The Zeo++ software uses Voronoi decomposition to identify surface mesh and accessible volume in the system of atoms and uses this information to calculate the pore size distribution. . . . . . . . . . 109 4.14 Comparison of pore size distribution in the HKUST-1/Pd nanocomposite for different values (shown as Samples in the in- set) of Monte Carlo samples per unit cell. The distribution starts to converge as the number exceeds 5000. . . . . . . . . . . . . . . 110 4.15 Distribution of pore size analysis in the HKUST-1/Pd nanocom- posite conducted over Monte Carlo samples of 5000 and 50000 (but not 500) per unit cell. . . . . . . . . . . . . . . . . . . . . . . . 110 A.1 t-SNE low-dimensional embedding of our six-dimensional data. Data points are color coded based on the clusters discovered by k-means clustering algorithm. Orange denotes the cluster (polymorph) 1 that contains the experimentally found stable beta phase, blue indicates cluster (polymorph) 2 that contains the ex- perimentally found metastable alpha phase, and green indicates the derivative polymorph of cluster 1. . . . . . . . . . . . . . . . . 123 xv CHAPTER 1 INTRODUCTION 1.1 Advancement in Organic Electronics Organic electronics have garnered a lot of attention from the industry and the research community due to the advantages and versatility they offer for a wide variety of applications. These include smartphone displays, biochemical sen- sors, radio frequency identification (RFID) tags, flexible light sources, portable solar cells, and curved television screens. [7–11] Use of organic semiconductors often allows solution processability, enabling lower fabrication and energy costs as well as large-area deposition on flexible substrates. [12] The almost limitless possibilities of functionalizing organic materials also give easy access to be able to functionally fine-tune their chemical structure to match desirable properties like improved solubility in organic solvents, electronic structure design, and chemical stability. Furthermore, innovations in the discovery of new materials in the last thirty years have led to tremendous improvement of over six orders of magnitude in the charge carrier mobility, a signature metric for high perfor- mance. In all, organic semiconductors compare well to, essentially at par with, polycrystalline silicon and metal-oxide transistors in terms of hole mobility. [13] Although first discovered in the 1940s, a major breakthrough in the field of organic semiconductors came in the 1960s when Martin Pope and col- leagues studied the electronic structure of anthracene. [14] Thereafter, discov- eries of metallic-like conductivity in tetrathiafulvalene I and tetracyanoquin- odimethane II,14 complexes, ensured the viability of small organic semicon- ducting molecules becoming the next generation of new materials. [7, 12, 14] 1 The organic field-effect transistor (OFET), which is an integral building block for organic circuits, soon became an active focus of study. [15] In 1996, Bao and colleagues [16] developed the first solution-printed high-performing transistor (40.01 cm2 V-1 s-1), based on the use of poly(3-alkylthiophene) (P3HT). Since then, with research breakthroughs continuing to improve the charge carrier mobility, [17] the hole mobilities of organic thin-film transistors have exceeded those of benchmark thin-film amorphous silicon devices [18] that range from 0.5–1 cm2 V-1 s-1. 1.2 Polymorphism in Organic Semiconductors Small-molecule organic semiconductors typically contain conjugated π systems, overlapping π orbitals bridging the adjacent single bonds. This arrangement allows for delocalization of the π-electrons across all the aligned π orbitals of the molecule. These materials are highly sensitive to intermolecular π orbital overlap for charge transport properties. Any slight variation in the π orbital overlap can lead to orders of magnitude difference in their ability to transport charge. Other dominant forces between particles arise from interactions in or- ganic semiconductors that include van der Waals and weak electrostatic (i.e., quadrupole) forces which can, nonetheless, be structure-directing. These forces are much weaker than ionic or covalent interactions and non-specific compared to hydrogen bonding. Consequently, these materials are prone to adopting mul- tiple crystalline packing forms at near ambient conditions, a phenomenon called polymorphism. Polymorphism is quite prevalent in organic crystals. Kofler, 2 Kuhnert–Brandstätter, and Burger, from the Innsbruck school, indicated that ap- proximately one-third of organic substances exhibit polymorphism under nor- mal pressure conditions. [19] McCrone [20] gave a more expansive view; he wrote that every compound has polymorphic forms, and that the number of polymorphic forms, in general, is proportional to the time and money spent in research on that compound! Many leading organic semiconductors have been shown to exhibit polymorphism, including pentacene, [21] rubrene, [22] TIPS-pentacene, [23] sexithiophene (6T), [24] and BTBT derivatives including C8-BTBT [25]. It becomes critically important to control polymorphism in organic semi- conductors since different structures often have distinctly different physical properties, such as the preferred crystal habit, melting point, electronic, optical, and mechanical properties. This increased interest in studying polymorphism, specifically in the field of chemistry, material science, and physics, is evident in the substantial increase in the number of publications from less than 200 in 1995 to more than 900 in 2014, as seen in Fig. 1.1. Control of polymorphs in the pharmaceutical industry is a huge regulatory concern. Charge carrier mobility is directly associated with differences in the crystal packing, where even slight variations can lead to a significant impact on the de- vice performance. For example, the hole mobility in TIPS-pentacene thin film transistors was measured to be as high as 11 cm2 V-1 s-1 in its non-equilibrium polymorph, whereas its equilibrium form yielded hole mobility on the order of 1 cm2 V-1 s-1. [1] Given this huge impact of structural changes on the elec- tronic properties, polymorphism can open the door to modulating crystal pack- ing without changing the chemical composition of the compound as a means to 3 Figure 1.1: Publications on polymorphism in the fields of chemistry, materials science, and physics from the Web of Science. Reprinted from [2]. Copyright 2016 Royal Society of Chemistry. maximize device performance. In this way, polymorphism provides an excel- lent tool to investigate the effects of solid-state packing on charge transport. [26] It allows access to multiple distinctly packed states without the need for chemi- cal modification of the parent compound and enables researchers to establish a direct relationship between packing and charge transport. [27] Bernstein emphasized the role of polymorphism in studying charge trans- port in organic crystals. [28] For polymorphic systems, the only variable is their crystal arrangement and not the chemical composition; as a result, differences in their electronic properties can be directly related to structural differences. In organic electronics, polymorphism is of great interest due to the direct impact of structural arrangement on the charge transport in thin-film field-effect devices. In these devices, the charge transport typically occurs near the dielectric surface within a few molecular layers of the organic semiconductor. In this region near the substrate, the molecular packing differs from the bulk phase and is known as a substrate-induced phase (SIP). [29] SIP is a polymorphic form that is only 4 found near the vicinity of the substrate and not observed in the bulk of the crys- tal. The mechanism behind the formation of these polymorphs is still under investigation. “Thin-film polymorphs” often discovered in thin-film geometry could, in fact, be a SIP or a bulk-phase polymorph that became kinetically fa- vored in the thin-film geometry. It is important to investigate the structural arrangement of SIPs as charge transport is significantly influenced mostly by the crystal packing in the vicinity of the substrate. [30] However, conducting a full structural resolution of SIPs is challenging as conventional techniques like single crystal or powder X-ray diffraction methods cannot be applied. Instead, a combination of methods like grazing incidence X-ray diffraction, surface- enhanced Raman spectroscopy, and computational approaches have been used to obtain the structural representation of SIPs. [31] The core structure of [1]benzothieno[3,2-b][1]-benzothiophene (BTBT) has emerged as a promising template for functionalization to generate high- performing air-stable organic semiconductor molecules. C8-BTBT, in particular, has demonstrated an average hole mobility between 3–16 cm2 V-1 s-1 with an ultra-high mobility of 43 cm2 V-1 s-1. [25, 32] Introduction of solubilizing alkyl chains of various lengths in the core’s long-axial direction has enhanced lateral inter-molecular interactions. [33] This also reduced inter-molecular distances and facilitated rapid growth of crystal and easier fabrication of polycrystalline thin films from solution. Yuan et al., [25] isolated a metastable form of C8-BTBT through an off-center spin-coating method and fabricated thin-film transistors as compared to the sta- ble form obtained via drop-casting. For C8-BTBT, many variables, including the degree of alignment in the film, film thickness, the nature of inter-grain connec- 5 tivity, and metastable molecular packing, impacted its charge transport proper- ties. In particular, the authors pinpointed the role of crystal packing on charge transport to be challenging. They discovered a large drop in hole mobility from 26 cm2 V-1 s-1 to below 4.1 cm2 V-1 s-1 after relaxation of the metastable struc- ture, but the process disrupted the crystal alignment. The crystal structure of the metastable form was difficult to isolate in its pure phase. 1.3 Organic Materials for Storage Applications Next, we examine organic materials for their applications in gas uptake and storage. Increasing global demands for clean and high-density gas storage con- tinue to push storage technologies like batteries and super-capacitors to their limits. These technologies form the current most promising electrical energy storage methods due to the ease of their portability and compact size for on- demand usage. However, despite their advantages, the chemical and physical limitations of existing building-block materials pose a challenge to improve- ment in performance and require new, creative solutions. For example, con- ductive polymers offer cheap, scalable, and functionally tunable materials but lack chemical and physical stability for device implementation. In contrast, in- organic materials like silicon and metal oxides are robust and offer redox-active sites. However, slow-moving ion diffusion restrains charge/ discharge rate ca- pabilities and leads to mechanical instability. These limitations have triggered the development of new materials with advanced energy storage capacities. 6 1.3.1 Covalent & Metal Organic Framework Materials Covalent and Metal-Organic Frameworks (COFs and MOFs) are emerging as a promising class of materials to meet the growing need for next-generation en- ergy storage technologies. They have garnered a great deal of attention due to the enormous potential in the design space they offer in terms of the spatial assembly of molecular organic building threads. [34, 35] With porous networks built from organic linkers, these materials allow positional control over build- ing blocks in two or three dimensions, unlike polymers. This allows for a fine- tuning of chemical and physical properties and selective synthesis with high regularity. The nanoscale confinements and networks of porous channels and spaces created in the COF and MOF scaffolds present ideal sites for solute gas storage, as well as separation and delivery tasks. Active sites and large surface areas are useful for catalysis and sensing applications. Additionally, the regular connectivity of the organic linkers makes these materials strong candidates for applications in charge carrier transport, including optoelectronics and electro- chemical energy storage. With judicious selection of linkers and metal nodes in the case of MOFs, these materials offer exceptional ability to impart functionality. Multiple re- views have highlighted the abundance of available synthetic routes to modulate chemical structure, stability, pore size distribution, and flexibility of the frame- work. [36, 37] Certain crystallographic phases and size/morphology of MOFs, in particular, can be controlled to modify the surface chemistry. [38] The abil- ity to fine-tune chemical and physical properties is a defining strength of these framework classes of porous materials, as it provides precise control over solute molecules’ chemistry for energy storage applications. Additionally, the electro- 7 chemical stability of framework materials can be improved by a judicious selec- tion of synthetic parameters. For example, MOFs made of redox-inactive nodes and smaller but more rigid linkers demonstrate greater thermal and chemical stability. In contrast, high porosity and flexibility enable greater ion storage and transport. Mechanical properties in these materials can be further tuned by selective use of flexible linkers, tuning of host-guest interaction strength, the introduction of multiple metallic nodes in case of MOFs, and control of crys- tal size. [39, 40] For example, the use of these porous materials as electrodes, structural rigidity is necessary to avoid dendrite formation and prevent short- circuiting. On the other hand, malleability would be important in applications where the active material undergoes volume changes in order to maintain struc- tural integrity in devices. 1.4 Dissertation Outline In this thesis, we study organic systems with two main objectives: • To investigate and predict in advance polymorphic behavior in organic semiconductor systems. • To uncover atomistic-level phenomena responsible for the extraordinary storage uptake and diffusive behavior of solute molecules in Covalent and Metal-Organic Framework materials. We address these open questions using computational techniques to probe the underlying atomistic mechanisms responsible for the observed phenomena. In Chapter 2, we investigate the occurrence of polymorphism in diTMS-BTBT, a 8 high-performing organic semiconductor and a member of the BTBT family. We developed a novel Machine Learning-based approach that combines atomistic simulations to accelerate the prediction of polymorphs in advance. We show that our approach is significantly computationally superior in terms of effi- ciency with reasonable accuracy compared to more expensive methods com- monly used to conduct such studies. These studies then lead into Chapter 3, where we extend our problem statement even further, more specifically to un- derstanding the molecular-scale mechanism behind the polymorphic transition. We study transition mechanisms in ditBu-BTBT and diTMS-BTBT (both mem- bers of the BTBT family) crystals and establish a molecular design rule to access both nucleation and growth and cooperative polymorphic transition pathways in organic crystals. In Chapter 4, we broaden our investigation to explore or- ganic framework materials for their use in storage applications. We investi- gate their structural morphology and elucidate how that influences the diffu- sion of solute molecules through these porous materials. Finally, in Chapter 5, we present our final conclusions and the accomplishments of this thesis and discuss possible future work. 9 CHAPTER 2 A BAYESIAN OPTIMIZATION-ACCELERATED ROUTE TO PREDICTING POLYMORPHISM IN SMALL-MOLECULE ORGANIC SEMICONDUCTORS Determining the most stable structures of polymorphs for an organic semi- conductor and any associated kinetically metastable structures is a difficult process experimentally; energy differences between these structures are of- ten well below kT. Computational predictions also involve a slow, painstak- ing process since there are typically six or seven variables (unit cell param- eters and processing conditions) to consider in determining the structure. In this study, we demonstrate the predictive capability of Bayesian Optimization techniques to shortcut this process. Bayesian Optimization builds a predictive model based on the relationship between a material’s structure and its prop- erties and guides the choice of material design using as few “experiments” as possible. Here, we used such a machine learning technique to quickly identify all low-energy structures of a high-performing organic semiconductor, bis(trimethylsilyl) benzothieno[3,2-b][1]1benzothiophene (diTMS-BTBT). This down-selection process allowed us to perform expensive Molecular Dynamics simulations for only a small fraction (15-20%) of the entire set of possible can- didates (comprising over 500 structures). Our approach identified both of the experimentally known polymorphs found by Chung et al. [3]. It also suggested the likelihood of a third polymorph, which has not yet been observed exper- imentally. This approach represents a fast and efficient means to predict and screen thermodynamically stable and metastable polymorphs of organic semi- conductors. Indeed, the method is general enough to be equally applicable to other polymorphic systems. 10 2.1 Introduction Commercial applications of organic semiconductors date as far back as the 1980s when they found use as photo-receptor materials for photocopiers and laser printers. More recently, they have been used in nanoscale electronics, alterna- tive energy sources, low-cost and large-area flexible plastic circuits, displays, and disposable sensors [41–43]. This growing interest in organic semiconduc- tors stems from the ability to deposit thin films on a variety of low-cost sub- strate materials and tune their mechanical and electrical properties, especially charge mobility, by tailoring their molecular structure. As a materials class, organic semiconductors have demonstrated acceptably good hole mobilities, 2 2 greater than ∼ 10 cmV s [32, 44] (c.f. that for crystalline silicon ∼ 450 cm V s and for 2 amorphous silicon ∼ 1 cmV s ). However, a major difficulty in deploying organic semiconductors is their predilection to pack into multiple, structurally distinct crystal structures, a phenomenon known as polymorphism. A review by Brog, Chanez, Chrochet et al. [45] emphasized the importance of identifying polymor- phism in a broader class of materials, including organic compounds and inor- ganic/metallic materials, not only in the context of science but also to control the structure of the compound to guarantee its properties. In organic semiconduc- tors, these polymorphs may have widely differing electrical or other properties, potentially threatening the reproducibility of their synthesis and the reliability of the devices they form. Identifying and controlling which organic electronic polymorph will be present under a given set of processing conditions is critically important since any slight variation in π-orbital overlap can lead to orders of magnitude dif- ference in charge carrier mobility. Despite this importance, very few experi- 11 mental studies exist that investigate the origin and mechanism behind poly- morphism. [1,46–48] Major challenges arise from the low energy barriers, ultra- fast kinetics, and structural reversibility involved. But the incentive to obtain a deeper understanding of the link between processing conditions (e.g., strain and temperature) and energetically preferred structure lies in the promise that this could open the door to stabilize metastable polymorphs into a particularly high mobility polymorphic structure, or one that maximizes (or minimizes) any other desired property. We chose to predict all the polymorphs of single crystals of a par- ticularly high-performing p-type organic semiconductor, bis(trimethylsilyl) benzothieno[3,2-b][1]1benzothiophene (diTMS-BTBT). [49] This is a member of an interesting class of benzothiophene molecule organic semiconductors whose bulky end-groups can be altered to tune the crystal packing of the molecule. [49–51] We conducted a detailed computational study to predict stable and metastable polymorphs of diTMS-BTBT, which was found by the Diao group 2 at UIUC to exhibit remarkably high mobilities [49], ∼ 7.1 cmV s , and polymorphic behavior. [3] Figure 2.1 shows the two experimentally identified polymorphs they identified, and Table 2.1 lists their lattice parameters. We will use this in- formation as validation of the predictive capability of our calculations. To re- iterate, this experimental information was not used in the identification of the computational predictions of the polymorphs. This is a pure prediction from the Bayesian optimization algorithm. A traditional experimental approach to “materials discovery” typically in- volves synthesizing new materials and using chemical intuition and experience to improve upon the molecular design by testing the physical properties of a 12 13 (a) (b) (c) Figure 2.1: (a) The diTMS-BTBT molecule featuring two bulky tri-methyl side- chains. (b) and (c) depict molecular arrangements of atoms inside periodic unit cells of two experimentally identified polymorphs of diTMS-BTBT: (b) the al- pha phase is the themodynamically metastable one, while (c) beta phase is the thermodynamically stable polymorph. These polymorphs are characterized by a six-dimensional vector x = {a, b, c, α, β, γ}where a, b, c, α, β, γ are lattice param- eters whose values are shown in Table 2.1. Data is obtained from Chung et al. [3] Periodic boundary conditions are used on the simulation cell to emulate the ex- perimental situation by modeling an infinite crystal. Color key: Carbon in blue, sulphur in yellow, silicon in gray, and hydrogen in white. 14 Lattice Constants Alpha Phase Beta Phase a (Å) 7.55 6.35 b (Å) 10.10 11.36 c (Å) 13.59 14.31 α (°) 82.75 87.65 β (°) 82.17 90 γ (°) 88.52 90 Table 2.1: Lattice dimensions a, b, c, are given in Å; angles α, β, γ are given in degrees. tested candidate. Computationally, this task is also slow and tedious. It essen- tially requires us to find the combination of candidate polymorphs (optimiz- ing six dimensions of unit cell lengths and angles for a given temperature) that leads to the lowest energy structure, which we identify as leading to the most thermodynamically stable of the roughly 500 possible combinations of lattice parameters. Without using some sort of accelerated search process, this would necessarily have to involve a systematic search of all possible lattice parameters. This is very wasteful in terms of time, effort, and resources. Although a number of accelerated methods exist, and are popular, including random search [52,53], quasirandom search [54, 55], simulated annealing [56], Monte Carlo parallel tempering algorithms [57], basin hopping [58], minima hopping [59, 60], and the evolutionary algorithm [61,62], finding the global energy minimum in large complex systems remains a major challenge in the field of structure prediction. In this study, we used a combination of Molecular Dynamics (MD) and Bayesian Optimization (BayesOpt), the latter a well-established global opti- mization technique, to narrow the search space to a tractable number of promis- ing options without changing the fundamental nature of the problem [63]. Ma- chine Learning models, like BayesOpt, are being increasingly applied, in combi- nation with molecular simulation and other computational methods, as a means to predict material properties [64,65], model interatomic potentials [66,67], pre- dict crystal and lattice structures [68] and understand property-structure rela- tionships in amorphous materials [69]. Closest to our study, a recent paper by Yamashita et al. [70] studied the effec- tiveness of a BayesOpt-based approach for crystal structure prediction in two systems, NaCl and Y2Co17 whose crystal structures were known. They demon- 15 strated that their BayesOpt method was able to identify the rock salt structure as the most stable arrangement for NaCl in 30-40% fewer trials, compared to a purely random search. In contrast to their work, we do not have a repository of available crystal structures that diTMS-BTBT would like to adopt; instead, our goal is to predict them a priori. A Bayesian paradigm works best for such cases. It encodes our intuition with respect to the values that the unknown quantity of interest is going to adopt in the so-called prior distribution, and then uses this “prior,” together with training set information, to make predictions of the unknowns. To our knowledge, this is the first time that a BayesOpt method has been employed to predict the crystal structures of organic semiconductors and can be straightforwardly extended to study any molecular systems in which the existence of polymorphism is suspected to be a prevalent phenomenon. 2.2 Methods Employed 2.2.1 Molecular Dynamics Molecular Dynamics (MD) is a computational technique to simulate the dynam- ical evolution of atoms and molecules in space and time. The trajectories of these interacting species are determined by numerically solving Newton’s equa- tions of motion, where forces between the particles and potential energy are described by molecular mechanics force fields. Solving these trajectories also allows us to determine macroscopic thermodynamic properties of the system based on the principle of ergodicity, which states: the statistical ensemble aver- ages are equal to time averages of the system collected in a simulation. There- 16 fore, MD is a deterministic simulation technique where trajectories of the atoms are completely determined given an initial set of atomic positions and veloci- ties. This way, MD has emerged as a powerful tool for predicting macroscopic properties of a system by allowing insight into molecular motion at an atomic scale. MD solves for Newton’s laws of equation, most notably: Fi = miai (2.1) for each atom (i) in a system of N atoms. Here, mi is the atom mass, ai its acceleration, and Fi is the force acting upon the atom due to the interactions with other atoms. In order to calculate the forces acting on the atoms, a suit- able model representing potential energy as a function of atomic positions of the form V(r1,r2,. . . rN) must be defined. This function is translationally and rotationally invariant and is dependent on the relative position of atoms with respect to each other. The functional forms of these potentials are provided by previously published force field models. [71–73] Forces can then be derived as gradients of the potential with respect to atomic displacements: Fi = −∇riV(r1, ....., rN) (2.2) Of the various available time integration algorithms in MD, the Velocity Verlet algorithm is the most popularly used method. This algorithm derives the posi- tion next in time by adding two third-order Taylor expansions for the positions r(t), one forward and one backward in time as: 1 1 r(t + δt) = r(t) + v(t)∆t + a(t)∆t2 + s(t)∆t3 + O(∆t4) (2.3) 2 6 1 1 r(t − δt) = r(t) − v(t)∆t + a(t)∆t2 − s(t)∆t3 + O(∆t4) (2.4) 2 6 17 Eqs. 2.3 and 2.4 are added to get the most basic representation of Verlet algo- rithm: r(t + ∆t) = 2r(t) − r(t − ∆t) + a(t)∆t2 + O(∆t4) (2.5) Here, a is the acceleration, v is velocity and s the third derivative of r with respect to time, t. Acceleration is defined using Eq. 2.2 as: −∆V(r(t))a(t) = (2.6) m Although MD is a powerful technique to simulate thermodynamic proper- ties of bulk systems, it is limited in terms of the length and time scales that the system can access due, essentially, to constraints imposed by the computational resources to accommodate the tiny time steps in the simulation. The way bulk crystals and very large systems are simulated in MD is by in- voking the concept of “periodic boundary conditions” (PBC). In PBC, the sim- ulation box is replicated indefinitely by rigid translation in all three Cartesian directions. If a particle is located at position r in the box, then this particle as- sumes positions at r + la + mb + nc, where l, m, n are integers and range from -∞ to ∞, and a, b, c are the vectors associated with the edges of the box. All these “image” particles move together, and only one is represented in the MD simulations, thereby cutting down the computational cost while simulating an infinite system. Here, each particle not only interacts with other atoms in the current box but also with their images in nearby boxes. This way, PBC virtually eliminates surface effects in the system and also makes the forces invariant to the translation of the box with the positions of the particles. 18 2.2.2 Bayesian Optimization Bayesian optimization is a powerful technique for solving the extrema of so- called “black box” objective functions that are expensive to evaluate. This tech- nique becomes especially useful when there is not a closed-form expression for the objective function, but where one can evaluate this function at sample points, possibly with noise. It is particularly advantageous when evaluation of the objective function is costly, and its functional form is non-convex. It is one of the most efficient approaches used in the Machine Learning community in terms of the number of function evaluations required. [4, 74, 75] This is due to its ability to incorporate prior belief about the objective problem, which helps in directing future sampling, and for trading off between exploration and ex- ploitation of the search space. It is built on “Bayes’ theorem”, which states that the posterior probability of a model M, given data, E, is proportional to the like- lihood of E given M times by the prior probability of M: P(M|E) ∝ P(E|M)P(M) (2.7) In Bayesian optimization, the prior probability encodes our belief about the functional space of possible objective functions. It essentially encodes prior knowledge on properties like function smoothness, which makes some poten- tial objective functions more plausible than others. Let f(xi) be the observation of the objective function at sample point, xi. As more data is collected, D1:t = {x1:t, f(x1:t} (where t denotes sequences of data), the likelihood function becomes P(D1:t|f ). The likelihood function asks, given our prior, how likely is the data we have observed? If our prior belief encodes that our objective function is a smoothly varying function with no noise, then the 19 observations with high variance and oscillations are considered less likely than those which narrowly deviate from the mean. Next, we can combine our prior and likelihood function to calculate our posterior as: P( f |D1:t) ∝ P(D1:t| f )P( f ) (2.8) The posterior updates our beliefs about the unknown objective function. Next, to efficiently direct future sampling, Bayesian optimization uses a so- called “acquisition function” to determine the location of xt+1 to sample next. It determines this point in sample space through an intelligent trade-off between exploration (where the objective function is very uncertain) and exploitation (where the objective function is expected to be high (or low), depending on whether the algorithm is maximizing or minimizing the objective). This trade- off, along with the prior feature of Bayesian optimization techniques, aims to reduce the number of evaluations of the objective function. Moreover, this also assists in setting when the objective function is non-convex with multiple local maxima or minima. More details of the prior, posterior, and acquisition function are discussed in section 2.3. 2.3 System and Workflow Details In this study, the objective function aims to identify sets of unit cell characteris- tics (optimizing six dimensions of unit cell lengths and angles) that lead to the lowest energy of the simulation cell, (y). Traditionally, each combination of unit cell characteristics is subjected to an energy minimization simulation run fol- lowed by a long NVT run. The energy corresponding to each candidate choice 20 is then collected to generate an energy landscape, whose minima typically cor- respond to lower energy or stable polymorphs. Figure 2.2 describes the overall iterative optimization process. Step 1 in- cludes generating training set data. Step 2 performs MD simulations on the training set. Step 3 models the total energy function using Gaussian process regression utilizing the data acquired thus far and generates predicted energy distribution. Step 4 recommends the next design setting to test (x) by optimizing an “acquisition function” derived from the Gaussian process. Simulations are performed on the suggested new material design, and the corresponding total energy value (y) is collected. This new observation pair, (x, y), is appended to the data set. Steps 2-4 are repeated in an iterative loop until the uncertainties in the energy fall below pre-determined, user-defined values. Steps 1, 3, and 4 are described in the following sections. Step 1: Training Set Generation As noted above, design of the diTMS-BTBT crystal unit is parameterized by a 6-dimensional vector x = [a, b, c, α, β, γ]. We constructed the computational de- scription of training set unit cells by randomly selecting values for a, b, c, α, β, γ. We constrained the values of these six lattice parameters to stay within the range of the lattice parameters of the two experimentally known polymorphs of diTMS-BTBT, as shown in Table 2.1, while also allowing a ± 2 units “safety” in- terval in case the computational predictions fell outside the experimental ones. Note that these constraints only help aid the speed of the computational search, but are not necessary to find the characteristics of the polymorphs. 21 22 Start Step 1: Gener- ate training set: x = [x1, x2, x3. . . ..xn] Step 2: Conduct MD simulations xi, Ei Step 3: Perform Gaus- sian process regression Energy distribution predicted on the test set ≥ E Step 4: Acquisition Errors in prediction error function suggests next best design to test, xk < Eerror Bayesian Model Stop Figure 2.2: Flowchart describing the process that combines MD and BayesOpt to make a predictive distribution of the energy for each candidate structure. The algorithm starts by randomly creating a set of input parameters, x. Structures characterized by these inputs, x, are then sent as input to the MD simulations. The output from these MD runs, in the form of the total energy (of the training set), is used to train the Bayesian model, which calculates the posterior distribu- tion for the energy function. If the errors in the prediction are less than Eerror (a user-defined value), this process stops. Otherwise, this predictive distribution suggests the next best candidate to test based on a criterion known as “expected improvement.” [4] To determine the location of any other possible polymorphs, we scanned the energy landscape as we varied the lattice constants a, b, c, α, β, γ in a non- systematic manner as dictated by BayesOpt. We explored the effect of varying these six parameters without assuming any prior knowledge of the locations of the polymorphs. A total of about 500 combinations of test and training points were generated: We used 100 points to train the model to be able to make pre- dictions for the remaining 400 points. Training unit cell candidates, each with one choice of [a, b, c, α, β, γ] parameters, were submitted to an energy minimiza- tion calculation and an MD simulation using NVT ensemble (Step 2) to train the Bayesian model. The total energy of each MD-generated structure, and the corresponding unit cell structures, were then used as a training set to train our Bayesian model. Step 2: Molecular Dynamics Simulations In our a priori prediction of the structure of polymorphs of organic crystals, our approach essentially involved providing a suitable inter-and intra-molecular potential for diTMS-BTBT molecules and then using this potential (force field) to predict the energy of the system as the dimensions of the unit cell are var- ied (unsystematically) under the control of the BayesOpt method in order to discover low-energy structures. We modeled diTMS-BTBT molecules using Allinger’s MM3 force field field [76] since we have found MM3 to work well as a model of small organic semiconducting molecules. [1] Indeed, that paper fea- tured MD simulations performed using an open-source software package called TINKER [77] to predict polymorphs of TIPS-pentacene, but without using an accelerated search method. The MM3 force field is not available as a force field 23 option in LAMMPS [78], so we again used TINKER for our test case here. We employed a canonical (NVT) ensemble, in which the number of molecules N, volume V, and temperature (set at T = 428 K) were kept constant. This temper- ature was chosen since experiments showed that both the α and β polymorphs, and the transition between them, occur at ∼428 K. Periodic boundary conditions were used in all three Cartesian directions. A unit cell comprised of 96 atoms connected across the periodic boundaries to approximate modeling an infinite crystal and hence emulate the larger scale of experimental studies. All the MD simulations were run for 5 ns with a time step of 1 fs (5 million iterations). Each simulation was allowed to equilibrate for 2 ns, while production data were col- lected from the last 3 ns. This length of time for data collection was sufficient to reduce the error in the energy of the unit cell to approximately ±0.5 kcal/mol, which is small in comparison to a typical magnitude of the energy of the unit cell relevant to our study (∼ 270 kcal/mole). Bayesian Optimization Step 3: Gaussian Process Regression Here, the unknown quantity of interest is the total energy value associated with each polymorph candidate. Since energy is a function of the spatial arrange- ment of atoms inside the crystal cell, we denote it as f (x), corresponding to a design parameter, x. The goal here is to find a set of designs, x, for which f (x) is small, which is essentially our objective function. However, in reality, our observations are often obscured by noise. Therefore, we extend our objective 24 function to model observations of the form: y(x) = f (x) +  (2.9) where the error ε is normally distributed, with mean 0 and variance λ2. λ2 is treated as one of the hyperparameters, and is learned adaptively in the process. BayesOpt is an excellent tool to optimize expensive and complex objective functions and performs inference by placing a prior probability distribution on unknown quantities of interest. This prior distribution encodes our intuition or domain expertise regarding the most likely values that the unknown quantity of interest will adopt. A common method to model an unknown function, y (used in this work), is using a Gaussian process as a prior, that is, y ∼ GP(0,K(x, x)), such that the prior probability of function values (y1, ......., yn) is a multivariate Gaussian as given in Eq. 2.10: ( ) | | −1p(y ) = 2πK exp yT −11:n 1:n K y1:n (2.10)2 where y1:n are the energy values computed via MD simulations and corre- sponding to the design parameters, x1:n. K is a “kernel” function that measures the similarity between two data points based on their locations in the parameter space. It should be able to encode the belief that points xi and x j, near to each other, must have similar values of f (xi) and f (x j), and vice versa. We chose a Squared Exponential kernel as it performs well in modeling smoothly varying functions. It takes the form shown in Eq. 2.11 and is employed in this work to compute the distance/similarity between data points.  k(xi, x j) = φ2 exp ∑d (x 2i,k − x j,k)  (2.11) k 1 2l 2  = k 25 where lk is the characteristic length scale associated with each dimension of input x and defines the region of influence of a point within the parame- ter space. φ2 is the output variance and controls the expected deviation of the function from its average value. This way, our kernel is parameterized by d + 1 hyperparameters [φ2, l1, ...., ld]. Next, we compute the inference or “posterior distribution” on the unknown target values yn+1 at test points xn+1. We do this by employing Bayes’ rule on the observed data and prior distribution function [79] such that it takes the form shown in Eq.2.12. ( ) y 2n+1|y1:n ∼ N µn+1(xn+1| x1:n, y1:n) σn+1(xn+1| x1:n, y1:n) µn+1 = kT (K + λ2I)−1 y1:n (2.12) σ2 T 2 −1n+1 = k(xn+1, xn+1) − k (K + λ I) k = [k(xn+1, x1), .......k(xn+1, xn+1)] From this calculation, we obtain a posterior mean (the prediction), called µn+1, and a posterior standard deviation, σn+1. Step 4: Acquisition function In order to improve the quality of the prediction, BayesOpt guides the choice of experiments and suggests a new design parameter (x) to test using an acqui- sition function. This function trades-off exploitation of the high predicted mean with exploration using a high predicted variance. We have used the well known Expected Improvement method [80] as our acquisition function. It defines the improvement over the current best value using the form shown below: 26 [ f (x) − f ∗ ( ) ( )]n f (x) − f ∗ f (x) − f ∗EI(x) = σ(x) φ n n+ + ϕ (2.13) σ(x) σ(x) σ(x) where f ∗n is the best observed value, f (x) is the predicted value, and φ and ϕ are standard Gaussian cumulative distribution functions and probability distri- bution function, respectively. We used a BayesOpt-based pySMAC optimizer [81] to maximize the acqui- sition function such that the resulting maximum is the next best recommended simulation setting. 2.4 Results 2.4.1 Polymorph Prediction Our overarching task is to determine how to identify the more stable poly- morphs. We begin with an investigation of predicting the energies of various posited structures in order of decreasing energy. Figure 2.3 shows the result of our predicted energy distributions sorted in order of increasingly higher energy for candidate structures of diTMS-BTBT. Each point on the x-axis represents some combination of a, b, c, α, β, γ and its corresponding energy value is plot- ted on the y-axis. We start with ∼ 20 training points and iterate our model 100 times, optimizing hyperparameters after every 20 iterations. At each of these 100 iterations, we include a new point, as suggested by the acquisition function. Our model made predictions for all the ∼500 candidate points using a small training set of ∼120 candidate structures (i.e., running only 120 expensive and 27 time-consuming simulations). Taking into consideration that we prepared an exhaustive space of as many as 500 structures, this result can be seen to be very efficient, finding the optimal solution using less than a quarter of the search space. We used estimates of the standard deviation in our results for predicting energy values associated with the design parameters to provide us with a mea- sure of the accuracy of our results. In Figure 2.4, we plot a histogram to study which values of uncertainties in the predicted energies are more likely to oc- cur. Our analysis revealed that the majority of the standard deviations in the energy lie roughly within the range of 1-2 kcal/mole, although a few percent exhibit standard deviations of ∼3 kcal/mol. Deviations of just 1-2 kcal/mole are comparable to intrinsic errors within MD itself and are small in comparison to the magnitude of total energies generated from MD relevant to this system. This was a satisfactory and encouraging test of the accuracy of the BayesOpt- generated predictions. As a second test to validate our results, we performed a standard “leave one out” or “n-fold” cross-validation. In this technique, we iterate through all the data points, x1:t, y1:t, and for each i  {1, ..., t}, we train our BayesOpt model on all of the data points except xi, yi, and then use this information, together with xi, to predict what the value of yi should be. Hyperparameters were then re-estimated and optimized each time, leaving out the data point (xi,yi). Figure 2.5 is plotted with actual energy values on the x-axis, and predicted values (with error bars) on the y-axis. In order to make the comparison between the actual and predicted values, we ran MD simulations on all the data points to get the actual energies. Figure 2.5(a) shows that the fit between actual and predicted points is excellent, 28 29 290 Training Points BO 285 280 275 270 265 0 100 200 300 400 500 All Possible Combinations: x Figure 2.3: Predicted distribution of the total energy for all possible candidates of diTMS-BTBT polymorphs. The solid blue line represents energy values pre- dicted using the BO. The black squares indicate candidate points that were used to train the model. The blue shaded region represents one standard deviation above and below the BayesOpt-predicted values. 1.0 0.8 0.6 0.4 0.2 0.00 1 2 3 4 Standard Deviation(Kcal/mol) Figure 2.4: Probability distribution of the standard deviations observed in the values of energy for diTMS-BTBT polymorphs shown in Figure 1.3. The sum of all the bar heights equals 1.0 Total Energy (Kcal/mol) Probability 30 290 R2 = 0.9908 285 RMS = 0.754 kcal/molMAE = 0.528 kcal/mol 280 275 270 265 n-fold 265 270 275 280 285 290 Actual from MD runs (Kcal/mol) (a) 290 R2 = 0.9892 285 RMS = 0.812 kcal/molMAE = 0.579 kcal/mol 280 275 270 265 4-fold 265 270 275 280 285 290 Actual from MD runs (Kcal/mol) (b) Figure 2.5: Diagnostic plots obtained via (a) ‘leave one out or n-fold’ and (b) ‘4-fold’ cross-validation. The x-axis contains the energy values directly obtained from MD simulations. The y-axis denotes values predicted from the Bayesian model. The vertical bars indicate the error (one standard deviation) in predicting the energy values. The correspondence of actual to predicted energies shows that our model works well, with a low root mean squared error, around 0.75 - 0.80 kcal/mol and mean absolute error less than 1 kcal/mol. Predicted from BO (Kcal/mol) Predicted from BO (Kcal/mol) with very small values for the root mean squared (RMS) error (0.75 kcal/mol) and mean absolute error (MAE) (0.53 kcal/mole). The R2 value, between actual and predicted curves, is an encouraging ∼ 0.97. We also performed “S-fold” cross-validation with S=4, see Figure 2.5(b)). It involves taking the available data and partitioning it into S groups (in the sim- plest case, these are of equal size). Then S-1 of the groups is used to train a set of models that are then evaluated on the remaining group. This procedure is then repeated for all S possible choices for the held-out group, and the performance scores from the S runs are then averaged. S-fold cross-validation is a less biased way of estimating how the model is performing in making predictions on data (not used during the training of the model) using a limited sample set. 2.4.2 Identifying Polymorphs Our model predicts the existence of a vast number of possible crystal-packing arrangements following a comprehensive sampling of different unit cell sizes in six-dimensional space. Our algorithm suggests that most of these crystals are roughly within 1 kcal/mol of each other. This is due to sampling all energet- ically relevant conformational isomers for the flexible molecule, diTMS-BTBT. To isolate groups of structurally similar polymorphs, we used a clustering al- gorithm called “k-means clustering,” which is one of the most popularly used unsupervised machine learning algorithms. [82, 83] This approach provides a way to group similar data points together and discover underlying patterns in the data. This grouping of structurally similar crystals essentially represents an ensemble of candidates that collectively signify a “polymorph.” That is, many 31 candidates may be so structurally similar that they cannot be separated experi- mentally. Unsupervised machine learning methods draw inferences from the data sets by using information from only the input vectors and without referring to known, or “labelled,” outcomes. This is what we wanted to achieve in our system, namely, to make groupings of polymorphs that were structurally simi- lar without taking into account the associated energy values. k-means cluster- ing achieves this by looking for a fixed number (k) of clusters in the data set. Here, a cluster refers to a collection of crystal candidates aggregated together because of their structural similarities. k-means clustering is a vector quanti- zation method [84] that partitions n data points into k clusters, where each data point gets assigned to the cluster with the nearest mean (cluster centroid). These centroids serve as prototypes of the cluster. This partitioning results in the seg- regation of the data set into Voronoi cells. These Voronoi cells depict regions in the data plane that are partitioned on the basis of the similarity between each region. k-means clustering takes k as input, which refers to the number of cen- troids needed in the dataset. It then minimizes within-cluster variances (as the sum of squared Euclidean distances between each member of the cluster and its centroid) as its “cost function” and allocates each data point to the nearest cen- troid. In other words, the k-means algorithm identifies k number of centroids in the data space, and then allocates every data point to the nearest cluster, while keeping the variances within the cluster as small as possible. One of the disadvantages of k-means clustering is that the number of clusters or centroids (k) needs to be supplied beforehand. One way to solve this problem is to visualize the data and guess (visually) at the data points in order to see 32 cluster patterns. But since, in most cases, real-world data are typically multi- dimensional, it becomes difficult to visualize this and, as a result, the optimum number of clusters is no longer an obvious choice. A more mathematical way to approach this problem is to plot the relationship between the number of clusters with respect to its cost function. This is called a “Within Cluster Sum of Squares” (WCSS). It allows us to select the most suitable value of the number of clusters as being given by the point on the plot at which the change in WCSS begins to level off. This method is called “elbow method” due to the typical shape of this graph. We use the elbow method to determine the optimal number of clusters for our system. Figure 2.6(a) shows that the cost function starts to level off as the number of clusters increases from 2. When we changed the cluster value from 1 to 2, the cost function value reduced very sharply. This decrease in the sum of squares reduces and eventually reaches “equilibrium” as we increase the num- ber of clusters from 4 onwards. The cluster value where this change in cost function becomes constant is typically chosen as the appropriate cluster value. For our system, this becomes a choice between 2 and 3 as the correct number of clusters. As a second test, we used a method called a “silhouette score,” based on the “silhouette coefficient” as a means to justify our choice of the number of clus- ters. [85] The silhouette coefficient measures the separation distance between the resulting clusters. It is calculated using the average distance between each data point i and all the other data points in the cluster to which the data point i belongs, represented as (a). The mean distance between i and the nearest cluster that the point i is not a part of is denoted as (b). The silhouette coefficient is then 33 34 16000 14000 12000 10000 8000 6000 4000 2000 1 2 3 4 5 6 Number of clusters (a) 2 1 0.0 0.2 0.4 0.6 0.8 1.0 3 2 1 0.0 0.2 0.4 0.6 0.8 1.0 Silhouette Coefficient Values (b) Figure 2.6: (a) Use of the elbow method to determine the appropriate number of clusters in the data set. The plot shows the variation within the cluster as a function of the number of clusters. The elbow of the curve is located at the point where the variation starts to level off and represents the optimal number of clusters to use. (b) “Silhouette analysis” is used to measure how similar a data point is to its own cluster (“cohesion”) compared to other clusters (“sepa- ration”). Black dotted lines show the average silhouette score. Cluster Labels Cluster Labels Sum of Squared Distance calculated as: (b − a) (2.14) max(a, b) The silhouette plot provides a quantitative measure of how close data points in one cluster are to points in the neighboring clusters—the value of the coeffi- cient ranges from -1 to 1. A coefficient value of near +1 indicates that the data point in that cluster is far away from the neighboring clusters. A value close to 0 indicates that the data point is on, or very near to, the decision boundary between the neighboring clusters. Negative values indicate that the data point has been assigned to the wrong cluster. The elbow method suggested that the best choice for the optimum number of clusters in our study should be either 2 or 3. We conducted a silhouette anal- ysis to investigate this observation further. Figure 2.6(b) shows that, for cluster size 2, the silhouette analysis is unequivocal in clustering the data. This is mani- fested due to the high silhouette score of 0.71 (shown as a black dotted line in the figure) and the presence of both clusters above this average score. The thickness of the silhouette plot can be used to visualize the cluster size. If we choose the number of clusters to be 2, our analysis shows that cluster number 1 is roughly twice the size of cluster 2. We hypothesize here that cluster 1 is composed of a set of mini-clusters. We extended our silhouette analysis to test our hypothesis but now using the number of clusters as 3, as suggested by the elbow method. This analysis reveals interesting observations. Firstly, the average silhouette score is lower, around 0.45. Since this score is greater than zero (which is the mid-point of the 35 range of values that a silhouette score can take) and roughly lies in the middle of 0 to 1, it suggests that the two clusters are different, albeit with slight dif- ferences in their configuration. Secondly, the coefficient values associated with each cluster lie beyond the average silhouette score. These two observations justify the choice of 3 as the number of clusters in our system. And lastly, we observe that the division into two clusters comes by splitting cluster number 1 in the top sub-figure of Figure 2.6, thus validating our hypothesis. This obser- vation suggests that there are two distinct set of polymorphs, with one set split into two derivatives. Next, we used a different method, a t-distributed Stochastic Neighbor Em- bedding (t-SNE) method, to visualize the clusters found by the k-means clus- tering algorithm. [86] t-SNE is a powerful statistical tool to visualize high- dimensional data. It is a non-linear dimensionality-reduction technique that models the similarity between the points to joint probabilities to finds low- dimensional embedding of high-dimensional data. It tries to minimize the Kullback-Leibler divergence [87] between the joint probabilities of the low- dimensional embedding and the actual high-dimensional data. In other words, t-SNE models high-dimensional data to a low-dimensional embedding in a way that similar objects are modeled by neighboring points, and dissimilar objects are modeled by distant points with high probability. Kullback-Leibler diver- gence is also called relative entropy and measures the difference between two probability distributions. t-SNE’s cost function is not convex, and so it yields different results for different initializations. t-SNE has a tuneable parameter called “perplexity,” which balances the at- tention given between local and global aspects of the data. The perplexity value 36 37 Perplexity 2 Perplexity 5 Perplexity 10 100 100400 50 200 0 0 0 50 200 100 100 400 500 0 500 100 0 100 0 100 Perplexity 30 Perplexity 50 Perplexity 100 40 40 7.5 5.0 20 20 2.5 0 0 0.0 20 20 2.5 50 0 0 20 0 20 Figure 2.7: Student’s t-distributed stochastic neighbor embedding (t-SNE) of our six-dimensional data represented on a two-dimensional plane for different values of perplexities. Gray dots represent candidate crystal structures, red and yellow dots, respectively, denote the thermodynamically metastable alpha and thermodynamically stable beta phases. 38 Alpha phase Beta phase 40 20 0 20 40 30 20 10 0 10 20 30 t-SNE1 (a) Alpha phase Beta phase 40 20 0 20 40 30 20 10 0 10 20 30 t-SNE1 (b) Figure 2.8: t-SNE low-dimensional embedding of our six-dimensional data, color coded using the same scheme applied to clusters discovered by k-means clustering algorithm: (a) with two clusters and (b) with three clusters. t-SNE2 t-SNE2 has a complex effect on the resulting embedding in low-dimensional space. It influences the variance of the distribution and is essentially the number of near- est neighbors for each data point. The original t-SNE paper [86] recommends perplexity values between 5 and 50. We conducted a sensitivity analysis for dif- ferent values of perplexity and plotted the resulting t-SNE distribution in Figure 2.7. For perplexity values 2, 5, and 10, local variations dominate. For values out of range like 100, the analysis shows merged clusters. Values between 30-50 show typical cluster behavior. For our system, a perplexity value of 30 gave consistent results, so we chose that number for our future calculations. In Fig- ure 2.8, we plot clusters calculated by k-means clustering algorithm using the number of clusters values as 2 and 3 on two dimensional embedding generated by the t-SNE method. Appendix A shows a three dimensional embedding of the data with number of clusters value as three as calculated by k-means clustering algorithm. Each cluster indicates an ensemble of polymorphs that they represent. Gen- erally, the typical way of thinking about polymorphs is that there are a small handful of polymorphs for a given system, namely the ones that are found by experiments. But the simulations tell us that changing the value of one or all of the unit cell parameters by small amounts produces a set of candidates that are very close in energy to one another. Basically, we need to rethink the prevalent conception of polymorphs in terms of the energy landscape for a given materi- als system. There are not just a few relatively deep and distinguishable “energy wells” in the landscape, there are many wells that are similar and may be easily accessible kinetically, for example, using shear in the study by Diao et al. on a similar organic semiconductor system. [1] Ultimately, experiments are limited by the resolution of their instruments and, while an experimental resolution of 39 50 meV or so (¡1.0 kcal/mol) is impressive, the MD simulations are capable of providing data with much smaller energy differences that we are able to deter- mine the energy of all possible choices of unit cell parameters. In the following sections, we discuss the correlation of polymorphs in terms of their energy and lattice constants. 2.4.3 Correlation with Energy In order to examine the range of values that the total energy can take for com- binations of unit cell parameters (a, b, c, α, β, γ) in diTMS-BTBT, we plotted a heat map distribution showing the probabilities of total energy values on the low-dimensional embedding generated by t-SNE technique in Figure 2.9. We note that there is a clear preference for lower energy values in cluster number 1, which is shown in orange in Figure 2.8(b). This cluster also contains the ex- perimentally identified beta polymorph, which is thermodynamically the more stable one of the two found experimentally. Next, cluster number 2, shown as blue in Figure 2.8(b), exhibits a stronger preference for higher energy values. This group of polymorphs contains the experimentally identified alpha poly- morph, which is thermodynamically metastable. If we identify the number of clusters as three in the k-means clustering algorithm, Figure 2.8 suggests that a third polymorph shown in green might exist as a derivative of the original cluster 1. We observe in Figure 2.9 that this cluster, although structurally similar to cluster number 1, tends to have a wider range of energy values. To further probe into the energy distribution associated with each cluster of structures, we plotted 40 histograms of energy values in Figures 2.10 and 2.11, assuming the number of clusters to be two or three, respectively. These histograms indicate how likely are the energy values for a particular set of candidate structures. The histograms are color-coded using the same scheme utilized in Figure 2.8 to isolate the clus- ters. In Figure 2.10, we note that for cluster 1 (orange), our model predicts the experimentally identified beta polymorph to be in the lowest energy bin. This result matches with the experimental finding, which determined the beta poly- morph to be the stable one. The graph also shows that the distribution peaks at an energy value of ∼ 267 kcal/mole. Since the dominant forces of interactions in organic crystals are weak non-bonded forces (van der Waals and electrostatic), we attribute this observation to the possibility of various kinetically accessible conformations of the crystal, leading to a complex polymorphic landscape with multiple structures located within a very small energy window. For the second cluster (blue), the majority of the structures align with the energy value of the metastable alpha polymorph, as depicted by the energy peaks. In Figure 2.11, where we chose the number of clusters to be three, we note that the peak of the distributions lie at roughly ∼ 265, 267, and 270 kcal/mol. This suggests that the green cluster, which we believe is the derivative of the original cluster 1 (shown as orange in Figure 2.8(a)) is not only structurally close to the group 1, but also exhibits similar energy values albeit spread over a wider range (265-275 kcal/mole). And that the three peaks are within 2-3 kcal/mol of each other. We suspect that, due to these observations, this third polymorph is probably not observable experimentally. 41 42 290 Alpha phase Beta phase 40 285 20 280 275 0 270 20 265 40 30 20 10 0 10 20 30 t-SNE1 Figure 2.9: Energy distribution in kcal/mole for candidate structures plotted using t-SNE for two-dimensional embedding. t-SNE2 43 0.20 Beta Phase 0.15 0.10 0.05 255 260 265 270 275 280 285 290 295 0.20 Alpha Phase 0.15 0.10 0.05 255 260 265 270 275 280 285 290 295 Energy (Kcal/mol) Figure 2.10: Probability density of total energies for the case in which we use the number of clusters to be 2 in the k-means clustering algorithm. The top figure shows results for the cluster (polymorph) 1 and the bottom shows that for the cluster (polymorph) 2. Probability Density 44 0.20 Beta Phase 0.15 0.10 0.05 255 260 265 270 275 280 285 290 295 0.20 0.15 0.10 0.05 255 260 265 270 275 280 285 290 295 0.20 Alpha Phase 0.15 0.10 0.05 255 260 265 270 275 280 285 290 295 Energy (Kcal/mol) Figure 2.11: Probability density of total energies for the case in which we use the number of clusters as 3 in our k-means clustering algorithm. Top, middle, and bottom figures show results for the derivative of polymorph 1, as well as for polymorphs 1 and polymorph 2, respectively. Probability Density 2.4.4 Correlation among Lattice Constants Each of the points in a given cluster has a set of lattice parameters associated with it. We now analyze these parameters in order to identify any preferential (commonly occurring) lattice parameters in a given cluster. In a sense, this can be thought of as either the “degeneracy” of preferred lattice constants or as a measure of the “stiffness” of the parameters, using Sethna et al.’s terminology for parameters. [88] This analysis is shown in Figures 2.12 and 2.13 given that the number of clusters is 2 or 3, respectively. From Figure 2.12’s depiction of the distributions of unit cell parameters, it is clear that there are, indeed, highly stratified and energetically preferred unit cell parameters. For the case identified as having two clusters, we conclude that group 1 (shown in orange) which corresponds to having the lower energy values, the lattice parameters prefer to have values for a in the range of 12.2 − 14.7 Å, b between 5.8 − 7.1 Å, and c in the range 10.6 − 11.9 Å. While these ranges in the individual parameters would be considered to be broad from an experimental viewpoint, there are clearly only certain combinations of a, b, c parameters that lead to low-energy candidates. Group 2, which associates with the metastable polymorph, prefers to have values for a in the range 6.9 − 8.1 Å, b between 9.3 − 10.5 Å, and c in the range 10.6 − 11.2 Å. Using the number of clusters as three, we see in Figure 2.13 that the lattice constant values for group 1 (orange) and its derivative (green) lie in similar ranges. This validates our analysis using a k-means clustering algorithm which suggested these two clusters are structurally close. We notice particular differ- ences in the lattice length a and angles α and β. For the derivative polymorph (green), a values range from 12.2 − 14.8 Å while for the original polymorph (or- 45 ange) that contains the experimental beta polymorph, prefers values of a clus- tered around 14Å. Similarly, for the angle α, the derivative polymorph adopts a range of values from 83− 88 degrees, while for the original one, the values align around 89 degrees. For angle β, the derivative polymorph has values between 86 − 89 degrees and the original one ranges from 89 − 91 degrees. The overall result here is to recommend to researchers that they constrain their domain search for other polymorphs within the parameter ranges iden- tified above to look for the presence of other thermodynamically stable or metastable polymorphs. We also draw readers’ attention to Herbstein’s re- view [89], which warns that large discrepancies can be involved in measuring lattice lengths and angles. These discrepancies can be associated with differ- ences in approach taken by different experimental techniques and difficulties involved in low-temperature measurements. In contrast to the relatively wide range in lattice lengths, the unit cell angles, α, β, and γ that correspond to low energies prefer to adopt values within a rela- tively narrow range, between 84 − 90 degrees for α, 89 − 92 for β and 88 − 90 for γ. In Sethna’s parlance [88], this would suggest that the energy of the candidate lattice is “stiffer” in, i.e., is more sensitive to, γ than either α or β. Lenn et al. found a similar pattern for TIPS-pentacene. [1] The experimentally observed values of all six parameters lie exactly on our BayesOpt-predicted results that correspond to the lowest ener- gies. Experiments show that the stable beta phase has lattice constants {14.28, 6.29, 11.15, 90.00, 91.47, 90.00}, while the metastable β phase is char- acterized by {7.43, 10.04, 13.53, 82.83, 82.08, 88.63}; this is indicated in Figures 2.12 and 2.13. 46 47 1.0 1.0 1.0 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 6 8 10 12 14 16 6 8 10 10 11 12 13 14 15 1.0 a(Å) 1.0 b(Å) 1.0 c(Å) 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 85 90 85 90 85 90 (°) (°) (°) (a) 1.0 1.0 1.0 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 6 8 10 12 14 16 6 8 10 10 11 12 13 14 15 1.0 a(Å) 1.0 b(Å) 1.0 c(Å) 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 85 90 85 90 85 90 (°) (°) (°) (b) Figure 2.12: Distribution of lattice constants of polymorphs found by the k- means clustering algorithm using the number of clusters as two. (a) Cluster 1 (top subfigure) contains the experimentally found stable beta phase, and (b) cluster 2 (bottom subfigure) contains the experimentally found metastable alpha phase. Probability Probability Probability Probability 48 1.0 1.0 1.0 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 6 8 10 12 14 16 6 8 10 10 11 12 13 14 15 1.0 a(Å) 1.0 b(Å) 1.0 c(Å) 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 85 90 85 90 85 90 (°) (°) (°) (a) 1.0 1.0 1.0 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 6 8 10 12 14 16 6 8 10 10 11 12 13 14 15 1.0 a(Å) 1.0 b(Å) 1.0 c(Å) 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 85 90 85 90 85 90 (°) (°) (°) (b) Probability Probability Probability Probability 49 1.0 1.0 1.0 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 6 8 10 12 14 16 6 8 10 10 11 12 13 14 15 1.0 a(Å) 1.0 b(Å) 1.0 c(Å) 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 85 90 85 90 85 90 (°) (°) (°) (c) Figure 2.13: Distribution of lattice constants of polymorphs found by the k- means clustering algorithm using the number of clusters as three. (a) Deriva- tive polymorph (top subfigure), (b) cluster 1 (middle subfigure) that contains the experimentally found stable beta phase and (c) cluster 2 (bottom subfigure) contains the experimentally found metastable alpha phase. Probability Probability 2.5 Conclusions We have developed an algorithm based on a Bayesian Optimization technique that we have shown is capable of accurately predicting low-energy crystal struc- tures that are likely to be thermodynamically stable or metastable, using an ef- ficient machine learning-based search. In our test case, for diTMS-BTBT, our approach was able to select viable potential candidates from hundreds of po- tential structures by considering under a quarter of them. This demonstrated that a BayesOpt approach can significantly reduce the number of trials of can- didate materials, leading to a greatly reduced computational cost. Running a sequential and comprehensive search on complex systems, like the one studied in this work, would be close to intractable computationally or experimentally. It also has the significant advantage of being self-directed without user inter- vention or monitoring. In addition, our BayesOpt model is highly scalable and can lend itself to further development (such as the consideration of multiple information sources or multiple objective functions). From the point of view of the code’s ability to accurately predict thermody- namically stable and metastable polymorphs, we showed the accuracy of the method to be within the uncertainty of MD simulations (around 1-2 kcal/mol) and sufficiently accurate to identify promising candidates for experimental in- vestigation. We observed that the lattice parameters fell into narrow ranges that would be preferred by stable polymorphs, rather than metastable or un- stable combinations of unit cell parameters. As a result, our machine learning approach was able to predict unit cell parameters of both polymorphs that had been identified experimentally. The lattice parameter values we predicted for these two polymorphs using our accelerated search were in excellent alignment 50 with those found experimentally. In addition to the two polymorphs identified experimentally, we suggest that a third polymorph as a derivative of the beta phase might exist. But stabilizing it would require higher temperatures with slower heating-cooling cycles. Finally, it is worth remembering that, while ex- periments only identified two polymorphs, the computations have shown again (as for TIPS-pentacene, another organic semiconductor with a π-π core) that a “forest” of other kinetically stabilized polymorphs exist, separated by energies well below kT. While these are unlikely to be identifiable experimentally, this reminder of the richness of the situation may help us to rethink our views on polymorphic characteristics. 51 CHAPTER 3 MOLECULAR DESIGN RULES FOR ACCESSING DIFFERENT POLYMORPHIC TRANSITION MECHANISMS IN ORGANIC ELECTRONIC CRYSTALS The following chapter is based on a paper published in the Journal of Chemistry of Materials. [3] 3.1 Introduction Polymorphism is a condition in which a compound can adopt multiple crys- talline structural forms. Due to the differences in the structural packing of the polymorphs, these materials exhibit different physical properties. [90] In the case of organic semiconductors, polymorphs can have different charge carrier mobilities varying in several orders of magnitude. [2] This has attracted con- siderable research interest in the scientific community, [1, 26, 27, 91] especially to understand two major aspects of polymorphism in organic electronics: (a) to exploit this phenomenon as a way to enhance the device performance without changing the chemical structure and (b) to understand the fundamental rela- tionships between charge transport and crystal structure. However, since poly- morphs are often prone to structural reversibility, specific access and isolation of these structures in a consistent manner for large-area production presents a great challenge. Towards this aim, a molecular level understanding of the poly- morphic transition mechanism and pathways is essential, but is rarely studied. Two distinct polymorphic transitions have been studied before, known as the ’nucleation and growth’ and ’cooperative’ transitions. The ’nucleation and 52 growth’ transition transforms in a molecule-by-molecule fashion [92] causing significant changes in the “daughter crystal” structures. On the other hand, the “cooperative” transition [93] allows slight changes enabling single-crystal- to-single-crystal transitions and, thereby, preserving the structural integrity of the crystal. Nucleation and growth usually disrupts the integrity of the “par- ent” crystal, [94] while cooperative transition allows a close orientational rela- tionship between the “parent” and “daughter” crystal structure. The “nucle- ation and growth” transition correlates to larger differences in the polymorph structures, whereas the “cooperative” transition relates to smaller differences. However, such a correlation does not explain the molecular origin of this “co- operative” behavior. Therefore, in this work, we address this molecular-level question to better understand the cooperative transition in organic systems. Cooperative polymorphic transitions in molecular crystals are typically trig- gered by mechanical and/or thermal energy that lead to interesting properties such as the “shape memory” effect, [95] thermosalient effect, [96] and supere- lasticity. [97] However, studies investigating this polymorphic transition remain significantly low given their rare occurrence in molecular crystals. Reports in- vestigating the molecular mechanisms underlying cooperative transitions are even more scarce. Cooperative transitions in organic electronics was first discovered in ditert-butyl [1]benzothieno[3,2-b]- benzothiophene (ditBu-BTBT) and 6,1 3 - bis -(triisopropylsilylethynyl)pentacene (TIPS-pentacene) single crys- tals. [50] A major advantage of utilizing cooperativity in organic electronics is that such a mechanism preserves structural integrity and allows rapid, re- versible modulation of electronic properties. In this work, we hypothesize that the bulky side-chain rotation provokes this rare transition. With a molecular- level understanding of the mechanism behind the cooperative transition, we 53 can articulate molecular design rules to prompt cooperativity for future appli- cations in organic electronics and molecular devices. In this work, we investigate our hypothesis of side-chain rotation as a molec- ular design rule for triggering a cooperative transition in molecular crystals. We additionally demonstrate that the polymorphic transition mechanism is sensi- tive to even a single atom substitution within the side-chain. We substitute a carbon atom in the side-chain of ditBu-BTBT that exhibits a cooperative tran- sition to silicon in the new system, bis(trimethylsilyl) BTBT (diTMSBTBT). The bulkier silicon atom in the side-chain of diTMS-BTBT alters the neighboring environment to inhibit side-chain rotation, thereby converting the initial coop- erative transition mechanism to one of nucleation and growth. As a result, by a simple one-atom substitution in the side-chain, we have access to two poly- morphic transition pathways. Using this simple molecular design rule gives the advantage of accessing both polymorphic transition pathways and selectively utilize their advantages in organic electronic applications. In the following sec- tions, diTMS-BTBT and ditBu-BTBT will be abbreviated as diTMS and ditBu, respectively. 3.2 Transition Pathways of Nucleation and Growth versus Co- operative Transition in Organic Electronics A polymorphic transition usually occurs in molecular crystals following the “nucleation and growth” mechanism. Recently Chung et al. [50] discovered a cooperative polymorphic transition mechanism in single crystals of ditBu or- ganic semiconductor triggered by bulky side-chain rotation. Using differential 54 calorimetry scanning (DSC) technique, they discovered polymorphs in ditBu and diTMS, and demonstrated that polymorphic transitions happen at 151 °C (Figure 3.1a) and 71 °C (Figure 3.1b) upon heating for diTMS and ditBu, re- spectively. Note that the transition temperature significantly increased with the change of just a single atom from carbon to silicon in the side-chains. For the case of diTMS, our experimental collaborators (H. Chung and Y. Diao at UIUC) [3] discovered that, during the annealing process, the α crys- tals demonstrated polymorphic transitions (Figure 3.1c), while the β crystals showed no transition but sublimed. The α-to-β transition in diTMS was marked by a single nucleation site, with broad growth directions of the new polymorph, and a slow irreversible transition that lasted several minutes (Figure 3.1c). On the other hand, ditBu crystals showed a typical phase boundary line formation, a directional propagation of the new polymorph, and a fast reversible transi- tion that lasted seconds (Figure 3.1d). diTMS single crystals followed a single- crystal-to-polycrystal transition behavior, whereas ditBu single crystals showed single-crystal-to-single-crystal transition. The loss in the crystalline integrity of the diTMS crystal is caused by the large structural differences between the par- ent and daughter forms, typically observed in the nucleation and growth mech- anism. [94] However, due to the structural similarities between the polymorphs in ditBu, the crystals are able to preserve the structural integrity for multiple thermal cycles of cooperative transitions. [50] Therefore, the substitution of a single atom in the side-chains from silicon to carbon, with otherwise identical conjugated cores, leads to accessing two poly- morphic transition pathways. A cooperative transition preserved the structural integrity and allowed reversible access to the polymorphs, leading to the shape 55 56 Figure 3.1: Two polymorphic transition pathways of nucleation and growth and cooperative transition in single crystals of diTMS and ditBu. (a) DSC of diTMS powder showing reversible polymorphic transition. The molecular structure is shown in the inset. (b) DSC of ditBu powder showing reversible polymorphic transition. The molecular structure is shown in the inset. (c) Cross-polarized optical microscopy (CPOM) images of a single crystal of diTMS α form. The α form transitions to the β form upon heating after approximately 415 s. Upon cooling, the β form does not transition back. (d) CPOM images of a single crystal of ditBu LT form (blue birefringence). The LT form rapidly transitions to the HT form (brown birefringence) upon heating in approximately 1.25 s. Upon cooling, the HT form reversibly transitions to the LT form in 0.38 s. Reprinted from [3]. Copyright 2019 American Chemical Society. 57 diTMS-α diTMS-β ditBu-LT ditBu-HT T (°C) 25 27 25 97 a (Å) 7.55 6.35 6.15 6.33 b (Å) 10.10 11.36 10.65 10.42 c (Å) 13.59 14.31 14.18 14.63 α (°) 82.75 87.65 87.79 86.25 β (°) 82.17 90 90 90 γ (°) 88.52 90 90 90 V (Å3) 1020.14 1030.34 928.46 962.47 Table 3.1: Unit cell information for polymorphs of diTMS and ditBu. memory and function memory effects studied by Chung et al. [50] This serves as an ideal platform to investigate the impact of subtle changes of side-chains on the polymorphic transition pathway. 3.3 Computational Techniques Understanding the molecular mechanism behind polymorphic phase transi- tions is essential for controlling molecular packing for high-performance or- ganic electronics. In this study, we conducted Plane Wave Density Functional Theory (DFT) simulations to investigate the polymorphic transition pathways in diTMS and ditBu. We test our hypothesis of side-chain rotation as a molecu- lar design rule for triggering cooperative transition by calculating the rotational energy barrier associated with the polymorphs in the two systems. Using these calculations, we establish that, by comparing the molecular environment be- tween the two systems, we can pinpoint the molecular origin of each transition mechanism. 3.3.1 Density Functional Theory DFT is a quantum mechanical computational modeling technique that is used to investigate the electronic environment and nuclear structure of many-body systems. This theory is employed to solve Schroedinger’s equation as given by Eq. 3.1 ĤΨ = EΨ (3.1) 58 where E is the energy, Ψ the wavefunction, and Ĥ the Hamiltonian operator for a system of M nuclei and N electrons. Eq. 3.2 gives the mathematical expression of the Ĥ, that includes the sum of the kinetic energy of all particles in the system and the electrostatic interactions between them. rij , riA and RAB denote electron- electron, electron-nucleus and nucleus-nucleus distances, respectively, ∇ is the Laplace operator and ZA is the charge associated with nucleus A. 1 ∑N 1 ∑M ∑N ∑M ∑N ∑N ∑M ∑M Ĥ = − ∇2i − ∇2 ZA 1 ZAZB A + + + (3.2)2 2 r i=1 A=1 i=1 A=1 iA r R i=1 j>1 i j A=1 B=1 AB Since DFT allows the prediction and calculation of materials behavior based on the principles of quantum physics, it is the most powerful and accurate form of computational modeling technique to study materials’ properties and struc- ture. Using this theory, the properties of a many-electron system, including ground-state energies, reaction paths, atomization energies, etc., can be deter- mined with atomistic resolution. DFT is among the most popularly used meth- ods in condensed-matter physics, computational physics, and computational chemistry. This technique has been successfully employed for studying a wide range of problems ranging from simple atomic systems to molecular complexes, solids, quantum, and classical fluids, etc. However, its accuracy comes at the cost of extensive computational resources which limits the system size, i.e., the number of atoms that can practically be studied, to a few hundred atoms at best. The Schrödinger equation gives the time evolution of a wave function, which is a mathematical description of the quantum state of an isolated physical sys- tem. However, for any realistic system, it is impossible to calculate the exact so- lutions of this equation. A few approximations like Born-Oppenheimer and re- 59 lated concepts come in handy. The Born-Oppenheimer approximation assumes that the electrons move in the field of fixed nuclei. This essentially sets the ki- netic energy of the nuclei to zero and their potential energy to a constant Vext. DFT also uses the concept of “electron density” that is described as the proba- bility of finding electrons within a volume element d→−r . Another important con- ceptualization that helped in the development of DFT is the variational princi- ple for the ground state, which states that the ground-state energy is a functional (functions of another function) of the spatially dependent electron density and the nuclear potential Vext. However, despite advances in the new concepts and assumptions, the use of Schrödinger’s equation to solve for materials properties still remained evasive. Hohenberg and Kohn proposed two fundamental theorems that made it pos- sible to solve Schrödinger’s equation. The first theorem states that the external potential Vext, and hence the total energy, is a unique functional of the electron density n(r). This means that with the knowledge of the electron density, it is possible to compute all properties of the system, including total energy, atomic structure, etc. The proof can be consulted in the original paper. [98] The sec- ond Hohenberg-Kohn theorem states the ground-state energy can be obtained variationally, such that the density that minimizes the total energy is the ex- act ground-state density. This means that any electron density that satisfies the boundary conditions on the system will lead to an upper bound on the ground- state energy of the system unless that electron density is the true ground-state electron density itself. The development of these theorems marked the forma- tion of DFT. Due to these theorems, it is possible to theoretically solve for only the sim- 60 pler electron density, which is a 3-dimensional function of x, y and z and forego computing the complex 3N-dimensional wavefunction (where N is the num- ber of electrons in the system). However, the exact form of the functional that maps electron density and energy is unknown in practice. In reality, the self- interaction correction, exchange and Coulomb correlation of the kinetic energy and any electron-electron interactions are completely unknown. Here, approxi- mations based on the Kohn-Sham (KS) equations (Eq. 3.3) become necessary. In a KS formulation, the orbitals Ψi(r) of a reference non-interacting system with the same electron density as the real interacting one are used. An additional term, Vxc, called the exchange-correlation potential is introduced to account for all the unknown contributions to the Hamiltonian. Vext and VH in Eq. 3.3 are, respectively, the external potential due to the presence of the nuclei and the Coulomb potential.  1 ∑N − ∇2 + Vext(r) + VH[n] + Vxc[n] Ψi(r) = iΨi(r) (3.3)2 i=1 The KS equations help with solving the intractable problem of N interacting particles to a much simpler approach of solving N non-interacting particles us- ing the exchange-correlation contribution. However, despite this simplification, the exact formulation of the exchange-correlation potential is still unknown. Some approximations like The Local Density Approximation (LDA) and Gener- alized Gradient Approximation (GGA) become necessary. LDA approximates the exchange-correlation potential as dependent only on the electron density at each point. One limitation of this idea is manifested as a significant underesti- mation of the bandgap energy of semiconductors. GGA improves upon LDA by including the gradient of the electron density to account for non-homogeneity. 61 3.3.2 System Details To test our hypothesis, we conducted plane wave DFT simulations to obtain geometry-optimized structures and the corresponding energetics of the rota- tional barrier of the side-chains in diTMS and ditBu. All the simulations were performed using the Quantum Espresso code. [99] Our system consisted of a periodic unit cell lattice containing 96 atoms. Due to the intensive nature of the calculations involved for a system even of this size, simulating bigger sys- tem sizes was essentially out of reach even with access to a local supercomputer. Perhaps more importantly, since the rotation of the bulky side-chains will affect, and be affected by, chiefly, its nearest neighbors, we expect that no significant changes in the energy barriers are likely to result if we increased the size of the unit cell, but were unable to test this assumption. We used a cutoff energy of 60 Ry and a 2 × 4 × 2 k-mesh for the LT and HT phases of ditBu and the α phase of diTMS, and we used a 4 × 3 × 2 k-mesh for the β phase of diTMS to achieve a targeted convergence of below 0.005 eV in energy. The simulations were per- formed using Perdew-Zunger pseudopotentials within the LDA level of theory. 3.4 Results We created a test set of distinctly different initial configurations that were ob- tained by rotating one side-chain in a crystal in 5° increments until a full rotation of 120° was made (threefold symmetry). Each of these initial configurations was then run in a geometry-optimized DFT simulation (using Quantum ESPRESSO) such that the side-chain atoms under consideration remained fixed at their ini- tial positions, while the rest of the atoms were allowed to relax around this 62 rotational displacement. This was done to make sure that there were no arti- facts related to van der Waals repulsion. Each of these simulations resulted in a calculation of a characteristic energy value associated with a particular angle of rotation induced in the system. The results for both polymorphs of diTMS and ditBu are shown in Figures 3.2 - 3.6. For diTMS, the rotation energy was much larger for the α form, in which the maximum was ca.17.9 kcal/mol, than for the β form, ca.9.1 kcal/mol (figure 3.2, 3.3, 3.6). We observe that the rotation energy decreases in the β form after a large change in structure via nucleation and growth. For ditBu, the rotational energy for the LT form is ca.3.3 kcal/mol, while for the HT form, it is ca.1.7 kcal/mol (Figure 3.4, 3.4, 3.6). After a cooperative transition occurs, the molecular envi- ronment favors side-chain rotation, hence lowering the rotational energy barrier for ditBu side-chains. Comparing the diTMS with the ditBu system before tran- sition, the rotational energy of the α form of diTMS (ca.17.9 kcal/mol1) is more than a factor of 5 higher than that of the LT form of ditBu (ca.3.3 kcal/mol), confirming our observation that the diTMS side-chains are more interlocked compared to the ditBu side chains. After transition, the rotation energy of the β form of diTMS (ca.9.1 kcal/mol) is also higher than that of the HT form of ditBu (ca.1.7 kcal/mol), indicating that the diTMS side-chains remain static af- ter nucleation and growth, while the ditBu side-chains become disordered after cooperative transition. A single-atom substitution of the ditBu bulky side chain leads to stronger intra- and interlayer interactions and thus a more restricted packing environ- ment of the diTMS system that constrains the rotation of the side-chains. This mechanistic analysis confirms our hypothesis that the rotation of the tBu side- 63 64 Figure 3.2: Energy barrier associated with rotating a side-chain in a single crys- tal of diTMS-α. Simulation snapshots of key rotational points are shown on the graph. Color key in the snapshots: red, cyan and orange reflect the carbon atoms in the side-chain that were rotated in this study; gray, yellow, pink, and white indicate carbon, sulphur, silicon, and hydrogen atoms, respectively. 65 Figure 3.3: Energy barrier associated with rotating a side chain in a single crystal of diTMS-β. Simulation snapshots of key rotational points are shown on the graph. Color key in the snapshots: red, cyan and orange reflect the carbon atoms in the side-chain that were rotated in this study; gray, yellow, pink, and white indicate carbon, sulphur, silicon, and hydrogen atoms, respectively. 66 Figure 3.4: Energy barrier associated with rotating a side-chain in a single crys- tal of ditBu-LT. Simulation snapshots of key rotational points are shown on the graph. Color key in the snapshots: red, cyan and orange reflect the carbon atoms in the side-chain that were rotated in this study; gray, yellow, and white indicate carbon, sulphur, and hydrogen atoms, respectively. 67 Figure 3.5: Energy barrier associated with rotating a side-chain in a single crys- tal of ditBu-HT. Simulation snapshots of key rotational points are shown on the graph. Color key in the snapshots: red, cyan and orange reflect the carbon atoms in the side-chain that were rotated in this study; gray, yellow, and white indicate carbon, sulphur, and hydrogen atoms, respectively. 68 25 diTMS form diTMS form 20 15 10 5 0 0 20 40 60 80 100 120 Side chain rotation angle (°) 7 ditBu LT form 6 ditBu HT form 5 4 3 2 1 0 0 20 40 60 80 100 120 Side chain rotation angle (°) Figure 3.6: Comparison of energy barrier of rotation for diTMS α and β phases (top) and ditBu low temperature (LT) and high temperature (HT) forms (bot- tom). The more interlocked side-chain environment of diTMS (which has a Si atom in the side-chain) shows a much higher energy barrier compared to the more flexible environment of ditBu (which has C in the side-chain). Energy Barrier (kcal/mol) Energy Barrier (kcal/mol) chains drives a cooperative transition, whereas hindering the rotation of TMS side-chains leads to a nucleation and growth transition. Therefore, promoting the rotation of side-chains can be used as a molecular design rule to access the rarely observed cooperative transition. Modifying the side-chain can also serve as a useful tool to access both polymorphic transition pathways. 3.5 Conclusions The use of plane wave DFT simulations allowed us to study the experimen- tal findings of cooperative polymorphic transition in ditBu and nucleation and growth mechanism in diTMS, and rationalize them in terms of rotational side- chain barrier in the molecule. Our results show that less bulkiness in the side- chain of tBu causes a low rotational barrier, on the order of 2-4 kcal/mole, in both LT and HT phases. This triggers a cooperative transition mechanism and also allows for rapid structural reversibility between the polymorphs. However in TMS, which involves the presence of a bulkier atom, Si, in the side-chain leads to a more restricted packing environment for the molecule, thereby constraining the rotation of the side-chains. A higher rotational barrier locks the side-chains in place and forces a nucleation and growth mechanism for the transition. This mechanistic insight confirms our hypothesis and establishes side-chain rotation as a molecular design objective to access the rarely observed transitional path- ways and that can be selectively accessed by this simple molecular design rule. We anticipate that our study has broad implications beyond organic electronics. This study is the first to demonstrate that polymorphic transition pathways are extremely sensitive even to a single atom substitution in the molecular struc- 69 ture, and that bulky side-chain rotation is a simple molecular design rule to modulate the transition mechanism. In this work, we demonstrate selective trig- ger of both the nucleation and growth as well as cooperative transition mecha- nism by a single atom substitution. Our computational approach validates the experimental findings and ex- plains the underlying molecular principles behind polymorphic phase transi- tion. Using plane wave DFT calculations, we were able to accurately probe the molecular scale of phase transitions and develop a valuable insight into un- derstanding the correlation between molecular design packing and preferred transitional pathways. 70 CHAPTER 4 ELUCIDATION OF STRUCTURE-PROPERTY RELATIONSHIPS IN A WOVEN CLASS OF MATERIALS Parts of this chapter are inspired by a paper published in the Journal of Applied Materi- als and Interfaces. [5] 4.1 Introduction Metal-organic frameworks (MOFs) represent the first class of nanoporous ma- terials synthesized by the reversible coordination of bonds between metal ions and organic ligands. [36, 100–102] In the past decade or so, these ma- terials have garnered huge interest from the scientific community for their widespread use in a variety of applications, including gas storage, [103, 104] separation, [105–108], catalysis [109, 110] sensing [111, 112] and the capture of toxic chemicals. [113, 114] Due to their high porosity and the accessibility of their high surface area, MOFs have gained a reputation for being suitable for reliable, safe, and low-energy means of dense gas fuel storage. For example, HKUST-1 [Cu3(1,3,5-benzenetricarboxylate)2] showed an outstanding volumet- ric methane uptake capacity of 261 cm3 (STP) cm−3 (65 bar, 298 K), [115] making it the only nanoporous material to effectively reach the the U.S. Department of Energy’s ambitious goal of 263 cm3 (STP) cm−3 at 65 bar and 298 K. [116] Although MOFs hold huge promise for use in their pure form, their combi- nation with confined nanoparticles during crystal growth have produced com- posites with exciting new properties. [117] Examples include nanostructured 71 carbon, [118] polymers, [119] metal oxides, [120] and metals, [121] encapsu- lated into aggregates of MOF crystals and used for applications including catal- ysis [122,123] and gas storage. [118] A particularly promising example involves HKUST-1 crystals (a MOF) functionalized with confined Palladium nanoparti- cles. This composite has shown one of the highest reported hydrogen gas up- take capacities in preliminary experiments conducted by our collaborators, Prof. David Fairen-Jimenez and Bethany M. Connolly, at the University of Cambridge (unpublished work). Adsorption-based storage of hydrogen offers cutting-edge technology, as it provides a cost-effective way to store the gas at much lower pressure and in much higher amounts than in high-pressure (ca. 140 bar) tanks. Another member of the porous materials family, called Covalent Organic Frameworks (COFs), has emerged as among the most promising nanoporous materials with desirable features, such as tuneable topology, low mass density, organic metal-free backbone, well-defined porosity, a huge accessible surface area, and high mechanical stress resistance. [124–127] COFs, similar in notation to MOFs, are synthesized of organic building units (“connectors” and “linkers”) woven in a fabric-like fashion to generate extended ordered structures in both 2D [124] and 3D. [128] Strong covalent bonds, rendered through the interlac- ing of lightweight organic threads, impart COFs with outstanding properties of low density and high elasticity, and additionally, the ability to retain their shape under deformation and stress. As the first example of its kind, COF-505 (devel- oped by Yaghi and co-workers [129]), exhibited a ten-fold increase in elasticity between its metallated form, an integral part of its synthesis, and its stretchy, demetallated woven form. Additionally, unlike polymers, COFs offer the sig- nificant advantage of pre-designing their structure pertinent for the desired ap- plication. Due to these noteworthy advantages, COFs have found applications 72 in gas storage, [130, 131] photoelectronics, [126] catalysis, [132] and chemosens- ing. [133] In this study, we present a detailed computational analysis of structure- property relationships in two high-performing nanoporous materials- COF-506 and a nanocomposite of HKUST-1 with Pd nanoparticles. Both these mate- rials have shown to exhibit outstanding properties which make them excel- lent choices for gas storage and separation applications. For COF-506, we in- vestigate its pore size distribution and, relatedly, study the diffusion of solute molecules inside the woven material. For the HKUST-1 functionalized with Pd nanocomposites, we conduct pore size distribution calculations to understand the origin of the composite’s outstanding capacity and hence its high density gas storage. 4.2 Molecular Morphology of COF-506 Synthesized in 2018 by Yaghi and co-workers [134], COF-506 has demonstrated a remarkable ability to take up dyes and other small molecules. In this section, we lay out our computational strategy, along with the approaches we under- took, including diffusional studies of solute particles inside the material and then its pore size analysis, to provide a molecular-scale understanding of the above observed phenomena. 73 4.2.1 Computational Strategy and System Details Covalent organic weaves are an exciting class of materials that not only exhibit unusual properties but can also be easily tailored for desired applications. Such engineering could be achieved by changing the chemistry of the building com- ponents or their functionalization. However, developing a critical understand- ing of atomic-scale conformations in porous materials is essential for the con- struction of a desired woven material. Even if this process is guided by experi- ence and intuition, a trial-and-error design method would be very tedious and may not fully exploit the untapped potential of possible combinations of linkers and connectors. The ideal way to facilitate this process is to computationally model these materials, preferably with an atomic-scale force field model, describing both short- and long-range inter- and intra- molecular forces of interactions. This ap- proach using a Molecular Dynamics (MD) technique has been used to describe the diffusion of small molecules in MOFs and Silica zeolites. [135–137] The ac- curacy of the structural and dynamical properties, as predicted by atomic-scale MD, is largely determined by the effectiveness of the chosen force field. Exam- ples of popularly used force fields include Dreiding, [72] CHARMM, [73] and OPLS. [71] In their seminal work of 2016, the Yaghi group at Berkeley synthesized a new material called COF-505 [129] and later developed a similarly constructed COF called COF-506 [134](examined in this study) in 2018. At the time, none of the previously published force fields contained sufficient parameters to describe all the atomic interactions in this complex material, COF-505 (Fig. 4.1). In a sepa- rate study, a member of the Clancy group, Dr. Yaset Acevedo, conducted a force 74 75 Figure 4.1: Pictorial decomposition of the two interwoven strands comprising COF-505. Each color represents a different strand. Copper atoms are shown as red spheres. In the diagrams labeled “mesh 1” and “mesh 2” there are eight independent organic chains, leading to sixteen chains in the image labeled as “COF-505”. [5] 76 (a) (b) Figure 4.2: (a)Metallated COF-505. Snapshot was taken from an MD simulation. (b) Unit cell of COF-506 achieved by replicating the unit cell twice in the b- and c- directions. [5] field parameterization study and developed the roughly 60 missing parameters for COF-505 to create an expanded parameter set for the OPLS force field. OPLS was chosen due to its widespread use for organic molecules and for its numer- ous parameter set expansions. [138,139] OPLS has the distinction of having one of the most inclusive lists of parameterized atom types. OPLS was originally parameterized to be accurate for condensed phases of organic molecules and was validated to fit against experimental data for density and heat of vaporiza- tion of organic liquids. These properties made it an ideal choice to model wo- ven COF-505. The developed parameters for COF-505 were validated against its experimentally determined crystal structure, and elasticity since these were essentially the only experimental information known about this new material. More details on the development and optimization methodology, including the full parameter set, can be found in [5]. In their seminal work of 2016, the Yaghi group at Berkeley synthesized a new material called COF-505 [129] and later developed a similarly constructed COF called COF-506 [134](examined in this study) in 2018. At the time, none of the previously published force fields contained sufficient parameters to describe all the atomic interactions in this complex material, COF-505 (Fig. 4.1). In a sepa- rate study, a member of the Clancy group, Dr. Yaset Acevedo, conducted a force field parameterization study and developed the roughly 60 missing parameters for COF-505 to create an expanded parameter set for the OPLS force field. OPLS was chosen due to its widespread use for organic molecules and for its numer- ous parameter set expansions. [138, 139] Due to similarity in terms of chemical composition between COF-506 and COF-505, we have used the same force field model and parameters for COF-506 that were developed for COF-505. In the area of force field development, transferability of parameters is a key challenge. 77 But this is rarely achieved in practice, especially when these molecular models are extended to tackle more specialized and complex systems. Our use of ex- actly the same parameter set for COF-506, allowed us to validate the model in- dependently of the fitting process. Compared to COF-505, metallated COF-506 unit cell has a monoclinic geometry, with a slight tilt in the x−z direction. Struc- tural information, including lattice parameters and fractional coordinates of the COF-506- Cu unit cell were adopted by powder X-ray diffraction. [134] In spite of the different crystal structure of COF-506 (compared to COF-505), the model developed for COF-505 model worked extremely well in predicting the proper- ties of COF-506. This was shown more encouragingly in its ability to provide an accurate quantitative prediction of the value for the uptake of tetrahydrofuran (THF) by COF-506, predicting a value that was 85 percent of the experimental value. [5] Despite no parameter fitting conducted for COF-506, achieving a THF uptake capacity so close in agreement with the experimental value validates our choice of using a COF-505 model to represent COF-506. And therefore, all simulations in this study were performed using the developed OPLS parameter extension in LAMMPS [78] (Sandia’s open source MD code) software. 4.2.2 Multiple Time Origin-based Calculations of Self-Diffusion in COF-506 The importance of woven materials lies, to a considerable extent, in their effec- tiveness for separations and other properties related to their porous nature. In order to understand this structural-property relationship, we studied the diffu- sional characteristics of solute molecules inside COF-506. 78 Diffusion can be described by Fick’s law, which states that the flux, j, of the diffusing species is proportional to the negative gradient in the concentration of that species c [140]: j = −D∇c (4.1) where D is the constant of proportionality and is called the diffusion coefficient or diffusivity. In this research study, we calculated self-diffusivities for three representative solutes, Helium, CO2 and THF, in COF-506-Cu (i.e., a metallated form of COF-506 with Cu as the central metal atoms that help facilitate the in- terweaving of the threads). In order to compute these diffusivities, we first compute the concentration profiles of each species. Assuming the species to be at the origin of the coordi- nate frame at t = 0, the time evolution of the concentration profile according to Fick’s law can be written as: ∂c(r, t) + ∇. j(r, t) = 0 (4.2) ∂t Combining Eq. 4.1 with Eq. 4.2 gives: ∂c(r, t) − D∇2c(r, t) = 0 (4.3) ∂t with the boundary condition c(r, 0) = δ(r) where δ(r) is the Dirac delta . Solving Eq. 4.3 w(e get:1 r2 ) c(r, t) = exp − (4.4) (4πDt)d/2 4Dt Here d is the dimensionality of the system. For c(r, t), we do not need to know its value, just the time-depende∫nce of its second moment: 〈r2(t)〉 = dr c(r, t) r2 (4.5) 79 where the imposed condition ∫is: dr c(r, t) = 1 The time evolution of 〈r2(t)〉 can be obtained by multiplying Eq. 4.3 by r2 and integrating over all space∫, which gives: ∫ ∂ dr r2 c(r, t) = D dr r2 ∇2c(r, t) (4.6) ∂t The left-hand side of the above Eq. is simply ∂〈r2(t)〉 ∂t Applying partial integration to the right-hand side, we obtain ∂〈 ∫r2(t)〉 = D ∫ dr r2 ∇2c(r, t)∂t ∫ = D ∫ dr ∇ .(r2∇c(r, t)) − D∫ dr ∇ r2 .∇c(r, t) = D dS∫(r2 ∇c(r, t)) − 2D d∫r r.∇c(r, t) (4.7) = 0 − 2D ∫dr (∇.r c(r, t)) + 2D dr (∇ .r) c(r, t) = 0 + 2dD dr c(r, t) = 2dD The elegant form obtained in Eq. 4.7 was first derived by Einstein. [141] In the above Eq., D is a macroscopic property, while 〈r2(t)〉 has a microscopic inter- pretation. This quantity is known as the “mean squared displacement” which the species has traveled over the course of time, t. This relationship allows the diffusion coefficient to be calculated from molecular simulation data. For every particle i, we measure the distance traveled in time t, ∇ ri(t), and we plot the mean square of these displace∑ments as a function of time t: 〈∇ 1 N r(t)2〉 = ∇r 2 N i (t) (4.8) i=1 80 The slope of the best fit curve to the linear region of the mean squared displace- ment curve gives us an estimate of the diffusion coefficient. Using the relation obtained in Eq. 4.8, we calculated diffusion coefficient, DT , of the motion of solute molecules across the woven material, as a function of time, r(t): 1 d ∑N [ ] DT = lim 〈 ri(t + t0) − r (t 2i 0) 〉 (4.9)2nN t→∞ dt i In Eq. 4.9, DT is the translational diffusion coefficient, n is the dimensionality of the system, and N is the number of particles. ∆r(t) = r(t + t0) − r(t0) defines the translational displacement of the particle from time t0 to time t. The angle brackets in Eq. 4.9 indicate a thermodynamic average over many starting times, t , constituting an ensemble of information. In our case, r(t) constitutes the total displacement averaged over solute particles. 4.2.3 Autocorrelation Functions The motion of solute molecules (He, CO2, THF) can be highly correlated, espe- cially at the beginning of the simulation when the system has not reached an equilibrium state and is, perhaps, in a metastable state. Time correlation func- tions [6] measure how the value of a dynamic quantity, A(t), may be related to another quantity, B(t). If A(t) and B(t) represent two different time-independent quantities, the time correlation function∫C(t) can be defined as:τ C(t) = lim→ ∞1 A(t0)B(t0 + t) dt0 (4.10) τ τ 0 The integral represents an average accumulated over many time origins t0, with each origin taken from a system at equilibrium. The quantity A is sampled 81 82 Figure 4.3: Representation of a data set storing values at a total number of L times, ti → {i = 0, L − 1}. The method of multiple time origins loops over all available time origins, tk → {k = 1,M} to calculate the time correlation function. This case only represents the schematic of one time delay of t = 3 ∆t. However, in practise, the method averages data for multiple values of time delay ranging from ∆t = {1, L − 1} [6]. at the time-origin, while B is sampled after a delay time t. Thus, the correlation function depends on the length of the delay. However, as it is an equilibrium quantity, it is independent of the time- origin. The time average can then be written as: C(t) = 〈A(t0)B(t0 + t)〉 (4.11) If A and B are unrelated, then they are uncorrelated, and then C can be written as: C → 〈A〉〈B〉 (4.12) A and B are said to be correlated if A(t0) causes, or contributes to, B(t0 + t). When A and B are different quantities, then C is called a “cross-correlation function.” Finally, if A and B denote the same quantity, then C is called an auto- correlation function. C measures how the value of A at t0 + t is correlated with its value at t0. Auto-correlation functions can generally be divided into two types: • A one-particle function where any dynamic quantity A(t) is a property of the individual molecules, for example, the velocity auto-correlation func- tion is this type of function. • A collective function in which A(t) depends on the accumulated contribu- tions from all the molecules in a system. Since the solute molecules need to find porous pathways inside the woven material in order to traverse through the structure, the motion of these particles is typically slow and can result in a lot of noise in the data. And therefore, as 83 a good practice, the approach of sampling multiple time-origins was employed in this study. [6] This method samples the quantity at thousands of different ori- gins, combines the samples algebraically, and averages them over many sample times. Data from multiple overlapping time-origins are averaged and fitted us- ing linear regression to find the diffusivity. Any point in time can be considered as the time origin of the calculation. Center of mass (COM) positions of the molecules are saved as a function of time, and the mea∑n squared displacement (MSD) for a given time-origin is calculated [xave = 1/n xi (summed over x = 1 to n)](n is the number of windows with the same time delay). The solid lines indicated in Fig. 4.4 indicate the displacement of solute molecules in COF-506- Cu as calculated using the multiple time-origins method, while the dashed lines indicate a linear regression fit used to calculate their diffusivities. A schematic representation of the algorithm is shown in Fig. 4.3. 4.2.4 Diffusion Results A key objective in simulating COF-506-Cu was to investigate the uptake and motion of guest molecules through its woven structure. To that end, we looked at the tendency of molecules to diffuse in the COF-506-Cu, using He, CO2, and THF as the solutes, tested at four temperatures (200 to 500 K in 100 K intervals) and pressure values of −1000 × 105 Pa, 1 × 105 Pa, and 1000 × 105 Pa. Data for each gas for ranges of temperatures and pressures were collected over a pe- riod of 10 ns. Fig. 4.4 shows example calculations depicting diffusivities of He, CO2 and THF in metallated COF-506-Cu. As seen in the Figure, the diffusional data is highly correlated given the fact that solute particles need to find porous pathways inside the COF material to be able to traverse through it. In order 84 to remedy this problem, the method of multiple time-origins was undertaken (discussed in section 4.2.2). Note that diffusion over these short time frames for solute molecules is probably not accurate beyond providing the order of mag- nitude of the diffusion coefficient; however, the trends should be instructive. The blue lines in the Fig. 4.4 indicate the displacements of the solute inside a COF-506 as calculated using the multiple time-origins method, while the black line indicates a linear regression fit used to calculate their diffusivities. For com- parison, the red curve denotes MSD of solutes as calculated using a single time- origin method, which is much more inaccurate. At 300 K and 1 atm, the self-diffusivities of He and CO2 in COF-506-Cu are 13 × 10−6 and 2.8 × 10−8 cm2/s, respectively (Fig. 4.5 (a) and (b)). The non- interpenetrated structure of COF-506-Cu facilitates the free movement of small guest molecules inside the weave. However, compared to its diffusion rate in rigid COFs, such as TS-COFs and PI-COFs, or MOFs such as MOF-5, diffusion in COF-506-Cu is slower by an order of magnitude. [136, 142] Despite the large range in temperature, the diffusion coefficient changes un- remarkably (e.g., ranging from 0.5 − 2.2 × 10−5 cm2/s over 300 K at 1 atm for Helium). Pressure and temperature changes display little effect below 400 K. The guest-accessible pores within the COF-506-Cu weave can undergo more flexibility in terms of expansion and contraction in pore size due to its single layer structure. Self-diffusion of THF in COF-506-Cu (Fig. 4.5 (c)) at 300 K and 1 atm is 5 × 10−9 cm2/s, which is far slower than diffusion of He and CO2 in COF-506, by factors of 600 and 60, respectively. Closer observation of THF’s diffusional trajectories provides some valuable insights into its interactions with the COF- 85 86 5 Helium at 200K and 1×105Pa Co2 at 400K and 1000×10 Pa Single Time Origin Single Time Origin Multiple Time Origin 3000 Multiple Time Origin D= 8.508e-06 2/ D= 1.349e-07 cm 2/s cm s 40 2500 30 2000 1500 20 1000 10 500 0 0 0 1 2 3 4 5 0 1 2 3 4 5 time (ns) time (ns) (a) (b) THF at 300K and -1000×105Pa 30 Single Time Origin Multiple Time Origin D= 4.326e-08 cm2/s 25 20 15 10 5 0 0 1 2 3 4 5 time (ns) (c) Figure 4.4: Mean squared displacements (MSD) of solutes (a) Helium, (b) CO2 and (c) THF with respect to time. The red curves indicate the MSD measured using a single time-origin, known not to be representative of the average result. The blue graphs improve the statistics by measuring the MSD using multiple (100s of) time-origins (explained in section 4.2.2). The drop-off at the end of the simulations reflects the fewer origins left by that time and are not representative of accurate values at that point. r2 (Å2) r2 (Å2) r2 (Å2) 87 Helium 2.5 Pressure 1 ×105 Pa Pressure 1000 ×105 Pa 2.0 1.5 1.0 0.5 0.0 200 250 300 350 400 450 500 Temperature (K) (a) CO2 7 Pressure 1 ×105 Pa Pressure 1000 ×105 Pa 6 5 4 3 2 1 0 200 250 300 350 400 450 500 Temperature (K) (b) Diffusion Coefficient (10 7cm2/s) Diffusion Coefficient (10 5cm2/s) 88 THF 40 Pressure 1 ×105 Pa 35 30 25 20 15 10 5 200 250 300 350 400 450 500 Temperature (K) (c) Figure 4.5: (a) Self-diffusivity of He in COF-506-Cu from 200 to 500 K at two pressures (1 atm (blue) and 1000 atm (green)). (b) Self-diffusivity of CO2 in COF-506-Cu from 200 to 500 K and two pressure values as denoted by the inset color key. (c) Self-diffusivities of THF in COF-506-Cu at 1 atm as a function of temperature (200 to 500 K). Error bars represent the standard error. Dot-dashed lines join the data points as a guide to the eye. Diffusion Coefficient (10 9cm2/s) 89 Self-diffusivity at 1 bar Pressure Temperature Helium CO2 THF (K) 200 8.51e-06 2.03e-08 3.21e-09 300 1.30e-05 2.88e-08 5.00e-09 400 3.86e-06 2.28e-07 5.95e-09 500 2.26e-05 1.39e-07 3.88e-08 Table 4.1: Comparison of self-diffusivities of He, CO2, and THF within metal- lated COF-506 over a range of temperatures at 1 bar pressure. Self-diffusivity at 1000 bar Pressure Temperature Helium CO2 (K) 200 2.22e-07 6.20e-09 300 6.93e-07 5.77e-08 400 1.25e-06 1.35e-07 500 2.98e-06 6.32e-07 Table 4.2: Comparison of self-diffusivities of He, CO2, and THF within metal- lated COF-506 over a range of temperatures at 1000 bar pressure. 506-Cu weave. Despite encountering a somewhat tortuous path, He and CO2 can readily find a diffusional path through the COF weave. In contrast, the non- aromatic (but cyclic) nature of THF, in concert with its lone pair of electrons on the oxygen atom, results in strong binding between THF and copper ions, which significantly constrains the mobility of the THF molecules. This electrostatic effect is believed to play a more significant role than the size difference between the two types of molecules. Next, in order to understand the physics behind the diffusional behavior, we conducted Arrhenius calculations for diffusivities of representative solutes, He, CO2, and THF, inside COF-506-Cu. A typical Arrhenius Eq. looks like: (−E ) D = D0 exp a (4.13) kt where D0 is the pre-exponential factor and Ea is the activation energy. Gen- erally, a lower activation energy means fewer barriers to diffusion. Thus the solute molecules are more able to diffuse and, consequently, show much more enhancement in their motion. Activation energies can exhibit a wide range of values, for example 17.4 to 55 kcal/mol for diffusion of N2, CO2, and O2 in BCC, FCC, HCP metals, [143] 4.6 to 6.9 kcal/mol for O2 diffusion in YBa2Cu3O7 grain boundaries. [144] Our activation energies shown in tables 4.3, 4.4, and 4.5 fall into that “ballpark.” For all three solutes, the results exhibit noisy variations of the diffusion co- efficient with the inverse of temperature (Fig. 4.6,) albeit all three solutes follow typical Arrhenius behavior: (a) He diffusion at 1 bar is roughly invariant with temperature, as one might expect for such a small solute moving in a flexible 90 91 Helium 11 12 13 14 5 15 Pressure 1 ×10 PaPressure 1000 ×105 Pa 20 25 30 35 40 45 50 104/T (K 1) (a) CO2 15 16 17 18 Pressure 1 ×105 Pa 19 Pressure 1000 ×10 5 Pa 20 25 30 35 40 45 50 104/T (K 1) (b) ln D (cm2/s) ln D (cm2/s) 92 THF 17.0 17.5 18.0 18.5 19.0 19.5 Pressure 1 ×105 Pa 20 25 30 35 40 45 50 104/T (K 1) (c) Figure 4.6: Plots depicting the Arrhenius relation between the self-diffusion coefficient and the reciprocal of temperature for (a) He, (b) CO2, and (c) THF diffusing in COF-506-Cu. The temperature varies from 200 to 500 K in 100 K intervals. Dotted lines denote the best fit to the Arrhenius data. ln D (cm2/s) 93 Helium Pressure (bar) D (cm20 /s) Ea (kcal/mol) 1 5.55e-06 -0.15 1000 1.18e-05 1.62 Table 4.3: Activation energy and pre-exponential factor calculated for diffusion of Helium in metallated COF-506. A negative value in the activation energy for 1 bar is attributed to artifacts in simulation. CO2 Pressure (bar) D0 (cm2/s) Ea (kcal/mol) 1 7.77e-07 1.53 1000 7.64e-06 2.87 Table 4.4: Activation energy and pre-exponential factor calculated for diffusion of CO2 in metallated COF-506. THF Pressure (bar) D0 (cm2/s) Ea (kcal/mol) 1 4.61e-08 0.83 1000 4.61e-08 0.61 Table 4.5: Activation energy and pre-exponential factor calculated for diffusion of THF in metallated COF-506. framework; (b) CO2 diffusion shows noisy, but roughly linear, relationships; (c) the temperature-dependence of diffusion of THF in COF-506-Cu is consistent with typical Arrhenius behavior, except at the lowest temperature where the diffusion looks unexpectedly and unexplainably high. However, these are rel- atively short timescale simulations for the diffusion of solute molecules, so it is probably more prudent to only look at trends and not look for quantitative data beyond the order of magnitude of the diffusion. Finding trends and order of magnitude diffusion characteristics were the goals of the diffusion study here. 4.2.5 Pore Size Distribution Algorithm We developed a new algorithm to fully characterize the pore size distribution (PSD) for COF-506-Cu and expand our understanding of the morphology of the weave. We first defined a spherical probe of radius 1.5 Å to check whether the space inside the simulation cell is porous or not. We scanned over the entire area in the simulation box and identified those regions which are not occupied by atoms. Fig. 4.7 (a) shows a simulation snapshot with Helium diffusing through porous pathways inside metallated COF-506. Fig. 4.7 (b) represents the idea of the pore-finding methodology employed in this study. The schematic shows a probe (in blue) that “rasters” across the simulation box and identifies any re- gions not occupied by atoms (shown in yellow). In order to be considered a porous space, we set the distance between the probe point and the center of a neighboring atom to be greater than the sum of (a) the atomic radius of the atom associated with that distance, (b) the radius of the probe and (c) 1 Å, as shown below: 94 Atom Radius (Å) C 0.77 H 0.31 N 0.70 Cu 1.173 B 0.80 F 0.64 Table 4.6: Atomic radii of each atom type in the COF-506-Cu system used to calculate the porous spaces. d = ratom + rprobe + 1 (4.14) where ratom is the atomic radius of each atom type and rprobe is the radius of the probe. These two terms are necessary to ensure that there is no overlap between any atoms and the area being searched by the probe. A 1Å window size is added to ensure that the point is far enough away from any neighboring atoms that another atom could easily be inserted at that point. Table 4.6 shows the atomic radii of each atom type present in the system. If the point searched is determined to be within a porous space, the coordi- nates of that point are stored, and then we search for the next point that does not overlap with space. This next point is set up such that it lies at a distance of the probe radius from the center of the previous probe. This “backtracking” is ap- plied to ensure that we define the extent of the pores with as fine a resolution as possible. A representation of such a probe mesh-grid is shown in Fig. 4.8. Once all such porous spaces have been identified, we group them into contiguous spaces, which then constitute pores. Note that these independent, contiguous spaces will be disjointed from each other. After all the porous spaces have been 95 96 (a) (b) Figure 4.7: (a) A simulation snapshot of COF-506-Cu from MD calculations showing patches of porous spaces inside the weave. Color key: Carbon in red, hydrogen in blue, nitrogen in yellow, copper in white, boron in cyan, fluorine in green, helium in orange. (b) A schematic depicting a probe in blue that rasters the simulation box with atoms (in yellow) to identify the porous regions. 97 Probe Mesh Grid Grid 1 12 Grid 2 10 8 6 4 2 0 −2 −4 −6 −8 −10 −12 −14−12−10 −8 −6 −4 −2 0 2 4 6 8 10 12 14 X Figure 4.8: A two-dimensional representation of the probe mesh-grid used in this study. Probes of radius 1.5 Å are depicted as grids 1 and 2 shown in colors blue and red, respectively. The mesh grid is designed this way to ensure that no region inside the simulation box is left untested while searching for porous spaces. Y 98 12.0 11.5 11.0 10.5 10.0 9.5 9.0 8.5 8.0 8.0 7.5 7.0 8.0 7.5 6.57.0 6.5 6.0x ( 5.5Å 6.0) 5.5 5.0 5.0 (a) 12.0 11.5 11.0 10.5 10.0 9.5 9.0 8.5 8.0 8.0 7.5 7.0 8.0 7.5 6.57.0 x 6.5 6.0 (Å 6.0 5.5) 5.5 5.0 5.0 (b) y ( yÅ ) (Å) z (Å) z (Å) 99 12.0 11.5 11.0 10.5 10.0 9.5 9.0 8.5 8.0 8.0 7.5 7.0 8.0 7.5 6.57.0 6.5 6.0x ( 5.5Å 6.0) 5.5 5.0 5.0 (c) 12.0 11.5 11.0 10.5 10.0 9.5 9.0 8.5 8.0 8.0 7.5 7.0 8.0 7.5 6.57.0 x 6.5 6.0 (Å 6.0 5.5) 5.5 5.0 5.0 (d) y ( yÅ ) (Å) z (Å) z (Å) Figure 4.9: This Figure shows a small section of a bigger (actual) simulation box. (a) A single porous sphere of radius 1.5 Å. This porous space is identified by the probe as a volume not occupied by any atom. (b) Next, adjacent and overlapping porous spaces are identified, as shown in the red, yellow, blue, and green spheres. Since the probe size chosen for this method is 1.5 Å, these iden- tified spheres have equal size and have a radius of 1.5 Å. (c) Then the extent of the space covered in aggregate by the overlapping and adjacent porous spaces (found in (b)) is identified. Yellow dots define the outer surface of this enclosed porous volume. (d) Once this region has been identified, as in (c), a sphere is fitted inside this space in order to calculate the radius of this spherical pore. Ra- dius from all such discontiguous regions are calculated, analyzed, and plotted as a histogram in Fig. 4.10. 100 101 60 50 40 30 20 5 10 15 y 20(Å) 25 30 0 10 35 20 30 40 40 x (Å) 50 (a) 8 7 6 5 4 3 2 1 0 6 8 10 12 14 16 18 Diameter (Å) (b) Figure 4.10: (a) Representation of pores located inside a COF-506-Cu weave pre- dicted by computer simulation. (b) Pore size distribution for the same system as illustrated in (a). Pore diameters were estimated assuming spherical pores. The histogram shows that most pores are less than 10 Angstroms in diameter. Count z (Å) separated into pores, any pores with a volume less than 500 Å3 are discarded, as experimental techniques are not currently capable of identifying pores below this size. The remaining pores are used to find the pore size distribution. A step-by-step depiction of this algorithm is presented in Fig. 4.9. In addition, the Python code developed here to encode this algorithm is provided in Appendix B. Our algorithm provides the advantages of simplicity while providing high accuracy over Monte Carlo-like [145] and grid-based [146] methods and can be readily implemented for calculating pores in porous materials. 4.2.6 Structure of COF-506-Cu These calculations form an integral part of understanding solute diffusion in- side COF materials. Our calculations (Fig. 4.10) suggest that most (about 80%) of the pores inside this metallated COF weave are between 6 and 10 Å in diam- eter. A small fraction of the pores can be up to 20 Å in diameter. Pores of such sizes allow small molecules such as He and CO2 to traverse the weave with rel- ative ease; however, larger molecules, such as THF, will find it more difficult to navigate the dense woven material. This significantly hindering the motion of larger solutes inside these materials, making for a more tortuous path through the woven material. These results help explain our self diffusivity results in Figs. 4.5 and 4.6. 102 4.3 Structural Characterization of the HKUST-1 MOF Function- alized with Pd Nanoparticles as a Composite A long-standing challenge for the use of hydrogen as a transportation fuel has been to design storage systems that efficiently and safely store the gas and that allow its extraction at practical temperatures and pressure. [147] From all the existing adsorbents posited for this task, MOFs, constructed by the coordina- tion of metal atoms with organic linkers, are arguably the most promising class of hydrogen-storage materials due to their large surface areas and pore vol- umes. [148] Our collaborators, Prof. David Jimenez and Bethany Connolly at Cambridge, UK, have used a high-performing MOF called HKUST-1 which, in combination with palladium nanoparticles, has shown exceptional uptake ca- pacity for clean fuels like hydrogen and methane gases (unpublished results). Experiments show that the uptake of these gases increases as the percentage of palladium nanoparticles is increased. This uptake trend is reversed when the processing pressure is higher than 40 atm., a typical pressure-swing character- istic. Previous computational studies [149, 150] have shown promising results for high-throughput screening of MOF candidates that have shown their high po- tential for the capture of, and catalysis to decompose, toxic molecules. How- ever, while this previous work has laid the foundation for molecular simulation approaches to exploit, it has not tackled the problem of rational design of the interface that is created between the MOF framework and the nanoparticles. This is an important omission as the details of this interface are likely to be plan a role, potentially a defining role, in the creation of MOF-nanoparticle designs 103 that attempt to optimize the storage of clean fuels, like hydrogen and natural gas. Similarly, studies investigating the molecular mechanism behind the great extent of gas uptake are limited by the current state-of-the-art lack of knowledge of the nature of this interface. With this aim in mind, we conducted a detailed atomic-scale MD study to understand the molecular origin behind such experimentally observed phe- nomena. We conduct pore size distribution calculations to characterize the over- all morphology of a composite of HKUST-1 and Pd nanoparticles. Developing this understanding is critical to explaining the dynamics of solute molecules, especially hydrogen or carbon oxide or methane, inside these porous materi- als, and also opens the door for a more conscious design of MOFs with tunable structural properties, such as topology, pore size, shape, and surface chemistry. 4.3.1 System Details All simulations were performed in LAMMPS [78] (Sandia’s open-source MD code) using an OPLS [71] force field. OPLS reproduces the bulk properties of or- ganic systems, typically with an excellent match to relevant experimental prop- erty values. However, it did not contain all the necessary parameters required to model all HKUST-1 and Pd nanocomposite interactions. All the remaining parameters were calculated by my colleague, Ryan Heden, who is a graduate student in Paulette Clancy’s research lab. The simulation set-up (shown in Fig. 4.11 (a) and (b)) is created by inserting a large-radius Pd crystal into the HKUST- 1 lattice and deleting any MOF monomers whose atoms are located within 3Å of a Pd atom. That is, any atoms belonging to the MOF crystal that are uncom- 104 fortably close to the Pd nanoparticle are removed. The combined system is then run for 500 picoseconds using a microcanonical NVE ensemble with a small 0.1 femtosecond as the time step. The time step is kept deliberately small for this relaxation run to allow any atoms that are too close to the Pd surface to move away. In an experimental setting, a Pd cube of size 10 nm in diameter is used. However, given the constraints of our computational resources for such a rel- atively huge Pd nanoparticle, we employed a small but representative model that employed a Pd nanoparticle of diameter 5 nm to conduct our theoretical calculations. 4.3.2 Pore Size Distribution of the Pd-MOF Composite The pore size distribution (PSD) is one of the main properties used to charac- terize microporous materials like MOFs. This property essentially determines their permeability to solute molecules, the adsorption capacity, and the capacity of these materials to be used for industrial applications that involve gas sepa- ration, [108] gas storage, [151] catalysis, [152] or drug delivery. [153] Since all the main applications for MOFs involve the adsorption of guest molecules, tai- loring the size of the pores is critical to enable more effective use of the avail- able pore volume. Establishing a relationship between the structure and the storage/catalytic capacity is key to guiding rational design and synthesizing of high-performing MOF materials. We used an open source freely available software, Zeo++ (www.zeoplusplus.org), [154] to conduct PSD calculations. The PSD algorithm developed in Section 4.2.5 works best for the case where the pore size distribution is similar in concept to 105 106 (a) (b) Figure 4.11: (a) Side and (b) top views of a simulation snapshot of the HKUST- 1 and Pd nanoparticle system. Palladium shown in blue is placed inside a HKUST-1 crystal to mimic the experimental setup. Color key: Carbon shown in gray, copper in brown, oxygen in red, hydrogen in white, and palladium in blue. the cavities in Swiss cheese, where pores are strictly defined and discontiguous. For highly porous systems like ours, where the pores are interconnected as seen in Fig. 4.11, a more robust approach is needed to strictly isolate such regions and calculate their distribution. Fig. 4.12 shows the application of the PSD algorithm developed in Section 4.2.5 for the HKUST-1/Pd nanocomposite. As can be seen, the algorithm per- forms well in identifying porous spaces. Our algorithm offers advantages over other more complicated methods [145, 146] in terms of simplicity while provid- ing high accuracy. However, the limitation of the approach we developed is that it includes the ability to correctly coalesce the identified contiguous pores when the nature of the material is exceedingly penetrable, as is the case in our nanocomposite. Therefore, in this study, we felt that a more robust approach than the one developed in this thesis, namely one that used a Monte Carlo sam- pling method (like Zeo++) should be applied in this case. The Zeo++ software is based on Voronoi decomposition, which provides a graphical representation of void spaces in a periodic simulation box of atoms. This resulting network is analyzed to generate diameters of the largest-included sphere and the largest-free- sphere. The nodes in this network are analyzed for accessibility, and the resulting information is used to conduct Monte Carlo sampling of accessible surfaces (Fig. 4.13), volumes, pore size distribution. We calculated, characterized, and identified accessible (void) regions and calculated their corresponding sizes. To assess the convergence of the method, we ran the Zeo++ algorithm on our system (shown in Fig. 4.11) for a range of Monte Carlo sampling approaches. From the results in Fig. 4.14, it is immediately evident that the computation 107 108 Figure 4.12: Colors representing the pores identified by the PSD algorithm cre- ated in this thesis and described in Section 4.2.5. Due to the exceedingly pene- trable nature of the HKUST-1 and Pd nanocomposite, the algorithm incorrectly combines the contiguous porous spaces. Therefore, in this study, we have em- ployed a more robust approach based on Monte Carlo sampling to conduct pore size calculations. 109 Figure 4.13: Geometric representation of the polyhedral surface mesh (shown in white) around HKUST-1 and a 5 nm-dia. Pd nanoparticle. The Zeo++ software uses Voronoi decomposition to identify surface mesh and accessible volume in the system of atoms and uses this information to calculate the pore size distri- bution. 110 0.40 Samples=500Samples=5000 0.35 Samples=50000 0.30 0.25 0.20 0.15 0.10 0.05 0.00 0 5 10 15 20 25 30 Pore width (Å) Figure 4.14: Comparison of pore size distribution in the HKUST-1/Pd nanocom- posite for different values (shown as Samples in the inset) of Monte Carlo sam- ples per unit cell. The distribution starts to converge as the number exceeds 5000. 0.5 0.4 0.3 0.2 0.1 0.0 0 5 10 15 20 25 30 Pore width (Å) Figure 4.15: Distribution of pore size analysis in the HKUST-1/Pd nanocom- posite conducted over Monte Carlo samples of 5000 and 50000 (but not 500) per unit cell. Distribution Distribution starts to converge as the number of Monte Carlo samples per unit cell are varied from 5000 to 50000. The final results shown in Fig. 4.15 depict that the majority (∼ 80%) of channels inside the material typically consist of pores width, pore diameters, ranging from ∼ 11 to 13 Å. Smaller 6 Å- diameter cavities are also present in the structure, typically between the larger cavities. Our results are in agreement with the experimental pore size distribution in pure HKUST-1, where cavities are known to range from ∼ 10 to 14 Å, along with some smaller pores of size 6 Å. [155] This observation indicates that encapsulation of Pd nanoparticles inside HKUST-1 crystals has little effect on its structural form. HKUST-1, in its unadulterated form, serves as an excellent choice of candidate for gas storage applications. With our new observation that the presence of the Pd nanoparticle does not affect its structure, these nanoparticles act to provide additional active sites for gas solute particles to bind to and hence react, depending on the nature of the solute. These additional sites help explain the outstanding capacity of the nanocomposite for significant capability for hydrogen gas uptake. 4.4 Conclusions COF-506, with its unique property of being an elastic and stretchable porous material, is emerging as a promising member of a new class of woven porous (COF/MOF) materials. Its inherent flexibility, coupled with its ability to accom- modate mechanical stress, make COF-506 and any of its future derivatives po- tentially interesting materials to design for a pre-designed functionality. Using the model developed for COF-505, we predicted COF-506 to be a more prac- tical nanomaterial, both in terms of its flexibility and porosity, than COF-505. The pore size distribution of metallated COF-506 was dominated by small pores 111 (80% lie in the 6-10 Å range of diameter), although a few (5%) can be up to 20 Å in diameter, as well as uncovering the locations of these pores. To conduct diffusional studies in COF-506, we used the exact same set of parameters as for COF-505 and demonstrated that, in contrast to COF-505, COF-506 showed re- moval of the doubly interpenetrated layer in the extremely dense COF-506-Cu weave significantly accelerates the diffusion of small molecules within its ma- trix. In addition, we analyzed the morphology of a high performing MOF called HKUST-1 and Pd nanoparticle composite which has shown to provide excep- tional uptake of hydrogen gas capacity. We used Zeo++ software to fully char- acterize the internal volume of the composite and assess its pore size distribu- tion. The algorithm takes into account both the accessible surface area and the presence of solute molecules of the internal pore cavity to meaningfully gener- ate a distribution. Majority (∼ 80%) of the pores exhibit a size range between 11 to 13 Å in addition to some smaller cavities of size ∼ 6 Å. This distribution provides an ideal platform for small solute molecules like hydrogen to diffuse through and deposit within its porous network. 112 CHAPTER 5 CONCLUSIONS AND FUTURE DIRECTIONS 5.1 Accomplishments Organic materials are increasingly being deployed for applications such as gas storage by adsorption and as active layers in a wide range of new electronic technologies. They offer significant advantages, most notably being solution- processable, lightweight, and flexible. With judicious functionalization, these materials can be tuned to be suitable for incorporation into flexible displays, biological or chemical sensors, wearable electronics, and for the capture of dan- gerous materials, including chemical warfare agents. Single crystals of organic semiconductors are particularly interesting to study because they offer simpli- fications related to their lack of grain boundaries and their long-range periodic order with the presence of minimal defects. These crystals, therefore, provide an ideal platform for examining structure-property relationships, as well as the ability to uncover the fundamental physics governing the behavior of organic semiconductors. Organic frameworks, on the other hand, offer access to materi- als exhibiting high specific surface area and pore volume. Framework materials offer proven ability for high uptake capacity of solute gases like hydrogen and methane, far exceeding the uptake capacity of some more traditional adsorbents like zeolites and activated carbons. In order to explain the above extraordinary properties of organic materials, there is an increasing need to study these materials at a molecular and even atomic scale. While there has been continuous development in experimental techniques to characterize micro- and nano- scale morphologies, they are in- 113 herently slow. Computational techniques can offer powerful and accelerated research capabilities to provide atomic-level insight. In this dissertation, we have shown the use of all-atom Molecular Dynamics, Bayesian optimization- based Machine Learning, and ab initio Density Functional Theory techniques to accelerate the investigation of organic systems at an atomic/molecular level. With an educated selection, combination, and validation of such models, these computational methods can not only provide complementary studies to explain experimental observations but also drive the design of new organic molecules with tailored properties, thereby cutting down the time for experimental trial and error in the laboratory. Using such computational approaches, we briefly summarize the main ac- complishments of this dissertation: First, we tackled the prediction of polymorphs in small-molecules or- ganic semiconductors. To do so, we developed an unsupervised Bayesian optimization-based Machine Learning algorithm to interface with training data from Molecular Dynamics and efficiently predict the existence and characteris- tics of both thermodynamically stable and metastable crystalline structures. As a test case, for a promising commercial target, a diTMS-BTBT organic semicon- ductor, our algorithm was able to select viable candidates from a pool of hun- dreds of potential structures by requiring to run simulations for only a quarter of the possible candidates, resulting in a great reduction in computational cost. Running a sequential search on complex systems like ours would be close to in- tractable either computationally or experimentally. In addition, our algorithm has the benefits of running in a completely self-directed manner, without the need for user intervention. The accuracy of our method to accurately predict 114 polymorphs is ∼ 1-2 kcal mol-1, which is within the uncertainty of the Molecular Dynamics data and sufficiently accurate to recommend promising candidates for experimental investigation. We also found narrow ranges of lattice constants that were preferred by stable polymorphs, rather than metastable or unstable combinations of unit cell parameters. Consequently, our machine learning al- gorithm was able to accurately predict unit cell parameters of both the experi- mentally found polymorphs of diTMS-BTBT with excellent agreement between the Bayesian-predicted unit cell parameters and experimental values. We also suggest that a third polymorph might exist, albeit existing at a significantly higher energy, and hence would need to be stabilized at higher temperatures with slower heating-cooling cycles. It is worth noting that our Bayesian model is highly scalable and can accommodate the inclusion of multiple information sources or multiple objective functions. The model could also be extended to study polymorphism in therapeutic applications, such as the pharmaceutical in- dustry where polymorphs constitute a major issue for reproducibility and con- trol of properties and therapeutic effectiveness. Finally, our algorithm showed the existence of a “forest” of other kinetically stabilized polymorphs, separated by energies well below kT. While these may not be identifiable experimentally due to the tiny energetic differences between them, the richness of this situation should cause us to rethink our views on polymorphic existence and character- istics. In a complementary study, we uncovered the molecular-scale mechanism underlying polymorphic transitions in related organic semiconductors. Such transitions have been posited as potentially ultra-low energy “switches” in electronic devices, which could be transformational. For this study, we con- ducted plane wave DFT calculations and investigated two distinct transition 115 pathways: nucleation and growth and cooperative transitions. First, through a single atomic substitution in the bulky side-chains of BTBT-based organic semiconductors (from Si to C), we compared the characteristic polymorphic transitions in two high-performing organic electronic systems: ditBu-BTBT and diTMS-BTBT. These molecules were recently exposed by experimental studies by Ying Diao’s group at UIUC to follow either a so-called “cooperative” tran- sition (e.g., ditBu-BTBT) or a “nucleation and growth” polymorphic transition pathway (e,g., diTMS-BTBT). Substitution of the carbon atom in bulky side- chains of ditBu-BTBT with a larger-diameter silicon atom resulted in a signif- icant, 22%, increase in the bond lengths of diTMS-BTBT which subsequently altered the molecular packing environment. This change also resulted in the formation of two pairs of so-called “short contacts” associated with the side- chains in diTMS-BTBT, in contrast to their absence in ditBu-BTBT. In this study, we tested our hypothesis of side-chain rotation driving cooperative transitions in crystals. We used DFT calculations to develop a molecular-level understand- ing of this phenomenon and simulated energy barriers for side-chain rotation in the two systems to quantify the drastic difference in the side-chain pack- ing environments. Before the transition, the rotational energy was ca.18 kcal mol-1 for the diTMS α form of the side-chain, compared to a much lower ca.3 kcal mol-1 for the ditBu LT (low-temperature) form of the side-chain. After the transition, the rotational energy was ca.9 kcal mol-1 for the diTMS β side-chain and ca.2 kcal mol-1 for the ditBu HT (high-temperature) side-chain. These re- sults confirmed that the significant change in the packing environment caused by this single-atom change was sufficient to significantly restrict side-chain ro- tation in diTMS crystals. Using this molecular design rule, we can selectively leverage the advantages of both types of transition pathways in organic elec- 116 tronic devices. The nucleation and growth mechanism allows kinetic access to metastable polymorphs at ambient conditions, while a cooperative transition in- duces what is known as a “function memory” effect, leading to significant and reversible changes to the charge carrier mobility of thin film transistors. [3] This opens the door to selectively using a cooperative transition in organic electron- ics industry to design next-generation smart functional materials that display a “shape memory” effect. In a second major thrust of this dissertation, we moved to the study of larger- scale porous materials, we studied different exemplars of Organic Framework materials. We characterized and evaluated the properties of novel Covalent Or- ganic Framework materials and their generally larger-pored “cousins” Metal Organic Frameworks (COFs and MOFs, respectively) using Molecular Dynam- ics. Our investigation explored the propensity of both (1) the newly synthe- sized COF-506 as well as (2) a monolithic HKUST-1 (MOF) functionalized with Palladium nanoparticles to be suitable for small-molecule gas diffusion within their densely interwoven matrix of structural entities. For the rare woven COF- 506 material, our results showed that its inherent flexibility (due to its woven nature) allowed significant diffusion of gas molecules. Despite their investiga- tion over a wide range of pressures and temperatures (below 400 K), we found that these common variables have an unremarkable impact on solute gas dif- fusivity. Self-diffusivity of THF at 300 K and 1 atm was slower by factors of 600 and 60 than diffusion of He and CO2, respectively. However, this obser- vation is attributed to strong electrostatic interactions between its lone pair of electrons and copper ions rather than on the size difference between the two types of molecules. COF-506 could potentially be used as an effective trap for selective small solute molecules. Using a method developed here, we found 117 that the pore size distribution of metallated COF-506 was dominated by small pores (80% lie in the 6-10 Å range of diameter), although a few pores (5%) can be up to 20 Å in diameter. We were also able to uncover the locations of these pores within the woven matrix. These findings could potentially be exploited in the design of new molecular-scale woven materials as a means to help improve either the diffusion or selectivity of separation of solute gases. As an interest- ing complement for a material possessing a larger scale of pore sizes, we also characterized the structure of a high-performing functionalized HKUST-1 MOF embedded with 10 nm-diameter Palladium nanoparticles. We conducted pore size distribution calculations and found that functionalizing the MOF with Pal- ladium nanoparticles had a surprisingly small impact on HKUST-1’s pore size distribution. The majority (∼ 80%) of the pores displayed a size range between 11 to 13 Å in addition to some smaller cavities of size ∼ 6 Å. Our prediction of the pore size distribution agrees strikingly well with experimental suggestions that pure HKUST-1 exhibits cavities alternating in size between 14 Å and 11 Å in diameter, along with smaller pores of 5 Å. [155] This distribution provides an ideal platform for small solute molecules like hydrogen to diffuse through and deposit within its porous network, making it an effective choice of material for high density clean fuel storage. 5.2 Future Work Below, we suggest some natural extensions of the studies conducted in this the- sis. 118 5.2.1 Bayesian Optimization for Polymorph Prediction An Edisonian approach to discovery remains a common approach in scientific research. However, it is clearly an inefficient and resource-intensive manner in which to conduct a search within a representative sample space. One such ex- ample is the case of finding polymorphs in organic semiconductors. In this thesis and as noted above, we have developed a Bayesian optimization ap- proach that narrows down this search space from the intractable to the acces- sible. In our case, we accurately predicted the existence and properties of poly- morphs in a high-performing organic semiconductor, diTMS-BTBT. We worked towards a goal of predicting the lattice constant values associated with thermo- dynamically stable and metastable polymorphs. We varied the input vector in six dimensional real space as: x = [a, b, c, α, β, γ] and measured the similarity between these points using the Squared Exponential kernel. This choice of ker- nel works very well for handling smoothly varying functions like ours. One way to extend our approach is to predict the atomic arrangement inside the unit cell of stable and metastable polymorphs. The way to implement this idea is to formulate a “molecular fingerprint” representation for each atom in order to depict its neighborhood environment. A suitable kernel would then compare the environments of each atom with other atoms inside the same inside unit (local variation) and with atoms in different training set candidate structures (to model global variations). Naturally, this would constitute a computationally intensive process since, in our test case for example, the dimensionality of the model would far exceed six and is proportional to the number of atoms present inside the simulation cell. A potential choice of the kernel for comparing such atomic neighborhoods is called Smooth Overlap of Atomic Position (SOAP) kernel. [156] This kernel measures the structural similarity between candidate 119 structures by encoding regions of atomic geometries using a local expansion of a Gaussian smeared atomic density. It is formulated as the average squared over- lap of the smooth atom densities constructed as the superposition of Gaussian functions. The SOAP kernel is positive-definite and systematically combines a detailed description of atomic structures. Our model can be extended to gen- erate local neighborhood environments for each atom which can then be com- pared using the SOAP kernel to predict the arrangement of atoms inside crystal cell of stable and metastable polymorphs, in advance of experimental evidence. Additionally, our model can be extended to studying biological systems, es- pecially in the pharmaceutical industry, where more than 50% of active com- pounds are estimated to have more than one polymorphic form. [157, 158] For example, our model can be incorporated to provide complementary stud- ies to Bernhard Trout’s group at MIT work for the study of biologically rele- vant polymorphs, including l-glutamic acid system [159] and enantiotropic p- aminobenzoic acid [160], for accelerated polymorph discovery. Our input model can also be expanded to include experimental processing conditions like temperature, shear, and solvent effect to predict stable poly- morphs. This can be achieved by developing a linear combination of variable components. [161] Here we assume that the surrogate model that represents the energy function of the polymorphs would be a linear combination of lattice con- stants, temperature, shear, solvent effect which contribute to the total energy in an additive manner. ∑d E(x) = αixi + β(xσ) + f (x , xρ) (5.1) i=1 120 Here the input x is now a ten-dimensional vector, x = [x1, x2, x3, x4, x5, x6, x7,x8, x9,x10]. x1:d where d = 7 are real values that together indicate the combina- tion of six lattice parameters and temperature. x8 (xσ in Eq. 5.1) is a real-valued representation of pressure. x9 and x10 are real-valued descriptors of solvent with x9 representing the dielectric () and x10 the density (ρ). αi (i= 1 to 7) is the length scale associated with the contribution to the total energy made by the combination of lattice constants and temperature. β(xσ = x8) captures the deviation of E(x) due to the shear. f (x , xρ) captures the contribution of the solvent to the total energy (x = x9 and xρ= x10). Here we assume that αi and β are normally distributed as αi ∼ N(µα, σα) and β ∼ N(µβ, σβ), and the solvent contribution, f , is drawn from an independent Gaussian Process with prior mean function µ0 and covariance Σ0. 5.2.2 Covalent and Metal Organic Framework Materials A major obstacle in studying interfacial interactions in complex systems, such as the HKUST-1 MOF functionalized with relatively large palladium nanoparticles is the calculation of effective dimensionality and identification of suitable order parameters describing the low-dimensional intrinsic features to which the sys- tem dynamics are effectively restrained. Nonlinear dimensionality reduction techniques like “diffusion maps” [162] can help to understand the interfacial interaction energy between parts of the MOF and the Pd nanoparticles and un- cover the preferred orientation by which they align together. The idea here is to extract dynamically relevant, low-dimensional descriptors of the MOF and Pd nanoparticle monolith. This technique would allow us to identify the prin- 121 cipal order parameters in the system describing its evolution from one state to another. The newly found order parameters can then be employed to calculate the free energy of interaction, allowing us to locate the local minimum energy pockets on the free energy surface and understand the orientation of the com- bined MOF/Pd system. In this way, we can also probe the energy surface to develop an understanding of which constituent elements of MOFs like to be in proximity to a Pd nanoparticle. Additionally, the studies to investigate the uptake of solute gases can be en- hanced using techniques like Nudged Elastic Band (NEB) [163] to find mini- mum energy pathways for reactions between solute gases and the MOF and palladium composite. NEB is a powerful method for generating energy barriers and reaction pathways between known reactants and products. In this method, multiple geometry optimizations of the intermediate images between the re- actants and products are performed in parallel such that each optimization is coupled to the adjacent ones by harmonic constraints, which keeps the struc- tures geometrically linked. NEB can be used in our system to construct a reac- tion profile of hydrogen and methane uptake by HKUST-1 and the palladium nanoparticle composite. Then systematically uncover the reaction coordinates along with predicting the transition states for hydrogen and methane uptake. 122 APPENDIX A 3D T-SNE AND K-MEANS CLUSTERING OF POLYMORPHS Alpha phase Beta phase 200 100 0 100 200 200 100 200 0 2 100 E t 0 100 N-SNE1100 S 200 200 t- 300 Figure A.1: t-SNE low-dimensional embedding of our six-dimensional data. Data points are color coded based on the clusters discovered by k-means clus- tering algorithm. Orange denotes the cluster (polymorph) 1 that contains the experimentally found stable beta phase, blue indicates cluster (polymorph) 2 that contains the experimentally found metastable alpha phase, and green indi- cates the derivative polymorph of cluster 1. 123 t-SNE3 APPENDIX B PYTHON CODE FOR PORE SIZE DISTRIBUTION ALGORITHM DISCUSSED IN SECTION 4.2.5 1 ### This main code calls other dependent scripts ### 2 3 import numpy as np 4 import os 5 import sys 6 import itertools 7 import math 8 import porefind as pf 9 import adjacentPorefind as apf 10 import biggerPorefind as bpf 11 import overallPorefind as opf 12 import surfacePorefind as spf 13 import fitSpherePore as fsp 14 import makeHistoPore as mhp 15 import makeSphereHisto as msh 16 17 probeRadSize=1.5 18 atomName=’CO2’ 19 atomType=[10,11] 20 xyzFilePath=’look.xyz’ 21 22 data_directory=’’ #path to data directory 23 figure_directory=’’#path to where plots are stored 24 25 #Find pores 26 poresPositionFile=pf.main(probeRadSize,xyzFilePath,atomType,atomName) 27 if ’%s_poresPosition_%s_probeSize.data’%(atomName,probeRadSize) not in os.listdir(data_directory) \ 124 28 or os.stat(data_directory+’/%s_poresPosition_%s_probeSize.data’%( atomName,probeRadSize)).st_size==0: 29 30 poresPositionFile=pf.main(probeRadSize,xyzFilePath,atomType, atomName) 31 os.system(’mv %s_poresPosition_%s_probeSize* ’%(atomName, probeRadSize)+data_directory) 32 print(’Pores are found\n’) 33 else: 34 poresPositionFile=’%s_poresPosition_%s_probeSize.data’%(atomName, probeRadSize) 35 print(’Pores were already calculated\n’) 36 37 38 39 #Club pores to find adjacent pores 40 if ’%s_poresAdjacent_%s_probeSize.data’%(atomName,probeRadSize) not in os.listdir(data_directory) or \ 41 os.stat(data_directory+’/%s_poresAdjacent_%s_probeSize.data’%( atomName,probeRadSize)).st_size==0: 42 43 poresAdjacentFile=apf.main(data_directory+’/’+poresPositionFile, probeRadSize,atomName) 44 os.system(’mv %s ’%poresAdjacentFile+data_directory) 45 print(’Adjacent Pores are found\n’) 46 else: 47 poresAdjacentFile=’%s_poresAdjacent_%s_probeSize.data’%(atomName, probeRadSize) 48 print(’Adjacent Pores were already calculated\n’) 49 50 51 125 52 #Club adjacent pores to make bigger pores 53 if ’%s_poresBigger_%s_probeSize.data’%(atomName,probeRadSize) not in os.listdir(data_directory) or \ 54 os.stat(data_directory+’/%s_poresBigger_%s_probeSize.data’%(atomName, probeRadSize)).st_size==0: 55 56 poresBiggerFile=bpf.main(data_directory+’/’+poresAdjacentFile, probeRadSize,atomName) 57 os.system(’mv %s ’%poresBiggerFile+data_directory) 58 print(’Bigger Pores are found\n’) 59 else: 60 poresBiggerFile=’%s_poresBigger_%s_probeSize.data’%(atomName, probeRadSize) 61 print(’Bigger Pores were already calculated\n’) 62 63 64 65 #Club bigger pores to find overall pores 66 filesOverAll=[v for v in os.listdir(data_directory) if ’% s_poresOverall_%s_probeSize’%(atomName,probeRadSize) in v] 67 if len(filesOverAll)==0 or 0 in [os.stat(data_directory+’/’+f). st_size for f in filesOverAll]: 68 69 poresOverallFiles=opf.main(data_directory+’/’+poresBiggerFile, probeRadSize,atomName,data_directory) 70 os.system(’mv %s_poresOverall_%s_probeSize* ’%(atomName, probeRadSize)+data_directory) 71 print(’Overall Pores are found\n’) 72 else: 73 poresOverallFiles=filesOverAll 74 print(’Overall Pores were already calculated\n’) 75 126 76 77 78 #Find outer surface coordinates of overall pores 79 filesOuterSurface=[v for v in os.listdir(data_directory) if ’ _OuterSurface’ in v] 80 if len(filesOuterSurface)==0 or 0 in [os.stat(data_directory+’/’+f). st_size for f in filesOuterSurface]: 81 82 poresSurfaceFiles=spf.main(poresOverallFiles,poresPositionFile, probeRadSize,atomName,data_directory) 83 os.system(’mv *_OuterSurface* ’+data_directory) 84 print(’Surface of Pores are found\n’) 85 else: 86 poresSurfaceFiles=filesOuterSurface 87 print(’Surface of Pores were already calculated\n’) 88 89 90 91 #Fit spheres inside the outer surface 92 filesRadius=[v for v in os.listdir(data_directory) if ’ _RadiusInformation’ in v] 93 if len(filesRadius)==0 or 0 in [os.stat(data_directory+’/’+f).st_size for f in filesRadius]: 94 95 poresRadiusFiles=fsp.main(poresSurfaceFiles,probeRadSize, data_directory,atomName) 96 os.system(’mv *_RadiusInformation* ’+data_directory) 97 print(’Radius of spheres are found\n’) 98 else: 99 poresRadiusFiles=filesRadius 100 print(’Radius are already calculated\n’) 101 127 102 103 104 #Plot Histogram of radii 105 filesHistoPlot=[v for v in os.listdir(figure_directory) if ’histo’ in v] 106 if len(filesHistoPlot)==0 or 0 in [os.stat(data_directory+’/’+f). st_size for f in filesHistoPlot]: 107 poresRadiusFiles=[’CO2_RadiusInformation_1.5_probeSize_4.data’] 108 poresRadiusFiles=[’CO2_RadiusInformation_1.5_probeSize_4_Unique.data’ ] 109 mhp.main(poresRadiusFiles,probeRadSize,data_directory,atomName) 110 os.system(’mv *histo* ’+figure_directory) 111 print(’Histograms are found\n’) 112 else: 113 print(’Histograms were already calculated\n’) 114 115 116 117 #Plot Spheres of Histogram 118 filesSpherePlot=[v for v in os.listdir(figure_directory) if ’_sphere’ in v] 119 if len(filesSpherePlot)==0 or 0 in [os.stat(data_directory+’/’+f). st_size for f in filesSpherePlot]: 120 poresRadiusFiles=[’CO2_RadiusInformation_1.5_probeSize_4.data’] 121 poresRadiusFiles=[’CO2_RadiusInformation_1.5_probeSize_4_Unique.data’ ] 122 msh.main(poresRadiusFiles,probeRadSize,data_directory,atomName) 123 os.system(’mv *_sphere* ’+figure_directory) 124 print(’Spheres are drawn\n’) 125 else: 126 print(’Spheres were already drawn’) 128 1 ### porefind.py: This code finds pores of probeSizeRad for one timestep ##### 2 3 import numpy as np 4 import os 5 import sys 6 import itertools 7 import math 8 9 def distance(probe,coords): 10 for i in range(len(coords)): 11 coord=np.array([coords[i][1],coords[i][2],coords[i][3]]) 12 dist=np.linalg.norm(coord-probe) 13 coords[i][4]=dist 14 coordSortedDist=coords[coords[:,4].argsort()] 15 return(coordSortedDist) 16 17 #probeCoords,atomCoords,atomicRadii,probeRadSize 18 def poreFindCalc(probeCoords,coords,atomicRadii,probeRadSize): 19 poresPosition=[] 20 maxCutoff=max(np.array([atomicRadii[j+1][1] for j in range(9)])+ probeRadSize+1) 21 print(’MaxCutOff’,maxCutoff) 22 timesLoopRan=0 23 for i in range(len(probeCoords)): 24 25 probe=probeCoords[i] 26 coordSortedDist=distance(probe,coords) 27 28 for j in range(len(coordSortedDist)): 29 timesLoopRan+=1 30 atomicRadius=atomicRadii[int(coordSortedDist[j][0])][1] 129 31 cutoff=probeRadSize+atomicRadius+1 32 33 if coordSortedDist[j][4]<=cutoff: 34 break 35 elif coordSortedDist[j][4]>1.5*maxCutoff: 36 poresPosition.append([probeCoords[i][0],probeCoords[i][1], probeCoords[i][2]]) 37 break 38 39 print(’Number of times the loop actually ran’,timesLoopRan) 40 return poresPosition 41 42 def main(probeRadSize,xyzFilePath,atomType,atomName): 43 filePath=xyzFilePath 44 timestep=0 45 atomCoords=[] 46 xyzfile=open(filePath) 47 for row in xyzfile: 48 if len(row.split())==3 and row.split()[2]==’5000000’:#timestep for which the data was colledted 49 timestep=1 50 continue 51 if len(row.split())==4 and int(row.split()[0]) not in atomType and timestep==1: 52 atomCoords.append([row.split()[0],float(row.split()[1]),float( row.split()[2]),float(row.split()[3]),0.0]) 53 if timestep==1 and len(row.split())!=4: 54 timestep+=1 55 break 56 xyzfile.close() 57 atomCoords=np.array(atomCoords) 58 atomCoords=atomCoords.astype(float) 130 59 print(np.shape(atomCoords)) 60 # Define unique elements excluding solutes 61 atomtype_list=[’C’, ’H’, ’N’, ’Cu’, ’C’, ’C’, ’N’, ’B’, ’F’] 62 unique_elements,counts_elements = np.unique(np.array([x[0] for x in atomCoords]),return_counts=True) 63 64 atomicRadii={1:[’C’,0.77],2:[’H’,0.31],3:[’N’,0.70],4:[’Cu’ ,1.173],5:[’C’,0.77],6:[’C’,0.77],7:[’N’,0.70],\ 65 8:[’B’,0.80],9:[’F’,0.64]} 66 67 # Define Pore Mesh 68 xlow,xhigh=(float(min([x[1] for x in atomCoords])),float(max([x[1] for x in atomCoords]))) 69 ylow,yhigh=(float(min([x[2] for x in atomCoords])),float(max([x[2] for x in atomCoords]))) 70 zlow,zhigh=(float(min([x[3] for x in atomCoords])),float(max([x[3] for x in atomCoords]))) 71 72 print(’\nMin and max dimensions’,’X’,xlow,xhigh,’Y’,ylow,yhigh,’Z’, zlow,zhigh) 73 74 #Probe Coords Meghgrid 1 75 probeCoords1=np.array([[i*probeRadSize,j*probeRadSize,k* probeRadSize] for i,j,k in itertools.product(\ 76 range(int(math.ceil(xlow/probeRadSize)),int(math.ceil(xhigh/ probeRadSize)),probeRadSize),range(int(math.ceil(ylow/probeRadSize )),int(math.ceil(yhigh/probeRadSize)),\ 77 probeRadSize),range(int(math.ceil(zlow/probeRadSize)),int(math. ceil(zhigh/probeRadSize)),probeRadSize))]) 78 79 80 #Probe Coords Meshgrid 2 131 81 probeCoords2=np.array([[i*probeRadSize,j*probeRadSize,k* probeRadSize] for i,j,k in itertools.product(\ 82 range(int(math.ceil((xlow+1)/probeRadSize)),int(math.ceil((xhigh +1)/probeRadSize)),probeRadSize),range(int(math.ceil((ylow+1)/ probeRadSize)),int(math.ceil((yhigh+1)/probeRadSize)),\ 83 probeRadSize), range(int(math.ceil((zlow+1)/probeRadSize)), int (math.ceil((zhigh+1)/probeRadSize)),probeRadSize))]) 84 85 print(np.shape(probeCoords1), np.shape(probeCoords2)) 86 print(probeCoords1) 87 print(probeCoords2) 88 probeCoords = np.concatenate((probeCoords1, probeCoords2)) 89 print(’Shape of ProbeCoords’,np.shape(probeCoords)) 90 print(’Max times this loop will run’,len(probeCoords)*len( atomCoords),’\n’) 91 poresPosition=poreFindCalc(probeCoords,atomCoords,atomicRadii, probeRadSize) 92 93 temp=sys.stdout 94 filename=’%s_poresPosition_%s_probeSize.data’%(atomName, probeRadSize) 95 os.system(’rm %s’%(filename)) 96 gfile=open(filename,’a’) 97 print(’Number of individual pores’,len(poresPosition)) 98 for i in range(len(poresPosition)): 99 gfile.write(’%s %s %s %s\n’%(i+1,poresPosition[i][0], poresPosition[i][1],poresPosition[i][2])) 100 gfile.close() 101 return(filename,fileNameFinal) 132 1 #### adjacentPorefind.py: This code clubs the adjacent pores found by porefind.py together #### 2 3 import numpy as np 4 import os 5 import sys 6 import itertools 7 import math 8 9 def read(poresPositionFile,probeRadSize): 10 smallPores=[] 11 porefile=open(poresPositionFile) 12 for p in porefile: 13 if len(p.split())==4: 14 smallPores.append([int(p.split()[0]),float(p.split()[1]),float( p.split()[2]),float(p.split()[3]),probeRadSize]) 15 print(’Shape of individual pores’,np.shape(smallPores)) 16 smallPores=np.array(smallPores) 17 return smallPores 18 19 def distance(testPore,otherPores): 20 distMatrix=[] 21 pore=np.array([testPore[1],testPore[2],testPore[3]]) 22 for i in range(len(otherPores)): 23 coord=np.array([otherPores[i][1],otherPores[i][2],otherPores[i ][3]]) 24 dist=np.linalg.norm(coord-pore) 25 distMatrix.append([otherPores[i][0],otherPores[i][1],otherPores[i ][2],otherPores[i][3],dist]) 26 distMatrix=np.array(distMatrix) 27 distMatrix=distMatrix[distMatrix[:,4].argsort()] 28 133 29 return(distMatrix) 30 31 32 def clubPores(smallPores,probeRadSize): 33 adjacentPores=[] 34 for testPore in smallPores: 35 distSortedMatrix=distance(testPore,smallPores) 36 j=0 37 neighbors=[] 38 while distSortedMatrix[j][4]<=probeRadSize: 39 neigh=int(distSortedMatrix[j][0]) 40 neighbors.append([neigh,smallPores[neigh-1][1],smallPores[neigh -1][2],smallPores[neigh-1][3]]) 41 j+=1 42 adjacentPores.append([int(testPore[0]),neighbors]) 43 print(’Shape of adjcent pore’,np.shape(adjacentPores)) 44 return adjacentPores 45 46 47 def writeToFile(adjacentPores,probeRadSize,atomName): 48 temp=sys.stdout 49 filename=’%s_poresAdjacent_%s_probeSize.data’%(atomName, probeRadSize) 50 os.system(’rm %s’%(filename)) 51 gfile=open(filename,’a’) 52 for i in range(len(adjacentPores)): 53 gfile.write(’%s %s\n’%(adjacentPores[i][0],adjacentPores[i ][1])) 54 gfile.close() 55 return(filename) 56 57 ########################## 134 58 def main(poresPositionFile,probeRadSize,atomName): 59 smallPores=read(poresPositionFile,probeRadSize) 60 adjacentPores=clubPores(smallPores,probeRadSize) 61 filename=writeToFile(adjacentPores,probeRadSize,atomName) 62 return(filename) 135 1 #### biggerPorefind.py: This code combines the adjacent pores found by adjacentPorefind.py into bigger bubbles #### 2 3 import numpy as np 4 import math 5 import os 6 import sys 7 import matplotlib.pyplot as plt 8 from mpl_toolkits.mplot3d import Axes3D 9 10 def read(poresAdjacentFile): 11 fileAdjacentPores=open(poresAdjacentFile) 12 bubble=[] 13 neighboursPore=[] 14 singlePore=[] 15 16 for line in fileAdjacentPores: 17 try: 18 fragmentPore=line[line.find("[[")+len("[["):line.find("],")] 19 singlePore.append([float(fragmentPore.split()[k][:-1]) for k in range(len(fragmentPore.split()))]) 20 21 except ValueError: 22 fragmentPore=line[line.find("[[")+len("[["):line.find("]]")+1] 23 print(fragmentPore.split(),[float(fragmentPore.split()[k][:-1]) for k in range(len(fragmentPore.split()))]) 24 singlePore.append([float(fragmentPore.split()[k][:-1]) for k in range(len(fragmentPore.split()))]) 25 neighboursPore.append([]) 26 bubble.append([singlePore[-1],[]]) 27 continue 28 136 29 line=line.replace(fragmentPore.split()[0][:-1]+’ [[’+ fragmentPore+’],’,"") 30 neighbours=[] 31 while len(line.split())!=2: 32 fragNeigh=line[line.find(’[’)+len(’[’):line.find(’]’)] 33 neighbours.append([float(fragNeigh.split()[k][:-1]) for k in range(len(fragNeigh.split()))]) 34 if len(line.split())==4: 35 break 36 line=line.replace(’[’+fragNeigh+’], ’,"") 37 neighboursPore.append(neighbours) 38 bubble.append([singlePore[-1],neighbours]) 39 return(bubble,singlePore,neighboursPore) 40 41 42 ### Calculation of a bigger pore 2 43 def clubPores(probeRadSize,bubble,singlePore,neighboursPore,atomName) : 44 temp=sys.stdout 45 filename=’%s_poresBigger_%s_probeSize.data’%(atomName,probeRadSize) 46 os.system(’rm %s’%(filename)) 47 gfile=open(filename,’a’) 48 checked=[] 49 clubbedPores2=[] 50 51 for i in range(len(bubble)): 52 biggerPores=[] 53 if singlePore[i][0] not in checked: 54 checked.append(singlePore[i][0]) 55 biggerPores.append(singlePore[i][0]) 56 gfile.write(’%s ’%(singlePore[i][0])) 57 pore1=neighboursPore[i] 137 58 59 for j in range(len(pore1)): 60 checked.append(pore1[j][0]) 61 biggerPores.append(pore1[j][0]) 62 gfile.write(’%s ’%(pore1[j][0])) 63 64 pore2=neighboursPore[int(pore1[j][0])-1] 65 for k in range(len(pore2)): 66 if pore2[k][0] not in biggerPores: 67 biggerPores.append(pore2[k][0]) 68 gfile.write(’%s ’%(pore2[k][0])) 69 gfile.write(’\n’) 70 clubbedPores2.append(biggerPores) 71 gfile.close() 72 print(’Shape of bigger pores’,np.shape(clubbedPores2)) 73 return(filename) 74 75 def main(poresAdjacentfile,probeRadSize,atomName): 76 bubble,singlePore,neighboursPore=read(poresAdjacentfile) 77 filename=clubPores(probeRadSize,bubble,singlePore,neighboursPore, atomName) 78 return(filename) 138 1 #### overallPorefind.py: This code combines the contiguous bubbles found by biggerPorefind.py #### 2 3 import numpy as np 4 import math 5 import os 6 import sys 7 8 def calc(poresBiggerFile,probeRadSize,atomName,data_directory): 9 clubbedPores2=[] 10 fileClubbedPore2=open(poresBiggerFile) 11 for line in fileClubbedPore2: 12 if line.strip()!=’’ and len(line.split())!=1: 13 clubbedPores2.append([float(w) for w in line.split()]) 14 print(’Shape of Bigger Pores’,np.shape(clubbedPores2),’\n’) 15 #clubbedPores contains the list of a bigger bubble made of adjacent and their neighbour pores 16 ### clubbing of final pores 17 overlap=[5,10,25,30,40,50] 18 filesToReturn=[] 19 run=True 20 timesLoopRun=0 21 while run is True and timesLoopRun=overlap[ timesLoopRun-1]: 50 checked.append(value[0]) 51 addPore.append([n for n in value if n not in checkPore]) 52 count+=1 53 54 addPore=[j for sub in addPore for j in sub] 55 addPore=np.unique(addPore) 56 addPore=[v for v in addPore if v not in checkPore] 57 flat_list=[item for sublist in [checkPore,addPore] for item in sublist] 58 for elem in flat_list: 140 59 gfile1.write(’%s ’%elem) 60 gfile1.write(’\n\n’) 61 clubbedPores2.append(flat_list) 62 print(’Shape of clubbedPores2’,np.shape(clubbedPores2),’\n’) 63 gfile1.close() 64 if sum(countPore)==0: 65 run=False 66 print(’run’,run) 67 68 temp=sys.stdout 69 filenameFinal=’%s_poresClubAll_%s_probeSize.data’%(atomName, probeRadSize) 70 os.system(’rm %s’%(filenameFinal)) 71 gfile=open(filenameFinal,’a’) 72 73 for k in range(len(clubbedPores2)): 74 aPore=clubbedPores2[k] 75 for j in range(len(aPore)): 76 gfile.write(’%s ’%aPore[j]) 77 gfile.write(’\n’) 78 gfile.close() 79 os.system(’mv %s_poresClubAll_%s_probeSize.data ’%(atomName, probeRadSize)+data_directory) 80 return(filesToReturn) 81 82 def main(poresBiggerFile,probeRadSize,atomName,data_directory): 83 filenames=calc(poresBiggerFile,probeRadSize,atomName,data_directory ) 84 return(filenames) 141 1 #### surfacePorefind.py: This code finds outer surface coordinates of overall pores found by overallPorefind.py #### 2 import numpy as np 3 import os 4 import sys 5 import itertools 6 import math 7 8 def sphere(filewithPores,filewithPoresCoords,probeRadSize,atomName, data_directory): 9 r=probeRadSize 10 pores=[] 11 fileOverAll=open(data_directory+’/’+filewithPores) 12 for line in fileOverAll: 13 if line.strip()!=’’: 14 pores.append([float(w) for w in line.split()]) 15 print(’Shape of OverAll Pores’,np.shape(pores)) 16 17 coords=[] 18 fileCoords=open(data_directory+’/’+filewithPoresCoords) 19 for line in fileCoords: 20 coords.append([int(line.split()[0]),float(line.split()[1]),float( line.split()[2]),float(line.split()[3])]) 21 print(’Shape of All Pores Coords’,np.shape(coords)) 22 23 temp=sys.stdout 24 filename=filewithPores[:-5]+’_OuterSurface.data’ 25 os.system(’rm %s’%(filename)) 26 gfile=open(filename,’a’) 27 poreNum=-1 28 coordsOuter=[[] for k in range(len(pores))] 29 #This array will store coordinates of outer sphere of overall pores 142 30 31 for pore in pores: 32 poreNum+=1 33 pore=np.unique(pore) 34 cirCen=[] 35 for num in pore: 36 num=int(num) 37 cirCen.append([coords[num-1][0],coords[num-1][1],coords[num -1][2],coords[num-1][3]]) 38 39 coordsSphere=[[] for m in range(len(cirCen))] 40 index=0 41 for c in cirCen: 42 u,v=np.mgrid[0:2*np.pi:10j,0:np.pi:5j] 43 x=np.cos(u)*np.sin(v)*r 44 y=np.sin(u)*np.sin(v)*r 45 z=np.cos(v)*r 46 x=x+c[1] 47 y=y+c[2] 48 z=z+c[3] 49 for a,b in zip(u,v): 50 for a1,b1 in zip(a,b): 51 x1=np.cos(a1)*np.sin(b1)*r+c[1] 52 y1=np.sin(a1)*np.sin(b1)*r+c[2] 53 z1=np.cos(b1)*r+c[3] 54 coordsSphere[index].append([x1,y1,z1]) 55 index+=1 56 57 for j in range(len(coordsSphere)): 58 checkPore=coordsSphere[j] 59 for k in range(len(checkPore)): 60 check=0 143 61 for value in [v for v in pore if v!=pore[j]]: 62 if np.linalg.norm(np.array(coordsSphere[j][k])-cirCen[[x[0] for x in cirCen].index(int(value))][1:])3: 12 radius.append(2*float(line.split()[5])) 13 14 histoName=’V_Unique_’+atomName+’_histo_’+str(probeRadSize)+’ _probeRadSize_’+files[36:37]+’_Time_woDensity.pdf’ 15 plt.figure(figsize=(8,6)) 16 plt.hist(radius,bins=np.arange(min(radius),max(radius)+1),facecolor =’blue’,alpha=0.75,edgecolor=’k’,density=False) 17 plt.xlabel(r’Diameter ($\AA$)’,{’color’:’k’,’fontsize’:20}) 18 plt.ylabel(’Count’,{’color’:’k’,’fontsize’:20}) 19 plt.xticks(color=’k’,fontsize=20) 20 plt.yticks(color=’k’,fontsize=20) 21 plt.gca().yaxis.grid(True) 22 23 histoName=’V_Unique_’+atomName+’_histo_’+str(probeRadSize)+’ _probeRadSize_’+files[36:37]+’_Time_wDensity.pdf’ 24 plt.figure(figsize=(8,6)) 25 plt.hist(radius,bins=np.arange(min(radius),max(radius)+1),facecolor =’blue’,alpha=0.75,edgecolor=’k’,density=True) 26 plt.xlabel(r’Diameter ($\AA$)’,{’color’:’k’,’fontsize’:20}) 27 plt.ylabel(’Probability’,{’color’:’k’,’fontsize’:20}) 147 28 plt.xticks(color=’k’,fontsize=20) 29 plt.yticks(color=’k’,fontsize=20) 30 plt.gca().yaxis.grid(True) 31 plt.show() 32 33 def main(poresRadiusFiles,probeRadSize,data_directory,atomName): 34 radiusOverall=[[] for k in range(len(poresRadiusFiles))] 35 for files in poresRadiusFiles: 36 histo(files,probeRadSize,data_directory,atomName) 148 1 #### makeSphereHisto.py: This code creates spheres using radius from the histogram at their actual location in the simulation box #### 2 3 import os 4 import numpy as np 5 import matplotlib.pyplot as plt 6 from mpl_toolkits.mplot3d import Axes3D 7 8 def sphere(files,probeRadSize,data_directory,atomName): 9 fileSphere=open(data_directory+’/’+files) 10 cir=[] 11 r=[] 12 for line in fileSphere: 13 if len(line.split())==6: 14 cir.append([float(line.split()[2][1:-1]),float(line.split() [3][1:-1]),float(line.split()[4][1:-1])]) 15 r.append(float(line.split()[5])) 16 17 sphereName=’V_Unique_’+atomName+’_sphere_’+str(probeRadSize)+’ _probeRadSize_’+files[36:37]+’_thTime.pdf’ 18 coords=[[] for k in range(len(cir))] 19 20 fig=plt.figure(figsize=(16,10)) 21 ax=fig.add_subplot(111,projection=’3d’) 22 ax.set_xlabel(r’x ($\AA$)’,fontsize=16) 23 ax.set_ylabel(r’y ($\AA$)’,fontsize=16) 24 zlabel=ax.set_zlabel(r’z ($\AA$)’,fontsize=16) 25 26 rmax=max(r) 27 for i in range(len(cir)): 28 constant=max(r[i]/rmax,0.5) 29 u,v=np.mgrid[0:2*np.pi:constant*20j,0:np.pi:constant*10j] 149 30 x=np.cos(u)*np.sin(v)*r[i] 31 y=np.sin(u)*np.sin(v)*r[i] 32 z=np.cos(v)*r[i] 33 x=x+cir[i][0] 34 y=y+cir[i][1] 35 z=z+cir[i][2] 36 ax.plot_wireframe(x,y,z,color=’r’) 37 38 ax.view_init(23,70) 39 plt.show() 40 def main(poresRadiusFiles,probeRadSize,data_directory,atomName): 41 for files in poresRadiusFiles: 42 sphere(files,probeRadSize,data_directory,atomName) 150 BIBLIOGRAPHY [1] Ying Diao, Kristina M. Lenn, Wen Ya Lee, Martin A. Blood-Forsythe, Jie Xu, Yisha Mao, Yeongin Kim, Julia A. Reinspach, Steve Park, Alán Aspuru-Guzik, Gi Xue, Paulette Clancy, Zhenan Bao, and Stefan C B Mannsfeld. Understanding polymorphism in organic semiconductor thin films through nanoconfinement. Journal of the American Chemical Society, 136(49):17046–17057, 2014. [2] Hyunjoong Chung and Ying Diao. Polymorphism as an emerging de- sign strategy for high performance organic electronics. J. Mater. Chem. C, 4(18):3915–3933, 2016. [3] Hyunjoong Chung, Shanwen Chen, Nikita Sengar, Daniel W Davies, Guil- laume Garbay, Yves H Geerts, Paulette Clancy, and Ying Diao. Single Atom Substitution Alters the Polymorphic Transition Mechanism in Or- ganic Electronic Crystals. Chemistry of Materials, 31(21):9115–9126, 11 2019. [4] Donald R Jones, Matthias Schonlau, and William J Welch. Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Opti- mization, 13(4):455–492, 1998. [5] Yaset Miguel Acevedo, Nikita Sengar, Meng Min, and Paulette Clancy. A Transferable Molecular Model of Woven Covalent Organic Framework Materials. ACS Applied Materials & Interfaces, 9 2020. [6] J M Haile. Molecular dynamics simulation : elementary methods, 1992. [7] John E Anthony. Organic electronics: Addressing challenges. Nature Ma- terials, 13(8):773–775, 8 2014. [8] V Subramanian, J M J Frechet, P C Chang, D C Huang, J B Lee, S E Molesa, A R Murphy, D R Redinger, and S K Volkman. Progress Toward Devel- opment of All-Printed RFID Tags: Materials, Processes, and Devices. Pro- ceedings of the IEEE, 93(7):1330–1338, 2005. [9] Embracing the organics world. Nature Materials, 12(7):591, 2013. [10] Gerwin H Gelinck, H Edzer A Huitema, Erik van Veenendaal, Eugenio Cantatore, Laurens Schrijnemakers, Jan B P H van der Putten, Tom C T Geuns, Monique Beenhakkers, Jacobus B Giesbers, Bart-Hendrik Huis- man, Eduard J Meijer, Estrella Mena Benito, Fred J Touwslager, Albert W 151 Marsman, Bas J E van Rens, and Dago M de Leeuw. Flexible active-matrix displays and shift registers based on solution-processed organic transis- tors. Nature Materials, 3(2):106–110, 2004. [11] John A Rogers and Zhenan Bao. Printed plastic electronics and paperlike displays. Journal of Polymer Science Part A: Polymer Chemistry, 40(20):3327– 3334, 2002. [12] George Malliaras and Richard Friend. An Organic Electronics Primer. Physics Today, 58(5):53–58, 2005. [13] Scott Himmelberger and Alberto Salleo. Engineering semiconduct- ing polymers for efficient charge transport. MRS Communications, 5(3):383–395, 2015. [14] M Pope and C Swenberg. Electronic Processes in Organic Crystals and Polymers. 1999. [15] Chengliang Wang, Huanli Dong, Wenping Hu, Yunqi Liu, and Daoben Zhu. Semiconducting π-conjugated systems in field-effect transistors: a material odyssey of organic electronics. Chemical reviews, 112(4):2208– 2267, 4 2012. [16] Zhenan Bao, Yi Feng, Ananth Dodabalapur, V R Raju, and Andrew J Lovinger. High-Performance Plastic Transistors Fabricated by Printing Techniques. Chemistry of Materials, 9(6):1299–1301, 6 1997. [17] Hagen Klauk. Organic Electronics: Materials, Manufacturing and Applica- tions. 2006. [18] Henning Sirringhaus. 25th Anniversary Article: Organic Field-Effect Transistors: The Path Beyond Amorphous Silicon. Advanced Materials, 26(9):1319–1335, 2014. [19] Terence L Threlfall. Analysis of organic polymorphs. A review. Analyst, 120(10):2435–2460, 1995. [20] Walter C McCrone. Polymorphism. Physics and chemistry of the organic solid state, 2:725–767, 1965. [21] Masahiko Ando, Tom B Kehoe, Makoto Yoneya, Hiroyuki Ishii, Masahiro Kawasaki, Claudia M Duffy, Takashi Minakata, Richard T Phillips, and 152 Henning Sirringhaus. Evidence for charge-trapping inducing polymor- phic structural-phase transition in pentacene. Advanced materials (Deerfield Beach, Fla.), 27(1):122—129, 1 2015. [22] Stefano Bergantin, Massimo Moret, Gernot Buth, and Francesca P A Fab- biani. Pressure-Induced Conformational Change in Organic Semiconduc- tors: Triggering a Reversible Phase Transition in Rubrene. The Journal of Physical Chemistry C, 118(25):13476–13483, 6 2014. [23] Sheng Bi, Zhengran He, Jihua Chen, and Dawen Li. Solution-grown small-molecule organic semiconductor with enhanced crystal alignment and areal coverage for organic thin film transistors. AIP Advances, 5(7):77170, 2015. [24] Linus Pithan, Caterina Cocchi, Hannes Zschiesche, Christopher Weber, Anton Zykov, Sebastian Bommel, Steven J Leake, Peter Schäfer, Claudia Draxl, and Stefan Kowarik. Light Controls Polymorphism in Thin Films of Sexithiophene. Crystal Growth & Design, 15(3):1319–1324, 3 2015. [25] Yongbo Yuan, Gaurav Giri, Alexander L Ayzner, Arjan P Zoombelt, Stefan C B Mannsfeld, Jihua Chen, Dennis Nordlund, Michael F Toney, Jinsong Huang, and Zhenan Bao. Ultra-high mobility transparent organic thin film transistors grown by an off-centre spin-coating method. Nature com- munications, 5:3005, 2014. [26] Anna M Hiszpanski, Robin M Baur, Bumjung Kim, Noah J Tremblay, Colin Nuckolls, Arthur R Woll, and Yueh-Lin Loo. Tuning Polymorphism and Orientation in Organic Semiconductor Thin Films via Post-deposition Processing. Journal of the American Chemical Society, 136(44):15749–15756, 11 2014. [27] Oana D Jurchescu, Devin A Mourey, Sankar Subramanian, Sean R Parkin, Brandon M Vogel, John E Anthony, Thomas N Jackson, and David J Gund- lach. Effects of polymorphism on charge transport in organic semiconduc- tors. Phys. Rev. B, 80(8):85201, 8 2009. [28] Joel Bernstein. Crystal growth, polymorphism and structure-property relationships in organic crystals. Journal of Physics D: Applied Physics, 26(8B):B66, 1993. [29] Bernhard Wedl, Roland Resel, Günther Leising, Birgit Kunert, Ingo Salz- mann, Martin Oehzelt, Norbert Koch, Antje Vollmer, Steffen Duhm, 153 Oliver Werzer, Gabin Gbabode, Michele Sferrazza, and Yves Geerts. Crys- tallisation kinetics in thin films of dihexyl-terthiophene: the appearance of polymorphic phases. RSC Adv., 2(10):4404–4414, 2012. [30] Basab Chattopadhyay, Christian Ruzié, Roland Resel, and Yves Henri Geerts. Substrate-induced phases: transition from a liquid- crystalline to a plastic crystalline phase via nucleation initiated by the substrate. Liquid Crystals, 41(3):302–309, 3 2014. [31] Benedikt Schrode, Andrew O F Jones, Roland Resel, Natalia Bedoya, Robert Schennach, Yves Henri Geerts, Christian Ruzié, Michele Sfer- razza, Aldo Brillante, Tommaso Salzillo, and Elisabetta Venuti. Substrate- Induced Phase of a Benzothiophene Derivative Detected by Mid-Infrared and Lattice Phonon Raman Spectroscopy. Chemphyschem : a European jour- nal of chemical physics and physical chemistry, 19(8):993–1000, 4 2018. [32] Hiromi Minemawari, Toshikazu Yamada, Hiroyuki Matsui, Jun Ya Tsut- sumi, Simon Haas, Ryosuke Chiba, Reiji Kumai, and Tatsuo Hasegawa. Inkjet printing of single-crystal films. Nature, 475(7356):364–367, 2011. [33] Hideaki Ebata, Takafumi Izawa, Eigo Miyazaki, Kazuo Takimiya, Masaaki Ikeda, Hirokazu Kuwabara, and Tatsuto Yui. Highly Solu- ble [1]Benzothieno[3,2-b]benzothiophene (BTBT) Derivatives for High- Performance, Solution-Processed Organic Field-Effect Transistors. Journal of the American Chemical Society, 129(51):15732–15733, 12 2007. [34] Yanfeng Yue, Yunchao Li, Zhonghe Bi, Gabriel M Veith, Craig A Bridges, Bingkun Guo, Jihua Chen, David R Mullins, Sumedh P Surwade, Shan- non M Mahurin, Hongjun Liu, M Parans Paranthaman, and Sheng Dai. A POM–organic framework anode for Li-ion battery. J. Mater. Chem. A, 3(45):22989–22995, 2015. [35] Yan Zhang, Shihui Yang, Xiaoya Chang, Huinan Guo, Yunwei Li, Mengy- ing Wang, Weinqin Li, Lifang Jiao, and Yijing Wang. MOF based on a longer linear ligand: electrochemical performance{,} reaction kinetics{,} and use as a novel anode material for sodium-ion batteries. Chem. Com- mun., 54(83):11793–11796, 2018. [36] Hiroyasu Furukawa, Kyle E Cordova, Michael O’Keeffe, and Omar M Yaghi. The Chemistry and Applications of Metal-Organic Frameworks. Science, 341(6149), 2013. [37] Weigang Lu, Zhangwen Wei, Zhi-Yuan Gu, Tian-Fu Liu, Jinhee Park, Jihye 154 Park, Jian Tian, Muwei Zhang, Qiang Zhang, Thomas Gentle III, Math- ieu Bosch, and Hong-Cai Zhou. Tuning the structure and function of metal–organic frameworks via linker design. Chem. Soc. Rev., 43(16):5561– 5593, 2014. [38] Beatriz Seoane, Sonia Castellanos, Alla Dikhtiarenko, Freek Kapteijn, and Jorge Gascon. Multi-scale crystal engineering of metal organic frame- works. Coordination Chemistry Reviews, 307:147–187, 2016. [39] Volodymyr Bon, Negar Kavoosi, Irena Senkovska, Philipp Müller, Jana Schaber, Dirk Wallacher, Daniel M Többens, Uwe Mueller, and Stefan Kaskel. Tuning the flexibility in MOFs by SBU functionalization. Dalton Trans., 45(10):4407–4415, 2016. [40] Arijit Halder and Debajyoti Ghoshal. Structure and properties of dy- namic metal–organic frameworks: a brief accounts of crystalline-to- crystalline and crystalline-to-amorphous transformations. CrystEng- Comm, 20(10):1322–1345, 2018. [41] Tse Nga Ng, David E. Schwartz, Leah L. Lavery, Gregory L. Whiting, Beverly Russo, Brent Krusor, Janos Veres, Per Broms, Lars Herlogsson, Naveed Alam, Olle Hagel, Jakob Nilsson, and Christer Karlsson. Scal- able printed electronics: An organic decoder addressing ferroelectric non- volatile memory. Scientific Reports, 2:1–7, 2012. [42] Tsuyoshi Sekitani, Koichiro Zaitsu, Yoshiaki Noguchi, Kiyoshiro Ishibe, Makoto Takamiya, Takayasu Sakurai, and Takao Someya. Printed non- volatile memory for a sheet-type communication system. IEEE Transac- tions on Electron Devices, 56(5):1027–1035, 2009. [43] Hsin-Rong Tseng and Chun-Liang Lin. Organic photovoltaic cell. 183(1986):1–19, 2011. [44] Guillaume Schweicher, Yoann Olivier, Vincent Lemaur, and Yves Henri. What Currently Limits Charge Carrier Mobility in Crystals of Molecular Semiconductors ? pages 595–620, 2014. [45] Jean Pierre Brog, Claire Lise Chanez, Aurelien Crochet, and Katharina M. Fromm. Polymorphism, what it is and how to identify it: A systematic review. RSC Advances, 3(38):16905–16931, 2013. [46] Gregg T. Beckham, Baron Peters, Cı́ndy Starbuck, Narayan Varı́ankaval, and Bernhardt L. Trout. Surface-mediated nucleation in the solid-state 155 polymorph transformation of terephthalic acid. Journal of the American Chemical Society, 129(15):4714–4723, 2007. [47] Anthony M Campeta, Brian P Chekal, Yuriy A Abramov, Paul A Meenan, Mark J Henson, Bing Shi, Robert A Singer, and Keith R Horspool. De- velopment of a Targeted Polymorph Screening Approach for a Complex Polymorphic and Highly Solvating API. Journal of Pharmaceutical Sciences, 99(9):3874–3886, 2010. [48] Ranko M Vrcelj, John N Sherwood, Alan R Kennedy, Hugh G Gallagher, and Thomas Gelbrich. Polymorphism in 2-4-6 Trinitrotoluene 2003. pages 2–7, 2003. [49] Guillaume Schweicher, Vincent Lemaur, Claude Niebel, Christian Ruzié, Ying Diao, Osamu Goto, Wen Ya Lee, Yeongin Kim, Jean Baptiste Ar- lin, Jolanta Karpinska, Alan R. Kennedy, Sean R. Parkin, Yoann Olivier, Stefan C.B. Mannsfeld, Jérôme Cornil, Yves H. Geerts, and Zhenan Bao. Bulky end-capped [1]Benzothieno[3,2-b]benzothiophenes: Reach- ing high-mobility organic semiconductors by fine tuning of the crystalline solid-state Order. Advanced Materials, 27(19):3066–3072, 2015. [50] Hyunjoong Chung, Dmytro Dudenko, Fengjiao Zhang, Gabriele D’Avino, Christian Ruzié, Audrey Richard, Guillaume Schweicher, Jérôme Cornil, David Beljonne, Yves Geerts, and Ying Diao. Rotator side chains trigger cooperative transition for shape and function memory effect in organic semiconductors. Nature Communications, 9(1):1–12, 2018. [51] Ze Fan Yao, Jie Yu Wang, and Jian Pei. Control of π-π Stacking via Crys- tal Engineering in Organic Conjugated Small Molecule Crystals. Crystal Growth and Design, 18(1):7–15, 2018. [52] Chris J. Pickard and R. J. Needs. Ab initio random structure searching. Journal of Physics Condensed Matter, 23(5), 2011. [53] Chris J. Pickard and R. J. Needs. High-pressure phases of silane. Physical Review Letters, 97(4):1–4, 2006. [54] Matthew Habgood, Isaac J. Sugden, Andrei V. Kazantsev, Claire S. Ad- jiman, and Constantinos C. Pantelides. Efficient handling of molecular flexibility in ab initio generation of crystal structures. Journal of Chemical Theory and Computation, 11(4):1957–1969, 2015. 156 [55] Isaac Sugden, Claire S. Adjiman, and Constantinos C. Pantelides. Accu- rate and efficient representation of intramolecular energy in ab initio gen- eration of crystal structures. I. Adaptive local approximate models. Acta Crystallographica Section B: Structural Science, Crystal Engineering and Mate- rials, 72(6):864–874, 2016. [56] Kirkpatrick S, Gelatt C, and Vecchi M. Optimization by Simulated An- nealing. Science, 220(4598):671–680, 1983. [57] John Kendrick, Frank J J Leusen, Marcus A. Neumann, and Jacco Van De Streek. Progress in crystal structure prediction. Chemistry - A European Journal, 17(38):10736–10744, 2011. [58] David J. Wales and Jonathan P.K. Doye. Global optimization by basin- hopping and the lowest energy structures of Lennard-Jones clusters con- taining up to 110 atoms. Journal of Physical Chemistry A, 101(28):5111–5116, 1997. [59] Maximilian Amsler and Stefan Goedecker. Crystal structure prediction using the minima hopping method. Journal of Chemical Physics, 133(22), 2010. [60] Stefan Goedecker. Minima hopping: An efficient search method for the global minimum of the potential energy surface of complex molecular systems. Journal of Chemical Physics, 120(21):9911–9917, 2004. [61] Artem R. Oganov and Colin W. Glass. Crystal structure prediction using ab initio evolutionary techniques: Principles and applications. Journal of Chemical Physics, 124(24), 2006. [62] Artem R. Oganov, Andriy O. Lyakhov, and Mario Valle. How evolution- ary crystal structure prediction works-and why. Accounts of Chemical Re- search, 44(3):227–237, 2011. [63] T Lookman, P V Balachandran, D Xue, G Pilania, T Shearman, J Theiler, J E Gubernatis, J Hogden, K Barros, E Bennaim, and F J Alexander. Infor- mation Science for Materials Discovery and Design, volume 225. 2016. [64] Prasanna V. Balachandran, Dezhen Xue, James Theiler, John Hogden, and Turab Lookman. Adaptive Strategies for Materials Design using Uncer- tainties. Scientific Reports, 6(1):19660, 2016. 157 [65] Shenghong Ju, Takuma Shiga, Lei Feng, Zhufeng Hou, Koji Tsuda, and Junichiro Shiomi. Designing nanostructures for phonon transport via Bayesian optimization. Physical Review X, 7(2):1–10, 2017. [66] Rebecca K Lindsey, Laurence E Fried, and Nir Goldman. ChIMES: A Force Matched Potential with Explicit Three-Body Interactions for Molten Car- bon. Journal of Chemical Theory and Computation, 13(12):6222–6229, 12 2017. [67] Jonathan Vandermause, Steven B Torrisi, Simon Batzner, Yu Xie, Lixin Sun, Alexie M Kolpak, and Boris Kozinsky. On-the-fly active learning of interpretable Bayesian force fields for atomistic rare events. npj Compu- tational Materials, 6(1):20, 2020. [68] Anton O. Oliynyk, Lawrence A. Adutwum, Brent W. Rudyk, Harshil Pisavadia, Sogol Lotfi, Viktor Hlukhyy, James J Harynuk, Arthur Mar, and Jakoah Brgoch. Disentangling Structural Confusion through Machine Learning: Structure Prediction and Polymorphism of Equiatomic Ternary Phases ABC. Journal of the American Chemical Society, page jacs.7b08460, 2017. [69] T Mueller, A Kusne, and R Ramprasad. Science : Recent Progress. Reviews in Computational Chemistry, 29(i), 2016. [70] Tomoki Yamashita, Nobuya Sato, Hiori Kino, Takashi Miyake, Koji Tsuda, and Tamio Oguchi. PHYSICAL REVIEW MATERIALS 2 , 013803 ( 2018 ) Crystal structure prediction accelerated by Bayesian optimization. 013803:1–6, 2018. [71] William L Jorgensen, David S Maxwell, and Julian Tirado-Rives. De- velopment and Testing of the OLPS All-Atom Force Field on Conforma- tional Energetics and Properties of Organic Liquids. J. Am. Chem. Soc., 118(15):11225–11236, 1996. [72] Stephen L Mayo, Barry D Olafson, and William A Goddard. DREIDING: a generic force field for molecular simulations. The Journal of Physical Chem- istry, 94(26):8897–8909, 12 1990. [73] A D MacKerell, D Bashford, M Bellott, R L Dunbrack, J D Evanseck, M J Field, S Fischer, J Gao, H Guo, S Ha, D Joseph-McCarthy, L Kuchnir, K Kuczera, F T K Lau, C Mattos, S Michnick, T Ngo, D T Nguyen, B Prod- hom, W E Reiher, B Roux, M Schlenkrich, J C Smith, R Stote, J Straub, M Watanabe, J Wiórkiewicz-Kuczera, D Yin, and M Karplus. All-Atom 158 Empirical Potential for Molecular Modeling and Dynamics Studies of Pro- teins. The Journal of Physical Chemistry B, 102(18):3586–3616, 4 1998. [74] Donald R Jones. A Taxonomy of Global Optimization Methods Based on Response Surfaces. Journal of Global Optimization, 21(4):345–383, 2001. [75] Simon Streltsov and Pirooz Vakili. A Non-myopic Utility Function for Statistical Global Optimization Algorithms. Journal of Global Optimization, 14(3):283–298, 1999. [76] Norman L Allinger, Young H Yuh, and Jenn Huei Lii. Molecular me- chanics. The MM3 force field for hydrocarbons. 1. Journal of the American Chemical Society, 111(23):8551–8566, 11 1989. [77] TINKER Molecular Modeling, 2015. [78] S Plimpton. Fast Parallel Algorithms for Short – Range Molecular Dynam- ics, 1995. [79] Peter I. Frazier and Jialei Wang. Bayesian optimization for materials de- sign. pages 1–21, 2015. [80] Jonas Mockus. Application of Bayesian approach to numerical methods of global and stochastic optimization. Journal of Global Optimization, 4(4):347– 365, 6 1994. [81] Frank Hutter, Holger H Hoos, and Kevin Leyton-brown. Sequential Model-Based Optimization for General Algorithm Configuration (ex- tended version). Learning and Intelligent Optimization (LION), 2011. [82] Stuart P Lloyd. Least squares quantization in pcm. IEEE Transactions on Information Theory, 28:129–137, 1982. [83] J Macqueen. Some methods for classification and analysis of multivariate observations. In In 5-th Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297, 1967. [84] Y Linde, A Buzo, and R Gray. An Algorithm for Vector Quantizer Design. IEEE Trans. Commun., 28:84–95, 1980. [85] Peter J Rousseeuw. Silhouettes: A graphical aid to the interpretation and 159 validation of cluster analysis. Journal of Computational and Applied Mathe- matics, 20:53–65, 1987. [86] L J P van der Maaten and G E Hinton. Visualizing High-Dimensional Data Using t-SNE. 2008. [87] S Kullback and R A Leibler. On information and sufficiency. Ann. Math. Stat., 22:79–86, 1951. [88] Ryan N. Gutenkunst, Joshua J. Waterfall, Fergal P. Casey, Kevin S. Brown, Christopher R. Myers, and James P. Sethna. Universally sloppy param- eter sensitivities in systems biology models. PLoS Computational Biology, 3(10):1871–1878, 2007. [89] Frank H. Herbstein. How precise are measurements of unit-cell dimen- sions from single crystals? Acta Crystallographica Section B Structural Sci- ence, 56(4):547–557, 2000. [90] Denis Gentili, Massimo Gazzano, Manuela Melucci, Derek Jones, and Massimiliano Cavallini. Polymorphism as an additional functionality of materials for technological applications at surfaces and interfaces. Chem. Soc. Rev., 48(9):2502–2517, 2019. [91] Jihua Chen, Ming Shao, Kai Xiao, Adam J Rondinone, Yueh-Lin Loo, Paul R C Kent, Bobby G Sumpter, Dawen Li, Jong K Keum, Peter J Diemer, John E Anthony, Oana D Jurchescu, and Jingsong Huang. Solvent-type- dependent polymorphism and charge transport in a long fused-ring or- ganic semiconductor. Nanoscale, 6(1):449–456, 2014. [92] Yuri Mnyukh. Mechanism and Kinetics of Phase Transitions and Other Reactions in Solids. American Journal of Condensed Matter Physics, 3:89– 103, 1 2013. [93] Miguel A Garcia-Garibay. Molecular crystals on the move: from single- crystal-to-single-crystal photoreactions to molecular machinery. Ange- wandte Chemie (International ed. in English), 46(47):8945–8947, 2007. [94] Qi Jiang, Alexander G Shtukenberg, Michael D Ward, and Chunhua Hu. Non-Topotactic Phase Transformations in Single Crystals of β-Glycine. Crystal Growth & Design, 15(6):2568–2573, 6 2015. [95] Durga Prasad Karothu, James Weston, Israel Tilahun Desta, and Panče 160 Naumov. Shape-Memory and Self-Healing Effects in Mechanosalient Molecular Crystals. Journal of the American Chemical Society, 138(40):13298– 13306, 10 2016. [96] Manas K Panda, Tomče Runčevski, Subash Chandra Sahoo, Alexei A Be- lik, Naba K Nath, Robert E Dinnebier, and Panče Naumov. Colossal pos- itive and negative thermal expansion and thermosalient effect in a penta- morphic organometallic martensite. Nature communications, 5:4811, 9 2014. [97] Satoshi Takamizawa and Yasuhiro Miyamoto. Superelastic Organic Crys- tals. Angewandte Chemie International Edition, 53(27):6970–6973, 2014. [98] P Hohenberg and W Kohn. Inhomogeneous Electron Gas. Phys. Rev., 136(3B):B864–B871, 11 1964. [99] Paolo Giannozzi, Stefano Baroni, Nicola Bonini, Matteo Calandra, Roberto Car, Carlo Cavazzoni, Davide Ceresoli, Guido L Chiarotti, Mat- teo Cococcioni, Ismaila Dabo, Andrea Dal Corso, Stefano de Gironcoli, Stefano Fabris, Guido Fratesi, Ralph Gebauer, Uwe Gerstmann, Christos Gougoussis, Anton Kokalj, Michele Lazzeri, Layla Martin-Samos, Nicola Marzari, Francesco Mauri, Riccardo Mazzarello, Stefano Paolini, Alfredo Pasquarello, Lorenzo Paulatto, Carlo Sbraccia, Sandro Scandolo, Gabriele Sclauzero, Ari P Seitsonen, Alexander Smogunov, Paolo Umari, and Re- nata M Wentzcovitch. QUANTUM ESPRESSO: a modular and open- source software project for quantum simulations of materials. Journal of Physics: Condensed Matter, 21(39):395502, 9 2009. [100] Hailian Li, Mohamed Eddaoudi, M O’Keeffe, and O M Yaghi. Design and synthesis of an exceptionally stable and highly porous metal-organic framework. Nature, 402(6759):276–279, 1999. [101] Hideki Hayashi, Adrien P Côté, Hiroyasu Furukawa, Michael O’Keeffe, and Omar M Yaghi. Zeolite A imidazolate frameworks. Nature Materials, 6(7):501–506, 2007. [102] Bert M Weckhuysen and Jihong Yu. Recent advances in zeolite chemistry and catalysis., 10 2015. [103] Tian Tian, Zhixin Zeng, Diana Vulpe, Mirian E. Casco, Giorgio Divitini, Paul A. Midgley, Joaquin Silvestre-Albero, Jin Chong Tan, Peyman Z. Moghadam, and David Fairen-Jimenez. A sol-gel monolithic metal- organic framework with enhanced methane uptake. Nature Materials, 17(2):174–179, 2018. 161 [104] Diego A Gómez-Gualdrón, Yamil J Colón, Xu Zhang, Timothy C Wang, Yu-Sheng Chen, Joseph T Hupp, Taner Yildirim, Omar K Farha, Jian Zhang, and Randall Q Snurr. Evaluating topologically diverse metal–organic frameworks for cryo-adsorbed hydrogen storage. Energy Environ. Sci., 9(10):3279–3289, 2016. [105] Tania Rodenas, Ignacio Luz, Gonzalo Prieto, Beatriz Seoane, Hozanna Miro, Avelino Corma, Freek Kapteijn, Francesc X Llabrés I Xamena, and Jorge Gascon. Metal-organic framework nanosheets in polymer compos- ite materials for gas separation., 1 2015. [106] Thomas M McDonald, Jarad A Mason, Xueqian Kong, Eric D Bloch, David Gygi, Alessandro Dani, Valentina Crocellà, Filippo Giordanino, Samuel O Odoh, Walter S Drisdell, Bess Vlaisavljevich, Allison L Dzubak, Roberta Poloni, Sondre K Schnell, Nora Planas, Kyuho Lee, Tod Pascal, Liwen F Wan, David Prendergast, Jeffrey B Neaton, Berend Smit, Jeffrey B Kor- tright, Laura Gagliardi, Silvia Bordiga, Jeffrey A Reimer, and Jeffrey R Long. Cooperative insertion of CO2 in diamine-appended metal-organic frameworks. Nature, 519(7543):303–308, 3 2015. [107] Peyman Z Moghadam, Joshua F Ivy, Ravi K Arvapally, Antonio M Dos Santos, John C Pearson, Li Zhang, Emmanouil Tylianakis, Pritha Ghosh, Iain W H Oswald, Ushasree Kaipa, Xiaoping Wang, Angela K Wilson, Randall Q Snurr, and Mohammad A Omary. Adsorption and molecular siting of CO(2), water, and other gases in the superhydropho- bic, flexible pores of FMOF-1 from experiment and simulation. Chemical science, 8(5):3989–4000, 5 2017. [108] Jian-Rong Li, Ryan J Kuppler, and Hong-Cai Zhou. Selective gas adsorp- tion and separation in metal-organic frameworks. Chemical Society reviews, 38(5):1477–1504, 5 2009. [109] Jiewei Liu, Lianfen Chen, Hao Cui, Jianyong Zhang, Li Zhang, and Cheng-Yong Su. Applications of metal-organic frameworks in hetero- geneous supramolecular catalysis. Chemical Society reviews, 43(16):6011– 6061, 8 2014. [110] Joseph E Mondloch, Michael J Katz, William C Isley III, Pritha Ghosh, Peilin Liao, Wojciech Bury, George W Wagner, Morgan G Hall, Jared B DeCoste, Gregory W Peterson, Randall Q Snurr, Christopher J Cramer, Joseph T Hupp, and Omar K Farha. Destruction of chemical warfare agents using metal–organic frameworks. Nature Materials, 14:512, 3 2015. 162 [111] Sophie E Miller, Michelle H Teplensky, Peyman Z Moghadam, and David Fairen-Jimenez. Metal-organic frameworks as biosensors for luminescence-based detection and imaging. Interface focus, 6(4):20160027, 8 2016. [112] Lauren E Kreno, Kirsty Leong, Omar K Farha, Mark Allendorf, Richard P Van Duyne, and Joseph T Hupp. Metal-organic framework materials as chemical sensors. Chemical reviews, 112(2):1105–1125, 2 2012. [113] Jared B DeCoste and Gregory W Peterson. Metal-organic frameworks for air purification of toxic chemicals. Chemical reviews, 114(11):5695–5727, 6 2014. [114] N. Scott Bobbitt, Matthew L. Mendonca, Ashlee J. Howarth, Timur Is- lamoglu, Joseph T. Hupp, Omar K. Farha, and Randall Q. Snurr. Metal- organic frameworks for the removal of toxic industrial chemicals and chemical warfare agents. Chemical Society Reviews, 46(11):3357–3385, 2017. [115] Bethany M Connolly, David G Madden, Andrew E H Wheatley, and David Fairen-Jimenez. Shaping the Future of Fuel: Monolithic Metal–Organic Frameworks for High-Density Gas Storage. Journal of the American Chemical Society, 142(19):8541–8549, 5 2020. [116] ARPA-E. DE-FOA-0000672: Methane opportunities for vehicular energy. [117] Qi-Long Zhu and Qiang Xu. Metal–organic framework composites. Chem. Soc. Rev., 43(16):5468–5512, 2014. [118] Hai-Long Jiang, Bo Liu, Ya-Qian Lan, Kentaro Kuratani, Tomoki Akita, Hiroshi Shioyama, Fengqi Zong, and Qiang Xu. From metal-organic framework to nanoporous carbon: toward a very high surface area and hydrogen uptake. Journal of the American Chemical Society, 133(31):11854– 11857, 8 2011. [119] Chunjing Lu, Teng Ben, Shixian Xu, and Shilun Qiu. Electrochem- ical Synthesis of a Microporous Conductive Polymer Based on a Metal–Organic Framework Thin Film. Angewandte Chemie International Edition, 53(25):6454–6458, 2014. [120] Anthony Shoji Hall, Atsushi Kondo, Kazuyuki Maeda, and Thomas E Mallouk. Microporous Brookite-Phase Titania Made by Replication of a Metal–Organic Framework. Journal of the American Chemical Society, 135(44):16276–16279, 11 2013. 163 [121] Jack D Evans, Christopher J Sumby, and Christian J Doonan. Post- synthetic metalation of metal–organic frameworks. Chem. Soc. Rev., 43(16):5933–5951, 2014. [122] Yuanbiao Huang, Zujin Lin, and Rong Cao. Palladium Nanoparticles Encapsulated in a Metal–Organic Framework as Efficient Heterogeneous Catalysts for Direct C2 Arylation of Indoles. Chemistry – A European Jour- nal, 17(45):12706–12712, 2011. [123] Xueshen Wang, Qunqing Li, Jing Xie, Zhong Jin, Jinyong Wang, Yan Li, Kaili Jiang, and Shoushan Fan. Fabrication of Ultralong and Electri- cally Uniform Single-Walled Carbon Nanotubes on Clean Substrates 2009. pages 1–5, 2009. [124] Adrien P Côté, Annabelle I Benin, Nathan W Ockwig, Michael O'Keeffe, Adam J Matzger, and Omar M Yaghi. Porous, Crystalline, Covalent Organic Frameworks. Science, 310(5751):1166 LP – 1170, 11 2005. [125] Enquan Jin, Mizue Asada, Qing Xu, Sasanka Dalapati, Matthew A Addicoat, Michael A Brady, Hong Xu, Toshikazu Nakamura, Thomas Heine, Qiuhong Chen, and Donglin Jiang. Two-dimensional sp2 car- bon{\textendash}conjugated covalent organic frameworks. Science, 357(6352):673–676, 2017. [126] Eric L Spitler and William R Dichtel. Lewis acid-catalysed formation of two-dimensional phthalocyanine covalent organic frameworks. Nature Chemistry, 2(8):672–677, 2010. [127] Yanpei Song, Qi Sun, Briana Aguila, and Shengqian Ma. Opportunities of Covalent Organic Frameworks for Advanced Applications. Advanced Science, 6(2):1801410, 2019. [128] Hani M El-Kaderi, Joseph R Hunt, José L Mendoza-Cortés, Adrien P Côté, Robert E Taylor, Michael O’Keeffe, and Omar M Yaghi. Designed synthesis of 3D covalent organic frameworks. Science (New York, N.Y.), 316(5822):268–272, 4 2007. [129] Yuzhong Liu, Yanhang Ma, Yingbo Zhao, Xixi Sun, Felipe Gándara, Hi- royasu Furukawa, Zheng Liu, Hanyu Zhu, Chenhui Zhu, Kazutomo Sue- naga, Peter Oleynikov, Ahmad S. Alshammari, Xiang Zhang, Osamu Terasaki, and Omar M. Yaghi. Weaving of organic threads into a crys- talline covalent organic framework. Science, 351(6271):365–369, 2016. 164 [130] Yingwei Li and Ralph T Yang. Hydrogen storage in metal-organic and covalent-organic frameworks by spillover. AIChE Journal, 54(1):269–279, 2008. [131] Jin-Tao Yu, Zhe Chen, Junliang Sun, Zhi-Tang Huang, and Qi-Yu Zheng. Cyclotricatechylene based porous crystalline material: Synthesis and ap- plications in gas storage. J. Mater. Chem., 22(12):5369–5373, 2012. [132] San-Yuan Ding, Jia Gao, Qiong Wang, Yuan Zhang, Wei-Guo Song, Cheng-Yong Su, and Wei Wang. Construction of Covalent Organic Frame- work for Catalysis: Pd/COF-LZU1 in Suzuki–Miyaura Coupling Reac- tion. Journal of the American Chemical Society, 133(49):19816–19822, 12 2011. [133] Sasanka Dalapati, Shangbin Jin, Jia Gao, Yanhong Xu, Atsushi Nagai, and Donglin Jiang. An Azine-Linked Covalent Organic Framework. Journal of the American Chemical Society, 135(46):17310–17313, 11 2013. [134] Yuzhong Liu, Yanhang Ma, Jingjing Yang, Christian S Diercks, Nobumichi Tamura, Fangying Jin, and Omar M Yaghi. Molecular Weaving of Co- valent Organic Frameworks for Adaptive Guest Inclusion. Journal of the American Chemical Society, 140(47):16015–16019, 2018. [135] Brian T Koo and Paulette Clancy. Towards optimal packing and diffu- sion of fullerene molecules in the Pc-PBBA covalent organic framework. Molecular Simulation, 40(1-3):58–70, 1 2014. [136] Anastasios I Skoulidas and David S Sholl. Self-Diffusion and Transport Diffusion of Light Gases in Metal-Organic Framework Materials Assessed Using Molecular Dynamics Simulations. The Journal of Physical Chemistry B, 109(33):15760–15768, 8 2005. [137] Anastasios I Skoulidas and David S Sholl. Molecular Dynamics Simu- lations of Self-Diffusivities, Corrected Diffusivities, and Transport Diffu- sivities of Light Gases in Four Silica Zeolites To Assess Influences of Pore Shape and Connectivity. The Journal of Physical Chemistry A, 107(47):10132– 10141, 11 2003. [138] Markus K Dahlgren, Patric Schyman, Julian Tirado-Rives, and William L Jorgensen. Characterization of Biaryl Torsional Energetics and its Treat- ment in OPLS All-Atom Force Fields. Journal of Chemical Information and Modeling, 53(5):1191–1199, 5 2013. 165 [139] Michael J Robertson, Julian Tirado-Rives, and William L Jorgensen. Im- proved Peptide and Protein Torsional Energetics with the OPLS-AA Force Field. Journal of Chemical Theory and Computation, 11(7):3499–3509, 7 2015. [140] Smit Berend Frenkel Daan. Understanding molecular simulation from algo- rithms to applications. [141] G Ciccotti, D Frenkel, and Eds. I.R. McDonalds. Simulation of Liquids and Solids. North-Holland, Amsterdam, 1987. [142] Hongwei Zeng, Yu Liu, and Honglai Liu. Adsorption and diffusion of CO2 and CH4 in covalent organic frameworks: an MC/MD simulation study. Molecular Simulation, 44(15):1244–1251, 10 2018. [143] Yingzhi Zeng, Qianxiao Li, and Kewu Bai. Prediction of interstitial diffu- sion activation energies of nitrogen, oxygen, boron and carbon in bcc, fcc, and hcp metals using machine learning. Computational Materials Science, 144:232–247, 2018. [144] Tian Ge Wang, Jin Jin Cao, and Xiao Fan Gou. Activation energy of oxygen diffusion: A possible indicator of supercurrents through YBa 2 Cu 3 O 7 grain boundaries. Applied Surface Science, 480(June 2018):765–769, 2019. [145] D D Do, L F Herrera, and H D Do. A new method to determine pore size and its volume distribution of porous solids having known atomistic con- figuration. Journal of colloid and interface science, 328(1):110–119, 12 2008. [146] V M Burlakov, Y Tsukahara, G A D Briggs, and A P Sutton. Monte Carlo Simulation of Vapour Deposition of Nonstoichiometric Amorphous Silica. MRS Proceedings, 648:P6.41, 2000. [147] Alexander Schoedel, Zhe Ji, and Omar M Yaghi. The role of metal–organic frameworks in a carbon-neutral energy cycle. Nature Energy, 1(4):16034, 2016. [148] Dalal Alezi, Youssef Belmabkhout, Mikhail Suyetin, Prashant M Bhatt, Łukasz J Weseliński, Vera Solovyeva, Karim Adil, Ioannis Spanopou- los, Pantelis N Trikalitis, Abdul-Hamid Emwas, and Mohamed Ed- daoudi. MOF Crystal Chemistry Paving the Way to Gas Storage Needs: Aluminum-Based soc-MOF for CH4, O2, and CO2 Storage. Journal of the American Chemical Society, 137(41):13308–13318, 10 2015. 166 [149] N Scott Bobbitt, Matthew L Mendonca, Ashlee J Howarth, Timur Islam- oglu, Joseph T Hupp, Omar K Farha, and Randall Q Snurr. Metal–organic frameworks for the removal of toxic industrial chemicals and chemical warfare agents. Chem. Soc. Rev., 46(11):3357–3385, 2017. [150] Peyman Z. Moghadam, Sven M.J. Rogge, Aurelia Li, Chun-Man Chow, Jelle Wieme, Noushin Moharrami, Marta Aragones-Anglada, Gareth Conduit, Diego A. Gomez-Gualdron, Veronique Van Speybroeck, and David Fairen-Jimenez. Structure-Mechanical Stability Relations of Metal- Organic Frameworks via Machine Learning. Matter, 1(1):219–234, 2019. [151] Russell E Morris and Paul S Wheatley. Gas storage in nanoporous ma- terials. Angewandte Chemie (International ed. in English), 47(27):4966–4981, 2008. [152] Avelino Corma. From Microporous to Mesoporous Molecular Sieve Ma- terials and Their Use in Catalysis. Chemical Reviews, 97(6):2373–2420, 10 1997. [153] Patricia Horcajada, Tamim Chalati, Christian Serre, Brigitte Gillet, Cather- ine Sebrie, Tarek Baati, Jarrod F Eubank, Daniela Heurtaux, Pas- cal Clayette, Christine Kreuz, Jong-San Chang, Young Kyu Hwang, Veronique Marsaud, Phuong-Nhi Bories, Luc Cynober, Sophie Gil, Gérard Férey, Patrick Couvreur, and Ruxandra Gref. Porous metal-organic- framework nanoscale carriers as a potential platform for drug delivery and imaging. Nature materials, 9(2):172–178, 2 2010. [154] Thomas F Willems, Chris H Rycroft, Michaeel Kazi, Juan C Meza, and Maciej Haranczyk. Algorithms and tools for high-throughput geometry- based analysis of crystalline porous materials. Microporous and Mesoporous Materials, 149(1):134–141, 2012. [155] Juergen Getzschmann, Irena Senkovska, Dirk Wallacher, Michael Tovar, David Fairen-Jimenez, Tina Düren, Jasper M. Van Baten, Rajamani Kr- ishna, and Stefan Kaskel. Methane storage mechanism in the metal- organic framework Cu 3(btc)2: An in situ neutron diffraction study. Mi- croporous and Mesoporous Materials, 136(1-3):50–58, 2010. [156] Albert P Bartók, Risi Kondor, and Gábor Csányi. On representing chemi- cal environments. Phys. Rev. B, 87(18):184115, 5 2013. [157] Piotr H. Karpinski. Polymorphism of active pharmaceutical ingredients. Chemical Engineering and Technology, 29(2):233–237, 2006. 167 [158] Kaisar Raza. Polymorphism: The Phenomenon Affecting the Performance of Drugs. SOJ Pharmacy & Pharmaceutical Sciences, 2014. [159] Tsai-Ta C Lai, Steven Ferguson, Laura Palmer, Bernhardt L Trout, and Al- lan S Myerson. Continuous Crystallization and Polymorph Dynamics in the l-Glutamic Acid System. Organic Process Research & Development, 18(11):1382–1390, 11 2014. [160] Tsai-Ta C Lai, Jan Cornevin, Steven Ferguson, Nahan Li, Bernhardt L Trout, and Allan S Myerson. Control of Polymorphism in Continuous Crystallization via Mixed Suspension Mixed Product Removal Systems Cascade Design. Crystal Growth & Design, 15(7):3374–3382, 7 2015. [161] Henry C. Herbol, Weici Hu, Peter Frazier, Paulette Clancy, and Matthias Poloczek. Efficient search of compositional space for hybrid or- ganic–inorganic perovskites via Bayesian optimization. npj Computational Materials, 4(1):1–7, 2018. [162] R R Coifman, S Lafon, A B Lee, M Maggioni, B Nadler, F Warner, and S W Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7426–31, 5 2005. [163] Hannes Jónsson, Greg Mills, and Karsten W. Jacobsen. Nudged elastic band method for finding minimum energy paths of transitions. In Classi- cal and Quantum Dynamics in Condensed Phase Simulations, pages 385–404. 168