MODELING AND MACHINE LEARNING
STUDIES OF STRUCTURE-PROPERTY
RELATIONSHIP IN ORGANIC SYSTEMS
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Nikita Sengar
August 2021
© 2021 Nikita Sengar
ALL RIGHTS RESERVED
MODELING AND MACHINE LEARNING STUDIES OF
STRUCTURE-PROPERTY RELATIONSHIP IN ORGANIC SYSTEMS
Nikita Sengar, Ph.D.
Cornell University 2021
Organic materials with a judicious choice of functionalization have emerged
as attractive candidates for use as active layers in new electronic technologies.
This includes applications such as flexible displays, wearable electronics and
storage devices for gas separation and the capture of solutes such as chemi-
cal warfare agents. However, their use in the electronics industry is somewhat
limited due to their tendency to pack into multiple, structurally distinct forms
(a phenomenon known as polymorphism). For applications in energy storage
technologies, due to the versatility of synthesis of organic framework materials,
there remains an ongoing need to both elucidate and optimize the principles
that govern performance with respect to size- and chemical- selectivity towards
organic solutes. To address these challenges, we conducted detailed compu-
tational studies to develop a better understanding of the relationship between
nanoscale structure and macroscale properties.
Understanding polymorphism in organic semiconductors (OS) is critically
important since any slight variation in π orbital overlap can lead to drastic dif-
ferences in the charge carrier mobility. But finding polymorphs is a challeng-
ing task, because they are prone to structural reversibility, and have tradition-
ally involved an iterative sampling with the possible structural space driven
by those structures that lead to the lowest energy polymorphs. [1] We have ad-
dressed this issue here by incorporating Bayesian Optimization into Molecu-
lar Dynamics (MD) simulations to predict polymorphs. Our test case was a
high-performing organic semiconductor, bis(trimethylsilyl) [1]benzothieno[3,2-
b]- benzothiophene (diTMS-BTBT).
Our novel approach uncovered the relationship between minimizing the to-
tal energy as a function of a chosen design parameter, and allowed us to identify
the optimal structures by running time-consuming, expensive simulations for
only a fraction (∼15-20 percent) of the entire set of possible candidates (con-
sisting of over 500 structures). Next, we expanded our investigation to use
density functional theory to elucidate the molecular-scale mechanism behind
the polymorphic transition in two related organic semiconductors, ditert-butyl
[1]benzothieno[3,2-b]-benzothiophene (ditBu-BTBT) and diTMS-BTBT. By com-
paring their packing environment, we established a molecular ”design rule”
for selectively accessing both the so-called “nucleation and growth” and “coop-
erative” transition pathways in organic crystals. Finally, we characterized the
structural and physical properties of two exemplars of organic woven materials,
COF-506 and HKUST-1 MOF functionalized with large (10 nm-dia.) Palladium
nanoparticles. Using MD, we explored the propensity of both these materials
to be suitable for small-molecule gas diffusion within their densely interwoven
matrix of structural entities.
Our multiscale computational studies improve our current understanding of
structure-property relationships in organic systems, providing key insight into
the accelerated development of next-generation electronic materials.
BIOGRAPHICAL SKETCH
Nikita Sengar was born and raised in India, where she earned her Bachelors’s
degree in Chemical Engineering from Birla Institute of Technology and Science,
Pilani, in 2014. During the summer of her junior year of undergrad, she got the
opportunity to intern with Prof. Paulette Clancy’s group at Cornell. During this
internship, she developed a particular interest in scientific research, especially in
the field of computational nanoscience, and decided to shift her focus to pursue
research in this field. She moved to Ithaca in 2014 and earned her Master’s
degree in Chemical Engineering in 2016, and continued her research into the
doctoral program under the tutelage of Prof. Paulette Clancy.
While studying at Cornell, Nikita was selected to engage in highly competi-
tive Machine Learning oriented summer schools. In 2015, she received an NSF
travel grant to attend the CECAM Macromolecular Simulation workshop in
Juelich, Germany, where she won a simulation-based “Hackathon” event. In
the Fall of 2018, she participated in the data-driven “MATERIALS 4.0” summer
school at TU Dresden, Germany as the invited speaker. In the summer of 2019,
she engaged in Uber’s 2nd Science symposium in San Francisco, CA.
During her time at Cornell, Nikita conducted internships with Global-
Foundries and Corning Inc. and developed an interest in pursuing research
in an industrial setting. Post Ph.D., she will be joining the Optical Lithogra-
phy group at Intel Corp. as an Imaging Data Scientist to develop AI tools for
accelerated predictions of defects in wafers.
Outside research, Nikita has been involved in multiple outreach programs
aimed at increasing diversity and inclusion at the workplace. She led the team
to formulate and successfully launch a travel grant aimed at promoting diverse
participation in conferences and workshops.
iii
To my parents, who taught me the values of hard work and integrity. To my
brother, who continues to inspire me. Without their love and support, I would
not have made it through.
iv
ACKNOWLEDGEMENTS
First and foremost, I would like to express my deep-felt gratitude to my advisor,
Prof. Paulette Clancy, for her unwavering support and incredible mentorship
throughout the length of this work. Her open-mindedness and positive outlook
towards complex problems have been instrumental in helping me grow as a
researcher. Throughout my graduate career, she has time and again enabled me
to pursue my research passions and follow my career aspirations, for which I
will forever be grateful. I am very fortunate to have pursued my Ph.D. program
under her leadership.
I would also like to thank my committee members, Prof. Perrine Pepiot and
Prof. Tobias Hanrath, for their continued support and mentorship over the last
few years. Further, I would like to thank my experimental collaborators: Prof.
Ying Diao (University of Illinois Urbana-Champaign) and Prof. Jimenez (Uni-
versity of Cambridge, UK), who performed complementary studies related to
my theoretical simulations; I found my discussions with them to be inspiring
and enlightening. Finally, I would like to thank Prof. Peter Frazier and Dr.
Matthias Poloczek, who helped me view things from a Machine Learning per-
spective and expanded my knowledge of Bayesian approaches.
Next, I would like to thank all the members of the Clancy group who main-
tained a warm and welcoming environment within the group, allowing me to
feel at home at work. I would especially like to thank Dr. Yaset Acevedo, Dr.
Victoria Sorg, Dr. Mardochee Reveil, Dr. Henry Herbol, Ryan Heden, Andrew
Ruttinger, Blaire Sorenson, Divya Sharma, Aaron Chen, Seun Romiluyi, and
Isaiah Chen for countless helpful discussions, exchange of ideas, caring, and
tremendous emotional support throughout this research effort. I will forever
cherish this camaraderie.
v
I would also like to acknowledge the Institute of Computational Science and
Engineering, Cornell University, and Maryland Advanced Research Computing
Center for providing me with the necessary computational resources required
for my research endeavor.
I have been fortunate to have made wonderful friends along the way, and
I am grateful for their tremendous moral support, care, and encouragement
throughout my Ph.D. studies. Special thanks to Dr. Neeraj Kulkarni, Dr. Pragya
Shah, Dr. Prateek Sehgal, Dr. Arzoo Katiyar, Dr. Pragya Sharma, Mayuri
Ukidwe, Harnish Naik, Ameya Harmalkar, and Pranay Ladiwala.
Last but not least, I would like to thank my parents and brother for being my
rock-solid support. Without their continued guidance and motivation, I never
would have made it through.
vi
TABLE OF CONTENTS
Biographical Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1 Introduction 1
1.1 Advancement in Organic Electronics . . . . . . . . . . . . . . . . . 1
1.2 Polymorphism in Organic Semiconductors . . . . . . . . . . . . . 2
1.3 Organic Materials for Storage Applications . . . . . . . . . . . . . 6
1.3.1 Covalent & Metal Organic Framework Materials . . . . . . 7
1.4 Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 A Bayesian Optimization-Accelerated Route to Predicting Polymor-
phism in Small-Molecule Organic Semiconductors 10
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Methods Employed . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Molecular Dynamics . . . . . . . . . . . . . . . . . . . . . . 16
2.2.2 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . 19
2.3 System and Workflow Details . . . . . . . . . . . . . . . . . . . . . 20
2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.1 Polymorph Prediction . . . . . . . . . . . . . . . . . . . . . 27
2.4.2 Identifying Polymorphs . . . . . . . . . . . . . . . . . . . . 31
2.4.3 Correlation with Energy . . . . . . . . . . . . . . . . . . . . 40
2.4.4 Correlation among Lattice Constants . . . . . . . . . . . . 45
2.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3 Molecular Design Rules for Accessing Different Polymorphic Transi-
tion Mechanisms in Organic Electronic Crystals 52
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2 Transition Pathways of Nucleation and Growth versus Coopera-
tive Transition in Organic Electronics . . . . . . . . . . . . . . . . . 54
3.3 Computational Techniques . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.1 Density Functional Theory . . . . . . . . . . . . . . . . . . 58
3.3.2 System Details . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
vii
4 Elucidation of Structure-Property Relationships in a Woven Class of
Materials 71
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 Molecular Morphology of COF-506 . . . . . . . . . . . . . . . . . . 73
4.2.1 Computational Strategy and System Details . . . . . . . . 74
4.2.2 Multiple Time Origin-based Calculations of Self-Diffusion
in COF-506 . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2.3 Autocorrelation Functions . . . . . . . . . . . . . . . . . . . 81
4.2.4 Diffusion Results . . . . . . . . . . . . . . . . . . . . . . . . 84
4.2.5 Pore Size Distribution Algorithm . . . . . . . . . . . . . . . 94
4.2.6 Structure of COF-506-Cu . . . . . . . . . . . . . . . . . . . . 102
4.3 Structural Characterization of the HKUST-1 MOF Functionalized
with Pd Nanoparticles as a Composite . . . . . . . . . . . . . . . . 103
4.3.1 System Details . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.3.2 Pore Size Distribution of the Pd-MOF Composite . . . . . 105
4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5 Conclusions and Future Directions 113
5.1 Accomplishments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.2.1 Bayesian Optimization for Polymorph Prediction . . . . . 119
5.2.2 Covalent and Metal Organic Framework Materials . . . . 121
A 3D t-SNE and k-means Clustering of Polymorphs 123
B Python Code for Pore Size Distribution Algorithm Discussed in Sec-
tion 4.2.5 124
Bibliography 151
viii
LIST OF TABLES
2.1 Lattice dimensions a, b, c, are given in Å; angles α, β, γ are given
in degrees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1 Unit cell information for polymorphs of diTMS and ditBu. . . . . 57
4.1 Comparison of self-diffusivities of He, CO2, and THF within
metallated COF-506 over a range of temperatures at 1 bar pressure. 89
4.2 Comparison of self-diffusivities of He, CO2, and THF within
metallated COF-506 over a range of temperatures at 1000 bar
pressure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3 Activation energy and pre-exponential factor calculated for dif-
fusion of Helium in metallated COF-506. A negative value in the
activation energy for 1 bar is attributed to artifacts in simulation. 93
4.4 Activation energy and pre-exponential factor calculated for dif-
fusion of CO2 in metallated COF-506. . . . . . . . . . . . . . . . . 93
4.5 Activation energy and pre-exponential factor calculated for dif-
fusion of THF in metallated COF-506. . . . . . . . . . . . . . . . . 93
4.6 Atomic radii of each atom type in the COF-506-Cu system used
to calculate the porous spaces. . . . . . . . . . . . . . . . . . . . . 95
ix
LIST OF FIGURES
1.1 Publications on polymorphism in the fields of chemistry, mate-
rials science, and physics from the Web of Science. Reprinted
from [2]. Copyright 2016 Royal Society of Chemistry. . . . . . . . 4
2.1 (a) The diTMS-BTBT molecule featuring two bulky tri-methyl
side-chains. (b) and (c) depict molecular arrangements of atoms
inside periodic unit cells of two experimentally identified poly-
morphs of diTMS-BTBT: (b) the alpha phase is the themodynam-
ically metastable one, while (c) beta phase is the thermodynami-
cally stable polymorph. These polymorphs are characterized by
a six-dimensional vector x = {a, b, c, α, β, γ}where a, b, c, α, β, γ are
lattice parameters whose values are shown in Table 2.1. Data is
obtained from Chung et al. [3] Periodic boundary conditions are
used on the simulation cell to emulate the experimental situa-
tion by modeling an infinite crystal. Color key: Carbon in blue,
sulphur in yellow, silicon in gray, and hydrogen in white. . . . . 13
2.2 Flowchart describing the process that combines MD and
BayesOpt to make a predictive distribution of the energy for each
candidate structure. The algorithm starts by randomly creating
a set of input parameters, x. Structures characterized by these in-
puts, x, are then sent as input to the MD simulations. The output
from these MD runs, in the form of the total energy (of the train-
ing set), is used to train the Bayesian model, which calculates the
posterior distribution for the energy function. If the errors in the
prediction are less than Eerror (a user-defined value), this process
stops. Otherwise, this predictive distribution suggests the next
best candidate to test based on a criterion known as “expected
improvement.” [4] . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Predicted distribution of the total energy for all possible candi-
dates of diTMS-BTBT polymorphs. The solid blue line represents
energy values predicted using the BO. The black squares indicate
candidate points that were used to train the model. The blue
shaded region represents one standard deviation above and be-
low the BayesOpt-predicted values. . . . . . . . . . . . . . . . . . 29
2.4 Probability distribution of the standard deviations observed in
the values of energy for diTMS-BTBT polymorphs shown in Fig-
ure 1.3. The sum of all the bar heights equals 1.0 . . . . . . . . . . 29
x
2.5 Diagnostic plots obtained via (a) ‘leave one out or n-fold’ and (b) ‘4-
fold’ cross-validation. The x-axis contains the energy values di-
rectly obtained from MD simulations. The y-axis denotes values
predicted from the Bayesian model. The vertical bars indicate
the error (one standard deviation) in predicting the energy val-
ues. The correspondence of actual to predicted energies shows
that our model works well, with a low root mean squared error,
around 0.75 - 0.80 kcal/mol and mean absolute error less than 1
kcal/mol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6 (a) Use of the elbow method to determine the appropriate num-
ber of clusters in the data set. The plot shows the variation
within the cluster as a function of the number of clusters. The
elbow of the curve is located at the point where the variation
starts to level off and represents the optimal number of clusters
to use. (b) “Silhouette analysis” is used to measure how similar
a data point is to its own cluster (“cohesion”) compared to other
clusters (“separation”). Black dotted lines show the average sil-
houette score. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.7 Student’s t-distributed stochastic neighbor embedding (t-SNE)
of our six-dimensional data represented on a two-dimensional
plane for different values of perplexities. Gray dots represent
candidate crystal structures, red and yellow dots, respectively,
denote the thermodynamically metastable alpha and thermody-
namically stable beta phases. . . . . . . . . . . . . . . . . . . . . . 37
2.8 t-SNE low-dimensional embedding of our six-dimensional data,
color coded using the same scheme applied to clusters discov-
ered by k-means clustering algorithm: (a) with two clusters and
(b) with three clusters. . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.9 Energy distribution in kcal/mole for candidate structures plot-
ted using t-SNE for two-dimensional embedding. . . . . . . . . . 42
2.10 Probability density of total energies for the case in which we use
the number of clusters to be 2 in the k-means clustering algo-
rithm. The top figure shows results for the cluster (polymorph)
1 and the bottom shows that for the cluster (polymorph) 2. . . . . 43
2.11 Probability density of total energies for the case in which we use
the number of clusters as 3 in our k-means clustering algorithm.
Top, middle, and bottom figures show results for the derivative
of polymorph 1, as well as for polymorphs 1 and polymorph 2,
respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.12 Distribution of lattice constants of polymorphs found by the k-
means clustering algorithm using the number of clusters as two.
(a) Cluster 1 (top subfigure) contains the experimentally found
stable beta phase, and (b) cluster 2 (bottom subfigure) contains
the experimentally found metastable alpha phase. . . . . . . . . . 47
xi
2.13 Distribution of lattice constants of polymorphs found by the
k-means clustering algorithm using the number of clusters as
three. (a) Derivative polymorph (top subfigure), (b) cluster 1
(middle subfigure) that contains the experimentally found sta-
ble beta phase and (c) cluster 2 (bottom subfigure) contains the
experimentally found metastable alpha phase. . . . . . . . . . . . 49
3.1 Two polymorphic transition pathways of nucleation and growth
and cooperative transition in single crystals of diTMS and ditBu.
(a) DSC of diTMS powder showing reversible polymorphic tran-
sition. The molecular structure is shown in the inset. (b) DSC of
ditBu powder showing reversible polymorphic transition. The
molecular structure is shown in the inset. (c) Cross-polarized
optical microscopy (CPOM) images of a single crystal of diTMS
α form. The α form transitions to the β form upon heating after
approximately 415 s. Upon cooling, the β form does not transi-
tion back. (d) CPOM images of a single crystal of ditBu LT form
(blue birefringence). The LT form rapidly transitions to the HT
form (brown birefringence) upon heating in approximately 1.25
s. Upon cooling, the HT form reversibly transitions to the LT
form in 0.38 s. Reprinted from [3]. Copyright 2019 American
Chemical Society. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.2 Energy barrier associated with rotating a side-chain in a sin-
gle crystal of diTMS-α. Simulation snapshots of key rotational
points are shown on the graph. Color key in the snapshots: red,
cyan and orange reflect the carbon atoms in the side-chain that
were rotated in this study; gray, yellow, pink, and white indicate
carbon, sulphur, silicon, and hydrogen atoms, respectively. . . . 64
3.3 Energy barrier associated with rotating a side chain in a sin-
gle crystal of diTMS-β. Simulation snapshots of key rotational
points are shown on the graph. Color key in the snapshots: red,
cyan and orange reflect the carbon atoms in the side-chain that
were rotated in this study; gray, yellow, pink, and white indicate
carbon, sulphur, silicon, and hydrogen atoms, respectively. . . . 65
3.4 Energy barrier associated with rotating a side-chain in a sin-
gle crystal of ditBu-LT. Simulation snapshots of key rotational
points are shown on the graph. Color key in the snapshots: red,
cyan and orange reflect the carbon atoms in the side-chain that
were rotated in this study; gray, yellow, and white indicate car-
bon, sulphur, and hydrogen atoms, respectively. . . . . . . . . . . 66
xii
3.5 Energy barrier associated with rotating a side-chain in a sin-
gle crystal of ditBu-HT. Simulation snapshots of key rotational
points are shown on the graph. Color key in the snapshots: red,
cyan and orange reflect the carbon atoms in the side-chain that
were rotated in this study; gray, yellow, and white indicate car-
bon, sulphur, and hydrogen atoms, respectively. . . . . . . . . . . 67
3.6 Comparison of energy barrier of rotation for diTMS α and β
phases (top) and ditBu low temperature (LT) and high tempera-
ture (HT) forms (bottom). The more interlocked side-chain envi-
ronment of diTMS (which has a Si atom in the side-chain) shows
a much higher energy barrier compared to the more flexible en-
vironment of ditBu (which has C in the side-chain). . . . . . . . . 68
4.1 Pictorial decomposition of the two interwoven strands compris-
ing COF-505. Each color represents a different strand. Copper
atoms are shown as red spheres. In the diagrams labeled “mesh
1” and “mesh 2” there are eight independent organic chains,
leading to sixteen chains in the image labeled as “COF-505”. [5] . 75
4.2 (a)Metallated COF-505. Snapshot was taken from an MD simu-
lation. (b) Unit cell of COF-506 achieved by replicating the unit
cell twice in the b- and c- directions. [5] . . . . . . . . . . . . . . . 76
4.3 Representation of a data set storing values at a total number of
L times, ti → {i = 0, L − 1}. The method of multiple time origins
loops over all available time origins, tk → {k = 1,M} to calcu-
late the time correlation function. This case only represents the
schematic of one time delay of t = 3 ∆t. However, in practise, the
method averages data for multiple values of time delay ranging
from ∆t = {1, L − 1} [6]. . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.4 Mean squared displacements (MSD) of solutes (a) Helium, (b)
CO2 and (c) THF with respect to time. The red curves indicate the
MSD measured using a single time-origin, known not to be rep-
resentative of the average result. The blue graphs improve the
statistics by measuring the MSD using multiple (100s of) time-
origins (explained in section 4.2.2). The drop-off at the end of
the simulations reflects the fewer origins left by that time and
are not representative of accurate values at that point. . . . . . . 86
4.5 (a) Self-diffusivity of He in COF-506-Cu from 200 to 500 K at two
pressures (1 atm (blue) and 1000 atm (green)). (b) Self-diffusivity
of CO2 in COF-506-Cu from 200 to 500 K and two pressure values
as denoted by the inset color key. (c) Self-diffusivities of THF in
COF-506-Cu at 1 atm as a function of temperature (200 to 500 K).
Error bars represent the standard error. Dot-dashed lines join the
data points as a guide to the eye. . . . . . . . . . . . . . . . . . . . 88
xiii
4.6 Plots depicting the Arrhenius relation between the self-diffusion
coefficient and the reciprocal of temperature for (a) He, (b) CO2,
and (c) THF diffusing in COF-506-Cu. The temperature varies
from 200 to 500 K in 100 K intervals. Dotted lines denote the best
fit to the Arrhenius data. . . . . . . . . . . . . . . . . . . . . . . . . 92
4.7 (a) A simulation snapshot of COF-506-Cu from MD calculations
showing patches of porous spaces inside the weave. Color key:
Carbon in red, hydrogen in blue, nitrogen in yellow, copper in
white, boron in cyan, fluorine in green, helium in orange. (b) A
schematic depicting a probe in blue that rasters the simulation
box with atoms (in yellow) to identify the porous regions. . . . . 96
4.8 A two-dimensional representation of the probe mesh-grid used
in this study. Probes of radius 1.5 Å are depicted as grids 1 and
2 shown in colors blue and red, respectively. The mesh grid is
designed this way to ensure that no region inside the simulation
box is left untested while searching for porous spaces. . . . . . . 97
4.9 This Figure shows a small section of a bigger (actual) simulation
box. (a) A single porous sphere of radius 1.5 Å. This porous
space is identified by the probe as a volume not occupied by
any atom. (b) Next, adjacent and overlapping porous spaces are
identified, as shown in the red, yellow, blue, and green spheres.
Since the probe size chosen for this method is 1.5 Å, these identi-
fied spheres have equal size and have a radius of 1.5 Å. (c) Then
the extent of the space covered in aggregate by the overlapping
and adjacent porous spaces (found in (b)) is identified. Yellow
dots define the outer surface of this enclosed porous volume. (d)
Once this region has been identified, as in (c), a sphere is fitted
inside this space in order to calculate the radius of this spherical
pore. Radius from all such discontiguous regions are calculated,
analyzed, and plotted as a histogram in Fig. 4.10. . . . . . . . . . 100
4.10 (a) Representation of pores located inside a COF-506-Cu weave
predicted by computer simulation. (b) Pore size distribution for
the same system as illustrated in (a). Pore diameters were esti-
mated assuming spherical pores. The histogram shows that most
pores are less than 10 Angstroms in diameter. . . . . . . . . . . . 101
4.11 (a) Side and (b) top views of a simulation snapshot of the
HKUST-1 and Pd nanoparticle system. Palladium shown in blue
is placed inside a HKUST-1 crystal to mimic the experimental
setup. Color key: Carbon shown in gray, copper in brown, oxy-
gen in red, hydrogen in white, and palladium in blue. . . . . . . 106
xiv
4.12 Colors representing the pores identified by the PSD algorithm
created in this thesis and described in Section 4.2.5. Due
to the exceedingly penetrable nature of the HKUST-1 and Pd
nanocomposite, the algorithm incorrectly combines the contigu-
ous porous spaces. Therefore, in this study, we have employed
a more robust approach based on Monte Carlo sampling to con-
duct pore size calculations. . . . . . . . . . . . . . . . . . . . . . . 108
4.13 Geometric representation of the polyhedral surface mesh (shown
in white) around HKUST-1 and a 5 nm-dia. Pd nanoparticle. The
Zeo++ software uses Voronoi decomposition to identify surface
mesh and accessible volume in the system of atoms and uses this
information to calculate the pore size distribution. . . . . . . . . . 109
4.14 Comparison of pore size distribution in the HKUST-1/Pd
nanocomposite for different values (shown as Samples in the in-
set) of Monte Carlo samples per unit cell. The distribution starts
to converge as the number exceeds 5000. . . . . . . . . . . . . . . 110
4.15 Distribution of pore size analysis in the HKUST-1/Pd nanocom-
posite conducted over Monte Carlo samples of 5000 and 50000
(but not 500) per unit cell. . . . . . . . . . . . . . . . . . . . . . . . 110
A.1 t-SNE low-dimensional embedding of our six-dimensional data.
Data points are color coded based on the clusters discovered
by k-means clustering algorithm. Orange denotes the cluster
(polymorph) 1 that contains the experimentally found stable beta
phase, blue indicates cluster (polymorph) 2 that contains the ex-
perimentally found metastable alpha phase, and green indicates
the derivative polymorph of cluster 1. . . . . . . . . . . . . . . . . 123
xv
CHAPTER 1
INTRODUCTION
1.1 Advancement in Organic Electronics
Organic electronics have garnered a lot of attention from the industry and the
research community due to the advantages and versatility they offer for a wide
variety of applications. These include smartphone displays, biochemical sen-
sors, radio frequency identification (RFID) tags, flexible light sources, portable
solar cells, and curved television screens. [7–11] Use of organic semiconductors
often allows solution processability, enabling lower fabrication and energy costs
as well as large-area deposition on flexible substrates. [12] The almost limitless
possibilities of functionalizing organic materials also give easy access to be able
to functionally fine-tune their chemical structure to match desirable properties
like improved solubility in organic solvents, electronic structure design, and
chemical stability. Furthermore, innovations in the discovery of new materials
in the last thirty years have led to tremendous improvement of over six orders
of magnitude in the charge carrier mobility, a signature metric for high perfor-
mance. In all, organic semiconductors compare well to, essentially at par with,
polycrystalline silicon and metal-oxide transistors in terms of hole mobility. [13]
Although first discovered in the 1940s, a major breakthrough in the field
of organic semiconductors came in the 1960s when Martin Pope and col-
leagues studied the electronic structure of anthracene. [14] Thereafter, discov-
eries of metallic-like conductivity in tetrathiafulvalene I and tetracyanoquin-
odimethane II,14 complexes, ensured the viability of small organic semicon-
ducting molecules becoming the next generation of new materials. [7, 12, 14]
1
The organic field-effect transistor (OFET), which is an integral building block
for organic circuits, soon became an active focus of study. [15] In 1996, Bao and
colleagues [16] developed the first solution-printed high-performing transistor
(40.01 cm2 V-1 s-1), based on the use of poly(3-alkylthiophene) (P3HT). Since
then, with research breakthroughs continuing to improve the charge carrier
mobility, [17] the hole mobilities of organic thin-film transistors have exceeded
those of benchmark thin-film amorphous silicon devices [18] that range from
0.5–1 cm2 V-1 s-1.
1.2 Polymorphism in Organic Semiconductors
Small-molecule organic semiconductors typically contain conjugated π systems,
overlapping π orbitals bridging the adjacent single bonds. This arrangement
allows for delocalization of the π-electrons across all the aligned π orbitals of
the molecule. These materials are highly sensitive to intermolecular π orbital
overlap for charge transport properties. Any slight variation in the π orbital
overlap can lead to orders of magnitude difference in their ability to transport
charge.
Other dominant forces between particles arise from interactions in or-
ganic semiconductors that include van der Waals and weak electrostatic (i.e.,
quadrupole) forces which can, nonetheless, be structure-directing. These forces
are much weaker than ionic or covalent interactions and non-specific compared
to hydrogen bonding. Consequently, these materials are prone to adopting mul-
tiple crystalline packing forms at near ambient conditions, a phenomenon called
polymorphism. Polymorphism is quite prevalent in organic crystals. Kofler,
2
Kuhnert–Brandstätter, and Burger, from the Innsbruck school, indicated that ap-
proximately one-third of organic substances exhibit polymorphism under nor-
mal pressure conditions. [19] McCrone [20] gave a more expansive view; he
wrote that every compound has polymorphic forms, and that the number of
polymorphic forms, in general, is proportional to the time and money spent
in research on that compound! Many leading organic semiconductors have
been shown to exhibit polymorphism, including pentacene, [21] rubrene, [22]
TIPS-pentacene, [23] sexithiophene (6T), [24] and BTBT derivatives including
C8-BTBT [25].
It becomes critically important to control polymorphism in organic semi-
conductors since different structures often have distinctly different physical
properties, such as the preferred crystal habit, melting point, electronic, optical,
and mechanical properties. This increased interest in studying polymorphism,
specifically in the field of chemistry, material science, and physics, is evident
in the substantial increase in the number of publications from less than 200 in
1995 to more than 900 in 2014, as seen in Fig. 1.1. Control of polymorphs in the
pharmaceutical industry is a huge regulatory concern.
Charge carrier mobility is directly associated with differences in the crystal
packing, where even slight variations can lead to a significant impact on the de-
vice performance. For example, the hole mobility in TIPS-pentacene thin film
transistors was measured to be as high as 11 cm2 V-1 s-1 in its non-equilibrium
polymorph, whereas its equilibrium form yielded hole mobility on the order
of 1 cm2 V-1 s-1. [1] Given this huge impact of structural changes on the elec-
tronic properties, polymorphism can open the door to modulating crystal pack-
ing without changing the chemical composition of the compound as a means to
3
Figure 1.1: Publications on polymorphism in the fields of chemistry, materials
science, and physics from the Web of Science. Reprinted from [2]. Copyright
2016 Royal Society of Chemistry.
maximize device performance. In this way, polymorphism provides an excel-
lent tool to investigate the effects of solid-state packing on charge transport. [26]
It allows access to multiple distinctly packed states without the need for chemi-
cal modification of the parent compound and enables researchers to establish a
direct relationship between packing and charge transport. [27]
Bernstein emphasized the role of polymorphism in studying charge trans-
port in organic crystals. [28] For polymorphic systems, the only variable is their
crystal arrangement and not the chemical composition; as a result, differences
in their electronic properties can be directly related to structural differences. In
organic electronics, polymorphism is of great interest due to the direct impact of
structural arrangement on the charge transport in thin-film field-effect devices.
In these devices, the charge transport typically occurs near the dielectric surface
within a few molecular layers of the organic semiconductor. In this region near
the substrate, the molecular packing differs from the bulk phase and is known
as a substrate-induced phase (SIP). [29] SIP is a polymorphic form that is only
4
found near the vicinity of the substrate and not observed in the bulk of the crys-
tal. The mechanism behind the formation of these polymorphs is still under
investigation. “Thin-film polymorphs” often discovered in thin-film geometry
could, in fact, be a SIP or a bulk-phase polymorph that became kinetically fa-
vored in the thin-film geometry. It is important to investigate the structural
arrangement of SIPs as charge transport is significantly influenced mostly by
the crystal packing in the vicinity of the substrate. [30] However, conducting a
full structural resolution of SIPs is challenging as conventional techniques like
single crystal or powder X-ray diffraction methods cannot be applied. Instead,
a combination of methods like grazing incidence X-ray diffraction, surface-
enhanced Raman spectroscopy, and computational approaches have been used
to obtain the structural representation of SIPs. [31]
The core structure of [1]benzothieno[3,2-b][1]-benzothiophene (BTBT) has
emerged as a promising template for functionalization to generate high-
performing air-stable organic semiconductor molecules. C8-BTBT, in particular,
has demonstrated an average hole mobility between 3–16 cm2 V-1 s-1 with an
ultra-high mobility of 43 cm2 V-1 s-1. [25, 32] Introduction of solubilizing alkyl
chains of various lengths in the core’s long-axial direction has enhanced lateral
inter-molecular interactions. [33] This also reduced inter-molecular distances
and facilitated rapid growth of crystal and easier fabrication of polycrystalline
thin films from solution.
Yuan et al., [25] isolated a metastable form of C8-BTBT through an off-center
spin-coating method and fabricated thin-film transistors as compared to the sta-
ble form obtained via drop-casting. For C8-BTBT, many variables, including the
degree of alignment in the film, film thickness, the nature of inter-grain connec-
5
tivity, and metastable molecular packing, impacted its charge transport proper-
ties. In particular, the authors pinpointed the role of crystal packing on charge
transport to be challenging. They discovered a large drop in hole mobility from
26 cm2 V-1 s-1 to below 4.1 cm2 V-1 s-1 after relaxation of the metastable struc-
ture, but the process disrupted the crystal alignment. The crystal structure of
the metastable form was difficult to isolate in its pure phase.
1.3 Organic Materials for Storage Applications
Next, we examine organic materials for their applications in gas uptake and
storage. Increasing global demands for clean and high-density gas storage con-
tinue to push storage technologies like batteries and super-capacitors to their
limits. These technologies form the current most promising electrical energy
storage methods due to the ease of their portability and compact size for on-
demand usage. However, despite their advantages, the chemical and physical
limitations of existing building-block materials pose a challenge to improve-
ment in performance and require new, creative solutions. For example, con-
ductive polymers offer cheap, scalable, and functionally tunable materials but
lack chemical and physical stability for device implementation. In contrast, in-
organic materials like silicon and metal oxides are robust and offer redox-active
sites. However, slow-moving ion diffusion restrains charge/ discharge rate ca-
pabilities and leads to mechanical instability. These limitations have triggered
the development of new materials with advanced energy storage capacities.
6
1.3.1 Covalent & Metal Organic Framework Materials
Covalent and Metal-Organic Frameworks (COFs and MOFs) are emerging as a
promising class of materials to meet the growing need for next-generation en-
ergy storage technologies. They have garnered a great deal of attention due
to the enormous potential in the design space they offer in terms of the spatial
assembly of molecular organic building threads. [34, 35] With porous networks
built from organic linkers, these materials allow positional control over build-
ing blocks in two or three dimensions, unlike polymers. This allows for a fine-
tuning of chemical and physical properties and selective synthesis with high
regularity. The nanoscale confinements and networks of porous channels and
spaces created in the COF and MOF scaffolds present ideal sites for solute gas
storage, as well as separation and delivery tasks. Active sites and large surface
areas are useful for catalysis and sensing applications. Additionally, the regular
connectivity of the organic linkers makes these materials strong candidates for
applications in charge carrier transport, including optoelectronics and electro-
chemical energy storage.
With judicious selection of linkers and metal nodes in the case of MOFs,
these materials offer exceptional ability to impart functionality. Multiple re-
views have highlighted the abundance of available synthetic routes to modulate
chemical structure, stability, pore size distribution, and flexibility of the frame-
work. [36, 37] Certain crystallographic phases and size/morphology of MOFs,
in particular, can be controlled to modify the surface chemistry. [38] The abil-
ity to fine-tune chemical and physical properties is a defining strength of these
framework classes of porous materials, as it provides precise control over solute
molecules’ chemistry for energy storage applications. Additionally, the electro-
7
chemical stability of framework materials can be improved by a judicious selec-
tion of synthetic parameters. For example, MOFs made of redox-inactive nodes
and smaller but more rigid linkers demonstrate greater thermal and chemical
stability. In contrast, high porosity and flexibility enable greater ion storage
and transport. Mechanical properties in these materials can be further tuned
by selective use of flexible linkers, tuning of host-guest interaction strength, the
introduction of multiple metallic nodes in case of MOFs, and control of crys-
tal size. [39, 40] For example, the use of these porous materials as electrodes,
structural rigidity is necessary to avoid dendrite formation and prevent short-
circuiting. On the other hand, malleability would be important in applications
where the active material undergoes volume changes in order to maintain struc-
tural integrity in devices.
1.4 Dissertation Outline
In this thesis, we study organic systems with two main objectives:
• To investigate and predict in advance polymorphic behavior in organic
semiconductor systems.
• To uncover atomistic-level phenomena responsible for the extraordinary
storage uptake and diffusive behavior of solute molecules in Covalent and
Metal-Organic Framework materials.
We address these open questions using computational techniques to probe the
underlying atomistic mechanisms responsible for the observed phenomena. In
Chapter 2, we investigate the occurrence of polymorphism in diTMS-BTBT, a
8
high-performing organic semiconductor and a member of the BTBT family. We
developed a novel Machine Learning-based approach that combines atomistic
simulations to accelerate the prediction of polymorphs in advance. We show
that our approach is significantly computationally superior in terms of effi-
ciency with reasonable accuracy compared to more expensive methods com-
monly used to conduct such studies. These studies then lead into Chapter 3,
where we extend our problem statement even further, more specifically to un-
derstanding the molecular-scale mechanism behind the polymorphic transition.
We study transition mechanisms in ditBu-BTBT and diTMS-BTBT (both mem-
bers of the BTBT family) crystals and establish a molecular design rule to access
both nucleation and growth and cooperative polymorphic transition pathways
in organic crystals. In Chapter 4, we broaden our investigation to explore or-
ganic framework materials for their use in storage applications. We investi-
gate their structural morphology and elucidate how that influences the diffu-
sion of solute molecules through these porous materials. Finally, in Chapter
5, we present our final conclusions and the accomplishments of this thesis and
discuss possible future work.
9
CHAPTER 2
A BAYESIAN OPTIMIZATION-ACCELERATED ROUTE TO PREDICTING
POLYMORPHISM IN SMALL-MOLECULE ORGANIC
SEMICONDUCTORS
Determining the most stable structures of polymorphs for an organic semi-
conductor and any associated kinetically metastable structures is a difficult
process experimentally; energy differences between these structures are of-
ten well below kT. Computational predictions also involve a slow, painstak-
ing process since there are typically six or seven variables (unit cell param-
eters and processing conditions) to consider in determining the structure. In
this study, we demonstrate the predictive capability of Bayesian Optimization
techniques to shortcut this process. Bayesian Optimization builds a predictive
model based on the relationship between a material’s structure and its prop-
erties and guides the choice of material design using as few “experiments”
as possible. Here, we used such a machine learning technique to quickly
identify all low-energy structures of a high-performing organic semiconductor,
bis(trimethylsilyl) benzothieno[3,2-b][1]1benzothiophene (diTMS-BTBT). This
down-selection process allowed us to perform expensive Molecular Dynamics
simulations for only a small fraction (15-20%) of the entire set of possible can-
didates (comprising over 500 structures). Our approach identified both of the
experimentally known polymorphs found by Chung et al. [3]. It also suggested
the likelihood of a third polymorph, which has not yet been observed exper-
imentally. This approach represents a fast and efficient means to predict and
screen thermodynamically stable and metastable polymorphs of organic semi-
conductors. Indeed, the method is general enough to be equally applicable to
other polymorphic systems.
10
2.1 Introduction
Commercial applications of organic semiconductors date as far back as the 1980s
when they found use as photo-receptor materials for photocopiers and laser
printers. More recently, they have been used in nanoscale electronics, alterna-
tive energy sources, low-cost and large-area flexible plastic circuits, displays,
and disposable sensors [41–43]. This growing interest in organic semiconduc-
tors stems from the ability to deposit thin films on a variety of low-cost sub-
strate materials and tune their mechanical and electrical properties, especially
charge mobility, by tailoring their molecular structure. As a materials class,
organic semiconductors have demonstrated acceptably good hole mobilities,
2 2
greater than ∼ 10 cmV s [32, 44] (c.f. that for crystalline silicon ∼ 450
cm
V s and for
2
amorphous silicon ∼ 1 cmV s ). However, a major difficulty in deploying organic
semiconductors is their predilection to pack into multiple, structurally distinct
crystal structures, a phenomenon known as polymorphism. A review by Brog,
Chanez, Chrochet et al. [45] emphasized the importance of identifying polymor-
phism in a broader class of materials, including organic compounds and inor-
ganic/metallic materials, not only in the context of science but also to control the
structure of the compound to guarantee its properties. In organic semiconduc-
tors, these polymorphs may have widely differing electrical or other properties,
potentially threatening the reproducibility of their synthesis and the reliability
of the devices they form.
Identifying and controlling which organic electronic polymorph will be
present under a given set of processing conditions is critically important since
any slight variation in π-orbital overlap can lead to orders of magnitude dif-
ference in charge carrier mobility. Despite this importance, very few experi-
11
mental studies exist that investigate the origin and mechanism behind poly-
morphism. [1,46–48] Major challenges arise from the low energy barriers, ultra-
fast kinetics, and structural reversibility involved. But the incentive to obtain
a deeper understanding of the link between processing conditions (e.g., strain
and temperature) and energetically preferred structure lies in the promise that
this could open the door to stabilize metastable polymorphs into a particularly
high mobility polymorphic structure, or one that maximizes (or minimizes) any
other desired property.
We chose to predict all the polymorphs of single crystals of a par-
ticularly high-performing p-type organic semiconductor, bis(trimethylsilyl)
benzothieno[3,2-b][1]1benzothiophene (diTMS-BTBT). [49] This is a member of
an interesting class of benzothiophene molecule organic semiconductors whose
bulky end-groups can be altered to tune the crystal packing of the molecule.
[49–51] We conducted a detailed computational study to predict stable and
metastable polymorphs of diTMS-BTBT, which was found by the Diao group
2
at UIUC to exhibit remarkably high mobilities [49], ∼ 7.1 cmV s , and polymorphic
behavior. [3] Figure 2.1 shows the two experimentally identified polymorphs
they identified, and Table 2.1 lists their lattice parameters. We will use this in-
formation as validation of the predictive capability of our calculations. To re-
iterate, this experimental information was not used in the identification of the
computational predictions of the polymorphs. This is a pure prediction from
the Bayesian optimization algorithm.
A traditional experimental approach to “materials discovery” typically in-
volves synthesizing new materials and using chemical intuition and experience
to improve upon the molecular design by testing the physical properties of a
12
13
(a)
(b) (c)
Figure 2.1: (a) The diTMS-BTBT molecule featuring two bulky tri-methyl side-
chains. (b) and (c) depict molecular arrangements of atoms inside periodic unit
cells of two experimentally identified polymorphs of diTMS-BTBT: (b) the al-
pha phase is the themodynamically metastable one, while (c) beta phase is the
thermodynamically stable polymorph. These polymorphs are characterized by
a six-dimensional vector x = {a, b, c, α, β, γ}where a, b, c, α, β, γ are lattice param-
eters whose values are shown in Table 2.1. Data is obtained from Chung et al. [3]
Periodic boundary conditions are used on the simulation cell to emulate the ex-
perimental situation by modeling an infinite crystal. Color key: Carbon in blue,
sulphur in yellow, silicon in gray, and hydrogen in white.
14
Lattice Constants Alpha Phase Beta Phase
a (Å) 7.55 6.35
b (Å) 10.10 11.36
c (Å) 13.59 14.31
α (°) 82.75 87.65
β (°) 82.17 90
γ (°) 88.52 90
Table 2.1: Lattice dimensions a, b, c, are given in Å; angles α, β, γ are given in
degrees.
tested candidate. Computationally, this task is also slow and tedious. It essen-
tially requires us to find the combination of candidate polymorphs (optimiz-
ing six dimensions of unit cell lengths and angles for a given temperature) that
leads to the lowest energy structure, which we identify as leading to the most
thermodynamically stable of the roughly 500 possible combinations of lattice
parameters. Without using some sort of accelerated search process, this would
necessarily have to involve a systematic search of all possible lattice parameters.
This is very wasteful in terms of time, effort, and resources. Although a number
of accelerated methods exist, and are popular, including random search [52,53],
quasirandom search [54, 55], simulated annealing [56], Monte Carlo parallel
tempering algorithms [57], basin hopping [58], minima hopping [59, 60], and
the evolutionary algorithm [61,62], finding the global energy minimum in large
complex systems remains a major challenge in the field of structure prediction.
In this study, we used a combination of Molecular Dynamics (MD) and
Bayesian Optimization (BayesOpt), the latter a well-established global opti-
mization technique, to narrow the search space to a tractable number of promis-
ing options without changing the fundamental nature of the problem [63]. Ma-
chine Learning models, like BayesOpt, are being increasingly applied, in combi-
nation with molecular simulation and other computational methods, as a means
to predict material properties [64,65], model interatomic potentials [66,67], pre-
dict crystal and lattice structures [68] and understand property-structure rela-
tionships in amorphous materials [69].
Closest to our study, a recent paper by Yamashita et al. [70] studied the effec-
tiveness of a BayesOpt-based approach for crystal structure prediction in two
systems, NaCl and Y2Co17 whose crystal structures were known. They demon-
15
strated that their BayesOpt method was able to identify the rock salt structure
as the most stable arrangement for NaCl in 30-40% fewer trials, compared to a
purely random search. In contrast to their work, we do not have a repository of
available crystal structures that diTMS-BTBT would like to adopt; instead, our
goal is to predict them a priori. A Bayesian paradigm works best for such cases.
It encodes our intuition with respect to the values that the unknown quantity
of interest is going to adopt in the so-called prior distribution, and then uses
this “prior,” together with training set information, to make predictions of the
unknowns. To our knowledge, this is the first time that a BayesOpt method has
been employed to predict the crystal structures of organic semiconductors and
can be straightforwardly extended to study any molecular systems in which the
existence of polymorphism is suspected to be a prevalent phenomenon.
2.2 Methods Employed
2.2.1 Molecular Dynamics
Molecular Dynamics (MD) is a computational technique to simulate the dynam-
ical evolution of atoms and molecules in space and time. The trajectories of
these interacting species are determined by numerically solving Newton’s equa-
tions of motion, where forces between the particles and potential energy are
described by molecular mechanics force fields. Solving these trajectories also
allows us to determine macroscopic thermodynamic properties of the system
based on the principle of ergodicity, which states: the statistical ensemble aver-
ages are equal to time averages of the system collected in a simulation. There-
16
fore, MD is a deterministic simulation technique where trajectories of the atoms
are completely determined given an initial set of atomic positions and veloci-
ties. This way, MD has emerged as a powerful tool for predicting macroscopic
properties of a system by allowing insight into molecular motion at an atomic
scale.
MD solves for Newton’s laws of equation, most notably:
Fi = miai (2.1)
for each atom (i) in a system of N atoms. Here, mi is the atom mass, ai its
acceleration, and Fi is the force acting upon the atom due to the interactions
with other atoms. In order to calculate the forces acting on the atoms, a suit-
able model representing potential energy as a function of atomic positions of
the form V(r1,r2,. . . rN) must be defined. This function is translationally and
rotationally invariant and is dependent on the relative position of atoms with
respect to each other. The functional forms of these potentials are provided by
previously published force field models. [71–73] Forces can then be derived as
gradients of the potential with respect to atomic displacements:
Fi = −∇riV(r1, ....., rN) (2.2)
Of the various available time integration algorithms in MD, the Velocity Verlet
algorithm is the most popularly used method. This algorithm derives the posi-
tion next in time by adding two third-order Taylor expansions for the positions
r(t), one forward and one backward in time as:
1 1
r(t + δt) = r(t) + v(t)∆t + a(t)∆t2 + s(t)∆t3 + O(∆t4) (2.3)
2 6
1 1
r(t − δt) = r(t) − v(t)∆t + a(t)∆t2 − s(t)∆t3 + O(∆t4) (2.4)
2 6
17
Eqs. 2.3 and 2.4 are added to get the most basic representation of Verlet algo-
rithm:
r(t + ∆t) = 2r(t) − r(t − ∆t) + a(t)∆t2 + O(∆t4) (2.5)
Here, a is the acceleration, v is velocity and s the third derivative of r with
respect to time, t. Acceleration is defined using Eq. 2.2 as:
−∆V(r(t))a(t) = (2.6)
m
Although MD is a powerful technique to simulate thermodynamic proper-
ties of bulk systems, it is limited in terms of the length and time scales that the
system can access due, essentially, to constraints imposed by the computational
resources to accommodate the tiny time steps in the simulation.
The way bulk crystals and very large systems are simulated in MD is by in-
voking the concept of “periodic boundary conditions” (PBC). In PBC, the sim-
ulation box is replicated indefinitely by rigid translation in all three Cartesian
directions. If a particle is located at position r in the box, then this particle as-
sumes positions at r + la + mb + nc, where l, m, n are integers and range from
-∞ to ∞, and a, b, c are the vectors associated with the edges of the box. All
these “image” particles move together, and only one is represented in the MD
simulations, thereby cutting down the computational cost while simulating an
infinite system. Here, each particle not only interacts with other atoms in the
current box but also with their images in nearby boxes. This way, PBC virtually
eliminates surface effects in the system and also makes the forces invariant to
the translation of the box with the positions of the particles.
18
2.2.2 Bayesian Optimization
Bayesian optimization is a powerful technique for solving the extrema of so-
called “black box” objective functions that are expensive to evaluate. This tech-
nique becomes especially useful when there is not a closed-form expression
for the objective function, but where one can evaluate this function at sample
points, possibly with noise. It is particularly advantageous when evaluation of
the objective function is costly, and its functional form is non-convex. It is one
of the most efficient approaches used in the Machine Learning community in
terms of the number of function evaluations required. [4, 74, 75] This is due to
its ability to incorporate prior belief about the objective problem, which helps
in directing future sampling, and for trading off between exploration and ex-
ploitation of the search space. It is built on “Bayes’ theorem”, which states that
the posterior probability of a model M, given data, E, is proportional to the like-
lihood of E given M times by the prior probability of M:
P(M|E) ∝ P(E|M)P(M) (2.7)
In Bayesian optimization, the prior probability encodes our belief about the
functional space of possible objective functions. It essentially encodes prior
knowledge on properties like function smoothness, which makes some poten-
tial objective functions more plausible than others.
Let f(xi) be the observation of the objective function at sample point, xi. As
more data is collected, D1:t = {x1:t, f(x1:t} (where t denotes sequences of data), the
likelihood function becomes P(D1:t|f ). The likelihood function asks, given our
prior, how likely is the data we have observed? If our prior belief encodes that
our objective function is a smoothly varying function with no noise, then the
19
observations with high variance and oscillations are considered less likely than
those which narrowly deviate from the mean. Next, we can combine our prior
and likelihood function to calculate our posterior as:
P( f |D1:t) ∝ P(D1:t| f )P( f ) (2.8)
The posterior updates our beliefs about the unknown objective function.
Next, to efficiently direct future sampling, Bayesian optimization uses a so-
called “acquisition function” to determine the location of xt+1 to sample next. It
determines this point in sample space through an intelligent trade-off between
exploration (where the objective function is very uncertain) and exploitation
(where the objective function is expected to be high (or low), depending on
whether the algorithm is maximizing or minimizing the objective). This trade-
off, along with the prior feature of Bayesian optimization techniques, aims to
reduce the number of evaluations of the objective function. Moreover, this also
assists in setting when the objective function is non-convex with multiple local
maxima or minima. More details of the prior, posterior, and acquisition function
are discussed in section 2.3.
2.3 System and Workflow Details
In this study, the objective function aims to identify sets of unit cell characteris-
tics (optimizing six dimensions of unit cell lengths and angles) that lead to the
lowest energy of the simulation cell, (y). Traditionally, each combination of unit
cell characteristics is subjected to an energy minimization simulation run fol-
lowed by a long NVT run. The energy corresponding to each candidate choice
20
is then collected to generate an energy landscape, whose minima typically cor-
respond to lower energy or stable polymorphs.
Figure 2.2 describes the overall iterative optimization process. Step 1 in-
cludes generating training set data. Step 2 performs MD simulations on the
training set. Step 3 models the total energy function using Gaussian process
regression utilizing the data acquired thus far and generates predicted energy
distribution. Step 4 recommends the next design setting to test (x) by optimizing
an “acquisition function” derived from the Gaussian process. Simulations are
performed on the suggested new material design, and the corresponding total
energy value (y) is collected. This new observation pair, (x, y), is appended to
the data set. Steps 2-4 are repeated in an iterative loop until the uncertainties in
the energy fall below pre-determined, user-defined values. Steps 1, 3, and 4 are
described in the following sections.
Step 1: Training Set Generation
As noted above, design of the diTMS-BTBT crystal unit is parameterized by a
6-dimensional vector x = [a, b, c, α, β, γ]. We constructed the computational de-
scription of training set unit cells by randomly selecting values for a, b, c, α, β, γ.
We constrained the values of these six lattice parameters to stay within the
range of the lattice parameters of the two experimentally known polymorphs of
diTMS-BTBT, as shown in Table 2.1, while also allowing a ± 2 units “safety” in-
terval in case the computational predictions fell outside the experimental ones.
Note that these constraints only help aid the speed of the computational search,
but are not necessary to find the characteristics of the polymorphs.
21
22
Start
Step 1: Gener-
ate training set:
x = [x1, x2, x3. . . ..xn]
Step 2: Conduct
MD simulations
xi, Ei
Step 3: Perform Gaus-
sian process regression
Energy distribution
predicted on the test set
≥ E Step 4: Acquisition
Errors in prediction error function suggests next
best design to test, xk
< Eerror Bayesian Model
Stop
Figure 2.2: Flowchart describing the process that combines MD and BayesOpt
to make a predictive distribution of the energy for each candidate structure. The
algorithm starts by randomly creating a set of input parameters, x. Structures
characterized by these inputs, x, are then sent as input to the MD simulations.
The output from these MD runs, in the form of the total energy (of the training
set), is used to train the Bayesian model, which calculates the posterior distribu-
tion for the energy function. If the errors in the prediction are less than Eerror (a
user-defined value), this process stops. Otherwise, this predictive distribution
suggests the next best candidate to test based on a criterion known as “expected
improvement.” [4]
To determine the location of any other possible polymorphs, we scanned
the energy landscape as we varied the lattice constants a, b, c, α, β, γ in a non-
systematic manner as dictated by BayesOpt. We explored the effect of varying
these six parameters without assuming any prior knowledge of the locations of
the polymorphs. A total of about 500 combinations of test and training points
were generated: We used 100 points to train the model to be able to make pre-
dictions for the remaining 400 points. Training unit cell candidates, each with
one choice of [a, b, c, α, β, γ] parameters, were submitted to an energy minimiza-
tion calculation and an MD simulation using NVT ensemble (Step 2) to train
the Bayesian model. The total energy of each MD-generated structure, and the
corresponding unit cell structures, were then used as a training set to train our
Bayesian model.
Step 2: Molecular Dynamics Simulations
In our a priori prediction of the structure of polymorphs of organic crystals, our
approach essentially involved providing a suitable inter-and intra-molecular
potential for diTMS-BTBT molecules and then using this potential (force field)
to predict the energy of the system as the dimensions of the unit cell are var-
ied (unsystematically) under the control of the BayesOpt method in order to
discover low-energy structures. We modeled diTMS-BTBT molecules using
Allinger’s MM3 force field field [76] since we have found MM3 to work well as
a model of small organic semiconducting molecules. [1] Indeed, that paper fea-
tured MD simulations performed using an open-source software package called
TINKER [77] to predict polymorphs of TIPS-pentacene, but without using an
accelerated search method. The MM3 force field is not available as a force field
23
option in LAMMPS [78], so we again used TINKER for our test case here. We
employed a canonical (NVT) ensemble, in which the number of molecules N,
volume V, and temperature (set at T = 428 K) were kept constant. This temper-
ature was chosen since experiments showed that both the α and β polymorphs,
and the transition between them, occur at ∼428 K. Periodic boundary conditions
were used in all three Cartesian directions. A unit cell comprised of 96 atoms
connected across the periodic boundaries to approximate modeling an infinite
crystal and hence emulate the larger scale of experimental studies. All the MD
simulations were run for 5 ns with a time step of 1 fs (5 million iterations). Each
simulation was allowed to equilibrate for 2 ns, while production data were col-
lected from the last 3 ns. This length of time for data collection was sufficient to
reduce the error in the energy of the unit cell to approximately ±0.5 kcal/mol,
which is small in comparison to a typical magnitude of the energy of the unit
cell relevant to our study (∼ 270 kcal/mole).
Bayesian Optimization
Step 3: Gaussian Process Regression
Here, the unknown quantity of interest is the total energy value associated with
each polymorph candidate. Since energy is a function of the spatial arrange-
ment of atoms inside the crystal cell, we denote it as f (x), corresponding to a
design parameter, x. The goal here is to find a set of designs, x, for which f (x)
is small, which is essentially our objective function. However, in reality, our
observations are often obscured by noise. Therefore, we extend our objective
24
function to model observations of the form:
y(x) = f (x) +  (2.9)
where the error ε is normally distributed, with mean 0 and variance λ2. λ2 is
treated as one of the hyperparameters, and is learned adaptively in the process.
BayesOpt is an excellent tool to optimize expensive and complex objective
functions and performs inference by placing a prior probability distribution on
unknown quantities of interest. This prior distribution encodes our intuition or
domain expertise regarding the most likely values that the unknown quantity of
interest will adopt. A common method to model an unknown function, y (used
in this work), is using a Gaussian process as a prior, that is, y ∼ GP(0,K(x, x)),
such that the prior probability of function values (y1, ......., yn) is a multivariate
Gaussian as given in Eq. 2.10: ( )
| | −1p(y ) = 2πK exp yT −11:n 1:n K y1:n (2.10)2
where y1:n are the energy values computed via MD simulations and corre-
sponding to the design parameters, x1:n. K is a “kernel” function that measures
the similarity between two data points based on their locations in the parameter
space. It should be able to encode the belief that points xi and x j, near to each
other, must have similar values of f (xi) and f (x j), and vice versa. We chose a
Squared Exponential kernel as it performs well in modeling smoothly varying
functions. It takes the form shown in Eq. 2.11 and is employed in this work to
compute the distance/similarity between data points.

k(xi, x j) = φ2 exp ∑d (x 2i,k − x j,k)  (2.11)
k 1 2l
2 
= k
25
where lk is the characteristic length scale associated with each dimension
of input x and defines the region of influence of a point within the parame-
ter space. φ2 is the output variance and controls the expected deviation of the
function from its average value. This way, our kernel is parameterized by d + 1
hyperparameters [φ2, l1, ...., ld].
Next, we compute the inference or “posterior distribution” on the unknown
target values yn+1 at test points xn+1. We do this by employing Bayes’ rule on the
observed data and prior distribution function [79] such that it takes the form
shown in Eq.2.12.
( )
y 2n+1|y1:n ∼ N µn+1(xn+1| x1:n, y1:n) σn+1(xn+1| x1:n, y1:n)
µn+1 = kT (K + λ2I)−1 y1:n
(2.12)
σ2 T 2 −1n+1 = k(xn+1, xn+1) − k (K + λ I)
k = [k(xn+1, x1), .......k(xn+1, xn+1)]
From this calculation, we obtain a posterior mean (the prediction), called µn+1,
and a posterior standard deviation, σn+1.
Step 4: Acquisition function
In order to improve the quality of the prediction, BayesOpt guides the choice
of experiments and suggests a new design parameter (x) to test using an acqui-
sition function. This function trades-off exploitation of the high predicted mean
with exploration using a high predicted variance. We have used the well known
Expected Improvement method [80] as our acquisition function. It defines the
improvement over the current best value using the form shown below:
26
[
f (x) − f ∗ ( ) ( )]n f (x) − f ∗ f (x) − f ∗EI(x) = σ(x) φ n n+ + ϕ (2.13)
σ(x) σ(x) σ(x)
where f ∗n is the best observed value, f (x) is the predicted value, and φ and ϕ
are standard Gaussian cumulative distribution functions and probability distri-
bution function, respectively.
We used a BayesOpt-based pySMAC optimizer [81] to maximize the acqui-
sition function such that the resulting maximum is the next best recommended
simulation setting.
2.4 Results
2.4.1 Polymorph Prediction
Our overarching task is to determine how to identify the more stable poly-
morphs. We begin with an investigation of predicting the energies of various
posited structures in order of decreasing energy. Figure 2.3 shows the result of
our predicted energy distributions sorted in order of increasingly higher energy
for candidate structures of diTMS-BTBT. Each point on the x-axis represents
some combination of a, b, c, α, β, γ and its corresponding energy value is plot-
ted on the y-axis. We start with ∼ 20 training points and iterate our model 100
times, optimizing hyperparameters after every 20 iterations. At each of these
100 iterations, we include a new point, as suggested by the acquisition function.
Our model made predictions for all the ∼500 candidate points using a small
training set of ∼120 candidate structures (i.e., running only 120 expensive and
27
time-consuming simulations). Taking into consideration that we prepared an
exhaustive space of as many as 500 structures, this result can be seen to be very
efficient, finding the optimal solution using less than a quarter of the search
space.
We used estimates of the standard deviation in our results for predicting
energy values associated with the design parameters to provide us with a mea-
sure of the accuracy of our results. In Figure 2.4, we plot a histogram to study
which values of uncertainties in the predicted energies are more likely to oc-
cur. Our analysis revealed that the majority of the standard deviations in the
energy lie roughly within the range of 1-2 kcal/mole, although a few percent
exhibit standard deviations of ∼3 kcal/mol. Deviations of just 1-2 kcal/mole
are comparable to intrinsic errors within MD itself and are small in comparison
to the magnitude of total energies generated from MD relevant to this system.
This was a satisfactory and encouraging test of the accuracy of the BayesOpt-
generated predictions.
As a second test to validate our results, we performed a standard “leave one
out” or “n-fold” cross-validation. In this technique, we iterate through all the
data points, x1:t, y1:t, and for each i  {1, ..., t}, we train our BayesOpt model on all
of the data points except xi, yi, and then use this information, together with xi, to
predict what the value of yi should be. Hyperparameters were then re-estimated
and optimized each time, leaving out the data point (xi,yi). Figure 2.5 is plotted
with actual energy values on the x-axis, and predicted values (with error bars)
on the y-axis. In order to make the comparison between the actual and predicted
values, we ran MD simulations on all the data points to get the actual energies.
Figure 2.5(a) shows that the fit between actual and predicted points is excellent,
28
29
290 Training Points
BO
285
280
275
270
265
0 100 200 300 400 500
All Possible Combinations: x
Figure 2.3: Predicted distribution of the total energy for all possible candidates
of diTMS-BTBT polymorphs. The solid blue line represents energy values pre-
dicted using the BO. The black squares indicate candidate points that were used
to train the model. The blue shaded region represents one standard deviation
above and below the BayesOpt-predicted values.
1.0
0.8
0.6
0.4
0.2
0.00 1 2 3 4
Standard Deviation(Kcal/mol)
Figure 2.4: Probability distribution of the standard deviations observed in the
values of energy for diTMS-BTBT polymorphs shown in Figure 1.3. The sum of
all the bar heights equals 1.0
Total Energy (Kcal/mol)
Probability
30
290 R2 = 0.9908
285 RMS = 0.754 kcal/molMAE = 0.528 kcal/mol
280
275
270
265 n-fold
265 270 275 280 285 290
Actual from MD runs (Kcal/mol)
(a)
290 R2 = 0.9892
285 RMS = 0.812 kcal/molMAE = 0.579 kcal/mol
280
275
270
265 4-fold
265 270 275 280 285 290
Actual from MD runs (Kcal/mol)
(b)
Figure 2.5: Diagnostic plots obtained via (a) ‘leave one out or n-fold’ and (b) ‘4-fold’
cross-validation. The x-axis contains the energy values directly obtained from
MD simulations. The y-axis denotes values predicted from the Bayesian model.
The vertical bars indicate the error (one standard deviation) in predicting the
energy values. The correspondence of actual to predicted energies shows that
our model works well, with a low root mean squared error, around 0.75 - 0.80
kcal/mol and mean absolute error less than 1 kcal/mol.
Predicted from BO (Kcal/mol) Predicted from BO (Kcal/mol)
with very small values for the root mean squared (RMS) error (0.75 kcal/mol)
and mean absolute error (MAE) (0.53 kcal/mole). The R2 value, between actual
and predicted curves, is an encouraging ∼ 0.97.
We also performed “S-fold” cross-validation with S=4, see Figure 2.5(b)). It
involves taking the available data and partitioning it into S groups (in the sim-
plest case, these are of equal size). Then S-1 of the groups is used to train a set of
models that are then evaluated on the remaining group. This procedure is then
repeated for all S possible choices for the held-out group, and the performance
scores from the S runs are then averaged. S-fold cross-validation is a less biased
way of estimating how the model is performing in making predictions on data
(not used during the training of the model) using a limited sample set.
2.4.2 Identifying Polymorphs
Our model predicts the existence of a vast number of possible crystal-packing
arrangements following a comprehensive sampling of different unit cell sizes in
six-dimensional space. Our algorithm suggests that most of these crystals are
roughly within 1 kcal/mol of each other. This is due to sampling all energet-
ically relevant conformational isomers for the flexible molecule, diTMS-BTBT.
To isolate groups of structurally similar polymorphs, we used a clustering al-
gorithm called “k-means clustering,” which is one of the most popularly used
unsupervised machine learning algorithms. [82, 83] This approach provides a
way to group similar data points together and discover underlying patterns in
the data. This grouping of structurally similar crystals essentially represents an
ensemble of candidates that collectively signify a “polymorph.” That is, many
31
candidates may be so structurally similar that they cannot be separated experi-
mentally.
Unsupervised machine learning methods draw inferences from the data
sets by using information from only the input vectors and without referring
to known, or “labelled,” outcomes. This is what we wanted to achieve in our
system, namely, to make groupings of polymorphs that were structurally simi-
lar without taking into account the associated energy values. k-means cluster-
ing achieves this by looking for a fixed number (k) of clusters in the data set.
Here, a cluster refers to a collection of crystal candidates aggregated together
because of their structural similarities. k-means clustering is a vector quanti-
zation method [84] that partitions n data points into k clusters, where each data
point gets assigned to the cluster with the nearest mean (cluster centroid). These
centroids serve as prototypes of the cluster. This partitioning results in the seg-
regation of the data set into Voronoi cells. These Voronoi cells depict regions in
the data plane that are partitioned on the basis of the similarity between each
region. k-means clustering takes k as input, which refers to the number of cen-
troids needed in the dataset. It then minimizes within-cluster variances (as the
sum of squared Euclidean distances between each member of the cluster and its
centroid) as its “cost function” and allocates each data point to the nearest cen-
troid. In other words, the k-means algorithm identifies k number of centroids in
the data space, and then allocates every data point to the nearest cluster, while
keeping the variances within the cluster as small as possible.
One of the disadvantages of k-means clustering is that the number of clusters
or centroids (k) needs to be supplied beforehand. One way to solve this problem
is to visualize the data and guess (visually) at the data points in order to see
32
cluster patterns. But since, in most cases, real-world data are typically multi-
dimensional, it becomes difficult to visualize this and, as a result, the optimum
number of clusters is no longer an obvious choice. A more mathematical way to
approach this problem is to plot the relationship between the number of clusters
with respect to its cost function. This is called a “Within Cluster Sum of Squares”
(WCSS). It allows us to select the most suitable value of the number of clusters
as being given by the point on the plot at which the change in WCSS begins to
level off. This method is called “elbow method” due to the typical shape of this
graph.
We use the elbow method to determine the optimal number of clusters for
our system. Figure 2.6(a) shows that the cost function starts to level off as the
number of clusters increases from 2. When we changed the cluster value from
1 to 2, the cost function value reduced very sharply. This decrease in the sum of
squares reduces and eventually reaches “equilibrium” as we increase the num-
ber of clusters from 4 onwards. The cluster value where this change in cost
function becomes constant is typically chosen as the appropriate cluster value.
For our system, this becomes a choice between 2 and 3 as the correct number of
clusters.
As a second test, we used a method called a “silhouette score,” based on the
“silhouette coefficient” as a means to justify our choice of the number of clus-
ters. [85] The silhouette coefficient measures the separation distance between
the resulting clusters. It is calculated using the average distance between each
data point i and all the other data points in the cluster to which the data point i
belongs, represented as (a). The mean distance between i and the nearest cluster
that the point i is not a part of is denoted as (b). The silhouette coefficient is then
33
34
16000
14000
12000
10000
8000
6000
4000
2000
1 2 3 4 5 6
Number of clusters
(a)
2
1
0.0 0.2 0.4 0.6 0.8 1.0
3
2
1
0.0 0.2 0.4 0.6 0.8 1.0
Silhouette Coefficient Values
(b)
Figure 2.6: (a) Use of the elbow method to determine the appropriate number
of clusters in the data set. The plot shows the variation within the cluster as
a function of the number of clusters. The elbow of the curve is located at the
point where the variation starts to level off and represents the optimal number
of clusters to use. (b) “Silhouette analysis” is used to measure how similar a
data point is to its own cluster (“cohesion”) compared to other clusters (“sepa-
ration”). Black dotted lines show the average silhouette score.
Cluster Labels Cluster Labels Sum of Squared Distance
calculated as:
(b − a)
(2.14)
max(a, b)
The silhouette plot provides a quantitative measure of how close data points
in one cluster are to points in the neighboring clusters—the value of the coeffi-
cient ranges from -1 to 1. A coefficient value of near +1 indicates that the data
point in that cluster is far away from the neighboring clusters. A value close
to 0 indicates that the data point is on, or very near to, the decision boundary
between the neighboring clusters. Negative values indicate that the data point
has been assigned to the wrong cluster.
The elbow method suggested that the best choice for the optimum number
of clusters in our study should be either 2 or 3. We conducted a silhouette anal-
ysis to investigate this observation further. Figure 2.6(b) shows that, for cluster
size 2, the silhouette analysis is unequivocal in clustering the data. This is mani-
fested due to the high silhouette score of 0.71 (shown as a black dotted line in the
figure) and the presence of both clusters above this average score. The thickness
of the silhouette plot can be used to visualize the cluster size. If we choose the
number of clusters to be 2, our analysis shows that cluster number 1 is roughly
twice the size of cluster 2. We hypothesize here that cluster 1 is composed of a
set of mini-clusters.
We extended our silhouette analysis to test our hypothesis but now using
the number of clusters as 3, as suggested by the elbow method. This analysis
reveals interesting observations. Firstly, the average silhouette score is lower,
around 0.45. Since this score is greater than zero (which is the mid-point of the
35
range of values that a silhouette score can take) and roughly lies in the middle
of 0 to 1, it suggests that the two clusters are different, albeit with slight dif-
ferences in their configuration. Secondly, the coefficient values associated with
each cluster lie beyond the average silhouette score. These two observations
justify the choice of 3 as the number of clusters in our system. And lastly, we
observe that the division into two clusters comes by splitting cluster number 1
in the top sub-figure of Figure 2.6, thus validating our hypothesis. This obser-
vation suggests that there are two distinct set of polymorphs, with one set split
into two derivatives.
Next, we used a different method, a t-distributed Stochastic Neighbor Em-
bedding (t-SNE) method, to visualize the clusters found by the k-means clus-
tering algorithm. [86] t-SNE is a powerful statistical tool to visualize high-
dimensional data. It is a non-linear dimensionality-reduction technique that
models the similarity between the points to joint probabilities to finds low-
dimensional embedding of high-dimensional data. It tries to minimize the
Kullback-Leibler divergence [87] between the joint probabilities of the low-
dimensional embedding and the actual high-dimensional data. In other words,
t-SNE models high-dimensional data to a low-dimensional embedding in a way
that similar objects are modeled by neighboring points, and dissimilar objects
are modeled by distant points with high probability. Kullback-Leibler diver-
gence is also called relative entropy and measures the difference between two
probability distributions. t-SNE’s cost function is not convex, and so it yields
different results for different initializations.
t-SNE has a tuneable parameter called “perplexity,” which balances the at-
tention given between local and global aspects of the data. The perplexity value
36
37
Perplexity 2 Perplexity 5 Perplexity 10
100 100400
50
200
0
0
0
50
200
100
100
400
500 0 500 100 0 100 0 100
Perplexity 30 Perplexity 50 Perplexity 100
40
40 7.5
5.0
20
20
2.5
0 0
0.0
20 20 2.5
50 0 0 20 0 20
Figure 2.7: Student’s t-distributed stochastic neighbor embedding (t-SNE) of
our six-dimensional data represented on a two-dimensional plane for different
values of perplexities. Gray dots represent candidate crystal structures, red and
yellow dots, respectively, denote the thermodynamically metastable alpha and
thermodynamically stable beta phases.
38
Alpha phase
Beta phase
40
20
0
20
40 30 20 10 0 10 20 30
t-SNE1
(a)
Alpha phase
Beta phase
40
20
0
20
40 30 20 10 0 10 20 30
t-SNE1
(b)
Figure 2.8: t-SNE low-dimensional embedding of our six-dimensional data,
color coded using the same scheme applied to clusters discovered by k-means
clustering algorithm: (a) with two clusters and (b) with three clusters.
t-SNE2 t-SNE2
has a complex effect on the resulting embedding in low-dimensional space. It
influences the variance of the distribution and is essentially the number of near-
est neighbors for each data point. The original t-SNE paper [86] recommends
perplexity values between 5 and 50. We conducted a sensitivity analysis for dif-
ferent values of perplexity and plotted the resulting t-SNE distribution in Figure
2.7. For perplexity values 2, 5, and 10, local variations dominate. For values out
of range like 100, the analysis shows merged clusters. Values between 30-50
show typical cluster behavior. For our system, a perplexity value of 30 gave
consistent results, so we chose that number for our future calculations. In Fig-
ure 2.8, we plot clusters calculated by k-means clustering algorithm using the
number of clusters values as 2 and 3 on two dimensional embedding generated
by the t-SNE method. Appendix A shows a three dimensional embedding of the
data with number of clusters value as three as calculated by k-means clustering
algorithm.
Each cluster indicates an ensemble of polymorphs that they represent. Gen-
erally, the typical way of thinking about polymorphs is that there are a small
handful of polymorphs for a given system, namely the ones that are found by
experiments. But the simulations tell us that changing the value of one or all of
the unit cell parameters by small amounts produces a set of candidates that are
very close in energy to one another. Basically, we need to rethink the prevalent
conception of polymorphs in terms of the energy landscape for a given materi-
als system. There are not just a few relatively deep and distinguishable “energy
wells” in the landscape, there are many wells that are similar and may be easily
accessible kinetically, for example, using shear in the study by Diao et al. on a
similar organic semiconductor system. [1] Ultimately, experiments are limited
by the resolution of their instruments and, while an experimental resolution of
39
50 meV or so (¡1.0 kcal/mol) is impressive, the MD simulations are capable of
providing data with much smaller energy differences that we are able to deter-
mine the energy of all possible choices of unit cell parameters. In the following
sections, we discuss the correlation of polymorphs in terms of their energy and
lattice constants.
2.4.3 Correlation with Energy
In order to examine the range of values that the total energy can take for com-
binations of unit cell parameters (a, b, c, α, β, γ) in diTMS-BTBT, we plotted a
heat map distribution showing the probabilities of total energy values on the
low-dimensional embedding generated by t-SNE technique in Figure 2.9. We
note that there is a clear preference for lower energy values in cluster number
1, which is shown in orange in Figure 2.8(b). This cluster also contains the ex-
perimentally identified beta polymorph, which is thermodynamically the more
stable one of the two found experimentally. Next, cluster number 2, shown as
blue in Figure 2.8(b), exhibits a stronger preference for higher energy values.
This group of polymorphs contains the experimentally identified alpha poly-
morph, which is thermodynamically metastable. If we identify the number of
clusters as three in the k-means clustering algorithm, Figure 2.8 suggests that
a third polymorph shown in green might exist as a derivative of the original
cluster 1.
We observe in Figure 2.9 that this cluster, although structurally similar to
cluster number 1, tends to have a wider range of energy values. To further probe
into the energy distribution associated with each cluster of structures, we plotted
40
histograms of energy values in Figures 2.10 and 2.11, assuming the number of
clusters to be two or three, respectively. These histograms indicate how likely
are the energy values for a particular set of candidate structures. The histograms
are color-coded using the same scheme utilized in Figure 2.8 to isolate the clus-
ters. In Figure 2.10, we note that for cluster 1 (orange), our model predicts the
experimentally identified beta polymorph to be in the lowest energy bin. This
result matches with the experimental finding, which determined the beta poly-
morph to be the stable one. The graph also shows that the distribution peaks at
an energy value of ∼ 267 kcal/mole. Since the dominant forces of interactions in
organic crystals are weak non-bonded forces (van der Waals and electrostatic),
we attribute this observation to the possibility of various kinetically accessible
conformations of the crystal, leading to a complex polymorphic landscape with
multiple structures located within a very small energy window.
For the second cluster (blue), the majority of the structures align with the
energy value of the metastable alpha polymorph, as depicted by the energy
peaks. In Figure 2.11, where we chose the number of clusters to be three, we note
that the peak of the distributions lie at roughly ∼ 265, 267, and 270 kcal/mol.
This suggests that the green cluster, which we believe is the derivative of the
original cluster 1 (shown as orange in Figure 2.8(a)) is not only structurally close
to the group 1, but also exhibits similar energy values albeit spread over a wider
range (265-275 kcal/mole). And that the three peaks are within 2-3 kcal/mol of
each other. We suspect that, due to these observations, this third polymorph is
probably not observable experimentally.
41
42
290
Alpha phase
Beta phase
40 285
20 280
275
0
270
20
265
40 30 20 10 0 10 20 30
t-SNE1
Figure 2.9: Energy distribution in kcal/mole for candidate structures plotted
using t-SNE for two-dimensional embedding.
t-SNE2
43
0.20
Beta Phase
0.15
0.10
0.05
255 260 265 270 275 280 285 290 295
0.20
Alpha Phase
0.15
0.10
0.05
255 260 265 270 275 280 285 290 295
Energy (Kcal/mol)
Figure 2.10: Probability density of total energies for the case in which we use the
number of clusters to be 2 in the k-means clustering algorithm. The top figure
shows results for the cluster (polymorph) 1 and the bottom shows that for the
cluster (polymorph) 2.
Probability Density
44
0.20
Beta Phase
0.15
0.10
0.05
255 260 265 270 275 280 285 290 295
0.20
0.15
0.10
0.05
255 260 265 270 275 280 285 290 295
0.20
Alpha Phase
0.15
0.10
0.05
255 260 265 270 275 280 285 290 295
Energy (Kcal/mol)
Figure 2.11: Probability density of total energies for the case in which we use
the number of clusters as 3 in our k-means clustering algorithm. Top, middle,
and bottom figures show results for the derivative of polymorph 1, as well as
for polymorphs 1 and polymorph 2, respectively.
Probability Density
2.4.4 Correlation among Lattice Constants
Each of the points in a given cluster has a set of lattice parameters associated
with it. We now analyze these parameters in order to identify any preferential
(commonly occurring) lattice parameters in a given cluster. In a sense, this can
be thought of as either the “degeneracy” of preferred lattice constants or as a
measure of the “stiffness” of the parameters, using Sethna et al.’s terminology
for parameters. [88] This analysis is shown in Figures 2.12 and 2.13 given that
the number of clusters is 2 or 3, respectively. From Figure 2.12’s depiction of
the distributions of unit cell parameters, it is clear that there are, indeed, highly
stratified and energetically preferred unit cell parameters.
For the case identified as having two clusters, we conclude that group 1
(shown in orange) which corresponds to having the lower energy values, the
lattice parameters prefer to have values for a in the range of 12.2 − 14.7 Å, b
between 5.8 − 7.1 Å, and c in the range 10.6 − 11.9 Å. While these ranges in the
individual parameters would be considered to be broad from an experimental
viewpoint, there are clearly only certain combinations of a, b, c parameters that
lead to low-energy candidates. Group 2, which associates with the metastable
polymorph, prefers to have values for a in the range 6.9 − 8.1 Å, b between
9.3 − 10.5 Å, and c in the range 10.6 − 11.2 Å.
Using the number of clusters as three, we see in Figure 2.13 that the lattice
constant values for group 1 (orange) and its derivative (green) lie in similar
ranges. This validates our analysis using a k-means clustering algorithm which
suggested these two clusters are structurally close. We notice particular differ-
ences in the lattice length a and angles α and β. For the derivative polymorph
(green), a values range from 12.2 − 14.8 Å while for the original polymorph (or-
45
ange) that contains the experimental beta polymorph, prefers values of a clus-
tered around 14Å. Similarly, for the angle α, the derivative polymorph adopts a
range of values from 83− 88 degrees, while for the original one, the values align
around 89 degrees. For angle β, the derivative polymorph has values between
86 − 89 degrees and the original one ranges from 89 − 91 degrees.
The overall result here is to recommend to researchers that they constrain
their domain search for other polymorphs within the parameter ranges iden-
tified above to look for the presence of other thermodynamically stable or
metastable polymorphs. We also draw readers’ attention to Herbstein’s re-
view [89], which warns that large discrepancies can be involved in measuring
lattice lengths and angles. These discrepancies can be associated with differ-
ences in approach taken by different experimental techniques and difficulties
involved in low-temperature measurements.
In contrast to the relatively wide range in lattice lengths, the unit cell angles,
α, β, and γ that correspond to low energies prefer to adopt values within a rela-
tively narrow range, between 84 − 90 degrees for α, 89 − 92 for β and 88 − 90 for
γ. In Sethna’s parlance [88], this would suggest that the energy of the candidate
lattice is “stiffer” in, i.e., is more sensitive to, γ than either α or β. Lenn et al.
found a similar pattern for TIPS-pentacene. [1]
The experimentally observed values of all six parameters lie exactly
on our BayesOpt-predicted results that correspond to the lowest ener-
gies. Experiments show that the stable beta phase has lattice constants
{14.28, 6.29, 11.15, 90.00, 91.47, 90.00}, while the metastable β phase is char-
acterized by {7.43, 10.04, 13.53, 82.83, 82.08, 88.63}; this is indicated in Figures
2.12 and 2.13.
46
47
1.0 1.0 1.0
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0.0 0.0 0.0
6 8 10 12 14 16 6 8 10 10 11 12 13 14 15
1.0 a(Å) 1.0 b(Å) 1.0 c(Å)
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0.0 0.0 0.0
85 90 85 90 85 90
(°) (°) (°)
(a)
1.0 1.0 1.0
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0.0 0.0 0.0
6 8 10 12 14 16 6 8 10 10 11 12 13 14 15
1.0 a(Å) 1.0 b(Å) 1.0 c(Å)
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0.0 0.0 0.0
85 90 85 90 85 90
(°) (°) (°)
(b)
Figure 2.12: Distribution of lattice constants of polymorphs found by the k-
means clustering algorithm using the number of clusters as two. (a) Cluster
1 (top subfigure) contains the experimentally found stable beta phase, and (b)
cluster 2 (bottom subfigure) contains the experimentally found metastable alpha
phase.
Probability Probability Probability Probability
48
1.0 1.0 1.0
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0.0 0.0 0.0
6 8 10 12 14 16 6 8 10 10 11 12 13 14 15
1.0 a(Å) 1.0 b(Å) 1.0 c(Å)
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0.0 0.0 0.0
85 90 85 90 85 90
(°) (°) (°)
(a)
1.0 1.0 1.0
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0.0 0.0 0.0
6 8 10 12 14 16 6 8 10 10 11 12 13 14 15
1.0 a(Å) 1.0 b(Å) 1.0 c(Å)
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0.0 0.0 0.0
85 90 85 90 85 90
(°) (°) (°)
(b)
Probability Probability Probability Probability
49
1.0 1.0 1.0
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0.0 0.0 0.0
6 8 10 12 14 16 6 8 10 10 11 12 13 14 15
1.0 a(Å) 1.0 b(Å) 1.0 c(Å)
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4
0.2 0.2 0.2
0.0 0.0 0.0
85 90 85 90 85 90
(°) (°) (°)
(c)
Figure 2.13: Distribution of lattice constants of polymorphs found by the k-
means clustering algorithm using the number of clusters as three. (a) Deriva-
tive polymorph (top subfigure), (b) cluster 1 (middle subfigure) that contains
the experimentally found stable beta phase and (c) cluster 2 (bottom subfigure)
contains the experimentally found metastable alpha phase.
Probability Probability
2.5 Conclusions
We have developed an algorithm based on a Bayesian Optimization technique
that we have shown is capable of accurately predicting low-energy crystal struc-
tures that are likely to be thermodynamically stable or metastable, using an ef-
ficient machine learning-based search. In our test case, for diTMS-BTBT, our
approach was able to select viable potential candidates from hundreds of po-
tential structures by considering under a quarter of them. This demonstrated
that a BayesOpt approach can significantly reduce the number of trials of can-
didate materials, leading to a greatly reduced computational cost. Running a
sequential and comprehensive search on complex systems, like the one studied
in this work, would be close to intractable computationally or experimentally.
It also has the significant advantage of being self-directed without user inter-
vention or monitoring. In addition, our BayesOpt model is highly scalable and
can lend itself to further development (such as the consideration of multiple
information sources or multiple objective functions).
From the point of view of the code’s ability to accurately predict thermody-
namically stable and metastable polymorphs, we showed the accuracy of the
method to be within the uncertainty of MD simulations (around 1-2 kcal/mol)
and sufficiently accurate to identify promising candidates for experimental in-
vestigation. We observed that the lattice parameters fell into narrow ranges
that would be preferred by stable polymorphs, rather than metastable or un-
stable combinations of unit cell parameters. As a result, our machine learning
approach was able to predict unit cell parameters of both polymorphs that had
been identified experimentally. The lattice parameter values we predicted for
these two polymorphs using our accelerated search were in excellent alignment
50
with those found experimentally. In addition to the two polymorphs identified
experimentally, we suggest that a third polymorph as a derivative of the beta
phase might exist. But stabilizing it would require higher temperatures with
slower heating-cooling cycles. Finally, it is worth remembering that, while ex-
periments only identified two polymorphs, the computations have shown again
(as for TIPS-pentacene, another organic semiconductor with a π-π core) that a
“forest” of other kinetically stabilized polymorphs exist, separated by energies
well below kT. While these are unlikely to be identifiable experimentally, this
reminder of the richness of the situation may help us to rethink our views on
polymorphic characteristics.
51
CHAPTER 3
MOLECULAR DESIGN RULES FOR ACCESSING DIFFERENT
POLYMORPHIC TRANSITION MECHANISMS IN ORGANIC
ELECTRONIC CRYSTALS
The following chapter is based on a paper published in the Journal of Chemistry of
Materials. [3]
3.1 Introduction
Polymorphism is a condition in which a compound can adopt multiple crys-
talline structural forms. Due to the differences in the structural packing of the
polymorphs, these materials exhibit different physical properties. [90] In the
case of organic semiconductors, polymorphs can have different charge carrier
mobilities varying in several orders of magnitude. [2] This has attracted con-
siderable research interest in the scientific community, [1, 26, 27, 91] especially
to understand two major aspects of polymorphism in organic electronics: (a) to
exploit this phenomenon as a way to enhance the device performance without
changing the chemical structure and (b) to understand the fundamental rela-
tionships between charge transport and crystal structure. However, since poly-
morphs are often prone to structural reversibility, specific access and isolation
of these structures in a consistent manner for large-area production presents a
great challenge. Towards this aim, a molecular level understanding of the poly-
morphic transition mechanism and pathways is essential, but is rarely studied.
Two distinct polymorphic transitions have been studied before, known as
the ’nucleation and growth’ and ’cooperative’ transitions. The ’nucleation and
52
growth’ transition transforms in a molecule-by-molecule fashion [92] causing
significant changes in the “daughter crystal” structures. On the other hand,
the “cooperative” transition [93] allows slight changes enabling single-crystal-
to-single-crystal transitions and, thereby, preserving the structural integrity of
the crystal. Nucleation and growth usually disrupts the integrity of the “par-
ent” crystal, [94] while cooperative transition allows a close orientational rela-
tionship between the “parent” and “daughter” crystal structure. The “nucle-
ation and growth” transition correlates to larger differences in the polymorph
structures, whereas the “cooperative” transition relates to smaller differences.
However, such a correlation does not explain the molecular origin of this “co-
operative” behavior. Therefore, in this work, we address this molecular-level
question to better understand the cooperative transition in organic systems.
Cooperative polymorphic transitions in molecular crystals are typically trig-
gered by mechanical and/or thermal energy that lead to interesting properties
such as the “shape memory” effect, [95] thermosalient effect, [96] and supere-
lasticity. [97] However, studies investigating this polymorphic transition remain
significantly low given their rare occurrence in molecular crystals. Reports in-
vestigating the molecular mechanisms underlying cooperative transitions are
even more scarce. Cooperative transitions in organic electronics was first
discovered in ditert-butyl [1]benzothieno[3,2-b]- benzothiophene (ditBu-BTBT)
and 6,1 3 - bis -(triisopropylsilylethynyl)pentacene (TIPS-pentacene) single crys-
tals. [50] A major advantage of utilizing cooperativity in organic electronics
is that such a mechanism preserves structural integrity and allows rapid, re-
versible modulation of electronic properties. In this work, we hypothesize that
the bulky side-chain rotation provokes this rare transition. With a molecular-
level understanding of the mechanism behind the cooperative transition, we
53
can articulate molecular design rules to prompt cooperativity for future appli-
cations in organic electronics and molecular devices.
In this work, we investigate our hypothesis of side-chain rotation as a molec-
ular design rule for triggering a cooperative transition in molecular crystals. We
additionally demonstrate that the polymorphic transition mechanism is sensi-
tive to even a single atom substitution within the side-chain. We substitute a
carbon atom in the side-chain of ditBu-BTBT that exhibits a cooperative tran-
sition to silicon in the new system, bis(trimethylsilyl) BTBT (diTMSBTBT). The
bulkier silicon atom in the side-chain of diTMS-BTBT alters the neighboring
environment to inhibit side-chain rotation, thereby converting the initial coop-
erative transition mechanism to one of nucleation and growth. As a result, by
a simple one-atom substitution in the side-chain, we have access to two poly-
morphic transition pathways. Using this simple molecular design rule gives the
advantage of accessing both polymorphic transition pathways and selectively
utilize their advantages in organic electronic applications. In the following sec-
tions, diTMS-BTBT and ditBu-BTBT will be abbreviated as diTMS and ditBu,
respectively.
3.2 Transition Pathways of Nucleation and Growth versus Co-
operative Transition in Organic Electronics
A polymorphic transition usually occurs in molecular crystals following the
“nucleation and growth” mechanism. Recently Chung et al. [50] discovered a
cooperative polymorphic transition mechanism in single crystals of ditBu or-
ganic semiconductor triggered by bulky side-chain rotation. Using differential
54
calorimetry scanning (DSC) technique, they discovered polymorphs in ditBu
and diTMS, and demonstrated that polymorphic transitions happen at 151 °C
(Figure 3.1a) and 71 °C (Figure 3.1b) upon heating for diTMS and ditBu, re-
spectively. Note that the transition temperature significantly increased with the
change of just a single atom from carbon to silicon in the side-chains.
For the case of diTMS, our experimental collaborators (H. Chung and Y.
Diao at UIUC) [3] discovered that, during the annealing process, the α crys-
tals demonstrated polymorphic transitions (Figure 3.1c), while the β crystals
showed no transition but sublimed. The α-to-β transition in diTMS was marked
by a single nucleation site, with broad growth directions of the new polymorph,
and a slow irreversible transition that lasted several minutes (Figure 3.1c). On
the other hand, ditBu crystals showed a typical phase boundary line formation,
a directional propagation of the new polymorph, and a fast reversible transi-
tion that lasted seconds (Figure 3.1d). diTMS single crystals followed a single-
crystal-to-polycrystal transition behavior, whereas ditBu single crystals showed
single-crystal-to-single-crystal transition. The loss in the crystalline integrity of
the diTMS crystal is caused by the large structural differences between the par-
ent and daughter forms, typically observed in the nucleation and growth mech-
anism. [94] However, due to the structural similarities between the polymorphs
in ditBu, the crystals are able to preserve the structural integrity for multiple
thermal cycles of cooperative transitions. [50]
Therefore, the substitution of a single atom in the side-chains from silicon to
carbon, with otherwise identical conjugated cores, leads to accessing two poly-
morphic transition pathways. A cooperative transition preserved the structural
integrity and allowed reversible access to the polymorphs, leading to the shape
55
56
Figure 3.1: Two polymorphic transition pathways of nucleation and growth and
cooperative transition in single crystals of diTMS and ditBu. (a) DSC of diTMS
powder showing reversible polymorphic transition. The molecular structure is
shown in the inset. (b) DSC of ditBu powder showing reversible polymorphic
transition. The molecular structure is shown in the inset. (c) Cross-polarized
optical microscopy (CPOM) images of a single crystal of diTMS α form. The
α form transitions to the β form upon heating after approximately 415 s. Upon
cooling, the β form does not transition back. (d) CPOM images of a single crystal
of ditBu LT form (blue birefringence). The LT form rapidly transitions to the
HT form (brown birefringence) upon heating in approximately 1.25 s. Upon
cooling, the HT form reversibly transitions to the LT form in 0.38 s. Reprinted
from [3]. Copyright 2019 American Chemical Society.
57
diTMS-α diTMS-β ditBu-LT ditBu-HT
T (°C) 25 27 25 97
a (Å) 7.55 6.35 6.15 6.33
b (Å) 10.10 11.36 10.65 10.42
c (Å) 13.59 14.31 14.18 14.63
α (°) 82.75 87.65 87.79 86.25
β (°) 82.17 90 90 90
γ (°) 88.52 90 90 90
V (Å3) 1020.14 1030.34 928.46 962.47
Table 3.1: Unit cell information for polymorphs of diTMS and ditBu.
memory and function memory effects studied by Chung et al. [50] This serves
as an ideal platform to investigate the impact of subtle changes of side-chains
on the polymorphic transition pathway.
3.3 Computational Techniques
Understanding the molecular mechanism behind polymorphic phase transi-
tions is essential for controlling molecular packing for high-performance or-
ganic electronics. In this study, we conducted Plane Wave Density Functional
Theory (DFT) simulations to investigate the polymorphic transition pathways
in diTMS and ditBu. We test our hypothesis of side-chain rotation as a molecu-
lar design rule for triggering cooperative transition by calculating the rotational
energy barrier associated with the polymorphs in the two systems. Using these
calculations, we establish that, by comparing the molecular environment be-
tween the two systems, we can pinpoint the molecular origin of each transition
mechanism.
3.3.1 Density Functional Theory
DFT is a quantum mechanical computational modeling technique that is used
to investigate the electronic environment and nuclear structure of many-body
systems. This theory is employed to solve Schroedinger’s equation as given by
Eq. 3.1
ĤΨ = EΨ (3.1)
58
where E is the energy, Ψ the wavefunction, and Ĥ the Hamiltonian operator for
a system of M nuclei and N electrons. Eq. 3.2 gives the mathematical expression
of the Ĥ, that includes the sum of the kinetic energy of all particles in the system
and the electrostatic interactions between them. rij , riA and RAB denote electron-
electron, electron-nucleus and nucleus-nucleus distances, respectively, ∇ is the
Laplace operator and ZA is the charge associated with nucleus A.
1 ∑N 1 ∑M ∑N ∑M ∑N ∑N ∑M ∑M
Ĥ = − ∇2i − ∇2
ZA 1 ZAZB
A + + + (3.2)2 2 r
i=1 A=1 i=1 A=1 iA
r R
i=1 j>1 i j A=1 B=1 AB
Since DFT allows the prediction and calculation of materials behavior based
on the principles of quantum physics, it is the most powerful and accurate form
of computational modeling technique to study materials’ properties and struc-
ture. Using this theory, the properties of a many-electron system, including
ground-state energies, reaction paths, atomization energies, etc., can be deter-
mined with atomistic resolution. DFT is among the most popularly used meth-
ods in condensed-matter physics, computational physics, and computational
chemistry. This technique has been successfully employed for studying a wide
range of problems ranging from simple atomic systems to molecular complexes,
solids, quantum, and classical fluids, etc. However, its accuracy comes at the
cost of extensive computational resources which limits the system size, i.e., the
number of atoms that can practically be studied, to a few hundred atoms at best.
The Schrödinger equation gives the time evolution of a wave function, which
is a mathematical description of the quantum state of an isolated physical sys-
tem. However, for any realistic system, it is impossible to calculate the exact so-
lutions of this equation. A few approximations like Born-Oppenheimer and re-
59
lated concepts come in handy. The Born-Oppenheimer approximation assumes
that the electrons move in the field of fixed nuclei. This essentially sets the ki-
netic energy of the nuclei to zero and their potential energy to a constant Vext.
DFT also uses the concept of “electron density” that is described as the proba-
bility of finding electrons within a volume element d→−r . Another important con-
ceptualization that helped in the development of DFT is the variational princi-
ple for the ground state, which states that the ground-state energy is a functional
(functions of another function) of the spatially dependent electron density and
the nuclear potential Vext. However, despite advances in the new concepts and
assumptions, the use of Schrödinger’s equation to solve for materials properties
still remained evasive.
Hohenberg and Kohn proposed two fundamental theorems that made it pos-
sible to solve Schrödinger’s equation. The first theorem states that the external
potential Vext, and hence the total energy, is a unique functional of the electron
density n(r). This means that with the knowledge of the electron density, it is
possible to compute all properties of the system, including total energy, atomic
structure, etc. The proof can be consulted in the original paper. [98] The sec-
ond Hohenberg-Kohn theorem states the ground-state energy can be obtained
variationally, such that the density that minimizes the total energy is the ex-
act ground-state density. This means that any electron density that satisfies the
boundary conditions on the system will lead to an upper bound on the ground-
state energy of the system unless that electron density is the true ground-state
electron density itself. The development of these theorems marked the forma-
tion of DFT.
Due to these theorems, it is possible to theoretically solve for only the sim-
60
pler electron density, which is a 3-dimensional function of x, y and z and forego
computing the complex 3N-dimensional wavefunction (where N is the num-
ber of electrons in the system). However, the exact form of the functional that
maps electron density and energy is unknown in practice. In reality, the self-
interaction correction, exchange and Coulomb correlation of the kinetic energy
and any electron-electron interactions are completely unknown. Here, approxi-
mations based on the Kohn-Sham (KS) equations (Eq. 3.3) become necessary. In
a KS formulation, the orbitals Ψi(r) of a reference non-interacting system with
the same electron density as the real interacting one are used. An additional
term, Vxc, called the exchange-correlation potential is introduced to account for
all the unknown contributions to the Hamiltonian. Vext and VH in Eq. 3.3 are,
respectively, the external potential due to the presence of the nuclei and the
Coulomb potential.
 1 ∑N − ∇2 + Vext(r) + VH[n] + Vxc[n] Ψi(r) = iΨi(r) (3.3)2
i=1
The KS equations help with solving the intractable problem of N interacting
particles to a much simpler approach of solving N non-interacting particles us-
ing the exchange-correlation contribution. However, despite this simplification,
the exact formulation of the exchange-correlation potential is still unknown.
Some approximations like The Local Density Approximation (LDA) and Gener-
alized Gradient Approximation (GGA) become necessary. LDA approximates
the exchange-correlation potential as dependent only on the electron density at
each point. One limitation of this idea is manifested as a significant underesti-
mation of the bandgap energy of semiconductors. GGA improves upon LDA by
including the gradient of the electron density to account for non-homogeneity.
61
3.3.2 System Details
To test our hypothesis, we conducted plane wave DFT simulations to obtain
geometry-optimized structures and the corresponding energetics of the rota-
tional barrier of the side-chains in diTMS and ditBu. All the simulations were
performed using the Quantum Espresso code. [99] Our system consisted of a
periodic unit cell lattice containing 96 atoms. Due to the intensive nature of
the calculations involved for a system even of this size, simulating bigger sys-
tem sizes was essentially out of reach even with access to a local supercomputer.
Perhaps more importantly, since the rotation of the bulky side-chains will affect,
and be affected by, chiefly, its nearest neighbors, we expect that no significant
changes in the energy barriers are likely to result if we increased the size of the
unit cell, but were unable to test this assumption. We used a cutoff energy of 60
Ry and a 2 × 4 × 2 k-mesh for the LT and HT phases of ditBu and the α phase
of diTMS, and we used a 4 × 3 × 2 k-mesh for the β phase of diTMS to achieve
a targeted convergence of below 0.005 eV in energy. The simulations were per-
formed using Perdew-Zunger pseudopotentials within the LDA level of theory.
3.4 Results
We created a test set of distinctly different initial configurations that were ob-
tained by rotating one side-chain in a crystal in 5° increments until a full rotation
of 120° was made (threefold symmetry). Each of these initial configurations was
then run in a geometry-optimized DFT simulation (using Quantum ESPRESSO)
such that the side-chain atoms under consideration remained fixed at their ini-
tial positions, while the rest of the atoms were allowed to relax around this
62
rotational displacement. This was done to make sure that there were no arti-
facts related to van der Waals repulsion. Each of these simulations resulted in a
calculation of a characteristic energy value associated with a particular angle of
rotation induced in the system. The results for both polymorphs of diTMS and
ditBu are shown in Figures 3.2 - 3.6.
For diTMS, the rotation energy was much larger for the α form, in which the
maximum was ca.17.9 kcal/mol, than for the β form, ca.9.1 kcal/mol (figure 3.2,
3.3, 3.6). We observe that the rotation energy decreases in the β form after a large
change in structure via nucleation and growth. For ditBu, the rotational energy
for the LT form is ca.3.3 kcal/mol, while for the HT form, it is ca.1.7 kcal/mol
(Figure 3.4, 3.4, 3.6). After a cooperative transition occurs, the molecular envi-
ronment favors side-chain rotation, hence lowering the rotational energy barrier
for ditBu side-chains. Comparing the diTMS with the ditBu system before tran-
sition, the rotational energy of the α form of diTMS (ca.17.9 kcal/mol1) is more
than a factor of 5 higher than that of the LT form of ditBu (ca.3.3 kcal/mol),
confirming our observation that the diTMS side-chains are more interlocked
compared to the ditBu side chains. After transition, the rotation energy of the
β form of diTMS (ca.9.1 kcal/mol) is also higher than that of the HT form of
ditBu (ca.1.7 kcal/mol), indicating that the diTMS side-chains remain static af-
ter nucleation and growth, while the ditBu side-chains become disordered after
cooperative transition.
A single-atom substitution of the ditBu bulky side chain leads to stronger
intra- and interlayer interactions and thus a more restricted packing environ-
ment of the diTMS system that constrains the rotation of the side-chains. This
mechanistic analysis confirms our hypothesis that the rotation of the tBu side-
63
64
Figure 3.2: Energy barrier associated with rotating a side-chain in a single crys-
tal of diTMS-α. Simulation snapshots of key rotational points are shown on
the graph. Color key in the snapshots: red, cyan and orange reflect the carbon
atoms in the side-chain that were rotated in this study; gray, yellow, pink, and
white indicate carbon, sulphur, silicon, and hydrogen atoms, respectively.
65
Figure 3.3: Energy barrier associated with rotating a side chain in a single crystal
of diTMS-β. Simulation snapshots of key rotational points are shown on the
graph. Color key in the snapshots: red, cyan and orange reflect the carbon
atoms in the side-chain that were rotated in this study; gray, yellow, pink, and
white indicate carbon, sulphur, silicon, and hydrogen atoms, respectively.
66
Figure 3.4: Energy barrier associated with rotating a side-chain in a single crys-
tal of ditBu-LT. Simulation snapshots of key rotational points are shown on the
graph. Color key in the snapshots: red, cyan and orange reflect the carbon atoms
in the side-chain that were rotated in this study; gray, yellow, and white indicate
carbon, sulphur, and hydrogen atoms, respectively.
67
Figure 3.5: Energy barrier associated with rotating a side-chain in a single crys-
tal of ditBu-HT. Simulation snapshots of key rotational points are shown on
the graph. Color key in the snapshots: red, cyan and orange reflect the carbon
atoms in the side-chain that were rotated in this study; gray, yellow, and white
indicate carbon, sulphur, and hydrogen atoms, respectively.
68
25
diTMS  form
diTMS  form
20
15
10
5
0 0 20 40 60 80 100 120
Side chain rotation angle (°)
7
ditBu LT form
6 ditBu HT form
5
4
3
2
1
0
0 20 40 60 80 100 120
Side chain rotation angle (°)
Figure 3.6: Comparison of energy barrier of rotation for diTMS α and β phases
(top) and ditBu low temperature (LT) and high temperature (HT) forms (bot-
tom). The more interlocked side-chain environment of diTMS (which has a Si
atom in the side-chain) shows a much higher energy barrier compared to the
more flexible environment of ditBu (which has C in the side-chain).
Energy Barrier (kcal/mol) Energy Barrier (kcal/mol)
chains drives a cooperative transition, whereas hindering the rotation of TMS
side-chains leads to a nucleation and growth transition. Therefore, promoting
the rotation of side-chains can be used as a molecular design rule to access the
rarely observed cooperative transition. Modifying the side-chain can also serve
as a useful tool to access both polymorphic transition pathways.
3.5 Conclusions
The use of plane wave DFT simulations allowed us to study the experimen-
tal findings of cooperative polymorphic transition in ditBu and nucleation and
growth mechanism in diTMS, and rationalize them in terms of rotational side-
chain barrier in the molecule. Our results show that less bulkiness in the side-
chain of tBu causes a low rotational barrier, on the order of 2-4 kcal/mole, in
both LT and HT phases. This triggers a cooperative transition mechanism and
also allows for rapid structural reversibility between the polymorphs. However
in TMS, which involves the presence of a bulkier atom, Si, in the side-chain leads
to a more restricted packing environment for the molecule, thereby constraining
the rotation of the side-chains. A higher rotational barrier locks the side-chains
in place and forces a nucleation and growth mechanism for the transition. This
mechanistic insight confirms our hypothesis and establishes side-chain rotation
as a molecular design objective to access the rarely observed transitional path-
ways and that can be selectively accessed by this simple molecular design rule.
We anticipate that our study has broad implications beyond organic electronics.
This study is the first to demonstrate that polymorphic transition pathways
are extremely sensitive even to a single atom substitution in the molecular struc-
69
ture, and that bulky side-chain rotation is a simple molecular design rule to
modulate the transition mechanism. In this work, we demonstrate selective trig-
ger of both the nucleation and growth as well as cooperative transition mecha-
nism by a single atom substitution.
Our computational approach validates the experimental findings and ex-
plains the underlying molecular principles behind polymorphic phase transi-
tion. Using plane wave DFT calculations, we were able to accurately probe the
molecular scale of phase transitions and develop a valuable insight into un-
derstanding the correlation between molecular design packing and preferred
transitional pathways.
70
CHAPTER 4
ELUCIDATION OF STRUCTURE-PROPERTY RELATIONSHIPS IN A
WOVEN CLASS OF MATERIALS
Parts of this chapter are inspired by a paper published in the Journal of Applied Materi-
als and Interfaces. [5]
4.1 Introduction
Metal-organic frameworks (MOFs) represent the first class of nanoporous ma-
terials synthesized by the reversible coordination of bonds between metal
ions and organic ligands. [36, 100–102] In the past decade or so, these ma-
terials have garnered huge interest from the scientific community for their
widespread use in a variety of applications, including gas storage, [103, 104]
separation, [105–108], catalysis [109, 110] sensing [111, 112] and the capture of
toxic chemicals. [113, 114] Due to their high porosity and the accessibility of
their high surface area, MOFs have gained a reputation for being suitable for
reliable, safe, and low-energy means of dense gas fuel storage. For example,
HKUST-1 [Cu3(1,3,5-benzenetricarboxylate)2] showed an outstanding volumet-
ric methane uptake capacity of 261 cm3 (STP) cm−3 (65 bar, 298 K), [115] making
it the only nanoporous material to effectively reach the the U.S. Department of
Energy’s ambitious goal of 263 cm3 (STP) cm−3 at 65 bar and 298 K. [116]
Although MOFs hold huge promise for use in their pure form, their combi-
nation with confined nanoparticles during crystal growth have produced com-
posites with exciting new properties. [117] Examples include nanostructured
71
carbon, [118] polymers, [119] metal oxides, [120] and metals, [121] encapsu-
lated into aggregates of MOF crystals and used for applications including catal-
ysis [122,123] and gas storage. [118] A particularly promising example involves
HKUST-1 crystals (a MOF) functionalized with confined Palladium nanoparti-
cles. This composite has shown one of the highest reported hydrogen gas up-
take capacities in preliminary experiments conducted by our collaborators, Prof.
David Fairen-Jimenez and Bethany M. Connolly, at the University of Cambridge
(unpublished work). Adsorption-based storage of hydrogen offers cutting-edge
technology, as it provides a cost-effective way to store the gas at much lower
pressure and in much higher amounts than in high-pressure (ca. 140 bar) tanks.
Another member of the porous materials family, called Covalent Organic
Frameworks (COFs), has emerged as among the most promising nanoporous
materials with desirable features, such as tuneable topology, low mass density,
organic metal-free backbone, well-defined porosity, a huge accessible surface
area, and high mechanical stress resistance. [124–127] COFs, similar in notation
to MOFs, are synthesized of organic building units (“connectors” and “linkers”)
woven in a fabric-like fashion to generate extended ordered structures in both
2D [124] and 3D. [128] Strong covalent bonds, rendered through the interlac-
ing of lightweight organic threads, impart COFs with outstanding properties of
low density and high elasticity, and additionally, the ability to retain their shape
under deformation and stress. As the first example of its kind, COF-505 (devel-
oped by Yaghi and co-workers [129]), exhibited a ten-fold increase in elasticity
between its metallated form, an integral part of its synthesis, and its stretchy,
demetallated woven form. Additionally, unlike polymers, COFs offer the sig-
nificant advantage of pre-designing their structure pertinent for the desired ap-
plication. Due to these noteworthy advantages, COFs have found applications
72
in gas storage, [130, 131] photoelectronics, [126] catalysis, [132] and chemosens-
ing. [133]
In this study, we present a detailed computational analysis of structure-
property relationships in two high-performing nanoporous materials- COF-506
and a nanocomposite of HKUST-1 with Pd nanoparticles. Both these mate-
rials have shown to exhibit outstanding properties which make them excel-
lent choices for gas storage and separation applications. For COF-506, we in-
vestigate its pore size distribution and, relatedly, study the diffusion of solute
molecules inside the woven material. For the HKUST-1 functionalized with Pd
nanocomposites, we conduct pore size distribution calculations to understand
the origin of the composite’s outstanding capacity and hence its high density
gas storage.
4.2 Molecular Morphology of COF-506
Synthesized in 2018 by Yaghi and co-workers [134], COF-506 has demonstrated
a remarkable ability to take up dyes and other small molecules. In this section,
we lay out our computational strategy, along with the approaches we under-
took, including diffusional studies of solute particles inside the material and
then its pore size analysis, to provide a molecular-scale understanding of the
above observed phenomena.
73
4.2.1 Computational Strategy and System Details
Covalent organic weaves are an exciting class of materials that not only exhibit
unusual properties but can also be easily tailored for desired applications. Such
engineering could be achieved by changing the chemistry of the building com-
ponents or their functionalization. However, developing a critical understand-
ing of atomic-scale conformations in porous materials is essential for the con-
struction of a desired woven material. Even if this process is guided by experi-
ence and intuition, a trial-and-error design method would be very tedious and
may not fully exploit the untapped potential of possible combinations of linkers
and connectors.
The ideal way to facilitate this process is to computationally model these
materials, preferably with an atomic-scale force field model, describing both
short- and long-range inter- and intra- molecular forces of interactions. This ap-
proach using a Molecular Dynamics (MD) technique has been used to describe
the diffusion of small molecules in MOFs and Silica zeolites. [135–137] The ac-
curacy of the structural and dynamical properties, as predicted by atomic-scale
MD, is largely determined by the effectiveness of the chosen force field. Exam-
ples of popularly used force fields include Dreiding, [72] CHARMM, [73] and
OPLS. [71]
In their seminal work of 2016, the Yaghi group at Berkeley synthesized a new
material called COF-505 [129] and later developed a similarly constructed COF
called COF-506 [134](examined in this study) in 2018. At the time, none of the
previously published force fields contained sufficient parameters to describe all
the atomic interactions in this complex material, COF-505 (Fig. 4.1). In a sepa-
rate study, a member of the Clancy group, Dr. Yaset Acevedo, conducted a force
74
75
Figure 4.1: Pictorial decomposition of the two interwoven strands comprising
COF-505. Each color represents a different strand. Copper atoms are shown as
red spheres. In the diagrams labeled “mesh 1” and “mesh 2” there are eight
independent organic chains, leading to sixteen chains in the image labeled as
“COF-505”. [5]
76
(a) (b)
Figure 4.2: (a)Metallated COF-505. Snapshot was taken from an MD simulation.
(b) Unit cell of COF-506 achieved by replicating the unit cell twice in the b- and
c- directions. [5]
field parameterization study and developed the roughly 60 missing parameters
for COF-505 to create an expanded parameter set for the OPLS force field. OPLS
was chosen due to its widespread use for organic molecules and for its numer-
ous parameter set expansions. [138,139] OPLS has the distinction of having one
of the most inclusive lists of parameterized atom types. OPLS was originally
parameterized to be accurate for condensed phases of organic molecules and
was validated to fit against experimental data for density and heat of vaporiza-
tion of organic liquids. These properties made it an ideal choice to model wo-
ven COF-505. The developed parameters for COF-505 were validated against
its experimentally determined crystal structure, and elasticity since these were
essentially the only experimental information known about this new material.
More details on the development and optimization methodology, including the
full parameter set, can be found in [5].
In their seminal work of 2016, the Yaghi group at Berkeley synthesized a new
material called COF-505 [129] and later developed a similarly constructed COF
called COF-506 [134](examined in this study) in 2018. At the time, none of the
previously published force fields contained sufficient parameters to describe all
the atomic interactions in this complex material, COF-505 (Fig. 4.1). In a sepa-
rate study, a member of the Clancy group, Dr. Yaset Acevedo, conducted a force
field parameterization study and developed the roughly 60 missing parameters
for COF-505 to create an expanded parameter set for the OPLS force field. OPLS
was chosen due to its widespread use for organic molecules and for its numer-
ous parameter set expansions. [138, 139] Due to similarity in terms of chemical
composition between COF-506 and COF-505, we have used the same force field
model and parameters for COF-506 that were developed for COF-505. In the
area of force field development, transferability of parameters is a key challenge.
77
But this is rarely achieved in practice, especially when these molecular models
are extended to tackle more specialized and complex systems. Our use of ex-
actly the same parameter set for COF-506, allowed us to validate the model in-
dependently of the fitting process. Compared to COF-505, metallated COF-506
unit cell has a monoclinic geometry, with a slight tilt in the x−z direction. Struc-
tural information, including lattice parameters and fractional coordinates of the
COF-506- Cu unit cell were adopted by powder X-ray diffraction. [134] In spite
of the different crystal structure of COF-506 (compared to COF-505), the model
developed for COF-505 model worked extremely well in predicting the proper-
ties of COF-506. This was shown more encouragingly in its ability to provide an
accurate quantitative prediction of the value for the uptake of tetrahydrofuran
(THF) by COF-506, predicting a value that was 85 percent of the experimental
value. [5] Despite no parameter fitting conducted for COF-506, achieving a THF
uptake capacity so close in agreement with the experimental value validates
our choice of using a COF-505 model to represent COF-506. And therefore, all
simulations in this study were performed using the developed OPLS parameter
extension in LAMMPS [78] (Sandia’s open source MD code) software.
4.2.2 Multiple Time Origin-based Calculations of Self-Diffusion
in COF-506
The importance of woven materials lies, to a considerable extent, in their effec-
tiveness for separations and other properties related to their porous nature. In
order to understand this structural-property relationship, we studied the diffu-
sional characteristics of solute molecules inside COF-506.
78
Diffusion can be described by Fick’s law, which states that the flux, j, of the
diffusing species is proportional to the negative gradient in the concentration of
that species c [140]:
j = −D∇c (4.1)
where D is the constant of proportionality and is called the diffusion coefficient
or diffusivity. In this research study, we calculated self-diffusivities for three
representative solutes, Helium, CO2 and THF, in COF-506-Cu (i.e., a metallated
form of COF-506 with Cu as the central metal atoms that help facilitate the in-
terweaving of the threads).
In order to compute these diffusivities, we first compute the concentration
profiles of each species. Assuming the species to be at the origin of the coordi-
nate frame at t = 0, the time evolution of the concentration profile according to
Fick’s law can be written as:
∂c(r, t)
+ ∇. j(r, t) = 0 (4.2)
∂t
Combining Eq. 4.1 with Eq. 4.2 gives:
∂c(r, t) − D∇2c(r, t) = 0 (4.3)
∂t
with the boundary condition
c(r, 0) = δ(r)
where δ(r) is the Dirac delta . Solving Eq. 4.3 w(e get:1 r2 )
c(r, t) = exp − (4.4)
(4πDt)d/2 4Dt
Here d is the dimensionality of the system. For c(r, t), we do not need to
know its value, just the time-depende∫nce of its second moment:
〈r2(t)〉 = dr c(r, t) r2 (4.5)
79
where the imposed condition ∫is:
dr c(r, t) = 1
The time evolution of 〈r2(t)〉 can be obtained by multiplying Eq. 4.3 by r2 and
integrating over all space∫, which gives: ∫
∂
dr r2 c(r, t) = D dr r2 ∇2c(r, t) (4.6)
∂t
The left-hand side of the above Eq. is simply
∂〈r2(t)〉
∂t
Applying partial integration to the right-hand side, we obtain
∂〈 ∫r2(t)〉
= D ∫ dr r2 ∇2c(r, t)∂t ∫
= D ∫ dr ∇ .(r2∇c(r, t)) − D∫ dr ∇ r2 .∇c(r, t)
= D dS∫(r2 ∇c(r, t)) − 2D d∫r r.∇c(r, t) (4.7)
= 0 − 2D ∫dr (∇.r c(r, t)) + 2D dr (∇ .r) c(r, t)
= 0 + 2dD dr c(r, t)
= 2dD
The elegant form obtained in Eq. 4.7 was first derived by Einstein. [141] In
the above Eq., D is a macroscopic property, while 〈r2(t)〉 has a microscopic inter-
pretation. This quantity is known as the “mean squared displacement” which
the species has traveled over the course of time, t. This relationship allows the
diffusion coefficient to be calculated from molecular simulation data.
For every particle i, we measure the distance traveled in time t, ∇ ri(t), and
we plot the mean square of these displace∑ments as a function of time t:
〈∇ 1
N
r(t)2〉 = ∇r 2
N i
(t) (4.8)
i=1
80
The slope of the best fit curve to the linear region of the mean squared displace-
ment curve gives us an estimate of the diffusion coefficient.
Using the relation obtained in Eq. 4.8, we calculated diffusion coefficient,
DT , of the motion of solute molecules across the woven material, as a function
of time, r(t):
1 d ∑N [ ]
DT = lim 〈 ri(t + t0) − r (t 2i 0) 〉 (4.9)2nN t→∞ dt
i
In Eq. 4.9, DT is the translational diffusion coefficient, n is the dimensionality
of the system, and N is the number of particles. ∆r(t) = r(t + t0) − r(t0) defines
the translational displacement of the particle from time t0 to time t. The angle
brackets in Eq. 4.9 indicate a thermodynamic average over many starting times,
t , constituting an ensemble of information. In our case, r(t) constitutes the total
displacement averaged over solute particles.
4.2.3 Autocorrelation Functions
The motion of solute molecules (He, CO2, THF) can be highly correlated, espe-
cially at the beginning of the simulation when the system has not reached an
equilibrium state and is, perhaps, in a metastable state. Time correlation func-
tions [6] measure how the value of a dynamic quantity, A(t), may be related to
another quantity, B(t). If A(t) and B(t) represent two different time-independent
quantities, the time correlation function∫C(t) can be defined as:τ
C(t) = lim→ ∞1 A(t0)B(t0 + t) dt0 (4.10)
τ τ 0
The integral represents an average accumulated over many time origins t0,
with each origin taken from a system at equilibrium. The quantity A is sampled
81
82
Figure 4.3: Representation of a data set storing values at a total number of L
times, ti → {i = 0, L − 1}. The method of multiple time origins loops over all
available time origins, tk → {k = 1,M} to calculate the time correlation function.
This case only represents the schematic of one time delay of t = 3 ∆t. However,
in practise, the method averages data for multiple values of time delay ranging
from ∆t = {1, L − 1} [6].
at the time-origin, while B is sampled after a delay time t. Thus, the correlation
function depends on the length of the delay. However, as it is an equilibrium
quantity, it is independent of the time- origin. The time average can then be
written as:
C(t) = 〈A(t0)B(t0 + t)〉 (4.11)
If A and B are unrelated, then they are uncorrelated, and then C can be written
as:
C → 〈A〉〈B〉 (4.12)
A and B are said to be correlated if A(t0) causes, or contributes to, B(t0 +
t). When A and B are different quantities, then C is called a “cross-correlation
function.” Finally, if A and B denote the same quantity, then C is called an auto-
correlation function. C measures how the value of A at t0 + t is correlated with
its value at t0.
Auto-correlation functions can generally be divided into two types:
• A one-particle function where any dynamic quantity A(t) is a property of
the individual molecules, for example, the velocity auto-correlation func-
tion is this type of function.
• A collective function in which A(t) depends on the accumulated contribu-
tions from all the molecules in a system.
Since the solute molecules need to find porous pathways inside the woven
material in order to traverse through the structure, the motion of these particles
is typically slow and can result in a lot of noise in the data. And therefore, as
83
a good practice, the approach of sampling multiple time-origins was employed
in this study. [6] This method samples the quantity at thousands of different ori-
gins, combines the samples algebraically, and averages them over many sample
times. Data from multiple overlapping time-origins are averaged and fitted us-
ing linear regression to find the diffusivity. Any point in time can be considered
as the time origin of the calculation. Center of mass (COM) positions of the
molecules are saved as a function of time, and the mea∑n squared displacement
(MSD) for a given time-origin is calculated [xave = 1/n xi (summed over x = 1
to n)](n is the number of windows with the same time delay). The solid lines
indicated in Fig. 4.4 indicate the displacement of solute molecules in COF-506-
Cu as calculated using the multiple time-origins method, while the dashed lines
indicate a linear regression fit used to calculate their diffusivities. A schematic
representation of the algorithm is shown in Fig. 4.3.
4.2.4 Diffusion Results
A key objective in simulating COF-506-Cu was to investigate the uptake and
motion of guest molecules through its woven structure. To that end, we looked
at the tendency of molecules to diffuse in the COF-506-Cu, using He, CO2, and
THF as the solutes, tested at four temperatures (200 to 500 K in 100 K intervals)
and pressure values of −1000 × 105 Pa, 1 × 105 Pa, and 1000 × 105 Pa. Data
for each gas for ranges of temperatures and pressures were collected over a pe-
riod of 10 ns. Fig. 4.4 shows example calculations depicting diffusivities of He,
CO2 and THF in metallated COF-506-Cu. As seen in the Figure, the diffusional
data is highly correlated given the fact that solute particles need to find porous
pathways inside the COF material to be able to traverse through it. In order
84
to remedy this problem, the method of multiple time-origins was undertaken
(discussed in section 4.2.2). Note that diffusion over these short time frames for
solute molecules is probably not accurate beyond providing the order of mag-
nitude of the diffusion coefficient; however, the trends should be instructive.
The blue lines in the Fig. 4.4 indicate the displacements of the solute inside a
COF-506 as calculated using the multiple time-origins method, while the black
line indicates a linear regression fit used to calculate their diffusivities. For com-
parison, the red curve denotes MSD of solutes as calculated using a single time-
origin method, which is much more inaccurate.
At 300 K and 1 atm, the self-diffusivities of He and CO2 in COF-506-Cu are
13 × 10−6 and 2.8 × 10−8 cm2/s, respectively (Fig. 4.5 (a) and (b)). The non-
interpenetrated structure of COF-506-Cu facilitates the free movement of small
guest molecules inside the weave. However, compared to its diffusion rate in
rigid COFs, such as TS-COFs and PI-COFs, or MOFs such as MOF-5, diffusion
in COF-506-Cu is slower by an order of magnitude. [136, 142]
Despite the large range in temperature, the diffusion coefficient changes un-
remarkably (e.g., ranging from 0.5 − 2.2 × 10−5 cm2/s over 300 K at 1 atm for
Helium). Pressure and temperature changes display little effect below 400 K.
The guest-accessible pores within the COF-506-Cu weave can undergo more
flexibility in terms of expansion and contraction in pore size due to its single
layer structure.
Self-diffusion of THF in COF-506-Cu (Fig. 4.5 (c)) at 300 K and 1 atm is
5 × 10−9 cm2/s, which is far slower than diffusion of He and CO2 in COF-506,
by factors of 600 and 60, respectively. Closer observation of THF’s diffusional
trajectories provides some valuable insights into its interactions with the COF-
85
86
5
Helium at 200K and 1×105Pa Co2 at 400K and 1000×10 Pa
Single Time Origin
Single Time Origin Multiple Time Origin
3000 Multiple Time Origin
D= 8.508e-06 2/ D= 1.349e-07 cm
2/s
cm s 40
2500
30
2000
1500 20
1000 10
500
0
0
0 1 2 3 4 5 0 1 2 3 4 5
time (ns) time (ns)
(a) (b)
THF at 300K and -1000×105Pa
30 Single Time Origin
Multiple Time Origin
D= 4.326e-08 cm2/s
25
20
15
10
5
0
0 1 2 3 4 5
time (ns)
(c)
Figure 4.4: Mean squared displacements (MSD) of solutes (a) Helium, (b) CO2
and (c) THF with respect to time. The red curves indicate the MSD measured
using a single time-origin, known not to be representative of the average result.
The blue graphs improve the statistics by measuring the MSD using multiple
(100s of) time-origins (explained in section 4.2.2). The drop-off at the end of the
simulations reflects the fewer origins left by that time and are not representative
of accurate values at that point.
r2 (Å2)
r2 (Å2)
r2 (Å2)
87
Helium
2.5 Pressure 1 ×105 Pa
Pressure 1000 ×105 Pa
2.0
1.5
1.0
0.5
0.0 200 250 300 350 400 450 500
Temperature (K)
(a)
CO2
7 Pressure 1 ×105 Pa
Pressure 1000 ×105 Pa
6
5
4
3
2
1
0 200 250 300 350 400 450 500
Temperature (K)
(b)
Diffusion Coefficient (10 7cm2/s) Diffusion Coefficient (10 5cm2/s)
88
THF
40
Pressure 1 ×105 Pa
35
30
25
20
15
10
5
200 250 300 350 400 450 500
Temperature (K)
(c)
Figure 4.5: (a) Self-diffusivity of He in COF-506-Cu from 200 to 500 K at two
pressures (1 atm (blue) and 1000 atm (green)). (b) Self-diffusivity of CO2 in
COF-506-Cu from 200 to 500 K and two pressure values as denoted by the inset
color key. (c) Self-diffusivities of THF in COF-506-Cu at 1 atm as a function of
temperature (200 to 500 K). Error bars represent the standard error. Dot-dashed
lines join the data points as a guide to the eye.
Diffusion Coefficient (10 9cm2/s)
89
Self-diffusivity at 1 bar Pressure
Temperature Helium CO2 THF
(K)
200 8.51e-06 2.03e-08 3.21e-09
300 1.30e-05 2.88e-08 5.00e-09
400 3.86e-06 2.28e-07 5.95e-09
500 2.26e-05 1.39e-07 3.88e-08
Table 4.1: Comparison of self-diffusivities of He, CO2, and THF within metal-
lated COF-506 over a range of temperatures at 1 bar pressure.
Self-diffusivity at 1000 bar Pressure
Temperature Helium CO2
(K)
200 2.22e-07 6.20e-09
300 6.93e-07 5.77e-08
400 1.25e-06 1.35e-07
500 2.98e-06 6.32e-07
Table 4.2: Comparison of self-diffusivities of He, CO2, and THF within metal-
lated COF-506 over a range of temperatures at 1000 bar pressure.
506-Cu weave. Despite encountering a somewhat tortuous path, He and CO2
can readily find a diffusional path through the COF weave. In contrast, the non-
aromatic (but cyclic) nature of THF, in concert with its lone pair of electrons on
the oxygen atom, results in strong binding between THF and copper ions, which
significantly constrains the mobility of the THF molecules. This electrostatic
effect is believed to play a more significant role than the size difference between
the two types of molecules.
Next, in order to understand the physics behind the diffusional behavior, we
conducted Arrhenius calculations for diffusivities of representative solutes, He,
CO2, and THF, inside COF-506-Cu. A typical Arrhenius Eq. looks like:
(−E )
D = D0 exp
a (4.13)
kt
where D0 is the pre-exponential factor and Ea is the activation energy. Gen-
erally, a lower activation energy means fewer barriers to diffusion. Thus the
solute molecules are more able to diffuse and, consequently, show much more
enhancement in their motion. Activation energies can exhibit a wide range of
values, for example 17.4 to 55 kcal/mol for diffusion of N2, CO2, and O2 in BCC,
FCC, HCP metals, [143] 4.6 to 6.9 kcal/mol for O2 diffusion in YBa2Cu3O7 grain
boundaries. [144] Our activation energies shown in tables 4.3, 4.4, and 4.5 fall
into that “ballpark.”
For all three solutes, the results exhibit noisy variations of the diffusion co-
efficient with the inverse of temperature (Fig. 4.6,) albeit all three solutes follow
typical Arrhenius behavior: (a) He diffusion at 1 bar is roughly invariant with
temperature, as one might expect for such a small solute moving in a flexible
90
91
Helium
11
12
13
14
5
15 Pressure 1 ×10  PaPressure 1000 ×105 Pa
20 25 30 35 40 45 50
104/T (K 1)
(a)
CO2
15
16
17
18
Pressure 1 ×105 Pa
19 Pressure 1000 ×10
5 Pa
20 25 30 35 40 45 50
104/T (K 1)
(b)
ln D (cm2/s) ln D (cm2/s)
92
THF
17.0
17.5
18.0
18.5
19.0
19.5
Pressure 1 ×105 Pa
20 25 30 35 40 45 50
104/T (K 1)
(c)
Figure 4.6: Plots depicting the Arrhenius relation between the self-diffusion
coefficient and the reciprocal of temperature for (a) He, (b) CO2, and (c) THF
diffusing in COF-506-Cu. The temperature varies from 200 to 500 K in 100 K
intervals. Dotted lines denote the best fit to the Arrhenius data.
ln D (cm2/s)
93
Helium
Pressure (bar) D (cm20 /s) Ea (kcal/mol)
1 5.55e-06 -0.15
1000 1.18e-05 1.62
Table 4.3: Activation energy and pre-exponential factor calculated for diffusion
of Helium in metallated COF-506. A negative value in the activation energy for
1 bar is attributed to artifacts in simulation.
CO2
Pressure (bar) D0 (cm2/s) Ea (kcal/mol)
1 7.77e-07 1.53
1000 7.64e-06 2.87
Table 4.4: Activation energy and pre-exponential factor calculated for diffusion
of CO2 in metallated COF-506.
THF
Pressure (bar) D0 (cm2/s) Ea (kcal/mol)
1 4.61e-08 0.83
1000 4.61e-08 0.61
Table 4.5: Activation energy and pre-exponential factor calculated for diffusion
of THF in metallated COF-506.
framework; (b) CO2 diffusion shows noisy, but roughly linear, relationships; (c)
the temperature-dependence of diffusion of THF in COF-506-Cu is consistent
with typical Arrhenius behavior, except at the lowest temperature where the
diffusion looks unexpectedly and unexplainably high. However, these are rel-
atively short timescale simulations for the diffusion of solute molecules, so it is
probably more prudent to only look at trends and not look for quantitative data
beyond the order of magnitude of the diffusion. Finding trends and order of
magnitude diffusion characteristics were the goals of the diffusion study here.
4.2.5 Pore Size Distribution Algorithm
We developed a new algorithm to fully characterize the pore size distribution
(PSD) for COF-506-Cu and expand our understanding of the morphology of the
weave. We first defined a spherical probe of radius 1.5 Å to check whether the
space inside the simulation cell is porous or not. We scanned over the entire area
in the simulation box and identified those regions which are not occupied by
atoms. Fig. 4.7 (a) shows a simulation snapshot with Helium diffusing through
porous pathways inside metallated COF-506. Fig. 4.7 (b) represents the idea of
the pore-finding methodology employed in this study. The schematic shows a
probe (in blue) that “rasters” across the simulation box and identifies any re-
gions not occupied by atoms (shown in yellow).
In order to be considered a porous space, we set the distance between the
probe point and the center of a neighboring atom to be greater than the sum of
(a) the atomic radius of the atom associated with that distance, (b) the radius of
the probe and (c) 1 Å, as shown below:
94
Atom Radius (Å)
C 0.77
H 0.31
N 0.70
Cu 1.173
B 0.80
F 0.64
Table 4.6: Atomic radii of each atom type in the COF-506-Cu system used to
calculate the porous spaces.
d = ratom + rprobe + 1 (4.14)
where ratom is the atomic radius of each atom type and rprobe is the radius of
the probe. These two terms are necessary to ensure that there is no overlap
between any atoms and the area being searched by the probe. A 1Å window
size is added to ensure that the point is far enough away from any neighboring
atoms that another atom could easily be inserted at that point. Table 4.6 shows
the atomic radii of each atom type present in the system.
If the point searched is determined to be within a porous space, the coordi-
nates of that point are stored, and then we search for the next point that does not
overlap with space. This next point is set up such that it lies at a distance of the
probe radius from the center of the previous probe. This “backtracking” is ap-
plied to ensure that we define the extent of the pores with as fine a resolution as
possible. A representation of such a probe mesh-grid is shown in Fig. 4.8. Once
all such porous spaces have been identified, we group them into contiguous
spaces, which then constitute pores. Note that these independent, contiguous
spaces will be disjointed from each other. After all the porous spaces have been
95
96
(a)
(b)
Figure 4.7: (a) A simulation snapshot of COF-506-Cu from MD calculations
showing patches of porous spaces inside the weave. Color key: Carbon in red,
hydrogen in blue, nitrogen in yellow, copper in white, boron in cyan, fluorine in
green, helium in orange. (b) A schematic depicting a probe in blue that rasters
the simulation box with atoms (in yellow) to identify the porous regions.
97
Probe Mesh Grid
Grid 1
12 Grid 2
10
8
6
4
2
0
−2
−4
−6
−8
−10
−12
−14−12−10 −8 −6 −4 −2 0 2 4 6 8 10 12 14
X
Figure 4.8: A two-dimensional representation of the probe mesh-grid used in
this study. Probes of radius 1.5 Å are depicted as grids 1 and 2 shown in colors
blue and red, respectively. The mesh grid is designed this way to ensure that
no region inside the simulation box is left untested while searching for porous
spaces.
Y
98
12.0
11.5
11.0
10.5
10.0
9.5
9.0
8.5
8.0
8.0
7.5
7.0
8.0 7.5 6.57.0 6.5 6.0x ( 5.5Å 6.0) 5.5 5.0 5.0
(a)
12.0
11.5
11.0
10.5
10.0
9.5
9.0
8.5
8.0
8.0
7.5
7.0
8.0 7.5 6.57.0
x 6.5
6.0
 (Å 6.0 5.5) 5.5 5.0 5.0
(b)
y ( yÅ  ) (Å)
z (Å) z (Å)
99
12.0
11.5
11.0
10.5
10.0
9.5
9.0
8.5
8.0
8.0
7.5
7.0
8.0 7.5 6.57.0 6.5 6.0x ( 5.5Å 6.0) 5.5 5.0 5.0
(c)
12.0
11.5
11.0
10.5
10.0
9.5
9.0
8.5
8.0
8.0
7.5
7.0
8.0 7.5 6.57.0
x 6.5
6.0
 (Å 6.0 5.5) 5.5 5.0 5.0
(d)
y ( yÅ  ) (Å)
z (Å) z (Å)
Figure 4.9: This Figure shows a small section of a bigger (actual) simulation
box. (a) A single porous sphere of radius 1.5 Å. This porous space is identified
by the probe as a volume not occupied by any atom. (b) Next, adjacent and
overlapping porous spaces are identified, as shown in the red, yellow, blue, and
green spheres. Since the probe size chosen for this method is 1.5 Å, these iden-
tified spheres have equal size and have a radius of 1.5 Å. (c) Then the extent of
the space covered in aggregate by the overlapping and adjacent porous spaces
(found in (b)) is identified. Yellow dots define the outer surface of this enclosed
porous volume. (d) Once this region has been identified, as in (c), a sphere is
fitted inside this space in order to calculate the radius of this spherical pore. Ra-
dius from all such discontiguous regions are calculated, analyzed, and plotted
as a histogram in Fig. 4.10.
100
101
60
50
40
30
20
5
10
15
y 20(Å) 25
30 0
10
35 20
30
40 40 x (Å)
50
(a)
8
7
6
5
4
3
2
1
0
6 8 10 12 14 16 18
Diameter (Å)
(b)
Figure 4.10: (a) Representation of pores located inside a COF-506-Cu weave pre-
dicted by computer simulation. (b) Pore size distribution for the same system
as illustrated in (a). Pore diameters were estimated assuming spherical pores.
The histogram shows that most pores are less than 10 Angstroms in diameter.
Count
z (Å)
separated into pores, any pores with a volume less than 500 Å3 are discarded,
as experimental techniques are not currently capable of identifying pores below
this size. The remaining pores are used to find the pore size distribution. A
step-by-step depiction of this algorithm is presented in Fig. 4.9. In addition, the
Python code developed here to encode this algorithm is provided in Appendix
B. Our algorithm provides the advantages of simplicity while providing high
accuracy over Monte Carlo-like [145] and grid-based [146] methods and can be
readily implemented for calculating pores in porous materials.
4.2.6 Structure of COF-506-Cu
These calculations form an integral part of understanding solute diffusion in-
side COF materials. Our calculations (Fig. 4.10) suggest that most (about 80%)
of the pores inside this metallated COF weave are between 6 and 10 Å in diam-
eter. A small fraction of the pores can be up to 20 Å in diameter. Pores of such
sizes allow small molecules such as He and CO2 to traverse the weave with rel-
ative ease; however, larger molecules, such as THF, will find it more difficult to
navigate the dense woven material. This significantly hindering the motion of
larger solutes inside these materials, making for a more tortuous path through
the woven material. These results help explain our self diffusivity results in
Figs. 4.5 and 4.6.
102
4.3 Structural Characterization of the HKUST-1 MOF Function-
alized with Pd Nanoparticles as a Composite
A long-standing challenge for the use of hydrogen as a transportation fuel has
been to design storage systems that efficiently and safely store the gas and that
allow its extraction at practical temperatures and pressure. [147] From all the
existing adsorbents posited for this task, MOFs, constructed by the coordina-
tion of metal atoms with organic linkers, are arguably the most promising class
of hydrogen-storage materials due to their large surface areas and pore vol-
umes. [148] Our collaborators, Prof. David Jimenez and Bethany Connolly at
Cambridge, UK, have used a high-performing MOF called HKUST-1 which, in
combination with palladium nanoparticles, has shown exceptional uptake ca-
pacity for clean fuels like hydrogen and methane gases (unpublished results).
Experiments show that the uptake of these gases increases as the percentage of
palladium nanoparticles is increased. This uptake trend is reversed when the
processing pressure is higher than 40 atm., a typical pressure-swing character-
istic.
Previous computational studies [149, 150] have shown promising results for
high-throughput screening of MOF candidates that have shown their high po-
tential for the capture of, and catalysis to decompose, toxic molecules. How-
ever, while this previous work has laid the foundation for molecular simulation
approaches to exploit, it has not tackled the problem of rational design of the
interface that is created between the MOF framework and the nanoparticles.
This is an important omission as the details of this interface are likely to be plan
a role, potentially a defining role, in the creation of MOF-nanoparticle designs
103
that attempt to optimize the storage of clean fuels, like hydrogen and natural
gas. Similarly, studies investigating the molecular mechanism behind the great
extent of gas uptake are limited by the current state-of-the-art lack of knowledge
of the nature of this interface.
With this aim in mind, we conducted a detailed atomic-scale MD study
to understand the molecular origin behind such experimentally observed phe-
nomena. We conduct pore size distribution calculations to characterize the over-
all morphology of a composite of HKUST-1 and Pd nanoparticles. Developing
this understanding is critical to explaining the dynamics of solute molecules,
especially hydrogen or carbon oxide or methane, inside these porous materi-
als, and also opens the door for a more conscious design of MOFs with tunable
structural properties, such as topology, pore size, shape, and surface chemistry.
4.3.1 System Details
All simulations were performed in LAMMPS [78] (Sandia’s open-source MD
code) using an OPLS [71] force field. OPLS reproduces the bulk properties of or-
ganic systems, typically with an excellent match to relevant experimental prop-
erty values. However, it did not contain all the necessary parameters required
to model all HKUST-1 and Pd nanocomposite interactions. All the remaining
parameters were calculated by my colleague, Ryan Heden, who is a graduate
student in Paulette Clancy’s research lab. The simulation set-up (shown in Fig.
4.11 (a) and (b)) is created by inserting a large-radius Pd crystal into the HKUST-
1 lattice and deleting any MOF monomers whose atoms are located within 3Å
of a Pd atom. That is, any atoms belonging to the MOF crystal that are uncom-
104
fortably close to the Pd nanoparticle are removed. The combined system is then
run for 500 picoseconds using a microcanonical NVE ensemble with a small 0.1
femtosecond as the time step. The time step is kept deliberately small for this
relaxation run to allow any atoms that are too close to the Pd surface to move
away. In an experimental setting, a Pd cube of size 10 nm in diameter is used.
However, given the constraints of our computational resources for such a rel-
atively huge Pd nanoparticle, we employed a small but representative model
that employed a Pd nanoparticle of diameter 5 nm to conduct our theoretical
calculations.
4.3.2 Pore Size Distribution of the Pd-MOF Composite
The pore size distribution (PSD) is one of the main properties used to charac-
terize microporous materials like MOFs. This property essentially determines
their permeability to solute molecules, the adsorption capacity, and the capacity
of these materials to be used for industrial applications that involve gas sepa-
ration, [108] gas storage, [151] catalysis, [152] or drug delivery. [153] Since all
the main applications for MOFs involve the adsorption of guest molecules, tai-
loring the size of the pores is critical to enable more effective use of the avail-
able pore volume. Establishing a relationship between the structure and the
storage/catalytic capacity is key to guiding rational design and synthesizing of
high-performing MOF materials.
We used an open source freely available software, Zeo++ (www.zeoplusplus.org),
[154] to conduct PSD calculations. The PSD algorithm developed in Section 4.2.5
works best for the case where the pore size distribution is similar in concept to
105
106
(a) (b)
Figure 4.11: (a) Side and (b) top views of a simulation snapshot of the HKUST-
1 and Pd nanoparticle system. Palladium shown in blue is placed inside a
HKUST-1 crystal to mimic the experimental setup. Color key: Carbon shown
in gray, copper in brown, oxygen in red, hydrogen in white, and palladium in
blue.
the cavities in Swiss cheese, where pores are strictly defined and discontiguous.
For highly porous systems like ours, where the pores are interconnected as seen
in Fig. 4.11, a more robust approach is needed to strictly isolate such regions
and calculate their distribution.
Fig. 4.12 shows the application of the PSD algorithm developed in Section
4.2.5 for the HKUST-1/Pd nanocomposite. As can be seen, the algorithm per-
forms well in identifying porous spaces. Our algorithm offers advantages over
other more complicated methods [145, 146] in terms of simplicity while provid-
ing high accuracy. However, the limitation of the approach we developed is
that it includes the ability to correctly coalesce the identified contiguous pores
when the nature of the material is exceedingly penetrable, as is the case in our
nanocomposite. Therefore, in this study, we felt that a more robust approach
than the one developed in this thesis, namely one that used a Monte Carlo sam-
pling method (like Zeo++) should be applied in this case.
The Zeo++ software is based on Voronoi decomposition, which provides a
graphical representation of void spaces in a periodic simulation box of atoms.
This resulting network is analyzed to generate diameters of the largest-included
sphere and the largest-free- sphere. The nodes in this network are analyzed
for accessibility, and the resulting information is used to conduct Monte Carlo
sampling of accessible surfaces (Fig. 4.13), volumes, pore size distribution. We
calculated, characterized, and identified accessible (void) regions and calculated
their corresponding sizes.
To assess the convergence of the method, we ran the Zeo++ algorithm on our
system (shown in Fig. 4.11) for a range of Monte Carlo sampling approaches.
From the results in Fig. 4.14, it is immediately evident that the computation
107
108
Figure 4.12: Colors representing the pores identified by the PSD algorithm cre-
ated in this thesis and described in Section 4.2.5. Due to the exceedingly pene-
trable nature of the HKUST-1 and Pd nanocomposite, the algorithm incorrectly
combines the contiguous porous spaces. Therefore, in this study, we have em-
ployed a more robust approach based on Monte Carlo sampling to conduct pore
size calculations.
109
Figure 4.13: Geometric representation of the polyhedral surface mesh (shown in
white) around HKUST-1 and a 5 nm-dia. Pd nanoparticle. The Zeo++ software
uses Voronoi decomposition to identify surface mesh and accessible volume in
the system of atoms and uses this information to calculate the pore size distri-
bution.
110
0.40 Samples=500Samples=5000
0.35 Samples=50000
0.30
0.25
0.20
0.15
0.10
0.05
0.00
0 5 10 15 20 25 30
Pore width (Å)
Figure 4.14: Comparison of pore size distribution in the HKUST-1/Pd nanocom-
posite for different values (shown as Samples in the inset) of Monte Carlo sam-
ples per unit cell. The distribution starts to converge as the number exceeds
5000.
0.5
0.4
0.3
0.2
0.1
0.0
0 5 10 15 20 25 30
Pore width (Å)
Figure 4.15: Distribution of pore size analysis in the HKUST-1/Pd nanocom-
posite conducted over Monte Carlo samples of 5000 and 50000 (but not 500) per
unit cell.
Distribution Distribution
starts to converge as the number of Monte Carlo samples per unit cell are varied
from 5000 to 50000. The final results shown in Fig. 4.15 depict that the majority
(∼ 80%) of channels inside the material typically consist of pores width, pore
diameters, ranging from ∼ 11 to 13 Å. Smaller 6 Å- diameter cavities are also
present in the structure, typically between the larger cavities. Our results are in
agreement with the experimental pore size distribution in pure HKUST-1, where
cavities are known to range from ∼ 10 to 14 Å, along with some smaller pores of
size 6 Å. [155] This observation indicates that encapsulation of Pd nanoparticles
inside HKUST-1 crystals has little effect on its structural form. HKUST-1, in its
unadulterated form, serves as an excellent choice of candidate for gas storage
applications. With our new observation that the presence of the Pd nanoparticle
does not affect its structure, these nanoparticles act to provide additional active
sites for gas solute particles to bind to and hence react, depending on the nature
of the solute. These additional sites help explain the outstanding capacity of the
nanocomposite for significant capability for hydrogen gas uptake.
4.4 Conclusions
COF-506, with its unique property of being an elastic and stretchable porous
material, is emerging as a promising member of a new class of woven porous
(COF/MOF) materials. Its inherent flexibility, coupled with its ability to accom-
modate mechanical stress, make COF-506 and any of its future derivatives po-
tentially interesting materials to design for a pre-designed functionality. Using
the model developed for COF-505, we predicted COF-506 to be a more prac-
tical nanomaterial, both in terms of its flexibility and porosity, than COF-505.
The pore size distribution of metallated COF-506 was dominated by small pores
111
(80% lie in the 6-10 Å range of diameter), although a few (5%) can be up to 20
Å in diameter, as well as uncovering the locations of these pores. To conduct
diffusional studies in COF-506, we used the exact same set of parameters as for
COF-505 and demonstrated that, in contrast to COF-505, COF-506 showed re-
moval of the doubly interpenetrated layer in the extremely dense COF-506-Cu
weave significantly accelerates the diffusion of small molecules within its ma-
trix.
In addition, we analyzed the morphology of a high performing MOF called
HKUST-1 and Pd nanoparticle composite which has shown to provide excep-
tional uptake of hydrogen gas capacity. We used Zeo++ software to fully char-
acterize the internal volume of the composite and assess its pore size distribu-
tion. The algorithm takes into account both the accessible surface area and the
presence of solute molecules of the internal pore cavity to meaningfully gener-
ate a distribution. Majority (∼ 80%) of the pores exhibit a size range between
11 to 13 Å in addition to some smaller cavities of size ∼ 6 Å. This distribution
provides an ideal platform for small solute molecules like hydrogen to diffuse
through and deposit within its porous network.
112
CHAPTER 5
CONCLUSIONS AND FUTURE DIRECTIONS
5.1 Accomplishments
Organic materials are increasingly being deployed for applications such as gas
storage by adsorption and as active layers in a wide range of new electronic
technologies. They offer significant advantages, most notably being solution-
processable, lightweight, and flexible. With judicious functionalization, these
materials can be tuned to be suitable for incorporation into flexible displays,
biological or chemical sensors, wearable electronics, and for the capture of dan-
gerous materials, including chemical warfare agents. Single crystals of organic
semiconductors are particularly interesting to study because they offer simpli-
fications related to their lack of grain boundaries and their long-range periodic
order with the presence of minimal defects. These crystals, therefore, provide
an ideal platform for examining structure-property relationships, as well as the
ability to uncover the fundamental physics governing the behavior of organic
semiconductors. Organic frameworks, on the other hand, offer access to materi-
als exhibiting high specific surface area and pore volume. Framework materials
offer proven ability for high uptake capacity of solute gases like hydrogen and
methane, far exceeding the uptake capacity of some more traditional adsorbents
like zeolites and activated carbons.
In order to explain the above extraordinary properties of organic materials,
there is an increasing need to study these materials at a molecular and even
atomic scale. While there has been continuous development in experimental
techniques to characterize micro- and nano- scale morphologies, they are in-
113
herently slow. Computational techniques can offer powerful and accelerated
research capabilities to provide atomic-level insight. In this dissertation, we
have shown the use of all-atom Molecular Dynamics, Bayesian optimization-
based Machine Learning, and ab initio Density Functional Theory techniques
to accelerate the investigation of organic systems at an atomic/molecular level.
With an educated selection, combination, and validation of such models, these
computational methods can not only provide complementary studies to explain
experimental observations but also drive the design of new organic molecules
with tailored properties, thereby cutting down the time for experimental trial
and error in the laboratory.
Using such computational approaches, we briefly summarize the main ac-
complishments of this dissertation:
First, we tackled the prediction of polymorphs in small-molecules or-
ganic semiconductors. To do so, we developed an unsupervised Bayesian
optimization-based Machine Learning algorithm to interface with training data
from Molecular Dynamics and efficiently predict the existence and characteris-
tics of both thermodynamically stable and metastable crystalline structures. As
a test case, for a promising commercial target, a diTMS-BTBT organic semicon-
ductor, our algorithm was able to select viable candidates from a pool of hun-
dreds of potential structures by requiring to run simulations for only a quarter
of the possible candidates, resulting in a great reduction in computational cost.
Running a sequential search on complex systems like ours would be close to in-
tractable either computationally or experimentally. In addition, our algorithm
has the benefits of running in a completely self-directed manner, without the
need for user intervention. The accuracy of our method to accurately predict
114
polymorphs is ∼ 1-2 kcal mol-1, which is within the uncertainty of the Molecular
Dynamics data and sufficiently accurate to recommend promising candidates
for experimental investigation. We also found narrow ranges of lattice constants
that were preferred by stable polymorphs, rather than metastable or unstable
combinations of unit cell parameters. Consequently, our machine learning al-
gorithm was able to accurately predict unit cell parameters of both the experi-
mentally found polymorphs of diTMS-BTBT with excellent agreement between
the Bayesian-predicted unit cell parameters and experimental values. We also
suggest that a third polymorph might exist, albeit existing at a significantly
higher energy, and hence would need to be stabilized at higher temperatures
with slower heating-cooling cycles. It is worth noting that our Bayesian model
is highly scalable and can accommodate the inclusion of multiple information
sources or multiple objective functions. The model could also be extended to
study polymorphism in therapeutic applications, such as the pharmaceutical in-
dustry where polymorphs constitute a major issue for reproducibility and con-
trol of properties and therapeutic effectiveness. Finally, our algorithm showed
the existence of a “forest” of other kinetically stabilized polymorphs, separated
by energies well below kT. While these may not be identifiable experimentally
due to the tiny energetic differences between them, the richness of this situation
should cause us to rethink our views on polymorphic existence and character-
istics.
In a complementary study, we uncovered the molecular-scale mechanism
underlying polymorphic transitions in related organic semiconductors. Such
transitions have been posited as potentially ultra-low energy “switches” in
electronic devices, which could be transformational. For this study, we con-
ducted plane wave DFT calculations and investigated two distinct transition
115
pathways: nucleation and growth and cooperative transitions. First, through
a single atomic substitution in the bulky side-chains of BTBT-based organic
semiconductors (from Si to C), we compared the characteristic polymorphic
transitions in two high-performing organic electronic systems: ditBu-BTBT and
diTMS-BTBT. These molecules were recently exposed by experimental studies
by Ying Diao’s group at UIUC to follow either a so-called “cooperative” tran-
sition (e.g., ditBu-BTBT) or a “nucleation and growth” polymorphic transition
pathway (e,g., diTMS-BTBT). Substitution of the carbon atom in bulky side-
chains of ditBu-BTBT with a larger-diameter silicon atom resulted in a signif-
icant, 22%, increase in the bond lengths of diTMS-BTBT which subsequently
altered the molecular packing environment. This change also resulted in the
formation of two pairs of so-called “short contacts” associated with the side-
chains in diTMS-BTBT, in contrast to their absence in ditBu-BTBT. In this study,
we tested our hypothesis of side-chain rotation driving cooperative transitions
in crystals. We used DFT calculations to develop a molecular-level understand-
ing of this phenomenon and simulated energy barriers for side-chain rotation
in the two systems to quantify the drastic difference in the side-chain pack-
ing environments. Before the transition, the rotational energy was ca.18 kcal
mol-1 for the diTMS α form of the side-chain, compared to a much lower ca.3
kcal mol-1 for the ditBu LT (low-temperature) form of the side-chain. After the
transition, the rotational energy was ca.9 kcal mol-1 for the diTMS β side-chain
and ca.2 kcal mol-1 for the ditBu HT (high-temperature) side-chain. These re-
sults confirmed that the significant change in the packing environment caused
by this single-atom change was sufficient to significantly restrict side-chain ro-
tation in diTMS crystals. Using this molecular design rule, we can selectively
leverage the advantages of both types of transition pathways in organic elec-
116
tronic devices. The nucleation and growth mechanism allows kinetic access to
metastable polymorphs at ambient conditions, while a cooperative transition in-
duces what is known as a “function memory” effect, leading to significant and
reversible changes to the charge carrier mobility of thin film transistors. [3] This
opens the door to selectively using a cooperative transition in organic electron-
ics industry to design next-generation smart functional materials that display a
“shape memory” effect.
In a second major thrust of this dissertation, we moved to the study of larger-
scale porous materials, we studied different exemplars of Organic Framework
materials. We characterized and evaluated the properties of novel Covalent Or-
ganic Framework materials and their generally larger-pored “cousins” Metal
Organic Frameworks (COFs and MOFs, respectively) using Molecular Dynam-
ics. Our investigation explored the propensity of both (1) the newly synthe-
sized COF-506 as well as (2) a monolithic HKUST-1 (MOF) functionalized with
Palladium nanoparticles to be suitable for small-molecule gas diffusion within
their densely interwoven matrix of structural entities. For the rare woven COF-
506 material, our results showed that its inherent flexibility (due to its woven
nature) allowed significant diffusion of gas molecules. Despite their investiga-
tion over a wide range of pressures and temperatures (below 400 K), we found
that these common variables have an unremarkable impact on solute gas dif-
fusivity. Self-diffusivity of THF at 300 K and 1 atm was slower by factors of
600 and 60 than diffusion of He and CO2, respectively. However, this obser-
vation is attributed to strong electrostatic interactions between its lone pair of
electrons and copper ions rather than on the size difference between the two
types of molecules. COF-506 could potentially be used as an effective trap for
selective small solute molecules. Using a method developed here, we found
117
that the pore size distribution of metallated COF-506 was dominated by small
pores (80% lie in the 6-10 Å range of diameter), although a few pores (5%) can
be up to 20 Å in diameter. We were also able to uncover the locations of these
pores within the woven matrix. These findings could potentially be exploited in
the design of new molecular-scale woven materials as a means to help improve
either the diffusion or selectivity of separation of solute gases. As an interest-
ing complement for a material possessing a larger scale of pore sizes, we also
characterized the structure of a high-performing functionalized HKUST-1 MOF
embedded with 10 nm-diameter Palladium nanoparticles. We conducted pore
size distribution calculations and found that functionalizing the MOF with Pal-
ladium nanoparticles had a surprisingly small impact on HKUST-1’s pore size
distribution. The majority (∼ 80%) of the pores displayed a size range between
11 to 13 Å in addition to some smaller cavities of size ∼ 6 Å. Our prediction of
the pore size distribution agrees strikingly well with experimental suggestions
that pure HKUST-1 exhibits cavities alternating in size between 14 Å and 11 Å
in diameter, along with smaller pores of 5 Å. [155] This distribution provides an
ideal platform for small solute molecules like hydrogen to diffuse through and
deposit within its porous network, making it an effective choice of material for
high density clean fuel storage.
5.2 Future Work
Below, we suggest some natural extensions of the studies conducted in this the-
sis.
118
5.2.1 Bayesian Optimization for Polymorph Prediction
An Edisonian approach to discovery remains a common approach in scientific
research. However, it is clearly an inefficient and resource-intensive manner in
which to conduct a search within a representative sample space. One such ex-
ample is the case of finding polymorphs in organic semiconductors. In this
thesis and as noted above, we have developed a Bayesian optimization ap-
proach that narrows down this search space from the intractable to the acces-
sible. In our case, we accurately predicted the existence and properties of poly-
morphs in a high-performing organic semiconductor, diTMS-BTBT. We worked
towards a goal of predicting the lattice constant values associated with thermo-
dynamically stable and metastable polymorphs. We varied the input vector in
six dimensional real space as: x = [a, b, c, α, β, γ] and measured the similarity
between these points using the Squared Exponential kernel. This choice of ker-
nel works very well for handling smoothly varying functions like ours. One
way to extend our approach is to predict the atomic arrangement inside the unit
cell of stable and metastable polymorphs. The way to implement this idea is
to formulate a “molecular fingerprint” representation for each atom in order to
depict its neighborhood environment. A suitable kernel would then compare
the environments of each atom with other atoms inside the same inside unit
(local variation) and with atoms in different training set candidate structures
(to model global variations). Naturally, this would constitute a computationally
intensive process since, in our test case for example, the dimensionality of the
model would far exceed six and is proportional to the number of atoms present
inside the simulation cell. A potential choice of the kernel for comparing such
atomic neighborhoods is called Smooth Overlap of Atomic Position (SOAP)
kernel. [156] This kernel measures the structural similarity between candidate
119
structures by encoding regions of atomic geometries using a local expansion of a
Gaussian smeared atomic density. It is formulated as the average squared over-
lap of the smooth atom densities constructed as the superposition of Gaussian
functions. The SOAP kernel is positive-definite and systematically combines a
detailed description of atomic structures. Our model can be extended to gen-
erate local neighborhood environments for each atom which can then be com-
pared using the SOAP kernel to predict the arrangement of atoms inside crystal
cell of stable and metastable polymorphs, in advance of experimental evidence.
Additionally, our model can be extended to studying biological systems, es-
pecially in the pharmaceutical industry, where more than 50% of active com-
pounds are estimated to have more than one polymorphic form. [157, 158]
For example, our model can be incorporated to provide complementary stud-
ies to Bernhard Trout’s group at MIT work for the study of biologically rele-
vant polymorphs, including l-glutamic acid system [159] and enantiotropic p-
aminobenzoic acid [160], for accelerated polymorph discovery.
Our input model can also be expanded to include experimental processing
conditions like temperature, shear, and solvent effect to predict stable poly-
morphs. This can be achieved by developing a linear combination of variable
components. [161] Here we assume that the surrogate model that represents the
energy function of the polymorphs would be a linear combination of lattice con-
stants, temperature, shear, solvent effect which contribute to the total energy in
an additive manner.
∑d
E(x) = αixi + β(xσ) + f (x , xρ) (5.1)
i=1
120
Here the input x is now a ten-dimensional vector, x = [x1, x2, x3, x4, x5, x6,
x7,x8, x9,x10]. x1:d where d = 7 are real values that together indicate the combina-
tion of six lattice parameters and temperature. x8 (xσ in Eq. 5.1) is a real-valued
representation of pressure. x9 and x10 are real-valued descriptors of solvent with
x9 representing the dielectric () and x10 the density (ρ).
αi (i= 1 to 7) is the length scale associated with the contribution to the total
energy made by the combination of lattice constants and temperature. β(xσ = x8)
captures the deviation of E(x) due to the shear. f (x , xρ) captures the contribution
of the solvent to the total energy (x = x9 and xρ= x10). Here we assume that αi
and β are normally distributed as αi ∼ N(µα, σα) and β ∼ N(µβ, σβ), and the
solvent contribution, f , is drawn from an independent Gaussian Process with
prior mean function µ0 and covariance Σ0.
5.2.2 Covalent and Metal Organic Framework Materials
A major obstacle in studying interfacial interactions in complex systems, such as
the HKUST-1 MOF functionalized with relatively large palladium nanoparticles
is the calculation of effective dimensionality and identification of suitable order
parameters describing the low-dimensional intrinsic features to which the sys-
tem dynamics are effectively restrained. Nonlinear dimensionality reduction
techniques like “diffusion maps” [162] can help to understand the interfacial
interaction energy between parts of the MOF and the Pd nanoparticles and un-
cover the preferred orientation by which they align together. The idea here is
to extract dynamically relevant, low-dimensional descriptors of the MOF and
Pd nanoparticle monolith. This technique would allow us to identify the prin-
121
cipal order parameters in the system describing its evolution from one state to
another. The newly found order parameters can then be employed to calculate
the free energy of interaction, allowing us to locate the local minimum energy
pockets on the free energy surface and understand the orientation of the com-
bined MOF/Pd system. In this way, we can also probe the energy surface to
develop an understanding of which constituent elements of MOFs like to be in
proximity to a Pd nanoparticle.
Additionally, the studies to investigate the uptake of solute gases can be en-
hanced using techniques like Nudged Elastic Band (NEB) [163] to find mini-
mum energy pathways for reactions between solute gases and the MOF and
palladium composite. NEB is a powerful method for generating energy barriers
and reaction pathways between known reactants and products. In this method,
multiple geometry optimizations of the intermediate images between the re-
actants and products are performed in parallel such that each optimization is
coupled to the adjacent ones by harmonic constraints, which keeps the struc-
tures geometrically linked. NEB can be used in our system to construct a reac-
tion profile of hydrogen and methane uptake by HKUST-1 and the palladium
nanoparticle composite. Then systematically uncover the reaction coordinates
along with predicting the transition states for hydrogen and methane uptake.
122
APPENDIX A
3D T-SNE AND K-MEANS CLUSTERING OF POLYMORPHS
Alpha phase
Beta phase
200
100
0
100
200
200
100
200 0 2
100 E
t 0 100 N-SNE1100
S
200 200
t-
300
Figure A.1: t-SNE low-dimensional embedding of our six-dimensional data.
Data points are color coded based on the clusters discovered by k-means clus-
tering algorithm. Orange denotes the cluster (polymorph) 1 that contains the
experimentally found stable beta phase, blue indicates cluster (polymorph) 2
that contains the experimentally found metastable alpha phase, and green indi-
cates the derivative polymorph of cluster 1.
123
t-SNE3
APPENDIX B
PYTHON CODE FOR PORE SIZE DISTRIBUTION ALGORITHM
DISCUSSED IN SECTION 4.2.5
1 ### This main code calls other dependent scripts ###
2
3 import numpy as np
4 import os
5 import sys
6 import itertools
7 import math
8 import porefind as pf
9 import adjacentPorefind as apf
10 import biggerPorefind as bpf
11 import overallPorefind as opf
12 import surfacePorefind as spf
13 import fitSpherePore as fsp
14 import makeHistoPore as mhp
15 import makeSphereHisto as msh
16
17 probeRadSize=1.5
18 atomName=’CO2’
19 atomType=[10,11]
20 xyzFilePath=’look.xyz’
21
22 data_directory=’’ #path to data directory
23 figure_directory=’’#path to where plots are stored
24
25 #Find pores
26 poresPositionFile=pf.main(probeRadSize,xyzFilePath,atomType,atomName)
27 if ’%s_poresPosition_%s_probeSize.data’%(atomName,probeRadSize) not
in os.listdir(data_directory) \
124
28 or os.stat(data_directory+’/%s_poresPosition_%s_probeSize.data’%(
atomName,probeRadSize)).st_size==0:
29
30 poresPositionFile=pf.main(probeRadSize,xyzFilePath,atomType,
atomName)
31 os.system(’mv %s_poresPosition_%s_probeSize* ’%(atomName,
probeRadSize)+data_directory)
32 print(’Pores are found\n’)
33 else:
34 poresPositionFile=’%s_poresPosition_%s_probeSize.data’%(atomName,
probeRadSize)
35 print(’Pores were already calculated\n’)
36
37
38
39 #Club pores to find adjacent pores
40 if ’%s_poresAdjacent_%s_probeSize.data’%(atomName,probeRadSize) not
in os.listdir(data_directory) or \
41 os.stat(data_directory+’/%s_poresAdjacent_%s_probeSize.data’%(
atomName,probeRadSize)).st_size==0:
42
43 poresAdjacentFile=apf.main(data_directory+’/’+poresPositionFile,
probeRadSize,atomName)
44 os.system(’mv %s ’%poresAdjacentFile+data_directory)
45 print(’Adjacent Pores are found\n’)
46 else:
47 poresAdjacentFile=’%s_poresAdjacent_%s_probeSize.data’%(atomName,
probeRadSize)
48 print(’Adjacent Pores were already calculated\n’)
49
50
51
125
52 #Club adjacent pores to make bigger pores
53 if ’%s_poresBigger_%s_probeSize.data’%(atomName,probeRadSize) not in
os.listdir(data_directory) or \
54 os.stat(data_directory+’/%s_poresBigger_%s_probeSize.data’%(atomName,
probeRadSize)).st_size==0:
55
56 poresBiggerFile=bpf.main(data_directory+’/’+poresAdjacentFile,
probeRadSize,atomName)
57 os.system(’mv %s ’%poresBiggerFile+data_directory)
58 print(’Bigger Pores are found\n’)
59 else:
60 poresBiggerFile=’%s_poresBigger_%s_probeSize.data’%(atomName,
probeRadSize)
61 print(’Bigger Pores were already calculated\n’)
62
63
64
65 #Club bigger pores to find overall pores
66 filesOverAll=[v for v in os.listdir(data_directory) if ’%
s_poresOverall_%s_probeSize’%(atomName,probeRadSize) in v]
67 if len(filesOverAll)==0 or 0 in [os.stat(data_directory+’/’+f).
st_size for f in filesOverAll]:
68
69 poresOverallFiles=opf.main(data_directory+’/’+poresBiggerFile,
probeRadSize,atomName,data_directory)
70 os.system(’mv %s_poresOverall_%s_probeSize* ’%(atomName,
probeRadSize)+data_directory)
71 print(’Overall Pores are found\n’)
72 else:
73 poresOverallFiles=filesOverAll
74 print(’Overall Pores were already calculated\n’)
75
126
76
77
78 #Find outer surface coordinates of overall pores
79 filesOuterSurface=[v for v in os.listdir(data_directory) if ’
_OuterSurface’ in v]
80 if len(filesOuterSurface)==0 or 0 in [os.stat(data_directory+’/’+f).
st_size for f in filesOuterSurface]:
81
82 poresSurfaceFiles=spf.main(poresOverallFiles,poresPositionFile,
probeRadSize,atomName,data_directory)
83 os.system(’mv *_OuterSurface* ’+data_directory)
84 print(’Surface of Pores are found\n’)
85 else:
86 poresSurfaceFiles=filesOuterSurface
87 print(’Surface of Pores were already calculated\n’)
88
89
90
91 #Fit spheres inside the outer surface
92 filesRadius=[v for v in os.listdir(data_directory) if ’
_RadiusInformation’ in v]
93 if len(filesRadius)==0 or 0 in [os.stat(data_directory+’/’+f).st_size
for f in filesRadius]:
94
95 poresRadiusFiles=fsp.main(poresSurfaceFiles,probeRadSize,
data_directory,atomName)
96 os.system(’mv *_RadiusInformation* ’+data_directory)
97 print(’Radius of spheres are found\n’)
98 else:
99 poresRadiusFiles=filesRadius
100 print(’Radius are already calculated\n’)
101
127
102
103
104 #Plot Histogram of radii
105 filesHistoPlot=[v for v in os.listdir(figure_directory) if ’histo’ in
v]
106 if len(filesHistoPlot)==0 or 0 in [os.stat(data_directory+’/’+f).
st_size for f in filesHistoPlot]:
107 poresRadiusFiles=[’CO2_RadiusInformation_1.5_probeSize_4.data’]
108 poresRadiusFiles=[’CO2_RadiusInformation_1.5_probeSize_4_Unique.data’
]
109 mhp.main(poresRadiusFiles,probeRadSize,data_directory,atomName)
110 os.system(’mv *histo* ’+figure_directory)
111 print(’Histograms are found\n’)
112 else:
113 print(’Histograms were already calculated\n’)
114
115
116
117 #Plot Spheres of Histogram
118 filesSpherePlot=[v for v in os.listdir(figure_directory) if ’_sphere’
in v]
119 if len(filesSpherePlot)==0 or 0 in [os.stat(data_directory+’/’+f).
st_size for f in filesSpherePlot]:
120 poresRadiusFiles=[’CO2_RadiusInformation_1.5_probeSize_4.data’]
121 poresRadiusFiles=[’CO2_RadiusInformation_1.5_probeSize_4_Unique.data’
]
122 msh.main(poresRadiusFiles,probeRadSize,data_directory,atomName)
123 os.system(’mv *_sphere* ’+figure_directory)
124 print(’Spheres are drawn\n’)
125 else:
126 print(’Spheres were already drawn’)
128
1 ### porefind.py: This code finds pores of probeSizeRad for one
timestep #####
2
3 import numpy as np
4 import os
5 import sys
6 import itertools
7 import math
8
9 def distance(probe,coords):
10 for i in range(len(coords)):
11 coord=np.array([coords[i][1],coords[i][2],coords[i][3]])
12 dist=np.linalg.norm(coord-probe)
13 coords[i][4]=dist
14 coordSortedDist=coords[coords[:,4].argsort()]
15 return(coordSortedDist)
16
17 #probeCoords,atomCoords,atomicRadii,probeRadSize
18 def poreFindCalc(probeCoords,coords,atomicRadii,probeRadSize):
19 poresPosition=[]
20 maxCutoff=max(np.array([atomicRadii[j+1][1] for j in range(9)])+
probeRadSize+1)
21 print(’MaxCutOff’,maxCutoff)
22 timesLoopRan=0
23 for i in range(len(probeCoords)):
24
25 probe=probeCoords[i]
26 coordSortedDist=distance(probe,coords)
27
28 for j in range(len(coordSortedDist)):
29 timesLoopRan+=1
30 atomicRadius=atomicRadii[int(coordSortedDist[j][0])][1]
129
31 cutoff=probeRadSize+atomicRadius+1
32
33 if coordSortedDist[j][4]<=cutoff:
34 break
35 elif coordSortedDist[j][4]>1.5*maxCutoff:
36 poresPosition.append([probeCoords[i][0],probeCoords[i][1],
probeCoords[i][2]])
37 break
38
39 print(’Number of times the loop actually ran’,timesLoopRan)
40 return poresPosition
41
42 def main(probeRadSize,xyzFilePath,atomType,atomName):
43 filePath=xyzFilePath
44 timestep=0
45 atomCoords=[]
46 xyzfile=open(filePath)
47 for row in xyzfile:
48 if len(row.split())==3 and row.split()[2]==’5000000’:#timestep
for which the data was colledted
49 timestep=1
50 continue
51 if len(row.split())==4 and int(row.split()[0]) not in atomType
and timestep==1:
52 atomCoords.append([row.split()[0],float(row.split()[1]),float(
row.split()[2]),float(row.split()[3]),0.0])
53 if timestep==1 and len(row.split())!=4:
54 timestep+=1
55 break
56 xyzfile.close()
57 atomCoords=np.array(atomCoords)
58 atomCoords=atomCoords.astype(float)
130
59 print(np.shape(atomCoords))
60 # Define unique elements excluding solutes
61 atomtype_list=[’C’, ’H’, ’N’, ’Cu’, ’C’, ’C’, ’N’, ’B’, ’F’]
62 unique_elements,counts_elements = np.unique(np.array([x[0] for x in
atomCoords]),return_counts=True)
63
64 atomicRadii={1:[’C’,0.77],2:[’H’,0.31],3:[’N’,0.70],4:[’Cu’
,1.173],5:[’C’,0.77],6:[’C’,0.77],7:[’N’,0.70],\
65 8:[’B’,0.80],9:[’F’,0.64]}
66
67 # Define Pore Mesh
68 xlow,xhigh=(float(min([x[1] for x in atomCoords])),float(max([x[1]
for x in atomCoords])))
69 ylow,yhigh=(float(min([x[2] for x in atomCoords])),float(max([x[2]
for x in atomCoords])))
70 zlow,zhigh=(float(min([x[3] for x in atomCoords])),float(max([x[3]
for x in atomCoords])))
71
72 print(’\nMin and max dimensions’,’X’,xlow,xhigh,’Y’,ylow,yhigh,’Z’,
zlow,zhigh)
73
74 #Probe Coords Meghgrid 1
75 probeCoords1=np.array([[i*probeRadSize,j*probeRadSize,k*
probeRadSize] for i,j,k in itertools.product(\
76 range(int(math.ceil(xlow/probeRadSize)),int(math.ceil(xhigh/
probeRadSize)),probeRadSize),range(int(math.ceil(ylow/probeRadSize
)),int(math.ceil(yhigh/probeRadSize)),\
77 probeRadSize),range(int(math.ceil(zlow/probeRadSize)),int(math.
ceil(zhigh/probeRadSize)),probeRadSize))])
78
79
80 #Probe Coords Meshgrid 2
131
81 probeCoords2=np.array([[i*probeRadSize,j*probeRadSize,k*
probeRadSize] for i,j,k in itertools.product(\
82 range(int(math.ceil((xlow+1)/probeRadSize)),int(math.ceil((xhigh
+1)/probeRadSize)),probeRadSize),range(int(math.ceil((ylow+1)/
probeRadSize)),int(math.ceil((yhigh+1)/probeRadSize)),\
83 probeRadSize), range(int(math.ceil((zlow+1)/probeRadSize)), int
(math.ceil((zhigh+1)/probeRadSize)),probeRadSize))])
84
85 print(np.shape(probeCoords1), np.shape(probeCoords2))
86 print(probeCoords1)
87 print(probeCoords2)
88 probeCoords = np.concatenate((probeCoords1, probeCoords2))
89 print(’Shape of ProbeCoords’,np.shape(probeCoords))
90 print(’Max times this loop will run’,len(probeCoords)*len(
atomCoords),’\n’)
91 poresPosition=poreFindCalc(probeCoords,atomCoords,atomicRadii,
probeRadSize)
92
93 temp=sys.stdout
94 filename=’%s_poresPosition_%s_probeSize.data’%(atomName,
probeRadSize)
95 os.system(’rm %s’%(filename))
96 gfile=open(filename,’a’)
97 print(’Number of individual pores’,len(poresPosition))
98 for i in range(len(poresPosition)):
99 gfile.write(’%s %s %s %s\n’%(i+1,poresPosition[i][0],
poresPosition[i][1],poresPosition[i][2]))
100 gfile.close()
101 return(filename,fileNameFinal)
132
1 #### adjacentPorefind.py: This code clubs the adjacent pores found by
porefind.py together ####
2
3 import numpy as np
4 import os
5 import sys
6 import itertools
7 import math
8
9 def read(poresPositionFile,probeRadSize):
10 smallPores=[]
11 porefile=open(poresPositionFile)
12 for p in porefile:
13 if len(p.split())==4:
14 smallPores.append([int(p.split()[0]),float(p.split()[1]),float(
p.split()[2]),float(p.split()[3]),probeRadSize])
15 print(’Shape of individual pores’,np.shape(smallPores))
16 smallPores=np.array(smallPores)
17 return smallPores
18
19 def distance(testPore,otherPores):
20 distMatrix=[]
21 pore=np.array([testPore[1],testPore[2],testPore[3]])
22 for i in range(len(otherPores)):
23 coord=np.array([otherPores[i][1],otherPores[i][2],otherPores[i
][3]])
24 dist=np.linalg.norm(coord-pore)
25 distMatrix.append([otherPores[i][0],otherPores[i][1],otherPores[i
][2],otherPores[i][3],dist])
26 distMatrix=np.array(distMatrix)
27 distMatrix=distMatrix[distMatrix[:,4].argsort()]
28
133
29 return(distMatrix)
30
31
32 def clubPores(smallPores,probeRadSize):
33 adjacentPores=[]
34 for testPore in smallPores:
35 distSortedMatrix=distance(testPore,smallPores)
36 j=0
37 neighbors=[]
38 while distSortedMatrix[j][4]<=probeRadSize:
39 neigh=int(distSortedMatrix[j][0])
40 neighbors.append([neigh,smallPores[neigh-1][1],smallPores[neigh
-1][2],smallPores[neigh-1][3]])
41 j+=1
42 adjacentPores.append([int(testPore[0]),neighbors])
43 print(’Shape of adjcent pore’,np.shape(adjacentPores))
44 return adjacentPores
45
46
47 def writeToFile(adjacentPores,probeRadSize,atomName):
48 temp=sys.stdout
49 filename=’%s_poresAdjacent_%s_probeSize.data’%(atomName,
probeRadSize)
50 os.system(’rm %s’%(filename))
51 gfile=open(filename,’a’)
52 for i in range(len(adjacentPores)):
53 gfile.write(’%s %s\n’%(adjacentPores[i][0],adjacentPores[i
][1]))
54 gfile.close()
55 return(filename)
56
57 ##########################
134
58 def main(poresPositionFile,probeRadSize,atomName):
59 smallPores=read(poresPositionFile,probeRadSize)
60 adjacentPores=clubPores(smallPores,probeRadSize)
61 filename=writeToFile(adjacentPores,probeRadSize,atomName)
62 return(filename)
135
1 #### biggerPorefind.py: This code combines the adjacent pores found
by adjacentPorefind.py into bigger bubbles ####
2
3 import numpy as np
4 import math
5 import os
6 import sys
7 import matplotlib.pyplot as plt
8 from mpl_toolkits.mplot3d import Axes3D
9
10 def read(poresAdjacentFile):
11 fileAdjacentPores=open(poresAdjacentFile)
12 bubble=[]
13 neighboursPore=[]
14 singlePore=[]
15
16 for line in fileAdjacentPores:
17 try:
18 fragmentPore=line[line.find("[[")+len("[["):line.find("],")]
19 singlePore.append([float(fragmentPore.split()[k][:-1]) for k in
range(len(fragmentPore.split()))])
20
21 except ValueError:
22 fragmentPore=line[line.find("[[")+len("[["):line.find("]]")+1]
23 print(fragmentPore.split(),[float(fragmentPore.split()[k][:-1])
for k in range(len(fragmentPore.split()))])
24 singlePore.append([float(fragmentPore.split()[k][:-1]) for k in
range(len(fragmentPore.split()))])
25 neighboursPore.append([])
26 bubble.append([singlePore[-1],[]])
27 continue
28
136
29 line=line.replace(fragmentPore.split()[0][:-1]+’ [[’+
fragmentPore+’],’,"")
30 neighbours=[]
31 while len(line.split())!=2:
32 fragNeigh=line[line.find(’[’)+len(’[’):line.find(’]’)]
33 neighbours.append([float(fragNeigh.split()[k][:-1]) for k in
range(len(fragNeigh.split()))])
34 if len(line.split())==4:
35 break
36 line=line.replace(’[’+fragNeigh+’], ’,"")
37 neighboursPore.append(neighbours)
38 bubble.append([singlePore[-1],neighbours])
39 return(bubble,singlePore,neighboursPore)
40
41
42 ### Calculation of a bigger pore 2
43 def clubPores(probeRadSize,bubble,singlePore,neighboursPore,atomName)
:
44 temp=sys.stdout
45 filename=’%s_poresBigger_%s_probeSize.data’%(atomName,probeRadSize)
46 os.system(’rm %s’%(filename))
47 gfile=open(filename,’a’)
48 checked=[]
49 clubbedPores2=[]
50
51 for i in range(len(bubble)):
52 biggerPores=[]
53 if singlePore[i][0] not in checked:
54 checked.append(singlePore[i][0])
55 biggerPores.append(singlePore[i][0])
56 gfile.write(’%s ’%(singlePore[i][0]))
57 pore1=neighboursPore[i]
137
58
59 for j in range(len(pore1)):
60 checked.append(pore1[j][0])
61 biggerPores.append(pore1[j][0])
62 gfile.write(’%s ’%(pore1[j][0]))
63
64 pore2=neighboursPore[int(pore1[j][0])-1]
65 for k in range(len(pore2)):
66 if pore2[k][0] not in biggerPores:
67 biggerPores.append(pore2[k][0])
68 gfile.write(’%s ’%(pore2[k][0]))
69 gfile.write(’\n’)
70 clubbedPores2.append(biggerPores)
71 gfile.close()
72 print(’Shape of bigger pores’,np.shape(clubbedPores2))
73 return(filename)
74
75 def main(poresAdjacentfile,probeRadSize,atomName):
76 bubble,singlePore,neighboursPore=read(poresAdjacentfile)
77 filename=clubPores(probeRadSize,bubble,singlePore,neighboursPore,
atomName)
78 return(filename)
138
1 #### overallPorefind.py: This code combines the contiguous bubbles
found by biggerPorefind.py ####
2
3 import numpy as np
4 import math
5 import os
6 import sys
7
8 def calc(poresBiggerFile,probeRadSize,atomName,data_directory):
9 clubbedPores2=[]
10 fileClubbedPore2=open(poresBiggerFile)
11 for line in fileClubbedPore2:
12 if line.strip()!=’’ and len(line.split())!=1:
13 clubbedPores2.append([float(w) for w in line.split()])
14 print(’Shape of Bigger Pores’,np.shape(clubbedPores2),’\n’)
15 #clubbedPores contains the list of a bigger bubble made of adjacent
and their neighbour pores
16 ### clubbing of final pores
17 overlap=[5,10,25,30,40,50]
18 filesToReturn=[]
19 run=True
20 timesLoopRun=0
21 while run is True and timesLoopRun<len(overlap):
22 timesLoopRun+=1
23 print(’Overlap’,overlap[timesLoopRun-1])
24 print(’Loop run #’,timesLoopRun)
25 if ’%s_poresOverall_%s_probeSize_%s.data’%(atomName,probeRadSize,
timesLoopRun-1) in os.listdir(data_directory):
26 clubbedPores=[]
27 for line in fread:
28 if line.strip()!=’’:
29 clubbedPores.append([float(w) for w in line.split()])
139
30 else:
31 clubbedPores=np.array(clubbedPores2)
32 print(’Shape of clubbedPores’,np.shape(clubbedPores))
33 clubbedPores2=[]
34 checked=[]
35 countPore=[]
36 temp=sys.stdout
37 filename=’%s_poresOverall_%s_probeSize_%s.data’%(atomName,
probeRadSize,timesLoopRun)
38 filesToReturn.append(filename)
39 os.system(’rm %s’%(filename))
40 gfile1=open(filename,’a’)
41
42 for k in range(len(clubbedPores)):
43 checkPore=clubbedPores[k]
44 if checkPore[0] not in checked:
45 checked.append(checkPore[0])
46 count=0
47 addPore=[]
48 for value in [v for v in clubbedPores if v!=checkPore]:
49 if len(np.intersect1d(checkPore,value))>=overlap[
timesLoopRun-1]:
50 checked.append(value[0])
51 addPore.append([n for n in value if n not in checkPore])
52 count+=1
53
54 addPore=[j for sub in addPore for j in sub]
55 addPore=np.unique(addPore)
56 addPore=[v for v in addPore if v not in checkPore]
57 flat_list=[item for sublist in [checkPore,addPore] for item
in sublist]
58 for elem in flat_list:
140
59 gfile1.write(’%s ’%elem)
60 gfile1.write(’\n\n’)
61 clubbedPores2.append(flat_list)
62 print(’Shape of clubbedPores2’,np.shape(clubbedPores2),’\n’)
63 gfile1.close()
64 if sum(countPore)==0:
65 run=False
66 print(’run’,run)
67
68 temp=sys.stdout
69 filenameFinal=’%s_poresClubAll_%s_probeSize.data’%(atomName,
probeRadSize)
70 os.system(’rm %s’%(filenameFinal))
71 gfile=open(filenameFinal,’a’)
72
73 for k in range(len(clubbedPores2)):
74 aPore=clubbedPores2[k]
75 for j in range(len(aPore)):
76 gfile.write(’%s ’%aPore[j])
77 gfile.write(’\n’)
78 gfile.close()
79 os.system(’mv %s_poresClubAll_%s_probeSize.data ’%(atomName,
probeRadSize)+data_directory)
80 return(filesToReturn)
81
82 def main(poresBiggerFile,probeRadSize,atomName,data_directory):
83 filenames=calc(poresBiggerFile,probeRadSize,atomName,data_directory
)
84 return(filenames)
141
1 #### surfacePorefind.py: This code finds outer surface coordinates of
overall pores found by overallPorefind.py ####
2 import numpy as np
3 import os
4 import sys
5 import itertools
6 import math
7
8 def sphere(filewithPores,filewithPoresCoords,probeRadSize,atomName,
data_directory):
9 r=probeRadSize
10 pores=[]
11 fileOverAll=open(data_directory+’/’+filewithPores)
12 for line in fileOverAll:
13 if line.strip()!=’’:
14 pores.append([float(w) for w in line.split()])
15 print(’Shape of OverAll Pores’,np.shape(pores))
16
17 coords=[]
18 fileCoords=open(data_directory+’/’+filewithPoresCoords)
19 for line in fileCoords:
20 coords.append([int(line.split()[0]),float(line.split()[1]),float(
line.split()[2]),float(line.split()[3])])
21 print(’Shape of All Pores Coords’,np.shape(coords))
22
23 temp=sys.stdout
24 filename=filewithPores[:-5]+’_OuterSurface.data’
25 os.system(’rm %s’%(filename))
26 gfile=open(filename,’a’)
27 poreNum=-1
28 coordsOuter=[[] for k in range(len(pores))]
29 #This array will store coordinates of outer sphere of overall pores
142
30
31 for pore in pores:
32 poreNum+=1
33 pore=np.unique(pore)
34 cirCen=[]
35 for num in pore:
36 num=int(num)
37 cirCen.append([coords[num-1][0],coords[num-1][1],coords[num
-1][2],coords[num-1][3]])
38
39 coordsSphere=[[] for m in range(len(cirCen))]
40 index=0
41 for c in cirCen:
42 u,v=np.mgrid[0:2*np.pi:10j,0:np.pi:5j]
43 x=np.cos(u)*np.sin(v)*r
44 y=np.sin(u)*np.sin(v)*r
45 z=np.cos(v)*r
46 x=x+c[1]
47 y=y+c[2]
48 z=z+c[3]
49 for a,b in zip(u,v):
50 for a1,b1 in zip(a,b):
51 x1=np.cos(a1)*np.sin(b1)*r+c[1]
52 y1=np.sin(a1)*np.sin(b1)*r+c[2]
53 z1=np.cos(b1)*r+c[3]
54 coordsSphere[index].append([x1,y1,z1])
55 index+=1
56
57 for j in range(len(coordsSphere)):
58 checkPore=coordsSphere[j]
59 for k in range(len(checkPore)):
60 check=0
143
61 for value in [v for v in pore if v!=pore[j]]:
62 if np.linalg.norm(np.array(coordsSphere[j][k])-cirCen[[x[0]
for x in cirCen].index(int(value))][1:])<r:
63 check+=1
64 break
65 if check==0:
66 coordsOuter[poreNum].append([coordsSphere[j][k][0],
coordsSphere[j][k][1],coordsSphere[j][k][2]])
67
68 gfile.write(’Pore %s\n’%(poreNum+1))
69 for b in coordsOuter[poreNum]:
70 gfile.write(’%s %s %s\n’%(b[0],b[1],b[2]))
71 gfile.write(’\n\n’)
72 gfile.close()
73 return(filename)
74
75 def main(fileOverAllPores,filePoreCoords,probeRadSize,atomName,
data_directory):
76 filenamesOuterSurface=[]
77 for fileOverAll in fileOverAllPores:
78 fileOuterSurface=sphere(fileOverAll,filePoreCoords,probeRadSize,
atomName,data_directory)
79 filenamesOuterSurface.append(fileOuterSurface)
80 return(filenamesOuterSurface)
144
1 #### fitSpherePore.py: This code fits sphere inside surface of pores
found by surfacePorefind.py and calculate their radius ####
2
3 import numpy as np
4 import os
5 import sys
6 import math
7
8 def sphereFit(FileOuterSurface,probeRadSize,data_directory,atomName):
9 poreIndex=0
10 outerSurfaceAll=[]
11 fileOuter=open(data_directory+’/’+FileOuterSurface)
12
13 for line in fileOuter:
14 if line.strip()!=’’:
15 if len(line.split())==2:
16 poreIndex+=1
17 if poreIndex!=1:
18 outerSurfaceAll.append(poreOuterSurface)
19 poreOuterSurface=[]
20 if len(line.split())==3:
21 poreOuterSurface.append([float(line.split()[0]),float(line.
split()[1]),float(line.split()[2])])
22 print(’Shape of outerSurfaceAll’,np.shape(outerSurfaceAll))
23
24 temp=sys.stdout
25 filename=’%s_RadiusInformation_%s_probeSize_%s.data’%(atomName,
probeRadSize,FileOuterSurface[31:32])
26 os.system(’rm %s’%(filename))
27 gfile=open(filename,’a’)
28 #Calculating sphere inside all pores
29 for j in range(len(outerSurfaceAll)):
145
30 spX=np.array([x[0] for x in outerSurfaceAll[j]])
31 spY=np.array([x[1] for x in outerSurfaceAll[j]])
32 spZ=np.array([x[2] for x in outerSurfaceAll[j]])
33 A=np.zeros((len(spX),4))
34 A[:,0]=spX*2
35 A[:,1]=spY*2
36 A[:,2]=spZ*2
37 A[:,3]=1
38 f=np.zeros((len(spX),1))
39 f[:,0]=(spX*spX)+(spY*spY)+(spZ*spZ)
40 C,residules,rank,singval=np.linalg.lstsq(A,f,rcond=None)
41 t = (C[0]*C[0])+(C[1]*C[1])+(C[2]*C[2])+C[3]
42 radius = math.sqrt(t)
43 gfile.write(’Pore %s %s %s %s %s \n’%(j+1,C[0],C[1],C[2],
radius))
44 gfile.close()
45 return(filename)
46
47 def main(poresSurfaceFiles,probeRadSize,data_directory,atomName):
48 filenameRadius=[]
49 for files in poresSurfaceFiles:
50 filesRadius=sphereFit(files,probeRadSize,data_directory,atomName)
51 filenameRadius.append(filesRadius)
52 return(filenameRadius)
146
1 #### makeHistoPore.py: This code creates histogram of the radius of
the spheres generated by fitSpherePore.py ####
2
3 import os
4 import numpy as np
5 import matplotlib.pyplot as plt
6
7 def histo(files,probeRadSize,data_directory,atomName):
8 fileHisto=open(data_directory+’/’+files)
9 radius=[]
10 for line in fileHisto:
11 if len(line.split())==6 and float(line.split()[5])>3:
12 radius.append(2*float(line.split()[5]))
13
14 histoName=’V_Unique_’+atomName+’_histo_’+str(probeRadSize)+’
_probeRadSize_’+files[36:37]+’_Time_woDensity.pdf’
15 plt.figure(figsize=(8,6))
16 plt.hist(radius,bins=np.arange(min(radius),max(radius)+1),facecolor
=’blue’,alpha=0.75,edgecolor=’k’,density=False)
17 plt.xlabel(r’Diameter ($\AA$)’,{’color’:’k’,’fontsize’:20})
18 plt.ylabel(’Count’,{’color’:’k’,’fontsize’:20})
19 plt.xticks(color=’k’,fontsize=20)
20 plt.yticks(color=’k’,fontsize=20)
21 plt.gca().yaxis.grid(True)
22
23 histoName=’V_Unique_’+atomName+’_histo_’+str(probeRadSize)+’
_probeRadSize_’+files[36:37]+’_Time_wDensity.pdf’
24 plt.figure(figsize=(8,6))
25 plt.hist(radius,bins=np.arange(min(radius),max(radius)+1),facecolor
=’blue’,alpha=0.75,edgecolor=’k’,density=True)
26 plt.xlabel(r’Diameter ($\AA$)’,{’color’:’k’,’fontsize’:20})
27 plt.ylabel(’Probability’,{’color’:’k’,’fontsize’:20})
147
28 plt.xticks(color=’k’,fontsize=20)
29 plt.yticks(color=’k’,fontsize=20)
30 plt.gca().yaxis.grid(True)
31 plt.show()
32
33 def main(poresRadiusFiles,probeRadSize,data_directory,atomName):
34 radiusOverall=[[] for k in range(len(poresRadiusFiles))]
35 for files in poresRadiusFiles:
36 histo(files,probeRadSize,data_directory,atomName)
148
1 #### makeSphereHisto.py: This code creates spheres using radius from
the histogram at their actual location in the simulation box ####
2
3 import os
4 import numpy as np
5 import matplotlib.pyplot as plt
6 from mpl_toolkits.mplot3d import Axes3D
7
8 def sphere(files,probeRadSize,data_directory,atomName):
9 fileSphere=open(data_directory+’/’+files)
10 cir=[]
11 r=[]
12 for line in fileSphere:
13 if len(line.split())==6:
14 cir.append([float(line.split()[2][1:-1]),float(line.split()
[3][1:-1]),float(line.split()[4][1:-1])])
15 r.append(float(line.split()[5]))
16
17 sphereName=’V_Unique_’+atomName+’_sphere_’+str(probeRadSize)+’
_probeRadSize_’+files[36:37]+’_thTime.pdf’
18 coords=[[] for k in range(len(cir))]
19
20 fig=plt.figure(figsize=(16,10))
21 ax=fig.add_subplot(111,projection=’3d’)
22 ax.set_xlabel(r’x ($\AA$)’,fontsize=16)
23 ax.set_ylabel(r’y ($\AA$)’,fontsize=16)
24 zlabel=ax.set_zlabel(r’z ($\AA$)’,fontsize=16)
25
26 rmax=max(r)
27 for i in range(len(cir)):
28 constant=max(r[i]/rmax,0.5)
29 u,v=np.mgrid[0:2*np.pi:constant*20j,0:np.pi:constant*10j]
149
30 x=np.cos(u)*np.sin(v)*r[i]
31 y=np.sin(u)*np.sin(v)*r[i]
32 z=np.cos(v)*r[i]
33 x=x+cir[i][0]
34 y=y+cir[i][1]
35 z=z+cir[i][2]
36 ax.plot_wireframe(x,y,z,color=’r’)
37
38 ax.view_init(23,70)
39 plt.show()
40 def main(poresRadiusFiles,probeRadSize,data_directory,atomName):
41 for files in poresRadiusFiles:
42 sphere(files,probeRadSize,data_directory,atomName)
150
BIBLIOGRAPHY
[1] Ying Diao, Kristina M. Lenn, Wen Ya Lee, Martin A. Blood-Forsythe,
Jie Xu, Yisha Mao, Yeongin Kim, Julia A. Reinspach, Steve Park, Alán
Aspuru-Guzik, Gi Xue, Paulette Clancy, Zhenan Bao, and Stefan C B
Mannsfeld. Understanding polymorphism in organic semiconductor thin
films through nanoconfinement. Journal of the American Chemical Society,
136(49):17046–17057, 2014.
[2] Hyunjoong Chung and Ying Diao. Polymorphism as an emerging de-
sign strategy for high performance organic electronics. J. Mater. Chem. C,
4(18):3915–3933, 2016.
[3] Hyunjoong Chung, Shanwen Chen, Nikita Sengar, Daniel W Davies, Guil-
laume Garbay, Yves H Geerts, Paulette Clancy, and Ying Diao. Single
Atom Substitution Alters the Polymorphic Transition Mechanism in Or-
ganic Electronic Crystals. Chemistry of Materials, 31(21):9115–9126, 11 2019.
[4] Donald R Jones, Matthias Schonlau, and William J Welch. Efficient Global
Optimization of Expensive Black-Box Functions. Journal of Global Opti-
mization, 13(4):455–492, 1998.
[5] Yaset Miguel Acevedo, Nikita Sengar, Meng Min, and Paulette Clancy.
A Transferable Molecular Model of Woven Covalent Organic Framework
Materials. ACS Applied Materials & Interfaces, 9 2020.
[6] J M Haile. Molecular dynamics simulation : elementary methods, 1992.
[7] John E Anthony. Organic electronics: Addressing challenges. Nature Ma-
terials, 13(8):773–775, 8 2014.
[8] V Subramanian, J M J Frechet, P C Chang, D C Huang, J B Lee, S E Molesa,
A R Murphy, D R Redinger, and S K Volkman. Progress Toward Devel-
opment of All-Printed RFID Tags: Materials, Processes, and Devices. Pro-
ceedings of the IEEE, 93(7):1330–1338, 2005.
[9] Embracing the organics world. Nature Materials, 12(7):591, 2013.
[10] Gerwin H Gelinck, H Edzer A Huitema, Erik van Veenendaal, Eugenio
Cantatore, Laurens Schrijnemakers, Jan B P H van der Putten, Tom C T
Geuns, Monique Beenhakkers, Jacobus B Giesbers, Bart-Hendrik Huis-
man, Eduard J Meijer, Estrella Mena Benito, Fred J Touwslager, Albert W
151
Marsman, Bas J E van Rens, and Dago M de Leeuw. Flexible active-matrix
displays and shift registers based on solution-processed organic transis-
tors. Nature Materials, 3(2):106–110, 2004.
[11] John A Rogers and Zhenan Bao. Printed plastic electronics and paperlike
displays. Journal of Polymer Science Part A: Polymer Chemistry, 40(20):3327–
3334, 2002.
[12] George Malliaras and Richard Friend. An Organic Electronics Primer.
Physics Today, 58(5):53–58, 2005.
[13] Scott Himmelberger and Alberto Salleo. Engineering semiconduct-
ing polymers for efficient charge transport. MRS Communications,
5(3):383–395, 2015.
[14] M Pope and C Swenberg. Electronic Processes in Organic Crystals and
Polymers. 1999.
[15] Chengliang Wang, Huanli Dong, Wenping Hu, Yunqi Liu, and Daoben
Zhu. Semiconducting π-conjugated systems in field-effect transistors: a
material odyssey of organic electronics. Chemical reviews, 112(4):2208–
2267, 4 2012.
[16] Zhenan Bao, Yi Feng, Ananth Dodabalapur, V R Raju, and Andrew J
Lovinger. High-Performance Plastic Transistors Fabricated by Printing
Techniques. Chemistry of Materials, 9(6):1299–1301, 6 1997.
[17] Hagen Klauk. Organic Electronics: Materials, Manufacturing and Applica-
tions. 2006.
[18] Henning Sirringhaus. 25th Anniversary Article: Organic Field-Effect
Transistors: The Path Beyond Amorphous Silicon. Advanced Materials,
26(9):1319–1335, 2014.
[19] Terence L Threlfall. Analysis of organic polymorphs. A review. Analyst,
120(10):2435–2460, 1995.
[20] Walter C McCrone. Polymorphism. Physics and chemistry of the organic
solid state, 2:725–767, 1965.
[21] Masahiko Ando, Tom B Kehoe, Makoto Yoneya, Hiroyuki Ishii, Masahiro
Kawasaki, Claudia M Duffy, Takashi Minakata, Richard T Phillips, and
152
Henning Sirringhaus. Evidence for charge-trapping inducing polymor-
phic structural-phase transition in pentacene. Advanced materials (Deerfield
Beach, Fla.), 27(1):122—129, 1 2015.
[22] Stefano Bergantin, Massimo Moret, Gernot Buth, and Francesca P A Fab-
biani. Pressure-Induced Conformational Change in Organic Semiconduc-
tors: Triggering a Reversible Phase Transition in Rubrene. The Journal of
Physical Chemistry C, 118(25):13476–13483, 6 2014.
[23] Sheng Bi, Zhengran He, Jihua Chen, and Dawen Li. Solution-grown
small-molecule organic semiconductor with enhanced crystal alignment
and areal coverage for organic thin film transistors. AIP Advances,
5(7):77170, 2015.
[24] Linus Pithan, Caterina Cocchi, Hannes Zschiesche, Christopher Weber,
Anton Zykov, Sebastian Bommel, Steven J Leake, Peter Schäfer, Claudia
Draxl, and Stefan Kowarik. Light Controls Polymorphism in Thin Films
of Sexithiophene. Crystal Growth & Design, 15(3):1319–1324, 3 2015.
[25] Yongbo Yuan, Gaurav Giri, Alexander L Ayzner, Arjan P Zoombelt, Stefan
C B Mannsfeld, Jihua Chen, Dennis Nordlund, Michael F Toney, Jinsong
Huang, and Zhenan Bao. Ultra-high mobility transparent organic thin
film transistors grown by an off-centre spin-coating method. Nature com-
munications, 5:3005, 2014.
[26] Anna M Hiszpanski, Robin M Baur, Bumjung Kim, Noah J Tremblay,
Colin Nuckolls, Arthur R Woll, and Yueh-Lin Loo. Tuning Polymorphism
and Orientation in Organic Semiconductor Thin Films via Post-deposition
Processing. Journal of the American Chemical Society, 136(44):15749–15756,
11 2014.
[27] Oana D Jurchescu, Devin A Mourey, Sankar Subramanian, Sean R Parkin,
Brandon M Vogel, John E Anthony, Thomas N Jackson, and David J Gund-
lach. Effects of polymorphism on charge transport in organic semiconduc-
tors. Phys. Rev. B, 80(8):85201, 8 2009.
[28] Joel Bernstein. Crystal growth, polymorphism and structure-property
relationships in organic crystals. Journal of Physics D: Applied Physics,
26(8B):B66, 1993.
[29] Bernhard Wedl, Roland Resel, Günther Leising, Birgit Kunert, Ingo Salz-
mann, Martin Oehzelt, Norbert Koch, Antje Vollmer, Steffen Duhm,
153
Oliver Werzer, Gabin Gbabode, Michele Sferrazza, and Yves Geerts. Crys-
tallisation kinetics in thin films of dihexyl-terthiophene: the appearance
of polymorphic phases. RSC Adv., 2(10):4404–4414, 2012.
[30] Basab Chattopadhyay, Christian Ruzié, Roland Resel, and Yves
Henri Geerts. Substrate-induced phases: transition from a liquid-
crystalline to a plastic crystalline phase via nucleation initiated by the
substrate. Liquid Crystals, 41(3):302–309, 3 2014.
[31] Benedikt Schrode, Andrew O F Jones, Roland Resel, Natalia Bedoya,
Robert Schennach, Yves Henri Geerts, Christian Ruzié, Michele Sfer-
razza, Aldo Brillante, Tommaso Salzillo, and Elisabetta Venuti. Substrate-
Induced Phase of a Benzothiophene Derivative Detected by Mid-Infrared
and Lattice Phonon Raman Spectroscopy. Chemphyschem : a European jour-
nal of chemical physics and physical chemistry, 19(8):993–1000, 4 2018.
[32] Hiromi Minemawari, Toshikazu Yamada, Hiroyuki Matsui, Jun Ya Tsut-
sumi, Simon Haas, Ryosuke Chiba, Reiji Kumai, and Tatsuo Hasegawa.
Inkjet printing of single-crystal films. Nature, 475(7356):364–367, 2011.
[33] Hideaki Ebata, Takafumi Izawa, Eigo Miyazaki, Kazuo Takimiya,
Masaaki Ikeda, Hirokazu Kuwabara, and Tatsuto Yui. Highly Solu-
ble [1]Benzothieno[3,2-b]benzothiophene (BTBT) Derivatives for High-
Performance, Solution-Processed Organic Field-Effect Transistors. Journal
of the American Chemical Society, 129(51):15732–15733, 12 2007.
[34] Yanfeng Yue, Yunchao Li, Zhonghe Bi, Gabriel M Veith, Craig A Bridges,
Bingkun Guo, Jihua Chen, David R Mullins, Sumedh P Surwade, Shan-
non M Mahurin, Hongjun Liu, M Parans Paranthaman, and Sheng Dai.
A POM–organic framework anode for Li-ion battery. J. Mater. Chem. A,
3(45):22989–22995, 2015.
[35] Yan Zhang, Shihui Yang, Xiaoya Chang, Huinan Guo, Yunwei Li, Mengy-
ing Wang, Weinqin Li, Lifang Jiao, and Yijing Wang. MOF based on a
longer linear ligand: electrochemical performance{,} reaction kinetics{,}
and use as a novel anode material for sodium-ion batteries. Chem. Com-
mun., 54(83):11793–11796, 2018.
[36] Hiroyasu Furukawa, Kyle E Cordova, Michael O’Keeffe, and Omar M
Yaghi. The Chemistry and Applications of Metal-Organic Frameworks.
Science, 341(6149), 2013.
[37] Weigang Lu, Zhangwen Wei, Zhi-Yuan Gu, Tian-Fu Liu, Jinhee Park, Jihye
154
Park, Jian Tian, Muwei Zhang, Qiang Zhang, Thomas Gentle III, Math-
ieu Bosch, and Hong-Cai Zhou. Tuning the structure and function of
metal–organic frameworks via linker design. Chem. Soc. Rev., 43(16):5561–
5593, 2014.
[38] Beatriz Seoane, Sonia Castellanos, Alla Dikhtiarenko, Freek Kapteijn, and
Jorge Gascon. Multi-scale crystal engineering of metal organic frame-
works. Coordination Chemistry Reviews, 307:147–187, 2016.
[39] Volodymyr Bon, Negar Kavoosi, Irena Senkovska, Philipp Müller, Jana
Schaber, Dirk Wallacher, Daniel M Többens, Uwe Mueller, and Stefan
Kaskel. Tuning the flexibility in MOFs by SBU functionalization. Dalton
Trans., 45(10):4407–4415, 2016.
[40] Arijit Halder and Debajyoti Ghoshal. Structure and properties of dy-
namic metal–organic frameworks: a brief accounts of crystalline-to-
crystalline and crystalline-to-amorphous transformations. CrystEng-
Comm, 20(10):1322–1345, 2018.
[41] Tse Nga Ng, David E. Schwartz, Leah L. Lavery, Gregory L. Whiting,
Beverly Russo, Brent Krusor, Janos Veres, Per Broms, Lars Herlogsson,
Naveed Alam, Olle Hagel, Jakob Nilsson, and Christer Karlsson. Scal-
able printed electronics: An organic decoder addressing ferroelectric non-
volatile memory. Scientific Reports, 2:1–7, 2012.
[42] Tsuyoshi Sekitani, Koichiro Zaitsu, Yoshiaki Noguchi, Kiyoshiro Ishibe,
Makoto Takamiya, Takayasu Sakurai, and Takao Someya. Printed non-
volatile memory for a sheet-type communication system. IEEE Transac-
tions on Electron Devices, 56(5):1027–1035, 2009.
[43] Hsin-Rong Tseng and Chun-Liang Lin. Organic photovoltaic cell.
183(1986):1–19, 2011.
[44] Guillaume Schweicher, Yoann Olivier, Vincent Lemaur, and Yves Henri.
What Currently Limits Charge Carrier Mobility in Crystals of Molecular
Semiconductors ? pages 595–620, 2014.
[45] Jean Pierre Brog, Claire Lise Chanez, Aurelien Crochet, and Katharina M.
Fromm. Polymorphism, what it is and how to identify it: A systematic
review. RSC Advances, 3(38):16905–16931, 2013.
[46] Gregg T. Beckham, Baron Peters, Cı́ndy Starbuck, Narayan Varı́ankaval,
and Bernhardt L. Trout. Surface-mediated nucleation in the solid-state
155
polymorph transformation of terephthalic acid. Journal of the American
Chemical Society, 129(15):4714–4723, 2007.
[47] Anthony M Campeta, Brian P Chekal, Yuriy A Abramov, Paul A Meenan,
Mark J Henson, Bing Shi, Robert A Singer, and Keith R Horspool. De-
velopment of a Targeted Polymorph Screening Approach for a Complex
Polymorphic and Highly Solvating API. Journal of Pharmaceutical Sciences,
99(9):3874–3886, 2010.
[48] Ranko M Vrcelj, John N Sherwood, Alan R Kennedy, Hugh G Gallagher,
and Thomas Gelbrich. Polymorphism in 2-4-6 Trinitrotoluene 2003. pages
2–7, 2003.
[49] Guillaume Schweicher, Vincent Lemaur, Claude Niebel, Christian Ruzié,
Ying Diao, Osamu Goto, Wen Ya Lee, Yeongin Kim, Jean Baptiste Ar-
lin, Jolanta Karpinska, Alan R. Kennedy, Sean R. Parkin, Yoann Olivier,
Stefan C.B. Mannsfeld, Jérôme Cornil, Yves H. Geerts, and Zhenan
Bao. Bulky end-capped [1]Benzothieno[3,2-b]benzothiophenes: Reach-
ing high-mobility organic semiconductors by fine tuning of the crystalline
solid-state Order. Advanced Materials, 27(19):3066–3072, 2015.
[50] Hyunjoong Chung, Dmytro Dudenko, Fengjiao Zhang, Gabriele D’Avino,
Christian Ruzié, Audrey Richard, Guillaume Schweicher, Jérôme Cornil,
David Beljonne, Yves Geerts, and Ying Diao. Rotator side chains trigger
cooperative transition for shape and function memory effect in organic
semiconductors. Nature Communications, 9(1):1–12, 2018.
[51] Ze Fan Yao, Jie Yu Wang, and Jian Pei. Control of π-π Stacking via Crys-
tal Engineering in Organic Conjugated Small Molecule Crystals. Crystal
Growth and Design, 18(1):7–15, 2018.
[52] Chris J. Pickard and R. J. Needs. Ab initio random structure searching.
Journal of Physics Condensed Matter, 23(5), 2011.
[53] Chris J. Pickard and R. J. Needs. High-pressure phases of silane. Physical
Review Letters, 97(4):1–4, 2006.
[54] Matthew Habgood, Isaac J. Sugden, Andrei V. Kazantsev, Claire S. Ad-
jiman, and Constantinos C. Pantelides. Efficient handling of molecular
flexibility in ab initio generation of crystal structures. Journal of Chemical
Theory and Computation, 11(4):1957–1969, 2015.
156
[55] Isaac Sugden, Claire S. Adjiman, and Constantinos C. Pantelides. Accu-
rate and efficient representation of intramolecular energy in ab initio gen-
eration of crystal structures. I. Adaptive local approximate models. Acta
Crystallographica Section B: Structural Science, Crystal Engineering and Mate-
rials, 72(6):864–874, 2016.
[56] Kirkpatrick S, Gelatt C, and Vecchi M. Optimization by Simulated An-
nealing. Science, 220(4598):671–680, 1983.
[57] John Kendrick, Frank J J Leusen, Marcus A. Neumann, and Jacco Van
De Streek. Progress in crystal structure prediction. Chemistry - A European
Journal, 17(38):10736–10744, 2011.
[58] David J. Wales and Jonathan P.K. Doye. Global optimization by basin-
hopping and the lowest energy structures of Lennard-Jones clusters con-
taining up to 110 atoms. Journal of Physical Chemistry A, 101(28):5111–5116,
1997.
[59] Maximilian Amsler and Stefan Goedecker. Crystal structure prediction
using the minima hopping method. Journal of Chemical Physics, 133(22),
2010.
[60] Stefan Goedecker. Minima hopping: An efficient search method for the
global minimum of the potential energy surface of complex molecular
systems. Journal of Chemical Physics, 120(21):9911–9917, 2004.
[61] Artem R. Oganov and Colin W. Glass. Crystal structure prediction using
ab initio evolutionary techniques: Principles and applications. Journal of
Chemical Physics, 124(24), 2006.
[62] Artem R. Oganov, Andriy O. Lyakhov, and Mario Valle. How evolution-
ary crystal structure prediction works-and why. Accounts of Chemical Re-
search, 44(3):227–237, 2011.
[63] T Lookman, P V Balachandran, D Xue, G Pilania, T Shearman, J Theiler,
J E Gubernatis, J Hogden, K Barros, E Bennaim, and F J Alexander. Infor-
mation Science for Materials Discovery and Design, volume 225. 2016.
[64] Prasanna V. Balachandran, Dezhen Xue, James Theiler, John Hogden, and
Turab Lookman. Adaptive Strategies for Materials Design using Uncer-
tainties. Scientific Reports, 6(1):19660, 2016.
157
[65] Shenghong Ju, Takuma Shiga, Lei Feng, Zhufeng Hou, Koji Tsuda, and
Junichiro Shiomi. Designing nanostructures for phonon transport via
Bayesian optimization. Physical Review X, 7(2):1–10, 2017.
[66] Rebecca K Lindsey, Laurence E Fried, and Nir Goldman. ChIMES: A Force
Matched Potential with Explicit Three-Body Interactions for Molten Car-
bon. Journal of Chemical Theory and Computation, 13(12):6222–6229, 12 2017.
[67] Jonathan Vandermause, Steven B Torrisi, Simon Batzner, Yu Xie, Lixin
Sun, Alexie M Kolpak, and Boris Kozinsky. On-the-fly active learning
of interpretable Bayesian force fields for atomistic rare events. npj Compu-
tational Materials, 6(1):20, 2020.
[68] Anton O. Oliynyk, Lawrence A. Adutwum, Brent W. Rudyk, Harshil
Pisavadia, Sogol Lotfi, Viktor Hlukhyy, James J Harynuk, Arthur Mar,
and Jakoah Brgoch. Disentangling Structural Confusion through Machine
Learning: Structure Prediction and Polymorphism of Equiatomic Ternary
Phases ABC. Journal of the American Chemical Society, page jacs.7b08460,
2017.
[69] T Mueller, A Kusne, and R Ramprasad. Science : Recent Progress. Reviews
in Computational Chemistry, 29(i), 2016.
[70] Tomoki Yamashita, Nobuya Sato, Hiori Kino, Takashi Miyake, Koji Tsuda,
and Tamio Oguchi. PHYSICAL REVIEW MATERIALS 2 , 013803 (
2018 ) Crystal structure prediction accelerated by Bayesian optimization.
013803:1–6, 2018.
[71] William L Jorgensen, David S Maxwell, and Julian Tirado-Rives. De-
velopment and Testing of the OLPS All-Atom Force Field on Conforma-
tional Energetics and Properties of Organic Liquids. J. Am. Chem. Soc.,
118(15):11225–11236, 1996.
[72] Stephen L Mayo, Barry D Olafson, and William A Goddard. DREIDING: a
generic force field for molecular simulations. The Journal of Physical Chem-
istry, 94(26):8897–8909, 12 1990.
[73] A D MacKerell, D Bashford, M Bellott, R L Dunbrack, J D Evanseck, M J
Field, S Fischer, J Gao, H Guo, S Ha, D Joseph-McCarthy, L Kuchnir,
K Kuczera, F T K Lau, C Mattos, S Michnick, T Ngo, D T Nguyen, B Prod-
hom, W E Reiher, B Roux, M Schlenkrich, J C Smith, R Stote, J Straub,
M Watanabe, J Wiórkiewicz-Kuczera, D Yin, and M Karplus. All-Atom
158
Empirical Potential for Molecular Modeling and Dynamics Studies of Pro-
teins. The Journal of Physical Chemistry B, 102(18):3586–3616, 4 1998.
[74] Donald R Jones. A Taxonomy of Global Optimization Methods Based on
Response Surfaces. Journal of Global Optimization, 21(4):345–383, 2001.
[75] Simon Streltsov and Pirooz Vakili. A Non-myopic Utility Function for
Statistical Global Optimization Algorithms. Journal of Global Optimization,
14(3):283–298, 1999.
[76] Norman L Allinger, Young H Yuh, and Jenn Huei Lii. Molecular me-
chanics. The MM3 force field for hydrocarbons. 1. Journal of the American
Chemical Society, 111(23):8551–8566, 11 1989.
[77] TINKER Molecular Modeling, 2015.
[78] S Plimpton. Fast Parallel Algorithms for Short – Range Molecular Dynam-
ics, 1995.
[79] Peter I. Frazier and Jialei Wang. Bayesian optimization for materials de-
sign. pages 1–21, 2015.
[80] Jonas Mockus. Application of Bayesian approach to numerical methods of
global and stochastic optimization. Journal of Global Optimization, 4(4):347–
365, 6 1994.
[81] Frank Hutter, Holger H Hoos, and Kevin Leyton-brown. Sequential
Model-Based Optimization for General Algorithm Configuration (ex-
tended version). Learning and Intelligent Optimization (LION), 2011.
[82] Stuart P Lloyd. Least squares quantization in pcm. IEEE Transactions on
Information Theory, 28:129–137, 1982.
[83] J Macqueen. Some methods for classification and analysis of multivariate
observations. In In 5-th Berkeley Symposium on Mathematical Statistics and
Probability, pages 281–297, 1967.
[84] Y Linde, A Buzo, and R Gray. An Algorithm for Vector Quantizer Design.
IEEE Trans. Commun., 28:84–95, 1980.
[85] Peter J Rousseeuw. Silhouettes: A graphical aid to the interpretation and
159
validation of cluster analysis. Journal of Computational and Applied Mathe-
matics, 20:53–65, 1987.
[86] L J P van der Maaten and G E Hinton. Visualizing High-Dimensional Data
Using t-SNE. 2008.
[87] S Kullback and R A Leibler. On information and sufficiency. Ann. Math.
Stat., 22:79–86, 1951.
[88] Ryan N. Gutenkunst, Joshua J. Waterfall, Fergal P. Casey, Kevin S. Brown,
Christopher R. Myers, and James P. Sethna. Universally sloppy param-
eter sensitivities in systems biology models. PLoS Computational Biology,
3(10):1871–1878, 2007.
[89] Frank H. Herbstein. How precise are measurements of unit-cell dimen-
sions from single crystals? Acta Crystallographica Section B Structural Sci-
ence, 56(4):547–557, 2000.
[90] Denis Gentili, Massimo Gazzano, Manuela Melucci, Derek Jones, and
Massimiliano Cavallini. Polymorphism as an additional functionality of
materials for technological applications at surfaces and interfaces. Chem.
Soc. Rev., 48(9):2502–2517, 2019.
[91] Jihua Chen, Ming Shao, Kai Xiao, Adam J Rondinone, Yueh-Lin Loo, Paul
R C Kent, Bobby G Sumpter, Dawen Li, Jong K Keum, Peter J Diemer,
John E Anthony, Oana D Jurchescu, and Jingsong Huang. Solvent-type-
dependent polymorphism and charge transport in a long fused-ring or-
ganic semiconductor. Nanoscale, 6(1):449–456, 2014.
[92] Yuri Mnyukh. Mechanism and Kinetics of Phase Transitions and Other
Reactions in Solids. American Journal of Condensed Matter Physics, 3:89–
103, 1 2013.
[93] Miguel A Garcia-Garibay. Molecular crystals on the move: from single-
crystal-to-single-crystal photoreactions to molecular machinery. Ange-
wandte Chemie (International ed. in English), 46(47):8945–8947, 2007.
[94] Qi Jiang, Alexander G Shtukenberg, Michael D Ward, and Chunhua Hu.
Non-Topotactic Phase Transformations in Single Crystals of β-Glycine.
Crystal Growth & Design, 15(6):2568–2573, 6 2015.
[95] Durga Prasad Karothu, James Weston, Israel Tilahun Desta, and Panče
160
Naumov. Shape-Memory and Self-Healing Effects in Mechanosalient
Molecular Crystals. Journal of the American Chemical Society, 138(40):13298–
13306, 10 2016.
[96] Manas K Panda, Tomče Runčevski, Subash Chandra Sahoo, Alexei A Be-
lik, Naba K Nath, Robert E Dinnebier, and Panče Naumov. Colossal pos-
itive and negative thermal expansion and thermosalient effect in a penta-
morphic organometallic martensite. Nature communications, 5:4811, 9 2014.
[97] Satoshi Takamizawa and Yasuhiro Miyamoto. Superelastic Organic Crys-
tals. Angewandte Chemie International Edition, 53(27):6970–6973, 2014.
[98] P Hohenberg and W Kohn. Inhomogeneous Electron Gas. Phys. Rev.,
136(3B):B864–B871, 11 1964.
[99] Paolo Giannozzi, Stefano Baroni, Nicola Bonini, Matteo Calandra,
Roberto Car, Carlo Cavazzoni, Davide Ceresoli, Guido L Chiarotti, Mat-
teo Cococcioni, Ismaila Dabo, Andrea Dal Corso, Stefano de Gironcoli,
Stefano Fabris, Guido Fratesi, Ralph Gebauer, Uwe Gerstmann, Christos
Gougoussis, Anton Kokalj, Michele Lazzeri, Layla Martin-Samos, Nicola
Marzari, Francesco Mauri, Riccardo Mazzarello, Stefano Paolini, Alfredo
Pasquarello, Lorenzo Paulatto, Carlo Sbraccia, Sandro Scandolo, Gabriele
Sclauzero, Ari P Seitsonen, Alexander Smogunov, Paolo Umari, and Re-
nata M Wentzcovitch. QUANTUM ESPRESSO: a modular and open-
source software project for quantum simulations of materials. Journal of
Physics: Condensed Matter, 21(39):395502, 9 2009.
[100] Hailian Li, Mohamed Eddaoudi, M O’Keeffe, and O M Yaghi. Design
and synthesis of an exceptionally stable and highly porous metal-organic
framework. Nature, 402(6759):276–279, 1999.
[101] Hideki Hayashi, Adrien P Côté, Hiroyasu Furukawa, Michael O’Keeffe,
and Omar M Yaghi. Zeolite A imidazolate frameworks. Nature Materials,
6(7):501–506, 2007.
[102] Bert M Weckhuysen and Jihong Yu. Recent advances in zeolite chemistry
and catalysis., 10 2015.
[103] Tian Tian, Zhixin Zeng, Diana Vulpe, Mirian E. Casco, Giorgio Divitini,
Paul A. Midgley, Joaquin Silvestre-Albero, Jin Chong Tan, Peyman Z.
Moghadam, and David Fairen-Jimenez. A sol-gel monolithic metal-
organic framework with enhanced methane uptake. Nature Materials,
17(2):174–179, 2018.
161
[104] Diego A Gómez-Gualdrón, Yamil J Colón, Xu Zhang, Timothy C
Wang, Yu-Sheng Chen, Joseph T Hupp, Taner Yildirim, Omar K Farha,
Jian Zhang, and Randall Q Snurr. Evaluating topologically diverse
metal–organic frameworks for cryo-adsorbed hydrogen storage. Energy
Environ. Sci., 9(10):3279–3289, 2016.
[105] Tania Rodenas, Ignacio Luz, Gonzalo Prieto, Beatriz Seoane, Hozanna
Miro, Avelino Corma, Freek Kapteijn, Francesc X Llabrés I Xamena, and
Jorge Gascon. Metal-organic framework nanosheets in polymer compos-
ite materials for gas separation., 1 2015.
[106] Thomas M McDonald, Jarad A Mason, Xueqian Kong, Eric D Bloch, David
Gygi, Alessandro Dani, Valentina Crocellà, Filippo Giordanino, Samuel O
Odoh, Walter S Drisdell, Bess Vlaisavljevich, Allison L Dzubak, Roberta
Poloni, Sondre K Schnell, Nora Planas, Kyuho Lee, Tod Pascal, Liwen F
Wan, David Prendergast, Jeffrey B Neaton, Berend Smit, Jeffrey B Kor-
tright, Laura Gagliardi, Silvia Bordiga, Jeffrey A Reimer, and Jeffrey R
Long. Cooperative insertion of CO2 in diamine-appended metal-organic
frameworks. Nature, 519(7543):303–308, 3 2015.
[107] Peyman Z Moghadam, Joshua F Ivy, Ravi K Arvapally, Antonio M
Dos Santos, John C Pearson, Li Zhang, Emmanouil Tylianakis, Pritha
Ghosh, Iain W H Oswald, Ushasree Kaipa, Xiaoping Wang, Angela K
Wilson, Randall Q Snurr, and Mohammad A Omary. Adsorption and
molecular siting of CO(2), water, and other gases in the superhydropho-
bic, flexible pores of FMOF-1 from experiment and simulation. Chemical
science, 8(5):3989–4000, 5 2017.
[108] Jian-Rong Li, Ryan J Kuppler, and Hong-Cai Zhou. Selective gas adsorp-
tion and separation in metal-organic frameworks. Chemical Society reviews,
38(5):1477–1504, 5 2009.
[109] Jiewei Liu, Lianfen Chen, Hao Cui, Jianyong Zhang, Li Zhang, and
Cheng-Yong Su. Applications of metal-organic frameworks in hetero-
geneous supramolecular catalysis. Chemical Society reviews, 43(16):6011–
6061, 8 2014.
[110] Joseph E Mondloch, Michael J Katz, William C Isley III, Pritha Ghosh,
Peilin Liao, Wojciech Bury, George W Wagner, Morgan G Hall, Jared B
DeCoste, Gregory W Peterson, Randall Q Snurr, Christopher J Cramer,
Joseph T Hupp, and Omar K Farha. Destruction of chemical warfare
agents using metal–organic frameworks. Nature Materials, 14:512, 3 2015.
162
[111] Sophie E Miller, Michelle H Teplensky, Peyman Z Moghadam, and
David Fairen-Jimenez. Metal-organic frameworks as biosensors for
luminescence-based detection and imaging. Interface focus, 6(4):20160027,
8 2016.
[112] Lauren E Kreno, Kirsty Leong, Omar K Farha, Mark Allendorf, Richard P
Van Duyne, and Joseph T Hupp. Metal-organic framework materials as
chemical sensors. Chemical reviews, 112(2):1105–1125, 2 2012.
[113] Jared B DeCoste and Gregory W Peterson. Metal-organic frameworks for
air purification of toxic chemicals. Chemical reviews, 114(11):5695–5727, 6
2014.
[114] N. Scott Bobbitt, Matthew L. Mendonca, Ashlee J. Howarth, Timur Is-
lamoglu, Joseph T. Hupp, Omar K. Farha, and Randall Q. Snurr. Metal-
organic frameworks for the removal of toxic industrial chemicals and
chemical warfare agents. Chemical Society Reviews, 46(11):3357–3385, 2017.
[115] Bethany M Connolly, David G Madden, Andrew E H Wheatley,
and David Fairen-Jimenez. Shaping the Future of Fuel: Monolithic
Metal–Organic Frameworks for High-Density Gas Storage. Journal of the
American Chemical Society, 142(19):8541–8549, 5 2020.
[116] ARPA-E. DE-FOA-0000672: Methane opportunities for vehicular energy.
[117] Qi-Long Zhu and Qiang Xu. Metal–organic framework composites. Chem.
Soc. Rev., 43(16):5468–5512, 2014.
[118] Hai-Long Jiang, Bo Liu, Ya-Qian Lan, Kentaro Kuratani, Tomoki Akita,
Hiroshi Shioyama, Fengqi Zong, and Qiang Xu. From metal-organic
framework to nanoporous carbon: toward a very high surface area and
hydrogen uptake. Journal of the American Chemical Society, 133(31):11854–
11857, 8 2011.
[119] Chunjing Lu, Teng Ben, Shixian Xu, and Shilun Qiu. Electrochem-
ical Synthesis of a Microporous Conductive Polymer Based on a
Metal–Organic Framework Thin Film. Angewandte Chemie International
Edition, 53(25):6454–6458, 2014.
[120] Anthony Shoji Hall, Atsushi Kondo, Kazuyuki Maeda, and Thomas E
Mallouk. Microporous Brookite-Phase Titania Made by Replication of
a Metal–Organic Framework. Journal of the American Chemical Society,
135(44):16276–16279, 11 2013.
163
[121] Jack D Evans, Christopher J Sumby, and Christian J Doonan. Post-
synthetic metalation of metal–organic frameworks. Chem. Soc. Rev.,
43(16):5933–5951, 2014.
[122] Yuanbiao Huang, Zujin Lin, and Rong Cao. Palladium Nanoparticles
Encapsulated in a Metal–Organic Framework as Efficient Heterogeneous
Catalysts for Direct C2 Arylation of Indoles. Chemistry – A European Jour-
nal, 17(45):12706–12712, 2011.
[123] Xueshen Wang, Qunqing Li, Jing Xie, Zhong Jin, Jinyong Wang, Yan
Li, Kaili Jiang, and Shoushan Fan. Fabrication of Ultralong and Electri-
cally Uniform Single-Walled Carbon Nanotubes on Clean Substrates 2009.
pages 1–5, 2009.
[124] Adrien P Côté, Annabelle I Benin, Nathan W Ockwig, Michael
O&#039;Keeffe, Adam J Matzger, and Omar M Yaghi. Porous, Crystalline,
Covalent Organic Frameworks. Science, 310(5751):1166 LP – 1170, 11 2005.
[125] Enquan Jin, Mizue Asada, Qing Xu, Sasanka Dalapati, Matthew A
Addicoat, Michael A Brady, Hong Xu, Toshikazu Nakamura, Thomas
Heine, Qiuhong Chen, and Donglin Jiang. Two-dimensional sp2 car-
bon{\textendash}conjugated covalent organic frameworks. Science,
357(6352):673–676, 2017.
[126] Eric L Spitler and William R Dichtel. Lewis acid-catalysed formation of
two-dimensional phthalocyanine covalent organic frameworks. Nature
Chemistry, 2(8):672–677, 2010.
[127] Yanpei Song, Qi Sun, Briana Aguila, and Shengqian Ma. Opportunities
of Covalent Organic Frameworks for Advanced Applications. Advanced
Science, 6(2):1801410, 2019.
[128] Hani M El-Kaderi, Joseph R Hunt, José L Mendoza-Cortés, Adrien P
Côté, Robert E Taylor, Michael O’Keeffe, and Omar M Yaghi. Designed
synthesis of 3D covalent organic frameworks. Science (New York, N.Y.),
316(5822):268–272, 4 2007.
[129] Yuzhong Liu, Yanhang Ma, Yingbo Zhao, Xixi Sun, Felipe Gándara, Hi-
royasu Furukawa, Zheng Liu, Hanyu Zhu, Chenhui Zhu, Kazutomo Sue-
naga, Peter Oleynikov, Ahmad S. Alshammari, Xiang Zhang, Osamu
Terasaki, and Omar M. Yaghi. Weaving of organic threads into a crys-
talline covalent organic framework. Science, 351(6271):365–369, 2016.
164
[130] Yingwei Li and Ralph T Yang. Hydrogen storage in metal-organic and
covalent-organic frameworks by spillover. AIChE Journal, 54(1):269–279,
2008.
[131] Jin-Tao Yu, Zhe Chen, Junliang Sun, Zhi-Tang Huang, and Qi-Yu Zheng.
Cyclotricatechylene based porous crystalline material: Synthesis and ap-
plications in gas storage. J. Mater. Chem., 22(12):5369–5373, 2012.
[132] San-Yuan Ding, Jia Gao, Qiong Wang, Yuan Zhang, Wei-Guo Song,
Cheng-Yong Su, and Wei Wang. Construction of Covalent Organic Frame-
work for Catalysis: Pd/COF-LZU1 in Suzuki–Miyaura Coupling Reac-
tion. Journal of the American Chemical Society, 133(49):19816–19822, 12 2011.
[133] Sasanka Dalapati, Shangbin Jin, Jia Gao, Yanhong Xu, Atsushi Nagai, and
Donglin Jiang. An Azine-Linked Covalent Organic Framework. Journal of
the American Chemical Society, 135(46):17310–17313, 11 2013.
[134] Yuzhong Liu, Yanhang Ma, Jingjing Yang, Christian S Diercks, Nobumichi
Tamura, Fangying Jin, and Omar M Yaghi. Molecular Weaving of Co-
valent Organic Frameworks for Adaptive Guest Inclusion. Journal of the
American Chemical Society, 140(47):16015–16019, 2018.
[135] Brian T Koo and Paulette Clancy. Towards optimal packing and diffu-
sion of fullerene molecules in the Pc-PBBA covalent organic framework.
Molecular Simulation, 40(1-3):58–70, 1 2014.
[136] Anastasios I Skoulidas and David S Sholl. Self-Diffusion and Transport
Diffusion of Light Gases in Metal-Organic Framework Materials Assessed
Using Molecular Dynamics Simulations. The Journal of Physical Chemistry
B, 109(33):15760–15768, 8 2005.
[137] Anastasios I Skoulidas and David S Sholl. Molecular Dynamics Simu-
lations of Self-Diffusivities, Corrected Diffusivities, and Transport Diffu-
sivities of Light Gases in Four Silica Zeolites To Assess Influences of Pore
Shape and Connectivity. The Journal of Physical Chemistry A, 107(47):10132–
10141, 11 2003.
[138] Markus K Dahlgren, Patric Schyman, Julian Tirado-Rives, and William L
Jorgensen. Characterization of Biaryl Torsional Energetics and its Treat-
ment in OPLS All-Atom Force Fields. Journal of Chemical Information and
Modeling, 53(5):1191–1199, 5 2013.
165
[139] Michael J Robertson, Julian Tirado-Rives, and William L Jorgensen. Im-
proved Peptide and Protein Torsional Energetics with the OPLS-AA Force
Field. Journal of Chemical Theory and Computation, 11(7):3499–3509, 7 2015.
[140] Smit Berend Frenkel Daan. Understanding molecular simulation from algo-
rithms to applications.
[141] G Ciccotti, D Frenkel, and Eds. I.R. McDonalds. Simulation of Liquids and
Solids. North-Holland, Amsterdam, 1987.
[142] Hongwei Zeng, Yu Liu, and Honglai Liu. Adsorption and diffusion of
CO2 and CH4 in covalent organic frameworks: an MC/MD simulation
study. Molecular Simulation, 44(15):1244–1251, 10 2018.
[143] Yingzhi Zeng, Qianxiao Li, and Kewu Bai. Prediction of interstitial diffu-
sion activation energies of nitrogen, oxygen, boron and carbon in bcc, fcc,
and hcp metals using machine learning. Computational Materials Science,
144:232–247, 2018.
[144] Tian Ge Wang, Jin Jin Cao, and Xiao Fan Gou. Activation energy of oxygen
diffusion: A possible indicator of supercurrents through YBa 2 Cu 3 O 7
grain boundaries. Applied Surface Science, 480(June 2018):765–769, 2019.
[145] D D Do, L F Herrera, and H D Do. A new method to determine pore size
and its volume distribution of porous solids having known atomistic con-
figuration. Journal of colloid and interface science, 328(1):110–119, 12 2008.
[146] V M Burlakov, Y Tsukahara, G A D Briggs, and A P Sutton. Monte Carlo
Simulation of Vapour Deposition of Nonstoichiometric Amorphous Silica.
MRS Proceedings, 648:P6.41, 2000.
[147] Alexander Schoedel, Zhe Ji, and Omar M Yaghi. The role of metal–organic
frameworks in a carbon-neutral energy cycle. Nature Energy, 1(4):16034,
2016.
[148] Dalal Alezi, Youssef Belmabkhout, Mikhail Suyetin, Prashant M Bhatt,
Łukasz J Weseliński, Vera Solovyeva, Karim Adil, Ioannis Spanopou-
los, Pantelis N Trikalitis, Abdul-Hamid Emwas, and Mohamed Ed-
daoudi. MOF Crystal Chemistry Paving the Way to Gas Storage Needs:
Aluminum-Based soc-MOF for CH4, O2, and CO2 Storage. Journal of the
American Chemical Society, 137(41):13308–13318, 10 2015.
166
[149] N Scott Bobbitt, Matthew L Mendonca, Ashlee J Howarth, Timur Islam-
oglu, Joseph T Hupp, Omar K Farha, and Randall Q Snurr. Metal–organic
frameworks for the removal of toxic industrial chemicals and chemical
warfare agents. Chem. Soc. Rev., 46(11):3357–3385, 2017.
[150] Peyman Z. Moghadam, Sven M.J. Rogge, Aurelia Li, Chun-Man Chow,
Jelle Wieme, Noushin Moharrami, Marta Aragones-Anglada, Gareth
Conduit, Diego A. Gomez-Gualdron, Veronique Van Speybroeck, and
David Fairen-Jimenez. Structure-Mechanical Stability Relations of Metal-
Organic Frameworks via Machine Learning. Matter, 1(1):219–234, 2019.
[151] Russell E Morris and Paul S Wheatley. Gas storage in nanoporous ma-
terials. Angewandte Chemie (International ed. in English), 47(27):4966–4981,
2008.
[152] Avelino Corma. From Microporous to Mesoporous Molecular Sieve Ma-
terials and Their Use in Catalysis. Chemical Reviews, 97(6):2373–2420, 10
1997.
[153] Patricia Horcajada, Tamim Chalati, Christian Serre, Brigitte Gillet, Cather-
ine Sebrie, Tarek Baati, Jarrod F Eubank, Daniela Heurtaux, Pas-
cal Clayette, Christine Kreuz, Jong-San Chang, Young Kyu Hwang,
Veronique Marsaud, Phuong-Nhi Bories, Luc Cynober, Sophie Gil, Gérard
Férey, Patrick Couvreur, and Ruxandra Gref. Porous metal-organic-
framework nanoscale carriers as a potential platform for drug delivery
and imaging. Nature materials, 9(2):172–178, 2 2010.
[154] Thomas F Willems, Chris H Rycroft, Michaeel Kazi, Juan C Meza, and
Maciej Haranczyk. Algorithms and tools for high-throughput geometry-
based analysis of crystalline porous materials. Microporous and Mesoporous
Materials, 149(1):134–141, 2012.
[155] Juergen Getzschmann, Irena Senkovska, Dirk Wallacher, Michael Tovar,
David Fairen-Jimenez, Tina Düren, Jasper M. Van Baten, Rajamani Kr-
ishna, and Stefan Kaskel. Methane storage mechanism in the metal-
organic framework Cu 3(btc)2: An in situ neutron diffraction study. Mi-
croporous and Mesoporous Materials, 136(1-3):50–58, 2010.
[156] Albert P Bartók, Risi Kondor, and Gábor Csányi. On representing chemi-
cal environments. Phys. Rev. B, 87(18):184115, 5 2013.
[157] Piotr H. Karpinski. Polymorphism of active pharmaceutical ingredients.
Chemical Engineering and Technology, 29(2):233–237, 2006.
167
[158] Kaisar Raza. Polymorphism: The Phenomenon Affecting the Performance
of Drugs. SOJ Pharmacy & Pharmaceutical Sciences, 2014.
[159] Tsai-Ta C Lai, Steven Ferguson, Laura Palmer, Bernhardt L Trout, and Al-
lan S Myerson. Continuous Crystallization and Polymorph Dynamics
in the l-Glutamic Acid System. Organic Process Research & Development,
18(11):1382–1390, 11 2014.
[160] Tsai-Ta C Lai, Jan Cornevin, Steven Ferguson, Nahan Li, Bernhardt L
Trout, and Allan S Myerson. Control of Polymorphism in Continuous
Crystallization via Mixed Suspension Mixed Product Removal Systems
Cascade Design. Crystal Growth & Design, 15(7):3374–3382, 7 2015.
[161] Henry C. Herbol, Weici Hu, Peter Frazier, Paulette Clancy, and Matthias
Poloczek. Efficient search of compositional space for hybrid or-
ganic–inorganic perovskites via Bayesian optimization. npj Computational
Materials, 4(1):1–7, 2018.
[162] R R Coifman, S Lafon, A B Lee, M Maggioni, B Nadler, F Warner, and S W
Zucker. Geometric diffusions as a tool for harmonic analysis and structure
definition of data: diffusion maps. Proceedings of the National Academy of
Sciences of the United States of America, 102(21):7426–31, 5 2005.
[163] Hannes Jónsson, Greg Mills, and Karsten W. Jacobsen. Nudged elastic
band method for finding minimum energy paths of transitions. In Classi-
cal and Quantum Dynamics in Condensed Phase Simulations, pages 385–404.
168